Update History

v22.9.1

Added a compiler pass to flatten ≥6D input/output tensors into simpler ones and to avoid unsupported dimension errors in moDNN
moreh-smi --reset now works correctly when the worker process is already terminated but the GPU resources are not released
Enabled torch.nn.BCEWithLogitsLoss to accept pos_weight of a different type than input
Resolved a potential performance issue of Softmax

v22.9.0

Supported 6- and 7-dimensional input/output tensors in elemwise operations
Supported PyTorch tensor resizing
Added the algorithm selection rule for grouped 3D convolutions
Improved the behavior of the moreh-smi --reset command to allow users to recover database errors
Correctly closed pipe file descriptors in WorkerAgent
Fixed some errors

v22.8.3

Hotfix for heartbeat thread issues

v22.8.2

Corrected the behavior of pytorch_sample.py bundled in the HAC VM image
Supported software update on VMs not containing the moreh-switch-model command

v22.8.1

Shorten the communication latency between an application process and a worker process.
Supported PyTorch DP/DDP functions.
Improved floating-point arithmetic accuracy for fp16 matrix multiplications.

v22.8.0

Supported the relaxed fp32 mode that performs fp32 matrix multiplications in bfloat16 (torch.moreh.options.allow_relaxed_fp32)
The DataParallel compiler pass will be safely bypassed if it fails to parallelize the source graph, instead of raising an exception.

v22.7.2

Supported fallback to an NVIDIA GPU for unsupported operations
Ensured Tensile GEMM kernels are not crashed for narrow-shaped tensors

v22.7.1

Fixed a precision issue in the SELU activation function
Removed an unnecessary error message

v22.7.0

Bug fixes for KT HAC reference models

v22.6.1

Improved PyTorch portability

v0.10.1

The DeviceUsage API returns min/max/average percentages

v0.10.0

Introduced the graph executor running on GPU nodes to reduce inter-node packets
- A user process can offload an entire computational graph instead of individual operations
Improved PyTorch API portability and performance
Supported AMD gfx908/gfx90a architectures (incl. MI100 and MI250 GPUs) and utilized their matrix core instructions

v0.9.10

Fixed torch.jit.trace to work
DeviceUsageInfo API support that does not specify a token
Corrected inplaceness check in the IR constructor

v0.9.9

Fixed the parallelization scheme of unique()
Correctly handled variable-length operations with outermost size smaller than # of GPUs
Resolved a potential GPU memory object leak in the storage allocator

v0.9.8

Supported show usage command in moreh_smclient

v0.9.7

Fixed a bug in torch.nn.functional.binary_cross_entropy_with_logits
Fixed a message parsing error between frontend and worker

v0.9.6

Fixed a bug in torch.meshgrid

v0.9.5

Fixed some bugs in the PyTorch driver

v0.9.4

Fixed a bug of Tensor.__getitem__()

v0.9.3

Resolved a potential memory access fault in Convolution3d
Fixed a bug in the memory allocator

v0.9.2

Fixed a GPU memory allocation issue

v0.9.1

Improved performance of grouped convolutions
Fixed to connect to multiple moreh_workers at the correct timing
Improved PyTorch portability

v0.9.0

Improved performance of some frequently used operations
Improved PyTorch portability

v0.8.3

Supported backward computation of evaluation-mode batchnorm and dropout

v0.8.2

Supported boolean arithmetic operations
Supported normal_, pairwise_distance, and triplet_margin_with_distance_loss

v0.8.1

Fixed a bug in the BatchNorm layer
Fixed torch.Tensor.type_as() to correctly move data between devices
Other bug fixes in SDAManager