Update History
v22.9.1
- Added a compiler pass that flattens ≥6D input/output tensors into lower-dimensional ones, avoiding unsupported-dimension errors in moDNN
- moreh-smi --reset now works correctly when the worker process has already terminated but its GPU resources have not been released
- Enabled torch.nn.BCEWithLogitsLoss to accept pos_weight of a different type than input
- Resolved a potential performance issue of Softmax
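
The flattening pass mentioned above is not described in detail in these notes. As a rough illustration of the idea only, the sketch below merges the trailing dimensions of a shape so that its rank does not exceed a limit. It assumes contiguous tensors and element-wise operations whose operands all share the same shape; the function name `collapse_shape` and the `max_rank=5` limit are hypothetical and are not part of the Moreh software.

```python
from math import prod


def collapse_shape(shape, max_rank=5):
    """Merge trailing dimensions so the result has at most max_rank dims.

    Hypothetical helper for illustration; valid only when the data is
    contiguous and the operation is element-wise without broadcasting,
    so that the element order is unaffected by the reshape.
    """
    shape = tuple(shape)
    if len(shape) <= max_rank:
        return shape  # already within the supported rank
    keep = max_rank - 1
    # Keep the leading dims, fold everything else into one trailing dim.
    return shape[:keep] + (prod(shape[keep:]),)


if __name__ == "__main__":
    print(collapse_shape((2, 3, 4, 5, 6, 7)))
    print(collapse_shape((2, 2, 2, 2, 2, 2, 2)))
```

The total number of elements is unchanged, so an element-wise kernel written for at most five dimensions can process the flattened tensor and the result can be viewed back in the original shape.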
v22.9.0
- Supported 6- and 7-dimensional input/output tensors in elemwise operations
- Supported PyTorch tensor resizing
- Added the algorithm selection rule for grouped 3D convolutions
- Improved the behavior of the moreh-smi --reset command to allow users to recover from database errors
- Correctly closed pipe file descriptors in WorkerAgent
- Fixed some errors
v22.8.3
- Hotfix for heartbeat thread issues
v22.8.2
- Corrected the behavior of pytorch_sample.py bundled in the HAC VM image
- Supported software update on VMs not containing the moreh-switch-model command
v22.8.1
- Shortened the communication latency between an application process and a worker process
- Supported PyTorch DP/DDP functions
- Improved floating-point arithmetic accuracy for fp16 matrix multiplications
v22.8.0
- Supported the relaxed fp32 mode that performs fp32 matrix multiplications in bfloat16 (torch.moreh.options.allow_relaxed_fp32)
- The DataParallel compiler pass is now safely bypassed, instead of raising an exception, if it fails to parallelize the source graph
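
To give a feel for the precision trade-off behind the relaxed fp32 mode, the sketch below emulates rounding fp32 values to bfloat16 in pure Python and uses the rounded values in a dot product. It is an illustration under assumptions, not the Moreh implementation: the helper names `to_bfloat16` and `relaxed_dot` are hypothetical, the rounding mode (round to nearest, ties to even) is assumed, and the notes do not say how accumulation is performed.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Round a value to the nearest bfloat16 and return it as a Python float.

    bfloat16 keeps the sign, the 8 exponent bits, and the top 7 mantissa
    bits of an fp32 value. Assumes x is representable in fp32.
    """
    if math.isnan(x) or math.isinf(x):
        return x
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 bits being discarded.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def relaxed_dot(a, b):
    """Dot product with both inputs rounded to bfloat16 first.

    Accumulation here uses Python floats; the real mode may differ.
    """
    return sum(to_bfloat16(x) * to_bfloat16(y) for x, y in zip(a, b))


if __name__ == "__main__":
    print(to_bfloat16(3.14159))
    print(relaxed_dot([1.0 + 2**-9, 0.5], [2.0, 4.0]))
```

Because bfloat16 keeps only 7 explicit mantissa bits, inputs that differ by less than about 2^-8 relative to their magnitude become indistinguishable, which is the accuracy cost accepted in exchange for faster matrix multiplications.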
v22.7.2
- Supported fallback to an NVIDIA GPU for unsupported operations
- Ensured Tensile GEMM kernels do not crash on narrow-shaped tensors
v22.7.1
- Fixed a precision issue in the SELU activation function
- Removed an unnecessary error message
v22.7.0
- Bug fixes for KT HAC reference models
v22.6.1
- Improved PyTorch portability
v0.10.1
- The DeviceUsage API returns min/max/average percentages
v0.10.0
- Introduced the graph executor running on GPU nodes to reduce inter-node packets
- A user process can offload an entire computational graph instead of individual operations
- Improved PyTorch API portability and performance
- Supported AMD gfx908/gfx90a architectures (incl. MI100 and MI250 GPUs) and utilized their matrix core instructions
v0.9.10
- Fixed torch.jit.trace to work
- Supported DeviceUsageInfo API calls that do not specify a token
- Corrected inplaceness check in the IR constructor
v0.9.9
- Fixed the parallelization scheme of unique()
- Correctly handled variable-length operations whose outermost size is smaller than the number of GPUs
- Resolved a potential GPU memory object leak in the storage allocator
v0.9.8
- Supported the show usage command in moreh_smclient
v0.9.7
- Fixed a bug in torch.nn.functional.binary_cross_entropy_with_logits
- Fixed a message parsing error between frontend and worker
v0.9.6
- Fixed a bug in torch.meshgrid
v0.9.5
- Fixed some bugs in the PyTorch driver
v0.9.4
- Fixed a bug in Tensor.__getitem__()
v0.9.3
- Resolved a potential memory access fault in Convolution3d
- Fixed a bug in the memory allocator
v0.9.2
- Fixed a GPU memory allocation issue
v0.9.1
- Improved performance of grouped convolutions
- Fixed the timing of connections to multiple moreh_workers
- Improved PyTorch portability
v0.9.0
- Improved performance of some frequently used operations
- Improved PyTorch portability
v0.8.3
- Supported backward computation of evaluation-mode batchnorm and dropout
v0.8.2
- Supported boolean arithmetic operations
- Supported normal_, pairwise_distance, and triplet_margin_with_distance_loss
v0.8.1
- Fixed a bug in the BatchNorm layer
- Fixed torch.Tensor.type_as() to correctly move data between devices
- Other bug fixes in SDAManager