Megatron Memory Estimator

Note: This estimator covers only the GPU memory that PyTorch manages directly when running Megatron. It does not include the extra memory consumed by NCCL communication buffers, kernel fusion, overlap optimizations, CUDA Graphs, and similar mechanisms. Use the "Overhead per GPU" option below to account for these additional costs.
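As a rough illustration of how the overhead setting relates to the reported numbers, the sketch below adds a user-supplied per-GPU overhead to the two estimated components. It assumes the per-GPU total is a simple sum of the weight/optimizer estimate, the activation estimate, and the overhead. The function names and the sample figures are hypothetical and are not taken from the estimator's code.

```python
def total_memory_per_gpu_gb(weight_optimizer_gb, activation_gb, overhead_gb=0.0):
    """Sum the estimated components and a manual overhead, all in GB per GPU.

    Assumption: the total is a plain sum. The overhead stands in for costs
    the estimator does not model (NCCL buffers, CUDA Graphs, and so on).
    """
    if min(weight_optimizer_gb, activation_gb, overhead_gb) < 0:
        raise ValueError("memory components must be non-negative")
    return weight_optimizer_gb + activation_gb + overhead_gb


def fits_on_device(total_gb, device_capacity_gb):
    """Return True if the estimated total fits within the device capacity."""
    return total_gb <= device_capacity_gb


# Illustrative numbers only; real values come from the estimator's output.
total = total_memory_per_gpu_gb(40.0, 20.0, overhead_gb=10.0)
print(f"{total:.1f} GB/GPU, fits on an 80 GB device: {fits_on_device(total, 80.0)}")
```

A suitable overhead value depends on the workload; comparing the estimate against observed peak usage on a real run is one way to calibrate it.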

Configuration

Model Config (Editable)

History

Model | Weight Optimizer (GB) | Activation (GB) | Total (GB/GPU) | Actions