ClusterMax® SuperG | NVIDIA® A100 GPU Cluster
The fastest, most efficient application performance: scalable compute, networking, storage, power, and cooling in compute clusters powered by the NVIDIA® A100
Specifications:
- Incorporates the latest 3rd Generation Intel® Xeon® Scalable Processors family or AMD EPYC™ 7003 Series Processors
- Delivers up to 72 NVIDIA A100 SXM4 40GB/80GB GPUs per 42U cluster: 497,664 FP32 CUDA cores / INT32 cores, 248,832 FP64 cores, 31,104 Tensor Cores, 698 TFLOPS of peak FP64 performance, 1,404 TFLOPS of peak FP64 Tensor Core performance, and 1,404 TFLOPS of peak FP32 performance
- Up to 5,760GB of total GPU memory per 42U cluster (with 80GB GPUs)
- Supports HDR InfiniBand fabric and real-time InfiniBand diagnostics
- Cluster management and GPU monitoring software, including GPU temperature, fan speed, and power monitoring, providing exclusive access to GPUs in a cluster (see the monitoring sketch after this list)
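The bundled monitoring stack is vendor software; as a minimal sketch of the kind of telemetry it collects, the snippet below polls per-GPU temperature, fan speed, and power draw through NVIDIA's NVML bindings (the `pynvml` package). This is an illustrative assumption, not the shipped tooling; note that SXM4 modules are chassis-cooled, so the fan-speed query typically reports unsupported.

```python
# Hypothetical per-node telemetry poll using NVIDIA's NVML bindings
# (pip install nvidia-ml-py); this is NOT the bundled cluster software.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
        try:
            fan = f"{pynvml.nvmlDeviceGetFanSpeed(handle)}%"
        except pynvml.NVMLError:
            fan = "n/a"  # SXM4 modules have no per-GPU fan
        print(f"GPU{i} {name}: {temp} C, fan {fan}, {power_w:.0f} W")
finally:
    pynvml.nvmlShutdown()
```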
The ClusterMax® SuperG GPU computing clusters are powered by the NVIDIA® GPU computing platform. Based on the NVIDIA® A100 Tensor Core GPU, they deliver unprecedented acceleration at every scale, powering the world’s highest-performing elastic data centers for AI, data analytics, and HPC. These cluster solutions provide up to 20X higher performance over the prior NVIDIA Volta™ generation, allowing researchers to deliver real-world results and deploy solutions into production at scale.
Complete Cluster Assembly and Setup Services:
- Fully integrated and pre-packaged turnkey HPC solution, including HPC professional services and support, expert installation and setup of rack-optimized cluster nodes, cabling, rails, and other peripherals
- Configuration of cluster nodes and the network
- Installation of applications and client computers to offer a comprehensive solution for your IT needs
- Rapid deployment
- Server management options include standards-based IPMI or AMAX remote server management (see the IPMI sketch after this list)
- Seamless standard and custom application integration and cluster installation
- Cluster management options include a choice of commercial and open source software solutions
- Supports a variety of UPS and PDU configurations and interconnect options, including InfiniBand (EDR/HDR), Fibre Channel, and Ethernet (Gigabit, 10GbE, 25GbE, 40GbE, 100GbE, 200GbE)
- Energy-efficient cluster cabinets and high-performance UPS and power distribution units, with expert installation and setup of rack-optimized nodes, cabling, rails, and other peripherals
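As a hedged illustration of the standards-based IPMI option above, the sketch below shells out to the common `ipmitool` utility to read a node's sensor data over the LAN interface. The BMC address and credentials are placeholders, and AMAX's own remote-management tooling is a separate product.

```python
# Hypothetical out-of-band sensor read via ipmitool (a standard IPMI CLI);
# host and credentials below are placeholders.
import subprocess

def read_sensors(bmc_host: str, user: str, password: str) -> str:
    """Return the BMC's sensor data repository (temperatures, fans, PSUs)."""
    cmd = [
        "ipmitool", "-I", "lanplus",
        "-H", bmc_host, "-U", user, "-P", password,
        "sdr", "list",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

print(read_sensors("10.0.0.101", "admin", "changeme"))
```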
Rack Level Verification:
- Performance and Benchmark Testing (HPL)
- ATA rack level stress test
- Rack Level Serviceability
- Ease of Deployment Review
- MPI jobs over IB for HPC
- GPU stress test using CUDA (see the stress-test sketch after this list)
- Cluster management
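The GPU stress test above is part of the verification service; as a rough sketch of the idea (not the actual test suite), the loop below keeps every visible GPU busy with large FP16 matrix multiplications via PyTorch's CUDA backend, which exercises the A100's Tensor Cores and drives power and thermals toward their limits. The matrix size and duration are arbitrary choices.

```python
# Minimal GPU burn-in sketch using PyTorch's CUDA backend; the real
# verification suite is separate tooling.
import time
import torch

DURATION_S = 60
N = 8192  # large enough to keep an A100's Tensor Cores saturated

mats = []
for dev in range(torch.cuda.device_count()):
    a = torch.randn(N, N, dtype=torch.float16, device=f"cuda:{dev}")
    b = torch.randn(N, N, dtype=torch.float16, device=f"cuda:{dev}")
    mats.append((a, b))

start = time.time()
rounds = 0
while time.time() - start < DURATION_S:
    for a, b in mats:
        torch.matmul(a, b)  # FP16 matmul lands on the Tensor Cores
    rounds += 1
torch.cuda.synchronize()
print(f"{rounds} rounds across {torch.cuda.device_count()} GPU(s)")
```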
Large Scale Rack Deployment Review:
- Scalability Process
- Rack-to-Rack Connectivity (see the bandwidth probe after this list)
- Multi-Cluster Testing
- Software/Application Load
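Rack-to-rack connectivity checks like the ones above are typically scripted. A minimal point-to-point bandwidth probe over the InfiniBand fabric, using `mpi4py`, might look like the following; the message size, repetition count, and host names are arbitrary assumptions, and launch details depend on your MPI distribution.

```python
# Hypothetical two-rank bandwidth probe across racks, launched e.g. as:
#   mpirun -np 2 --host rack1-node1,rack2-node1 python bw_probe.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
SIZE = 256 * 1024 * 1024  # 256 MB payload
REPS = 10
buf = np.ones(SIZE, dtype=np.uint8)

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(REPS):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
    elif rank == 1:
        comm.Recv(buf, source=0, tag=0)
comm.Barrier()
if rank == 0:
    gbps = SIZE * REPS * 8 / (MPI.Wtime() - t0) / 1e9
    print(f"approx. one-way bandwidth: {gbps:.1f} Gbit/s")
```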
Optional Cluster System Software Installed:
- Microsoft Windows Server 2019
- Bright Computing Cluster Manager
- SUSE / Red Hat Enterprise Linux
- C-based software development tools, CUDA Toolkit and SDK, and various libraries for CPU and GPU clusters
- Deep learning software
ClusterMax® SuperG NVIDIA A100 GPU Computing Cluster Specifications, with 3rd Generation Intel® Xeon® Scalable Processors:
| Model # | ClusterMax® SuperG-142.X100S | ClusterMax® SuperG-244.X100S | ClusterMax® SuperG-426.X100S | ClusterMax® SuperG-42U9.X100S |
| --- | --- | --- | --- | --- |
| Rack Height | 14U | 24U | 42U | 42U |
| # of 4U 8x A100 SXM4 GPU Nodes per Rack | 2 | 4 | 6 | 9 |
| # of A100 SXM4 GPUs per Rack (8 GPUs per node) | 16 | 32 | 48 | 72 |
| GPU Memory Capacity per Rack (40GB per GPU) | 640GB | 1,280GB | 1,920GB | 2,880GB |
| GPU Memory Capacity per Rack (80GB per GPU) | 1,280GB | 2,560GB | 3,840GB | 5,760GB |
| GPU Node Processor Support | 2x 3rd Generation Intel® Xeon® Scalable Processors per node | 2x 3rd Generation Intel® Xeon® Scalable Processors per node | 2x 3rd Generation Intel® Xeon® Scalable Processors per node | 2x 3rd Generation Intel® Xeon® Scalable Processors per node |
| # of Processors per Rack (2 processors per node) | 4 | 8 | 12 | 18 |
| Maximum # of CPU Cores per Rack (40 cores per processor) | 160 Cores | 320 Cores | 480 Cores | 720 Cores |
| Maximum Compute Node Memory Capacity per Rack (8TB per system) | 16TB | 32TB | 48TB | 72TB |
| # of FP32 CUDA Cores per Rack (6,912 cores per GPU) | 110,592 Cores | 221,184 Cores | 331,776 Cores | 497,664 Cores |
| # of FP64 Cores per Rack (3,456 cores per GPU) | 55,296 Cores | 110,592 Cores | 165,888 Cores | 248,832 Cores |
| # of INT32 Cores per Rack (6,912 cores per GPU) | 110,592 Cores | 221,184 Cores | 331,776 Cores | 497,664 Cores |
| # of Tensor Cores per Rack (432 cores per GPU) | 6,912 Cores | 13,824 Cores | 20,736 Cores | 31,104 Cores |
| Peak FP64 Performance per Rack (9.7 TFLOPS per GPU) | 155 TFLOPS | 310 TFLOPS | 466 TFLOPS | 698 TFLOPS |
| Peak FP64 Tensor Core Performance per Rack (19.5 TFLOPS per GPU) | 312 TFLOPS | 624 TFLOPS | 936 TFLOPS | 1,404 TFLOPS |
| Peak FP32 Performance per Rack (19.5 TFLOPS per GPU) | 312 TFLOPS | 624 TFLOPS | 936 TFLOPS | 1,404 TFLOPS |
| Tensor Float 32 (TF32) Performance per Rack (156 TFLOPS per GPU) | 2,496 TFLOPS | 4,992 TFLOPS | 7,488 TFLOPS | 11,232 TFLOPS |
| Tensor Float 32 (TF32) Performance per Rack, with Sparsity (312 TFLOPS per GPU) | 4,992 TFLOPS | 9,984 TFLOPS | 14,976 TFLOPS | 22,464 TFLOPS |
| Peak BFLOAT16 / FP16 Tensor Core Performance per Rack (312 TFLOPS per GPU) | 4,992 TFLOPS | 9,984 TFLOPS | 14,976 TFLOPS | 22,464 TFLOPS |
| Peak BFLOAT16 / FP16 Tensor Core Performance per Rack, with Sparsity (624 TFLOPS per GPU) | 9,984 TFLOPS | 19,968 TFLOPS | 29,952 TFLOPS | 44,928 TFLOPS |
| Peak INT8 Tensor Core Performance per Rack (624 TOPS per GPU) | 9,984 TOPS | 19,968 TOPS | 29,952 TOPS | 44,928 TOPS |
| Peak INT8 Tensor Core Performance per Rack, with Sparsity (1,248 TOPS per GPU) | 19,968 TOPS | 39,936 TOPS | 59,904 TOPS | 89,856 TOPS |
| Peak INT4 Tensor Core Performance per Rack (1,248 TOPS per GPU) | 19,968 TOPS | 39,936 TOPS | 59,904 TOPS | 89,856 TOPS |
| Peak INT4 Tensor Core Performance per Rack, with Sparsity (2,496 TOPS per GPU) | 39,936 TOPS | 79,872 TOPS | 119,808 TOPS | 179,712 TOPS |
| GPU Node Interconnect | 10GbE | HDR InfiniBand | HDR InfiniBand | HDR InfiniBand |
| GPU Node Storage | 6x U.2 NVMe bays & 2x M.2 NVMe bays | 6x U.2 NVMe bays & 2x M.2 NVMe bays | 6x U.2 NVMe bays & 2x M.2 NVMe bays | 6x U.2 NVMe bays & 2x M.2 NVMe bays |
| Storage Node | None | 1x 1U Storage Node | 1x 1U Storage Node | 1x 1U Storage Node |
| Storage Node Processor Support | – | 2x 3rd Generation Intel® Xeon® Scalable Processors | 2x 3rd Generation Intel® Xeon® Scalable Processors | 2x 3rd Generation Intel® Xeon® Scalable Processors |
| Storage Node Memory Support | – | 8TB Registered ECC DDR4-3200 memory | 8TB Registered ECC DDR4-3200 memory | 8TB Registered ECC DDR4-3200 memory |
| Storage Node Drive Bays | – | 12x hot-swap 2.5″ U.2 NVMe drive bays | 12x hot-swap 2.5″ U.2 NVMe drive bays | 12x hot-swap 2.5″ U.2 NVMe drive bays |
| Storage Node Interconnect | – | HDR InfiniBand | HDR InfiniBand | HDR InfiniBand |
| Network Switch | 1x 24-port 10GbE | 1x 24-port 10GbE; 1x HDR InfiniBand | 1x 52-port 10GbE; 1x 40-port EDR/HDR InfiniBand | 1x 52-port 10GbE; 1x 40-port EDR/HDR InfiniBand |
| Cluster Management Software | Optional Bright Cluster Manager software | Optional Bright Cluster Manager software | Optional Bright Cluster Manager software | Optional Bright Cluster Manager software |
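Every per-rack figure in the table is simply the per-GPU value multiplied by the rack's GPU count (for example, 6,912 FP32 CUDA cores × 72 GPUs = 497,664 cores). The short sketch below reproduces the totals from the published A100 per-GPU numbers; the model keys are shorthand for the configurations above.

```python
# Reproduce the per-rack totals from per-GPU A100 SXM4 specs.
PER_GPU = {
    "FP32 CUDA cores": 6912,
    "FP64 cores": 3456,
    "INT32 cores": 6912,
    "Tensor Cores": 432,
    "Peak FP64 TFLOPS": 9.7,
    "Peak FP32 TFLOPS": 19.5,
    "Peak TF32 TFLOPS": 156,
}

# GPUs per rack for each ClusterMax SuperG configuration.
RACKS = {"142": 16, "244": 32, "426": 48, "42U9": 72}

for model, gpus in RACKS.items():
    print(f"ClusterMax SuperG-{model} ({gpus} GPUs):")
    for name, value in PER_GPU.items():
        print(f"  {name} per rack: {value * gpus:,.0f}")
```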
Software Options:
Bright Cluster Manager software automates the process of building and managing modern high-performance Linux clusters, eliminating complexity and enabling flexibility.
NVMesh enables shared NVMe across any network and supports any local or distributed file system. The solution features an intelligent management layer that abstracts underlying hardware with CPU offload, creates logical volumes with redundancy, and provides centralized, intelligent management and monitoring.
With QuantaStor’s unique Storage Grid architecture, organizations are able to manage multiple clusters across sites as a unified storage platform that is easily configured and maintained through the web user interface and automated via advanced CLI and REST APIs, enabling data centers to easily transform themselves into flexible cloud infrastructure with the performance and reliability needed to run enterprise applications.
ClusterMax® SuperG NVIDIA A100 GPU Computing Cluster Specifications, with AMD EPYC™ 7003 Series Processors:
| Model # | ClusterMax® SuperG-142.A100S | ClusterMax® SuperG-244.A100S | ClusterMax® SuperG-426.A100S | ClusterMax® SuperG-42U9.A100S |
| --- | --- | --- | --- | --- |
| Rack Height | 14U | 24U | 42U | 42U |
| # of 4U 8x A100 SXM4 GPU Nodes per Rack | 2 | 4 | 6 | 9 |
| # of A100 SXM4 GPUs per Rack (8 GPUs per node) | 16 | 32 | 48 | 72 |
| GPU Memory Capacity per Rack (40GB per GPU) | 640GB | 1,280GB | 1,920GB | 2,880GB |
| GPU Memory Capacity per Rack (80GB per GPU) | 1,280GB | 2,560GB | 3,840GB | 5,760GB |
| GPU Node Processor Support | 2x AMD EPYC™ 7003 Series Processors per node | 2x AMD EPYC™ 7003 Series Processors per node | 2x AMD EPYC™ 7003 Series Processors per node | 2x AMD EPYC™ 7003 Series Processors per node |
| # of Processors per Rack (2 processors per node) | 4 | 8 | 12 | 18 |
| Maximum # of CPU Cores per Rack (64 cores per processor) | 256 Cores | 512 Cores | 768 Cores | 1,152 Cores |
| Maximum Compute Node Memory Capacity per Rack (8TB per system) | 16TB | 32TB | 48TB | 72TB |
| # of FP32 CUDA Cores per Rack (6,912 cores per GPU) | 110,592 Cores | 221,184 Cores | 331,776 Cores | 497,664 Cores |
| # of FP64 Cores per Rack (3,456 cores per GPU) | 55,296 Cores | 110,592 Cores | 165,888 Cores | 248,832 Cores |
| # of INT32 Cores per Rack (6,912 cores per GPU) | 110,592 Cores | 221,184 Cores | 331,776 Cores | 497,664 Cores |
| # of Tensor Cores per Rack (432 cores per GPU) | 6,912 Cores | 13,824 Cores | 20,736 Cores | 31,104 Cores |
| Peak FP64 Performance per Rack (9.7 TFLOPS per GPU) | 155 TFLOPS | 310 TFLOPS | 466 TFLOPS | 698 TFLOPS |
| Peak FP64 Tensor Core Performance per Rack (19.5 TFLOPS per GPU) | 312 TFLOPS | 624 TFLOPS | 936 TFLOPS | 1,404 TFLOPS |
| Peak FP32 Performance per Rack (19.5 TFLOPS per GPU) | 312 TFLOPS | 624 TFLOPS | 936 TFLOPS | 1,404 TFLOPS |
| Tensor Float 32 (TF32) Performance per Rack (156 TFLOPS per GPU) | 2,496 TFLOPS | 4,992 TFLOPS | 7,488 TFLOPS | 11,232 TFLOPS |
| Tensor Float 32 (TF32) Performance per Rack, with Sparsity (312 TFLOPS per GPU) | 4,992 TFLOPS | 9,984 TFLOPS | 14,976 TFLOPS | 22,464 TFLOPS |
| Peak BFLOAT16 / FP16 Tensor Core Performance per Rack (312 TFLOPS per GPU) | 4,992 TFLOPS | 9,984 TFLOPS | 14,976 TFLOPS | 22,464 TFLOPS |
| Peak BFLOAT16 / FP16 Tensor Core Performance per Rack, with Sparsity (624 TFLOPS per GPU) | 9,984 TFLOPS | 19,968 TFLOPS | 29,952 TFLOPS | 44,928 TFLOPS |
| Peak INT8 Tensor Core Performance per Rack (624 TOPS per GPU) | 9,984 TOPS | 19,968 TOPS | 29,952 TOPS | 44,928 TOPS |
| Peak INT8 Tensor Core Performance per Rack, with Sparsity (1,248 TOPS per GPU) | 19,968 TOPS | 39,936 TOPS | 59,904 TOPS | 89,856 TOPS |
| Peak INT4 Tensor Core Performance per Rack (1,248 TOPS per GPU) | 19,968 TOPS | 39,936 TOPS | 59,904 TOPS | 89,856 TOPS |
| Peak INT4 Tensor Core Performance per Rack, with Sparsity (2,496 TOPS per GPU) | 39,936 TOPS | 79,872 TOPS | 119,808 TOPS | 179,712 TOPS |
| GPU Node Interconnect | 10GbE | HDR InfiniBand | HDR InfiniBand | HDR InfiniBand |
| GPU Node Storage | 6x U.2 NVMe bays & 2x M.2 NVMe bays | 6x U.2 NVMe bays & 2x M.2 NVMe bays | 6x U.2 NVMe bays & 2x M.2 NVMe bays | 6x U.2 NVMe bays & 2x M.2 NVMe bays |
| Storage Node | None | 1x 1U Storage Node | 1x 1U Storage Node | 1x 1U Storage Node |
| Storage Node Processor Support | – | 2x AMD EPYC™ 7003 Series Processors | 2x AMD EPYC™ 7003 Series Processors | 2x AMD EPYC™ 7003 Series Processors |
| Storage Node Memory Support | – | 8TB Registered ECC DDR4-3200 memory | 8TB Registered ECC DDR4-3200 memory | 8TB Registered ECC DDR4-3200 memory |
| Storage Node Drive Bays | – | 12x hot-swap 2.5″ U.2 NVMe drive bays | 12x hot-swap 2.5″ U.2 NVMe drive bays | 12x hot-swap 2.5″ U.2 NVMe drive bays |
| Storage Node Interconnect | – | HDR InfiniBand | HDR InfiniBand | HDR InfiniBand |
| Network Switch | 1x 24-port 10GbE | 1x 24-port 10GbE; 1x HDR InfiniBand | 1x 52-port 10GbE; 1x 40-port EDR/HDR InfiniBand | 1x 52-port 10GbE; 1x 40-port EDR/HDR InfiniBand |
| Cluster Management Software | Optional Bright Cluster Manager software | Optional Bright Cluster Manager software | Optional Bright Cluster Manager software | Optional Bright Cluster Manager software |