Glossary term
Glossary term
Infrastructure and Serving
Group of GPU-enabled servers used for training or inference.
xAI's Colossus training cluster - 100,000 NVIDIA H100 GPUs connected by InfiniBand - was built in 122 days in Memphis and used to train Grok 2 and Grok 3.
Microsoft's Azure OpenAI service runs on a purpose-built supercomputer with 285,000 AMD MI300X and NVIDIA H100 GPUs, serving inference for GPT-4o, DALL-E 3, and Whisper at global scale.
A pharmaceutical company runs a 256-GPU H100 private cluster on AWS UltraClusters, training drug-discovery models on genomics data that cannot leave the organisation's AWS VPC for compliance reasons.