currentscurrents t1_jdn0opn wrote on March 25, 2023 at 4:46 PM

The Nvidia H100 marketing material does advertise a configuration for linking 256 of them to train trillion-parameter language models:

>With NVIDIA NVLink® Switch System, up to 256 H100 GPUs can be connected to accelerate exascale workloads. The GPU also includes a dedicated Transformer Engine to solve trillion-parameter language models.

That doesn't necessarily mean GPT-4 is that big, but it's possible. Microsoft and Nvidia have been working closely to build the new Azure GPU cloud.
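For a rough sense of why 256 GPUs is in the right ballpark for a trillion parameters, here's a back-of-envelope memory check. It is only a sketch under my own assumptions, none of which come from the marketing quote. I assume 80 GB of HBM per H100, and roughly 16 bytes per parameter for mixed-precision training with Adam: fp16 weights and gradients plus fp32 master weights, momentum and variance. Activations, communication buffers and fragmentation are ignored, so the real requirement is higher.

```python
# Back-of-envelope: does the training state of a 1-trillion-parameter model
# fit in the combined HBM of 256 H100s?
# ASSUMPTIONS (not from the Nvidia quote):
#   - 80 GB of HBM per H100
#   - ~16 bytes/param for mixed-precision Adam training:
#     fp16 weights (2) + fp16 grads (2) + fp32 master weights (4)
#     + fp32 Adam momentum (4) + fp32 Adam variance (4)
#   - activations, buffers and fragmentation are ignored

GPUS = 256
HBM_PER_GPU_GB = 80
PARAMS = 1_000_000_000_000
BYTES_PER_PARAM_TRAINING = 2 + 2 + 4 + 4 + 4  # = 16


def total_hbm_tb(gpus: int, gb_per_gpu: int) -> float:
    """Combined GPU memory in terabytes (decimal units)."""
    return gpus * gb_per_gpu / 1000


def training_state_tb(params: int, bytes_per_param: int) -> float:
    """Weights + gradients + optimizer state in terabytes (decimal units)."""
    return params * bytes_per_param / 1e12


hbm = total_hbm_tb(GPUS, HBM_PER_GPU_GB)
state = training_state_tb(PARAMS, BYTES_PER_PARAM_TRAINING)

print(f"Combined HBM: {hbm:.2f} TB")
print(f"Weights + grads + Adam state: {state:.2f} TB")
print("Fits (ignoring activations):", state <= hbm)
```

Under these assumptions that's about 20.48 TB of HBM against about 16 TB of training state, so it fits, but with little headroom once activations are counted. Inference alone needs far less (about 2 TB for fp16 weights).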