Submitted by mrx-ai t3_121q6nk in MachineLearning
currentscurrents t1_jdn0opn wrote
The Nvidia H100 marketing material does advertise a 256-GPU configuration for training trillion-parameter language models:
>With NVIDIA NVLink® Switch System, up to 256 H100 GPUs can be connected to accelerate exascale workloads. The GPU also includes a dedicated Transformer Engine to solve trillion-parameter language models.
That doesn't necessarily mean GPT-4 is that big, but it's possible. Microsoft and Nvidia have been working closely to build the new Azure GPU cloud.
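For a sense of why you'd need that many GPUs, here's a rough back-of-envelope sketch in Python. The 16 bytes/parameter figure is a common mixed-precision Adam recipe (fp16 weights and gradients plus fp32 master weights and moments), not anything Nvidia states, and it ignores activations and communication buffers, so treat it as a floor:

```python
# Back-of-envelope: memory footprint of a 1-trillion-parameter model
# trained with mixed precision and Adam. Assumes fp16 weights/gradients
# plus fp32 master weights and two fp32 Adam moments; real frameworks vary.

PARAMS = 1e12        # 1 trillion parameters
H100_MEM_GB = 80     # memory per H100 (SXM variant)
NUM_GPUS = 256       # the NVLink Switch System configuration Nvidia quotes

bytes_per_param = (
    2    # fp16 weights
    + 2  # fp16 gradients
    + 4  # fp32 master copy of weights
    + 8  # Adam first + second moments in fp32
)        # = 16 bytes/param, before activations

total_gb = PARAMS * bytes_per_param / 1e9
print(f"Model + optimizer state: ~{total_gb / 1e3:.0f} TB")               # ~16 TB
print(f"Aggregate GPU memory:    ~{NUM_GPUS * H100_MEM_GB / 1e3:.1f} TB") # ~20.5 TB
print(f"Per-GPU share if fully sharded: ~{total_gb / NUM_GPUS:.0f} GB "
      f"of {H100_MEM_GB} GB")                                             # ~62 GB
```

So even before activation memory, a trillion-parameter model's training state roughly fills the aggregate memory of 256 H100s, which is consistent with Nvidia pitching that configuration for this scale.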