
RetroPenguin_ t1_jad51qy wrote

For the >10B closed-source models, I'd be really curious how many of those weights are exactly zero at fp16 precision.

23

7734128 t1_jaemc4b wrote

Doesn't really change anything, does it? A zero weight still has an effect, so it has to be stored. I assume you mean it could use less memory? But is that technically feasible in practice? I can't imagine a practical way to store a tensor of mixed-precision weights without ruinous reprocessing every time you use them.

6

karius85 t1_jaeoyq7 wrote

Sparse matrices, but you would need quite a lot of zeros.

2
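To make the "quite a lot of zeros" point concrete, here is a minimal sketch of the memory trade-off with a hand-rolled CSR (compressed sparse row) layout. The matrix size and 90% sparsity are hypothetical, just chosen for illustration:

```python
import numpy as np

# Toy stand-in for a weight matrix where most fp16 entries rounded to exact zero.
rng = np.random.default_rng(0)
w = rng.standard_normal((1000, 1000)).astype(np.float16)
w[rng.random(w.shape) < 0.9] = 0  # ~90% zeros (hypothetical sparsity level)

# CSR: store only the nonzero values and their column indices,
# plus one row pointer per row marking where each row's entries begin.
rows, cols = np.nonzero(w)                                # row-major order
data = w[rows, cols]                                      # fp16 values, 2 bytes each
indices = cols.astype(np.int32)                           # 4 bytes per nonzero
indptr = np.searchsorted(rows, np.arange(w.shape[0] + 1)).astype(np.int32)

dense_bytes = w.nbytes
csr_bytes = data.nbytes + indices.nbytes + indptr.nbytes
print(dense_bytes, csr_bytes)
```

CSR spends roughly 6 bytes per nonzero (a 2-byte fp16 value plus a 4-byte column index) versus 2 bytes per entry dense, so it only pays off below about one-third density, which is why moderate amounts of zeros don't help.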