Submitted by minimaxir t3_11fbccz in MachineLearning
currentscurrents t1_jajfjr5 wrote
Reply to comment by lostmsu in [D] OpenAI introduces ChatGPT and Whisper APIs (ChatGPT API is 1/10th the cost of GPT-3 API) by minimaxir
Problem is, we don't actually know how big ChatGPT is.
I strongly doubt they're running the full 175B model; you can prune/distill a lot without affecting performance.
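(For reference, "distill" here means the standard knowledge-distillation recipe: train a smaller student model to match the teacher's softened output distribution. A minimal PyTorch sketch of that objective; the tensor names, shapes, and temperature are illustrative assumptions, not anything OpenAI has described:)

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Soft-label distillation loss (Hinton et al., 2015)."""
    # Soften both distributions with a temperature so the student learns
    # from the teacher's full output distribution, not just the argmax.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature**2

# Toy usage: batch of 4, vocab of 50k (shapes are illustrative).
student_logits = torch.randn(4, 50_000, requires_grad=True)
teacher_logits = torch.randn(4, 50_000)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```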
MysteryInc152 t1_jal7d3p wrote
Distillation doesn't work for token-predicting language models, for some reason.
currentscurrents t1_jalajj3 wrote
DistilBERT worked though?
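(Aside: DistilBERT is distributed through Hugging Face Transformers, so the size reduction it achieved over BERT-base is easy to verify; the comparison below is a sketch using the publicly released checkpoints:)

```python
from transformers import AutoModel

teacher = AutoModel.from_pretrained("bert-base-uncased")        # ~110M parameters
student = AutoModel.from_pretrained("distilbert-base-uncased")  # ~66M parameters

# Count parameters to see the roughly 40% reduction from distillation.
print(sum(p.numel() for p in teacher.parameters()))
print(sum(p.numel() for p in student.parameters()))
```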
MysteryInc152 t1_jalau7e wrote
Sorry, I meant the really large-scale models. Nobody has gotten a GPT-3/Chinchilla-scale model to actually distill properly.