WigglyHypersurface t1_j4gzjvd wrote
Reply to comment by GasZealousideal8691 in [D] Is there any reason hugging face GPT2 would behave (fundamentally) differently from GPT-Neo? by GasZealousideal8691
If you're messing with the weights that deeply and directly I'm not sure. But it smells like a bug to me.
GasZealousideal8691 OP t1_j4hk6kz wrote
I'm fairly certain it’s something with the model. Even fine-tuning is giving these weird errors, when it had no problems with GPT-Neo.
We also ran this stuff on T5. Obviously we had to configure the rest of the code differently, but it was doing fine there as well.