Submitted by GasZealousideal8691 t3_10c9287 in MachineLearning
Hey guys. I'm running some experiments as part of a research project. The code was initially written for GPT-Neo 1.3B, but there is one baseline we want to use that only supports GPT2-XL, so I added support for it to our code (i.e., just included a clause along the lines of "if model_name == 'gpt2': model = GPT2LMHeadModel.from_pretrained('gpt2-xl')").
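Roughly, the branch I added looks something like this (a simplified sketch, not a verbatim copy of our code; the model_name argument and the tokenizer handling are paraphrased):

```python
from transformers import AutoTokenizer, GPT2LMHeadModel, GPTNeoForCausalLM

def load_model(model_name: str):
    # Hypothetical helper: model_name selects which baseline to run.
    if model_name == "gpt2":
        # "gpt2-xl" is the 1.5B-parameter GPT-2 checkpoint on the Hugging Face Hub.
        model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
        tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
    else:
        # Original setup: GPT-Neo 1.3B.
        model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
        tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
    model.eval()
    return model, tokenizer
```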
The issue is, GPT2-XL is giving absurd results that are clearly incorrect. It's hard to explain without an in-depth walkthrough of my code, but basically I have a bunch of functions that do things like compute the probability of certain labels in a multiple-choice test.
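To give a concrete idea of the kind of thing those functions do, here's a stripped-down sketch of scoring one label with a causal LM (made-up names, not my actual code); each candidate label gets scored like this and the highest-scoring one is picked:

```python
import torch
import torch.nn.functional as F

def label_logprob(model, tokenizer, prompt: str, label: str) -> float:
    """Sum of the log-probabilities the model assigns to the label tokens
    when they follow the prompt (higher = more likely)."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    label_ids = tokenizer(label, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, label_ids], dim=1)

    with torch.no_grad():
        logits = model(input_ids).logits  # shape: (1, seq_len, vocab_size)

    # Logits at position i predict token i + 1, so shift by one.
    start = prompt_ids.shape[1] - 1
    end = input_ids.shape[1] - 1
    log_probs = F.log_softmax(logits[0, start:end], dim=-1)
    return log_probs.gather(1, label_ids[0].unsqueeze(1)).sum().item()

# e.g. pick the best of several candidate labels:
# best = max(labels, key=lambda l: label_logprob(model, tokenizer, question, l))
```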
So my question is: is there any fundamental difference in how these two models are set up in Hugging Face that would cause errors like this? I'm not too familiar with Hugging Face models myself, so I'm not entirely sure. But the fact that the code runs yet produces such bad results is weird; I would have thought that if something were wrong, there would be some sort of tensor-size mismatch error somewhere...
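For what it's worth, here's a quick way to dump the two checkpoints' basic Hub-side setup next to each other (just a sketch, in case someone spots an obvious difference):

```python
from transformers import AutoConfig, AutoTokenizer

# Compare the basic configuration of the two checkpoints.
for name in ["gpt2-xl", "EleutherAI/gpt-neo-1.3B"]:
    config = AutoConfig.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)
    print(
        name,
        "| vocab size:", config.vocab_size,
        "| max positions:", config.max_position_embeddings,
        "| pad token:", tokenizer.pad_token,
    )
```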
CKtalon t1_j4enpew wrote
GPT-2 was trained on a different dataset, with little code in it (other than what came through from CommonCrawl). GPT-Neo uses The Pile, which contains a lot of code.