GasZealousideal8691 OP t1_j4gst0f wrote
Reply to comment by WigglyHypersurface in [D] Is there any reason hugging face GPT2 would behave (fundamentally) differently from GPT-Neo? by GasZealousideal8691
Don't think there is a causal version for GPT2.
GasZealousideal8691 OP t1_j4gs8gc wrote
Reply to comment by WigglyHypersurface in [D] Is there any reason hugging face GPT2 would behave (fundamentally) differently from GPT-Neo? by GasZealousideal8691
But would it affect it to this extent? To be clear, this is not just "bad performance" or "horrendous performance". Our project is loosely investigating the performance of different editing methods on LMs given some datasets we made, and none of the editing methods, from fine-tuning to gradient-based methods, changes the performance at all.
Furthermore, GPT2 outputs equal accuracy and specificity values (specificity is basically the degree to which it "remembers" other unrelated facts; the goal here is to minimize catastrophic forgetting), which makes absolutely no sense, because they aren't even measured on the same scale: accuracy is usually between 0 and 1, and specificity is usually ~26 with our measures.
It doesn't have anything to do with the way accuracy/specificity are computed, because the code for GPT-Neo is identical minus the model= and tokenizer= statements, and it works fine for GPT-Neo. So there is something fundamentally crazy going on with GPT2...
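For concreteness, the only lines that differ between the two runs are roughly the following (a minimal sketch using the checkpoint names mentioned elsewhere in the thread; the surrounding evaluation harness is omitted):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer, GPTNeoForCausalLM

# GPT-Neo run
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")

# GPT2 run: only these two assignments change; everything downstream is identical
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
```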
GasZealousideal8691 OP t1_j4gpu8j wrote
Reply to comment by WigglyHypersurface in [D] Is there any reason hugging face GPT2 would behave (fundamentally) differently from GPT-Neo? by GasZealousideal8691
GPT-Neo is GPTNeoForCausalLM, and GPT2 is GPT2LMHeadModel. Like I said, I'm not 100% familiar with these, but the huggingface docs list both as "GPT-Neo/GPT2 with an LM head", so I figured they were analogous.
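As a quick sanity check on the "analogous" assumption, both classes expose the same causal-LM interface (a scalar loss and next-token logits when labels are passed). Something like the sketch below should behave identically for either class; the small checkpoints and the example sentence are just placeholders for a fast test, not our actual setup:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer, GPTNeoForCausalLM

text = "The capital of France is Paris."  # placeholder example, not our data

for model_cls, name in [(GPT2LMHeadModel, "gpt2"), (GPTNeoForCausalLM, "EleutherAI/gpt-neo-125M")]:
    tokenizer = GPT2Tokenizer.from_pretrained(name)
    model = model_cls.from_pretrained(name)
    batch = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch, labels=batch["input_ids"])
    # Both should return the same kind of output: an LM loss and logits of shape (1, seq_len, vocab)
    print(name, out.loss.item(), out.logits.shape)
```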
GasZealousideal8691 OP t1_j4g8djf wrote
Reply to comment by WigglyHypersurface in [D] Is there any reason hugging face GPT2 would behave (fundamentally) differently from GPT-Neo? by GasZealousideal8691
No, both use the GPT2 tokenizer. GPT-Neo uses GPT2Tokenizer.from_pretrained('EleutherAI/gpt-neo-1.3B'), and GPT2 uses GPT2Tokenizer.from_pretrained('gpt2-xl').
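If it's useful, here is the kind of check I'd run to rule out a vocabulary mismatch (assuming both repos ship the same GPT-2 BPE files, which I believe they do):

```python
from transformers import GPT2Tokenizer

tok_neo = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
tok_gpt2 = GPT2Tokenizer.from_pretrained("gpt2-xl")

text = "Some arbitrary test sentence."  # placeholder example
# With identical BPE vocab/merges, both should produce the same token ids
print(tok_neo(text)["input_ids"] == tok_gpt2(text)["input_ids"])
print(tok_neo.vocab_size, tok_gpt2.vocab_size)  # both should be 50257
```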
GasZealousideal8691 OP t1_j4eo0ov wrote
Reply to comment by CKtalon in [D] Is there any reason hugging face GPT2 would behave (fundamentally) differently from GPT-Neo? by GasZealousideal8691
Oh sorry if that wasn’t clear, but the stuff I’m training on isn’t code, it’s natural language.
GasZealousideal8691 t1_ircpfwd wrote
Reply to comment by ispeakdatruf in [R] Self-Programming Artificial Intelligence Using Code-Generating Language Models by Ash3nBlue
I mean, practically speaking, it doesn't seem to achieve much more than that, but I don't think that's the point of the paper. The point here is that it's actually rewriting the source code itself each time, which is potentially useful because it can (theoretically) achieve something more novel than just changing hyperparameters.
It would certainly be more interesting if they showed genuinely nontrivial code changes, if those are even possible. But I don't think it's entirely useless; it's possible, for example, that we may be able to use something similar to deprecate the transformer eventually, in the not-so-near future.
GasZealousideal8691 t1_ir3ut7i wrote
Reply to comment by [deleted] in [R] Self-Programming Artificial Intelligence Using Code-Generating Language Models by Ash3nBlue
The paper does actually implement a self-programming AI system, though, even if it doesn't really do anything incredible.
GasZealousideal8691 OP t1_j4hk6kz wrote
Reply to comment by WigglyHypersurface in [D] Is there any reason hugging face GPT2 would behave (fundamentally) differently from GPT-Neo? by GasZealousideal8691
I'm fairly certain it's something with the model. Even fine-tuning is giving these weird errors, when it had no problems for GPT-Neo.
We also ran this stuff on T5; obviously we had to configure the rest of the code differently, but it was doing fine for that as well.
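For reference, the fine-tuning step is roughly shaped like the sketch below (a minimal single-step version with a made-up sentence and placeholder hyperparameters, not our actual editing/eval harness); the same loop runs without issue when the model and tokenizer lines are swapped to GPT-Neo:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # placeholder hyperparameters

batch = tokenizer("Paris is the capital of France.", return_tensors="pt")  # made-up edit example
out = model(**batch, labels=batch["input_ids"])  # standard causal-LM loss
out.loss.backward()
optimizer.step()
optimizer.zero_grad()
```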