Submitted by matthkamis in MachineLearning
ChuckSeven wrote
Reply to comment by ZestyData in [D] Can large language models be applied to language translation? by matthkamis
It's funny how you mention unrelated stuff, like RLHF, which has nothing to do with the point of discussion. A bit like an LLM, I reckon.
See, Google Translate models are (as far as publicly known) trained on a parallel corpus. This is supervised data, since it provides the same text in different languages. The model is trained to model, e.g., p(y=German | x=English). There is much less supervised data available, which means that the models you train will be significantly smaller. Note that translation models are usually only autoregressive in the decoding part. The encoder part, which usually makes up about 50% of the parameters, is not autoregressive.
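To illustrate which half of an encoder-decoder translation model is autoregressive, here is a minimal sketch in plain Python (no framework assumed) of the attention masks a standard Transformer encoder-decoder uses. The encoder lets every source token attend to every other source token; the decoder's self-attention is causal, so target position i only sees positions 0..i; cross-attention lets each target position see the whole encoded source. The function names are hypothetical, for illustration only.

```python
from typing import List

Mask = List[List[bool]]  # mask[i][j] is True if position i may attend to position j


def encoder_self_attention_mask(src_len: int) -> Mask:
    """Bidirectional: every source token sees the whole source sentence."""
    return [[True] * src_len for _ in range(src_len)]


def decoder_self_attention_mask(tgt_len: int) -> Mask:
    """Causal: target token i only sees target tokens 0..i (autoregressive)."""
    return [[j <= i for j in range(tgt_len)] for i in range(tgt_len)]


def cross_attention_mask(tgt_len: int, src_len: int) -> Mask:
    """Every target position may look at the full, already encoded source."""
    return [[True] * src_len for _ in range(tgt_len)]


if __name__ == "__main__":
    # Print the causal decoder mask: "x" = may attend, "." = masked out.
    for row in decoder_self_attention_mask(3):
        print(["x" if allowed else "." for allowed in row])
```

Real implementations also apply padding masks, which this sketch leaves out.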
LLMs tend to be >>1B parameter models trained on billions or trillions of tokens. The vast amount of data is believed to be necessary to train such large models. The models are modelling p(x), which in some cases is monolingual or virtually so. An LLM that is trained on a vast but English-only corpus will not be capable of translating at all. LLMs trained on a multilingual corpus can be prompted to translate, but they are far inferior to actual translation models.
Lastly, modelling p(y|x) is significantly easier and thus less general than modelling p(x).
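One way to see why modelling p(x) is the more general problem: if a left-to-right model assigns probabilities to whole sequences, the conditional needed for translation falls out of the chain rule, log p(y | x) = log p(x ⊕ y) − log p(x), with ⊕ meaning concatenation. The sketch below uses a toy add-one-smoothed trigram model as a stand-in for an LLM; the two-sentence corpus, the `<sep>` token and all function names are made up for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Model = Tuple[Counter, Counter, int]


def train_trigram(corpus: List[List[str]]) -> Model:
    """Count trigrams and their two-token contexts (sentences padded with <s> <s>)."""
    contexts, trigrams, vocab = Counter(), Counter(), set()
    for sent in corpus:
        tokens = ["<s>", "<s>"] + sent
        vocab.update(tokens)
        for u, v, w in zip(tokens, tokens[1:], tokens[2:]):
            contexts[(u, v)] += 1
            trigrams[(u, v, w)] += 1
    return contexts, trigrams, len(vocab)


def log_prob(tokens: Sequence[str], model: Model) -> float:
    """log p(x) = sum_t log p(x_t | x_{t-2}, x_{t-1}), with add-one smoothing."""
    contexts, trigrams, vocab_size = model
    u, v, total = "<s>", "<s>", 0.0
    for w in tokens:
        total += math.log((trigrams[(u, v, w)] + 1) / (contexts[(u, v)] + vocab_size))
        u, v = v, w
    return total


def conditional_log_prob(target: Sequence[str], source: Sequence[str], model: Model) -> float:
    """Chain rule: log p(y | x) = log p(x <sep> y) - log p(x <sep>)."""
    prefix = list(source) + ["<sep>"]
    return log_prob(prefix + list(target), model) - log_prob(prefix, model)


# Toy "multilingual" corpus: source sentence, separator, translation.
corpus = [
    ["the", "cat", "<sep>", "die", "katze"],
    ["the", "dog", "<sep>", "der", "hund"],
]
model = train_trigram(corpus)
good = conditional_log_prob(["die", "katze"], ["the", "cat"], model)
bad = conditional_log_prob(["der", "hund"], ["the", "cat"], model)
print(good > bad)  # True
```

The joint model was never trained on a translation objective, yet it yields a usable p(y | x); the reverse does not hold, since a model of p(y | x) alone says nothing about p(x).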
ZestyData wrote
Thank you for repeating half of what I said back to me; much like ChatGPT, you catch on quickly to new information:
So, let's be clear here, then. Contrary to your incorrect first comment, Google Translate is an LLM, it is autoregressive, and it is pretrained, at least by the definition of pre-training given in the GPT paper. That was the parallel I first drew in my own comment for OP, who came into this thread with knowledge of the latest GPT-3+ and ChatGPT products.
>It's funny how you mention unrelated stuff, like RLHF
I did so because I had naively assumed you were also a newcomer to the field who knew nothing outside of ChatGPT, given how badly wrong your first comment was. I'll grant you that it wasn't related, except as an olive branch and a reasonable exit plan if that were the case for you. Alas.
>LLMs tend to be >>1B parameter models
Again, no. ELMo was 94 million, GPT was 120 million, GPT-2 was 1.5 billion, and BERT has ~300 million parameters. These are all large language models and have been called so for years. There is no hard definition of what constitutes "large"; 2018's large is nearly today's consumer-hardware level. Google Translate (and Google Search) are among the most widely used LLMs actually out there.
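To put the "consumer-hardware level" remark into numbers: the memory needed just to hold a model's weights is roughly parameter count × bytes per parameter. The sketch below assumes 16-bit weights (2 bytes each) and ignores activations, optimizer state and framework overhead; the parameter counts are the approximate figures quoted above, and the names are illustrative.

```python
# Approximate parameter counts as quoted in the comment above.
PARAM_COUNTS = {
    "ELMo": 94e6,
    "GPT": 120e6,
    "BERT": 300e6,
    "GPT-2": 1.5e9,
}

BYTES_PER_PARAM_FP16 = 2  # assumption: weights stored as 16-bit floats


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, n in PARAM_COUNTS.items():
    print(f"{name:6s} {weight_memory_gb(n):5.2f} GB")
```

Under these assumptions even GPT-2 needs about 3 GB for its weights, which is within the memory of many consumer GPUs.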
Man. Why do you keep talking about things that you don't understand, even when corrected?
>Lastly, modelling p(y|x) is significantly easier and thus less general than modelling p(x).
Sure! It is easier! But that's not what you said. You'd initially brought up P(Y|X) as a justification that translation models aren't pretrained. Those are two unrelated concepts. Their ultimate modelling goal is P(Y|X), but both GPT (Generative Pre-Training) and Google Translate pretrain their decoder's ability to predict P(X|context), just like any hot new LLM of today, hence my correction. The application towards the ultimate P(Y|X) is not connected to the pretraining of their decoders.
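The decoder objective under discussion, predicting each token from the tokens before it, is the standard next-token cross-entropy. Below is a minimal sketch with a hypothetical `predict` function standing in for the model: at each position t the context is the sequence up to t and the target is token t, and the loss is the mean negative log-probability of the targets. In an encoder-decoder translation model the context would also include the encoded source sentence; that is left out here.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical model interface: given the context so far, return a
# probability distribution over the next token.
NextTokenModel = Callable[[Sequence[str]], Dict[str, float]]


def next_token_loss(tokens: List[str], predict: NextTokenModel) -> float:
    """Mean of -log P(x_t | x_<t) over positions 1..T-1 (teacher forcing)."""
    losses = []
    for t in range(1, len(tokens)):
        context, target = tokens[:t], tokens[t]
        prob = predict(context).get(target, 0.0)
        losses.append(-math.log(max(prob, 1e-12)))  # clamp to avoid log(0)
    return sum(losses) / len(losses)


def uniform_model(vocab: List[str]) -> NextTokenModel:
    """Toy stand-in that ignores the context and predicts uniformly."""
    return lambda context: {tok: 1.0 / len(vocab) for tok in vocab}


vocab = ["<s>", "the", "cat", "sat"]
loss = next_token_loss(["<s>", "the", "cat", "sat"], uniform_model(vocab))
print(round(loss, 4))  # 1.3863, i.e. ln(4)
```

A uniform guess over four tokens gives ln(4); pretraining amounts to pushing this number down on a large corpus.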
MysteryInc152 wrote
>LLMs trained on a multilingual corpus can be prompted to translate, but they are far inferior to actual translation models.
No, lol. You would know this if you'd ever actually tried to translate with GPT-4 and the like. They're far superior to current SOTA.
https://github.com/ogkalu2/Human-parity-on-machine-translations
ChuckSeven wrote
I know about this post. It is interesting, but the results here are far from conclusive. The BLOOM papers also ran translation experiments, and they say: "... In the one-shot setting, BLOOM can, with the right prompt, perform competent translation, although it is behind dedicated (supervised) models such as M2M-100".
So let's maybe use some quantifiable measures instead of just looking at a few cherry-picked examples and claiming otherwise?
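As an example of the kind of quantifiable measure meant here: BLEU is one widely used automatic metric for machine translation (the thread itself doesn't name a specific one). Below is a minimal, unsmoothed corpus-level BLEU sketch with one reference per sentence and whitespace tokenization; the function names are illustrative, and published comparisons typically use a standard tool such as sacreBLEU so that scores are comparable across papers.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Multiset of the n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Unsmoothed corpus BLEU, one reference per hypothesis, whitespace tokens."""
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each n-gram count by how often it appears in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # 1.0
```

Because the score is aggregated over a whole test set, it can be compared across systems in a way a few hand-picked examples cannot.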
MysteryInc152 wrote
It's not cherry-picked, lol.
Wild how everyone will just use that word even when they've clearly not tested the supposed model themselves. I'm just showing you what anyone who's actually used these models for translation will tell you.
ChuckSeven wrote
Look, it doesn't matter. You can't claim that LLMs are better if you don't demonstrate it on an established benchmark with a large variety of translations. How should I know if those Japanese anime translations are correct? For what it's worth, it might just be "prettier" text but a wrong translation.
It's sad to get downvoted on this subreddit for insisting on very basic academic principles.
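On the point about needing many examples rather than a handful: one common way to check whether system A really beats system B on a test set is paired bootstrap resampling. The sketch below assumes you already have one quality score per test sentence for each system (from human ratings or a sentence-level metric, say); the function name and parameters are hypothetical.

```python
import random
from typing import Sequence


def paired_bootstrap_win_rate(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    num_samples: int = 1000,
    seed: int = 0,
) -> float:
    """Fraction of resampled test sets on which system A scores higher than B.

    Both lists hold per-sentence scores for the same test sentences, in the
    same order. Sums are compared, which is equivalent to comparing means
    because both resamples have the same size.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty, equally long score lists")
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(num_samples):
        # Resample sentence indices with replacement; reuse them for both systems.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / num_samples
```

A win rate close to 1.0 suggests A's advantage does not hinge on which sentences happened to be in the test set; with only a few sentences, though, the resamples barely differ from the original set, so the result says little.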
MysteryInc152 wrote
I didn't downvote you, but it's probably because you're being obtuse. Anyway, whatever. If you don't want to take evidence in plain sight, then don't. The baseline human comparisons are right there. Frankly, it's not my problem if you're so suspicious of results and not bilingual enough to test it yourself. It's not really my business whether you believe me or not.
ChuckSeven wrote
I'm happy to take evidence into account. Your results indicate that LLMs can be beneficial for translation. As I said previously, it looks interesting. But you claim, and I quote, "They're far superior to current SOTA" solely based on your personal, human comparison. This is an over-generalisation and not scientific. It's like a flat earther claiming the earth is flat because... just look at it, "evidence in plain sight".