Submitted by Buck-Nasty t3_10ozflx in singularity
FirstEbb2 t1_j6hlp4k wrote
My God, they don't even have high-quality Chinese training materials. Do they want to do AI in English?
Melancholy-Zebra t1_j6iclje wrote
You'd be surprised.
starstruckmon t1_j6izowe wrote
I would be very surprised. Technically speaking (as per benchmarks), they have one of the best text-to-image generators right now, yet the practical output is far below ours in quality due to the limited dataset.
It would probably be even worse for text: Wikipedia, Reddit, code forums like Stack Overflow, documentation and manuals, and the vast majority of scientific papers. They'd be leaving so much out.
visarga t1_j6jut51 wrote
No, AI doesn't work that way. You just feed it text in any language, all of it together, and it figures out an inter-language representation. So you can ask in Chinese about what it learned in English.
But there's also plenty of Chinese text. GLM-130B has been trained on over 400 billion text tokens (200B each for Chinese and English). GPT-3 was trained on 300B tokens, mostly English.
starstruckmon t1_j6jw3kl wrote
It seems like you're talking about a model that has been trained in both languages. However, there are two issues with this. Firstly, the Chinese generally prefer to train models solely on Chinese data or with a limited amount of English data included. Secondly, multi-language models currently perform significantly worse compared to models that are trained on a single language.
_Just7_ t1_j6kv8fs wrote
Hate to be that guy, but source on models in single languages being better? I thought more data = better modeling. Why would it perform worse if you also include the Spanish and Chinese parts of the internet?
starstruckmon t1_j6kygds wrote
I can't really speculate on that topic. It's currently an active area of research.
To be honest, this problem is so widely known that I hadn't considered finding sources to support the claim. Here is the best authoritative source I could quickly find:
https://arxiv.org/abs/2012.15613
It may seem counter-intuitive to link to a paper that supposedly fixes this issue, but this is obviously the most likely scenario in which a paper would discuss it. Also, if you read it carefully, you'll see that while the authors managed to reduce the gap, it still persists.
FirstEbb2 t1_j6iidfh wrote
It's hard for me to imagine. I can imagine Germany rising again after being bombed to rubble, because fixing the buildings would help Germany's stability, but fixing the Chinese internet, which was made into a shithole by the Great Firewall and silly apps, wouldn't do the bureaucrats any good.
I don't believe in Buddhism, but I do believe that retribution has manifested itself in the Chinese government and they will shamefully lose this AI war unless they fundamentally get rid of their backward system.
Crit0r t1_j6iqgji wrote
Have you ever visited China? The big cities are pretty advanced and the internet connection is superb.
The apps they use are also pretty good and convenient. China is using the internet and apps to tighten its grip on the general population.
Germany is stuck in the 1990s if we compare it to China in this regard.
We Germans can't even get rid of fax machines lmfao.