starstruckmon t1_j6izowe wrote on January 30, 2023 at 5:55 PM

I would be very surprised. Technically speaking (as per benchmarks), they have one of the best text-to-image generators right now, yet its practical output is far below the quality of what we have, because of the limited dataset.

It would probably be even worse for text: Wikipedia, Reddit, code forums like Stack Overflow, documentation and manuals, the vast majority of scientific papers. They'd be leaving so much out.

visarga t1_j6jut51 wrote on January 30, 2023 at 9:08 PM

No, AI doesn't work that way. You just feed it text in any language, all of them together, and it figures out an inter-language representation. So you can ask in Chinese about something it learned in English.

But there's also plenty of Chinese text. GLM-130B has been trained on over 400 billion text tokens (200B each for Chinese and English). GPT-3 was trained on 300B tokens, mostly English.

starstruckmon t1_j6jw3kl wrote on January 30, 2023 at 9:16 PM

It seems like you're talking about a model that has been trained on both languages. However, there are two issues with this. First, Chinese developers generally prefer to train models solely on Chinese data, or with only a limited amount of English data included. Second, multilingual models currently perform significantly worse than models trained on a single language.

_Just7_ t1_j6kv8fs wrote on January 31, 2023 at 1:13 AM

Hate to be that guy, but do you have a source on single-language models being better? I thought more data = better modeling. Why would it perform worse if you also include the Spanish and Chinese parts of the internet?

starstruckmon t1_j6kygds wrote on January 31, 2023 at 1:37 AM

I can't really speculate on that topic. It's currently an active area of research.

To be honest, this problem is so widely known that I hadn't considered finding sources to support the claim. Here is the best authoritative source I could quickly find:

https://arxiv.org/abs/2012.15613

It may seem counter-intuitive to link to a paper that supposedly fixes this issue, but this is obviously the most likely scenario in which a paper would discuss it. Also, if you read it carefully, you'll see that while the authors managed to reduce the gap, it still persists.

[deleted] t1_j6maw1g wrote on January 31, 2023 at 9:48 AM

[deleted]

[deleted] t1_j6loc29 wrote on January 31, 2023 at 5:05 AM

[deleted]

FirstEbb2 t1_j6iidfh wrote on January 30, 2023 at 4:06 PM

It's hard for me to imagine. I can imagine Germany rising again after being bombed to rubble, because rebuilding would help Germany's stability, but fixing the Chinese internet, which was turned into a shithole by the Great Firewall and silly apps, wouldn't do the bureaucrats any good.
I don't believe in Buddhism, but I do believe retribution has caught up with the Chinese government, and they will shamefully lose this AI war unless they fundamentally get rid of their backward system.

Crit0r t1_j6iqgji wrote on January 30, 2023 at 4:58 PM

Have you ever visited China? The big cities are pretty advanced and the internet connection is superb.

The apps they use are also pretty good and convenient. China is using the internet and apps to tighten its grip on the general population.

Germany is stuck in the 1990s if we compare it to China in this regard.

We Germans can't even get rid of fax machines lmfao.

Chinese Search Giant Baidu to Launch ChatGPT-Style Bot

FirstEbb2 t1_j6hlp4k wrote on January 30, 2023 at 11:33 AM

Melancholy-Zebra t1_j6iclje wrote on January 30, 2023 at 3:27 PM