Submitted by Pro_RazE t3_11aunt2 in singularity
Comments
YobaiYamete t1_j9uga58 wrote
> LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B
As per always with these claims lately, "I'll believe it when I can talk to it"
There are so many trying to make these big claims, but the only ones we can actually talk to are ChatGPT and Bing.
MysteryInc152 t1_j9uhssy wrote
I think peer-reviewed research papers are a bit more than just "claims".
As much as I'd like all the SOTA research models to be usable by the public, research is research, and not every research project is done with the intent of making a viable commercial product. Inference with these models is expensive. That's valid too.
Also seems like this will be released under a non-commercial license like the OPT models.
[deleted] t1_j9uie6g wrote
[deleted]
[deleted] t1_j9zissn wrote
[deleted]
9985172177 t1_j9ziz50 wrote
That's not true at all. You.com's chat is very strong, more than comparable to ChatGPT, and it is even more open, as in you don't need to provide a phone number to use it. Plus there are other models like BLOOM and so on that are far more open, as in you can download them and run them yourself and integrate them into other software.
YobaiYamete t1_j9zwgdi wrote
You.com is okay, but it's definitely not on par with ChatGPT lol. It's running on a weaker version of GPT and you can't just talk to it the same way.
CharacterAI was smarter than ChatGPT until they nerfed it into the ground, but that's the issue: everywhere that has a decent AI suddenly nerfs it until it's too useless to use.
WarAndGeese t1_ja7z1kn wrote
Are you sure that YouChat is running on a version of GPT? (Presumably you mean openai's software.) I was speaking to a founder of a company that had some partnership with You.com and he was saying they roll their own machine learning stuff, that they (You.com) were already machine learning experts.
Hemanth536 t1_j9u74e3 wrote
Looks like Channels might become a new type of blog for companies and influencers to announce things
Pro_RazE OP t1_j9ua28q wrote
Maybe. It's their latest addition to Instagram, so it makes sense for him to use it to announce new stuff, as it will inspire others to do the same.
Lawjarp2 t1_j9uf61c wrote
It's around as good as GPT-3 (175B) but smaller (65B), like Chinchilla. If released publicly like the OPT models, it could be really big for open source. If optimized like FlexGen to run on a single GPU or a small rig, maybe we could all have our own personal assistant or pair programmer.
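(For anyone wondering what the single-GPU route could look like in practice: a minimal sketch below, assuming the released weights get converted into a Hugging Face-compatible checkpoint. The local path is hypothetical, and 8-bit quantization via bitsandbytes is a stand-in for FlexGen-style offloading, not FlexGen itself.)

```python
# Rough sketch only: assumes the LLaMA weights have been converted to a
# Hugging Face-compatible checkpoint at a hypothetical local path.
# Requires: pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./llama-13b-hf"  # hypothetical local conversion of the released weights

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",   # let accelerate spread layers across GPU/CPU as memory allows
    load_in_8bit=True,   # 8-bit weights roughly halve VRAM vs fp16
)

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```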
TeamPupNSudz t1_j9uih5g wrote
> It's around as good as GPT-3(175B) but smaller(65B) like chinchilla.
Based on their claim, it's way more extreme than that even. They say the 13B model outperforms GPT-3 (175B), which seems so extreme it's almost outlandish. That's only 7% the size.
blueSGL t1_j9umbty wrote
> which seems so extreme its almost outlandish.
Reminder that GPT-3 was data-starved as per the Chinchilla scaling laws.
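(Rough numbers below, taking the publicly reported training set sizes; the tokens-per-parameter figures are back-of-the-envelope, and ~20 tokens per parameter is the commonly cited Chinchilla rule of thumb, not an exact law.)

```python
# Back-of-the-envelope tokens-per-parameter ratios from publicly reported figures.
# Chinchilla's rule of thumb is roughly ~20 training tokens per parameter
# for compute-optimal training.
models = {
    "GPT-3 175B":     (175e9, 300e9),   # ~300B training tokens (Brown et al., 2020)
    "Chinchilla 70B": (70e9, 1.4e12),   # ~1.4T tokens (Hoffmann et al., 2022)
    "LLaMA 7B":       (7e9, 1.0e12),    # ~1T tokens, per Meta's announcement
    "LLaMA 65B":      (65e9, 1.4e12),   # ~1.4T tokens, per Meta's announcement
}

for name, (params, tokens) in models.items():
    print(f"{name}: ~{tokens / params:.1f} tokens per parameter")

# GPT-3 lands around 1.7 tokens per parameter, far below the ~20 Chinchilla
# suggests, which is why much smaller models trained on far more data can catch up.
```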
Lawjarp2 t1_j9uj86z wrote
In some tasks the 7B model seems close enough to the original GPT-3 175B. With some optimization it probably can be run on a good laptop with a reasonable loss in accuracy.
The 13B one doesn't outperform it in everything, but the 65B one does. It's kinda weird to see their 13B model be nearly as good as their 65B one though.
However, all their models are worse than the biggest Minerva model.
DuckyBertDuck t1_j9yjdua wrote
It makes sense if you look at the chinchilla findings which suggest that ~10x more data is optimal.
qrayons t1_j9u8z62 wrote
I wonder what he means by "released".
Agreeable-Rooster377 t1_j9u9x54 wrote
I had the same thought. Another AI frontend is alright, but we could really use a big SOTA LLM being open-sourced. It's Facebook though, so I doubt that's the case here.
[deleted] t1_j9ub54r wrote
[deleted]
raidedclusteranimd t1_j9ufzry wrote
"Request for access" smh, same thing for Make-A-Video, requested access the day it released and still haven't got it.
I wonder if they're even making models or just making up papers and publishing them.
MechanicalBengal t1_j9xowpm wrote
He’s trying to get that stock price up. Literally the only real purpose of this “news”
beezlebub33 t1_j9ugt9v wrote
They released code to run inference on the model under GPL. They did not release the model, and they describe the model license as a "non-commercial bespoke license", so who the hell knows what's in there.
You can apply to get the model. See: https://github.com/facebookresearch/llama, but there's no info about who, when, how, selection criteria, restrictions, etc.
Model card at: https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md
(I'd also like to take this opportunity to remind people that the Model Card concept is from this paper: https://arxiv.org/abs/1810.03993. First author is Margaret Mitchell, last author is Timnit Gebru. They were both fired by Google when Google cleared out its Ethical AI team.)
TeamPupNSudz t1_j9una4h wrote
> but no info about who, when, how, selection criteria, restrictions, etc.
The blog post says "Access to the model will be granted on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world" which doesn't sound encouraging for individual usage.
beezlebub33 t1_j9uvup4 wrote
thanks, I didn't see that. What is the link to that?
TeamPupNSudz t1_j9uvy79 wrote
[deleted] t1_j9u9yyb wrote
[deleted]
maskedpaki t1_j9ut4v4 wrote
For those wondering about the performance
5-shot performance on MMLU:
Chinchilla: 67.5
This new model: 68.9
Human baseline: 89.8
So it seems a smidge better than Chinchilla on 5-shot MMLU, which many consider to be the important AGI benchmark (it's one of the AGI conditions on Metaculus).
Some nice work by Meta.
MysteryInc152 t1_j9v4fru wrote
Flan-PaLM hits 75 on MMLU. Instruction finetuning/alignment and CoT would improve performance even further.
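(For anyone unfamiliar: chain-of-thought just means the few-shot examples include the worked reasoning rather than only the final answer. A minimal illustration below; the questions are made up, not from any benchmark.)

```python
# Minimal illustration of standard few-shot prompting vs. chain-of-thought (CoT).
# The questions and answers here are made-up examples, not from any benchmark.

standard_prompt = """Q: A shop has 23 apples and sells 9. How many are left?
A: 14

Q: A train travels 60 km/h for 2.5 hours. How far does it go?
A:"""

cot_prompt = """Q: A shop has 23 apples and sells 9. How many are left?
A: The shop starts with 23 apples and sells 9, so 23 - 9 = 14. The answer is 14.

Q: A train travels 60 km/h for 2.5 hours. How far does it go?
A:"""

# With the CoT version, the model is nudged to write out the intermediate steps
# (60 * 2.5 = 150, so 150 km) before giving the final answer, which tends to help
# on multi-step reasoning tasks.
```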
Tavrin t1_j9vl37m wrote
Flan-PaLM is 540B, so there's that.
maskedpaki t1_j9z7sxs wrote
Yes! The really big breakthrough here is that it's on par with the original GPT-3 at only 7 billion parameters on a bunch of benchmarks I've seen.
That means it's gotten 25x more efficient (175B down to 7B) in the last 3 years.
I wonder how efficient these things can get. Like, are we going to see a 280 million parameter model (another 25x smaller) that rivals the original GPT-3 in 2026, and an 11 million parameter one (25x smaller again) in 2029?
Baturinsky t1_j9x7xw7 wrote
It seems close to SOTA for 60-70B models. The "only" big deal is that the smaller LLaMA models show results comparable to much bigger SOTA models.
Borrowedshorts t1_j9var8b wrote
Why are people so skeptical of published results? This is how exponential progress works: smaller models today can perform better than larger models from a couple of years ago.
imlaggingsobad t1_j9vmlwb wrote
Meta's AI research is very good, people are sleeping on them. It's definitely a 3 way race between Google, Microsoft and Meta.
czk_21 t1_ja0og2w wrote
It's not a 3-way race; there are a lot more competitors like Nvidia, Baidu, IBM, Amazon... and if you count smaller startups working on their own chatbots, etc., it could be tens or more. People in Europe work on AI too.
https://the-decoder.com/a-german-ai-startup-just-might-have-a-gpt-4-competitor-this-year/
Johnny_WakeUp t1_j9yk9v4 wrote
I'm more of an interested outsider, but are any of these smaller companies, like OpenAI, on that list?
I was talking with a friend about how ChatGPT could replace Google for search. Are they just keeping research close to the vest?
mertats t1_j9yl8t9 wrote
OpenAI = Microsoft
WarAndGeese t1_j9zi7ft wrote
Yes there are but I don't know all of them. Note though that Stable Diffusion blew Dall-E2 and Imagen out of the water. Because it was free and open source, it was much more widely used. Now Dall-E is probably still going to be used heavily in industry, but the closed tools and expensive tools tend to lose out to the free and open source ones. That's one thing that has happened so far with generative adversarial networks and that's one thing that would likely happen with large language models and other models as well.
imlaggingsobad t1_ja0oygq wrote
imo there are only a few companies that have a real shot at making AGI.
Google / DeepMind / Anthropic
Microsoft / OpenAI
Meta / FAIR
There are smaller companies like Adept, Cohere and Inflection that are doing interesting work.
Others like Amazon, Nvidia, Apple, Tesla, Salesforce, Intel, IBM are capable, but they haven't fully committed to AGI.
94746382926 t1_ja5px6h wrote
The difference seems to be that the other companies are investing in AI to boost their main business (Nvidia for hardware sales, Tesla for Self driving, etc.)
For the big 3 AGI is the business.
Lawjarp2 t1_j9ui7bd wrote
GitHub link : https://github.com/facebookresearch/llama
Not really fully free to use right away, as you have to fill out a Google form and they may or may not approve your request to download the trained model. Training the model yourself is expensive anyway.
kindred_asura t1_j9uw4pe wrote
Lol, training the model yourself is expensive? You know these models take tens of millions of dollars to train, right? No one except billion-dollar companies can do it.
Lawjarp2 t1_j9v0zzv wrote
Yes I know that. That's exactly what I've said above. Did you imagine something else?
FYI the smallest model could possibly be trained for under 50k dollars.
PrivateUser010 t1_j9ynpg1 wrote
It's not just the cost of training. It's the availability of quality data. Meta/Google/Microsoft are all at the forefront of this due to their access to data.
QuestionableAI t1_j9utk7e wrote
I suggest they give these things real names instead of bullshit letters and numbers... This last one should be known simply as SAM or FRED; they should get over themselves about being some super clandestine made-for-TV shite.
adt t1_j9w062r wrote
It's a llama. It's 65 billion parameters. Seems better than some of the other crazy acronyms (or muppet characters!).
QuestionableAI t1_j9wibg7 wrote
Well, I feel stupid. And, I could not agree with you more.
Kolinnor t1_j9ujwaj wrote
Damn, that sounds quite big ! I'm very impressed with Meta this time, because usually it was a shitshow. I guess there must be different teams, but this is great !
TrainquilOasis1423 t1_j9vyix0 wrote
Didn't even finish the first sentence. WHAT THE HELL IS THIS ACRONYM?
LLaMA (Large Language Model Meta AI)
Franck_Dernoncourt t1_j9v5wwh wrote
Why SOTA? Did they compare against GPT-3.5? The only comparison against GPT-3.5 I found in the LLaMA paper was:
> Despite the simplicity of the instruction finetuning approach used here, we reach 68.9% on MMLU. LLaMA-I (65B) outperforms on MMLU existing instruction finetuned models of moderate sizes, but are still far from the state-of-the-art, that is 77.4 for GPT code-davinci-002 on MMLU (numbers taken from Iyer et al. (2022)).
TemetN t1_j9v40r7 wrote
The size is pretty much the most significant thing at a glance, the benchmarks stick to comparing to older models and ignoring more recent advancements even in those models. I'd be more enthused if they were open sourcing it, but despite them being more open than OpenAI lately it still seems to operate off some sort of weird 'can apply, but you'll never get approved' process.
gelukuMLG t1_j9vfnmg wrote
OPT but better lol. Also, will it still require requesting access to use it?
KelbyGInsall t1_j9vpud0 wrote
"They're making it available to the community" means they hit a wall and hope you'll bust through it for them.
Brilliant_War4087 t1_j9w0243 wrote
I'll believe it when it can do my homework.
[deleted] t1_j9wxz0a wrote
[removed]
Akimbo333 t1_j9xlvdc wrote
Has anyone tried this? Is it any good?
Ok-Cheek2397 t1_j9xwvc6 wrote
How can I use it? Does it have a website, or do I need to run it myself on my computer?
[deleted] t1_j9xyum5 wrote
[deleted]
gotyelover44 t1_ja3oz2t wrote
Wow 🤯
Brashendeavours t1_j9vswaa wrote
Here comes another steaming coiler from Zuckerberg.
d00m_sayer t1_j9u8zf3 wrote
Can ChatGPT solve math theorems?
PM_me_PMs_plox t1_j9w87tt wrote
"solving math theorems" is probably a very optimistic way of putting it on zucc's part
datsmamail12 t1_j9ucic8 wrote
Cringe
Pro_RazE OP t1_j9u3sra wrote
Man announced it through Instagram channels lmao. There's no paper or anything else posted yet.
Edit: They posted. Here's the link: https://ai.facebook.com/blog/large-language-model-llama-meta-ai/?utm_source=twitter&utm_medium=organic_social&utm_campaign=llama&utm_content=blog
"Today we're publicly releasing LLAMA, a state-of-the-art foundational LLM, as part of our ongoing commitment to open science, transparency and democratized access to new research.
We trained LLaMA 65B and LLaMA 33B on 1.4 trillion tokens. Our smallest model, LLaMA 7B, is trained on one trillion tokens"
There are 4 foundation models ranging from 7B to 65B parameters. LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B
From this tweet (if you want more info) : https://twitter.com/GuillaumeLample/status/1629151231800115202?t=4cLD6Ko2Ld9Y3EIU72-M2g&s=19