Some notes from HN user saurabh20n:

Quick notes from first glance at paper https://research.facebook.com/publications/llama-open-and-ef...:

* All variants were trained on 1T - 1.4T tokens, which is good relative to their sizes by the Chinchilla metric. Code is 4.5% of the training data (similar to others). [Table 2]

* They note the GPU hours as 82,432 (7B model) to 1,022,362 (65B model). [Table 15] GPU hour rates will vary, but let's give a range of $1 to $4. The 7B model would have cost ~$82-329k and the 65B something in the range of ~$1-4M. They also note their total time spent for all models: "we used 2048 A100-80GB for a period of approximately 5 months" [sec 6, pg 10]

* The 65B model's performance is broadly comparable to PaLM-540B. Not a small feat, but it could also indicate the benefits of a good ratio of model size to training tokens [Tables 3,4,5,6]. Their conjecture for underperforming on MMLU (multitask language understanding) compared to PaLM-540B and Chinchilla-70B is a smaller fraction of books and academic data in the training set.

* Math and code tasks: On math tasks they are substantially worse than Minerva (comparing their 65B to Minerva 62B; they fail hands down against Minerva 540B) [Table 7]. On code tasks they are broadly competitive with PaLM-540B (HumanEval and MBPP evals) [Table 8]

* Surprising that instruction fine tuning takes such a small part of the paper (sec 4, pg. 7)
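
The cost arithmetic in the GPU-hours bullet above can be checked with a minimal sketch. The GPU-hour figures are the ones quoted from Table 15; the $1 to $4 hourly rate is the notes' own guess rather than a quoted price, and the 30-day months and continuous use in the last step are assumptions made only to get a rough total.

```python
# GPU-hour figures quoted in the notes above (Table 15 of the paper).
GPU_HOURS = {"7B": 82_432, "65B": 1_022_362}

# Assumed price range in USD per GPU hour (a guess, not a quoted rate).
RATE_LOW, RATE_HIGH = 1.0, 4.0


def cost_range(gpu_hours, low=RATE_LOW, high=RATE_HIGH):
    """Return (min_cost, max_cost) in USD for a given number of GPU hours."""
    return gpu_hours * low, gpu_hours * high


for name, hours in GPU_HOURS.items():
    low_cost, high_cost = cost_range(hours)
    print(f"{name}: ${low_cost:,.0f} to ${high_cost:,.0f}")

# Rough upper bound on total compute for all models: 2048 GPUs running
# continuously for about 5 months (assumes 30-day months, no idle time).
total_gpu_hours = 2048 * 24 * 30 * 5
print(f"all models: about {total_gpu_hours:,} GPU hours")
```

Under these assumptions the "5 months on 2048 GPUs" figure works out to several times the GPU hours listed for the 65B model alone, so it presumably covers all model sizes plus experimentation.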

Announcement link: https://ai.facebook.com/blog/large-language-model-llama-meta-ai/

Comments

Easyldur t1_j9wpkdf wrote on February 25, 2023 at 2:35 AM

#1,983,771

Thanks for the detailed explanation.

This could follow the path of Stable Diffusion: a smaller, open source model comparable to the bigger Dall-e in performance, which in turn gave birth to the more-than-exceptional Midjourney.

Let's see!

TinyBurbz t1_j9ws0sy wrote on February 25, 2023 at 2:55 AM

#1,984,170

Replying to Easyldur (#1,983,771)

From a creator's perspective, Midjourney is far from exceptional... let alone acceptable.

YobaiYamete t1_j9wt1i0 wrote on February 25, 2023 at 3:03 AM

#1,984,329

Replying to TinyBurbz (#1,984,170)

Midjourney looks flashy, but any "real prompt engineer" has to use Stable Diffusion imo. MJ is good for making certain images, but the sheer flexibility and power of SD completely dwarfs it.

MJ is great for a quick example or for people who aren't tech savvy and can't run SD, but IMO it's more of a toy while SD is a tool

TinyBurbz t1_j9wtdcb wrote on February 25, 2023 at 3:06 AM

#1,984,383

Replying to YobaiYamete (#1,984,329)

I can get decent textures out of SD, which is what I use it for.

I know this is gonna rustle a lot of feathers, but people need to not use these models to produce whole pieces. It hurts how seriously people take real artistic skills, and makes these tools look immoral.

YobaiYamete t1_j9wu8l0 wrote on February 25, 2023 at 3:14 AM

#1,984,523

Replying to TinyBurbz (#1,984,383)

IMO SD and the AI tools are just fantastic complements to the rest of your artistic kit, just like Photoshop and Blender etc, but are still just tools in the kit rather than the whole kit

People who think they replace artists are not seeing the real picture. The only artists they replace are the lowest end artists, and all those artists have to do is adapt to the tech and they will still be relevant too.

Even with SD I still run into tons of situations where I need to use Photoshop to tweak something or need to draw something, and I instantly run into the limit of my artistic skill, because I'm not a real artist.

Which IMO, is the gap between an "ai artist" and an actual artist. AI can make some really beautiful stuff (one of my favorites I've seen), but as soon as you need to customize it or make fine tweaks you start having to fight the AI rather than work with it

drums_addict t1_j9xr9wn wrote on February 25, 2023 at 9:14 AM

#1,989,707

Tina eat your dinner!

Fallen-stars123 t1_j9yufow wrote on February 25, 2023 at 4:01 PM

#1,996,791

It seems that the new "idea" will be to train on a lot more tokens rather than just increasing the number of parameters. It seems that we were undertraining the models.

I imagine that GPT-4 will see a big jump in the number of tokens trained on.
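
To put rough numbers on the "more tokens per parameter" idea, here is a minimal sketch using the sizes and token counts from the notes at the top of the thread. Pairing the 7B model with 1T tokens and the 65B model with 1.4T tokens is an assumption, and the figure of about 20 tokens per parameter is only a commonly cited rule of thumb attributed to the Chinchilla work, not an exact threshold.

```python
# (parameters, training tokens) per model, from the notes in this thread.
# Assumption: smallest model saw 1T tokens, largest saw 1.4T tokens.
MODELS = {"7B": (7e9, 1.0e12), "65B": (65e9, 1.4e12)}

# Commonly cited Chinchilla rule of thumb (approximate, used for comparison only).
CHINCHILLA_TOKENS_PER_PARAM = 20


def tokens_per_param(params, tokens):
    """Training tokens seen per model parameter."""
    return tokens / params


for name, (params, tokens) in MODELS.items():
    ratio = tokens_per_param(params, tokens)
    multiple = ratio / CHINCHILLA_TOKENS_PER_PARAM
    print(f"{name}: {ratio:.1f} tokens/param ({multiple:.1f}x the rule of thumb)")
```

On these assumptions the smallest model is trained far past the rule of thumb, while the largest sits close to it.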

Easyldur t1_ja06q7j wrote on February 25, 2023 at 9:24 PM

#2,005,217

Replying to TinyBurbz (#1,984,170)

You must consider the vast majority of the people who are not creators, illustrators, artists.

For a person like me, who since kindergarten can't draw anything but stick-men, Midjourney is a God-send.

My workflow is: Midjourney for the main picture, Dall-e inpainting for some corrections (e.g. hands), GIMP for the tiny details and Topaz Photo AI for the upscale.

With this I can create beautiful pictures for my toddler, things that until 6 months ago I could never have imagined.

play_yr_part t1_ja7awyd wrote on February 27, 2023 at 11:15 AM

#2,052,054

Replying to Easyldur (#2,005,217)

bruh

there's got to be an art tutorial or something on youtube that could teach you how to draw more than stickmen that you could watch for half an hour to an hour a day instead of tinkering with AI art in the same timeframe

Maybe your kid will grow up having no distinction between something that is made by an AI or a human, maybe they will appreciate something that is hand drawn over something that took little to no effort to prompt and edit.