
Disastrous_Elk_6375 t1_jdlj4rn wrote

> and uses a different base model and claims it’s a big innovation

Huh? My read of their blog was that they wanted to highlight the fact that you can fine-tune a ~2-year-old LLM and still get decent results. I don't think they claimed this is innovative, or that the innovation is theirs to boast about...

I played with GPT-Neo (non-X) and GPT-J when they were released, and the results were rough. You had to do a ton of prompt-engineering work and exploration to find useful cases. This shows that even smaller, older models can be fine-tuned with the method proposed in Alpaca.
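For context, the "Alpaca method" here is basically supervised fine-tuning on ~52k instruction/response pairs rendered into a fixed prompt template. A rough sketch from memory (the exact strings live in the Stanford Alpaca repo, so wording may differ slightly):

```python
# Rough sketch of the Alpaca-style prompt template (from memory; the exact
# strings live in the Stanford Alpaca repo). Each record in the instruction
# dataset is rendered into one training string: prompt + expected response.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_example(record: dict) -> str:
    """Render one instruction record into a single training string."""
    if record.get("input"):
        prompt = PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=record["instruction"])
    return prompt + record["output"]
```

None of this is specific to LLaMA, which is why the same data and template can be pushed through an older base model like GPT-J.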

14

SeymourBits t1_jdlkln7 wrote

I second this. I was able to extract fairly useful results from Neo, but it took a huge amount of prompt trial and error; I eventually got decent/stable results, but not in the same ballpark as GPT-3+. The Dolly training results here seem good, if not expected. I'm now ready to move to a superior model like LLaMA/Alpaca though. What are you running?

7

dreamingleo12 t1_jdll44j wrote

I’ve been experimenting with Alpaca and was able to fine-tune it on the provided dataset in 40 minutes with 8 A100s on spot instances. It actually works well.

3

Daveboi7 t1_jdm8aby wrote

What platform are you using for training?

2

dreamingleo12 t1_jdn511a wrote

By platform you mean?

2

Daveboi7 t1_jdnczd9 wrote

My bad. Did you train the model locally on your PC or in the cloud?

1

dreamingleo12 t1_jdndszl wrote

I trained the model in the cloud.

2

Daveboi7 t1_jdndvq0 wrote

With Databricks?

1

dreamingleo12 t1_jdndzmt wrote

No, I don’t use Databricks. I’ve only tried LLaMA and Alpaca.

1

Daveboi7 t1_jdnedrd wrote

But which cloud service did you use to train them?

I tried using Databricks to train a model but the setup was too complicated.

I’m wondering if there is a more straightforward platform to train on?

1

dreamingleo12 t1_jdnel6b wrote

You can just follow the Stanford Alpaca GitHub instructions, as long as you have the LLaMA weights. It’s straightforward.
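Roughly, it boils down to a standard causal-LM fine-tune over alpaca_data.json. A minimal sketch with the plain Hugging Face Trainer (not the actual Alpaca train.py, which launches with torchrun and FSDP; the paths and hyperparameters below are placeholders):

```python
# Minimal sketch: fine-tune a converted LLaMA checkpoint on the Alpaca
# instruction data with the Hugging Face Trainer. This is NOT the official
# Stanford Alpaca train.py; paths, batch sizes, and the learning rate are
# placeholder assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_PATH = "path/to/hf-converted-llama-7b"  # placeholder: your converted LLaMA weights
DATA_PATH = "alpaca_data.json"                # the 52k-example file from the Alpaca repo

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)

def to_text(example):
    # Render instruction/input/output into one training string
    # (same idea as the Alpaca prompt template).
    prompt = f"### Instruction:\n{example['instruction']}\n\n"
    if example.get("input"):
        prompt += f"### Input:\n{example['input']}\n\n"
    return {"text": prompt + f"### Response:\n{example['output']}"}

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

dataset = (
    load_dataset("json", data_files=DATA_PATH, split="train")
    .map(to_text)
    .map(tokenize, remove_columns=["instruction", "input", "output", "text"])
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="alpaca-llama-sft",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=8,
        num_train_epochs=3,
        learning_rate=2e-5,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

On a multi-GPU box you’d typically launch something like this under torchrun or accelerate rather than as a single process.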

2

Daveboi7 t1_jdneqdx wrote

Ah. I’m trying to train the Dolly model developed by Databricks.

1

dreamingleo12 t1_jdnewt2 wrote

It’s just Alpaca with a different base model. Databricks boasted too much.

1

Daveboi7 t1_jdnf18o wrote

Yeah but the comparisons I have seen between Dolly and Alpaca look totally different.

Somehow the Dolly answers look much better imo

Edit: spelling

1

dreamingleo12 t1_jdnf4qn wrote

I don’t trust DB’s results tbh. LLaMA is a better model than GPT-J.

2

Daveboi7 t1_jdnf96e wrote

Somebody posted results on Twitter and they looked pretty good. I don’t think he worked for DB either. But who knows, really.

1

dreamingleo12 t1_jdlkbxl wrote

WSJ:

“Databricks Launches ‘Dolly,’ Another ChatGPT Rival. The data-management startup introduced an open-source language model for developers to build their own AI-powered chatbot apps.” (Apparently DB paid them.)

DB’s blog:

“Democratizing the magic of ChatGPT with open models”

Introduced? ChatGPT rival? Didn’t you just follow Stanford’s approach? You used Stanford’s dataset, which was generated by GPT, right? Huh? This is Stanford’s achievement, not DB’s. DB went too far with the marketing.

1

Disastrous_Elk_6375 t1_jdllii0 wrote

> https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html

This is the blog post that I've read. I can't comment on the WSJ article, and your original message implied a bunch of things that, IMO, were not found in the blog post. If you don't like the WSJ angle, your gripe should be with them, not Databricks. shrug

From the actual blog:

> We show that anyone can take a dated off-the-shelf open source large language model (LLM) and give it magical ChatGPT-like instruction following ability by training it in 30 minutes on one machine, using high-quality training data.

> Acknowledgments
>
> This work owes much to the efforts and insights of many incredible organizations. This would have been impossible without EleutherAI open sourcing and training GPT-J. We are inspired by the incredible ideas and data from the Stanford Center for Research on Foundation Models and specifically the team behind Alpaca. The core idea behind the outsized power of small dataset is thanks to the original paper on Self-Instruct. We are also thankful to Hugging Face for hosting, open sourcing, and maintaining countless models and libraries; their contribution to the state of the art cannot be overstated.

More to the point of your original message, I searched for "innovative", "innovation", and "innovate" and found 0 results in the blog post. I stand by my initial take: the blog post was fair, informative, and pretty transparent about what they did, how, and why.

7

dreamingleo12 t1_jdllxww wrote

Well, if you’d ever worked with marketing or communications teams you’d know that DB co-authored the WSJ article. My point is that the democratization is an achievement of the Stanford Alpaca team, not DB. DB marketed it as if they did the major work, which is untrue.

−6

Disastrous_Elk_6375 t1_jdlm6qd wrote

That's fair. But you commented out of context, on a post that linked to the blog and not the WSJ article. That's on you.

6

dreamingleo12 t1_jdlmhcq wrote

Well, if you had connections you would’ve seen that they made a good number of posts.

−6