mrfreeman93 t1_jcbgwp7 wrote

I mean LLaMA was apparently trained on outputs from davinci-003 from OpenAI... the rule is whatever works

−1

Nhabls t1_jcbnr3g wrote

That's Alpaca, a fine-tune of LLaMA, and you're just pointing to another of OpenAI's shameless behaviours. Alpaca couldn't be commercial because OpenAI thinks it can forbid usage of outputs from their model to train competing models. Meanwhile they also argue that they can take any and all copyrighted data from the internet with no permission or compensation needed.

They think they can have it both ways. At this point I'm 100% rooting for them to get screwed as hard as possible in court over that contradiction.

19

crt09 t1_jcbv608 wrote

> Alpaca couldn't be commercial because OpenAI thinks it can forbid usage of outputs from their model to train competing models.

I don't think they claimed this anywhere? It seems that the only reason the Alpaca weights weren't released is Meta's policy on releasing LLaMA weights.

https://crfm.stanford.edu/2023/03/13/alpaca.html

> We have reached out to Meta to obtain guidance on releasing the Alpaca model weights, both for the 7B Alpaca and for fine-tuned versions of the larger LLaMA models.

Plus they already released the data they got from the GPT API, so anyone with LLaMA 7B, the ability to implement Alpaca's fine-tuning code, and 100 bucks can replicate it.

(EDIT: they released the code, so now all you need is a willingness to torrent LLaMA 7B and 100 bucks.)
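For what it's worth, the recipe is just standard supervised fine-tuning of a causal LM on the released 52K instruction/response pairs. Here's a minimal sketch using Hugging Face Transformers; the paths, hyperparameters, and prompt template are illustrative assumptions on my part, not Stanford's actual training script:

```python
# Minimal sketch of Alpaca-style replication: fine-tune LLaMA 7B on the released
# alpaca_data.json. Paths, hyperparameters, and the prompt template below are
# illustrative assumptions, not the official Stanford code.
import json

import torch
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_PATH = "./llama-7b-hf"      # assumed path to converted LLaMA 7B weights
DATA_PATH = "./alpaca_data.json"  # the released 52K instruction/output pairs

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16)

def to_prompt(example):
    # Simple instruction-following template; Alpaca's actual template differs slightly.
    prompt = f"### Instruction:\n{example['instruction']}\n\n"
    if example.get("input"):
        prompt += f"### Input:\n{example['input']}\n\n"
    return prompt + f"### Response:\n{example['output']}"

with open(DATA_PATH) as f:
    records = json.load(f)

# Tokenize each prompt+response pair; labels come from the collator below.
dataset = Dataset.from_list(records).map(
    lambda ex: tokenizer(to_prompt(ex), truncation=True, max_length=512),
    remove_columns=["instruction", "input", "output"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./alpaca-repro",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=8,
        num_train_epochs=3,
        learning_rate=2e-5,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=dataset,
    # mlm=False gives the standard next-token (causal LM) loss.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("./alpaca-repro")
```

The 100 bucks is roughly the cost of renting GPUs for a few hours of training; nothing in the pipeline itself is exotic.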

2

Nhabls t1_jcc2tg0 wrote

It's written right after that:

>Second, the instruction data is based on OpenAI’s text-davinci-003, whose terms of use prohibit developing models that compete with OpenAI

7

HyperModerate t1_jcd0lnn wrote

The way AI is used to launder copyright and licensing is concerning. Copyrighted data is used to train a model. That model's output, now under its own license, is used to fine-tune a second model, which is licensed separately again. Finally, this multiply re-licensed model is considered for public release.

The attitude is basically the same as piracy, but there is no similar legal precedent.

To be clear, I think AI research should be open.

2