LetGoAndBeReal t1_jea1id9 wrote
Reply to comment by elbiot in [D] The best way to train an LLM on company data by jaxolingo
I believe you are referring to this statement from the link: "Ability to train on more examples than can fit in a prompt." Correct?
If so, as I explained, the key word here is "examples." Once you understand why, you will see that there is no contradiction. Let me try to clarify.
There are two methods that we are discussing for extending the capability of an LLM:
- Prompt engineering
- Fine-tuning
There are also different types of capability that might be extended. We are discussing the following two:
- Adding new knowledge/facts to the model
- Improving downstream processing tasks, such as classification, sentiment analysis, etc.
Both of these capabilities can be readily extended through prompt engineering. Adding new knowledge with prompt engineering involves including that knowledge as context in the prompt. Improving tasks such as classification is done by including examples of the processing you want done in the prompt, as the sketch below illustrates.
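To make the distinction concrete, here is a minimal sketch of the two prompt styles (the facts and comments are invented for illustration):

```python
# Sketch: two ways to extend an LLM's behavior via the prompt alone.
# The company facts and user comments below are made up for illustration.

# 1. Adding knowledge: include the facts as context in the prompt.
knowledge_prompt = """Context: Acme Corp's Q3 revenue was $12M, up 8% year over year.

Question: What was Acme Corp's Q3 revenue?
Answer:"""

# 2. Improving a task: include worked examples (few-shot) in the prompt.
few_shot_prompt = """Classify the sentiment of each comment as Positive or Negative.

Comment: "Great product, would buy again." -> Positive
Comment: "Arrived broken and support ignored me." -> Negative
Comment: "Exceeded my expectations in every way." ->"""
```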
What the article says is that for the case where you want to provide examples in the prompt to make the model perform better, you can alternatively use fine-tuning. The article does not say "Ability to add more knowledge than can fit in a prompt." Examples = downstream processing tasks. Examples != new knowledge.
LetGoAndBeReal t1_je9zfyb wrote
Reply to comment by Goldenier in [D] The best way to train an LLM on company data by jaxolingo
>And there is no reason why the same methods wouldn't work on LLMs too, for example there is already Lora for LLMs too.
It's really not helpful to make strong assertions like this without referring to specific, verifiable sources. Fine-tuning is very typically done with certain layers/parameters of the model frozen, precisely to avoid the sort of loss we are discussing. The LoRA paper itself states that LoRA "freezes the pre-trained model weights".
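For what it's worth, this is easy to check in code. A minimal sketch using the Hugging Face peft library (early-2023 API, so details may shift) shows that only the small low-rank adapter matrices are trainable:

```python
# Sketch of how LoRA freezes base weights; assumes the Hugging Face
# peft library and its LoraConfig/get_peft_model API.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    r=8, lora_alpha=16, target_modules=["c_attn"],
    fan_in_fan_out=True,  # gpt2's attention uses Conv1D layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)

# Every pre-trained weight now has requires_grad=False; only the LoRA
# adapter matrices will be updated during fine-tuning.
model.print_trainable_parameters()
# e.g. trainable params: 294,912 || all params: 124,734,720 || trainable%: 0.24
```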
LetGoAndBeReal t1_je9c66v wrote
Reply to comment by WokeAssBaller in [D] The best way to train an LLM on company data by jaxolingo
Instead of insisting that fine-tuning reliably adds new knowledge to an LLM, why not show some evidence for that claim? Per my links above, this is a notoriously challenging problem in ML.
Apart from these resources, let's think critically for a second. If the approach were viable at this point, then there would be tons of commercial solutions using fine-tuning instead of RAG for incorporating external knowledge in an LLM application. Can you find even one?
LetGoAndBeReal t1_je9a3hb wrote
Reply to comment by elbiot in [D] The best way to train an LLM on company data by jaxolingo
Of course, that’s what allows RAG to work in the first place. I didn’t say you couldn’t provide new knowledge through the prompt. I only said you cannot provide new knowledge through the fine-tuning data. These are two completely separate things. This distinction is the reason RAG works for this use case and fine-tuning does not.
LetGoAndBeReal t1_je8m6y9 wrote
Reply to comment by light24bulbs in [D] The best way to train an LLM on company data by jaxolingo
Include new factual statements in your training data like “Joe Biden’s cat is named Fluffy.” Ask the model the name of Joe Biden’s cat before and after training and let us know the answers you get back. See if you get reliable answers across a set of data/questions.
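Here is a rough sketch of that test with a small open model via the transformers library - a toy stand-in to illustrate the procedure, not a rigorous evaluation:

```python
# Sketch of the before/after test described above. gpt2 is a toy
# stand-in; results on a model this small say little about larger LLMs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

fact = "Joe Biden's cat is named Fluffy."
question = "The name of Joe Biden's cat is"

def ask():
    inputs = tok(question, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=8, do_sample=False,
                         pad_token_id=tok.eos_token_id)
    return tok.decode(out[0], skip_special_tokens=True)

print("Before fine-tuning:", ask())

# Naive fine-tune on the single fact for a few steps.
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
batch = tok(fact, return_tensors="pt")
for _ in range(20):
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
model.eval()

print("After fine-tuning:", ask())
```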
LetGoAndBeReal t1_je8j7hw wrote
Reply to comment by elbiot in [D] The best way to train an LLM on company data by jaxolingo
The key word in that OpenAI link is “examples”. It says “more examples” and not “more knowledge”, because it’s referring to few-shot learning, which is about conditioning the model rather than providing new data.
In other words, if you want to get the model to classify sentiment of user comments as positive or negative, you can provide several examples in the prompt of both positive and negative comments. Fine-tuning allows you to provide many more such examples to the model than can fit in a prompt.
The key point is that through fine-tuning these examples can condition the model to classify sentiment but do not cause new facts to be absorbed by the model. You cannot get new facts to be readily absorbed through fine-tuning, which is why the OP should not look to fine-tuning to endow the model with the external dataset they want to use for question answering.
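To see what that fine-tuning data actually looks like, here is a sketch in the prompt/completion style of OpenAI's fine-tuning guide at the time (treat the exact field names as subject to change). Notice that every record is a task example, not a fact:

```python
# Sketch of fine-tuning data for a sentiment classifier, in the
# prompt/completion JSONL style of OpenAI's 2023 fine-tuning guide.
# Each record conditions the classification behavior; none adds a fact.
import json

records = [
    {"prompt": "Great product, would buy again. ->",
     "completion": " Positive"},
    {"prompt": "Arrived broken and support ignored me. ->",
     "completion": " Negative"},
    # ...many more examples than could ever fit in a single prompt
]

with open("sentiment_train.jsonl", "w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")
```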
LetGoAndBeReal t1_je8akb1 wrote
Reply to comment by light24bulbs in [D] The best way to train an LLM on company data by jaxolingo
I would agree with that last statement. You think you understand this, but you don’t seem to understand what does and doesn’t happen during fine-tuning or to realize that the problem of adding knowledge to LLMs is a notoriously difficult problem that ongoing research is trying to solve.
Try looking at some of the research: https://openreview.net/forum?id=vfsRB5MImo9
Or read what OpenAI says fine-tuning accomplishes: https://platform.openai.com/docs/guides/fine-tuning
Or, better yet, try actually getting an LLM to learn new facts by fine-tuning it. Then you will understand.
LetGoAndBeReal t1_je7re7y wrote
Reply to comment by light24bulbs in [D] The best way to train an LLM on company data by jaxolingo
Of course the fine-tuning data itself can have knowledge not in the model - that doesn’t prove anything.
What you need to show is that knowledge presumably added during fine-tuning was then retrieved from the model after fine-tuning.
LetGoAndBeReal t1_je7p0l8 wrote
Reply to comment by light24bulbs in [D] The best way to train an LLM on company data by jaxolingo
In what way does this show that new knowledge was added to a large language model?
LetGoAndBeReal t1_je7n1gc wrote
Reply to comment by light24bulbs in [D] The best way to train an LLM on company data by jaxolingo
Instead of seeing who can talk more loudly about who’s right, why don’t you post a link of a script that does this.
LetGoAndBeReal t1_je7m1tq wrote
Reply to comment by light24bulbs in [D] The best way to train an LLM on company data by jaxolingo
Take a closer look at every script/blog/video related to fine-tuning a model and you will see it doesn’t involve adding new knowledge to the model. If you find an exception I’d be delighted to see it.
LetGoAndBeReal t1_je71r0g wrote
Reply to comment by machineko in [D] The best way to train an LLM on company data by jaxolingo
Fine-tuning can be great for getting better output from the model based on the knowledge that model already contains. I only meant fine-tuning is not viable for getting new data/knowledge into a model. Fine-tuning does not accomplish knowledge absorption.
LetGoAndBeReal t1_je65ffo wrote
The comments here so far have addressed three possible approaches to this. Two of those approaches - i.e., training your own model and fine-tuning an existing model - are not currently viable. Training your own model would require a ridiculous amount of human and compute power, and it would not result in something where data could easily be added. Fine-tuning a model does not result in the model absorbing new data - it only conditions the output patterns of the model using data/knowledge the model gained during initial training.
The only viable approach is to use retrieval augmented generation (RAG), where data relating to the user's question is retrieved from outside the model and fed to the model as part of the prompt. Tools like LangChain can help you build a RAG solution on your own. There are also many services coming out that provide this sort of capability, such as humata.ai.
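For a sense of what that looks like in practice, here is a minimal RAG sketch following LangChain's early-2023 interface (the library moves fast, so treat the exact imports and class names as approximate; an OpenAI API key is assumed):

```python
# Minimal RAG sketch using LangChain's early-2023 interface.
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# 1. Index your company documents as embeddings in a vector store.
docs = ["Acme Corp's Q3 revenue was $12M.", "Acme was founded in 1999."]
store = FAISS.from_texts(docs, OpenAIEmbeddings())

# 2. At question time, retrieve relevant chunks and stuff them into
#    the prompt so the LLM answers from that retrieved context.
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=store.as_retriever(),
)
print(qa.run("What was Acme Corp's Q3 revenue?"))
```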
LetGoAndBeReal t1_j4vz8hv wrote
Reply to [D] Simple Questions Thread by AutoModerator
Companies can fine-tune top-performing LLMs to condition the LLM's output, but not to embody the knowledge contained in proprietary data. The current best approach for incorporating this custom knowledge is through data-augmented generation techniques and technologies such as what LangChain offers.
I am trying to decide whether to invest time building expertise in these techniques and technologies. I may not wish to do so if the ability to properly add custom knowledge to LLMs will arrive in short order.
I would like to know from those steeped in LLM R&D how soon such capabilities might be expected. Is this the right place to ask?
LetGoAndBeReal t1_j4t6yb6 wrote
I'm a bit unclear why this announcement is so significant, and frankly I'm not even sure I understand it. We already have API access to the text-davinci-003 model, and my understanding is that ChatGPT basically uses the same model with a small amount of incremental tuning.
Is this announcement just saying that this marginally revised model will now be available as a model option through the OpenAI API? If so, what benefit does that provide over the existing API access to text-davinci-003?
LetGoAndBeReal t1_j4n6rfa wrote
Reply to comment by avocadoughnut in [D] Fine-tuning open source models on specific tasks to compete with ChatGPT? by jaqws
Wow, that seems awfully ambitious given that GPT3.5 requires something like 700GB of RAM and the apparent unlikeliness that SoTA model sizes will get smaller anytime soon. Interesting project to watch, though.
LetGoAndBeReal t1_j4mihya wrote
Reply to comment by avocadoughnut in [D] Fine-tuning open source models on specific tasks to compete with ChatGPT? by jaqws
I looked through their repo, but I'm not understanding something: what is the foundational model that they plan to use and where/how will the model be run?
LetGoAndBeReal t1_j48gyhr wrote
My main comment is that this article was super useful and easy to understand.
My smaller comment is that the pattern of repeating the content in those bordered areas interrupts the flow and is pretty annoying. So, my vote would be to drop that, and you have yourself a near-perfect article.
LetGoAndBeReal t1_j3r0p45 wrote
Reply to comment by I-am_Sleepy in [D] Simple Questions Thread by AutoModerator
Thank you for this. It seems this paper could surely help answer my question, if only I could understand it!
A challenge I keep coming up against in my quest to quickly learn about ML/NN is that almost everything I read is either too high level to provide meaningful explanation or too technically dense for me to follow. I guess I will just take note of this paper for now and circle back to it when I'm a bit further along.
LetGoAndBeReal t1_j3oit19 wrote
Reply to [D] Simple Questions Thread by AutoModerator
How should I think about the way a large language model gains new specific knowledge? For example, suppose you have a model trained on hundreds of gigabytes of text and then want to continue its training to gain knowledge of a single specific fact it has not yet encountered such as “Steven Pinker is the author of The Language Instinct.”
I imagine that presenting it with a single sentence such as this embedded in a training set would contribute very little to its ability to subsequently answer the question “Who was the author of The Language Instinct?” Is that correct?
Is there some heuristic for how many exposures a model like GPT3.5 would need to a new fact, as such, before its weights and biases were adjusted enough to embody this fact?
LetGoAndBeReal t1_jef6vjx wrote
Reply to comment by Philpax in [R] TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs - Yaobo Liang et al Microsoft 2023 by Singularian2501
Right, ReAct seems to be the core pattern that everyone - including LangChain with their agents and OpenAI with their plugins - is using.
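For anyone unfamiliar, here is a bare-bones sketch of the loop - the tool names, prompt wording, and the `call_llm`/`run_tool` helpers are all hypothetical stand-ins; real implementations like LangChain's agents add proper parsing and guardrails:

```python
# Bare-bones sketch of the ReAct pattern: the model alternates
# Thought/Action steps and the harness feeds tool output back in as
# Observations until the model produces a final answer.
# call_llm and run_tool are hypothetical stand-ins, not real APIs.

REACT_PROMPT = """Answer the question. Available tools: search[query], calculator[expr].
Use this format:
Thought: <reasoning>
Action: <tool>[<input>]
Observation: <tool result, supplied by the system>
... (repeat Thought/Action/Observation as needed) ...
Final Answer: <answer>

Question: {question}
"""

def react_loop(question, call_llm, run_tool, max_steps=5):
    transcript = REACT_PROMPT.format(question=question)
    for _ in range(max_steps):
        # Stop generation before the model hallucinates its own Observation.
        step = call_llm(transcript, stop=["Observation:"])
        transcript += step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        # Execute the requested tool and append the result as an Observation.
        action = step.split("Action:")[-1].strip().splitlines()[0]
        transcript += "Observation: " + run_tool(action) + "\n"
    return None  # gave up after max_steps
```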