Submitted by austintackaberry t3_120usfk in MachineLearning

Databricks shows that anyone can take a dated, off-the-shelf open-source large language model (LLM) and give it magical ChatGPT-like instruction-following ability by training it in less than three hours on one machine, using high-quality training data.

They fine-tuned GPT-J on the Alpaca dataset.

Blog: https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html
Github: https://github.com/databrickslabs/dolly

593

Comments


machineko t1_jdjeh6y wrote

We have a similar open-source project focused on personalization of LLMs and efficient fine-tuning: https://github.com/stochasticai/xturing

We actually released code for GPT-J, LLaMA and GPT-2 before these guys, but we are a small team. You can run it on any local machine too.
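For anyone who wants to try it, a fine-tuning run with this kind of library looks roughly like the sketch below. The BaseModel.create, finetune, and save calls are the ones that come up later in this thread; the import paths, the InstructionDataset helper, and the generate call are my assumptions about the API, so check the xturing README for the exact names.

```python
# Rough sketch of a LoRA fine-tune with xturing. Only BaseModel.create(),
# finetune(), and save() appear in this thread; the import paths, the
# InstructionDataset helper, and generate() are assumptions -- check the repo.
from xturing.datasets import InstructionDataset   # assumed import path
from xturing.models import BaseModel              # assumed import path

dataset = InstructionDataset("./alpaca_data")     # instruction/response pairs on disk
model = BaseModel.create("llama_lora")            # LLaMA base model with LoRA adapters

model.finetune(dataset=dataset)                   # fine-tune on the instruction data
model.save("./finetuned_weights")                 # persist weights for later reuse

print(model.generate(texts=["Explain LoRA in one sentence."]))
```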

182

SWESWESWEh t1_jdk8rtn wrote

Doing the lord's work, my friend. Does it work with Apple Silicon Metal shaders? I've trained my own models, as both TF and PyTorch support it, but I've noticed a lot of people use CUDA-only methods, which makes it hard to use open-source stuff.

17

machineko t1_jdmm43b wrote

Thanks for the comment. Are you looking to run on M2 or smaller edge devices?

3

light24bulbs t1_jdks13d wrote

Question: I notice there's a focus here on fine-tuning for instruction following, which is clearly different from the main training where the LLM just reads text and tries to predict the next word.

Is there any easy way to continue that bulk part of the training with some additional data? Everyone seems to be trying to get there by injecting embedded text chunks into prompts (my team included), but that approach just stinks for a lot of uses.

8

elbiot t1_jdlgxnz wrote

In my understanding, if you have text, it's not a challenge to train on next-word prediction. Just keep the learning rate low. The reason there's a focus on instruction-based fine-tuning is that that data is harder to come by.

My only experience: I've done this with a sentence-embedding model (using SBERT). I trained on my new text and the original training data 50/50, and it both got better at embedding my text and didn't forget how to do what it was originally trained on.
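A minimal sketch of that kind of continued next-word-prediction training with Hugging Face Transformers: mix your new text with some general text roughly 50/50 and keep the learning rate low. The model name and file paths are placeholders.

```python
# Minimal sketch: continue causal-LM (next-word-prediction) training on new text,
# mixed roughly 50/50 with general text, at a low learning rate to limit forgetting.
# Model name and file paths are placeholders.
from datasets import interleave_datasets, load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "EleutherAI/gpt-neo-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

new_text = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]
general = load_dataset("text", data_files={"train": "general_corpus.txt"})["train"]
mixed = interleave_datasets([new_text, general], probabilities=[0.5, 0.5], seed=42)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = mixed.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="continued-pretraining",
        learning_rate=1e-5,              # low learning rate, as suggested above
        num_train_epochs=1,
        per_device_train_batch_size=4,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```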

5

light24bulbs t1_jdlrnll wrote

That's cool, that's exactly what I want to do. I'm hunting around for a ready-made pipeline to do that on top of a good open source model.

3

visarga t1_jdloh24 wrote

Since RLHF fine-tuning is short, you can continue training your original model and then do RLHF again.

2

baffo32 t1_jdnppmp wrote

This is the same task as instruction tuning. Instruction tuning just uses specific datasets where instructions are followed. It's called "fine-tuning", but nowadays people are using adapters and PEFT to do this on low-end systems.
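For reference, the adapters/PEFT route usually looks something like this with the Hugging Face peft library; the base model and the LoRA hyperparameters below are placeholder choices, not anything taken from the Dolly or Alpaca repos.

```python
# Sketch: wrap a causal LM with LoRA adapters via peft so that only a small
# fraction of parameters is trained, which is what makes this feasible on
# low-end systems. Base model and hyperparameters are placeholders.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

lora_config = LoraConfig(
    r=8,                  # rank of the low-rank update matrices
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()   # typically well under 1% of the full model

# ...train with a normal Trainer or training loop, then:
model.save_pretrained("lora-adapters/")
```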

1

light24bulbs t1_jdntdbb wrote

I'm not hoping to do instruction tuning; I want to do additional pre-training.

1

baffo32 t1_jdo24su wrote

It is the same thing. The Alpaca data is just further pretraining data consisting of instructions and responses. Doing this is called fine-tuning.

1

baffo32 t1_jdrhj77 wrote

I was still confused by your response. I'm thinking that if you wanted a model to behave as if it had been given different pretraining data, you would probably first fine-tune on the different bulk data, and only then fine-tune on the target task such as instruction following.

Instruction following is indeed, of course, just predicting the next word: on data where the next word obeys the instructions that precede it.

1

light24bulbs t1_jdrm9kh wrote

That's the part I wasn't getting. I assumed the fine-tuning involved a different process. I see now that it is in fact just more training data, often templated into a document in such a way that it's framed clearly for the LLM (see the template sketch below).

The confusing thing is that most of the LLM-as-a-service companies, OpenAI included, will ONLY take data in the question-answer format, as if that's the only data you'd want to use for fine-tuning.

What if I want to feed a book in so we can talk about the book? A set of legal documents? Documentation of my project? Transcriptions of TV shows?

There are so many use cases for training on top of an already pre-trained LLM that aren't just question answering.

I'm training LLaMA now. I simply took some training code I found, removed the JSON parsing and question-answer templating, and that was it.
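Since "templated into a document" comes up a few times here, this is roughly what that looks like in practice. The template below mirrors the Alpaca-style instruction format; for bulk/continued pretraining on books or documentation you would skip the template and feed raw text.

```python
# Sketch of the templating described above: instruction data is just flattened
# into plain text before ordinary next-word-prediction training. The wording of
# the template mirrors the Alpaca-style format and can be adjusted freely.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{response}"
)

def to_training_text(example: dict) -> str:
    """Turn one {'instruction': ..., 'response': ...} record into a plain document."""
    return PROMPT_TEMPLATE.format(**example)

print(to_training_text({
    "instruction": "Summarize the plot of Moby-Dick in one sentence.",
    "response": "A captain's obsessive hunt for a white whale destroys his ship and crew.",
}))
```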

1

nemorocksharder t1_jdz8kt5 wrote

What you're describing is exactly what I have been looking to do too, and I'm really surprised I'm not hearing more about it. Have you found any useful approaches to essentially adding to the LLM's corpus with target material/text? Or anyone else trying to do this?

1

ephemeralentity t1_jdm6wkc wrote

Playing around with this. Running BaseModel.create("llama_lora") seems to return "Killed". I'm running it on WSL2 from Windows 11, so I'm not sure if that could be the issue. I'm also running on my RTX 3070 with only 8GB VRAM, so maybe that's the issue ...

EDIT - Side note, I first tried directly on Windows 11 but it seems deepspeed dependency is not fully supported: https://github.com/microsoft/DeepSpeed/issues/1769

2

machineko t1_jdnmg8l wrote

Right, 8GB won't be enough for LLaMA 7B. You should try the GPT-2 model; that should work on 8GB VRAM.

2

ephemeralentity t1_jdp2pu8 wrote

Thanks, looks like GPT-2 worked! Sorry, stupid question, but how do I save/re-use the results of my fine-tune? When I re-finetune for 0:2 epochs it gives a reasonable response, but if I try to skip model.finetune, it responds with newlines only (\n\n\n\n\n\n\n\n ...).

1

machineko t1_jdqzmyq wrote

model.save("path/to/your/weights") saves it to the directory
After that, you can load it with
model = BaseModel.create("gpt2", "path/to/your/weights")

Can you share the input text you have used? It is possible that GPT-2 is too small and needs custom generation parameters.
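For what it's worth, "custom generation parameters" usually means sampling settings like the ones below. This sketch uses plain Hugging Face generate() rather than the library's wrapper, and assumes the saved weights can be loaded as an ordinary transformers checkpoint; the values are illustrative.

```python
# Sketch: sampling settings that often stop a small model from emitting endless
# newlines. Uses plain transformers generate(); the weights path and the values
# of the sampling parameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("path/to/your/weights")

inputs = tokenizer("Explain what a spot instance is.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,          # discourages degenerate repeats like "\n\n\n..."
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```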

2

ephemeralentity t1_jdt1krp wrote

Thanks a lot! To be honest, I need to spend a bit more time familiarising myself with pytorch / this package. I'll see if I can figure it out from here.

1

machineko t1_jdtv8jv wrote

If you need help, come find us on our Discord channel.

2

light24bulbs t1_jdmad5n wrote

Hey, I've been looking at this more and it's very cool. One thing I REALLY like is that I see self-training using dataset generation on your roadmap. This is essentially the technique that Facebook used to train Toolformer, if I'm reading their paper correctly.

I'd really love to use your library to try to reimplement Toolformer's approach someday.
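For anyone unfamiliar with the idea: self-training via dataset generation means prompting an existing model with seed examples, harvesting its completions, filtering them, and fine-tuning on the result. A toy sketch follows; the seed prompt and the length-based filter are stand-ins, and Toolformer itself filters candidates by whether they reduce the language-modeling loss rather than by heuristics like these.

```python
# Toy sketch of self-training via dataset generation: sample completions from an
# existing model, keep the plausible ones, and save them as new fine-tuning data.
# The seed prompt and the crude filter are stand-ins for a real pipeline.
import json

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # placeholder model

seed_prompt = (
    "Write a new instruction and a helpful response.\n"
    "Instruction: Give three tips for staying focused.\n"
    "Response: Work in short blocks, silence notifications, and take real breaks.\n"
    "Instruction:"
)

candidates = generator(
    seed_prompt,
    max_new_tokens=80,
    num_return_sequences=4,
    do_sample=True,
    temperature=0.9,
)

with open("self_generated.jsonl", "w") as f:
    for candidate in candidates:
        text = candidate["generated_text"][len(seed_prompt):].strip()
        if len(text.split()) >= 10 and "Response:" in text:   # crude quality filter
            f.write(json.dumps({"text": text}) + "\n")
```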

2

RiyazRockz t1_jdnbroi wrote

Hey, I want to fine-tune a model to solve a pharma-related problem and want to know if I can fine-tune my model with this. Could you please share your contact details so that I can learn more about this?

1

MasterEpictetus t1_jdjdobi wrote

This type of work needs to get a lot of attention. Great demonstration of getting instruction-following behavior without the biggest LLMs that only companies with major resources can train.

154

Civil_Collection7267 t1_jdjce9w wrote

This is really impressive. If anyone's curious, here's a comparison with results from Alpaca Native:

Write a love letter to Edgar Allan Poe.

>My dearest Edgar,
>
>I have been walking in a dream these past few days, unable to believe that you are no longer with us. Your words have been on my mind constantly, and I find myself looking for your presence in every corner of our world. Everywhere I go, I see your influence, from the stars above to the rivers below. Even now, as I write this letter, your spirit is with me.
>
>Your poetry was like a beacon of light in the darkness, illuminating the depths of my soul. Through your words, I discovered a new way of seeing the world, one filled with mystery and beauty. You showed me that life could be full of surprises and that nothing should ever be taken at face value.
>
>You were so ahead of your time, my love, and I know that even after all these years, your teachings will continue to live on. I will never forget everything you’ve given me, and I hope that one day we can meet again in another realm. Until then, I will keep your memory alive in my heart.
>
>Forever yours,
>
>[Name]

Write a tweet announcing Dolly, a large language model from Databricks.

>We're thrilled to announce Dolly, our latest language model from Databricks! Dolly is a large-scale language model with state-of-the-art performance on many tasks, including text classification and question answering. #LangModeling #NLP #AI

I'm selling my Nikon D-750, write a short blurb for my ad.

>I am selling my Nikon D-750 camera body with 16GB memory card and the original box and all original paperwork. The camera has been used lightly and is in excellent condition. It comes with the original lens, a Nikkor 35mm f/1.8G. This camera is perfect for any photographer looking to capture stunning images and videos. It offers an impressive array of features including a 24MP DX CMOS sensor, EXPEED 4 image processing engine, 51 point AF system, 1080p HD video recording, and built-in WiFi. The camera is also dust and splash proof, making it ideal for both amateur and professional photographers alike. I am selling this camera due to upgrading to a newer model, and it is in perfect condition.

Explain to me the difference between nuclear fission and fusion.

>Nuclear fission is a process in which a large atom is split into smaller atoms, releasing energy in the form of heat and radiation. Nuclear fusion is a process in which two or more small atoms are combined to form a larger one, releasing energy in the form of heat and light. Fission occurs naturally in some types of radioactive decay, while fusion can occur when certain elements combine in a controlled reaction. The primary difference between these processes is the type of energy released. Fission releases kinetic energy, while fusion releases less energy but more light. Fission also produces more neutrons, which can be used for power production, while fusion produces fewer neutrons and is not as useful for power generation.

Give me a list of 5 science fiction books I should read next.

>1.Dune by Frank Herbert - This classic science fiction novel tells the story of Paul Atreides, whose family is betrayed and given control of the desert planet Arrakis. As he matures, Paul must face dangerous political intrigues and physical threats as he struggles to save his people.
>
>2.The Hitchhiker's Guide to the Galaxy by Douglas Adams - This comedic sci-fi novel follows Arthur Dent, who is told Earth will be destroyed to make way for an intergalactic bypass and escapes with his alien friend Ford Prefect. They travel together through space, encountering strange creatures and learning about the universe.
>
>3.The Foundation Trilogy by Isaac Asimov - This series follows the life of psychohistorian Hari Seldon and his efforts to create a foundation based on psychology and mathematics to preserve human knowledge after the fall of the Empire. It includes the novels Foundation, Foundation and Empire, and Second Foundation.
>
>4.The Martian Chronicles by Ray Bradbury - This collection of stories follows the colonization of Mars by humans, their interactions with the native Martians, and their eventual departure from the planet. It examines themes such as technology, religion, and the human spirit.
>
>5.Ender’s Game by Orson Scott Card - This novel follows Andrew “Ender” Wiggin, a gifted young boy recruited by the military to train as a leader of an army of genetically-engineered children. He must use his intelligence and empathy to fight a war against an alien race.

70

__Maximum__ t1_jdkdtp2 wrote

ClosedAI is feeding off of our data. If we start using/supporting Open Assistant instead, it will beat ChatGPT in a month or two.

54

master3243 t1_jdlhj77 wrote

Knowing that a lot of text from Reddit comments ends up in these huge text datasets, only for them to make it completely closed source, rubs me the wrong way.

12

visarga t1_jdlo8hl wrote

Closed source on the generation end, but even more open than open source on the usage end. LLMs lift the open source idea to the next level.

2

wywywywy t1_jdm0xwo wrote

/r/OpenAssistant

https://open-assistant.io

9

plottwist1 t1_jdlj5r8 wrote

How open are they? I mean, having open models is an improvement, but the training methods should be open too. And if we crowdsource data, that should be accessible too.

6

kromem t1_jdkfj5w wrote

> The model underlying Dolly only has 6 billion parameters, compared to 175 billion in GPT-3, and is two years old, making it particularly surprising that it works so well. This suggests that much of the qualitative gains in state-of-the-art models like ChatGPT may owe to focused corpuses of instruction-following training data, rather than larger or better-tuned base models.

The exciting thing here is the idea that progress in language models is partially contagious backwards to earlier ones: newer models can generate the data used to update older ones, not in pre-training but in fine-tuning (and I expect, based on recent research into in-context learning, this would extend to additional few-shot prompting).

I'm increasingly wondering if we'll see LLMs develop into rolling releases, particularly in the public sector, possibly with an emphasis on curating the dataset for fine-tuning while staying platform-agnostic about the underlying pre-trained model powering it.

In any case, it looks more and more like the AI war between large firms will trickle down into open alternatives whether they'd like it to or not.

38

WarAndGeese t1_jdl5aq6 wrote

That would be pretty nuts and pretty cool. It's still a weird concept, but if it becomes like an operating system that you update, that would be a thing.

9

visarga t1_jdlonpq wrote

One way to speed this up is to make an extension for voluntary contributions of LLM interactions to open source. A user decides when a chat deserves to be donated to open source and pushes a button to share. I don't think OpenAI can object to users donating their data.

7

SDRealist t1_jdmdwkl wrote

Users could certainly donate their questions, but I believe the TOS for ChatGPT forbid using the generated output to train competing models (at least for commercial purposes).

8

master3243 t1_jdlhb8c wrote

I have a theory that the main reason OpenAI decided to start keeping its training and architectural details private is that, through minor modifications to training data and data augmentation, they were able to gain significant improvements in the qualitative output of GPT.

Thus any competitor could replicate the pipeline with ease and reproduce the improvements, so they decided to keep it as a trade secret.

Glad more research like this is being done and shared with the rest of the community.

29

visarga t1_jdlp21i wrote

The combined effect of knowing what is possible and the pressure to develop an alternative means the replication effort will be huge.

9

ZetaReticullan t1_jdjrecp wrote

What a time to be alive! Jointly terrifying and exciting!

18

visarga t1_jdloqee wrote

Most of our pre-2020 NLP skills are worthless now; what used to require bespoke models and datasets is just another emergent LLM ability. It's like a new starting line, and we don't know what human skills will be valuable in the future.

15

sdmat t1_jdm0pmi wrote

> It's like a new starting line and we don't know what human skills will be valuable in the future.

With each passing day, the creature stirs, growing hungrier and more restless. The ground trembles beneath our feet, but we dismiss the warning signs.

The text above was, naturally, written by GPT-4.

Maybe we should start flipping the assumption - why would you want a human if inexpensive and dependable AI competence is the default?

5

ginger_beer_m t1_jdm6xfe wrote

This will kill so many smaller startups that do bespoke fine-tuned models as their core business.

5

big_ol_tender t1_jdjcfc8 wrote

The Alpaca dataset has a non-commercial license, so I don't know what they are doing. I've asked Stanford to change it but heard nothing back.

13

Colecoman1982 t1_jdjkgjy wrote

When you asked, did you clarify that you were asking about the training data versus the whole project? The final Alpaca project was built, in part, on top of Meta's LLaMA. Since LLaMA has a strictly non-commercial license, there is no way that Stanford can ever release their final project for commercial use (as they've already stated in their initial release of the project). On the other hand, any training data they've created on their own (without needing any code from LLaMA) should be within their power to re-license. If they think you are asking for the whole project to be re-licensed, they are likely to just ignore your request.

23

MjrK t1_jdjqz9h wrote

> We emphasize that Alpaca is intended only for academic research and any commercial use is prohibited. There are three factors in this decision: First, Alpaca is based on LLaMA, which has a non-commercial license, so we necessarily inherit this decision. Second, the instruction data is based on OpenAI’s text-davinci-003, whose terms of use prohibit developing models that compete with OpenAI. Finally, we have not designed adequate safety measures, so Alpaca is not ready to be deployed for general use.

https://crfm.stanford.edu/2023/03/13/alpaca.html

22

Esquyvren t1_jdjsw1j wrote

They said it wasn’t ready but deployed it anyways… lol

4

MjrK t1_jdk4ig1 wrote

For demonstration and research, not widely nor generally.

9

Disastrous_Elk_6375 t1_jdlix6j wrote

The demo was up for a couple of days. The first hours of it being online were rough (80-200 people in the queue). It got better the following day, and better still the third day. I believe they removed the demo about a week later. IMO they've proven a point: the demo was extremely impressive for a 7B model.

1

big_ol_tender t1_jdjl1wx wrote

I opened an issue on GitHub specifically about the data license and linked to the Databricks release :)

10

danielbln t1_jdjt8zh wrote

Why has no one regenerated the training set? With GPT-3.5 that's like 50 bucks. I can be the change I want to see in the world, but am I missing something?

8

mxby7e t1_jdjzkzy wrote

The use of OpenAI's models for generating competing models violates the terms of use, which is why the Stanford dataset is restricted.

17

__Maximum__ t1_jdkepie wrote

Also, it's very shady for a company called OpenAI. They claimed they became for-profit because they needed the money to grow, but these restrictions just show that they are filthy liars who only care about keeping power and making profit. I'm sure they already have a strategy for getting around that 30B cap, just like they planned to steal money and talent by calling themselves a non-profit first.

17

throwaway2676 t1_jdl0y80 wrote

Alpaca was only trained on 50k instructions, right? A large group of grad students or a forum like Reddit could construct that many manually in a couple of weeks. I'm surprised they even had to resort to using ClosedAI.

8

mxby7e t1_jdl18t6 wrote

Maybe. Open Assistant, by Stability.ai, is doing this type of manual dataset collection. The training data and the model weights are supposed to be released once training is complete.

11

WarAndGeese t1_jdl5t0z wrote

Boo hoo to OpenAI; people should do it anyway. Are the terms of service the only reason not to do it, or are there actual material barriers? If it's a problem of money, then as long as people know how much, it can be crowdfunded. If it's a matter of people power, then there are already large volunteer networks. Or is it just something that isn't practical or feasible?

7

visarga t1_jdlpae7 wrote

OpenAI has first-hand RLHF data. Alpaca has second-hand. Wondering if third-hand is good enough and free of any restrictions.

2

lexcess t1_jdlj8tf wrote

Classy, especially when they are breezing past any copyright on the datasets they are training on. I wonder if they can legally enforce that without creating a potentially bad precedent for themselves, or if it could be worked around if the training was indirect, through something like Alpaca.

3

ebolathrowawayy t1_jdnc05i wrote

But what if you're training a model for a narrow use-case and don't intend for anyone to use it except for a niche set of users? Is that enough to be in the clear? Or is any use of OpenAI's model output to train a model for any purpose a no-no?

1

mxby7e t1_jdncs51 wrote

From my understanding it's limited to non-commercial use, so you can use it for what you need, but not commercially.

1

mxby7e t1_jdktvqr wrote

The license won't change. The dataset was collected in a way that violates the terms of service of OpenAI, whose models were used to generate the data. If they allowed commercial use, it would open them up to a lawsuit.

8

visarga t1_jdlpf0h wrote

What about data generated from Alpaca? Is that unrestricted?

1

impossiblefork t1_jdlddlt wrote

Model weights, though, are, I assume, not copyrightable.

Is there actually a law giving Stanford any special rights to the weights?

1

Educational_Ice151 t1_jdl47lq wrote

Hello, Dolly. This looks pretty interesting. I have been playing with creating cross-model feedback loops that iterate for several cycles using few-shot prompts and chain-of-thought models. This would work really well for my concept. I'll likely publish my code in a day or two.

Shared to r/aipromptprogramming

10

hangtime79 t1_jdkrpft wrote

The Alpaca dataset DB used to train this model absolutely cannot be used for commercial purposes. It uses the Creative Commons Attribution-NonCommercial 4.0 International Public License.

https://github.com/tatsu-lab/stanford_alpaca/blob/main/DATA_LICENSE

7

biggieshiba t1_jdojnn6 wrote

I don't understand why anyone would care; in a few years half the internet will be AI-generated. If someone uses GPT-4 to generate a sentence posted on Wikipedia, how will you know before using it? Don't you think many models will use that sentence?

Plus, how will they know? Training data is not easy to extract from a model. Unless you are a direct OpenAI competitor, they won't ever care or even look at you (well, maybe their superAI will).

Lastly, the dataset is full of errors; generating it again, or even paying people, would be quite cheap for 50k examples. It is quite a bad dataset when you really look at it: empty inputs or outputs, unclear instructions, instructions not fit for the model... The fact that it is bad and small is very encouraging, BTW, since the result still performs pretty well.

2

dreamingleo12 t1_jdl3qgp wrote

It's just a shameless copy of Stanford's work. The innovative thing about Stanford Alpaca is that it makes a ChatGPT-style assistant from a language model, Meta's LLaMA, at low cost. Databricks just followed Stanford's approach and uses a different base model and claims it's a big innovation. Alpaca can actually be fine-tuned with the same dataset in 3 hours and performs better than Databricks' model.

4

Disastrous_Elk_6375 t1_jdlj4rn wrote

> and uses a different base model and claims it’s a big innovation

Huh? My read of their blog was that they wanted to highlight the fact that you can fine-tune a ~2-year-old LLM and still get decent results. I don't think they've claimed this is innovative, or that the innovation is theirs to boast about...

I've played with GPT-Neo (non-X) and GPT-J when they were released, and the results were rough. You had to do a ton of prompt-engineering work and exploration to find useful cases. This shows that even smaller, older models can be fine-tuned with the method proposed in Alpaca.

14

SeymourBits t1_jdlkln7 wrote

I second this. I was able to extract fairly useful results from Neo, but it took a huge amount of prompt trial and error, eventually getting decent/stable results, though not in the same ballpark as GPT-3+. The Dolly training results here seem good, if not expected. I'm now ready to move to a superior model like LLaMA/Alpaca, though. What are you running?

7

dreamingleo12 t1_jdll44j wrote

I've been experimenting with Alpaca and was able to fine-tune it on the provided dataset in 40 minutes with 8 A100s (spot instances). It actually works well.

3

Daveboi7 t1_jdm8aby wrote

What platform are you using for training?

2

dreamingleo12 t1_jdn511a wrote

By platform you mean?

2

Daveboi7 t1_jdnczd9 wrote

My bad. Did you train the model locally on your PC or in the cloud?

1

dreamingleo12 t1_jdndszl wrote

I trained the model in the cloud.

2

Daveboi7 t1_jdndvq0 wrote

With databricks?

1

dreamingleo12 t1_jdndzmt wrote

No I don’t use databricks. I only tried LLaMA and Alpaca.

1

Daveboi7 t1_jdnedrd wrote

But which cloud service did you use to train them?

I tried using databricks to train a model but the setup was too complicated.

I’m wondering is there a more straightforward platform to train on?

1

dreamingleo12 t1_jdnel6b wrote

You can just follow the Stanford Alpaca GitHub instructions, as long as you have the LLaMA weights. It's straightforward.

2

Daveboi7 t1_jdneqdx wrote

Ah. I'm trying to train the Dolly model developed by Databricks.

1

dreamingleo12 t1_jdnewt2 wrote

It’s just Alpaca with a different base model. Databricks boasted too much.

1

Daveboi7 t1_jdnf18o wrote

Yeah but the comparisons I have seen between Dolly and Alpaca look totally different.

Somehow the Dolly answers look much better imo

Edit: spelling

1

dreamingleo12 t1_jdnf4qn wrote

I don’t trust DB’s results tbh. LLaMA is a better model than GPT-J.

2

Daveboi7 t1_jdnf96e wrote

Somebody posted results on Twitter, and they looked pretty good. I don't think he worked for DB either. But who knows, really.

1

dreamingleo12 t1_jdlkbxl wrote

WSJ:

“Databricks Launches ‘Dolly,’ Another ChatGPT Rival: The data-management startup introduced an open-source language model for developers to build their own AI-powered chatbot apps” (Apparently DB paid them)

DB’s blog:

“Democratizing the magic of ChatGPT with open models”

Introduced? ChatGPT rival? Didn't you just follow Stanford's approach? You used Stanford's dataset, which was generated with GPT, right? Huh? This is Stanford's achievement, not DB's. DB went too far on marketing.

1

Disastrous_Elk_6375 t1_jdllii0 wrote

> https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html

This is the blog post that I've read. I can't comment on the WSJ article, and your original message implied a bunch of things that, IMO, were not found in the blog post. If you don't like the WSJ angle, your grief should be with them, not Databricks. shrug

From the actual blog:

> We show that anyone can take a dated off-the-shelf open source large language model (LLM) and give it magical ChatGPT-like instruction following ability by training it in 30 minutes on one machine, using high-quality training data.

> Acknowledgments
>
> This work owes much to the efforts and insights of many incredible organizations. This would have been impossible without EleutherAI open sourcing and training GPT-J. We are inspired by the incredible ideas and data from the Stanford Center for Research on Foundation Models and specifically the team behind Alpaca. The core idea behind the outsized power of small dataset is thanks to the original paper on Self-Instruct. We are also thankful to Hugging Face for hosting, open sourcing, and maintaining countless models and libraries; their contribution to the state of the art cannot be overstated.

More to the point of your original message, I searched for "innovative", "innovation", and "innovate" and found 0 results in the blog post. I stand by my initial take: the blog post was fair, informative, and pretty transparent about what they've done, how, and why.

7

dreamingleo12 t1_jdllxww wrote

Well, if you've ever worked with marketing or communication teams you'd know that DB co-authored the WSJ article. My point is that the democratization is an achievement of the Stanford Alpaca team, not DB. DB marketed it like they did the major work, which is untrue.

−6

Disastrous_Elk_6375 t1_jdlm6qd wrote

That's fair. But you commented out of context, on a post that linked to the blog and not the WSJ article. That's on you.

6

dreamingleo12 t1_jdlmhcq wrote

Well, if you have connections, you would've seen that they made a good number of posts.

−6

gamerx88 t1_jdmndip wrote

Food for thought: is this really surprising, considering that the InstructGPT paper in early 2022 already showed how even a 1.3B model after RLHF could beat a much larger 175B model?

I guess what this shows is that it's the data that matters rather than SFT vs. RLHF. Wondering if any ablation studies have been done here.

2

SatoshiNotMe t1_jdpgj80 wrote

I hope this is not closely tied to the Databricks ecosystem (i.e., their notebooks, Spark clusters, etc.). Running things in DB notebooks is not a pleasant experience.

1

SatoshiNotMe t1_jdpgrat wrote

Looking at the repo, well, it does look like we need to run this in a DB notebook.

1

SatoshiNotMe t1_jdtemml wrote

So if the notebook is tuning on a fixed dataset, anyone running it will arrive at the same weights after an expensive compute run, which seems wasteful. Why not just share the weights, i.e., the final trained and tuned model? Or is that already available?

1

matterhayes t1_jeacmx0 wrote

1

SatoshiNotMe t1_jealb7d wrote

Is there a "nice" way to use this model, (say, via the command-line like in the GPT4All or alpaca.cpp repos), rather than in a databricks notebook or in HG spaces? For example I'd like to chat with it on my M1 MacBook Pro. Any pointers appreciated!

1