athos45678 t1_je7ercw wrote
Train a LLaMA LoRA model. The 30B model isn’t too expensive to tune (around 40 bucks) and is ridiculously capable.
You just need to format the data in a long text document, with each prompt separated by two line breaks. I found it worked best in the Alpaca style, where you have a single line break between the prompt and the response: a prompt like “write a function that sorts this table in python”, then a newline, then “def sort():” and the written-out code, and then the double line break to signal the start of the next input.
Then use the simple-llama-finetuner app to make it all easy.
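The layout described above can be sketched in a few lines of Python; the `pairs` data here is hypothetical example content, but the structure (one newline inside a pair, a blank line between pairs) follows the comment:

```python
# Assemble (prompt, response) pairs into one Alpaca-style training text file.
# A single newline separates a prompt from its response; a double newline
# (blank line) separates one training example from the next.
pairs = [
    ("write a function that sorts this table in python",
     "def sort(arr):\n    return sorted(arr)"),
    ("write a function that reverses a string in python",
     "def reverse(s):\n    return s[::-1]"),
]

doc = "\n\n".join(f"{prompt}\n{response}" for prompt, response in pairs)

with open("train.txt", "w") as f:
    f.write(doc)
```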
athos45678 t1_j9lz58x wrote
Reply to comment by iidealized in [P] MIT Introduction to Data-Centric AI by anishathalye
And those of us who taught ourselves need it even more. Love me some open source learning
athos45678 t1_j8zewjb wrote
Reply to comment by kau_mad in [N] Google is increasing the price of every Colab Pro tier by 10X! Pro is 95 Euro and Pro+ is 433 Euro per month! Without notifying users! by FreePenalties
It’s 29 cents per GB per month over the storage limit, and I rarely go over the limit if I’m carefully managing files. Definitely the biggest drawback, though. You can always just use wkentaro’s gdown package to pull files from Google Drive as well.
athos45678 t1_j8xjcfb wrote
Reply to comment by Sid_b23692 in [N] Google is increasing the price of every Colab Pro tier by 10X! Pro is 95 Euro and Pro+ is 433 Euro per month! Without notifying users! by FreePenalties
Storage depends on your plan, but any overage is 0.02 USD per GB per month, and the max is 10 TB.
Drive mounting isn’t exactly there, but you can pull any file easily enough with wkentaro’s gdown.
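A minimal sketch of what that gdown pull looks like, assuming gdown is installed via pip; the file ID is a placeholder, and the actual download call is left commented out so nothing hits the network here:

```python
# Build the direct-download URL that gdown expects from a Drive file ID.
# "FILE_ID" is a placeholder; take the real ID from the file's share link.
file_id = "FILE_ID"
url = f"https://drive.google.com/uc?id={file_id}"

# With gdown installed (pip install gdown), the pull is one call:
# import gdown
# gdown.download(url, "data.bin", quiet=False)
print(url)
```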
athos45678 t1_j8xj0gi wrote
Reply to [N] Google is increasing the price of every Colab Pro tier by 10X! Pro is 95 Euro and Pro+ is 433 Euro per month! Without notifying users! by FreePenalties
Paperspace, lads. Paperspace is where it’s at. Except for the storage limitations, my experience there is so much better than Colab.
athos45678 t1_je82thk wrote
Reply to comment by netham91 in [D] The best way to train an LLM on company data by jaxolingo
So as far as setup goes, you just need to:
“””
git clone https://github.com/lxe/simple-llama-finetuner
cd simple-llama-finetuner
pip install -r requirements.txt
python app.py  ## if you’re on a remote machine (Paperspace is my go-to), you may need to edit the last line of this script to set ‘share=True’ in the launch args
“””
Then you should get a link to the Gradio web app. Copy and paste the code samples, in the format described before, into the input text box. It will look something like this:
“””
Write a code snippet that sorts an array
def sort(arr):

Some other code snippet prompt
Some answer

Etc.
“””
Edit: I’m drinking with friends, sorry I can’t format better. Single line break between the prompt and the observed correct response; double line break between prompt instances.