Submitted by liyanjia92 t3_120csub in MachineLearning
hey folks, happy Friday! I'd like to get some feedback on my recent project: a minimal example of using RLHF on language models to improve human alignment.
The goal is to compare against vanilla GPT-2 and supervised fine-tuned GPT-2 to see how much RLHF can benefit small models. I also hope this project can serve as an example of the minimum requirements for building an RLHF training pipeline for LLMs.
Github: https://github.com/ethanyanjiali/minChatGPT

Demo: https://colab.research.google.com/drive/1LR1sbWTyaNAmTZ1g1M2tpmU_pFw1lyEX?usp=sharing
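For anyone curious what the RL stage boils down to, here's a rough, simplified sketch of a KL-regularized policy-gradient update on GPT-2. This is a toy stand-in, not the repo's actual code (which may differ): the constant `reward`, the `0.1` KL coefficient, and the REINFORCE-style loss are placeholder assumptions, where a real pipeline would use a trained reward model and a PPO clipped objective with a value head.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
policy = GPT2LMHeadModel.from_pretrained("gpt2").to(device)      # trainable policy
ref = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()  # frozen SFT/reference
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-5)

def seq_logprobs(model, ids):
    """Per-token log-probs of a sequence under a model."""
    logits = model(ids).logits[:, :-1, :]
    return torch.log_softmax(logits, dim=-1).gather(
        2, ids[:, 1:].unsqueeze(-1)).squeeze(-1)

prompt = tokenizer("Human: how does RLHF work?\nAssistant:",
                   return_tensors="pt").input_ids.to(device)
response = policy.generate(prompt, max_new_tokens=32, do_sample=True,
                           pad_token_id=tokenizer.eos_token_id)

# Placeholder score; a real pipeline would call a trained reward model here,
# and would only score/optimize the response tokens, not the prompt.
reward = torch.tensor(1.0, device=device)

logp = seq_logprobs(policy, response)
with torch.no_grad():
    ref_logp = seq_logprobs(ref, response)

# Shape the reward with a KL penalty that keeps the policy near the reference
kl = (logp - ref_logp).sum()
advantage = (reward - 0.1 * kl).detach()
loss = -advantage * logp.sum()  # REINFORCE-style surrogate; real RLHF uses PPO

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The KL term against a frozen reference model is the key design choice: it stops the policy from drifting away from coherent language while it chases reward.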
Thanks a lot for any suggestions and feedback!
G_fucking_G t1_jdifa1c wrote
Very interesting.
Quick question: how long does training take? I saw you used a single 3090 Ti, so was it done in hours, days, or weeks?