CriticalTemperature1 t1_jdyubo2 wrote on March 28, 2023 at 4:40 AM

Reply to [D] FOMO on the rapid pace of LLMs by 00001746

Unfortunately, the nature of this field is "the bitter lesson": scale trumps everything in machine learning. So, unfortunately or fortunately, we are getting interested in language models at a time when the scale is so large that it is impossible to make an impact on them unless you own your own $xxM company.

However, there are several interesting research avenues you can take:

Improving small models for a specific task with RLHF and fast implementations (e.g. llama.cpp)
Chaining models together with APIs to solve a real human problem
Adding multimodal inputs to smaller LLMs
Building platforms to make it easy to train and serve LLMs for many use cases
Analyzing prompts and understanding how to make the most of the biggest LLMs

CriticalTemperature1 t1_j7fzacn wrote on February 6, 2023 at 3:03 PM

Reply to [D] List of Large Language Models to play with. by sinavski

Google has their AI Test Kitchen for LaMDA

CriticalTemperature1 t1_j6g6xv5 wrote on January 30, 2023 at 2:29 AM

Reply to [D] AI Theory - Signal Processing? by a_khalid1999

The S4 model uses structured state spaces, a concept from EE that models the hidden state with differential equations. It seems to have SOTA results on a lot of tasks.
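
To make the state-space idea concrete, here is a minimal sketch of a scalar linear state space model, x'(t) = a*x(t) + b*u(t), y(t) = c*x(t), discretized with the bilinear transform and run as a recurrence. This is not the S4 implementation (which uses a structured state matrix and further tricks for efficiency); the function name and parameter values are made up for illustration.

```python
def run_scalar_ssm(u, a=-1.0, b=1.0, c=1.0, dt=0.1):
    """Simulate x'(t) = a*x(t) + b*u(t), y(t) = c*x(t) on a sampled input u."""
    # Bilinear (Tustin) discretization of the continuous-time parameters.
    denom = 1.0 - dt / 2.0 * a
    a_bar = (1.0 + dt / 2.0 * a) / denom
    b_bar = dt * b / denom

    x = 0.0  # hidden state starts at zero
    ys = []
    for u_k in u:
        x = a_bar * x + b_bar * u_k  # discrete state update
        ys.append(c * x)             # linear readout of the hidden state
    return ys


if __name__ == "__main__":
    # A constant input drives the state toward the steady state -b/a = 1.
    ys = run_scalar_ssm([1.0] * 200)
    print(ys[-1])
```

The same recurrence generalizes to a vector state by replacing the scalars with matrices, which is roughly where the "structured" part of structured state spaces comes in.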

CriticalTemperature1 t1_j29hrai wrote on December 30, 2022 at 4:54 PM

Reply to [D] NLP/NLU Research Opportunities which don't require much compute by WobblySilicon

Take a look at this paper. The authors pursued a similar approach to the one you mentioned:

https://arxiv.org/abs/2212.14034 (Cramming: Training a Language Model on a Single GPU in One Day)

>Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the community are asking how to push the limits of extreme computation, we ask the opposite question: How far can we get with a single GPU in just one day?
>
>We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU. Aside from re-analyzing nearly all components of the pretraining pipeline for this scenario and providing a modified pipeline with performance close to BERT, we investigate why scaling down is hard, and which modifications actually improve performance in this scenario. We provide evidence that even in this constrained setting, performance closely follows scaling laws observed in large-compute settings. Through the lens of scaling laws, we categorize a range of recent improvements to training and architecture and discuss their merit and practical applicability (or lack thereof) for the limited compute setting.

Although it's not text-to-video, you can probably apply similar approaches to vision transformers, diffusion models, etc.

CriticalTemperature1 t1_j1vrfz6 wrote on December 27, 2022 at 7:38 PM

Reply to [P] I built a CLI helper integrating with GPT-3. It enables you to ask questions straight in your terminal by maktattengil

This is a great idea! Maybe in future versions, when a command fails with an error in the terminal, you could suggest the correct command with GPT-3.
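
A rough sketch of how that could work, assuming nothing about how the tool is actually built: run the command, and if it exits with an error, send the command plus its stderr to the model. `ask_model` is a hypothetical callable (prompt in, text out) standing in for whatever GPT-3 client the tool already uses; `build_fix_prompt` and `run_with_suggestion` are illustrative names too.

```python
import subprocess
import sys


def build_fix_prompt(command, stderr):
    """Format a failed command and its error output as a question for a language model."""
    return (
        "The following shell command failed.\n"
        f"Command: {command}\n"
        f"Error output:\n{stderr.strip()}\n"
        "Suggest a corrected command."
    )


def run_with_suggestion(args, ask_model):
    """Run a command; on a non-zero exit code, ask the model for a fix.

    ask_model is a hypothetical callable (prompt -> str), not a real API.
    Returns (stdout, suggestion), where suggestion is None on success.
    """
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode == 0:
        return result.stdout, None
    prompt = build_fix_prompt(" ".join(args), result.stderr)
    return result.stdout, ask_model(prompt)


if __name__ == "__main__":
    # Stand-in "model" that echoes the prompt, so the sketch runs offline.
    failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(2)"]
    _, suggestion = run_with_suggestion(failing, ask_model=lambda p: p)
    print(suggestion)
```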

CriticalTemperature1 t1_j1nq72n wrote on December 25, 2022 at 10:41 PM

Reply to comment by blablanonymous in [D] The case for deep learning for tabular data by dhruvnigam93

The authors trained a transformer that takes a tabular data set and can learn SoTA embeddings on categorical data in under a second

CriticalTemperature1 t1_j1nf8g1 wrote on December 25, 2022 at 9:11 PM

Reply to [D] The case for deep learning for tabular data by dhruvnigam93

Take a look at TabPFN, which uses meta-learned networks for tabular data prediction: https://www.automl.org/tabpfn-a-transformer-that-solves-small-tabular-classification-problems-in-a-second/

CriticalTemperature1 t1_j1g44a8 wrote on December 24, 2022 at 1:57 AM

Reply to [D] Has anyone integrated ChatGPT with scientific papers? by justrandomtourist

You can fine-tune GPT-3, but it will cost you a few dollars. I've had good success just copying the text of a paper into ChatGPT and asking for a summary that a fifth grader can understand.

Another way is to just input the titles of relevant papers and ask it for more suggestions, or ask it for the most influential papers on topic X.
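
If you want to do this repeatedly, the two prompts can be templated. A minimal sketch; the function names are made up, and `max_chars` is an arbitrary placeholder rather than a documented ChatGPT limit:

```python
def summary_prompt(paper_text, audience="a fifth grader", max_chars=12000):
    """Build a summarization request, truncating the paper to a length budget."""
    # max_chars is an assumed budget for illustration, not a real model limit.
    excerpt = paper_text[:max_chars]
    return f"Summarize the following paper so that {audience} can understand it:\n\n{excerpt}"


def suggestion_prompt(titles):
    """Ask for related reading given a list of paper titles."""
    listing = "\n".join(f"- {t}" for t in titles)
    return f"Here are some papers I found relevant:\n{listing}\n\nSuggest more papers on the same topic."


if __name__ == "__main__":
    print(suggestion_prompt(["Cramming: Training a Language Model on a Single GPU in One Day"]))
```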

CriticalTemperature1 t1_j0l6du5 wrote on December 17, 2022 at 2:24 PM

Reply to [D] ChatGPT, crowdsourcing and similar examples by mvujas

Most people aren't labelling outputs as good or bad, so how do they get any reward or training signal from these beta users?

CriticalTemperature1 t1_izcwu1m wrote on December 8, 2022 at 4:22 AM

Reply to [D] If you had to pick 10-20 significant papers that summarize the research trajectory of AI from the past 100 years what would they be by versaceblues

Is it even worth reading old papers? They were meant to communicate with researchers at the time they were published, so it's probably more efficient to learn these concepts and techniques from AI textbooks and blogs.