saintshing t1_je9iw85 wrote

Jeremy Howard tweeted about this new model, which is an RNN but can be trained in parallel. I haven't read the details, but it seems people are hyped that it can bypass the context length limit.

>RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.

>So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).

https://github.com/BlinkDL/RWKV-LM#the-rwkv-language-model-and-my-tricks-for-lms
https://twitter.com/BlinkDL_AI/status/1638555109373378560
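
Here's a toy sketch of the general idea behind the two modes (not RWKV's actual time-mixing kernel, and the decay value and shapes are made up): a linear recurrence with a fixed decay can be evaluated step by step, carrying only the previous hidden state ("RNN mode"), or for the whole sequence at once with a lower-triangular weight matrix ("GPT mode").

```python
import numpy as np

def rnn_mode(x, decay=0.9):
    # Sequential: h[t] = decay * h[t-1] + x[t]; only the previous state is kept.
    h = np.zeros(x.shape[1])
    states = []
    for x_t in x:
        h = decay * h + x_t
        states.append(h)
    return np.stack(states)

def gpt_mode(x, decay=0.9):
    # Parallel: h[t] = sum_{i<=t} decay**(t-i) * x[i], expressed as one
    # lower-triangular matrix applied to the whole sequence at once.
    idx = np.arange(x.shape[0])
    weights = np.tril(decay ** (idx[:, None] - idx[None, :]))
    return weights @ x

x = np.random.randn(5, 3)  # (seq_len, hidden_dim)
assert np.allclose(rnn_mode(x), gpt_mode(x))  # same states, two evaluation orders
```

Training can use the parallel form over full sequences, while inference only needs the state at position t to get the state at t+1.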

4

saintshing t1_je9fciu wrote

> I was curious if as our brains do, use another instance of the same LLM to generate little hypothesis about the ongoing conversation, and store those on a vector space database, then use those generated thesis during reasoning.

I just learned about LangChain recently. If I understand correctly, it has agents that integrate LLMs with external tools like internet search, SQL queries, and vector store queries. It also has a memory module to store the ongoing dialog and intermediate results.

They use the ReAct or MRKL framework to break a task into subproblems, decide which tools to use, and react to the results returned by those tools (a toy sketch of the loop is below the links).

example: https://tsmatz.files.wordpress.com/2023/03/20230307_paper_example.jpg?w=446&zoom=2

https://python.langchain.com/en/latest/getting_started/getting_started.html

https://tsmatz.wordpress.com/2023/03/07/react-with-openai-gpt-and-langchain/

https://twitter.com/yoheinakajima/status/1640934493489070080
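
A minimal ReAct-style loop in plain Python, just to show the pattern (think, pick a tool, observe, repeat). This is a toy sketch, not LangChain's actual implementation; `llm` is any callable returning text in the "Thought/Action" or "Final Answer" format, and the scripted replies below only stand in for a real model.

```python
import re

def calculator(expression: str) -> str:
    return str(eval(expression))  # toy tool; never eval untrusted input in real code

TOOLS = {"calculator": calculator}

def react_agent(question: str, llm, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)            # the model sees the whole transcript so far
        transcript += reply + "\n"
        if "Final Answer:" in reply:
            return reply.split("Final Answer:")[-1].strip()
        action = re.search(r"Action: (\w+)\[(.*)\]", reply)
        if action:
            tool, arg = action.group(1), action.group(2)
            observation = TOOLS[tool](arg)
            transcript += f"Observation: {observation}\n"  # result fed back next step
    return "(no answer within max_steps)"

# Scripted stand-in for a real model, just to show the message format.
scripted = iter([
    "Thought: I should compute this.\nAction: calculator[17 * 23]",
    "Thought: I have the result.\nFinal Answer: 391",
])
print(react_agent("What is 17 * 23?", lambda prompt: next(scripted)))  # -> 391
```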

7

saintshing t1_jcjc3zs wrote

stolen from Vitalik

>70 years is the time between the first computer and modern smart watches.

>70 years is more than the time between the first heavier-than-air flight and landing on the moon.

>70 years is 1.5x the time between the invention of public key cryptography and modern general-purpose ZK-SNARKs.

2

saintshing t1_jc26rk0 wrote

I feel like it should be possible to extend diffusion transformer techniques to code generation for web development.

You could input a screenshot of a static webpage together with a text prompt like 'Change the style to fit a futuristic theme', or just input a low-fidelity UI wireframe, and it would generate a detailed webpage with the HTML and CSS. Training data could be scraped from the internet for self-supervised learning.

Also retrieval transformers, or models that know how to query APIs and databases and prompt other models.

2

saintshing t1_j2coo4k wrote

Is this talking about their CURRENT jobs? Are finance and accounting considered STEM? They still use math.

I can't access the census website, but according to this article, which quotes the same stat, science teachers, health care workers, and management are not counted as STEM occupations.

22