saintshing t1_je9iw85 wrote

Jeremy Howard tweeted about this new model, which is an RNN but can be trained in parallel. I haven't read the details, but it seems people are hyped that it can bypass the context length limit.

>RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.

>So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).

https://github.com/BlinkDL/RWKV-LM#the-rwkv-language-model-and-my-tricks-for-lms
https://twitter.com/BlinkDL_AI/status/1638555109373378560
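
Here's a toy sketch of the general idea behind the two modes (not RWKV's actual time-mixing kernel, and the decay value and shapes are made up): a linear recurrence with a fixed decay can be evaluated step by step, carrying only the previous hidden state ("RNN mode"), or for the whole sequence at once with a lower-triangular weight matrix ("GPT mode").

```python
import numpy as np

def rnn_mode(x, decay=0.9):
    # Sequential: h[t] = decay * h[t-1] + x[t]; only the previous state is kept.
    h = np.zeros(x.shape[1])
    states = []
    for x_t in x:
        h = decay * h + x_t
        states.append(h)
    return np.stack(states)

def gpt_mode(x, decay=0.9):
    # Parallel: h[t] = sum_{i<=t} decay**(t-i) * x[i], expressed as one
    # lower-triangular matrix applied to the whole sequence at once.
    idx = np.arange(x.shape[0])
    weights = np.tril(decay ** (idx[:, None] - idx[None, :]))
    return weights @ x

x = np.random.randn(5, 3)  # (seq_len, hidden_dim)
assert np.allclose(rnn_mode(x), gpt_mode(x))  # same states, two evaluation orders
```

Training can use the parallel form over full sequences, while inference only needs the state at position t to get the state at t+1.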

4

saintshing t1_je9fciu wrote

> I was curious if as our brains do, use another instance of the same LLM to generate little hypothesis about the ongoing conversation, and store those on a vector space database, then use those generated thesis during reasoning.

I just learned about LangChain recently. If I understand correctly, it has agents that integrate LLMs with external tools like internet search, SQL queries, and vector store queries. It also has a memory module to store the ongoing dialog and intermediate results.

They use the ReAct or MRKL framework to break a task into subproblems, decide which tools to use, and react to the results returned by those tools (a toy sketch of the loop is below the links).

example: https://tsmatz.files.wordpress.com/2023/03/20230307_paper_example.jpg?w=446&zoom=2

https://python.langchain.com/en/latest/getting_started/getting_started.html

https://tsmatz.wordpress.com/2023/03/07/react-with-openai-gpt-and-langchain/

https://twitter.com/yoheinakajima/status/1640934493489070080
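
A minimal ReAct-style loop in plain Python, just to show the pattern (think, pick a tool, observe, repeat). This is a toy sketch, not LangChain's actual implementation; `llm` is any callable returning text in the "Thought/Action" or "Final Answer" format, and the scripted replies below only stand in for a real model.

```python
import re

def calculator(expression: str) -> str:
    return str(eval(expression))  # toy tool; never eval untrusted input in real code

TOOLS = {"calculator": calculator}

def react_agent(question: str, llm, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)            # the model sees the whole transcript so far
        transcript += reply + "\n"
        if "Final Answer:" in reply:
            return reply.split("Final Answer:")[-1].strip()
        action = re.search(r"Action: (\w+)\[(.*)\]", reply)
        if action:
            tool, arg = action.group(1), action.group(2)
            observation = TOOLS[tool](arg)
            transcript += f"Observation: {observation}\n"  # result fed back next step
    return "(no answer within max_steps)"

# Scripted stand-in for a real model, just to show the message format.
scripted = iter([
    "Thought: I should compute this.\nAction: calculator[17 * 23]",
    "Thought: I have the result.\nFinal Answer: 391",
])
print(react_agent("What is 17 * 23?", lambda prompt: next(scripted)))  # -> 391
```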

7

saintshing t1_jcjc3zs wrote

stolen from Vitalik

>70 years is the time between the first computer and modern smart watches.

>70 years is more than the time between the first heavier-than-air flight and landing on the moon.

>70 years is 1.5x the time between the invention of public key cryptography and modern general-purpose ZK-SNARKs.

2

saintshing t1_jc26rk0 wrote

I feel like it should be possible to extend diffusion transformer techniques to code generation for web development.

You could input a screenshot of a static webpage together with a text prompt like 'Change the style to fit a futuristic theme', or just input a low-fidelity UI wireframe, and it would generate a detailed webpage with the HTML and CSS. Training data could be scraped from the internet for self-supervised learning.

Also retrieval transformers, or models that know how to query APIs and databases and prompt other models.

2

saintshing t1_j2coo4k wrote

Is this talking about their CURRENT jobs? Are finance and accounting considered STEM? They still use math.

I can't access the census website, but according to this article, which quotes the same stat, science teachers, health care workers, and management are not counted as STEM occupations.

22