saintshing t1_je9okpn wrote
Reply to comment by saintshing in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Apparently some people managed to reconstruct images from brain activity using a Stable Diffusion-based technique. I wonder how it would apply to animals.
saintshing t1_je9iw85 wrote
Reply to comment by EquipmentStandard892 in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Jeremy Howard tweeted about this new model that is an RNN but can be trained in parallel. I haven't read the details, but people seem hyped that it can bypass the context length limit.
>RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.
>So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).
https://github.com/BlinkDL/RWKV-LM#the-rwkv-language-model-and-my-tricks-for-lms
https://twitter.com/BlinkDL_AI/status/1638555109373378560
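A toy sketch of the state-passing idea from the quote (this is not RWKV's actual WKV computation, just an illustration of why RNN-mode inference only needs the previous hidden state; all the weights and names here are made up):

```python
import numpy as np

# Toy recurrent update: the state at t+1 depends only on the state at t
# and the current token embedding. RWKV's real update uses time-decayed
# key/value accumulators, but the interface is the same.
def rnn_step(state, x, W_state, W_in):
    return np.tanh(state @ W_state + x @ W_in)

d = 8
rng = np.random.default_rng(0)
W_state = rng.normal(size=(d, d)) * 0.1
W_in = rng.normal(size=(d, d)) * 0.1

tokens = rng.normal(size=(5, d))  # a 5-token sequence of embeddings
state = np.zeros(d)
for x in tokens:                  # "RNN mode": O(1) memory per step
    state = rnn_step(state, x, W_state, W_in)

# `state` now summarizes the whole prefix -- the "free sentence embedding"
# the README mentions.
```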
saintshing t1_je9fciu wrote
Reply to comment by EquipmentStandard892 in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
> I was curious if, as our brains do, we could use another instance of the same LLM to generate little hypotheses about the ongoing conversation, store those in a vector database, then use those generated hypotheses during reasoning.
I just learned about LangChain recently. If I understand correctly, they have agents that integrate LLMs with external tools like internet search, SQL queries, and vector store queries. It also has a memory module to store the ongoing dialog and intermediate results.
They use the ReAct or MRKL framework to break tasks into subproblems, decide what tools to use, and react to the results returned by those tools.
example: https://tsmatz.files.wordpress.com/2023/03/20230307_paper_example.jpg?w=446&zoom=2
https://python.langchain.com/en/latest/getting_started/getting_started.html
https://tsmatz.wordpress.com/2023/03/07/react-with-openai-gpt-and-langchain/
https://twitter.com/yoheinakajima/status/1640934493489070080
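A rough sketch of what a ReAct-style LangChain agent looks like (the LangChain API changes fast, so treat this as illustrative of the interface in the getting-started guide linked above; the tool names and question are just examples, and it assumes OPENAI_API_KEY and SERPAPI_API_KEY are set):

```python
from langchain.agents import initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)

# "serpapi" does web search, "llm-math" handles calculator-style subproblems.
tools = load_tools(["serpapi", "llm-math"], llm=llm)

# The ReAct loop: the LLM alternates Thought / Action / Observation steps,
# calling tools until it can produce a final answer.
agent = initialize_agent(
    tools, llm, agent="zero-shot-react-description", verbose=True
)
agent.run("Who is the current CEO of OpenAI, and what is 2 raised to the 0.5 power?")
```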
saintshing t1_je9e5q1 wrote
Reply to comment by silva_p in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Natural language processing for cats and dogs
saintshing t1_jdgwgt7 wrote
Reply to comment by Icko_ in [P] Open-source GPT4 & LangChain Chatbot for large PDF docs by radi-cho
I've heard people talking about using Annoy for approximate nearest neighbor search. How does Annoy compare to Pinecone and Faiss? Are Pinecone and Faiss self-hostable?
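For reference, the basic Annoy workflow looks like this (a minimal sketch; the dimension, tree count, and random vectors are arbitrary examples):

```python
from annoy import AnnoyIndex
import random

dim = 64
index = AnnoyIndex(dim, "angular")  # angular = cosine-like distance
for i in range(1000):
    index.add_item(i, [random.gauss(0, 1) for _ in range(dim)])
index.build(10)          # number of trees; more trees = better recall, slower build
index.save("vectors.ann")  # memory-mapped file, can be shared across processes

query = [random.gauss(0, 1) for _ in range(dim)]
print(index.get_nns_by_vector(query, 5))  # 5 approximate nearest neighbors
```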
saintshing t1_jcjc3zs wrote
Reply to comment by currentscurrents in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
stolen from vitalik
>70 years is the time between the first computer and modern smart watches.
>70 years is more than the time between the first heavier-than-air flight and landing on the moon.
>70 years is 1.5x the time between the invention of public key cryptography and modern general-purpose ZK-SNARKs.
saintshing t1_jc26rk0 wrote
Reply to Which topic in deep learning do you think will become relevant or popular in the future? by gokulPRO
I feel like it should be possible to extend diffusion transformer techniques to code generation for web development.
You could input a screenshot of a static webpage and use a text prompt like 'Change the style to fit a futuristic theme', or just input a low-fidelity UI wireframe and have it generate a detailed webpage with the HTML and CSS. We could get training data from the internet for self-supervised learning.
Also retrieval transformers, or models that know how to query APIs and databases and prompt other models.
saintshing t1_jbdbgwy wrote
Reply to comment by hcarlens in [R] Analysis of 200+ ML competitions in 2022 by hcarlens
Is PyTorch also better than TF for use cases where I have to do training/inference on mobile?
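For context, the PyTorch-side mobile export path looks roughly like this (a sketch of the TorchScript + lite-interpreter workflow; `MyModel` is a placeholder for your own module):

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return self.linear(x)

model = MyModel().eval()
scripted = torch.jit.script(model)            # or torch.jit.trace for simple models
mobile_model = optimize_for_mobile(scripted)  # fuses ops, drops training-only parts
mobile_model._save_for_lite_interpreter("model.ptl")  # load this on Android/iOS
```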
saintshing t1_j2coo4k wrote
Reply to TIL Among the 50 million employed college graduates ages 25 to 64 in 2019, 37% reported a bachelor’s degree in science or engineering but only 14% worked in a STEM occupation by Fit_Pangolin_8271
Is this talking about their CURRENT jobs? Are finance and accounting considered STEM? They still use math.
I can't access the census website but according to this article which quotes the same stat, science teachers, health care workers, and management are not STEM occupations.
saintshing t1_iyuxcdi wrote
Reply to comment by PotentiallyAPickle in SteamDB: JSON file of all games available on Steam with prices and additional data from Steam Spy, GameFAQs, Metacritic, IGDB and HLTB. by Leinstay
Paradox games are probably up there with The Sims.
saintshing t1_iyuankf wrote
Reply to comment by ogm4reborn in SteamDB: JSON file of all games available on Steam with prices and additional data from Steam Spy, GameFAQs, Metacritic, IGDB and HLTB. by Leinstay
Do we count cosmetic microtransactions?
saintshing t1_ixtqpai wrote
How does someone become a photographer known for large-scale nude shoots?
saintshing t1_jeaowjz wrote
Reply to comment by A_Light_Spark in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
I almost missed it too. There are too many new results.
The craziest thing is that it was all done by one person, while the big tech companies are all working on transformer models.