Submitted by MysteryInc152 t3_11ckc8a in singularity
Facts_About_Cats t1_ja5q7ce wrote
What does the structure of language have to do with the folding shapes of proteins?
MysteryInc152 OP t1_ja5rsxd wrote
It shouldn't, as you understand it, and that's why this is pretty huge. Whatever LLMs are learning during training is proving more and more to be the real deal.
throwaway_890i t1_jab66dd wrote
Isn't this just the same kind of neural network that has been solving this kind of problem long before LLMs?
MysteryInc152 OP t1_jab6uae wrote
Definitely not, no. This is the first time a language model has been used to tackle this.
dwarfarchist9001 t1_ja6cfn4 wrote
This paper actually skips the folding step entirely. The AI was trained on a list of protein amino acid sequences that were labeled with their purpose. Then they had it predict new amino acid sequences to fulfill the same purposes. Finally, they actually synthesized the proteins the model suggested, and the proteins worked with quite high levels of efficiency.
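In rough pseudocode, the generation step works something like this (a minimal sketch, not the paper's actual code; the `next_token_probs` interface and the `<function_tag>` conditioning format are made up for illustration):

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def generate_sequence(model, function_tag, max_len=300, temperature=1.0):
    """Sample a new amino-acid sequence conditioned on a function label."""
    context = list(function_tag)  # conditioning prefix, e.g. "<lysozyme>"
    sequence = []
    vocab = AMINO_ACIDS + "$"     # "$" = end-of-sequence marker
    for _ in range(max_len):
        # next_token_probs is a hypothetical interface: given the context,
        # it returns a dict {token: probability} over the vocabulary
        probs = model.next_token_probs(context, temperature)
        token = random.choices(list(vocab), weights=[probs[t] for t in vocab])[0]
        if token == "$":
            break
        sequence.append(token)
        context.append(token)
    return "".join(sequence)
```

The point is that nothing in the loop knows anything about chemistry; it's just autoregressive sampling, the same as for text.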
The most interesting part to me is that some of the proteins suggested by the model worked despite having little similarity to the proteins in the training data, as low as 31.4% in one case. This suggests to me that the model has caught on to some thus-far-unknown rules underlying the relationship between the sequences and functions of proteins.
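For context, "similarity" here means percent sequence identity. A crude back-of-the-envelope version of that number (a sketch using a simple alignment count, not whatever tool the paper actually used):

```python
# Rough percent identity between two protein sequences: count the maximum
# number of residues that can be matched in order (an LCS-style alignment
# with no mismatch/gap penalties; real work would use BLAST or similar).
def percent_identity(a: str, b: str) -> float:
    n, m = len(a), len(b)
    # dp[i][j] = max matched residues aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + (a[i - 1] == b[j - 1])
            dp[i][j] = max(match, dp[i - 1][j], dp[i][j - 1])
    return 100.0 * dp[n][m] / max(n, m)

print(percent_identity("MKTAYIAKQR", "MKTGYIDKQR"))  # -> 80.0
```

So 31.4% means roughly two out of every three residues differ from anything the model saw, and the protein still did its job.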
blueSGL t1_ja6pgm2 wrote
Listening to Neel Nanda talk about how models form structures to solve common problems presented in training, it's no wonder they are able to pick up on patterns better than humans; that's what they are designed for.
And I believe that training models with no intention of running them, purely to see what (if any) hidden underlying structures humanity has collectively missed, is called something like 'microscope AI'.
RabidHexley t1_jaa3go2 wrote
> purely to see what if any hidden underlying structures humanity has collectively missed
This is one of the things I feel has real potential even for "narrow" AI as far as expanding human knowledge. Something may very well be within the scope of known human science without humans ever realizing it. If you represented all human knowledge as a sphere it'd probably have a composition as porous as a sponge.
AI doesn't necessarily need to be able to reason "beyond" current human understanding to expand upon known science, but simply make connections we're unable to see.
Facts_About_Cats t1_ja8q9at wrote
There is no reason why the physical structure of proteins should in any way resemble or be related to the structure and grammar of the associations and relationships between words.
Jcat49er t1_ja96hy5 wrote
That’s the point though. According to the results of this and other papers, there is a still-unknown relationship between proteins that AIs are able to recognize and manipulate. It just happens that the way AIs find patterns in human language can also be used to find the structure of proteins.
diabeetis t1_jac6a4k wrote
I don't see why it shouldn't. It abstracts meaning from the relationships in the data, whether it's language or sequences.
turnip_burrito t1_ja6mrbg wrote
Spooky model magic.
hackinthebochs t1_ja6uapk wrote
Any structured data is a language in a broad sense. Tokens identify structural units and the grammar determine how these structural units interrelate. But the grammar can be arbitrarily complex and so can encode deep relationships among data in any domain. This is why "language models" are so powerful in a vast array of contexts.
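Concretely, the only domain-specific part is the tokenization; everything downstream sees the same stream of token ids (a toy sketch with made-up vocabularies, not any specific model's):

```python
# The same "sequence of token ids" interface works for natural language
# and for proteins; only the vocabulary changes.
def make_tokenizer(vocab):
    stoi = {tok: i for i, tok in enumerate(vocab)}
    return lambda units: [stoi[u] for u in units]

english = make_tokenizer(["the", "protein", "folds", "quickly"])
protein = make_tokenizer(list("ACDEFGHIKLMNPQRSTVWY"))  # 20 amino acids

print(english("the protein folds".split()))  # [0, 1, 2]
print(protein(list("MKTAYI")))               # one id per residue
```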