Submitted by MysteryInc152 t3_11ckc8a in singularity
dwarfarchist9001 t1_ja6cfn4 wrote
Reply to comment by Facts_About_Cats in Large language models generate functional protein sequences across diverse families by MysteryInc152
This paper actually skips the folding step entirely. The AI was trained on a list of protein amino acid sequences that were labeled with their function. Then they had it predict new amino acid sequences to fulfill the same functions. Finally, they actually synthesized the proteins the model suggested, and the proteins worked with quite high levels of efficiency.
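To make the idea concrete, here is a toy stand-in for that pipeline (the actual model in the paper is a large transformer, not this): learn per-function amino-acid transition statistics from labeled sequences, then sample a new sequence conditioned on the same function tag. The tag names and sequences are made-up illustrations, not real proteins.

```python
import random
from collections import defaultdict

def train(labeled_seqs):
    """Count amino-acid bigram transitions separately for each function tag."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for tag, seq in labeled_seqs:
        for a, b in zip(seq, seq[1:]):
            counts[tag][a][b] += 1
    return counts

def sample(counts, tag, start, length, seed=0):
    """Generate a new sequence for `tag` by following the learned transitions."""
    rng = random.Random(seed)
    seq = [start]
    for _ in range(length - 1):
        nxt = counts[tag].get(seq[-1])
        if not nxt:
            break  # no observed continuation for this residue
        aas, weights = zip(*nxt.items())
        seq.append(rng.choices(aas, weights=weights)[0])
    return "".join(seq)

# Hypothetical labeled training data: (function tag, amino acid sequence)
data = [("lysozyme", "MKALIVLGLV"), ("lysozyme", "MKAIVLGLLV")]
model = train(data)
print(sample(model, "lysozyme", "M", 8))
```

The real model replaces the bigram counts with a conditional language model, but the train-on-labeled-sequences / sample-new-sequences loop is the same shape.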
The most interesting part to me is that some of the proteins suggested by the model worked despite having little similarity to the proteins in the training data, as low as 31.4% in one case. This suggests to me that the model has caught on to some thus-far-unknown rules underlying the relationship between the sequences and functions of proteins.
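For context, that 31.4% figure is a sequence identity: the fraction of matching residues between a generated protein and its closest match in the training set. Real comparisons use alignment tools (e.g. BLAST); this sketch assumes the two sequences are already aligned and equal-length, and the sequences themselves are made up.

```python
def sequence_identity(a: str, b: str) -> float:
    """Fraction of positions where two pre-aligned sequences agree."""
    if len(a) != len(b):
        raise ValueError("sketch assumes pre-aligned, equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)

print(f"{sequence_identity('MKALIVLG', 'MKTIIALG'):.1%}")  # → 62.5%
```

Two random proteins of the same length typically still share a few percent identity by chance, which is why ~31% to the *nearest* training example counts as strikingly low.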
blueSGL t1_ja6pgm2 wrote
Listening to Neel Nanda talk about how models form internal structures to solve problems that come up repeatedly in training, it's no wonder they are able to pick up on patterns better than humans; that's what they are designed for.
And I believe that training models with no intention of running them, purely to see what hidden underlying structures (if any) humanity has collectively missed, is called something like 'microscope AI'.
RabidHexley t1_jaa3go2 wrote
> purely to see what if any hidden underlying structures humanity has collectively missed
This is one of the things I feel has real potential even for "narrow" AI as far as expanding human knowledge. Something may very well be within the scope of known human science without humans ever realizing it. If you represented all human knowledge as a sphere it'd probably have a composition as porous as a sponge.
AI doesn't necessarily need to be able to reason "beyond" current human understanding to expand upon known science, but simply make connections we're unable to see.
Facts_About_Cats t1_ja8q9at wrote
There is no reason why the physical structure of proteins should in any way resemble or be related to the structure and grammar of the associations and relationships between words.
Jcat49er t1_ja96hy5 wrote
That’s the point though. According to the results of this and other papers, there is a still-unknown relationship between proteins that AIs are able to recognize and manipulate. It just happens that the way AIs find patterns in human language can also be used to find the structure of proteins.
diabeetis t1_jac6a4k wrote
I don't see why it shouldn't. It abstracts meaning from the relationships in the data, whether it's language or sequences
turnip_burrito t1_ja6mrbg wrote
Spooky model magic.