Sulstice2 OP t1_iy2e0kf wrote on November 28, 2022 at 5:32 AM

Reply to [OC] The Most Common And Rare Atoms and Bonds in the Chemical Universe - Snapshot 1,000,000 Molecules by Sulstice2

Mobile Friendly Demo:

https://sulstice.github.io/Faith/enamine_database/index.html

Motivation:

I am preparing for PyData Global 2022 and want to present the utility of a piece of software we have in house called the CHARMM General Force Field (CGenFF), which can explain a molecule's features by assigning each atom an atom type. To demonstrate the power of this new software with Python, we can process the Enamine database of 22 billion compounds through our pipeline to generate a massive set of chemical space.

My pipelines started running and have processed the first 1,000,000 molecules. I made a map key from the atom type language to layman's terms for chemists to help describe the different atoms. I am still mapping out that dictionary to make it more robust.

In the GIF above, you can see which atom types show up more often from the thickness of the lines and their connections to others. Some atoms are more diverse, and some only bond to one type. Alkynes are rare compared to the others, but bridged systems look very common to me, about as common as aliphatics.

Software:

I had to use C++ for the force field to process the Enamine DB, Python for data processing and transformation, and D3 for the visualization. I tried something different in setting the amount of curvature for the arcs between connections, and I could start to create this ball in the middle, like a flower.

Here is the Data:

https://github.com/Sulstice/Faith/blob/main/enamine_database/atom_type_group_new.json

I wonder what will change as I sample more data and what becomes common.

Sulstice2 OP t1_ixau5zz wrote on November 22, 2022 at 1:35 AM

Reply to comment by Saint_Oliver in [OC] The Most Common Atoms and Bonds in the Virtual Chemical Universe by Sulstice2

It's actually for something called the van der Waals force, if you have ever heard of it. It links to my thesis project, or will eventually.

Sulstice2 OP t1_ix9psjx wrote on November 21, 2022 at 8:39 PM

Reply to comment by dxhunter3 in [OC] The Most Common Atoms and Bonds in the Virtual Chemical Universe by Sulstice2

Yeah sure. I recommend doing the main repo, global-chem, first. There are a lot of moving parts to get to this plot, plus some secret software needed for the results that lives with me until I am ready to release it.

Sulstice2 OP t1_ix8enbd wrote on November 21, 2022 at 3:24 PM

Reply to comment by Saint_Oliver in [OC] The Most Common Atoms and Bonds in the Virtual Chemical Universe by Sulstice2

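CHARMM is tied to a Nobel Prize: the 2013 Nobel Prize in Chemistry went to Martin Karplus, whose group developed CHARMM, together with Michael Levitt and Arieh Warshel, for the development of multiscale models for complex chemical systems. It is the software for molecular mechanics.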

It isn't, however, a quantum chemistry package. I use Psi4 for that.

Sulstice2 OP t1_ix8eivb wrote on November 21, 2022 at 3:23 PM

Reply to comment by No-Farm6409 in [OC] The Most Common Atoms and Bonds in the Virtual Chemical Universe by Sulstice2

Molecules interact with each other and they prefer to be in a specific orientation or geometry that is the most energetically favorable. Whatever takes the least amount of work. The equation helps us determine that by separating the energy into different components of physical and electronic characteristics.

In a force field we start off small with simple molecular systems and then apply what we learn to larger systems to predict how atoms will move based on their energy.

So for example, the energy interactions and orientations we use for simple alcohols or carboxylic acids can be applied to lipid membranes and simulating them.

Does that make sense?

Sulstice2 OP t1_ix8dgim wrote on November 21, 2022 at 3:16 PM

Reply to comment by AdImmediate7659 in [OC] The Most Common Atoms and Bonds in the Virtual Chemical Universe by Sulstice2

I'm getting ready to move into larger datasets (millions to billions), and I think it will truly look like a wormhole very soon. I'm scared of what it's going to look like.

Sulstice2 OP t1_ix8d1en wrote on November 21, 2022 at 3:13 PM

Reply to comment by logistic_spock in [OC] The Most Common Atoms and Bonds in the Virtual Chemical Universe by Sulstice2

I'm glad someone else thinks the same way! When I started this project I was shooting in the dark. Didn't know what would happen.

Sulstice2 OP t1_ix8cymc wrote on November 21, 2022 at 3:12 PM

Reply to comment by Apprehensive-Ad-5009 in [OC] The Most Common Atoms and Bonds in the Virtual Chemical Universe by Sulstice2

Haha yeah, there are no connections to the virtual systems. I actually think I am going to change that soon by defining aromatic dummy electrons.

Sulstice2 OP t1_ix6ei5p wrote on November 21, 2022 at 2:22 AM

Reply to comment by No-Farm6409 in [OC] The Most Common Atoms and Bonds in the Virtual Chemical Universe by Sulstice2

https://en.wikipedia.org/wiki/CHARMM

Look under the section Force Fields for the Potential Energy Function. This predicts the potential energy by iterating through all the atoms and other features of molecules to predict how much energy is available to the system.
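
For anyone who does not want to click through, the general shape of the CHARMM potential energy function, as it is commonly written, is roughly the following. This is a sketch for orientation only: newer parameter sets add further terms (for example, CMAP backbone corrections), and nothing here is specific to the pipeline used for the plot.

```latex
\begin{aligned}
V ={} & \sum_{\text{bonds}} k_b (b - b_0)^2
      + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2
      + \sum_{\text{Urey-Bradley}} k_u (u - u_0)^2 \\
    & + \sum_{\text{dihedrals}} k_\phi \left[ 1 + \cos(n\phi - \delta) \right]
      + \sum_{\text{impropers}} k_\omega (\omega - \omega_0)^2 \\
    & + \sum_{\text{nonbonded}} \left\{ \varepsilon_{ij} \left[ \left( \frac{R^{\min}_{ij}}{r_{ij}} \right)^{12}
      - 2 \left( \frac{R^{\min}_{ij}}{r_{ij}} \right)^{6} \right]
      + \frac{q_i q_j}{4 \pi \varepsilon_0 r_{ij}} \right\}
\end{aligned}
```

The first five sums are the bonded terms (bond stretching, angle bending, Urey-Bradley, dihedral and improper torsions). The last sum covers the nonbonded pairs: a Lennard-Jones term for the van der Waals interaction and a Coulomb term for electrostatics.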

Sulstice2 OP t1_ix664u0 wrote on November 21, 2022 at 1:15 AM

Reply to comment by AlchemistJosh in [OC] The Most Common Atoms and Bonds in the Virtual Chemical Universe by Sulstice2

Hi Josh,

That's actually a really good idea and I think that would help a lot. I actually mapped out the atom names to something like that already so this would be something I can prepare in my next round before the bigger talk.

Any more suggestions I will gladly accept. With data visualization, I really want to get this information out to the public in the most efficient manner, and it's been a bit of a struggle.

Yeah, it took me a while to record all the chemicals. About 2-3 years.

Sulstice2 OP t1_ix5t9v9 wrote on November 20, 2022 at 11:36 PM

Reply to comment by Ciarrai_IRL in [OC] The Most Common Atoms and Bonds in the Virtual Chemical Universe by Sulstice2

No, but we have a concept called "DUM", which means a virtual particle that has no definition. We use it for lots of different stuff.

I do wonder how to catalog quantum particles and link to this though. Could be cool research in 5 years.

Sulstice2 OP t1_ix5t0l5 wrote on November 20, 2022 at 11:35 PM

Reply to comment by fruitydude in [OC] The Most Common Atoms and Bonds in the Virtual Chemical Universe by Sulstice2

There's a belief that the CHARMM equation, on which this language (the nodes) is built, is the equation for simulating life.

I also believe that in the sea of chemical data we can filter data based on how common or useful it is to a particular community.

By connecting the two we can start to map out atom types of relevance to people. We can predict new chemical space based on their atom types.

So like, let's say we want to predict a new sunscreen that doesn't harm the environment. We can use these relations to predict something better by knowing the features of a molecule.

The pseudo part is the belief that it will work.

Sulstice2 OP t1_ix5hq64 wrote on November 20, 2022 at 10:13 PM

Reply to [OC] The Most Common Atoms and Bonds in the Virtual Chemical Universe by Sulstice2

Hello,

Website & Mobile Friendly: https://sulstice.github.io/Faith/global_chem/index.html

I sampled the most commonly recorded chemicals across different sub-communities to understand which atoms are the most common and which pairs of atoms most commonly occur together. Different communities means different classes of chemicals (cannabis, things used in sex products, toxic agents used in war, food colour additives, materials, cosmetics, birth control, etc.).

https://github.com/Sulstice/global-chem/blob/development/global_chem/GlobalChem_Dictionary%20(1).pdf

In the chord diagram above, each node is an atom type that exists within the dataset and each link is a bond between two atom types. The thickness of the line correlates to how many times those particular atom types occur together. The pink shows how often two different hydrogen types occur together, and the blue represents a hydrogen and a carbon. The rest of the plot is colored light grey.
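
The link thickness described above comes down to counting how often each pair of atom types is bonded. Here is a minimal Python sketch of that counting step. It is not the actual pipeline code: the function name `count_bonded_pairs`, the input format (a list of atom type pairs, one per bond) and the `CARBON_X` / `CARBON_Y` labels are made up for illustration. Only `HGA1` and `HGR62` are atom type names taken from the post.

```python
from collections import Counter


def count_bonded_pairs(bonds):
    """Count how often each unordered pair of atom types is bonded."""
    counts = Counter()
    for type_a, type_b in bonds:
        # Sort the pair so (A, B) and (B, A) land in the same bucket.
        counts[tuple(sorted((type_a, type_b)))] += 1
    return counts


# Hypothetical bonds for illustration; CARBON_X and CARBON_Y are placeholders,
# not real CGenFF atom type names.
example_bonds = [
    ("CARBON_X", "HGA1"),
    ("HGA1", "CARBON_X"),
    ("CARBON_Y", "HGR62"),
    ("CARBON_Y", "CARBON_Y"),
]

pair_counts = count_bonded_pairs(example_bonds)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Each count would then drive the width of the matching chord, so the most frequent pair gets the thickest line.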

Next, I passed them through something called the CHARMM force field, which has a language where you can declare different types of atoms, like an alkane vs. an aromatic. In the plot I am highlighting HGA1 and HGR62; these are methyl hydrogens and benzene hydrogens in our language.

That data is available here, feel free to play around with it:

https://raw.githubusercontent.com/Sulstice/Faith/main/global_chem/atom_type_group_new.json
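
If you want to poke at the file, one way to start is to print its shape before assuming anything about the layout. The sketch below is schema-agnostic on purpose: the helper name `describe` and the tiny `sample` document are made up for illustration and are not the real structure of the JSON file. To use it on the real data, download the file and pass the result of `json.load` instead.

```python
import json


def describe(node, level=0, max_level=2):
    """Return lines describing the shape of parsed JSON, without assuming a schema."""
    pad = "    " * level
    if isinstance(node, dict):
        lines = [f"{pad}object with {len(node)} keys"]
        children = list(node.items())[:3]      # peek at the first few keys only
    elif isinstance(node, list):
        lines = [f"{pad}array with {len(node)} items"]
        children = list(enumerate(node))[:1]   # peek at the first item only
    else:
        return [f"{pad}{type(node).__name__}: {node!r}"]
    if level < max_level:
        for key, value in children:
            lines.append(f"{pad}  [{key!r}]")
            lines.extend(describe(value, level + 1, max_level))
    return lines


# Made-up sample, NOT the real layout of atom_type_group_new.json.
sample = json.loads('{"nodes": [{"id": "HGA1"}], "links": []}')
summary = describe(sample)
print("\n".join(summary))
```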

Still a work in progress as I get it ready for PyData Global. I think there are some bugs. The code is here:

https://github.com/Sulstice/Faith/blob/main/global_chem/index.html