Comments

Sulstice2 OP t1_iy2e0kf wrote

Mobile Friendly Demo:

https://sulstice.github.io/Faith/enamine_database/index.html

Motivation:

I am preparing for PyData Global 2022 and want to present the utility of software we have in house called the CHARMM General Force Field (CGenFF), which can explain a molecule's features by assigning atom types to its atoms. To demonstrate the power of new software with Python, we can process the Enamine Database of 22 billion compounds through our pipeline to generate a massive data set covering chemical space.

My pipelines started running and have processed the first 1,000,000 compounds. I made a map key from the atom type language to a chemist's layman's terms to help describe the different atoms. I am still building out that dictionary to make it more robust.
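A minimal sketch of what such a map key could look like in Python. This is not the actual dictionary from the pipeline: the entries, the descriptions, and the `describe` helper are illustrative assumptions, and unmapped types fall back to a placeholder so gaps in the dictionary stay visible.

```python
# Hypothetical map key: atom-type label -> plain-language description.
# The labels and wording below are illustrative assumptions, not the
# author's real mapping.
ATOM_TYPE_DESCRIPTIONS = {
    "CG2R61": "aromatic carbon (6-membered ring)",
    "CG331": "aliphatic carbon (methyl, CH3)",
    "CG321": "aliphatic carbon (CH2)",
    "CG1T1": "alkyne carbon",
}


def describe(atom_type: str) -> str:
    """Return a readable description, flagging types not yet mapped."""
    return ATOM_TYPE_DESCRIPTIONS.get(atom_type, f"unmapped ({atom_type})")


print(describe("CG331"))
print(describe("XYZ123"))
```

Returning an explicit "unmapped" label instead of raising makes it easy to list which types still need an entry as more compounds are processed.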

In the GIF I show above, the thickness of each line shows how often a type of atom appears and how often it connects to the others. Some atoms are more diverse in what they bond to, and some only bond to one type. Alkynes are rare compared to the others, but bridged systems look very common to me, about as common as aliphatic atoms.
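One way the line thickness could be derived, as a rough sketch: count how often each pair of atom types is bonded, then scale the counts linearly to a stroke width. The function names, the width range, and the toy bond list are assumptions made for illustration and do not come from the actual pipeline.

```python
from collections import Counter


def count_type_pairs(bonds):
    """Count bonded atom-type pairs, treating (a, b) and (b, a) as the same."""
    counts = Counter()
    for a, b in bonds:
        counts[tuple(sorted((a, b)))] += 1
    return counts


def stroke_widths(counts, min_width=1.0, max_width=10.0):
    """Scale pair counts linearly so the most frequent pair gets max_width."""
    if not counts:
        return {}
    top = max(counts.values())
    return {
        pair: min_width + (max_width - min_width) * n / top
        for pair, n in counts.items()
    }


# Toy input (assumed): each tuple is the pair of atom types on one bond.
bonds = [
    ("aromatic", "aliphatic"),
    ("aliphatic", "aromatic"),
    ("aliphatic", "aliphatic"),
    ("alkyne", "aliphatic"),
]
counts = count_type_pairs(bonds)
widths = stroke_widths(counts)
for pair, width in widths.items():
    print(pair, counts[pair], width)
```

The resulting widths could then be written to JSON and read by the d3 layer as the stroke width of each arc.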

Software:

I had to use C++ for the force field to process the Enamine DB, Python for data processing and transformation, and d3 for the visualization. I tried something different when setting the amount of curvature for the arcs between connections, and I could start to create this ball in the middle, like a flower.

Here is the Data:

https://github.com/Sulstice/Faith/blob/main/enamine_database/atom_type_group_new.json

I wonder what will change as I sample more data and what becomes common.

AwakenScience t1_iy5lp9g wrote

This is very interesting! Great work!
