Submitted by ShakeNBakeGibson t3_10wblpv in IAmA
SandwichNo5059 t1_j7mb56j wrote
What steps do you take to control for batch variability?
How far do you think you are from trials of novel chemical matter, rather than drug repurposing?
IHaque_Recursion t1_j7mjumw wrote
Batch effects are probably the most annoying part about doing machine learning in biology – if you’re not careful, ML methods will preferentially learn batch signal rather than the “real” biological signal you want.
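To make that concrete, here's a toy illustration (purely synthetic data, nothing to do with our actual pipeline): when per-batch offsets are large relative to the biological effect, a classifier finds the batch far more easily than the biology.

```python
# Toy demonstration of batch leakage on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_per_batch, n_features, n_batches = 200, 50, 4
bio_dir = rng.normal(0, 0.15, n_features)  # weak but consistent biological effect

X_parts, batch, biology = [], [], []
for b in range(n_batches):
    shift = rng.normal(0, 2.0, n_features)    # large per-batch offset
    labels = rng.integers(0, 2, n_per_batch)  # binary "biological" label
    noise = rng.normal(0, 1.0, (n_per_batch, n_features))
    X_parts.append(noise + shift + np.outer(labels, bio_dir))
    batch.extend([b] * n_per_batch)
    biology.extend(labels.tolist())

X = np.vstack(X_parts)
clf = LogisticRegression(max_iter=1000)
print("predict batch:  ", cross_val_score(clf, X, batch, cv=5).mean())    # near-perfect
print("predict biology:", cross_val_score(clf, X, biology, cv=5).mean())  # much lower
```

If you train a model on data like this without any correction, the easiest gradient to follow is the batch signal.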
We actually put out a dataset, RxRx1, back in 2019, to address exactly this question. You can check it out here. Here is some of what we learned (ourselves, and via the crowdsourced answers we got on Kaggle).
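If you do grab the data, a quick way to see the batch structure is to check how each perturbation is distributed across experimental batches in the metadata. (A minimal sketch; the file name and column names below assume the published metadata schema, so double-check them against your download.)

```python
import pandas as pd

# Assumed schema: one row per imaging site, with `experiment` (batch)
# and `sirna_id` (perturbation) columns.
meta = pd.read_csv("rxrx1_metadata.csv")

# How many distinct experimental batches does each siRNA appear in?
batches_per_sirna = meta.groupby("sirna_id")["experiment"].nunique()
print(batches_per_sirna.describe())
```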
Handling batch effects takes a combination of physical and computational processes. To answer at a high level:
- We’ve carefully engineered and automated our lab to minimize experimental variability (you’d be surprised how clearly the pipetting patterns of different scientists can come out in the data – which is why we automate).
- We’ve scaled our lab, so that we can afford to ($ and time!) collect multiple replicates of each data point. This can be at multiple levels of replication – exactly the same system, different batches of cells, different CRISPR guides targeting the same gene, etc. – which enables us to characterize different sources of variation. Our phenomics platform can do up to 2.2 million experiments per week!
- We’ve both applied known computational methods and built custom ML methods to control for or exclude batch variability (a simple baseline in this spirit is sketched below). Papers currently under review!
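For flavor, the simplest computational baseline (not our production method) is to remove per-batch location and scale by standardizing features within each batch:

```python
import numpy as np

def center_scale_per_batch(X: np.ndarray, batch_ids: np.ndarray) -> np.ndarray:
    """Z-score each feature within its own batch to suppress batch offsets."""
    X_out = np.empty_like(X, dtype=float)
    for b in np.unique(batch_ids):
        mask = batch_ids == b
        mu = X[mask].mean(axis=0)
        sigma = X[mask].std(axis=0) + 1e-8  # guard against zero variance
        X_out[mask] = (X[mask] - mu) / sigma
    return X_out

# Usage: X is an (n_samples, n_features) embedding matrix,
# batch_ids is an (n_samples,) array of batch labels.
# X_corrected = center_scale_per_batch(X, batch_ids)
```

In practice you'd often estimate the per-batch statistics from negative-control wells only, so genuine biological differences between batches don't get normalized away.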