Hey guys,

I work in computer science for agricultural research. I develop algorithms that monitor crop conditions and try to predict the resulting yield.

I am focusing on ML-based methods, but data availability in agriculture can be quite a limiting factor. If you have 100k samples from real crop fields, that's a lot! So we are not like ChatGPT, which was trained on something like 500bn word samples.

To overcome the issues of small data + ML, I want to set up an approach that combines ML methods (learning from data) with expert knowledge.

What do I mean by this? For example, everybody knows that if you do not water your plant, it will die. Or if the temperature reaches 90 °C, the plant will simply burn. This knowledge is partially stored in so-called "crop simulation models" designed by agronomy experts, and my idea is to use these expert models to generate synthetic yield data and add it to the training dataset for the ML models.

For me, that results in a kind of "constrained machine learning" approach that combines both. However, do any of you have other ideas for how ML and expert models could be combined, or how the knowledge could be injected into ML methods other than via the training dataset?

I am happy to hear your suggestions!

Comments

ndemir t1_j474eox wrote on January 13, 2023 at 5:04 PM

#1,351,869

When I have similar doubts, I ask myself: "Forget ML, will statistics help you? Will just defining some rules help you?" People in that industry already have some idea of how to predict, so learn their rules. By the way, I am not suggesting that you should not use ML. I am just asking you to look at it from a different angle.

PredictorX1 t1_j47qgxr wrote on January 13, 2023 at 7:18 PM

#1,353,312

Expert knowledge could be encoded as rules whose output is used as features for a machine learning system. These rules would accept data you already have, and produce new data as conclusions which would be fed as extra variables to a modeling algorithm.

[deleted] t1_j4843of wrote on January 13, 2023 at 8:43 PM

#1,354,086

[deleted]

trnka t1_j488v5u wrote on January 13, 2023 at 9:12 PM

#1,354,369

You might try Snorkel. The gist is that domain experts write rules and those rules are fed into ML. If that company doesn't work, I'm pretty sure there are alternatives. Or maybe they had their work in a Python library... it's been a while.

Compared to traditional ML, the benefit is that you're involving the subject matter experts more and giving them a say more directly. That tends to ensure that they're bought in to the approach. Having been in healthcare ML for a while, getting buy-in can be very challenging.

idly t1_j48gglo wrote on January 13, 2023 at 10:00 PM

#1,354,733

Look into hybrid modeling, there are multiple ways to do this

fudec t1_j48ku4i wrote on January 13, 2023 at 10:28 PM

#1,354,957

Hi! There is a relatively new paradigm, "physics-informed machine learning".

Here is a nice review of the different techniques:

https://www.nature.com/articles/s42254-021-00314-5

The most popular approach is based on physics regularization on neural networks.

PS: a link to the paper is offered by the author:

https://www.researchgate.net/publication/351814752_Physics-informed_machine_learning

Maggemkay t1_j48o8cw wrote on January 13, 2023 at 10:51 PM

#1,355,154

I'm looking into something similar, essentially combining data-driven ML with a knowledge base, but in the context of explainable AI and predictive maintenance.

I have stumbled across something called "Logic Tensor Networks" (search for the paper) which might help in your situation. I need to look into it more, but it combines ML + knowledge bases + fuzzy logic.

Hope you find a solution!

Cherubin0 t1_j48txr9 wrote on January 13, 2023 at 11:30 PM

#1,355,414

Or you could use ML to predict the error of the rule-based system or simulation.
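A minimal sketch of this residual idea, under stated assumptions: `simulate_yield` is a hypothetical stand-in for a crop simulation model, the numbers are made up for illustration, and a one-variable least-squares line plays the role of the ML model that learns the simulator's error.

```python
def simulate_yield(water_mm: float) -> float:
    """Hypothetical stand-in for an expert crop simulation model (t/ha)."""
    return 0.01 * water_mm


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b (placeholder for any ML model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Made-up field observations: water supply (mm) and measured yield (t/ha).
water = [200.0, 300.0, 400.0, 500.0]
observed = [2.5, 3.7, 4.9, 6.1]

# The learning target is the simulator's error, not the yield itself.
residuals = [y - simulate_yield(w) for w, y in zip(water, observed)]
a, b = fit_line(water, residuals)


def hybrid_predict(water_mm: float) -> float:
    """Simulator output plus the learned correction."""
    return simulate_yield(water_mm) + a * water_mm + b
```

The appeal of this setup is that the learned part only has to capture what the simulator gets wrong, which may need less data than learning yield from scratch.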

currentscurrents t1_j499l3p wrote on January 14, 2023 at 1:24 AM

#1,356,394

Are you trying to do research, or solve a problem? Building expert systems out of neural networks is still a new, experimental idea. If you just want to get the job done you may want to pick more proven methods.

janpf t1_j4ai7ia wrote on January 14, 2023 at 8:39 AM

#1,358,608

If you use synthetic data (from the crop simulation models), the model will kind of reverse-engineer it (it will learn what the simulation models are doing).

Using a mix of it with real-world data is like regularizing your model (adding a prior) toward the simulation rules.

This is something that makes sense, and mixing data is often used. But "making sense" doesn't necessarily mean it helps... that depends a lot on your application. The next question is how much synthetic data you want to mix in. Fundamentally, you'll have to figure it out by trial and error, with some way of measuring whether things are getting better for whatever your extrinsic goal is (your business objective).
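The trial-and-error loop described above could look roughly like the sketch below. Everything here is an assumption for illustration: the data are made up, a weighted least-squares line stands in for the ML model, and `synth_weight` is a hypothetical knob controlling how much the synthetic samples count relative to real ones. The held-out real data is what decides which weight wins.

```python
def fit_weighted_line(xs, ys, ws):
    """Weighted least squares for y = a * x + b."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def mse(a, b, xs, ys):
    """Mean squared error of the line on a dataset."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: a few real samples, more synthetic ones, and a real validation set.
real_x, real_y = [200.0, 400.0], [2.6, 4.4]
synth_x, synth_y = [100.0, 300.0, 500.0], [1.0, 3.0, 5.0]
val_x, val_y = [250.0, 450.0], [2.9, 4.8]

results = {}
for synth_weight in (0.0, 0.1, 0.5, 1.0):
    xs = real_x + synth_x
    ys = real_y + synth_y
    # Real samples always count fully; synthetic samples are down-weighted.
    ws = [1.0] * len(real_x) + [synth_weight] * len(synth_x)
    a, b = fit_weighted_line(xs, ys, ws)
    results[synth_weight] = mse(a, b, val_x, val_y)

# Pick the mixing weight that does best on held-out real data.
best_weight = min(results, key=results.get)
```

Validating only on real data matters here: scoring on synthetic samples would just reward agreement with the simulator.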

Meddhouib10 t1_j4andjc wrote on January 14, 2023 at 9:50 AM

#1,358,800

Replying to PredictorX1 (#1,353,312)

Do you have any paper in mind that covers this?

PredictorX1 t1_j4azldr wrote on January 14, 2023 at 12:30 PM

#1,359,294

Replying to Meddhouib10 (#1,358,800)

No, but the idea is pretty straightforward. Assuming that experts can provide domain knowledge that can be coded as conditions or rules (IF engine_temperature > 95 AND coolant_pressure < 12 THEN engine_status = "CRITICAL"), these can be used to generate 0/1 flags based on existing data to augment the training variables.
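As a rough sketch of the flag idea, the rule above can be turned into an extra 0/1 column. The thresholds come from the example rule; the names `Reading`, `critical_flag`, and `augment` are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    engine_temperature: float
    coolant_pressure: float


def critical_flag(r: Reading) -> int:
    """Expert rule encoded as a 0/1 feature (1 means the rule fires)."""
    return int(r.engine_temperature > 95 and r.coolant_pressure < 12)


def augment(readings):
    """Build feature rows: the raw variables plus the rule flag as an extra column."""
    return [
        [r.engine_temperature, r.coolant_pressure, critical_flag(r)]
        for r in readings
    ]


readings = [Reading(99.0, 10.0), Reading(99.0, 15.0), Reading(80.0, 10.0)]
features = augment(readings)  # rows ready to feed into any modeling algorithm
```

The model still sees the raw variables, so it can learn to rely on the flag or override it where the data disagree with the rule.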

This can be made much more complex by using actual expert systems or fuzzy logic. There are entire sections of the technical library for those. For fuzzy logic, I would recommend:

"The Fuzzy Systems Handbook"

by Earl Cox

ISBN-13: 978-0121942700

Tigmib OP t1_j4bbsnf wrote on January 14, 2023 at 2:23 PM

#1,359,878

Replying to ndemir (#1,351,869)

Thanks, yes, that is true. In recent days I have been looking into Bayesian statistics. That might be an alternative to pure ML that I am considering right now.

Tigmib OP t1_j4bbxyz wrote on January 14, 2023 at 2:24 PM

#1,359,888

Replying to currentscurrents (#1,356,394)

I would say both. I have an actual problem (predicting crop yield as accurately as possible), but the way there is definitely a research problem... What proven methods are you thinking of?

Tigmib OP t1_j4bcr6q wrote on January 14, 2023 at 2:30 PM

#1,359,932

Replying to PredictorX1 (#1,359,294)

Thanks for that suggestion! Yeah, I have thought about this. The problem is that crop growth probably does not have outcomes as binary as an engine status... Maybe a very simple "rule" (e.g. a function relating water access to crop yield) could be added to the loss function. If this simple expert knowledge outputs a high probability that the plant died (yield = 0), the corresponding y_train values could also be set to 0... However, crop growth depends on so many events during the season that it would mean implementing many, many rules...
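One way the loss-function idea might be sketched is as a soft penalty added to the usual mean squared error. This is only an illustration under assumptions: the `wilting_mm` threshold and `penalty_weight` are hypothetical values, and the rule ("below a water threshold the plant dies, so predicted yield should be zero") is a deliberately crude stand-in for real agronomic knowledge.

```python
def constrained_loss(y_true, y_pred, water_mm, wilting_mm=50.0, penalty_weight=10.0):
    """MSE plus a soft penalty for predictions that violate the expert rule.

    wilting_mm and penalty_weight are assumed values for illustration only.
    """
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    # Rule: with water below the threshold, any predicted yield above zero
    # counts as a violation and is penalised quadratically.
    violation = sum(
        max(p, 0.0) ** 2 for p, w in zip(y_pred, water_mm) if w < wilting_mm
    ) / n
    return mse + penalty_weight * violation


y_true = [3.0, 0.0]
water = [400.0, 20.0]
loss_consistent = constrained_loss(y_true, [3.0, 0.0], water)  # obeys the rule
loss_violating = constrained_loss(y_true, [3.0, 1.0], water)   # yield despite drought
```

Because the penalty term does not need labels, it could in principle also be evaluated on unlabeled inputs, which is attractive when real yield measurements are scarce.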

Tigmib OP t1_j4bd1xk wrote on January 14, 2023 at 2:33 PM

#1,359,947

Replying to fudec (#1,354,957)

Thanks a lot! That looks like a very interesting approach! I will have a detailed look into it!

Tigmib OP t1_j4bdhpa wrote on January 14, 2023 at 2:36 PM

#1,359,970

Replying to Maggemkay (#1,355,154)

Hi, interesting! Do expert models exist for your problem already or would it be only the knowledge database you want to combine?

Tigmib OP t1_j4bezs5 wrote on January 14, 2023 at 2:47 PM

#1,360,048

Replying to janpf (#1,358,608)

Yes, that's true. This is also what I thought about. Using a mixed dataset or a transfer learning approach (first train on synthetic data, then retrain on real-world data) should incorporate the domain knowledge. But you are right, right now that's just a hypothesis... but I will test it!
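The "first synthetic, then real" idea could be sketched as follows. The assumptions are loud ones: the data are made up, a linear model trained by plain gradient descent stands in for the real network, and the learning rates and epoch counts are arbitrary illustrative choices. Water is expressed in hundreds of mm to keep the gradient steps stable.

```python
def gradient_descent(xs, ys, a, b, lr, epochs):
    """Minimise MSE for y = a * x + b, starting from the given (a, b)."""
    n = len(xs)
    for _ in range(epochs):
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def mse(a, b, xs, ys):
    """Mean squared error of the line on a dataset."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data; water in hundreds of mm, yield in t/ha.
synth_x, synth_y = [1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0, 4.0, 5.0]
real_x, real_y = [2.0, 4.0], [2.6, 4.4]

# Stage 1: pretrain on plentiful synthetic data from the simulator.
a0, b0 = gradient_descent(synth_x, synth_y, a=0.0, b=0.0, lr=0.05, epochs=2000)

# Stage 2: fine-tune on scarce real data with a small learning rate and few
# epochs, so the model stays close to what it learned from the simulator.
a1, b1 = gradient_descent(real_x, real_y, a=a0, b=b0, lr=0.01, epochs=50)

error_before = mse(a0, b0, real_x, real_y)
error_after = mse(a1, b1, real_x, real_y)
```

How long to fine-tune is the tuning knob here: too little and the simulator's bias remains, too much and the pretrained knowledge is overwritten by a handful of real samples.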

Maggemkay t1_j4bms1x wrote on January 14, 2023 at 3:43 PM

#1,360,442

Replying to Tigmib (#1,359,970)

There might be general existing models that I can fit to my problem, but I haven't looked into it yet.

Regardless, I'm interested in whether they already exist and whether I can combine them with other data sources.