Submitted by fujidaiti t3_10pu9eh in MachineLearning
qalis t1_j6mczg1 wrote
Absolutely not! There is still a lot of research going into traditional ML methods. For tabular data, traditional ML is typically vastly superior to deep learning. Boosting models in particular receive a lot of attention, thanks to the very good implementations available. See for example:
- SketchBoost, CuPy-based boosting from NeurIPS 2022, aimed at incredibly fast multioutput classification
- A Short Chronology Of Deep Learning For Tabular Data by Sebastian Raschka, a great literature overview of deep learning on tabular data; spoiler: it does not work, and XGBoost or similar models are just better
- in time series forecasting, LightGBM-based ensembles typically beat all deep learning methods while being much faster to train; see e.g. this paper, and you can also see it in Kaggle competitions and other papers. My friend works in this area at NVIDIA, and their internal benchmarks (soon to be published) show that the top 8 models in a large-scale comparison are in fact various LightGBM ensemble variants, not deep learning models (which, in fact, kinda disappointed them, since it's, you know, NVIDIA)
- all domains requiring high interpretability largely ignore deep learning and put their research effort into traditional ML; see e.g. counterfactual examples, an important interpretability method in finance, or rule-based learning, important in medical or legal applications
silentsnake t1_j6mk21a wrote
To add on, most real-world business data is tabular.
qalis t1_j6mmvwg wrote
Absolutely. OP's question was about research, so I did not include this, but it's absolutely true. It also makes sense: everyone has relational DBs, they are cheap and scalable, so chances are a business already has quite reasonable data for ML just waiting in its tabular database. This, of course, means money, which means money for research, even in-company research that may never be published but is research nonetheless.
MrAcurite t1_j6msvtn wrote
The customers I build models for insist on interpretability and robustness, which deep learning just doesn't give them right now. Actually just got a conference paper out of a modification to a classical method, which was kinda fun.
aschroeder91 t1_j6mys1h wrote
Good to hear! Do you know what the space of hybrid models looks like? Specifically, using deep learning to turn raw input signals into features, and classical machine learning algorithms (e.g. gradient-boosted trees) for the downstream prediction.
My intuition says that hybrid models definitely have a role in general problem solving machines. I've tried searching this topic and the space is muddy at best.
beanhead0321 t1_j6nbq1j wrote
I remember sitting in on a talk from a large insurance company who did this a number of years back. They used DL for feature engineering, but used traditional ML for the predictive model itself. This had to do with satisfying some regulatory requirements around model interpretability.
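A minimal sketch of that pattern, using only scikit-learn: a small MLP is trained on the raw features, its hidden layer is reused as a learned feature extractor, and an interpretable logistic regression is fit on those features. The insurance company's actual stack is not described in the comment, so everything below (dataset, layer sizes, choice of final model) is illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)

# Step 1: train a small neural net on the raw features.
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
mlp.fit(X, y)

# Step 2: reuse its hidden layer as a feature extractor
# (ReLU activations computed manually from the fitted weights).
def hidden_features(X):
    return np.maximum(0.0, X @ mlp.coefs_[0] + mlp.intercepts_[0])

# Step 3: fit an interpretable model on the learned features.
clf = LogisticRegression(max_iter=1000).fit(hidden_features(X), y)
acc = clf.score(hidden_features(X), y)
print("train accuracy:", acc)
```

The final model's coefficients are now attached to learned features rather than raw inputs, which is exactly the interpretability caveat raised in this thread.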
aschroeder91 t1_j6nliim wrote
the real innovation will be comfortably backpropagating through the hybrid model as a whole
ktpr t1_j6nmsol wrote
Did they claim traditional ML explained the features engineered by the DL? If so, how did they explain the units of feature variables?
qalis t1_j6o6jjv wrote
The simplest way to do this is combining autoencoders (e.g. VAEs) and boosting, I have seen this multiple times on Kaggle.
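A sketch of that combination, substituting a plain autoencoder for a VAE to keep it short (the learned latent codes are concatenated with the original features before boosting, a pattern seen in Kaggle solutions; the dataset and dimensions here are made up):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPRegressor

X, y = make_classification(n_samples=500, n_features=20, random_state=1)

# Plain autoencoder: train the net to reconstruct its own input
# through an 8-unit bottleneck.
ae = MLPRegressor(hidden_layer_sizes=(8,), max_iter=800, random_state=1)
ae.fit(X, X)

# Latent codes = bottleneck activations (ReLU, from the fitted weights).
Z = np.maximum(0.0, X @ ae.coefs_[0] + ae.intercepts_[0])

# Boosting model on [original features + latent codes].
X_aug = np.hstack([X, Z])
gbm = GradientBoostingClassifier(random_state=1).fit(X_aug, y)
print("train accuracy:", gbm.score(X_aug, y))
```

A VAE variant would replace the bottleneck with a sampled latent distribution, but the downstream boosting step is identical.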
coffeecoffeecoffeee t1_j6o260e wrote
For interpretable ML, I really like what Cynthia Rudin's lab at Duke has been putting out. They have a great paper on building ML models that generate rules with integer scores for classification, like what doctors typically use (Arxiv).
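The flavor of those models can be shown with a hand-written risk scorecard; the rules, point values, and threshold below are invented for illustration, not taken from the paper. Each satisfied rule contributes a small integer, and the total maps to a predicted class, which is what makes such models auditable by hand.

```python
# Hypothetical integer scorecard in the style of clinical risk scores;
# rules and points are made up for illustration.
RULES = [
    ("age > 60",   lambda p: p["age"] > 60,   2),
    ("bp >= 140",  lambda p: p["bp"] >= 140,  3),
    ("smoker",     lambda p: p["smoker"],     2),
    ("bmi >= 30",  lambda p: p["bmi"] >= 30,  1),
]
THRESHOLD = 4  # total score >= threshold -> high risk

def score(patient):
    return sum(points for _, rule, points in RULES if rule(patient))

def classify(patient):
    return "high risk" if score(patient) >= THRESHOLD else "low risk"

patient = {"age": 67, "bp": 150, "smoker": False, "bmi": 27}
print(score(patient), classify(patient))  # score 5 -> high risk
```

The research contribution in the cited work is learning such rules and integer weights directly from data under sparsity constraints, rather than writing them by hand.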
qalis t1_j6o5zha wrote
Yeah, I like her work. The iModels library (linked in my comment under the "rule-based learning" link) is also written by her coworkers IIRC, or at least implements a lot of models from her work. Although I disagree with her arguments in "Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead", the paper she is arguably best known for.
data_wizard_1867 t1_j6n6m7q wrote
Another addendum to this fantastic answer: lots of work in uplift modelling also uses traditional ML methods (related to your counterfactual point) and will likely continue to do so.
ApprehensiveNature69 t1_j6nmxbc wrote
Tangential, but I am so happy cupy is still being developed even though Chainer died.
nucLeaRStarcraft t1_j6nunti wrote
There's also this survey of DL vs traditional methods for tabular data: https://arxiv.org/pdf/2110.01889.pdf
qalis t1_j6o6cou wrote
That's a nice paper. There is also an interesting but very niche line of work using gradient boosting as a classification head for neural networks. The gradient flows through it normally, after all; trees are just added instead of taking gradient descent steps. Sadly, I could not find any trustworthy open-source implementation of this approach. If it works, it could bridge the gap between deep learning and boosting models.