Submitted by fujidaiti t3_10pu9eh in MachineLearning

This may be a silly question for those familiar with the field, but do machine learning researchers no longer see any prospects in traditional methods (by "traditional" I mean anything other than deep learning)? I feel that most of the time when people talk about machine learning today, they are referring to deep learning, but is it the same in the academic world? Have people who had been studying traditional methods switched to neural networks? I know that many researchers are excited about deep learning, but I am wondering what they think about other methods.

[ EDITED ]

I’m glad that I got far more responses than I expected! However, I would like to add here that my intention did not seem to come across to some people because of my inaccurate English.

I think "have given up" was poorly phrased. What I really meant to ask was: are ML researchers no longer interested in traditional ML? Have those who studied, say, SVMs moved on to the DL field? That was my point, and u/qalis gave me a good comment on it. Thanks to all the others as well.

244

Comments

qalis t1_j6mczg1 wrote

Absolutely not! There is still a lot of research going into traditional ML methods. For tabular data, they are typically vastly superior to deep learning. Boosting models in particular receive a lot of attention, thanks to the very good implementations available. See for example:

- SketchBoost, CuPy-based boosting from NeurIPS 2022, aimed at incredibly fast multioutput classification

- A Short Chronology Of Deep Learning For Tabular Data by Sebastian Raschka, a great literature overview of deep learning on tabular data; spoiler: it does not work, and XGBoost or similar models are just better

- in time series forecasting, LightGBM-based ensembles typically beat all deep learning methods, while being much faster to train; see e.g. this paper, and you can also see it in Kaggle competitions and other papers; my friend works in this area at NVidia and their internal benchmarks (soon to be published) show that the top 8 models in a large-scale comparison are in fact various LightGBM ensemble variants, not deep learning models (which, in fact, kinda disappointed them, since it's, you know, NVidia)

- all domains requiring high interpretability ignore deep learning entirely and put all their research into traditional ML; see e.g. counterfactual examples, important interpretability methods in finance, or rule-based learning, important in medical or legal applications

292

jiamengial t1_j6mdcrj wrote

Don't think so, diffusion models are based entirely on sampling methods; if anything what's exciting is to take the "traditional" methods and, instead of replacing the whole thing with neural nets, replace only a component of it

−8

arg_max t1_j6mg664 wrote

I think diffusion models are kind of a bad example. The SDE paper from Yang Song has shown that it's all about modeling the score function, and this can't be done with simple models. Apart from that, the big text2img models work inside the latent space of a deep VAE, use cross-attention for conditioning (which isn't a thing in traditional ML), and use large language models to process the text input. All their components are very DL-based.

13

PredictorX1 t1_j6mkzl0 wrote

To be clear, there are neural networks which are "deep", and others which are "shallow" (few hidden layers). From a practical standpoint, the latter have more in common with other "shallow" learning methods (tree-induction, statistical regressions, k-nearest neighbor, etc.) than they do with deep learning.

You're right that many people (especially in the non-technical press) have erroneously used "machine learning" to mean specifically "deep learning", just as they've used "artificial intelligence" to mean "machine learning". Regardless, there are still non-deep machine learning methods and other branches of A.I. In practice, non-deep machine learning represents the overwhelming majority of applications today.

I haven't followed the research as closely in recent years, but I can tell you that, deep learning aside, people have only begun to scratch the surface of machine learning applications.

54

qalis t1_j6mmvwg wrote

Absolutely. OP's question was about research, so I did not include this, but it's absolutely true. It also makes sense: everyone has relational DBs, they are cheap and scalable, so chances are a business already has quite reasonable data for ML just waiting in its tabular database. This, of course, means money, which means money for research, even in-company research, which may never be published but is research nonetheless.

32

MrAcurite t1_j6msvtn wrote

The customers I build models for insist on interpretability and robustness, which deep learning just doesn't give them right now. Actually just got a conference paper out of a modification to a classical method, which was kinda fun.

22

aschroeder91 t1_j6mys1h wrote

Good to hear! Do you know what the space of hybrid models looks like? Specifically, using deep learning to turn the input signal into data and classical machine learning algorithms (e.g. gradient boosted trees) for the data processing.

My intuition says that hybrid models definitely have a role in general problem solving machines. I've tried searching this topic and the space is muddy at best.

14

new_name_who_dis_ t1_j6n7roh wrote

Trees/forests are still state of the art for structured data... So not only did they not give up on them, but traditional methods are seen as better in some domains. Not to mention the ease of use, and the quick training.

Also explainable AI is much more promising with traditional methods, especially trees.

22

kpalan t1_j6n8otw wrote

Diffusion models mainly predict the added noise with deep convolutional neural networks. Furthermore, Stable Diffusion specifically uses even more deep CNNs to downscale and upscale the image.

2

kpalan t1_j6n8z2q wrote

One reason is probably that a lot of research has already been put into this field, so companies don't invest time there because they do not expect to find anything new.

2

SaifKhayoon t1_j6n9kb2 wrote

Nah, researchers haven't given up on traditional machine learning methods! They combine them with deep learning in lots of places, like image classification, speech recognition, and recommender systems.

Plus, traditional methods can be better for some tasks, like when you have a small dataset or want an explainable model or real-time predictions.

16

beanhead0321 t1_j6nbq1j wrote

I remember sitting in on a talk from a large insurance company who did this a number of years back. They used DL for feature engineering, but used traditional ML for the predictive model itself. This had to do with satisfying some regulatory requirements around model interpretability.

11

bananonymos t1_j6nc2fr wrote

Lol no. Have researchers given up on linear regression?

25

Internal-Diet-514 t1_j6nep37 wrote

Deep learning is only really the better option with higher dimensional data. If tabular data is 2D, time series is 3D and image data is 4D (extra dimension for batch), then deep learning is really only used for 3D and 4D data. As others have said, tree-based models will most of the time outperform deep learning on a 2D problem.

But I think the interesting thing is the reason we have to use deep learning in the first place. In higher dimensional data we don’t have something that is “a feature” in the sense that we do with 2D data. In time series you have features but they are taken over time so really we need a feature which describes that feature over time. That’s what CNNs do. CNNs are feature extractors and at the end of the process almost always put that data back into 2D format (when doing classification) which is sent through a neural net, but it could be sent through a random forest as well.
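As a rough illustration of the "feature extractor, then back to a 2D table" idea above, here is a minimal pure-Python sketch. The kernels are hand-picked stand-ins (an assumption made for the example; a real CNN would learn them by backpropagation), and the names `KERNELS`, `conv1d` and `extract_features` are hypothetical.

```python
# Hand-picked 1D kernels stand in for learned CNN filters (an assumption:
# a real CNN would learn these weights by backpropagation).
KERNELS = {
    "rising_edge": [-1.0, 0.0, 1.0],      # responds to upward steps
    "local_mean": [1 / 3, 1 / 3, 1 / 3],  # responds to sustained high values
}


def conv1d(series, kernel):
    """Valid (no padding) sliding dot product, as a CNN layer computes it."""
    width = len(kernel)
    return [
        sum(k * x for k, x in zip(kernel, series[i:i + width]))
        for i in range(len(series) - width + 1)
    ]


def extract_features(series):
    """Convolve, then global-max-pool: one number per kernel."""
    return [max(conv1d(series, kernel)) for kernel in KERNELS.values()]


# Three series of six time steps each: shape (3, 6).
batch = [
    [0, 0, 0, 5, 5, 5],  # step up
    [5, 5, 5, 0, 0, 0],  # step down
    [2, 2, 2, 2, 2, 2],  # flat
]

# Shape (3, 2): rows are series, columns are extracted features.
feature_table = [extract_features(s) for s in batch]
```

The resulting `feature_table` is an ordinary rows-by-features table, so in principle any tabular model (a random forest, boosted trees, or an MLP head) could be trained on it.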

I think it’s fair to compare a neural network to traditional ML, but when we get into a CNN that’s not really a comparison. A CNN is a feature extraction method. The great thing is that we can optimize this step by connecting it to a neural network with a sigmoid (or whatever activation) output.

We don’t have a way to connect traditional ML methods with a feature extraction method in the way you can with back propagation for a neural net and a CNN. If it’s possible to find a way to do that, maybe we would see a rise in the use of traditional ML for high dimensional data.

8

Brudaks t1_j6nj9z1 wrote

For most established tasks people have a good idea (based on empirical evidence) about the limits of particular methods for this task.

There are tasks where "traditional machine learning methods" work well, and people working on these tasks use them and will use them.

And there are tasks where they don't, and deep learning gets far better results than we could/can get otherwise. For those types of tasks, yes, it would be accurate to say that we have given up on traditional machine learning; if you're given an image classification or text analysis task, you'd generally use DL even for a simple baseline without even trying any of the "traditional" methods we used in earlier years.

12

Grandexar t1_j6nlqx2 wrote

The hype for neural nets right now is just because the hardware has finally caught up

4

Technical_Ad_9732 t1_j6nqru3 wrote

Not at all. It would make this all a sad one-trick pony, and frankly, not every fix requires a hammer when you have an entire toolbox available.

4

JimmyTheCrossEyedDog t1_j6nv3zg wrote

This feels like a mix-up between the colloquial and mathematical definitions of dimension. Yes, NN approaches tend to work better on very high-dimensional data, but the dimension here refers to the number of input features. So, for a 416x416x3 image, that's >500k dimensions, far higher than the number of dimensions in almost all tabular datasets.
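To make the arithmetic concrete, a quick check in Python (the 50-column table is a hypothetical comparison point, not a figure from the thread):

```python
from math import prod

image_shape = (416, 416, 3)      # height, width, colour channels
image_dims = prod(image_shape)   # every pixel value is one input feature

table_dims = 50                  # hypothetical tabular dataset with 50 columns

print(image_dims)                # number of input dimensions for one image
print(image_dims // table_dims)  # how many times wider than the table
```

The product comes to 519,168 input features per image, which is where the ">500k" above comes from.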

> image data 4D (extra dimension for batch)

The batch is an arbitrary parceling of data simply due to how NNs are typically trained for computational reasons. If I were to train a NN on tabular data, it'd also be batched, but it doesn't give it a new meaningful dimension (either in the colloquial sense or the sense that matters for ML)

Also, NNs are still the best option for computer vision even on greyscale data, which is spatially 2D but still has a huge number of dimensions.

edit: I'd also argue that high dimensionality isn't the biggest reason NNs work for computer vision, but something more fundamental - see qalis's point in this thread

17

peatfreak t1_j6nvknw wrote

No way. Machine learning has always been subject to ideas cycling around over time, with booms and busts of interest. For example, neural nets have come and gone many times over the decades. I wouldn't be surprised if there is a revival of interest in support vector machines, for example, in 5-10 years' time.

2

seba07 t1_j6nxuiq wrote

I think the argument is quite often "because it works". You throw your data at a standard CNN (if you're dealing with images) and often get decent results quite fast.

2

Internal-Diet-514 t1_j6nzvcc wrote

When talking about dimensions I meant (number of rows, number of features) is 2 dimensions for tabular data. (Number of series, number of time steps, number of features) is 3 dimensions for time series, and (number of images, width, height, channels) is 4 dimensions for image data. For deep learning classification, regardless of the number of dimensions it originally ingests, it will become (number of series, features) or (number of images, features) when we get to the point of applying an MLP for classification.

You could consider an image to have width x height x channels features, but that’s not what a CNN does, the CNN extracts meaningful features from the high dimensional space. The feature extraction phase is what makes deep learning great for computer vision. Traditional ML models don’t have that phase.

0

coffeecoffeecoffeee t1_j6o260e wrote

For interpretable ML, I really like what Cynthia Rudin's lab at Duke has been putting out. They have a great paper on building ML models that generate rules with integer scores for classification, like what doctors typically use (arXiv).

12

qalis t1_j6o5zha wrote

Yeah, I like her work. The iModels library (linked in my comment under the "rule-based learning" link) is also written by her coworkers IIRC, or at least implements a lot of models from her papers. Although I disagree with her arguments in "Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead", the paper she is arguably best known for.

6

qalis t1_j6o6cou wrote

That's a nice paper. There is also an interesting, but very niche, line of work using gradient boosting as a classification head for neural networks. Gradient flows through it normally, after all; it's just that tree addition is used instead of gradient descent steps. But sadly I could not find any trustworthy open-source implementation of this approach. If this works, it could bridge the gap between deep learning and boosting models.
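The "tree addition instead of gradient descent steps" remark can be illustrated with a toy gradient boosting loop. This is only a sketch of plain boosting on a one-dimensional regression problem with decision stumps and squared loss, not the neural-network-head approach mentioned above; `fit_stump`, `boost` and the data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the 1-D threshold split that minimises squared error on residuals."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds=50, learning_rate=0.3):
    """Each round adds one stump fitted to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(rounds):
        # For squared loss, the negative gradient with respect to the
        # prediction is the residual y - f(x), up to a constant factor.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
model = boost(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
baseline_mse = sum((sum(ys) / len(ys) - y) ** 2 for y in ys) / len(ys)
```

Where gradient descent would nudge a parameter vector along the negative gradient, each boosting round here adds a new stump that approximates that negative gradient, scaled by `learning_rate`.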

3

qalis t1_j6o79xv wrote

A better distinction would be that deep learning excels in applications that require representation learning, i.e. transformation from domains that do not lie in a Euclidean metric space (e.g. graphs) or that are too problematic in their raw form and require processing in another domain (e.g. images, audio). This is very similar to feature extraction, but representation learning is a somewhat more general term.

Tabular ML does not need this in general, since after obtaining feature vectors we already have a representation, and deep learning models like an MLP can only apply an (exponentially) nonlinear transformation of that space instead of really learning fundamentally new representations of the data. The latter is what happens e.g. for images, going from the space of raw pixel values into a vector space that captures semantic features of the image.

2

dancingnightly t1_j6oaxeo wrote

This is commercial, not research, but a lot of scenarios where explainable AI is needed use simple statistical solutions.

For example, a company I knew had to identify people in poverty in order to distribute a large ($M) grant fund to people in need, and they had only basic, relatively unrelated data about them: how often these people travelled, let's say, their age, etc.

In order to create an explainable model whose factors could be understood by higher-ups and easily checked for bias, they used a k-means approach with just 3 factors.

It captured close to as much information as deep learning, but with more robustness to data drift, and with clear graphs segmenting the target group and the general group. It also reduced the use of data, being pro-privacy.

This 30-line solution with a dozen explanatory EDA output graphs probably got sold for >500k in fees... but they did make the right choices in this circumstance. They saved on a complex ML model, avoided bias/security/privacy/deployment hell, and left a maintainable solution.

Now for research, it's interesting from the perspective of applied AI (which is arguably still dominantly GOFAI/simple statistics) and communication about AI with the public, although I wouldn't say it's in vogue.
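For a sense of how small such a k-means solution can be, here is a minimal sketch of Lloyd's algorithm on three factors. Everything specific is an assumption for illustration: the factor names, the toy values, the choice of two clusters, and the pre-scaling of each factor to the 0-1 range (k-means is distance-based, so unscaled factors would let one column dominate).

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical factors, each already scaled to 0..1:
# (travel frequency, age, household size)
points = [
    (0.10, 0.8, 0.90),
    (0.90, 0.3, 0.20),
    (0.20, 0.7, 0.80),
    (0.80, 0.4, 0.10),
    (0.15, 0.9, 0.85),
    (0.85, 0.2, 0.30),
]

labels, centroids = kmeans(points, k=2)
```

Each centroid is just three averages, one per factor, which is what makes the segments easy to plot and to explain to non-specialists.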

5

JimmyTheCrossEyedDog t1_j6odc4c wrote

> When talking about dimensions I meant (number of rows, number of features) is 2 dimensions for tabular data...

Right, but my point is that when people say "NNs work well on high dimensional data", that's not what they mean.

> You could consider an image to have width x height x channels features

It does have that many input features, i.e. dimensions, like you've written below.

> but that's not what a CNN does, the CNN extracts meaningful features from the high dimensional space.

Now we're talking about composite or higher level features, which is different from what we've been talking about up to this point. It's true that for tabular data (or old school, pre-NN computer vision) you generally start to construct these yourself whereas with images you can just throw the raw data in and the NN does this more effectively than you ever could, but this is irrelevant to the input dimensionality.

3

Internal-Diet-514 t1_j6oi9qu wrote

If we’re considering the dimensions to be the number of datapoints in an input, then I’ll stick to that definition and use the shape of the data instead of dimensions. I don’t think I was wrong to use dimensions to describe the shape of the data, but I get that it could be confusing because high dimensional data is synonymous with a large number of features, whereas I meant high dimensions to be data with shape > 2.

Deep learning or CNNs are great because of their ability to extract meaningful features from data with shape > 2 and then pass that representation to an MLP. But the feature extraction phase is a different task from what traditional ML is meant to do, which is to take a set of already derived features and learn a decision boundary. So I’m trying to say a traditional ML model is not really comparable to the convolutional portion (the feature extraction phase) of a CNN.

−3

JimmyTheCrossEyedDog t1_j6osqa5 wrote

Good call, shape is the much better term to avoid confusion.

> If we’re considering the dimensions to be the number of datapoints

To clarify - not the number of datapoints, the number of input features. The number of datapoints has nothing to do with the dimensionality (only the shape).

> Deep learning or CNNs are great because of their ability to extract meaningful features from data with shape > 2

This is where I'd disagree (but maybe you have a source that suggests otherwise). Even for time series tabular data, gradient boosted tree models typically outperform NNs.

Overall, shape rarely has anything to do with how a model performs. CNNs are built to take knowledge of the shape of the data into account (restricting kernels to convolutions of spatially close datapoints), but not all NNs do that. If we were using a network with only fully connected layers, for example, then there is no notion of spatial closeness: we might as well have transformed an NxN image into an N^2 x 1 vector and the network would be the same.
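A small check of the fully connected case, as a sketch with made-up integer pixels and weights (the helper names `dense` and `flatten` are hypothetical): scrambling the pixel order, while moving each weight along with its pixel, leaves the output of a dense unit unchanged.

```python
import random


def dense(inputs, weights, bias):
    """One fully connected unit: weighted sum of every input plus a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def flatten(image):
    """Turn an N x N grid (list of rows) into a length N^2 vector."""
    return [pixel for row in image for pixel in row]


rng = random.Random(0)
n = 4
image = [[rng.randint(0, 255) for _ in range(n)] for _ in range(n)]
weights = [rng.randint(-3, 3) for _ in range(n * n)]

pixels = flatten(image)
original = dense(pixels, weights, bias=1)

# Scramble pixel positions, moving each weight along with its pixel.
order = list(range(n * n))
rng.shuffle(order)
scrambled = dense(
    [pixels[i] for i in order], [weights[i] for i in order], bias=1
)

print(original == scrambled)
```

Nothing in the dense unit depends on which pixels were neighbours. A convolutional kernel, by contrast, only ever combines adjacent pixels, so the same scrambling would change what it computes.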

So, neural networks handling inputs that have spatial (or temporal) relationships well has nothing to do with it being a neural network, but with the assumptions we've baked into the architecture (like convolutional layers).

3

Internal-Diet-514 t1_j6oujtg wrote

Time series tabular data would have shape 3 (number of series, number of time points, # of features). For gradient boosted tree models, isn't the general approach to flatten the space to (number of series, number of time points x # of features)? A CNN, by contrast, would be employed to extract time-dependent features before flattening the space.
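The flattening described here can be sketched in a few lines of plain Python; the helper name `flatten_series` and the toy batch are hypothetical.

```python
def flatten_series(batch):
    """Reshape (n_series, n_steps, n_features) to (n_series, n_steps * n_features)."""
    return [[value for step in series for value in step] for series in batch]


# Hypothetical toy batch: 2 series, 3 time steps, 2 features per step.
batch = [
    [[1, 10], [2, 20], [3, 30]],
    [[4, 40], [5, 50], [6, 60]],
]

# Each row is now one flat feature vector that a tree model could consume.
table = flatten_series(batch)
```

After flattening, the model sees "feature 0 at step 2" as just another column, with no built-in notion that it sits next to step 1. Hand-built lag or window statistics are a common alternative to feeding every raw time step.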

If there are examples where boosted tree models perform better in this space, and I think you’re right that there are, then I think that just goes to show that traditional machine learning isn’t dead, but rather that if we could find ways to combine it with the thing that makes deep learning work so well (feature extraction), it’d probably do even better.

−4

DisWastingMyTime t1_j6p1t2y wrote

Wait until you hear what the underlying concept of everything ever is.

CPU? That's just boolean logic! Differential equations for flight control? My man, that's just a few dxdys faking stable solutions!

What a boring take, every time I've heard it in person it proved to be said by a person who knew very little about the complexities of the topic they're reducing to the "underlying" concepts.

No offense to any of you "sophisticated" fellows

−13

bananonymos t1_j6p3k52 wrote

What crawled up your butt?

OP asked if ML was going away in place of AI.

My response is that many people still use linear regressions for problems.

Idiot responds hope they stop. That’s like saying we shouldn’t use cash because credit cards or phone wallets are better.

Underlying many ML and AI models are regression models. That’s all I said. Nothing about reducing everything to its basic parts. But something bothered you enough to basically insult me and make assumptions. Did someone like that make you feel inadequate enough to harass strangers?

Dare I say you must be fun at parties. And before you respond back: yeah, I know I'm not.

13

Comfortable_Slip4025 t1_j6parh8 wrote

I just worked on a quartet search tree project to create optimal evolutionary trees. So, not dead yet!

1

lukemtesta t1_j6pf223 wrote

AFAIK neural networks are best for modelling a function of some parameters. In contrast, regime detection in financial systems favours gradient-boosted trees, random forests and Markov chains. Autoregressive models such as ARMA, ARIMA and GARCH utilise regression, while game regression tests favour reinforcement learning techniques.
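To show the regression at the core of the autoregressive part, here is a minimal sketch that fits an AR(1) coefficient by least squares on a noise-free toy series. It covers only the AR term; moving-average terms and GARCH variances are usually estimated by maximum likelihood instead. The function name `fit_ar1` and the data are hypothetical.

```python
def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + noise (no intercept)."""
    previous = series[:-1]
    current = series[1:]
    return sum(p * c for p, c in zip(previous, current)) / sum(
        p * p for p in previous
    )


# Noise-free toy series generated with phi = 0.5, so the fit should recover it.
series = [16.0]
for _ in range(6):
    series.append(0.5 * series[-1])

phi = fit_ar1(series)
forecast = phi * series[-1]  # one-step-ahead prediction
```

This is ordinary linear regression of each value on its predecessor, which is why such models are often filed under "traditional" methods.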

It depends on the application basically.

1