I am trying to solve a problem where a given input vector x must be transformed into an optimal output vector y. Both vectors are of the same length. The optimal transforming function y = F(x) is unknown. I can however measure how well a given output y’ matches some desired properties given x. More formally, there is a known and differentiable loss function L = G(x, y’). This loss function is different from usual loss functions which are used in neural networks (e.g. MSE) in the sense that it does not require any labels to compute the loss. It can compute the loss and its derivative only knowing the input x and the output y’ of the neural network.

Further, I have many different input vectors X = {x_1, x_1, …, x_n} and they all have their corresponding optimal output vectors Y = {y_1, y_2, …, y_n}, e.g. the vectors that minimize the loss function G(x_i, y_i). Now I want to train a neural network (a CNN to be specific, since my inputs and outputs are actually images) that approximates this optimal function F(x) as closely as possible for all possible inputs, including inputs that it has never seen before.

Simple example: The true underlying function of F(x) could be the DFT (discrete fourier transform). For the sake of this example, let's prepend that we do not know how to compute it. We do however know how to compute how well a given frequency spectrum y' = F(x) matches the properties of the DFT given the specific input x. This is our loss function L = G(x, y'). Now we compute the derivative of this loss function to update the weights of the neural network F(x), so that it will approximate y' a little bit better next time.

Now to my question: Since L = G(x, y’) is differentiable, it should be possible to train a neural network to approximate F(x) using SGD and its variations, right? Or am I missing something obvious here? It kind of feels like a “hack” and I feel like if it was possible to solve optimization problems in this fashion, more people would be doing it, but I can’t find any literature on it. Is there a name for this type of thing that I’m trying to do?

Thanks

Comments

ThrowThisShitAway10 t1_isyfpmq wrote on October 19, 2022 at 4:48 PM

#144,149

Isn't this just considered self-supervised learning? Like in an autoencoder you also have a loss L = G(x, y'). I don't see why it wouldn't work.

nikalehuh t1_isyn4fl wrote on October 19, 2022 at 5:36 PM

#144,632

Is this n-dimensional regression?

namesnonames t1_isyqn44 wrote on October 19, 2022 at 5:58 PM

#144,863

I think that this could totally work. My masters thesis was semi related to this, I used pytorch and instead of vectors x and y I was solving for trajectories that minimized a loss function. Try mocking up a toy version where it's easy to check that it works.

Master-Ad-6411 t1_isyvg72 wrote on October 19, 2022 at 6:30 PM

#145,274

You can. I don't know ML or math much, but I think this is how ML works, or at least some of the ML problem? Give some matching input and output date sample, you find the "best" mapping.

For typical optimization problems, people want to find the best input X that results in minimum energy, but your case is a little special, which you want to find the best mapping of X. So your energy term is the sum of loss for all data samples, and the X is the parameters inside your NN. NN is differentiable and L is differentiable, so it is totally doable. I have also done this, e.g., to find the best trajectory that minimize the energy consuption/impact force, or to position objects so their silouette matches a given one, though I think it is more a pure non-linear optimization problem where you use an NN rather than a ML problem.

lustiz t1_isza6yn wrote on October 19, 2022 at 8:03 PM

#146,241

What is your goal here? If your evaluation metric doesn’t care about the form of F, i.e. only cares about how well your predictions do, this is just standard supervised learning. However, given the way you phrased it, it sounds like there are some geometric or structural properties of your problem that allow you to synthesize training data. Just note that, if you can exploit structural properties of your problem, it may be much faster using standard nonlinear 2nd order optimizers. Will depend on what you are trying to do, though..

LaVieEstBizarre t1_iszotpo wrote on October 19, 2022 at 9:36 PM

#147,128

I don't see why you need any machine learning at all, this is a pure optimisation problem. ML happens when optimisation is based on data of some kind. That's why an objective function like MSE is needed, because a well behaved loss function isn't known generally.

You don't really need the data at all since you have an objective to minimise for the optimal transform, you can just initialise an X, ideally a good guess, and optimise it for your objective function.

iateatoilet t1_it0h1ia wrote on October 20, 2022 at 1:05 AM

#149,041

Check out deepritz

joeggeli OP t1_it22n27 wrote on October 20, 2022 at 11:38 AM

#152,030

Replying to iateatoilet (#149,041)

Thank you! This looks very interesting and might be exactly what I'm looking for :)

joeggeli OP t1_it26unz wrote on October 20, 2022 at 12:18 PM

#152,284

Replying to lustiz (#146,241)

Alright, I'll try to explain ...

My goal is to convert 3d scenes to bas-reliefs. The steps for this are:

Compute the height and normal maps of a 3d scene, as seen from a specific point of view.
Compress the heights in such a way that the overall height is reduced, while the shapes and details of the original 3d scene remain intact.

The first step is simple. The second step is a very difficult problem. There exists no method to do this in an optimal way. There are many approaches to formulate it as linear or quadratic optimization problems.

I'm currently solving it by formulating an optimization problem (which is basically the loss function L = G(x, y')), reformulating it into a poisson-like equation and then solving it using a multigrid method. This is a very standard way to compute such bas-reliefs from height or normal maps. The problems with this approach are:

a) It is relatively slow to transform a single image (1024x1024 takes 1-2 minutes). Neural networks could have the potential to be much faster. My goal is to be as close to real-time as possible. b) It is too restrictive because I have to formulate my loss function in such a way that it results in a linear system. This negatively affects the quality of the final result.

It is possible to compute how well a converted bas-relief y' = F(x) represents the original height map x, which leads to the loss function. As an example, you could compute the similarity of the normal vectors of x and y' (which will already result in a non-linear optimization problem). Similar normal vectors = low loss; different normal vectors = high loss. This type of non-linear optimization problem is what I want to solve, which is what lead me to the idea to solve this problem using a CNN.

joeggeli OP t1_it27igy wrote on October 20, 2022 at 12:24 PM

#152,323

Replying to Master-Ad-6411 (#145,274)

Thanks, it's great to hear that people have already done similar things successfully. The reason why I want to use a neural network over other non-linear solvers is because I need the computation of y' = F(x) to be as fast as possible during inference.

joeggeli OP t1_it27weu wrote on October 20, 2022 at 12:27 PM

#152,352

Replying to ThrowThisShitAway10 (#144,149)

I haven't heard of self-supervised learning before, but that looks very interesting. My problem might well under the umbreall of self-supervised learning, I'll have to look into that. Thanks :)

joeggeli OP t1_it27yj4 wrote on October 20, 2022 at 12:28 PM

#152,355

Replying to LaVieEstBizarre (#147,128)

The reason why I want to use a neural network over other non-linear solvers is because I need the computation of y' = F(x) to be as fast as possible during inference.

LaVieEstBizarre t1_it2axr3 wrote on October 20, 2022 at 12:52 PM

#152,581

Replying to joeggeli (#152,355)

Neural network doesn't actually mean machine learning, it only means machine learning if you do machine learning with it (i.e. optimisation with data). Otherwise it's just a general function approximator.

iateatoilet t1_it2ksrx wrote on October 20, 2022 at 2:05 PM

#153,376

Replying to joeggeli (#152,030)

Like others have noted, this is just a special case of supervised learning that looks a little different from regression. If you dig through scientific ML/physics-informed ML/similar papers you'll find lots of this popping up