Submitted by joeggeli t3_y83jow in MachineLearning
I am trying to solve a problem where a given input vector x must be transformed into an optimal output vector y. Both vectors have the same length. The optimal transforming function y = F(x) is unknown. I can, however, measure how well a given output y' matches the desired properties for a given x. More formally, there is a known and differentiable loss function L = G(x, y'). This loss function differs from the usual loss functions used to train neural networks (e.g. MSE) in that it does not require any labels: the loss and its derivative can be computed from the input x and the network output y' alone.
Further, I have many different input vectors X = {x_1, x_2, …, x_n}, and each has a corresponding optimal output vector in Y = {y_1, y_2, …, y_n}, i.e. the vectors that minimize the loss function G(x_i, y_i). Now I want to train a neural network (a CNN specifically, since my inputs and outputs are actually images) that approximates this optimal function F(x) as closely as possible for all possible inputs, including inputs it has never seen before.
Simple example: the true underlying function F(x) could be the DFT (discrete Fourier transform). For the sake of this example, let's pretend that we do not know how to compute it. We do, however, know how to measure how well a given frequency spectrum y' = F(x) matches the properties of the DFT for the specific input x. This is our loss function L = G(x, y'). Now we compute the derivative of this loss function to update the weights of the neural network, so that its output y' gets a little closer to the optimal y next time.
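To make the example concrete, here is one way such a label-free loss could look in code. This is only a minimal sketch, assuming the "DFT property" is measured by how well the inverse DFT of the predicted spectrum reconstructs x; the function name and tensor layout are my own illustration, not anything from the post.

```python
import torch

def dft_property_loss(x, y_pred):
    """Hypothetical G(x, y') for the DFT example: if y_pred were the true
    spectrum of x, its inverse DFT would reconstruct x exactly.
    Assumes y_pred stacks real and imaginary parts in the last dimension,
    i.e. shape (batch, N, 2) for inputs x of shape (batch, N)."""
    spectrum = torch.complex(y_pred[..., 0], y_pred[..., 1])
    x_reconstructed = torch.fft.ifft(spectrum).real
    return torch.mean((x_reconstructed - x) ** 2)
```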
Now to my question: since L = G(x, y') is differentiable, it should be possible to train a neural network to approximate F(x) using SGD and its variants, right? Or am I missing something obvious here? It kind of feels like a "hack", and I feel like if it were possible to solve optimization problems this way, more people would be doing it, but I can't find any literature on it. Is there a name for this type of thing that I'm trying to do?
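For what it's worth, the training loop I have in mind would look roughly like this in PyTorch. It is just a sketch: the toy MLP, the vector length N, and the random inputs stand in for the real CNN and image data, and it reuses the hypothetical dft_property_loss from above as the known loss G.

```python
import torch
import torch.nn as nn

N = 64                                              # hypothetical vector length
# Toy stand-in for the CNN; any differentiable model with a matching
# output shape works the same way.
model = nn.Sequential(nn.Linear(N, 256), nn.ReLU(), nn.Linear(256, 2 * N))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

for step in range(1000):
    x = torch.randn(32, N)                          # batch of inputs, no labels
    y_pred = model(x).view(-1, N, 2)                # y' = F(x), real/imag parts
    loss = dft_property_loss(x, y_pred)             # L = G(x, y'), label-free
    optimizer.zero_grad()
    loss.backward()                                 # gradients flow through G
    optimizer.step()
```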
Thanks
ThrowThisShitAway10 t1_isyfpmq wrote
Isn't this just considered self-supervised learning? Like in an autoencoder you also have a loss L = G(x, y'). I don't see why it wouldn't work.
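For comparison, the autoencoder case mentioned here fits the same pattern; a toy sketch (names and sizes are illustrative only) showing that the reconstruction loss is exactly a label-free G(x, y'):

```python
import torch
import torch.nn as nn

# Toy autoencoder: the reconstruction loss depends only on the input x
# and the network output y', with no labels involved.
encoder = nn.Sequential(nn.Linear(64, 16), nn.ReLU())
decoder = nn.Linear(16, 64)

x = torch.randn(32, 64)
y_pred = decoder(encoder(x))                  # y' = F(x)
loss = torch.mean((y_pred - x) ** 2)          # G(x, y') = ||y' - x||^2
loss.backward()                               # trainable with SGD as usual
```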