notdelet t1_j9g627c wrote

> Assuming Gaussianity and then using maximum likelihood yields an L2 error minimization problem.

Incorrect; that's only true if you fix the scale parameter. I normally wouldn't nitpick like this, but your unnecessary use of bold made me.

> (if you interpret training as maximum likelihood estimation)

> a squared loss does not "hide a Gaussian assumption".

It does... if you interpret training as (conditional) MLE. Give me a non-Gaussian distribution whose maximum likelihood estimator yields an MSE loss. Also, residuals are explicitly not orthogonal projections whenever the variables are dependent.
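For concreteness, here is a minimal sketch of that correspondence, assuming the scale $\sigma$ is fixed (per the caveat above). For a conditional model $p(y \mid x) = \mathcal{N}(y \mid f_\theta(x), \sigma^2)$,

$$-\log \prod_{i=1}^{n} p(y_i \mid x_i) = \frac{1}{2\sigma^2} \sum_{i=1}^{n} \bigl(y_i - f_\theta(x_i)\bigr)^2 + \frac{n}{2} \log(2\pi\sigma^2),$$

so maximizing the conditional likelihood over $\theta$ is the same problem as minimizing the squared error.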

0

notdelet t1_j7vv9pi wrote

You can get constrained optimization in general for unconstrained nonlinear problems (see the work N Sahinidis has done on BARON). The feasible sets are defined in the course of solving the problem and introducing branches (a toy sketch of the branching idea is below). But that approach is slow, doesn't scale to NN sizes, and doesn't really answer the question ML folks are asking (see the talk at the IAS on "Is Optimization the Right Language for ML").
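Not BARON's actual algorithm, but a toy sketch of the branching idea: an interval branch-and-bound on a made-up 1-D nonconvex function, where the sub-intervals (the feasible set of each branch) are created as the solve proceeds. The objective, bounds, and Lipschitz constant are all assumptions chosen for illustration.

```python
import math

def f(x):
    # Made-up nonconvex objective with several local minima on [-2, 3].
    return (x - 1.0) ** 2 + 0.5 * math.sin(5.0 * x)

def lower_bound(lo, hi, lipschitz=12.0):
    # Crude Lipschitz lower bound on f over [lo, hi]; the constant 12 is an
    # assumed (loose) bound on |f'| over [-2, 3], not what a real solver uses.
    mid = 0.5 * (lo + hi)
    return f(mid) - lipschitz * 0.5 * (hi - lo)

def branch_and_bound(lo, hi, tol=1e-4):
    best_x, best_val = lo, f(lo)      # incumbent solution
    stack = [(lo, hi)]                # live branches = sub-regions of the feasible set
    while stack:
        a, b = stack.pop()
        if lower_bound(a, b) >= best_val - tol:
            continue                  # prune: this branch cannot beat the incumbent
        m = 0.5 * (a + b)
        if f(m) < best_val:
            best_x, best_val = m, f(m)
        if b - a > tol:               # branch: split the region and keep refining
            stack.append((a, m))
            stack.append((m, b))
    return best_x, best_val

print(branch_and_bound(-2.0, 3.0))
```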

2

notdelet t1_j48yvht wrote

Hot take: "foundation models" is pure branding, so if they say it's foundation models, then it will be foundation models that we're all using.

4