canbooo

canbooo t1_j8onqb4 wrote

No, it's a valid question; I just find it difficult to give examples that are easy to understand, but let me try. Yes, OP's example is not a good one to demonstrate the use case. Let us think about a swarm of drones and their physics, specifically the airflow around them. Hypothetically, you may be able to describe the physics of a single drone accurately, although this would probably take quite some time in reality: think days on a simple laptop for a specific configuration if you really want high accuracy. Nevertheless, if you want to model, say, 50 drones, things get more complicated. The airflow of one affects the behavior/airflow of the others, and new turbulence sources and other effects emerge. Actually simulating such a complex system may be infeasible even with supercomputers. Moreover, you are probably interested in many configurations (flight patterns, drone design, etc.) so that you can choose the best one. In this case, functional interpolation is not very helpful due to the interactions and newly emerging effects, as we only know the form of the function for a single drone. Sure, you know the underlying equations, but you still can't really predict the behavior of the whole without solving them, which, as mentioned, is costly. The premise of PINNs in this case is to learn to predict the behavior of this system, where the inductive bias is expected to decrease the number of samples required for generalization.
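To make the inductive-bias idea concrete, here is a minimal PINN sketch in PyTorch for a toy harmonic oscillator u''(t) + ω²u(t) = 0 with u(0) = 1, u'(0) = 0. This is my own toy construction, not the drone case; there, the residual would be replaced by the (much harder) governing flow equations.

```python
import torch

torch.manual_seed(0)
omega = 2.0  # hypothetical frequency for the toy oscillator

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(5000):
    # Collocation points: no labeled (t, u) data needed here.
    t = (torch.rand(128, 1) * 4.0).requires_grad_(True)
    u = net(t)
    du = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, t, torch.ones_like(du), create_graph=True)[0]
    # The physics residual is the inductive bias: it penalizes
    # predictions that violate the known governing equation.
    residual = (d2u + omega ** 2 * u).pow(2).mean()

    # Initial conditions u(0) = 1, u'(0) = 0 pin down the particular solution.
    t0 = torch.zeros(1, 1, requires_grad=True)
    u0 = net(t0)
    du0 = torch.autograd.grad(u0, t0, torch.ones_like(u0), create_graph=True)[0]
    loss = residual + (u0 - 1.0).pow(2).sum() + du0.pow(2).sum()

    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point is that the loss terms come from the equations themselves, so the network can generalize from far fewer (or zero) labeled samples than a purely data-driven model.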

4

canbooo t1_j8ohxf9 wrote

Especially in engineering applications, i.e. with complex systems/physics, the fundamental physical equations are known, but not how they influence each other and the observed data. Alternatively, they are too expensive to compute for all possible states. In those cases, we already build ML models on the data to, e.g., optimize designs or do stuff like predictive maintenance. However, these models often do not generalize well to out-of-domain samples, and producing samples is often very costly, since we either need laboratory experiments or have to actually create designs that are bound to fail (stupid examples, but for clarity: think planes with rectangular wings, or cranes so thin they could not even pick up a feather; real-world use cases are more complicated to fit into these brackets). In some cases, the only available data may come from products in use, and you may want to model failure modes without observing them. In all these cases PINNs could help. However, none of the models I have tested so far are actually robust to real-world data, and they require much more tuning compared to MLPs, RNNs etc., which are already more difficult to tune compared to more conventional approaches. So I am yet to find an actual use case that is not academic.

TL;DR: physics (and simulations) may be inefficient/inapplicable in some cases. PINNs allow us to embed our knowledge about first principles in the form of an inductive bias to improve generalization to unseen/unobservable states.

1

canbooo t1_j8mfn6z wrote

Since you have been waiting for 6 hours without any response, let me share my 5 cents. You are probably inspired by ChatGPT and the success of HRL, so why not start there: https://openreview.net/forum?id=20-xDadEYeU

But this idea is not novel, only its application to NLP; it has been applied to other areas like games and autonomous driving. They use PPO, which is to me the most robust on-policy algorithm. However, any other on-policy algorithm could have been used instead, and off-policy methods like SAC could improve sample efficiency but might run into convergence problems. Also, you could be more generalistic and try off-policy algorithms independent of a specific language model. This would allow using the same experience/value model to fine-tune other LMs, but it might require much, much more data to achieve similar performance. In any case, the application of RL to NLP (except for language-based games) is quite new, and many questions remain to be answered.
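For reference, here is roughly what the PPO clipped objective looks like; this is a generic PyTorch sketch, and the tensor names (`logp_new`, `logp_old`, `advantages`) are placeholders for per-token quantities that a real RLHF pipeline would compute from the LM, a value model, and a reward model.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate loss; gradients flow only through logp_new."""
    ratio = torch.exp(logp_new - logp_old.detach())
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # PPO maximizes the minimum of the two surrogates; negate to get a loss.
    return -torch.min(unclipped, clipped).mean()
```

In practice you would add a KL penalty against the frozen initial model and a value-function loss on top, as in the usual RLHF recipe.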

3

canbooo t1_j7z0lku wrote

I agree about the size of the difference yet disagree with the examples, as there is ML research considering all three (causal ML, conformal ML/prediction/forecasting, AI safety, reliability, etc.). I think the difference is more like deduction vs. induction, in the sense that the processes of finding the answers are different. Since I am finishing pooping on corporate time, I will keep this short.

ML: Data -> Method -> Hypothesis -> Answers

Statistics: Hypothesis -> Method -> Data -> Answers

This may be too simplistic, and please do propose a better distinction, but do not postulate that ML does not care about the things statistics does.

0

canbooo t1_iydgqzt wrote

Oh, fair enough, my bad, I misunderstood what you meant. You are absolutely right for that case. For me the question is rather P(X >= x) = .2, since having more intelligence implies you have (implicitly at least) 20%, but this is already too many arguments for a joke. Enjoy the conference!

1

canbooo t1_iy3pylo wrote

Bad initialization can be a problem if you do it yourself (i.e., with bad scaling of the weights) and if you are not using batch or other kinds of normalization, since it might make your neurons die. E.g., a tanh neuron with too large an input scale will only output -1 or 1 for all data, which leaves it dead, i.e., not learning anything due to a ~0 gradient over the entire data set.
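A quick sanity check of this in PyTorch (my own toy construction): scale a single tanh unit's weight up and watch its gradient vanish.

```python
import torch

torch.manual_seed(0)
x = torch.randn(1000, 1)

for scale in (1.0, 100.0):
    w = torch.full((1, 1), scale, requires_grad=True)  # the "initialization"
    y = torch.tanh(x @ w)  # saturates to ±1 for nearly all inputs at scale 100
    y.sum().backward()
    print(f"weight scale {scale:>6.1f}: mean |grad| = {w.grad.abs().mean():.2e}")
```

The badly scaled unit receives a gradient several orders of magnitude smaller, so it effectively stops learning on the whole data set.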

6

canbooo t1_ivydtlt wrote

You are right, what I am asking may be practically irrelevant, and I really should RTFP. However, think about the edge case of one layer with one input and one output. Each node having 1 as its input weight sees the same gradient, similar to the nodes having 0. Increasing the number of inputs makes it combinatorially improbable to end up with the same configuration, but increasing the number of nodes in a layer makes it likelier. So it could be relevant for low dimensions or models with a narrow bottleneck. I am sure the authors have already thought about this problem and either discarded it as quite unlikely in their tested settings, or they already have a solution/analysis somewhere in the paper; hence my question.
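A toy construction of that edge case (mine, not the paper's): if two hidden units start with identical incoming and outgoing weights, they receive identical gradients, so gradient descent keeps them tied forever.

```python
import torch

torch.manual_seed(0)
x = torch.randn(64, 1)
y = torch.randn(64, 1)

w1 = torch.ones(1, 2, requires_grad=True)          # identical incoming weights
w2 = torch.full((2, 1), 0.5, requires_grad=True)   # identical outgoing weights

loss = ((torch.tanh(x @ w1) @ w2 - y) ** 2).mean()
loss.backward()
print(w1.grad)  # both columns equal -> the units never differentiate
print(w2.grad)  # both rows equal, too
```

Breaking the tie requires some asymmetry, either in the initialization itself or injected elsewhere (e.g. dropout).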

2

canbooo t1_iv6j73z wrote

I think the comment above you is gold, and you are approaching this kind of wrong if this is about research. The fact that they are not (yet) solving CV/NLP tasks is an advantage rather than a disadvantage. Although I must admit I see a more direct relation to RL than to anything else, this makes it even more interesting, since any idea you come up with will probably be novel.

6