xquizitdecorum t1_j2y0ire wrote

We should have a more rigorous definition of "outperform". What are we comparing? Your question touches on the idea of internal versus external validity: if the data is fundamentally flawed, there is a performance ceiling, because the data doesn't reflect the use case of the ML algorithm developed from it. The model may be internally valid (it is trained correctly on the data it was given) but have poor external validity (it doesn't carry over to the task it was built for).
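To make the distinction concrete, here is a rough toy sketch, not a real benchmark. Everything in it is a made-up assumption for illustration: the 1-D feature, the two label boundaries (0.0 in the collected data, 2.0 in the actual use case), and the helper names `make_grid_data`, `fit_threshold`, and `accuracy`. The model is fit correctly and scores well on a held-out split of the flawed data, yet does worse on data that reflects the use case.

```python
def make_grid_data(n, boundary):
    """Hypothetical data: n evenly spaced x in [-5, 5]; label is 1 when x > boundary."""
    xs = [-5 + 10 * i / (n - 1) for i in range(n)]
    return [(x, int(x > boundary)) for x in xs]


def accuracy(threshold, data):
    """Fraction of points where the rule 'x > threshold' matches the label."""
    return sum(int(x > threshold) == y for x, y in data) / len(data)


def fit_threshold(train):
    """Pick the observed x value that maximizes training accuracy."""
    candidates = sorted(x for x, _ in train)
    return max(candidates, key=lambda t: accuracy(t, train))


# Assumption: the collection process was flawed and put the label boundary at 0.0.
collected = make_grid_data(400, boundary=0.0)
train, holdout = collected[0::2], collected[1::2]

# Assumption: in the real use case the boundary is actually at 2.0.
deployment = make_grid_data(400, boundary=2.0)

threshold = fit_threshold(train)
internal_acc = accuracy(threshold, holdout)     # same flawed source as training
external_acc = accuracy(threshold, deployment)  # the task we actually care about

print(f"learned threshold: {threshold:.3f}")
print(f"held-out accuracy on collected data: {internal_acc:.3f}")
print(f"accuracy on deployment data: {external_acc:.3f}")
```

The held-out score only tells you the model learned what the collected data says. The gap to the deployment score is the ceiling imposed by the data itself, and no amount of tuning on the collected data closes it.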