
vladosaurus OP t1_j9gu3cr wrote

Ideally, we would generate many such examples without looking at them first, then wrap them in a test suite that uses automatic differentiation to check how many come out correct.
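To make that concrete, here is a rough sketch of what such a check could look like. It is only an illustration: the tiny `Dual` class, `autodiff`, and `derivative_is_correct` are made-up names, and the hand-rolled forward-mode autodiff stands in for a real library. The `good` and `bad` lambdas play the role of model-generated derivatives.

```python
import math
import random


class Dual:
    """Dual number val + der*eps (eps**2 == 0); der carries the derivative."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants (derivative 0).
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        # Product rule.
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__


def sin(x):
    """sin that works on both floats and Dual numbers (chain rule)."""
    if isinstance(x, Dual):
        return Dual(math.sin(x.val), math.cos(x.val) * x.der)
    return math.sin(x)


def autodiff(f, x):
    """Derivative of f at x via forward-mode automatic differentiation."""
    return f(Dual(x, 1.0)).der


def derivative_is_correct(f, candidate, trials=20, tol=1e-9, seed=0):
    """Compare a candidate derivative against autodiff at random points."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(-2.0, 2.0)
        if not math.isclose(autodiff(f, x), candidate(x), rel_tol=tol, abs_tol=tol):
            return False
    return True


# Function to differentiate: f(x) = x^2 * sin(x)
f = lambda x: x * x * sin(x)
# Hypothetical model answers: one right, one wrong.
good = lambda x: 2 * x * math.sin(x) + x * x * math.cos(x)
bad = lambda x: 2 * x * math.cos(x)

print(derivative_is_correct(f, good), derivative_is_correct(f, bad))
```

Checking at random points only gives numerical evidence, not a proof that two expressions are identical, but it is cheap enough to run over a large number of generated answers.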

Something similar to what the authors did for the OpenAI Codex model. They provided the function signature and the docstring and prompted the model to generate the rest of it. Then they ran the generated functions against unit tests and calculated how many of them pass. That's the pass@k metric: the chance that at least one of k generated samples passes the tests.
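For reference, a small sketch of how pass@k can be computed from raw results, using the unbiased estimator from the Codex paper: with n samples per problem, of which c pass, pass@k = 1 - C(n-c, k) / C(n, k). The function name `pass_at_k` and the example numbers are just for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated, c: samples that passed the tests,
    k: number of samples we are allowed to draw.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Probability that a random size-k subset contains no passing sample.
    # math.comb returns 0 when k > n - c, so the result is then exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative numbers only: 10 samples, 3 of them pass.
print(pass_at_k(10, 3, 1))
print(pass_at_k(10, 3, 5))
```

The score for a whole benchmark is then the average of this value over all problems.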

I am not aware of anything similar being done for differentiation. Maybe there is; I would have to search for it.


Delacroid t1_j9itr09 wrote

Well, that could be an amazing post to read: how often does it get math questions right, but with a statistically significant number of samples, so that we can actually compare it to the state of the art, such as Galactica.
