londons_explorer t1_jcj8p9y wrote

Can we run things like this through github.com/OpenAI/evals?

They now have a few hundred tests, which makes it a good way to gauge performance.

9

Taenk t1_jckzuxm wrote

Sorry, I am not an expert, just an enthusiast, so this may be a stupid question: where can I see a list of these few hundred tests, and is there a page where I can see comparisons between different models?

3