meister2983 t1_je0s90f wrote on March 28, 2023 at 4:18 PM

Reply to comment by ArnoF7 in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-

GPT-4 is an extremely good pattern matcher - probably one of the best ever made. Most exams seem to be answerable with straightforward pattern matching (with no backtracking). The same thing applies to basic coding questions - it reasonably performs at the level of a human gluing Stack Overflow solutions together (with the obvious variable renaming/moving lines around/removing dead code/etc.)

It struggles at logical reasoning (when it can't "pattern match" the logical reasoning to something it's trained on).

Coding example:

Had no problem writing a tax calculator for ordinary income with progressive tax brackets
It struggles to write a program to calculate tax on long-term capital gains (US tax code), which is very similar to the above, except it has an offset (you start bracket indexing at ordinary income). I'd think this is actually pretty easy for a CS student, especially if they saw the solution above -- GPT-4 struggled though, as it doesn't really "reason" about code the way a human would, and it would generate solutions that are obviously wrong to a human.
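
To make the "offset" concrete, here is a minimal sketch of both calculators. The bracket thresholds and rates are made-up round numbers for illustration, not actual US tax tables, and the function names are my own. The only difference in the capital gains case is that the gains occupy the slice of the schedule that starts where ordinary income ends.

```python
import math

def progressive_tax(income, brackets):
    """Tax on `income` under ascending (upper_bound, rate) brackets."""
    tax = 0.0
    lower = 0.0
    for upper, rate in brackets:
        if income <= lower:
            break
        # Only the part of income that falls inside this bracket is taxed at its rate.
        tax += (min(income, upper) - lower) * rate
        lower = upper
    return tax

def capital_gains_tax(ordinary_income, gains, gains_brackets):
    """Tax on gains stacked on top of ordinary income.

    The gains fill the interval [ordinary_income, ordinary_income + gains]
    of the gains schedule, so the tax is the difference of two plain
    progressive computations on that schedule.
    """
    return (progressive_tax(ordinary_income + gains, gains_brackets)
            - progressive_tax(ordinary_income, gains_brackets))

# Hypothetical schedules, for illustration only (NOT real tax tables).
ORDINARY_BRACKETS = [(10_000, 0.10), (40_000, 0.20), (math.inf, 0.30)]
GAINS_BRACKETS = [(40_000, 0.00), (400_000, 0.15), (math.inf, 0.20)]

if __name__ == "__main__":
    print(progressive_tax(50_000, ORDINARY_BRACKETS))
    print(capital_gains_tax(30_000, 20_000, GAINS_BRACKETS))
```

With these made-up schedules, 30,000 of ordinary income plus 20,000 of gains puts the first 10,000 of gains in the 0% band and the remaining 10,000 in the 15% band.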

meister2983 t1_jdx06k9 wrote on March 27, 2023 at 8:24 PM

Reply to comment by sineiraetstudio in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9

Yes. RLHF increases accuracy on certain tests while decreasing calibration on others.

meister2983 t1_jdwu6ig wrote on March 27, 2023 at 7:46 PM

Reply to comment by sineiraetstudio in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9

It's necessary to improve overall performance; GPT-4 isn't just a thing to answer multiple choice questions.

E.g., accuracy on adversarial questions (TruthfulQA) goes from 40% to 60%.

meister2983 t1_jdwt675 wrote on March 27, 2023 at 7:40 PM

Reply to comment by was_der_Fall_ist in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9

Also, this is for multiple choice questions (MMLU). I don't think they reported whether the pre-RLHF model's confidence numbers on fill-in-the-blank world facts aligned with reality.

meister2983 t1_jdwswgt wrote on March 27, 2023 at 7:38 PM

Reply to comment by arg_max in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9

Asked a bunch of factual questions on less commonly known stuff. It's either hallucinating or has such poorly calibrated confidence numbers it is useless.

meister2983 t1_jdgghu6 wrote on March 24, 2023 at 5:32 AM

Reply to comment by signed7 in [N] ChatGPT plugins by Singularian2501

The Microsoft Research paper assessing the intelligence capabilities of GPT-4 effectively did this. If you just define APIs for the model to use under certain conditions, it will write the API call. Once you do that, it's straightforward for a layer on top to detect the API call, actually execute it, and write the result back.
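
A rough sketch of what such a layer could look like. The `CALL[name](argument)` syntax, the `run_tools` function, and the two toy tools are all hypothetical, invented here for illustration; this is not the format used in the paper or by ChatGPT plugins.

```python
import re
from typing import Callable, Dict

# Hypothetical call syntax the model is told to emit: CALL[tool_name](argument)
CALL_PATTERN = re.compile(r"CALL\[(\w+)\]\((.*?)\)")

def run_tools(model_output: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Detect tool calls in the model's text, execute them, and write the results back."""
    def execute(match: "re.Match[str]") -> str:
        name, argument = match.group(1), match.group(2)
        tool = tools.get(name)
        if tool is None:
            # Unknown tool: leave the call text untouched.
            return match.group(0)
        return tool(argument)
    return CALL_PATTERN.sub(execute, model_output)

# Toy tools standing in for real APIs (assumptions for the example).
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split(","))),
    "upper": lambda arg: arg.upper(),
}

if __name__ == "__main__":
    print(run_tools("The total is CALL[add](2,3).", TOOLS))
```

In a real system the substituted text would typically be fed back to the model as extra context so it can continue its answer, rather than being shown to the user directly.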

meister2983 t1_j8wt70l wrote on February 17, 2023 at 2:51 PM

Reply to comment by Lemonio in [OC] Prediction markets forecast the supreme court will end affirmative action by liortulip

Fair, there's variance, but I don't see what this has to do with outcomes of students.

Utah has high test scores, and low funding. DC the exact opposite.

meister2983 t1_j8vh4y1 wrote on February 17, 2023 at 6:03 AM

Reply to comment by Lemonio in [OC] Prediction markets forecast the supreme court will end affirmative action by liortulip

Offset by state and federal funding. See here.

> on average, poor students attend schools that are at least as well-funded as their more advantaged peers.

meister2983 t1_j8v4cdb wrote on February 17, 2023 at 3:58 AM

Reply to comment by NikTheHNIC in [OC] Prediction markets forecast the supreme court will end affirmative action by liortulip

Because most school districts receive similar funding in the US.