meister2983
meister2983 t1_jdx06k9 wrote
Reply to comment by sineiraetstudio in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
Yes. RLHF increases accuracy on certain tests while decreasing calibration on others.
meister2983 t1_jdwu6ig wrote
Reply to comment by sineiraetstudio in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
It's necessary to improve overall performance; GPT-4 isn't just a thing to answer multiple-choice questions.
E.g., accuracy on adversarial questions (TruthfulQA) goes from 40% to 60%.
meister2983 t1_jdwt675 wrote
Reply to comment by was_der_Fall_ist in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
Also, this is for multiple-choice questions (MMLU). I don't think they reported whether the pre-RLHF model's confidence numbers on fill-in-the-blank world facts lined up with reality.
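For anyone unsure what "confidence aligned to reality" means in practice, here is a rough sketch of one common way to score it, expected calibration error. This is my own illustration, not what OpenAI reported: the function name, the ten equal-width bins, and the input format (a stated confidence in [0, 1] plus a right/wrong flag per answer) are all assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and actual accuracy.

    confidences: floats in [0, 1], one per answer (assumed input format).
    correct: booleans, True where the answer was right.
    """
    # Group answers into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Each bin contributes its confidence/accuracy gap, weighted by size.
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece
```

A well-calibrated model scores near 0: answers given at 90% confidence are right about 90% of the time.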
meister2983 t1_jdwswgt wrote
Reply to comment by arg_max in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
I asked it a bunch of factual questions on less commonly known stuff. It's either hallucinating or its confidence numbers are so poorly calibrated that they're useless.
meister2983 t1_jdgghu6 wrote
Reply to comment by signed7 in [N] ChatGPT plugins by Singularian2501
The Microsoft Research paper assessing the intelligence capabilities of GPT-4 effectively did this. If you just define APIs for the model to use under certain conditions, it will write the API call. Once you do that, it's straightforward for a layer on top to detect the API call, actually execute it, and write the result back.
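A minimal sketch of what that layer on top could look like. Everything here is made up for illustration: the `CALL[name](argument)` syntax, the `TOOLS` registry, and the toy `add` tool are assumptions, not the format used in the paper or by ChatGPT plugins.

```python
import re

# Hypothetical convention: the model writes tool calls as CALL[name](argument).
CALL_PATTERN = re.compile(r"CALL\[(\w+)\]\(([^)]*)\)")


def add(expression: str) -> str:
    """Toy tool: adds two integers written as 'a+b'."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"add": add}


def run_tool_calls(model_output: str) -> str:
    """Detect each call in the model's text, execute it, and write the result back."""

    def execute(match: re.Match) -> str:
        name, argument = match.group(1), match.group(2)
        tool = TOOLS.get(name)
        if tool is None:
            # Unknown tool: leave the call text untouched.
            return match.group(0)
        return tool(argument)

    return CALL_PATTERN.sub(execute, model_output)
```

For example, `run_tool_calls("The total is CALL[add](2+3).")` gives back the text with the call replaced by its result. A real system would then feed that text back to the model so it can continue from the result.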
meister2983 t1_j8wt70l wrote
Reply to comment by Lemonio in [OC] Prediction markets forecast the supreme court will end affirmative action by liortulip
Fair, there's variance, but I don't see what this has to do with student outcomes.
Utah has high test scores and low funding; DC is the exact opposite.
meister2983 t1_j8vh4y1 wrote
Reply to comment by Lemonio in [OC] Prediction markets forecast the supreme court will end affirmative action by liortulip
Offset by state and federal funding. See here.
> on average, poor students attend schools that are at least as well-funded as their more advantaged peers.
meister2983 t1_j8v4cdb wrote
Reply to comment by NikTheHNIC in [OC] Prediction markets forecast the supreme court will end affirmative action by liortulip
Because most school districts receive similar funding in the US.
meister2983 t1_je0s90f wrote
Reply to comment by ArnoF7 in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
GPT-4 is an extremely good pattern matcher - probably one of the best ever made. Most exams seem to be solvable with straightforward pattern matching (with no backtracking). The same thing applies to basic coding questions - it performs reasonably at the level of a human gluing Stack Overflow solutions together (with the obvious variable renaming/moving lines around/removing dead code/etc.)
It struggles at logical reasoning (when it can't "pattern match" the logical reasoning to something it was trained on).
Coding example: