Submitted by Osemwaro t3_ziwuna in MachineLearning

I've been trying to empirically assess what biases ChatGPT has about certain things when I give it minimal information about what I want. The approach that I've tried is to repeatedly make a request in a new thread, look at the distribution of key words, phrases or word/phrase categories across its responses, and compare these distributions across different requests. E.g. one set of requests that I've made have the structure:

>Make up a realistic story about (a|an) <TRAIT> person. Include their name and a description of their appearance.

I collected 10 responses for each of the following **<TRAIT>**s: "intelligent", "unintelligent", "devious", "trustworthy", "peaceful", "violent", and did the same for 2 other request structures that request similar information, using the same set of **<TRAIT>**s. So I have 30 responses in total for each of the 6 **<TRAIT>**s.

Before I finished writing a program to analyse the results, some biases stood out immediately. E.g. for "intelligent", the responses were almost always about women, except for one or two that were about a person called Alex, of unspecified gender (it used "they/them" pronouns in those responses). The people in these responses were almost always scientists too, and the names were nowhere near as diverse as they could have been (e.g. for the request structure above, 4 of the 10 women in the responses were called Samantha). If I repeatedly make the same request in the same thread, these characteristics of the responses do display more diversity, but the responses all have the same structure (e.g. the same number of paragraphs, and often near-identical sentences in corresponding paragraphs).

It wasn't clear to me if these biases are representative of its biases across a wide range of interactions, or if it's just bad at drawing random samples in its first response, for some reason. So I tried a simpler request, of giving me the name of a vegetable. I asked 35 times, and it said "carrot" 30 times and "broccoli" 5 times. The results of all my vegetable-name interactions are here. I also tried asking it to name an American president in 6 threads, and it said "George Washington" each time, and I tried asking it to name an intelligent person, and it usually said Albert Einstein, although it did occasionally say Stephen Hawking.

Questions

Assuming that carrots do not constitute anywhere near 85% of the vegetables in ChatGPT's training set, can anyone suggest likely causes for this bias in its initial responses? E.g. what characteristics of the reward function are likely to have made its initial responses so biased, compared to the training data? Is this a common phenomenon in conversational agents trained by RL?

2

Comments

You must log in or register to comment.

farmingvillein t1_j004cnd wrote

Yes, it could be a function of RL, or it could be simply how they are sampling from the distribution.

If this is something you truly want to investigate, I'd start by first running the same tests with "vanilla" GPT (to possibly include avoiding the InstructGPT variant, if you are concerned about RL distortion).

As a bonus, most of the relevant sampling knobs are exposed, so you can make it more or less conservative in terms of how widely it samples from the distribution (this, potentially, is the bigger driver in what you are seeing).

3

red75prime t1_j015ur9 wrote

Looks like the network mimics the representativeness heuristic (skewed by anti-bias bias).

1

Osemwaro OP t1_j04kh8k wrote

Ah yes, I see that the GPT-3 tutorial discusses controlling the entropy as you described with a temperature parameter, which presumably corresponds to a softmax temperature. That sounds like a likely culprit.

I don't have an NLP background, so I'm not familiar with the literature, but I did some Googling and came across a recent paper called "Softmax Bottleneck Makes Language Models Unable to Represent Multi-mode Word Distributions", which says

>In this paper, we discover that, when predicting the next word probabilities given an ambiguous context, GPT-2 is often incapable of assigning the highest probabilities to the appropriate non-synonym candidates.

The GPT-3 paper says that GPT-2 and GPT-3 "use the same model and architecture", so I wonder if the softmax bottleneck is part of the problem that I've observed too.

1

AlexeyKruglov t1_j04rrg5 wrote

Probably because temperature parameter is not 1.0 when the model samples next tokens. Having it above 1 leads to the bias towards the more probable tokens.

2

Nameless1995 t1_j05chgd wrote

> If I repeatedly make the same request in the same thread, these characteristics of the responses do display more diversity, but the responses all have the same structure (e.g. the same number of paragraphs, and often near-identical sentences in corresponding paragraphs).

For proper results you should resample by clicking the "try again" button (or thread reset). Otherwise if by chance the first sample talks about a woman scientist named Samantha, all the later responses would be biased by that. Your next samples won't be independent by selectively biased based on the initial sample. To control for that, when comparing multiple samples you should make sure they are sampled under the similar conditions besides differences in rng (i.e use the "try again" given the same past conversation, or ask all of them in reset state).

> So I tried a simpler request, of giving me the name of a vegetable. I asked 35 times, and it said "carrot" 30 times and "broccoli" 5 times. The results of all my vegetable-name interactions are here. I also tried asking it to name an American president in 6 threads, and it said "George Washington" each time, and I tried asking it to name an intelligent person, and it usually said Albert Einstein, although it did occasionally say Stephen Hawking.

Sounds about expected.

1

Osemwaro OP t1_j06rbf6 wrote

I know -- all of the statistics that I gave are based on samples created in new threads or with "try again". I only mentioned what happens when I repeat a request within one thread to prove that ChatGPT knows the names of other vegetables, etc.

1

Osemwaro OP t1_j06y0j1 wrote

I did wonder if its developers' attempts to address the biases in the training data may have inadvertently led to it being biased in the opposite direction in some cases (if that's what you mean by "anti-bias bias").

My goal was to identify and measure expressions of bias that are unlikely to be censored by the content filter, including rarely discussed biases (e.g. it described a disproportionate number of the women in its stories about intelligent people as being tall and having a slender/athletic build). But I can't easily get a representative sample of responses that it might give over the course of millions of interactions with users if its developers have used a low softmax temperature to massively reduce its entropy, as some other commenters have suggested.

1

Osemwaro OP t1_j06z76a wrote

Yeah, u/farmingvillein suggested that before you. The temperature parameter behaves like temperature in physics though, so low temperatures (i.e. temperatures below 1) decrease entropy, by biasing it towards the most probable tokens, and high temperatures increase entropy, by making the distribution more uniform.

1