Kinexity t1_j9yyi6o wrote on February 25, 2023 at 4:29 PM

Reply to comment by Depression_God in Likelihood of OpenAI moderation flagging a sentence containing negative adjectives about a demographic as 'Hateful'. by grungabunga

That's true, but assuming they can somehow tweak flagging rates (as in, it's not that they fed some flagging model a bunch of hateful tokens and the rest is automatic), then it's pretty fucked up that there are differences between races and sexes.

Obviously that's based on an assumption, which itself shows that they should have been more transparent about how flagging works.

Depression_God t1_j9z6e93 wrote on February 25, 2023 at 5:21 PM

The only problem we can be certain of is the lack of transparency. Regardless of which direction or how strong the bias is, they should always be transparent about how it works.