Kinexity t1_j9yyi6o wrote
Reply to comment by Depression_God in Likelihood of OpenAI moderation flagging a sentence containing negative adjectives about a demographic as 'Hateful'. by grungabunga
That's true, but assuming they can somehow tweak flagging rates (that is, the rates aren't just the automatic result of feeding a flagging model a bunch of hateful tokens), it's pretty fucked up that there are differences between races and sexes.
Obviously that rests on an assumption, and it shows that they should have been more transparent about how flagging works.
Depression_God t1_j9z6e93 wrote
The only problem we can be certain of is the lack of transparency. Regardless of which direction the bias runs or how strong it is, they should always be transparent about how it works.