Submitted by Steven_Johnson34 t3_zspu96 in MachineLearning
Since we are heading into the holiday season, I thought it would be interesting to see whether you could build a model that judges morality from a user's Reddit comments. I used Scikit-Learn's Logistic Regression model for this.
I started by downloading around 750 comments from Social Grep's website. They have pulled Reddit comments from different sets of subreddits; I drew from their datasets for the confession-style subreddits, the irl subreddits, and the dataset subreddit. I labeled the comments manually against a fixed morality rubric. Once they were scored, I trained and tested the logistic regression model on those comments.
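The training step above could look something like the sketch below. This is not the author's actual code: the TF-IDF features, the train/test split, and the `train_naughty_nice_model` helper are all assumptions about how one might wire manually labeled comments into Scikit-Learn's logistic regression.

```python
# Hypothetical sketch of the training step: labeled comments in, fitted
# TF-IDF + logistic-regression pipeline out. Labels: 1 = nice, 0 = naughty.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

def train_naughty_nice_model(comments, labels):
    """Fit a text classifier on manually labeled comments and report
    held-out accuracy. All hyperparameters here are illustrative guesses."""
    X_train, X_test, y_train, y_test = train_test_split(
        comments, labels, test_size=0.2, random_state=42, stratify=labels
    )
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)
```

With only ~750 labeled comments, the held-out score from a split like this would be noisy, so cross-validation would be a reasonable upgrade.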
For the per-user testing, I used PRAW to pull the 50 most recent comments for the username provided in the Hex application. I ran the trained model to get each comment's probability of being nice, averaged those probabilities, and used the average to decide whether the user was naughty or nice. A script then emails the user a CSV with all of the tested comments and the final score.
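The fetch-and-average step might be sketched as below. The PRAW call mirrors what the post describes (a user's 50 most recent comments); the 0.5 nice/naughty cutoff and both helper names are my assumptions, not the author's actual values.

```python
def fetch_recent_comments(username, reddit, limit=50):
    """Pull a user's most recent comment bodies via PRAW.

    `reddit` is an authenticated praw.Reddit instance; import is local so
    the scoring helper below also works without PRAW installed.
    """
    import praw  # noqa: F401  (third-party dependency)
    return [c.body for c in reddit.redditor(username).comments.new(limit=limit)]

def score_user(comment_probs, cutoff=0.5):
    """Average per-comment P(nice) and map the mean to a verdict.

    The 0.5 cutoff is an assumed default, not the author's tuned value.
    """
    avg = sum(comment_probs) / len(comment_probs)
    return avg, ("nice" if avg >= cutoff else "naughty")
```

Keeping the verdict logic separate from the PRAW fetch makes it easy to test the scoring on canned probabilities without hitting the Reddit API.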
Based on the results that have come through so far, the model is definitely biased toward giving the user a nice verdict. I believe that is because the training data is roughly 70% nice versus naughty. Does anyone have a way to keep the model from being biased like that?
Feel free to try the app out and let me know what you think!
A1-Delta t1_j1ao5gn wrote
Would be great if you'd just show me the results in the browser. I'm not going to supply my email address no matter how many times you pinky promise it won't end up in a marketer's hands.