brightbehaviorist

brightbehaviorist t1_iu964wd wrote

It’s just not true that the absolute number of crimes “should” correlate with the ratio of trips interrupted by crime to trips attempted, even if both routes have been taken at least some minimum number of times (which your model has no way of knowing, anyway). If you don’t understand this very basic bit of data science, it’s totally reckless for you to be offering people advice on where to walk.

Look, there were 485 murders in NYC in 2021, compared to 337 in Baltimore in the same period. But we all know better than to say that Baltimore is safer than NYC, because NYC has many times more people in it than we do. You have to correct for population by comparing murders per 100k residents or something like it. When you do that, you see that the raw count tells you nothing about the relative risk!
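To put rough numbers on that, here is a minimal Python sketch of the per-100k correction. The murder counts are the ones quoted above; the population figures are approximate 2021 estimates that I'm assuming for illustration, so treat the output as ballpark only.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Convert a raw count into a rate per 100,000 residents."""
    return count / population * 100_000


# Murder counts come from the comment above.
# Populations are rough 2021 estimates (an assumption, not exact figures).
cities = {
    "NYC": {"murders": 485, "population": 8_470_000},
    "Baltimore": {"murders": 337, "population": 576_000},
}

rates = {
    name: rate_per_100k(c["murders"], c["population"])
    for name, c in cities.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} murders per 100k residents")
```

With these assumed populations, Baltimore's rate comes out roughly ten times NYC's, even though NYC has the larger raw count.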

If you don’t have the information you need for the denominator of the relative risk ratio, there’s no amount of “testing” that can show your model works.

1

brightbehaviorist t1_iu8tmct wrote

Another big problem with this kind of data mining is that it doesn’t account for the base rate of route use.

Say there’s a very popular, busy pedestrian route that thousands of people walk every day. Over 6 months, there are four crimes in the database for that stretch, but that’s out of hundreds of thousands of safely completed trips. Another route is much quieter: through neighborhood streets instead of up the main drag. It only has one crime in the database for the same period, but that’s out of only a few thousand safely completed trips. If the app is just telling you that 1 < 4, it will recommend the quieter route, but that’s not necessarily the safer one. What you’d really want to know is the ratio of trips interrupted by crime to trips attempted. Otherwise the app is just reacting to where the people are and sending you away from them, and it’s common sense that an empty street is often more dangerous than a populated one.
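Here is a small Python sketch of that comparison. The crime counts (4 and 1) are from the example above; the trip totals of 300,000 and 3,000 are made-up stand-ins for "hundreds of thousands" and "a few thousand", and the `Route` class is purely illustrative, not anything the app actually uses.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    crimes: int            # incidents recorded in the crime database
    trips_attempted: int   # the denominator the database does NOT contain

    @property
    def risk_per_trip(self) -> float:
        """Share of attempted trips that were interrupted by a crime."""
        return self.crimes / self.trips_attempted


# Trip totals are hypothetical placeholders chosen only for illustration.
busy = Route("main drag", crimes=4, trips_attempted=300_000)
quiet = Route("side streets", crimes=1, trips_attempted=3_000)
routes = [busy, quiet]

# Ranking by raw count picks the quiet route...
by_count = min(routes, key=lambda r: r.crimes)
# ...while ranking by per-trip risk picks the busy one.
by_risk = min(routes, key=lambda r: r.risk_per_trip)

relative_risk = quiet.risk_per_trip / busy.risk_per_trip

print(f"Fewest crimes:        {by_count.name}")
print(f"Lowest risk per trip: {by_risk.name}")
print(f"Quiet route is about {relative_risk:.0f}x riskier per trip")
```

Under these assumed trip counts the two rankings disagree, and the "quieter" route is the riskier one per trip. Without real values for `trips_attempted`, there is no way to tell which ranking is right.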

That’s on top of the substantial problems with the database data that others have pointed to.

7