Submitted by ssharpe42 t3_ytf0pl in MachineLearning

https://sharpestats.com/mlb-injury-point-process/

I've wanted to try my hand at modeling injury risk for a while, and I finally got around to compiling a large dataset of MLB injuries. I wrote an overview of point processes and applied them to injuries from the 2012-2022 seasons to illustrate and quantify how injury history influences future injury risk. Let me know what you think!

13

Comments


hypothesis_tooStrong t1_iw5ylu8 wrote

Great post! I've only recently been getting into point processes, and it's nice to see them applied to a real-world problem that is also easy to understand.

I think non-baseball-related injuries could also be incorporated as external events, each with its own additive influence term in the intensity equation that decays over time, if there are enough such data points to justify it. I've seen this done in a finance-related paper, but I'm not sure how it affects the likelihood calculation.
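To make the suggestion concrete, here is a minimal sketch of an intensity with an extra additive term for external events. It assumes exponential decay kernels and treats the external events as given (exogenous) rather than modeled. Under that assumption the log-likelihood keeps its usual form, the sum of log-intensities at the injury times minus the integrated intensity, and the external term only adds one more piece to the integral. All function and parameter names are hypothetical and are not taken from the linked post.

```python
import math


def intensity(t, injuries, external, mu, alpha, beta, alpha_ext, beta_ext):
    """Conditional intensity at time t (hypothetical parameterization).

    mu: baseline rate; alpha, beta: jump size and decay for past injuries;
    alpha_ext, beta_ext: jump size and decay for external events.
    """
    self_term = sum(alpha * math.exp(-beta * (t - s)) for s in injuries if s < t)
    ext_term = sum(alpha_ext * math.exp(-beta_ext * (t - u)) for u in external if u < t)
    return mu + self_term + ext_term


def log_likelihood(injuries, external, horizon, mu, alpha, beta, alpha_ext, beta_ext):
    """Log-likelihood of the injury times on [0, horizon], external events held fixed."""
    params = (mu, alpha, beta, alpha_ext, beta_ext)
    # Sum of log-intensities evaluated at each observed injury time.
    log_term = sum(math.log(intensity(t, injuries, external, *params)) for t in injuries)
    # Integral of the intensity over [0, horizon]; each exponential kernel
    # integrates in closed form from its event time to the horizon.
    compensator = mu * horizon
    compensator += sum(
        alpha / beta * (1.0 - math.exp(-beta * (horizon - s)))
        for s in injuries if s < horizon
    )
    compensator += sum(
        alpha_ext / beta_ext * (1.0 - math.exp(-beta_ext * (horizon - u)))
        for u in external if u < horizon
    )
    return log_term - compensator
```

Setting `alpha_ext` to zero recovers the model without external events, which gives a simple way to check whether the extra term is worth its two parameters.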

Also, I'm not familiar with this field. Is your accuracy in line with what other ML models typically achieve?

2

ThePhantomPhoton t1_iw92cxj wrote

This is very interesting! I’m a fat neck beard who works in medicine, but one of my colleagues who was interested in baseball went on to work for the Boston Red Sox. If you’re interested in these analyses, maybe ping a baseball team or two and see if they’re interested in this kind of work. Very cool topic for ML!

2

Tea_Pearce t1_iwfyifj wrote

this is gold! great write up 👍

2

csreid t1_iwm0di2 wrote

I've always kicked around the idea of using a Hawkes process to model the concept of "momentum" in sports (which statistically doesn't seem to exist but has tons and tons of people who will chase you with weapons when you tell them that), but I'm lazy.

You wouldn't be willing to open source the code here, would you? 😅
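This is not the code from the linked post. For anyone who wants to experiment with the "momentum" idea, the following is a minimal sketch of simulating a Hawkes process with an exponential kernel via Ogata's thinning method. The function name and the parameters `mu`, `alpha`, and `beta` are assumptions chosen for illustration.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) by thinning.

    Intensity: mu + sum over past events s of alpha * exp(-beta * (t - s)).
    For the process not to explode, alpha / beta should be below 1.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # The intensity only decays between events, so its value right now
        # (including the jump from an event at exactly t) bounds it until the next event.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - s)) for s in events)
        t += rng.expovariate(lam_bar)  # candidate time from the bounding rate
        if t >= horizon:
            break
        lam_t = mu + sum(alpha * math.exp(-beta * (t - s)) for s in events)
        # Accept the candidate with probability lam_t / lam_bar.
        if rng.random() * lam_bar <= lam_t:
            events.append(t)
    return events


baseline = simulate_hawkes(mu=1.0, alpha=0.0, beta=1.0, horizon=500.0)
excited = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=500.0)
print(len(baseline), len(excited))
```

With `alpha=0.0` the process reduces to a constant-rate Poisson process (no momentum), so comparing a fitted `alpha` against zero is one way to frame the question.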

1

CommunicationAble621 t1_iwn5q5a wrote

Just wondering - this would be a gamma distribution approach to the problem, right? Waiting time until injury? Just like machine "time-to-fail"?
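As a point of reference for the time-to-failure framing: under a constant-rate Poisson process, the waiting time until the k-th event follows a gamma distribution with shape k and scale 1/rate, so the first waiting time is exponential. A point process whose intensity depends on injury history, as the post describes, does not in general have that fixed waiting-time distribution. The sketch below only checks the constant-rate case by simulation; the function names and the values of `rate` and `k` are arbitrary assumptions.

```python
import random


def time_to_kth_event(rate, k, rng):
    """Waiting time until the k-th event of a constant-rate Poisson process."""
    return sum(rng.expovariate(rate) for _ in range(k))


def sample_means(rate=0.5, k=3, n=20000, seed=0):
    """Compare the simulated waits with direct gamma(shape=k, scale=1/rate) draws."""
    rng = random.Random(seed)
    poisson_waits = [time_to_kth_event(rate, k, rng) for _ in range(n)]
    gamma_waits = [rng.gammavariate(k, 1.0 / rate) for _ in range(n)]
    return sum(poisson_waits) / n, sum(gamma_waits) / n


poisson_mean, gamma_mean = sample_means()
# Both sample means should be close to the theoretical mean k / rate.
print(poisson_mean, gamma_mean)
```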

1

ssharpe42 OP t1_iwz22gq wrote

That is a cool idea though… basically just take a simple dataset, like a binary indicator of whether a pitcher had a good start or not, and see what comes out.

1