Also seems like a "learning to rank" problem to me! There's a lot of good literature out there on this stuff.
Here's a widely cited paper from Microsoft research on how to formulate a loss function for listwise (rather than pairwise) ranking.
Suppose your model assigns each object a "score" $s_1, \ldots s_n$.
Here's the probability model -- if we have a collection of objects, we will take object $i$ without replacement with probability proportional to $\exp(s_i)$. Repeat until you run out of objects.
E.g. if we only have two objects, then we will take object $1$ with probability $\frac{\exp(s_1)}{\exp(s_1) + \exp(s_2)} = \text{sigmoid}(s_1-s_2)$. This looks a lot like what you'd do if you wanted to do binary classification on every pair of objects!
Let $\pi$ be a permutation of the natural numbers up to $n$ where $\pi(1)$ is the index of your "first" object, $\pi(2)$ the index of the second, etc.
Then the probability that the above procedure assigns to observing the objects in the order specified by $\pi$ is
Note that we get a distribution over rank-orderings of your objects from our scores!
The average value of this over a set of ground-truth rankings is the cross entropy between the empirical rank distribution and your model's rank distribution.
IAmAFedora t1_irxspl4 wrote
Reply to comment by Jelicic in [P] Model for choosing items from a queue based on priority by reckless_commenter
Also seems like a "learning to rank" problem to me! There's a lot of good literature out there on this stuff.
Here's a widely cited paper from Microsoft Research on how to formulate a loss function for listwise (rather than pairwise) ranking.
Suppose your model assigns each object a "score" $s_1, \ldots, s_n$.
Here's the probability model -- from the objects that remain, take object $i$ (without replacement) with probability proportional to $\exp(s_i)$. Repeat until you run out of objects.
E.g. if we only have two objects, then we will take object $1$ first with probability $\frac{\exp(s_1)}{\exp(s_1) + \exp(s_2)} = \text{sigmoid}(s_1-s_2)$. This looks a lot like what you'd do if you wanted to do binary classification on every pair of objects!
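If it helps, here's a minimal sketch of that sampling procedure (my own code, not from the paper; it assumes NumPy, and `sample_ranking` is just a name I made up):

```python
import numpy as np

def sample_ranking(scores, rng=None):
    """Draw objects one at a time without replacement, each with
    probability proportional to exp(score), until none remain."""
    rng = np.random.default_rng() if rng is None else rng
    remaining = list(range(len(scores)))
    order = []
    while remaining:
        logits = np.asarray([scores[i] for i in remaining], dtype=float)
        probs = np.exp(logits - logits.max())  # subtract max for stability
        probs /= probs.sum()
        pick = rng.choice(len(remaining), p=probs)
        order.append(remaining.pop(pick))
    return order

# With scores (2.0, 0.5, 1.0), object 0 should come first most often.
print(sample_ranking([2.0, 0.5, 1.0]))
```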
Let $\pi$ be a permutation of $\{1, \ldots, n\}$, where $\pi(1)$ is the index of your "first" object, $\pi(2)$ the index of the second, etc.
Then the probability that the above procedure assigns to observing the objects in the order specified by $\pi$ is
$$ P(\pi \mid s_1, \ldots, s_n) = \prod_{i=1}^{n} \frac{\exp(s_{\pi(i)})}{\sum_{j \geq i} \exp(s_{\pi(j)})} $$
Note that we get a distribution over rank-orderings of your objects from our scores!
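Here's a sketch of evaluating that formula in log space (again my own code, not the paper's; `log_prob_of_ranking` is a hypothetical helper, with the logsumexp written out by hand for numerical stability):

```python
import numpy as np

def log_prob_of_ranking(scores, pi):
    """log P(pi | s) = sum_i [ s_{pi(i)} - logsumexp_{j >= i} s_{pi(j)} ]."""
    s = np.asarray(scores, dtype=float)[list(pi)]  # scores in ranked order
    log_p = 0.0
    for i in range(len(s)):
        tail = s[i:]  # scores of objects not yet taken at step i
        lse = tail.max() + np.log(np.exp(tail - tail.max()).sum())
        log_p += s[i] - lse
    return log_p

# Probability of observing the order (0, 2, 1) under scores (2.0, 0.5, 1.0):
print(np.exp(log_prob_of_ranking([2.0, 0.5, 1.0], [0, 2, 1])))
```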
The average negative log of this probability over a set of ground-truth rankings is the cross entropy between the empirical rank distribution and your model's rank distribution, so it drops right in as a training loss.
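A rough sketch of that loss in PyTorch, using `torch.logcumsumexp` on the reversed scores to get the suffix denominators all at once (the function name `listwise_nll` is mine):

```python
import torch

def listwise_nll(scores, ranking):
    """Negative log-likelihood of one observed ranking; average this
    over your ground-truth rankings to get the cross-entropy loss."""
    s = scores[ranking]  # scores reordered to match the observed ranking
    # Denominator at position i is logsumexp over positions i..n-1:
    # flip, take a running logsumexp, flip back.
    log_denoms = torch.logcumsumexp(s.flip(0), dim=0).flip(0)
    return -(s - log_denoms).sum()

scores = torch.tensor([2.0, 0.5, 1.0], requires_grad=True)
loss = listwise_nll(scores, torch.tensor([0, 2, 1]))
loss.backward()  # gradients flow back to whatever model produced the scores
```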
Pretty neat, I think.