I've got an interesting project, and I'm looking for recommendations for a suitable model.

The project is basically a priority queue. Items of various types arrive in the queue in an out-of-order sequence and must be handled according to various factors, including timeliness.

The upside is that I have a massive training data set to work with - a vast assortment of examples in which items were chosen according to some institutional logic. I'd like to develop a model that can learn that logic and replicate it to choose items in a similar manner.

Note that I'm not trying to schedule the selection of items - no planning is required. Rather, at each time step, the model will score all of the items currently in the queue according to item-specific criteria, and complete the one with the highest urgency. For simplicity, I presume that an item of any type can be completed in one time step.

The basic algorithm seems very straightforward: for each item class, define a set of properties; and fore each time step, calculate a score for each item based on those properties, and pick the item with the highest score. The primary criterion is timeliness - some items have a target deadline; others don't, but shouldn't sit too long. The secondary criterion is complexity - some items can be handled more quickly than others. I can normalize each property to a range of 0-1. The model simply learns the weights of the properties that contribute to the scoring, such that the selection process is consistent with the training data.

Where I'm getting stuck is the selection of a model. An ordinary backpropagation neural network doesn't seem appropriate, because if item #1 is high priority and item #2 is just a little higher priority for some reason, I don't necessarily want to reduce the weights for the scoring of item #1. Reinforcement learning also doesn't seem appropriate, since the goal is not to maximize the selection based on a known scoring algorithm, but rather to learn the scoring algorithm itself that is inherent in the training data set.

Any suggestions?

Comments

You must log in or register to comment.

Jelicic t1_irwd5yh wrote on October 11, 2022 at 3:16 PM

#78,228

Seems like a ranking problem. XGBoost might be appropriate (learning to rank)

Dragonsareforreal t1_irwf8cg wrote on October 11, 2022 at 3:30 PM

#78,326

Maybe create a priority embedding space using contrastive learning? You can just find priority within a mini batch of examples.

lmericle t1_irxbw0u wrote on October 11, 2022 at 7:02 PM

#79,911

Reinforcement learning does seem like a good option, but you'd need a bit of nuance in your reward function for training to work IMO. It would need to be more complex (and less sparse) than just "did you choose the correct next item or not".

IAmAFedora t1_irxspl4 wrote on October 11, 2022 at 8:49 PM

#80,710

Replying to Jelicic (#78,228)

Also seems like a "learning to rank" problem to me! There's a lot of good literature out there on this stuff.

Here's a widely cited paper from Microsoft research on how to formulate a loss function for listwise (rather than pairwise) ranking.

Suppose your model assigns each object a "score" $s_1, \ldots s_n$.

Here's the probability model -- if we have a collection of objects, we will take object $i$ without replacement with probability proportional to $\exp(s_i)$. Repeat until you run out of objects.

E.g. if we only have two objects, then we will take object $1$ with probability $\frac{\exp(s_1)}{\exp(s_1) + \exp(s_2)} = \text{sigmoid}(s_1-s_2)$. This looks a lot like what you'd do if you wanted to do binary classification on every pair of objects!

Let $\pi$ be a permutation of the natural numbers up to $n$ where $\pi(1)$ is the index of your "first" object, $\pi(2)$ the index of the second, etc.

Then the probability that the above procedure assigns to observing the objects in the order specified by $\pi$ is

$$ P(\pi | s_1, \ldots s_n) = \prod_i \frac{\exp(s_{\pi(i)})}{\sum_{j\geq i} \exp(s_{\pi(j)})} $$

Note that we get a distribution over rank-orderings of your objects from our scores!

The average value of this over a set of ground-truth rankings is the cross entropy between the empirical rank distribution and your model's rank distribution.

Pretty neat, I think.

ZealousidealGrass365 t1_irzzw8w wrote on October 12, 2022 at 9:01 AM

#84,568

I was working on a similar problem and found optimizing with Mixed Integer Programming to be helpful start. There’s a paper about receiving trucks dynamically on an inbound cross dock that is First come first serve. I since added a priority system but I found the algorithms in that paper to be helpful setting up the model. Lots of good ideas in there. Also going to take a look at the learning to rank problem it looks super helpful also

http://www.aimsciences.org/journals/displayArticlesnew.jsp?paperID=11272

xbno t1_is3y22o wrote on October 13, 2022 at 2:59 AM

#91,039

Sounds like me deciding which stuff from the grocery store to put away first lol. Always ice cream. Frozen fish ehhh. Cheese definitely high up there. Candy - yes if kids are around.