Submitted by natural_language_guy t3_ypxyud in MachineLearning

I am trying to build an NER model, but I want multiple candidate label sequences for the spans, e.g.:

"I like green cats." -> {BOBI, BIII, BOOO, etc}

that I can feed into another algorithm, which chooses among them based on downstream criteria.


With something like T5, I would modify the beam search to give me a list of generated sequences, from the most probable down to the n-th most probable. With BERT, I don't know how to do this, because I can't condition the result of a token classification on the previous one.
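For reference, the T5 side doesn't even need a modified beam search: Hugging Face's `generate` can already return the n best beams. A rough sketch, where the checkpoint and prompt format are just placeholders (you'd fine-tune on your own tagging format first):

```python
# Sketch: n-best decoding with T5 beam search via Hugging Face transformers.
# The checkpoint and the "tag entities:" prompt are placeholders.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("tag entities: I like green cats.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    num_beams=5,               # beam width
    num_return_sequences=5,    # return the 5 most probable sequences
    output_scores=True,
    return_dict_in_generate=True,
)
for seq, score in zip(outputs.sequences, outputs.sequences_scores):
    print(tokenizer.decode(seq, skip_special_tokens=True), score.item())
```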


Comments


suflaj t1_ivlokew wrote

BERT has no decoder, so you would need to add one. You can use BERT's pretrained weights with Hugging Face's EncoderDecoderModel.
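Roughly like this (wiring only, checkpoint names are placeholders; you'd still have to fine-tune it on a seq2seq tagging objective before the beams mean anything):

```python
# Sketch: warm-starting a BERT2BERT seq2seq model with EncoderDecoderModel.
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)

# config needed for generation / training
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.vocab_size = model.config.encoder.vocab_size

# after fine-tuning, beam search gives an n-best list just like T5
inputs = tokenizer("I like green cats.", return_tensors="pt")
outputs = model.generate(
    **inputs, num_beams=5, num_return_sequences=5, max_length=16
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```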


new_name_who_dis_ t1_ivq7gwm wrote

To make beam search work with BERT, you'd need to change the way BERT works, which you could do, but it's probably too complicated for what you want.

What you could do instead is just use a non-deterministic classifier, like a Mixture Density Network. It predicts several outputs as well as their likelihoods.


natural_language_guy OP t1_ivsdjo8 wrote

If the advice is to discard BERT and go with an MDN, do you think MDNs in this case would perform better than some large generative model like T5 with beam search?

The MDN does look interesting, and it looks like there are already some libraries available for it, but I don't have much experience using deep probabilistic models.


new_name_who_dis_ t1_ivtav2j wrote

No, I'm not saying to discard BERT. You still use BERT as the encoder and add an MDN-like network as the final layer. It could still be a self-attention layer, just trained with the MDN loss function. An MDN isn't a different architecture; it's just a different loss function for your final output that isn't deterministic but allows for multiple outputs.
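A rough sketch of that idea (all names here are illustrative; note the classic MDN formulation assumes continuous targets, so for discrete BIO tags you'd have to adapt it, e.g. by regressing onto label embeddings):

```python
# Sketch: a Bishop-style mixture density head on top of BERT token embeddings.
# Class/function names and hyperparameters are illustrative.
import torch
import torch.nn as nn
from transformers import BertModel

class BertMDNHead(nn.Module):
    def __init__(self, n_components=5, target_dim=4, bert_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        self.pi = nn.Linear(hidden, n_components)                       # mixture weights
        self.mu = nn.Linear(hidden, n_components * target_dim)          # component means
        self.log_sigma = nn.Linear(hidden, n_components * target_dim)   # log std devs
        self.k, self.d = n_components, target_dim

    def forward(self, input_ids, attention_mask):
        h = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        B, T, _ = h.shape
        log_pi = torch.log_softmax(self.pi(h), dim=-1)                  # (B, T, K)
        mu = self.mu(h).view(B, T, self.k, self.d)                      # (B, T, K, D)
        sigma = self.log_sigma(h).view(B, T, self.k, self.d).exp()
        return log_pi, mu, sigma

def mdn_nll(log_pi, mu, sigma, target):
    # target: (B, T, D) continuous targets, e.g. label embeddings
    comp = torch.distributions.Normal(mu, sigma)
    log_prob = comp.log_prob(target.unsqueeze(2)).sum(-1)               # (B, T, K)
    return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()
```

At inference time, the K mixture components give you several candidate outputs per token together with their weights, which is the "several outputs plus their likelihoods" part.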
