GitGudOrGetGot t1_ix3s761 wrote

>First the BERT model generates word embeddings by tokenizing strings into pre-trained word vectors, then you run those embeddings through a transformer for some type of inference

Could you describe this a bit further in terms of inputs and outputs?

I think I get that you go from a string to a list of individual tokens, but when you say you then feed that into a pre-trained word vector, does that mean you output a list of floating-point values representing the document as a single point in high-dimensional space?

I thought that's specifically what the transformer does, so I'm not sure what other role it performs here...
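
To pin down the inputs and outputs I'm asking about, here's a minimal sketch of the shapes I have in mind, assuming the HuggingFace `transformers` library and `bert-base-uncased` (neither is named in the comment above, so this is just my guess at the setup):

```python
# Rough sketch of the pipeline, assuming HuggingFace transformers
# and bert-base-uncased (my assumption, not stated in the comment).
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Step 1: string -> integer token IDs (one ID per sub-word token).
inputs = tokenizer("Hello world", return_tensors="pt")
print(inputs["input_ids"].shape)        # (1, num_tokens)

# The embedding lookup by itself is just a table lookup:
# one static, context-free vector per token ID.
static = model.embeddings.word_embeddings(inputs["input_ids"])
print(static.shape)                     # (1, num_tokens, 768)

# Step 2: the transformer layers turn those static vectors into
# contextual embeddings -- still one vector per token, not one
# vector for the whole document.
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768)
```

So is the "single point in high-dimensional space" per token, or per document?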
