Submitted by Devinco001 t3_yzh6v1 in MachineLearning
[removed]
I am going to use the embeddings to cluster the text in an unsupervised manner and extract the most popular intents.
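As a minimal sketch of that pipeline (assuming the sentence-transformers and scikit-learn packages; the model name, example sentences, and cluster count are illustrative placeholders, not OP's actual setup):

```python
# Illustrative sketch: embed short texts, then cluster them to surface intents.
# Assumes sentence-transformers and scikit-learn; model and k are placeholders.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

sentences = [
    "how do I reset my password",
    "I want to cancel my order",
    "where is my refund",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice
embeddings = model.encode(sentences, batch_size=64)  # one vector per sentence

# k would be tuned in practice (elbow method, silhouette score, etc.)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(embeddings)
print(labels)  # cluster id per sentence; inspect clusters to name intents
```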
1, 2. I would be fine with a bit of a trade-off in accuracy. Time is the main concern, since I don't want it to take more than a day. Maybe I have to use something other than BERT
I googled them, and RoBERTa seems to be the best choice, much better than BERT-base or BERT-large
I actually asked this because Google Colab has some restrictions on free usage
Thanks, really good article
If you want to cluster sentences, take a look at LaBSE. This model was specifically designed for embedding extraction. https://ai.googleblog.com/2020/08/language-agnostic-bert-sentence.html?m=1
This looks really interesting, thanks. Is it open source?
There are several pretrained implementations; a minimal example using one of them is sketched below.
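For example, LaBSE is published open source on the Hugging Face Hub and can be loaded through the sentence-transformers package (an illustrative usage, not the only option):

```python
# Load the sentence-transformers port of LaBSE and embed a sentence.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/LaBSE")
embeddings = model.encode(["how do I reset my password"])
print(embeddings.shape)  # (1, 768): one 768-dim vector per input sentence
```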
Will surely check them out, thanks
>First the BERT model generates word embeddings by tokenizing strings and looking the tokens up in a pre-trained word-vector table, then you run those embeddings through a transformer for some type of inference
Could you describe this a bit further in terms of inputs and outputs?
I think I get that you go from a string to a list of individual tokens, but when you say you then feed those into pre-trained word vectors, does that mean you output a list of floating-point values representing the document as a single point in high-dimensional space?
I thought that's specifically what the transformer does, so I'm not sure what other role it performs here...
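For concreteness, here is a sketch of those inputs and outputs using the Hugging Face transformers API (the model name and the mean-pooling step are illustrative choices, not anything stated in this thread):

```python
# String -> token ids -> per-token contextual vectors -> pooled sentence vector.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Tokenization: the string becomes a list of integer token ids.
inputs = tokenizer("hello world", return_tensors="pt")
print(inputs["input_ids"].shape)  # torch.Size([1, 4]), incl. [CLS] and [SEP]

# The transformer maps each token id to a context-dependent float vector.
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, 4, 768])

# A single point in high-dimensional space for the whole string comes from
# pooling the per-token vectors, e.g. a simple mean.
sentence_vector = outputs.last_hidden_state.mean(dim=1)
print(sentence_vector.shape)  # torch.Size([1, 768])
```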
What length of texts? Sentence? Paragraph? Page? Multiple pages? Books?
With roughly 2.5 million texts, a sentence might average 10 tokens, a page 750 tokens, and a book 225,000 tokens, so anywhere from 25 million to 562.5 billion tokens in total.
Yes, they are short and conversational, covering business intents. Average token length is around 10. Total is approx 2.5 million sentences
skelly0311 t1_iwzz7td wrote
For starters, why are you generating word embeddings? First the BERT model generates word embeddings by tokenizing strings and looking the tokens up in a pre-trained word-vector table, then you run those embeddings through a transformer for some type of inference. So I'll assume you're feeding those word embeddings into an actual transformer for inference. If this is true.
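To make those two stages concrete, here is an illustrative sketch with Hugging Face BERT, separating the pre-trained embedding lookup from the transformer layers that follow it (the model id and example sentence are placeholders):

```python
# Stage 1: static, pre-trained token-embedding lookup (no context yet).
# Stage 2: the transformer encoder turns them into contextual vectors.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

ids = tokenizer("the bank of the river", return_tensors="pt")["input_ids"]

static = model.embeddings.word_embeddings(ids)  # embedding table lookup only
print(static.shape)  # torch.Size([1, 7, 768])

with torch.no_grad():
    contextual = model(input_ids=ids).last_hidden_state  # full encoder stack
print(contextual.shape)  # torch.Size([1, 7, 768]), now context-dependent
```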