matth0x01
matth0x01 t1_j7ayc9e wrote
Reply to comment by Ggronne in Information Retrieval book recommendations? [D] by Ggronne
Seems that you are more interested in the crawling and ETL side.
Maybe you should look more into data warehouse or data lake literature, especially the paradigm shift from ETL (extract, transform, load) to ELT (extract, load, transform) and, relatedly, schema-on-read.
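To make the ETL vs. ELT difference concrete, here is a minimal sketch in plain Python. The sample events, field names, and function names are all made up for illustration and do not come from any particular warehouse or lake product. ETL enforces the schema before loading; ELT stores the raw lines and applies a schema only when a query reads them.

```python
import json

# Hypothetical raw events, as they might arrive from a crawler.
RAW_EVENTS = [
    '{"url": "https://example.com/a", "status": "200", "bytes": "512"}',
    '{"url": "https://example.com/b", "status": "404"}',
]


def etl(raw_lines):
    """ETL: parse and enforce the schema before loading (schema-on-write)."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        table.append({
            "url": record["url"],
            "status": int(record["status"]),
            "bytes": int(record.get("bytes", 0)),  # missing values get a default
        })
    return table


def elt_load(raw_lines):
    """ELT: load the raw lines untouched; no schema is applied yet."""
    return list(raw_lines)


def elt_read(store, field, cast=str, default=None):
    """Schema-on-read: interpret one field only when a query asks for it."""
    for line in store:
        record = json.loads(line)
        yield cast(record[field]) if field in record else default


warehouse = etl(RAW_EVENTS)   # typed rows, fixed columns
lake = elt_load(RAW_EVENTS)   # raw strings, interpreted later
statuses = list(elt_read(lake, "status", int))
```

The trade-off in this sketch: the ETL path fails early on bad input and gives every consumer the same columns, while the ELT path keeps everything and pushes the parsing cost and the schema decisions to read time.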
matth0x01 t1_j76dt6k wrote
Depends a bit on your skill level and what you want to achieve.
I started with the Introduction to Information Retrieval (2008) book, which was quite math-heavy back then. But I learned a lot and found it a good starting point.
It covers concepts such as decompounding, the inverted index, ranking functions, etc.
Newer IR strategies use word2vec-style embeddings for item representation instead of handcrafted features, or learn the search ranking function directly, which is a different beast compared to traditional search engines.
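For the traditional side, an index that maps terms to documents, combined with a ranking function, can be sketched in a few lines. This is only a toy version under simple assumptions (whitespace tokenization, a plain TF-IDF sum as the score, made-up documents), not the book's exact formulation.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to a postings dict {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Rank documents by a plain TF-IDF sum over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy collection.
DOCS = {
    1: "information retrieval with inverted index",
    2: "ranking functions for search engines",
    3: "inverted index and ranking",
}

index = build_inverted_index(DOCS)
results = search(index, len(DOCS), "ranking functions")
```

Here document 2 ranks first because it matches both query terms and "functions" is the rarer one. A learned ranking function would replace the hand-written score in `search` with a model trained on relevance data.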
matth0x01 t1_j2x49gm wrote
Reply to comment by unkz in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
Thanks - I think I got it. It's kind of new to me why language models use perplexity instead of log-likelihood, which is a monotonic function of perplexity.
From Wikipedia, it seems that perplexity is measured in units of "words" rather than "nats/bits", which might be more interpretable.
Are there other advantages I'm overlooking?
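For reference, the relationship between the two quantities can be written down directly. This is a minimal sketch; the function names and the probabilities are made up for illustration. Perplexity is the exponential of the negative average log-likelihood per token, so a model that is uniformly unsure between k words has perplexity k.

```python
import math


def avg_log_likelihood(token_probs):
    """Mean natural-log probability the model assigned to the observed tokens (nats)."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def perplexity(token_probs):
    """exp of the negative average log-likelihood; roughly an 'effective number of choices'."""
    return math.exp(-avg_log_likelihood(token_probs))


# Hypothetical per-token probabilities from two models on the same text.
uniform_over_4 = [0.25, 0.25, 0.25]
sharper_model = [0.5, 0.5, 0.5]

ppl_uniform = perplexity(uniform_over_4)
ppl_sharper = perplexity(sharper_model)
```

Since the mapping is monotonic, ranking models by perplexity or by average log-likelihood gives the same order; the difference is only in how easy the number is to read.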
matth0x01 t1_j2vxl6g wrote
Reply to comment by prototypist in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
Thanks! Hm, seems to be a measure of sharpness for the predicted words?
matth0x01 t1_j2vx7z4 wrote
Reply to comment by unkz in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
Yes, I know the concept, but where's the connection to the pruning approach here?
matth0x01 t1_j2u5rwm wrote
Reply to comment by bloc97 in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
Sorry - what's meant by perplexity here?
matth0x01 t1_j7c3smm wrote
Reply to comment by Ggronne in Information Retrieval book recommendations? [D] by Ggronne
Sorry, my library seems a bit outdated on that side.
But the one from Wikipedia looks great at first sight: Kimball, Ralph (2004). The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data.