YOLOBOT666 t1_j7iov1k wrote on February 7, 2023 at 1:54 AM

Reply to comment by mostlyhydrogen in [D] Querying with multiple vectors during embedding nearest neighbor search? by mostlyhydrogen

Nice! I guess the heuristic part is how you use the queries at every iteration and make them “usable” in your iterative approach. What’s the size and dimension of your dataset? These graph-based ANNs are memory intensive, so I’m wondering how you handle that at your dimensionality.

If it’s a public repo, or you’re planning to release it on GitHub, I’d be happy to join!

mostlyhydrogen OP t1_j7km5j2 wrote on February 7, 2023 at 2:09 PM

Thanks for the offer! This is a work project, though. I'm working with images. I can't give too many details due to confidentiality, but we're at sub-billion-image scale.

Usability is determined by trained annotators. If they find an object of interest and want to harvest more training data, they do a reverse image search across the whole training data and tag true matches.
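
For anyone following along, here is a rough sketch of what such a harvest loop could look like. It is not the actual work code: the brute-force NumPy search stands in for whatever ANN index is really used, taking the maximum similarity over all query vectors is just one assumed way to combine multiple queries, and the names `multi_query_search`, `harvest`, and `is_match` (a stand-in for the annotator's judgment) are all hypothetical.

```python
import numpy as np


def normalize(x):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def multi_query_search(index, queries, k, exclude=()):
    """Return ids of the k items most similar to ANY of the query vectors.

    Brute-force stand-in for an ANN index (assumption, not the real setup).
    """
    sims = normalize(queries) @ normalize(index).T  # shape (n_queries, n_items)
    best = sims.max(axis=0)                         # best score per item
    skip = np.array(sorted(exclude), dtype=int)
    best[skip] = -np.inf                            # hide already reviewed items
    return np.argsort(-best)[:k]


def harvest(index, seed_id, is_match, k=5, rounds=3):
    """Iteratively grow the query set with annotator-confirmed matches.

    `is_match(item_id) -> bool` simulates the annotator tagging a true match.
    """
    confirmed = [seed_id]
    reviewed = {seed_id}
    for _ in range(rounds):
        candidates = multi_query_search(index, index[confirmed], k, exclude=reviewed)
        reviewed.update(int(c) for c in candidates)
        new = [int(c) for c in candidates if is_match(int(c))]
        if not new:          # nothing confirmed this round, so stop early
            break
        confirmed.extend(new)
    return confirmed
```

Each round queries with every confirmed match so far, so the search can drift toward views of the object that the original seed image alone would have missed.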