learn-deeply t1_j287u7z wrote

So it's computing the nearest neighbors against all of the images in the index every time a new search is done? That might be slow past, say, 1,000 images.

1

londons_explorer t1_j28cfh3 wrote

It should scale to 1 million images without much slowdown.

1 million images * 512 vector length = 512 million multiplies, which the Neural Engine ought to be able to do in ~100ms.
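
For reference, a minimal NumPy sketch of the brute-force search being estimated here (the sizes and random data are placeholders, and the actual app would run this through Core ML / the Neural Engine rather than NumPy):

```python
import numpy as np

# Placeholder index: 1M images, 512-dim embeddings, unit-normalized at indexing time.
num_images, dim = 1_000_000, 512
index = np.random.rand(num_images, dim).astype(np.float32)
index /= np.linalg.norm(index, axis=1, keepdims=True)

query = np.random.rand(dim).astype(np.float32)
query /= np.linalg.norm(query)

# Brute-force search: one matrix-vector product = num_images * dim = 512M multiply-adds.
# Cosine similarity reduces to a dot product because everything is unit-normalized.
scores = index @ query
best_match = int(np.argmax(scores))
```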

4

learn-deeply t1_j28hirz wrote

Is that calculation taking into account memory (RAM/SSD) access latencies?

1

londons_explorer t1_j28kvqp wrote

There is no latency constraint - it's a pure streaming operation, and the total data to be transferred is about 1 gigabyte for the whole set of vectors (roughly 1 million x 512 dimensions at half precision) - which is well within the read throughput of Apple's SSDs.

This is also the naive approach - there are probably smarter approaches, e.g. doing an approximate first pass with very low-resolution vectors (say, 3-bit depth), and then a second pass over the high-resolution vectors of only the most promising few thousand results.
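
A rough sketch of that two-pass idea, assuming unit-normalized float32 vectors; the 3-bit quantization scheme and the shortlist size are illustrative, and a real implementation would score directly on the packed codes to keep the memory-bandwidth savings rather than dequantizing them as done here for clarity:

```python
import numpy as np

def build_coarse_codes(index, bits=3):
    # Quantize each component to 2**bits uniform levels over [-1, 1]
    # (stored in uint8 here for simplicity; bit-packing would save more memory).
    levels = 2 ** bits - 1
    return np.round((index + 1.0) / 2.0 * levels).astype(np.uint8), levels

def two_pass_search(query, index, codes, levels, shortlist=2000, k=10):
    # Pass 1: approximate scores against dequantized low-resolution vectors.
    approx = codes.astype(np.float32) / levels * 2.0 - 1.0
    coarse_scores = approx @ query
    candidates = np.argpartition(coarse_scores, -shortlist)[-shortlist:]

    # Pass 2: exact scores over only the shortlisted full-precision vectors.
    exact_scores = index[candidates] @ query
    order = np.argsort(exact_scores)[::-1][:k]
    return candidates[order]
```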

3

Steve132 t1_j28oxex wrote

One thing you aren't taking into account is that computing the similarity scores is O(n), but the sorting he's doing is O(n log n), which for 1M images might dominate, especially since it's not necessarily hardware-optimized.

1

londons_explorer t1_j28ufby wrote

Top-K selection is linear in computational complexity, and I doubt it will dominate, because it only needs to be done on a single score per image rather than a vector of 512 numbers.
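
For illustration, a NumPy comparison of a full sort against linear-time top-K selection over the per-image scores (the sizes and K are placeholders):

```python
import numpy as np

scores = np.random.rand(1_000_000).astype(np.float32)  # one similarity score per image
k = 20

# Full sort: O(n log n) comparisons over 1M scalars.
top_k_via_sort = np.argsort(scores)[::-1][:k]

# Top-K selection: argpartition uses introselect, which is linear on average;
# only the K selected scores are then sorted, costing an extra O(k log k).
part = np.argpartition(scores, -k)[-k:]
top_k_via_select = part[np.argsort(scores[part])[::-1]]
```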

1

RingoCatKeeper OP t1_j2885ds wrote

You're right. There is some optimized work by Google called ScaNN, which is much faster for large-scale vector similarity search. However, it's much more complicated to port that to iOS.
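
For what it's worth, a sketch of how ScaNN is typically used on the desktop side, roughly following the example in its README (parameter values are illustrative, and the library ships prebuilt wheels for Linux rather than iOS, which is the porting difficulty mentioned above):

```python
import numpy as np
import scann  # pip install scann

# Placeholder dataset: 1M unit-normalized 512-dim embeddings.
dataset = np.random.rand(1_000_000, 512).astype(np.float32)
dataset /= np.linalg.norm(dataset, axis=1, keepdims=True)

# Approximate searcher: partitioning tree + asymmetric hashing + exact re-ranking.
searcher = (
    scann.scann_ops_pybind.builder(dataset, 10, "dot_product")
    .tree(num_leaves=2000, num_leaves_to_search=100, training_sample_size=250000)
    .score_ah(2, anisotropic_quantization_threshold=0.2)
    .reorder(100)
    .build()
)

query = dataset[0]
neighbors, distances = searcher.search(query)
```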

1