Submitted by RingoCatKeeper t3_zypzrv in MachineLearning

I built an iOS app called Queryable, which integrates the CLIP model on iOS to search the Photos album offline.

Photo search performance with the help of the CLIP model

Compared to the built-in search in iPhone Photos, CLIP-based album search is overwhelmingly better. With CLIP, you can search for a scene in your mind, a tone, an object, or even an emotion conveyed by the image.

How does it work? CLIP has a Text Encoder and an Image Encoder:

>The Text Encoder encodes any text into a 1x512-dim vector
>
>The Image Encoder encodes any image into a 1x512-dim vector

We can measure how close a text sentence and an image are by computing the cosine similarity between their text vector and image vector.
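Concretely, for a text vector t and an image vector i (each 1x512), the score is the dot product normalized by the vector lengths:

sim(t, i) = (t · i) / (||t|| * ||i||)

A value near 1 means the sentence and the image sit close together in CLIP's shared embedding space.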

The simplified code is as follows:

import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the ViT-B/32 CLIP model
model, preprocess = clip.load("ViT-B/32", device=device)

# Calculate image vector & text vector (encode_image expects a preprocessed
# image tensor, not a file path; encode_text expects tokenized text)
image = preprocess(Image.open("photo-of-a-dog.png")).unsqueeze(0).to(device)
image_feature = model.encode_image(image)
text_feature = model.encode_text(clip.tokenize("rainy night").to(device))

# Cosine similarity between the two 1x512 vectors
sim = torch.nn.functional.cosine_similarity(image_feature, text_feature)

To use Queryable, you first need to build the index: it traverses your album, computes all the image vectors, and stores them. This happens only ONCE; at search time, only a single CLIP forward pass is needed for the user's text query. Below is a flowchart of how Queryable works:

How Queryable works
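In code terms, a rough sketch of the index/query split (helper names here are my own illustration, not Queryable's actual implementation):

import clip
import torch

device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# One-time indexing: embed every photo and keep the (N, 512) matrix around
def build_index(photos):  # photos: list of PIL images
    with torch.no_grad():
        feats = torch.cat([model.encode_image(preprocess(p).unsqueeze(0).to(device))
                           for p in photos])
    return feats / feats.norm(dim=-1, keepdim=True)  # store unit-norm vectors

# Per-query: one CLIP text forward pass, then a dot product against the index
def search(index, query, k=12):
    with torch.no_grad():
        t = model.encode_text(clip.tokenize(query).to(device))
    t = t / t.norm(dim=-1, keepdim=True)
    scores = (index @ t.T).squeeze(1)  # cosine similarity: both sides unit-norm
    return scores.topk(min(k, len(scores))).indices  # best-matching photo indices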

On privacy and security: Queryable is designed to be totally offline and will NEVER request network access, thereby avoiding privacy issues.

As it's a paid app, I'm sharing a few promo codes here:

Requirements:
- Your iOS version needs to be 16.0 or above.
- iPhone XS/XS Max or below may not work. DO NOT BUY.

9W7KTA39JLET
ALFJK3L6H7NH
9AFYNJX63LNF
F3FRNMTLAA4T
9F4MYLWAHHNT
T7NPKXNXHFRH
3TEMNHYH7YNA
HTNFNWWHA4HA
T6YJEWAEYFMX
49LTJKEFKE7Y

YTHN4AMWW99Y
WHAAXYAM3LFT
WE6R4WNXRLRE
RFFK66KMFXLH
4FHT9X6W6TT4
N43YHHRA9PRY
9MNXPAJWNRKY
PPPRXAY43JW9
JYTNF93XWNP3
W9NEWENJTJ3X

Hope you guys find it useful.

147

Comments


CallMeInfinitay t1_j27i13b wrote

It's a shame this is only available for iOS 16; it sounds useful.

2

RingoCatKeeper OP t1_j27jmic wrote

The major issue was CoreML operator support. Another reason is that iOS 16.0 blocks out some very old iPhones (below the X); otherwise users would pay and then find CLIP runs very laggy, which is a bad experience. Of course, I admit the UI of iOS 16.0 is really ugly.

7

Taenk t1_j27vxn1 wrote

I think you could port this to the M-chip MacBooks as well.

2

brucebay t1_j27z7xg wrote

Great idea. Hope you will earn more money after people recognize its value.

34

beatle5 t1_j285pjr wrote

All the codes seem to be redeemed already. Could you please DM me one if you’re okay with me trying the app out? I’m working on CLIP and cross-modal learning at university and am interested to try out an application that uses large language models.

2

Vendraaa t1_j286gee wrote

If you port it to Android as well, I'd like to try it.

2

learn-deeply t1_j2875vr wrote

How do you do the top-k neighbor search in iOS? Is there a library for it?

2

Evoke_App t1_j288jn8 wrote

Google Photos has the same feature. Do you find this has better search capabilities than Google Photos?

Though offline search is a godsend.

12

RingoCatKeeper OP t1_j288x67 wrote

The two aren't really comparable: Google runs its models on professional GPUs, while this app can only use Apple chips, so there is a big difference in the size of the models that can be run.
Offline search also means you don't have to worry about anyone invading your album privacy, including Google.

17

Several-Aide-8291 t1_j28a160 wrote

Overall the app looks good. A few suggestions:

1. Allow the user to mark bad results so that they are ignored next time.

2. Add the ability to scroll; right now it only gives the top 12 results, but in my album there are consistently many more.

3. Once I find a photo there is not much I can do with it; adding share/save/edit would enhance the experience.

17

Evoke_App t1_j28a8h5 wrote

How are you currently promoting it? And is it a one time purchase?

I think people would be more open to it as a free trial and then subscription after that. You'd have recurring income too.

I'm curious because depending on how you're promoting it, I'd be more than happy to help.

1

RingoCatKeeper OP t1_j28adm9 wrote

Thanks for your useful advice!

1. Great idea!

2. I will make the number larger (or even configurable) in the next version.

3. Some of those functions may require network access, but it's a cool idea.

2

RingoCatKeeper OP t1_j28aoff wrote

Yes, it's a one-time permanent purchase.

I agree with you on "free trial then subscription"; actually, I was going to do exactly that. However, an In-App Purchase requires a network connection.

Currently I'm promoting it on Reddit and Product Hunt, and nowhere else. It would be great if you could help me.

1

undefdev t1_j28cf5b wrote

Nice! This seems to work better than iOS's own photo search, thanks!

2

SweatyBicycle9758 t1_j28jmv2 wrote

Then I would suggest that feature too: being able to look up images with a date filter. Honest opinion: personally, I wouldn't put money into something Apple already does (though based on the comments, I see your app does better on similar-context pictures). For someone like me, dates are more important since I can remember them; if that feature is going to be included, I'll definitely take it. Good luck.

2

Evoke_App t1_j28kvoy wrote

Oh, I see. Does your app need a permanent network connection for a subscription?

I would imagine that to purchase the sub the customers need to be online, but their subscription data gets logged on a separate server that is permanently online, so it doesn't matter if they go offline; they'll still be charged until they unsubscribe.

And for promotion, I was referring more to writing descriptions for your Product Hunt page, but if I find anyone who's looking for something like this on Reddit, I'll tag you and bring up your app ;)

1

londons_explorer t1_j28kvqp wrote

There is no latency constraint; it's a pure streaming operation, and the total data to be transferred is 1 gigabyte for the whole set of vectors, which is well within the read performance of Apple's SSDs.

This is also the naive approach; there are probably smarter approaches, such as doing an approximate search with very low-resolution vectors (e.g. 3-bit depth) and then a second pass over the high-resolution vectors of only the most promising few thousand results.
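A minimal sketch of that two-pass idea in numpy (the int8 quantization and all names are my own illustration, standing in for the very low bit depths suggested above):

import numpy as np

def two_pass_search(index, query, shortlist=2000, k=12):
    # index: (N, 512) float32 unit vectors, N > shortlist; query: (512,) unit vector
    # Pass 1: cheap scores on crude low-precision copies to shortlist candidates
    coarse = (index * 127).astype(np.int8).astype(np.int32)
    q = (query * 127).astype(np.int8).astype(np.int32)
    rough = coarse @ q
    candidates = np.argpartition(-rough, shortlist)[:shortlist]
    # Pass 2: exact cosine scores on the few thousand survivors only
    exact = index[candidates] @ query
    return candidates[np.argsort(-exact)[:k]]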

3

Steve132 t1_j28og0o wrote

There's an O(n) algorithm for top-k partitioning that could be much, much faster than .sort() when you have thousands of elements:

QuickSelect. In C++ it's available as std::nth_element; in Swift I couldn't find it directly, but you can implement it in a few lines using .partition as a subroutine.
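For illustration, the numpy analogue of this selection trick (a sketch of mine, not the commenter's code): np.argpartition does the O(n) top-k split, and only the k survivors get sorted.

import numpy as np

def top_k_indices(scores, k):
    idx = np.argpartition(-scores, k)[:k]  # O(n): unordered top-k indices
    return idx[np.argsort(-scores[idx])]   # sort only k elements: O(k log k)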

7

Steve132 t1_j28oxex wrote

One thing you aren't taking into account is that the computation of the similarity scores is O(n), but the sorting he's doing is O(n log n), which for 1M elements might dominate, especially since it's not necessarily hardware-optimized.

1

TheIdesOfMay t1_j28qz2q wrote

Great implementation! What is the run time for calculating the CLIP embeddings per image? And inference latency? Were any low-level model optimisations made for it to run on iOS hardware or am I deeply underestimating the power of these new chips lol

5

omgpop t1_j28w0m9 wrote

Does not work for me at all on iPhone XS. All photos are indexed, but the search finds nothing. Want my money back lol. Since there are no settings, there's nothing to troubleshoot. It simply does not work; search produces 0 results.

1

RingoCatKeeper OP t1_j28wvxw wrote

I'll check it out; I got notice from another user that the XS Max isn't working either, so I guess it's a chip problem. I'm sorry about that. You can get a refund first, and I'll confirm and consider blocking phones older than the iPhone 11.

1

pridkett t1_j292fr1 wrote

I used one of the codes to start poking around (X6RPT3HALW6R). I was optimistic about it working with M1/M2 Macs too. I downloaded the iPad version onto my M2 iPad Air, started a query, and it crashed after I clicked to have it start indexing the photos.

Currently playing with it on my iPhone. Seems really neat. Would be great if there were a way to synchronize the indexes across devices through iCloud (or even iCloud drive).

I've had similar thoughts but doing something with X-CLIP to search the videos on your phone for when you're looking for a specific video (I take a lot of short videos of my family).

3

RingoCatKeeper OP t1_j293hgk wrote

It's an interesting idea to synchronize the indexes across different devices; however, anything involving a network connection is a disaster for an app that reads all your photos. Maybe there is a better way to do this.

On the issue of running on M2, I'll check it out later.

Your project sounds interesting; please let me know when there is a product.

1

hermlon t1_j295dp1 wrote

This is a really cool idea. I'm currently using the CLIP model for an image retrieval task at university. We're using the Ball Tree for finding the closest images to the text in the vector space. What algorithm are you using for finding the nearest neighbors?

1

RingoCatKeeper OP t1_j295uu0 wrote

I'm using simple cosine similarity between the embedding vectors. There is some optimized work by Google called ScaNN, which is much faster for large-scale vector similarity search; however, it's much more complicated to port to iOS.

1

pridkett t1_j296ivt wrote

That's why I was suggesting just saving the index to iCloud files. You're not providing the synchronization, nor do you need to provide servers to handle more people. The data stays secure in iCloud.

I also want to add that I really like how you've managed to do this in a way that is privacy-centric. It also has a nice side effect of making things much more scalable: you just need to provide someplace to download the models, which are infrequently needed (likely only on a new device).

2

ElectronicCress3132 t1_j29c108 wrote

Btw, one should take care not to implement the worst-case O(n) algorithm (Quickselect + Median of Medians), because its high constant factors slow it down in the average case. QuickSelect with random partitioning, or Introselect (which the C++ standard library function mentioned above typically uses), have good average time complexities and rarely hit the worst case.

1

OmarMola69 t1_j29gqod wrote

Guys, I need some help with my project. Can someone contact me at +201012505830?

−4

1995FOREVER t1_j29ocu0 wrote

I used RHT3NMLHPFMW. Gonna try it out on my iPad 6. Thanks!

edit: does not work on iPad 6. I think anything lower than an A13 won't work, since it also crashes on the iPhone XS.

1

NoThanks93330 t1_j2aam8x wrote

Damn, I need this for Android.

Does anyone know if there is something similar available for Android?

1

unicodemonkey t1_j2auxih wrote

Hi. Thanks for the code, I've used 7HWRPY9RXEWY.
The app does work for me even with a fairly large index (35K photos) and I have some feedback to share:

  • a first-time user can type in a query before being asked to build the index. Might be better to offer indexing right after the first start.
  • the query doesn't get re-run automatically after indexing completes, so the user sees the "no index, no results" response to the initial query until they try searching again
  • the indexer has to rely on low-res thumbnails when processing photos that have been offloaded to iCloud. Does this affect accuracy? I'm not sure if there are enough pixels for CLIP.
  • such photos don't get redownloaded from iCloud when I'm viewing them in the search results. I just get blurry thumbnails.
  • there's no way to actually do anything useful with a search result. The "Share" button would be a welcome addition, as well as metadata display and a viewer that supports the zoom gesture.
  • I see you've extended the number of search results from 12 to 120, great. Maybe it's possible to load more results dynamically when scrolling, instead of a configurable hard limit.
  • I think ranking just by similarity is not intuitive enough, though. Recent photos or favorites are likely to be more important for the user, for example. Just an idea for future improvement - a simple ranking model over CLIP similarity and a number of other features might be useful.
  • Would be nice to have search restricted by a particular album
  • The model does produce unexpected results at times - e.g. "orange cat" seems to be a fitting description for a gray cat sitting on an orange blanket.

1

stablebrick t1_j2axj0s wrote

I really hope Apple picks this up and makes it an actual feature. This is great.

2

RingoCatKeeper OP t1_j2b3dtq wrote

Thanks for your long feedback, I've read it twice.

1. Re-running the initial query is a great idea; I will try to add it in the next version.

2. A ViT-B/32 CLIP model resizes all input images to 224x224, which is even smaller than those thumbnails, so this does no harm to performance (see the snippet after this list).

3. Downloading images from iCloud is easy to implement, but it requires network access. It's a disaster for an app that reads all your photos to also have network access, so I made a compromise here.

4. I've tried dynamic scrolling, but it costs more time to fetch results; I will consider doing it that way.

5. Searching within a few specific albums is a better experience; I will definitely figure out how to implement it.

Really, thanks for your patient feedback!
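As a quick illustration of point 2, a sketch using the open-source CLIP package (not Queryable's code): printing the preprocessing pipeline shows the 224x224 resize.

import clip

# The preprocess transform returned by clip.load resizes and center-crops
# every input image to 224x224 before it reaches the Image Encoder
_, preprocess = clip.load("ViT-B/32", device="cpu")
print(preprocess)  # expect Resize(224) and CenterCrop(224) among the steps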

1

unicodemonkey t1_j2bdf3g wrote

I think network access would be legitimate if used specifically by the iCloud service to display photos. It probably happens in a separate background process that manages the photo library, not in the app itself. But it's up to you to decide, of course.

2

pridkett t1_j2bigzy wrote

I trust iCloud a whole lot more than I trust a random service to store my content. I also trust iCloud more than Google Drive. I also have all my photos in iCloud - so yes, I trust iCloud.

1