Submitted by RingoCatKeeper t3_zypzrv in MachineLearning
I built an iOS app called Queryable, which integrates the CLIP model on iOS to search the Photos album offline.
Photo search performance with the help of the CLIP model
Compared to the built-in search in the iPhone Photos app, CLIP-based album search is overwhelmingly better. With CLIP, you can search for a scene in your mind, a tone, an object, or even an emotion conveyed by the image.
How does it work? Well, CLIP has a Text Encoder and an Image Encoder:
>Text Encoder will encode any text into a 1x512 dim vector
>
>Image Encoder will encode any image into a 1x512 dim vector
We can measure how close a text sentence is to an image by computing the cosine similarity between their text vector and image vector.
The pseudo-code is as follows:
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
# Load ViT-B/32 CLIP model and its preprocessing pipeline
model, preprocess = clip.load("ViT-B/32", device=device)
# Calculate image vector & text vector (both 1x512)
image = preprocess(Image.open("photo-of-a-dog.png")).unsqueeze(0).to(device)
text = clip.tokenize(["rainy night"]).to(device)
with torch.no_grad():
    image_feature = model.encode_image(image)
    text_feature = model.encode_text(text)
# cosine similarity between the two vectors
sim = torch.cosine_similarity(image_feature, text_feature)
To use Queryable, you first need to build the index: it traverses your album, computes a vector for every image, and stores the results. Index building happens only ONCE. At search time, only a single CLIP forward pass is needed for the user's text query. Below is a flowchart of how Queryable works:
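In code, the two phases look roughly like this. This is a minimal sketch using the open-source CLIP package; the function names, photo paths, and top-k handling are illustrative assumptions, not Queryable's actual on-device implementation:

import clip
import torch
from PIL import Image

device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Phase 1 (runs once): encode every photo into a unit-length 1x512 vector.
def build_index(photo_paths):
    feats = []
    with torch.no_grad():
        for path in photo_paths:
            image = preprocess(Image.open(path)).unsqueeze(0).to(device)
            feat = model.encode_image(image)
            feats.append(feat / feat.norm(dim=-1, keepdim=True))
    return torch.cat(feats)  # N x 512; a real app would persist this to disk

# Phase 2 (runs per query): one text-encoder forward pass, then rank by cosine similarity.
def search(query, index, photo_paths, top_k=5):
    with torch.no_grad():
        text_feat = model.encode_text(clip.tokenize([query]).to(device))
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    sims = index @ text_feat.squeeze(0)  # dot product of unit vectors = cosine similarity
    best = sims.topk(min(top_k, len(photo_paths))).indices
    return [photo_paths[i] for i in best]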
On privacy and security: Queryable is designed to be completely offline and will never request network access, thereby avoiding privacy concerns.
As it's a paid app, I'm sharing a few promo codes here:
Requirements:
- Your iOS version needs to be 16.0 or above.
- iPhone XS/XS Max or older may not work, DO NOT BUY.
9W7KTA39JLET
ALFJK3L6H7NH
9AFYNJX63LNF
F3FRNMTLAA4T
9F4MYLWAHHNT
T7NPKXNXHFRH
3TEMNHYH7YNA
HTNFNWWHA4HA
T6YJEWAEYFMX
49LTJKEFKE7Y
YTHN4AMWW99Y
WHAAXYAM3LFT
WE6R4WNXRLRE
RFFK66KMFXLH
4FHT9X6W6TT4
N43YHHRA9PRY
9MNXPAJWNRKY
PPPRXAY43JW9
JYTNF93XWNP3
W9NEWENJTJ3X
Hope you guys find it useful.
dmart89 t1_j27at4r wrote
This is cool.