One-Shot on-Device Learning for Image Classifiers Using Classification-by-Retrieval

Classification-by-retrieval is a simple method for building a neural-network-based classifier without computationally intensive backpropagation training. The technology can be used to create a lightweight mobile model from as few as one image per class, or an on-device model that classifies tens of thousands of categories. For example, mobile models built with classification-by-retrieval can recognize tens of thousands of landmarks.

There are several applications for classification-by-retrieval, including:

  • Machine-learning education (e.g., an educational hackathon event).
  • Quick prototyping or demonstration of image classification.
  • Custom product recognition (for example, building a product-recognition app for a small or medium-sized business without collecting a large training dataset or doing heavy coding).
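
At a high level (this is a rough sketch of the retrieval idea, not Google's actual implementation; the embedding function below is a stand-in), such a classifier amounts to storing one or more L2-normalized example embeddings per class and returning the label of the nearest stored example at inference time:

import numpy as np

# Stand-in embedding: a random projection of 64x64 RGB arrays.
# In practice this would be a pretrained backbone network.
_rng = np.random.default_rng(0)
_proj = _rng.normal(size=(64 * 64 * 3, 128))

def embed(image):
    return image.reshape(-1) @ _proj

def build_index(examples):
    # examples: list of (image, label) pairs, as few as one per class
    vecs, labels = [], []
    for image, label in examples:
        v = embed(image)
        vecs.append(v / np.linalg.norm(v))   # L2-normalize each embedding
        labels.append(label)
    return np.stack(vecs), labels

def classify(image, index_vecs, labels):
    q = embed(image)
    q = q / np.linalg.norm(q)
    scores = index_vecs @ q                  # cosine similarity to every stored example
    return labels[int(np.argmax(scores))]    # label of the nearest example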

https://i.redd.it/nb9cqiki6ad81.gif

πŸ‘︎ 2
πŸ‘€︎ u/techsucker
πŸ“…︎ Jan 22 2022
Google AI Introduces MURAL (Multimodal, Multi-task Retrieval Across Languages) For Image–Text Matching

For many concepts, there is no direct one-to-one translation from one language to another. Even when there is, such translations often carry connotations and associations that a non-native speaker can easily miss. When the concept is anchored in visual examples, however, the meaning becomes clearer, even though each person's associations with a term may differ significantly. Take the word β€œwedding,” for example. In English it is frequently associated with a bride in a white gown and a groom in a tuxedo, whereas in Hindi (ΰ€Άΰ€Ύΰ€¦ΰ₯€) a more fitting association may be a bride in brilliant colors and a groom in a sherwani.

Thanks to recent developments in neural machine translation and image recognition, it is now feasible to reduce ambiguity in translation by displaying text together with a supporting image. For high-resource languages like English, previous research has made significant progress in learning joint image-text representations. These representation models aim to encode the image and the text as vectors in a shared embedding space, where an image and the text describing it lie close to each other. For example, ALIGN and CLIP have shown that training a dual-encoder model (i.e., one with two independent encoders) on image-text pairs using a contrastive learning loss works exceptionally well when given enough training data.
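
As a rough sketch of the dual-encoder contrastive setup described above (illustrative only, not MURAL's or CLIP's actual code; the temperature value and tensor shapes are assumptions), the in-batch contrastive loss treats matched image-text pairs as positives and all other pairings in the batch as negatives:

import torch
import torch.nn.functional as F

def contrastive_loss(image_embeds, text_embeds, temperature=0.07):
    # image_embeds, text_embeds: (batch, dim) outputs of two independent encoders
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    logits = image_embeds @ text_embeds.t() / temperature   # pairwise similarities
    targets = torch.arange(len(logits), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)             # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)         # text -> image direction
    return (loss_i2t + loss_t2i) / 2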

Quick Read: https://www.marktechpost.com/2021/12/06/google-ai-introduces-mural-multimodal-multi-task-retrieval-across-languages-for-image-text-matching/

Paper: https://arxiv.org/pdf/2109.05125.pdf

Google Blog: https://ai.googleblog.com/2021/11/mural-multimodal-multi-task-retrieval.html

πŸ‘︎ 8
πŸ‘€︎ u/techsucker
πŸ“…︎ Dec 07 2021
Google AI Introduces MURAL (Multimodal, Multi-task Retrieval Across Languages) For Image–Text Matching /r/ArtificialInteligence/…
πŸ‘︎ 3
πŸ‘€︎ u/Dilip-Rajpurohit
πŸ“…︎ Dec 07 2021
Image Storage/Retrieval Question

Hello All!

I have been researching my question for a couple of weeks now, but all I can find are examples/explanations for sites which have multiple profiles, which doesn't match my situation. Right now, I am just concerned with the functionality and overall concept of how this would work. I know there are security issues, but I'm not going to deal with those issues until I have the overall concept worked out.

My situation:

I am building a photography site (Node, Express) where the client has pictures they would like to display in a slideshow on each page, with different pages corresponding to different image categories. The total image count is around 2k pictures. Each page shows a certain number of images at a time, say 6. Then you click a button to go to either the next 6 pictures or the previous 6 pictures.

Research Completed:

I have searched on Stack Exchange and Google. Through this, I have determined that storing the actual images in a database would not work well in this situation. At first, I was thinking of storing the file paths to the images in the database, because that's what people had suggested online (on Stack Exchange, etc.). However, I now realize that they suggested that because they were all talking about pictures associated with a particular user's profile, which is not what I'm trying to do.

I am now thinking that I could also include some kind of "index" (for lack of a better word) in the image file paths (like: folder/myimage_003). Then, I could just check the "first image" in the set of 6 to see "where I am" in order to load the next 6 images.
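
For what it's worth, the offset-style pagination described above is cheap regardless of how the files are named, since it boils down to a sorted directory listing plus a slice. A minimal, language-agnostic sketch (written in Python purely for illustration, even though the site itself is Node/Express; the directory layout is assumed):

import os

def get_page(image_dir, page, page_size=6):
    # Sorting by file name also sorts by a zero-padded numeric suffix (e.g. myimage_003.jpg).
    names = sorted(os.listdir(image_dir))
    start = page * page_size
    return [os.path.join(image_dir, n) for n in names[start:start + page_size]]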

Here's my question:

If I store all the images in the host server, would using some kind of increment system (with the file names) be detrimental in terms of functionality/speed? Is there a simpler way of doing this that I'm just not thinking of?

Thank you!!

πŸ‘︎ 2
πŸ‘€︎ u/drg_prime
πŸ“…︎ Oct 08 2021
[R] DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features

Local information is captured with multi-atrous convolutions and self-attention; its orthogonal components are then concatenated and aggregated with the global representation to generate the final representation.

The paper shows state-of-the-art image retrieval performance on the Revisited Oxford and Paris datasets.

Paper Link : https://arxiv.org/abs/2108.02927

Unofficial Code : https://github.com/dongkyuk/DOLG-pytorch

πŸ‘︎ 3
πŸ“…︎ Oct 05 2021
BoofCV v0.38: Better 3D Reconstruction from photos/videos and automatic retrieval of images taken of the same scene. Source in comments. youtu.be/BbTPQ9mIoQU
πŸ‘︎ 20
πŸ‘€︎ u/lessthanoptimal
πŸ“…︎ Aug 03 2021
Confusion about illuminance from HDR image retrieval

I have recently found a paper about extracting illuminance from HDR images captured with a fisheye lens. This is the paper in question: Link. In Section III-D, an equation [kernel function (2)] is given describing how the process is supposed to work.

My question is: why do we multiply the cosine-corrected illuminance by the differential area? I am generally a bit confused about the idea of the paper, but that part specifically I really don't get. Also, I want to adapt the equation to work with HDR equirectangular panoramas, and I'm not sure whether that will work.
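
Without having read that specific paper, equations of this shape usually come from discretizing the irradiance integral over the hemisphere: each pixel of the fisheye image covers a small patch of the sky/scene, so its cosine-weighted value has to be weighted by the solid angle (or projected area) that the pixel subtends before everything is summed. The standard relation, written in LaTeX, is

E = \int_{\Omega} L(\omega) \cos\theta \, d\omega \approx \sum_{i} L_i \cos\theta_i \, \Delta\omega_i

For an HDR equirectangular panorama the same sum applies in principle, but the per-pixel solid angle differs from the fisheye case: it shrinks toward the top and bottom rows in proportion to the cosine of the latitude, which is the part that would need to be reworked. (This is general radiometry, not the paper's exact kernel function.)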

If anyone could help me out or give me some pointers, that'd be fantastic!

πŸ‘︎ 6
πŸ‘€︎ u/dralois
πŸ“…︎ Aug 11 2021
A fun video of my colleague's dissertation work - how to do large-scale image retrieval on small objects! He's very proud that it got accepted to T-IP youtube.com/watch?v=D9Dh_…
πŸ‘︎ 20
πŸ‘€︎ u/isml_
πŸ“…︎ Jul 27 2021
[D] Similar Image Retrieval

I am trying to build a similar-image retrieval system where, given an image, the system shows the top 'k' most similar images. For this particular example, I am using the DeepFashion dataset: given an image containing, say, a shirt, show the top 5 clothing items most similar to it. A subset of this has 289,222 diverse clothing images. Each image is of shape (300, 300, 3).

The approach I have includes:

  1. Train an autoencoder
  2. Feed each image in the dataset through the encoder to get a reduced n-dimensional latent-space representation (for example, a 100-d latent vector)
  3. Create a table of shape m x (n + 2), where 'm' is the number of images and each image is compressed to n dimensions. One extra column holds the image name and the other holds the path to where the image is stored on the local system
  4. Given a new image, feed it through the encoder to get its n-dimensional latent-space representation
  5. Use something like cosine similarity to compare the new image's n-d latent vector against the m x (n + 2) table from step 3 and retrieve the top k closest clothes

How do I create the table mentioned in step 3?

I am planning on using TensorFlow 2.5 with Python 3.8 and the code for getting an image generator is as follows:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

image_generator = ImageDataGenerator(
    rescale=1./255, rotation_range=135)

train_data_gen = image_generator.flow_from_directory(
    directory=train_dir, batch_size=batch_size,
    shuffle=False, target_size=(IMG_HEIGHT, IMG_WIDTH),
    class_mode='sparse')

How can I get the image name and the path to each image in order to create the m x (n + 2) table in step 3?
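
One possible way to build that table (a sketch only; it assumes `encoder` is the trained encoder half of your autoencoder and reuses `train_data_gen` and `train_dir` from above, with shuffle=False so the prediction order matches the file order):

import os
import numpy as np
import pandas as pd

embeddings = encoder.predict(train_data_gen)                 # shape: (m, n)
names = [os.path.basename(f) for f in train_data_gen.filenames]
paths = [os.path.join(train_dir, f) for f in train_data_gen.filenames]

table = pd.DataFrame(embeddings, columns=[f"z{i}" for i in range(embeddings.shape[1])])
table["image_name"] = names
table["image_path"] = paths
table.to_csv("embedding_table.csv", index=False)             # the m x (n + 2) table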

Also, is there any other better way that I am missing out on?

Thanks!

πŸ‘︎ 8
πŸ‘€︎ u/grid_world
πŸ“…︎ Jun 22 2021
Visual Verification Image Retrieval?

After an rx is sold, can we go back and look at the production images? It wasn't in the training module and I can't find anything in rxconnect.

πŸ‘︎ 5
πŸ‘€︎ u/NayChan07
πŸ“…︎ Jun 18 2021
Discussion: How does Google Lens image retrieval work? And how is it so fast?

The question is in the title but I want to add some points to guide the conversation a bit:

  1. I know that unless you're at Google working on that project, you probably don't know the answer. In fact, I'm more interested in the discussion points than in knowing the real answer.
  2. While being creative, let's challenge ourselves. Google Lens takes less than one second to come up with results, so let's not discuss ideas that would take over a second on a desktop GPU. This also means I'm not inviting you to enumerate all the image retrieval techniques you know. Stick to the problem at hand!

Some of my observations on Google Lens:

- It seems to work at a local-feature level as well as a global-concept level. For instance, if I take a photo of a distinctive calculator, it will return a similar calculator, not just any calculator. If I take a photo of my hand, it will return any hand, probably because there's nothing distinctive about my hand; what's more interesting is that it doesn't rely on the presence of distinctive features.

- It may be using text / logos to help with matching. I took a photo of a calculator and it showed me one of the same brand.

- It seems to pick out objects and focus on them specifically. I took a photo of a bare kitchen counter-top with a tea kettle in the background. It returned a tea kettle!
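
Purely as a discussion aid (this is not a claim about what Lens actually runs), the sub-second constraint in point 2 is usually met with an approximate nearest-neighbour index over precomputed embeddings rather than any heavy per-query computation. A sketch with FAISS, using random placeholder embeddings and assumed dimensions:

import numpy as np
import faiss

d = 256                                                   # embedding dimensionality (assumed)
gallery = np.random.rand(1_000_000, d).astype("float32")  # placeholder gallery embeddings
faiss.normalize_L2(gallery)

quantizer = faiss.IndexFlatIP(d)
index = faiss.IndexIVFFlat(quantizer, d, 4096, faiss.METRIC_INNER_PRODUCT)
index.train(gallery)                                      # cluster the gallery into 4096 cells
index.add(gallery)
index.nprobe = 16                                         # search only a few cells per query

query = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)                      # top-5 neighbours in milliseconds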

πŸ‘︎ 32
πŸ‘€︎ u/_4lexander_
πŸ“…︎ Mar 18 2021
Best way to optimise storing images for fast retrieval on search?

For an application that allows users to upload images with a description and related tags, what would be the best way to:

  1. store images and have them quickly accessible to the website
  2. have the user-uploaded description and tags stored together
  3. have this optimised for a good search experience, i.e. the user searches "red bag and green shoes" and those images come up fast.
  4. backend for the search database: Elasticsearch, a graph DB, or a combination?

I would appreciate any ideas, suggestions, thought patterns.

Thank you

πŸ‘︎ 7
πŸ‘€︎ u/tallwithknees
πŸ“…︎ May 31 2021
Using OCR + Image segmentation for text information retrieval

Hi,

I'm working on information retrieval from documents such as invoices where the goal is to extract elements such as the date of the invoice, amounts, sender/receiver, reference etc ...

My dataset comprises tens of thousands of invoices of various types. It can be considered labelled with reasonable accuracy. I work mainly with scanned documents, so the first step is to use OCR to retrieve words and numbers, as well as their positions.

Using a few hand-engineered features and some basic NLP techniques to build a dictionary of keywords, I was able to come up with some interesting results using gradient boosting, treating it as a bounding-box-wise classification problem (accuracy of 90+% on most fields, with similar recall and precision).

However, because of the nature of the feature engineering, a lot of context information is lost, so I would like to see whether a deep-learning-based model could achieve better results.

My idea is the following:

  1. Run an OCR on the image
  2. Use a dictionary of keywords (for instance, "date", "reference", "invoice", "total", etc.) to encode a custom image. I was thinking of using one channel per keyword: 1 for all the pixels contained in the bounding boxes where the OCR found said keyword and 0 for the rest. Since my dictionary is around 300 words, I end up with an image with 300 channels (see the sketch after this list). I also figured why not add one channel for the actual image, which might contain information such as lines, background colors, etc. On top of that, I also add a few channels with information such as "does the bounding box contain a date or an amount, how many digits, how many alphabetic characters", etc.
  3. Treat the problem as supervised image segmentation. I feed the network the custom image with 300+ channels, along with a target that is also a custom image in which each channel represents a field of interest (using the same encoding strategy as for the keywords)
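
A minimal sketch of the channel-encoding step in point 2 (the keyword list, OCR output format, and image size are illustrative assumptions, not the original pipeline):

import numpy as np

KEYWORDS = ["date", "reference", "invoice", "total"]        # illustrative subset of ~300 words

def encode_keyword_channels(ocr_boxes, height, width):
    # ocr_boxes: list of (word, x0, y0, x1, y1) tuples produced by the OCR step
    channels = np.zeros((height, width, len(KEYWORDS)), dtype=np.float32)
    for word, x0, y0, x1, y1 in ocr_boxes:
        word = word.lower()
        if word in KEYWORDS:
            k = KEYWORDS.index(word)
            channels[y0:y1, x0:x1, k] = 1.0                  # fill the keyword's bounding box
    return channels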

I'm familiar with how CNNs work, but I'm by no means an expert. I've used a classic U-Net architecture and tried a couple of loss functions (Dice loss and focal loss, given the highly imbalanced nature of the problem). I have some results, but they're definitely not comparable to the good old gradient boosting.

Question 1: Am I too optimistic about making this work with a CNN, or are there alterations I can make to achieve results similar to, if not better than, the gradient boosting?

Question 2: I tried overfitting my network with a sm

... keep reading on reddit ➑

πŸ‘︎ 2
πŸ‘€︎ u/Random-Learning
πŸ“…︎ Mar 28 2021
[R] Training Vision Transformers for Image Retrieval: consistent and significant improvements of transformers over convolution-based approaches arxiv.org/abs/2102.05644
πŸ‘︎ 6
πŸ‘€︎ u/downtownslim
πŸ“…︎ Feb 12 2021
Fvid - Encode any file as a video using 1-bit colour images to survive compression algorithms for data retrieval github.com/AlfredoSequeid…
πŸ‘︎ 38
πŸ‘€︎ u/LuvPastelPink
πŸ“…︎ Oct 10 2020
UPDATE: Working with u/bronfoth u/JohnTruthseekerSmith u/HugeRaspberry, I'm in the Defense Personnel Records Info Retrieval System (image#1) in hopes of obtaining my DA-31s. I'd like your input next. Which docs should I select before submitting my request? (image#2) Thank you. reddit.com/gallery/isowxo
πŸ‘︎ 9
πŸ‘€︎ u/Bill_Rausch
πŸ“…︎ Sep 14 2020
Latest development of content-based image retrieval

I used to develop applications for image similarity search. The methodology was basically to train a CNN on labeled data in a supervised setting. The output of the global pooling layer is then used (after appropriate normalization) as the embedding, and similarity metrics (e.g., cosine similarity) are computed on these embeddings. As far as I know, this is one popular approach to training an image feature extractor. However, it requires a decent amount of richly annotated data, which is expensive.
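
As a concrete (if simplified) illustration of the "pooled layer as embedding" approach described above, one can strip the classification head off a pretrained backbone and compare L2-normalized outputs with cosine similarity; the torchvision ResNet-50 here is just one possible choice, not the specific model from the original applications:

import torch
import torch.nn.functional as F
from torchvision import models

backbone = models.resnet50(pretrained=True)
backbone.fc = torch.nn.Identity()          # keep the global-pooling output as the embedding
backbone.eval()

@torch.no_grad()
def embed(batch):                          # batch: (B, 3, H, W), ImageNet-normalized images
    return F.normalize(backbone(batch), dim=-1)

# Retrieval is then a matrix product: scores = gallery_embeddings @ query_embedding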

There are methods such as MoCo (https://arxiv.org/abs/1911.05722) that leverage the idea of triplet and contrastive losses. The "positive" image in the triplet is constructed from an augmentation of the "query" image, while the "negative" image is randomly selected. In this way you can get an image embedding without requiring annotated data. However, whether an embedding model trained in this way is sufficient for image retrieval is, from my point of view, uncertain.

What are your thoughts on image retrieval based on embedding models trained in a completely unsupervised setting? Is it still dominant to use interim layers from models trained on classification tasks for extracting image features, or has there been major development in recent years that enables training decent feature extractors for image retrieval without annotated data?

πŸ‘︎ 5
πŸ‘€︎ u/manfredcml
πŸ“…︎ Feb 21 2021
Content Based Image Retrieval (CBIR) for just two image comparison

Hello, can anyone tell me how I can check the similarity between just two images and get a probability score for it, using one of the best similarity approaches (CBIR)?

I want to implement this in Python. Any advice would be great. Thank you.

πŸ‘︎ 11
πŸ‘€︎ u/stroke4ai
πŸ“…︎ Jun 28 2020
Microsoft Releases Distributed Conditional Image Retrieval in Open Source Library v.redd.it/vh7x6pohiio51
πŸ‘︎ 27
πŸ‘€︎ u/mhamilton723
πŸ“…︎ Sep 21 2020
Image retrieval

Dear users and developers of duckduckgo,

as you may have noticed, it is quite interesting that only three search engines offer image retrieval, namely Google, Bing, and Yandex. The last of these outperforms the other two, mainly because Yandex's image retrieval looks for similar images. Google mostly extracts words and terms from images, and Bing doesn't always show similar images.

Is it possible to give duckduckgo image retrieval? If you reach Yandex's quality of image retrieval, then I can only congratulate you. And here is my dream for this feature, which could make duckduckgo outperform even Yandex: making image retrieval available not just for finding similar images, but also links and videos.

How much effort would it take to build such a feature? And what are your thoughts on it? Comment below to share your thoughts.

πŸ‘︎ 3
πŸ‘€︎ u/Redpill_Creeper
πŸ“…︎ Jul 19 2020
[D] Content-Based Image Retrieval - guideline + PyTorch implementation by CV researcher

Content-based image retrieval is an important task in CV, as it allows you to find images containing attributes that are not in the image metadata.

To that end, we compiled a guideline on how to build such a system, together with explanations of the underlying concepts.

We prepared an example task: finding face images with certain attributes (we use the CelebA dataset), which we approach using Siamese networks / triplet loss.

Here are the main points:

Prepare balanced training triplets.

They should consist of three elements:

  1. An image (the anchor),
  2. Its attributes vector (the positive),
  3. And a negative attributes vector (the negative).

As for creating negative attributes, a common strategy is the following:

  • Sample a random attributes vector from the training data,
  • Check that it is different from our positive vector,
  • Use it as the negative.
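
A tiny sketch of that sampling strategy (variable names are illustrative, not from the original article):

import numpy as np

def sample_negative(attributes, positive):
    # attributes: (num_samples, num_attrs) array of all training attribute vectors
    while True:
        candidate = attributes[np.random.randint(len(attributes))]
        if not np.array_equal(candidate, positive):
            return candidate         # first random vector that differs from the positive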

note:
It's just one example/typical strategy. I would love to hear what you do to create negative attributes. Do you know some tricks of the trade that you can share?

How to design the model and train it?

In principle, you need a neural network architecture that learns image and attribute vector embeddings in the same embedding space.

  • To learn the image embeddings, we use a CNN (e.g., ResNet-50) that outputs an N-D vector, where "N" is the embedding-space dimensionality.
  • To learn the attribute-vector embeddings, use, for example, an MLP.

With those at hand, we can sketch the training loop:

# criterion is a triplet loss, e.g. torch.nn.TripletMarginLoss()
anchor = self.CNN(img_anchor)        # image embedding (the anchor)
positive = self.MLP(att_positive)    # embedding of the positive attributes vector
negative = self.MLP(att_negative)    # embedding of the negative attributes vector
loss = criterion(anchor, positive, negative)
optimizer.zero_grad()
loss.backward()
optimizer.step()

Evaluation of the trained model.

Usually, you go with either:

  • Precision@K (P@K) -> used when you are only interested in correctly retrieving a limited number of images, as in recommendation systems,
  • Mean Average Precision (mAP) -> more expensive to compute and less intuitive, but it evaluates a retrieval system more thoroughly.
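
For reference, a minimal Precision@K computation (assuming `retrieved` is the ranked list of retrieved item ids and `relevant` is the set of ground-truth relevant ids; this is the generic metric, not code from the article):

def precision_at_k(retrieved, relevant, k):
    top_k = retrieved[:k]
    return sum(1 for item in top_k if item in relevant) / k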

Remark

During training it's important to balance easy and hard triplets: you start with easy ones and gradually introduce hard ones. How does this look in your practice? How do you introduce hard negatives so that they can be learned?

This is not the end of the story. We also reviewed the theory behind content-based image retrieval. If you feel intrigued and want to check out more, here is the whole [article](https://neptune.

... keep reading on reddit ➑

πŸ‘︎ 2
πŸ‘€︎ u/kk_ai
πŸ“…︎ Sep 17 2020
[R] Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval v.redd.it/ok2x7icxh5031
πŸ‘︎ 57
πŸ‘€︎ u/AnjanDutta
πŸ“…︎ May 24 2019
PyRetri: An Open-Source Deep Learning Based Unsupervised Image Retrieval Library Built on PyTorch (Github and Paper link in article) marktechpost.com/2020/05/…
πŸ‘︎ 4
πŸ‘€︎ u/ai-lover
πŸ“…︎ May 09 2020
Google's Objectron uses AI to track 3D objects in 2D video with implications for robotics, self-driving vehicles, image retrieval, and augmented reality venturebeat.com/2020/03/1…
πŸ‘︎ 10
πŸ“…︎ Mar 12 2020