A list of puns related to "Object Class Detection"
Hi r/MachineLearning,
https://github.com/jacobgil/pytorch-grad-cam is a project that has a comprehensive collection of Pixel Attribution Methods for PyTorch (like the package name grad-cam that was the original algorithm implemented).
Class Activation Maps can help diagnose properties about the model predictions, like "where does the model see a cat in the image".
After many requests I added support for Object Detection and Semantic Segmentation, and wanted to share this with you.
Here you can find detailed notebook tutorials about this:
Computing the CAM for object detection
Computing the CAM for semantic segmentation
Class Activation Maps are usually researched and applied for classification models.
A repeating request in this repository, and also in some object detection projects, was to add support for grad-cam for object detection.
One challenge with this, is that object detection frameworks typically don't output tensors you can back-propagate through to compute gradients.
They typically output dictionaries with bounding boxes, labels, etc, after a lot of processing, and don't expose any way to compute gradients with respect to those detections.
If you want to compute CAMs for them, you typically have to dive into the code of these object detection packages and create solutions that work only with them.
There was no "generic" tool that just works and can be adapted to new object detection models.
Some Class Activation Map methods don't depend on computing the gradients. Examples of these:
So sorry because Iβm a total newbie at this! Iβve been having a really hard time understanding how to do this.
I found a pre trained caffemodel online which can detect classes like a tv monitor, water bottle and a person. For my school project, I need the model to detect new classes like matchsticks, lighters, etc.
I already have the folder containing about 300 images of each class I need for the training/fine tuning but I am so lost at how to go about this.
I would greatly appreciate any help! Thanks so much guys!!
I need to do a multi-class binary classification using object detection. For example, I need a binary classification for mask or no mask, helmet or no helmet, hot-dog or no hot-dog. Is it better to train each binary classifier and run all of the model at the same time or is it better to use object detection and train 6 classes object detection?
Sorry English is not my first language.
In this git repo (https://github.com/qqwweee/keras-yolo3), I want to add my own class to coco_classes.txt. I have tried another git repo (https://github.com/AntonMu/TrainYourOwnYOLO) for custom training, but its not as accurate as the pretrained coco dataset. I have created a CSV for my custom classes. How do I merge my custom classes with the coco dataset, so that I can make use of it without affecting the existing 80 classes?
As in, I want my model to be able to detect these 80 classes plus my custom classes.
Sorry if this is a stupid question, I am a beginner to Machine Learning.
We have created a dataset of 10,000 images and 17 classes. There are 177,106 objects in total. Their percentages (number of occurrence of each class / total number of objects in dataset * 100%) are as follows:
29.72 %, 24.41 %, 15.90 %, 11.18 %, 4.86 %, 4.19 %, 2.99 %, 2.86 %, 1.01 %, 1.01 %, 0.66 %, 0.55 %, 0.5 %, 0.09 %, 0.04 %, 0.02 %
We are training pre-trained CNNs (EfficientDet, YOLOv2 and YOLOv4/5) on this dataset.
As one might expect, we are having trouble detecting objects that occur less than 1% of the time in the dataset. Any idea how we can tackle this problem?
Hey guys!
Iβm very new to ML, and Iβm working a college project to detect allow entry to places with automatic doors (I.E. supermarkets, hospitals) only if the person is wearing a mask using a Raspberry Pi 4. I was able to train a model based on a pre-trained MobileNetV2 with TensorFlow 1.13.1, but the code is mostly written on the tensorflow/models GitHub repo & itβs a very ugly and un-intuitive code to rewrite by myself & it is based off another pre-trained model - all of which are things Iβm trying to avoid.
I started to learn how to do something similar with TensorFlow 2 and Keras. Iβve watched the I/O 2019 ML Zero to Hero & the 4 part mini-series to try and get the basic idea of convolutional networks and machine learning in general; However, I canβt find a single example of using Keras for creating a convolutional network for object detection with multi-class classification in a single model - I DID find a version where the code used a model to predict bounding boxes for faces first, extract them, and use image classification on it similar to the Rock/Paper/Scissors model in ML Zero To Hero - but thatβs kind of a compromise instead of using a single model that detects masks directly, and can also slow things down considering it will be running on a Raspberry Pi, that also does some more calculations based on the results of the model (temperature check, and audio instructions to the person requesting access if they are wearing the mask incorrectly).
Hereβs my current implementation (dataset is available with tfrecord, xml, and csv annotations):
It currently relies on the MobileNetV2 model, and for some reason While training the loss and accuracy are staying relatively the same - theyβre not improving over time as they should. If there is a better way or a more efficient way of implementing what I have been trying to do, Iβd be happy to learn about it.
To summarize, What I need help with (in relation to the notebook linked above):
β’ Need to figure out if the way I implemented it is the ideal / code efficient way to do such task (probably not?)
β’ re-implement it as a completely custom network (as in the ML Zero to Hero, but object detection instead of image classification), without relying on MobileNetV2 or any other pre-trained model.
β’ Figure out why the model does not improve itβs accuracy when training.
I'm building an detection model by YOLOv4 darknet, and my configuration contains 7 different classes. However, I only train model with a single class, which I provided the same label for all training images.
After training, the detect image shows all 7 classes, although other classes has low probability, but that confused me. Why would a model detect class that is not trained?
If it's because I list all 7 classes in the configuration, then do I have to set only one class in case I only want to detect one class?
I'm new to object detection, thank you!
I'm training a model to detect both vehicles and components of vehicles (e.g. wheels, doors.) I have a large, open dataset where only vehicles are labelled, and a smaller bespoke one with the vehicle and components labelled.
Obviously, a picture from the first dataset may have unlabeled instances of these components in it. Do these unlabelled instances reduce the accuracy of the component classes by essentially being used as false negative examples, or are positive examples more important for a multiclass detector? If they do detract, how can I use the larger dataset to improve vehicle detection accuracy?
If it's implementation specific, I'm finetuning the SSDLite MobileDet model using the TF Object Detection API with TF 1.15.
Thanks!
Pretty much the title. I have about 1250 images with Bounding Boxes coordinates.
Which architecture will be suitable? Any code I can follow for the same?
Hey all.
So this is my first time posting in this Subreddit.
I have this task of detecting the white circles in my link. It's basically LED light reflected onto the iris from a camera. It's for a positioning system that uses a 3-axis robot.
I tried to use open CV initially but due to vast variation in the lighting condition it wasn't able to detect the object in all frames.
Then I tried using YOLO V2. Specifically Tiny YOLO. So the link is basically the result of using YOLO. The tracking is fine.
Now what I have to do is to implement this on a Raspberry Pi 4 Model B. So when I tried this I got 1FPS when I was using real time video. I understand that there are hardware constraints. I tried using SSD mobileNET as well. It gave me around 2FPS.
So I want detect these objects in real time with a frame rate of around 7-10 FPS. Due to budget restrictions I cannot use a hardware accelerator.
I just wanted to know how I can do the object detection in real time with a good frame rate on the Raspberry pi 4.
Also I'm new to this and I'm trying to learn on the go.
I would like to ask for advice on what specific topic/architectures should I be looking for if I have this problem.
Broad Topic: Multi-class Object Detection or Semantic Segmentation (which includes multiple classes for classification) of defects
Problems:
I have started looking into salient object detection but I think it is the wrong route. Those are my main problem. I am having an intuition of improving the feature maps extraction or the whole process by sort of introducing the "background" without any object (or in this case, defects) to the architecture (since concretes are kind of uniformly textured and the defects that I am detecting and segmenting is just the deformity in that uniformly textured concrete".
Any advice would be much appreciated.
I'm using TensorFlow's 1.x Object Detection API to train a model for detection on satellite imagery tagged with 15 classes. Essentially trying to recreate Xia et all. https://vision.cornell.edu/se3/wp-content/uploads/2018/03/2666.pdf
I have only ever done this with a single class detector before, so I am unsure what the usual training time looks like. I am transfer learning a SSD trained on COCO model. After about two or so hours it's sitting at a loss of ~4, is this usual or should I start looking into adjustments?
As a side note: I have not done object detection since 2017 when this API just came out and was all the rage. Are there better methods out there nowadays?
Thanks!
I am working on a problem where we are interested in only 1-class at the end. Simple example detecting only person. I have an annotated data for only person. Now I need to decide whether to train a 1-class YOLO or put more annotation in background and mark as "other" class then train a 2-class YOLO.? Which one would give better accuracy or recall?
Additional query is: a) How the YOLO will behave when there is only 1-class. I mean does the network internally learns person vs non-person? How ot will do a contrast when there only 1-class. b) If a) does not work well, I feel creating a separate class should actually help. BUT here "other" class could contain wide variety of objects fitted under 1 annotation. Is network will get even more confused? Will it affect "person" class as well?
I was wondering what model people recommend (trying to maximize mAP where speed is not a concern) for object detection with roughly 20 classes on a small dataset of roughly 12 thousand images with fairly high class imbalance. Iβve tried Retinanet50 with mAP of 33 and Retinanet101 with mAP of 36. Models Iβm considering trying in the future are as follows (Iβd love to test them all but as these models take a long time to train Iβm trying to not waste too much training time): YOLOv3 SNIPER NASnet Faster R-CNN Mask-R-CNN
What models (or strategies) do people recommend for this scenario to try to maximize mAP?
let's say my objective is to identify 3 dogs, 2 cats, 4 elephants in a picture, putting aside object localisation (drawing bounding box) and object segmentation (mask rcnn) in object detection, can we do multi label/class image classification instead? and how to do that?
Hello I have an existing yolo format dataset of vehicle plate and I want to convert it to this format , Does anyone know what kind of format is this?
https://preview.redd.it/ecwnlq2rg8d81.png?width=389&format=png&auto=webp&s=c18f0a92723d5a6fae4dbd215cb844c34122d21b
Hi, I'm starting this project for my final year Msc thesis and, in the last 6-7 days, I've been searching and reading papers, blogs, forums to evaluate the possible solutions and whether they might be viable or not.
The proposed project is for a first prototype of a Smart Adapter which connects to an IP camera and does object detection through a configurable Neural Network, outputting relevant data through binary or parsable format. As a matter of fact they didn't mention tracking but that's a problem I think I'll have to address too, if I'll want to include that kind of data too. An example they showed me was from traffic aerial static road cameras, where all the detection was done on a server so that might be a use case on which I might work.
The supervisor proposed the use of an Nvidia Jetson TK1 (which they have in the laboratory), since it has the advantage of being low powered and low-cost. My idea was to start and try and use YOLOv3 pretrained models (normal or tiny) over darknet and then try to train it on a dataset like UA-DETRAC, for the roads use case. I thought about tensorflow with other algorithms but my guess is that I would have less fps performance.
I know from start that I won't get real time detection but what about reaching 7-8 fps? Maybe with the custom trained tiny model do you think I could have acceptable detecting and fps performance?
MOST IMPORTANT QUESTION, PLEASE, RELATED TO OFFLINE TRAINING: I couldn't find any relevant info on YOLOv3 normal/tiny models training time with darknet, I would like to do it on the UA-DETRAC dataset with ~80.000 images with multiple tagged boxes. Could I do it with a gtx 970 in a reasonable time or would it take days, if not weeks? In the latter case, are there online services which offer linux workstations with CUDA and cuDNN gpus for darknet training?
Thanks for any answer you might give :)
Hi everyone,
I am currently working on my bachelor thesis in the field of object detection. I have chosen the Yolov5 model from "https://github.com/ultralytics/yolov5". Looking for a dataset for autonomous driving I found NuScenes and Waymo, but in Waymo I have problems converting the TFRecords files to .yaml files. Does anyone know of an approach?
Does anyone knows of any other good datasets in the area of Autonomous Driving? They should also be optimally convertible to .yaml files.
Greetings
GT_King0895
Perhaps a stupid question, but as I understand it mAP is the average of AP scores for all classes. If I only have a single class would mAP = AP?
I have dataset of class A and a dataset of class B. Instances of class B exist in dataset A but are not annotated in dataset A and vice-versa. Is there a way to somehow train object detection system like SSD to detect these two classes? Intuition tells me it depends on the penalty(loss) for false positive samples.
Preparing dataset for training object detection model is a time-consuming task.
Generating synthetic dataset is much faster and easier, so it can save a lot of time for data scientists and ML engineers.
I've created a detailed tutorial on how to do that with Python and published it here:
Or, you can skip the tutorial and use the script from here:
https://github.com/alexppppp/synthetic-dataset-object-detection
Take a look at an example of synthetic scene here:
https://preview.redd.it/lzr920vntlc81.png?width=2053&format=png&auto=webp&s=a39442e63eb02eeb2897530323d0a92e81041122
I hope the script will be useful for you! Also I will appreciate any feedback and ideas how to improve it)
Hello, I am a newbie in drones and UAVs. I have been training deep learning models to recognize objects in the air. In order to actually deploy it, I was wondering if I could get some suggestions about possible cameras that I can stream and a GPU unit that would allow me to deploy the trained model. Some ideas that I came across:
Any suggestions would be great!
Hi everyone!
So I have a project related to object detection, specifically real time animal and person detection. Would it be possible to train an object detection model with generalized classes and subclasses?
What I mean is, if an animal enters the frame is it possible for a model to classify it as an animal and when there is more information (ex. the animal comes closer to the camera) to classify it as the type of animal (eg. a deer). Similarly, if it detects a person, it will also be able to tell the gender next to the person classification.
I have heard that it is possible to train with labels and sublabels (or attributes), but I haven't found anything to proceed as of now. Does anyone have any idea on how to proceed? Any lead is welcome!
Cheers! :)
Hey guys!
Iβm very new to ML, and Iβm working a college project to detect allow entry to places with automatic doors (I.E. supermarkets, hospitals) only if the person is wearing a mask using a Raspberry Pi 4. I was able to train a model based on a pre-trained MobileNetV2 with TensorFlow 1.13.1, but the code is mostly written on the tensorflow/models GitHub repo & itβs a very ugly and un-intuitive code to rewrite by myself & it is based off another pre-trained model - all of which are things Iβm trying to avoid.
I started to learn how to do something similar with TensorFlow 2 and Keras. Iβve watched the I/O 2019 ML Zero to Hero & the 4 part mini-series to try and get the basic idea of convolutional networks and machine learning in general; However, I canβt find a single example of using Keras for creating a convolutional network for object detection with multi-class classification in a single model - I DID find a version where the code used a model to predict bounding boxes for faces first, extract them, and use image classification on it similar to the Rock/Paper/Scissors model in ML Zero To Hero - but thatβs kind of a compromise instead of using a single model that detects masks directly, and can also slow things down considering it will be running on a Raspberry Pi, that also does some more calculations based on the results of the model (temperature check, and audio instructions to the person requesting access if they are wearing the mask incorrectly).
Hereβs my current implementation (dataset is available with tfrecord, xml, and csv annotations):
It currently relies on the MobileNetV2 model, and for some reason While training the loss and accuracy are staying relatively the same - theyβre not improving over time as they should. If there is a better way or a more efficient way of implementing what I have been trying to do, Iβd be happy to learn about it.
To summarize, What I need help with (in relation to the notebook linked above):
β’ Need to figure out if the way I implemented it is the ideal / code efficient way to do such task (probably not?)
β’ re-implement it as a completely custom network (as in the ML Zero to Hero, but object detection instead of image classification), without relying on MobileNetV2 or any other pre-trained model.
β’ Figure out why the model does not improve itβs accuracy when training.
I found a trained object detection model online which can detect object classes such as a water bottle and a person. I would like to add another class. How can this be achieved?
Please note that this site uses cookies to personalise content and adverts, to provide social media features, and to analyse web traffic. Click here for more information.