The image recognition market is estimated to grow from USD 15.95 billion in 2016 to USD 38.92 billion by 2021, at a CAGR of 19.5% between 2016 and 2021. Advancements in machine learning and the use of high-bandwidth data services are fueling the growth of this technology. Companies in sectors such as e-commerce, automotive, healthcare, and gaming are rapidly adopting image recognition. According to the report by MarketsandMarkets, the image recognition market is divided into hardware, software, and services. The hardware segment, dominated by smartphones and scanners, can play a huge role in the growth of the image recognition market. There is also an increasing need for security applications and products built on innovative technologies such as surveillance cameras and face recognition.
Image recognition refers to technologies that identify places, logos, people, objects, buildings, and several other variables in images. Users are sharing vast amounts of data through apps, social networks, and websites. Additionally, mobile phones equipped with cameras are leading to the creation of limitless digital images and videos. The large volume of digital data is being used by companies to deliver better and smarter services to the people accessing it.
Image recognition is a part of computer vision and a process to identify and detect an object or attribute in a digital video or image. Computer vision is a broader term which includes methods of gathering, processing and analyzing data from the real world. The data is high-dimensional and produces numerical or symbolic information in the form of decisions. Apart from image recognition, computer vision also includes event detection, object recognition, learning, image reconstruction and video tracking.
How Does Image Recognition Technology Actually Work?
Facebook can now perform face recognition with 98% accuracy, which is comparable to human ability. It can identify your friend’s face from only a few tagged pictures. The efficacy of this technology depends on the ability to classify images. Classification is pattern matching with data, and images are data in the form of 2-dimensional matrices. In essence, image recognition means classifying data into one category out of many. One common and important example is optical character recognition (OCR), which converts images of typed or handwritten text into machine-encoded text.
The major steps in the image recognition process are to gather and organize data, build a predictive model, and use it to recognize images.
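To make "classification is pattern matching" concrete, here is a minimal sketch of a toy OCR classifier in plain Python. The 3x5 glyph templates and the pixel-agreement score are assumptions made up for illustration; real OCR systems use far richer features and models.

```python
# Toy "OCR": classify a tiny binary image by matching it against known templates.
# The 3x5 glyphs and the pixel-agreement score are illustrative assumptions.

TEMPLATES = {
    "0": ["111",
          "101",
          "101",
          "101",
          "111"],
    "1": ["010",
          "110",
          "010",
          "010",
          "111"],
    "7": ["111",
          "001",
          "010",
          "010",
          "010"],
}


def to_matrix(rows):
    """Turn rows of '0'/'1' characters into a 2-D matrix of integers."""
    return [[int(ch) for ch in row] for row in rows]


def agreement(image, template):
    """Count the pixels on which two equally sized matrices agree."""
    return sum(
        1
        for img_row, tpl_row in zip(image, template)
        for a, b in zip(img_row, tpl_row)
        if a == b
    )


def classify(image):
    """Return the label of the template that agrees with the image on most pixels."""
    return max(
        TEMPLATES,
        key=lambda label: agreement(image, to_matrix(TEMPLATES[label])),
    )


# A "7" with one corrupted pixel (bottom-left corner switched on).
noisy_seven = to_matrix(["111",
                         "001",
                         "010",
                         "010",
                         "110"])
print(classify(noisy_seven))  # prints 7
```

The image is just a matrix of numbers, and the classifier picks one category out of many by scoring the match against each known pattern.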
Gather and Organize Data
The human eye perceives an image as a set of signals which are processed by the visual cortex in the brain. This results in a vivid experience of a scene, associated with concepts and objects recorded in one’s memory. Image recognition tries to mimic this process. A computer perceives an image as either a raster or a vector image. Raster images are a sequence of pixels with discrete numerical values for colors, while vector images are a set of color-annotated polygons.
To analyze images, the geometric encoding is transformed into constructs depicting physical features and objects. These constructs can then be logically analyzed by the computer. Organizing data involves classification and feature extraction. The first step in image classification is to simplify the image by extracting the important information and leaving out the rest. For example, if you want to separate a cat from the background of an image, you will notice significant variation in the RGB pixel values.
However, by running an edge detector on the image we can simplify it. You can still easily discern the circular shape of the face and eyes in the resulting edge images, so we can conclude that edge detection retains the essential information while throwing away the non-essential information. Some well-known feature descriptor techniques are Haar-like features (introduced by Viola and Jones), Histogram of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT), and Speeded Up Robust Features (SURF).
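As a rough illustration of what an edge detector does, the sketch below applies the Sobel operator to a tiny grayscale image in plain Python. The 6x6 test image and the threshold of 100 are assumptions chosen for the example; in practice you would typically reach for an image library rather than hand-written loops.

```python
# Sobel edge detection on a small grayscale image, in plain Python.
# The 6x6 test image and the threshold are illustrative assumptions.

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def apply_kernel(image, kernel, row, col):
    """Weighted sum of the 3x3 neighbourhood centred on (row, col)."""
    return sum(
        kernel[i][j] * image[row + i - 1][col + j - 1]
        for i in range(3)
        for j in range(3)
    )


def edge_map(image, threshold=100):
    """Mark interior pixels whose gradient magnitude exceeds the threshold."""
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = apply_kernel(image, SOBEL_X, r, c)  # horizontal change
            gy = apply_kernel(image, SOBEL_Y, r, c)  # vertical change
            magnitude = (gx * gx + gy * gy) ** 0.5
            edges[r][c] = 1 if magnitude > threshold else 0
    return edges


# Dark left half (0), bright right half (255): one vertical edge down the middle.
image = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
edges = edge_map(image)
for row in edges:
    print("".join(str(v) for v in row))
```

Only the pixels on either side of the dark-to-bright boundary are marked, while the flat regions drop out. That is the sense in which edge detection keeps the essential shape and discards the rest.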
Build a Predictive Model
In the previous step, we learned how to convert an image to a feature vector. In this section, we will learn how a classification algorithm takes this feature vector as input and outputs a class label (e.g. cat or background/no-cat). Before a classification algorithm can do its magic, we need to train it by showing it thousands of cat and non-cat images. The general principle in machine learning algorithms is to treat feature vectors as points in a higher-dimensional space. The algorithm then tries to find planes or surfaces (contours) that divide this space so that all examples from a particular class are on one side of the plane or surface.
A common way to build a predictive model is with a neural network. A neural network is a system of hardware and software, loosely modeled on the brain, that estimates functions depending on a large number of unknown inputs. According to Kaz Sato, Staff Developer Advocate at Google Cloud Platform, “A neural network is a function that learns the expected output for a given input from training datasets”. A neural network is an interconnected group of nodes. Each processing node has its own small sphere of knowledge, including what it has seen and any rules it was originally programmed with or developed for itself. The neural network also requires a learning algorithm. There are numerous algorithms used for image classification, such as bag-of-words, support vector machines (SVM), face landmark estimation (for face recognition), K-nearest neighbors (KNN), and logistic regression.
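The idea of a separating plane can be sketched with a perceptron, one of the simplest linear classifiers. This is only an illustration under assumptions: the two-number feature vectors and their labels are made up, and a real cat classifier would work with much longer feature vectors and far more training images.

```python
# Perceptron: learn a separating line w.x + b = 0 for 2-D feature vectors.
# The feature values and labels are made-up illustration data.

def predict(weights, bias, features):
    """Return 1 (cat) if the point lies on the positive side of the plane, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train_perceptron(samples, labels, epochs=100, learning_rate=0.1):
    """Nudge the plane after every misclassified point until a pass has no mistakes."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [
                    w + learning_rate * error * x
                    for w, x in zip(weights, features)
                ]
                bias += learning_rate * error
        if mistakes == 0:
            break
    return weights, bias


samples = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3)]
labels = [1, 1, 1, 0, 0, 0]  # 1 = cat, 0 = no-cat

weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, s) for s in samples])
print(predict(weights, bias, (0.85, 0.75)))  # an unseen point near the cat cluster
```

Once training finishes, every cat example sits on one side of the learned line and every non-cat example on the other, so a new feature vector is labeled simply by checking which side it falls on.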
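To illustrate the idea of a function that learns the expected output from training data, here is a minimal sketch of a single artificial neuron trained by gradient descent, which is equivalent to logistic regression. The OR truth table stands in for a training set, and the learning rate and epoch count are arbitrary assumptions; a real image model would stack many such nodes in layers.

```python
import math

# One artificial neuron (logistic regression) trained by gradient descent.
# The OR truth table stands in for a real training set; it is an illustration only.


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(weights, bias, inputs):
    """Weighted sum of the inputs passed through the sigmoid activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train(inputs, targets, epochs=2000, learning_rate=0.5):
    """Repeatedly adjust weights and bias to reduce the average log loss."""
    weights = [0.0] * len(inputs[0])
    bias = 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, target in zip(inputs, targets):
            # Derivative of the log loss with respect to the weighted sum.
            error = neuron(weights, bias, x) - target
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]

weights, bias = train(inputs, targets)
print([round(neuron(weights, bias, x)) for x in inputs])
```

The neuron starts out knowing nothing (all weights zero) and ends up reproducing the expected outputs purely from the examples it was shown.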
Recognize Images
While the above two steps take up most of the effort, the step of actually recognizing an image is fairly easy. The image data, both training and test, is organized. The training data is kept separate from the test data, which also means we remove duplicates (or near duplicates) between them. This data is fed into the model to recognize images. We have to find the image of a cat in our database of known images that has the closest measurements to our test image. All we need to do is train a classifier that can take the measurements from a new test image and tell us the closest match with a cat. Running this classifier takes milliseconds. The result of the classifier is either ‘Cat’ or ‘Non-cat’.
The major challenges in building an image recognition model are hardware processing power and the cleansing of input data. Many of the images may be high definition. Even an image of only 500 x 500 pixels contains 250,000 pixels. A training set of a mere 1,000 such images amounts to 0.25 billion values for the machine learning model. Moreover, the calculations are not simple addition or multiplication, but complex derivatives involving floating-point weights and matrices.
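The "closest measurements" lookup described above can be sketched as a nearest-neighbour search. The three measurements per image and the tiny database are made-up values for illustration, and the duplicate check only catches exact duplicates, not near duplicates.

```python
import math

# Nearest-neighbour lookup: label a test image with the label of the closest known image.
# The three "measurements" per image are made-up feature values.

known_images = [
    ((0.90, 0.80, 0.75), "Cat"),
    ((0.85, 0.90, 0.70), "Cat"),
    ((0.10, 0.20, 0.15), "Non-cat"),
    ((0.20, 0.10, 0.30), "Non-cat"),
]


def nearest_label(features, database):
    """Return the label of the database entry with the smallest Euclidean distance."""
    _, label = min(database, key=lambda entry: math.dist(entry[0], features))
    return label


test_images = [(0.80, 0.85, 0.80), (0.15, 0.15, 0.20), (0.90, 0.80, 0.75)]

# Keep training and test data apart: drop test images that also appear
# in the known set (exact duplicates only).
known_features = {features for features, _ in known_images}
test_images = [t for t in test_images if t not in known_features]

results = [nearest_label(t, known_images) for t in test_images]
print(results)  # prints ['Cat', 'Non-cat']
```

Each lookup is only a handful of distance calculations, which is why this final step is so much cheaper than gathering the data and training the model.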
There are some quick hacks to overcome the above challenges:
– Image compression tools to reduce image size without losing clarity
– Grayscale and gradient versions of color images
– Graphics processing units (GPUs) to train neural networks on large data sets in less time and with less computing infrastructure
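The arithmetic behind these shortcuts can be sketched in a few lines. The luminance weights below are the commonly used ITU-R BT.601 values, and the 2x2 block averaging is just one simple way to shrink an image; both are illustrative choices rather than a prescribed method.

```python
# Rough input-size arithmetic, plus grayscale conversion and simple downsampling.
# The luminance weights are the common ITU-R BT.601 values; the 2x2 averaging
# scheme is an illustrative choice, not a prescribed method.


def value_count(width, height, channels, images):
    """Number of raw numbers the model has to read for a whole data set."""
    return width * height * channels * images


def to_grayscale(rgb_image):
    """Collapse each (R, G, B) pixel into a single brightness value."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in rgb_image
    ]


def downsample_2x(gray_image):
    """Halve width and height by averaging each 2x2 block of pixels."""
    return [
        [
            (gray_image[r][c] + gray_image[r][c + 1]
             + gray_image[r + 1][c] + gray_image[r + 1][c + 1]) // 4
            for c in range(0, len(gray_image[0]) - 1, 2)
        ]
        for r in range(0, len(gray_image) - 1, 2)
    ]


print(value_count(500, 500, 1, 1000))  # 250,000 values per image x 1,000 images
print(value_count(500, 500, 3, 1000))  # the same images with three color channels
print(value_count(250, 250, 1, 1000))  # grayscale, halved in each direction

tiny_rgb = [[(255, 255, 255), (0, 0, 0)],
            [(255, 0, 0), (0, 0, 255)]]
gray = to_grayscale(tiny_rgb)
print(gray)
print(downsample_2x(gray))
```

Going from three color channels to one cuts the input to a third, and halving the width and height cuts it to a quarter again, before any GPU is involved.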
How to Use Image Recognition for Your Business?
From the business perspective, the major applications of image recognition are face recognition, security and surveillance, visual geolocation, object recognition, gesture recognition, code recognition, industrial automation, medical image analysis, and driver assistance. These applications are creating growth opportunities in many fields. Let’s take a look at how image recognition is creating a revolution in some business sectors:
Adoption of this technology is highest in e-commerce, including search and advertising. Image recognition can transform your smartphone into a virtual showroom. It is used in mobile applications to identify specific products. It gives users a more interactive view of the world by making everything they see searchable.
A prominent example of image recognition is the CamFind API by Image Searcher Inc. Its technology enables an advanced level of mobile commerce. CamFind identifies objects such as watches, shoes, bags, and sunglasses and returns purchasing options to the user. Prospective buyers can perform live product comparisons without visiting any website. Developers can use this image recognition API to build their own mobile commerce applications. Similarly, ViSenze is an artificial intelligence company that solves real-world search problems using deep learning and image recognition. Products made by ViSenze are used by online shoppers, internet retailers, and media owners for product recommendation and ad targeting.
The world of gaming will be revolutionized by image recognition and computer vision technology. In fact, this revolution has already started. Microsoft Kinect, a motion-sensing device for video games, set the Guinness World Record for the fastest-selling consumer electronics device. It is based on computer vision and tracks the human body in real time. Serious gamers are increasingly drawn to games whose action takes place in the real world, away from the device. Image recognition holds the key to generating such new user experiences and user interfaces. When image technologies are combined with geo-targeting and in-app purchasing, search-based commerce and advertising begin to move into the real world, opening the door to AdWords-sized, off-device business opportunities.
Image recognition and processing are an essential part of autonomous vehicles, pioneered by Google and Uber. Cars of the future are expected to detect obstacles and warn you about proximity to guardrails and walkways. The technology is even capable of reading road signs and stop lights. Computer vision systems powered by deep learning are trained using thousands of images. Images of road signs, humans, and roads under different weather conditions are fed into the neural networks. The systems improve as more training data is fed into them.
Do you think the above examples are focused on big industries and might not apply to your business? On the contrary, image recognition can be applied in small ways to derive benefits. It is widely used to engage audiences and drive social sharing. For example, it can be used to optimize mobile advertising. Using image recognition, marketers can deliver highly visible advertising campaigns with less intrusive, better-targeted ads.
How Maruti Techlabs Uses Image Recognition for Our Client?
Organizations looking to adopt this technology for the first time should start with a specific business segment. These segments should have strong business rules to guide the algorithms and large volumes of data to train the machines. We have integrated an image recognition solution for a client in the automobile sector. The client has an e-commerce platform for buying and selling cars. Sellers upload images of their cars to verify each vehicle’s present condition. Fraudulent sellers were uploading offensive or irrelevant content to trick the system and get a quote for the car. To reduce such fraud, the organization had to dedicate staff to manually checking the images.
We designed a solution using Google Vision technology to weed out the irrelevant (non-car) images. Vision draws on the technology behind Google image search to detect explicit content and facial attributes, label images by category, and extract text. We have used the safe search annotation feature of Vision to process more than 1,000 seller images per day. The images can also be tagged based on content such as adult, violence, spoof, and medical. Google Vision improves over time as new data and concepts are introduced. As we gather more data (images), we plan to implement a customized image recognition solution using the techniques described above.
It is difficult for many companies to invest in this technology and then build an engineering team for computer vision. Even with the right team, it can be a lot of work to generate results. This is where our data science experts can help you define a roadmap for incorporating image recognition and related machine learning technologies. Our solutions are mostly managed in the cloud, so we can integrate image recognition with an existing app or use it to build a specific feature for your business.
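A filter of this kind could be sketched as follows. This is not the production implementation and it does not call the Vision API: the dictionary layout, the list of car labels, and the "LIKELY" cut-off are all hypothetical. Only the likelihood names follow the Likelihood values that Cloud Vision documents for its safe search annotations.

```python
# Decision logic applied to annotations of the kind a vision API returns.
# The dict layout, the car label set, and the threshold are hypothetical;
# only the likelihood names follow Cloud Vision's documented Likelihood values.

LIKELIHOOD_ORDER = [
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY",
]
BLOCKED_CATEGORIES = ("adult", "violence", "spoof", "medical")
CAR_LABELS = {"car", "vehicle", "automobile"}  # hypothetical label whitelist


def is_at_least(likelihood, floor):
    """True if `likelihood` ranks at or above `floor` in LIKELIHOOD_ORDER."""
    return LIKELIHOOD_ORDER.index(likelihood) >= LIKELIHOOD_ORDER.index(floor)


def review_image(annotation, floor="LIKELY"):
    """Return ('accept' or 'reject', reason) for one annotated seller image."""
    safe_search = annotation.get("safe_search", {})
    for category in BLOCKED_CATEGORIES:
        if is_at_least(safe_search.get(category, "UNKNOWN"), floor):
            return "reject", f"flagged as {category}"
    labels = {label.lower() for label in annotation.get("labels", [])}
    if not labels & CAR_LABELS:
        return "reject", "no car detected"
    return "accept", "car image"


# Hypothetical annotations for three uploaded images.
uploads = [
    {"safe_search": {"adult": "VERY_UNLIKELY", "violence": "UNLIKELY"},
     "labels": ["Car", "Wheel", "Sedan"]},
    {"safe_search": {"adult": "VERY_LIKELY"}, "labels": ["Person"]},
    {"safe_search": {"adult": "VERY_UNLIKELY"}, "labels": ["Cat", "Sofa"]},
]
decisions = [review_image(u) for u in uploads]
for decision in decisions:
    print(decision)
```

Keeping the decision rules separate from the API call makes it easy to tune thresholds and label lists as business rules change, without touching the integration itself.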