Click to learn more about author Jasmine Morgan.
From unlocking your phone to automatically tagging pictures on Facebook, deep learning enhances image recognition in multiple sectors. This is generally called computer vision, and it means that algorithms can see and understand images, much like humans.
However, since computers operate on numbers rather than pictures, an image is first converted into numeric features such as contours, gradients, and color values, which are then fed into algorithms that search for patterns. A computer recognizes an image only by comparing it with the large body of labeled examples it saw during training. The machine then computes the probability that the current image belongs to a specific category by comparing contours, shades, light, and more.
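To make the idea of "images as numbers" concrete, here is a minimal sketch in plain Python. The 4x4 image and the helper name `horizontal_gradient` are made up for illustration; real systems work on much larger arrays and use optimized libraries.

```python
# A tiny 4x4 grayscale "image": 0 is black, 255 is white.
# The left half is dark and the right half is bright.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]


def horizontal_gradient(img):
    """Difference between each pixel and its right-hand neighbor."""
    return [
        [row[x + 1] - row[x] for x in range(len(row) - 1)]
        for row in img
    ]


# Large values mark places where brightness changes sharply (a contour).
gradient = horizontal_gradient(image)
for row in gradient:
    print(row)
```

The only nonzero values appear where the dark half meets the bright half, which is the kind of contour information the algorithm works with instead of the picture itself.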
Neural Networks: How They Work
Because the computer works only with numbers, the algorithm does not see, for example, the image of a dog or a cookie. It sees rows and rows of data that it is trying to make sense of. This is where neural networks make a difference. These are sets of computational cells arranged in layers. Each cell processes information individually, each layer produces an output that it passes on to the next layer, and this procedure is repeated layer after layer.
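The layer-by-layer flow can be sketched in a few lines of plain Python. This is an illustration only: the weights, biases, and the choice of a sigmoid activation are assumptions, not values from any real trained network.

```python
import math


def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Each cell computes a weighted sum of the inputs, then squashes it."""
    outputs = []
    for cell_weights, bias in zip(weights, biases):
        total = sum(w * v for w, v in zip(cell_weights, inputs)) + bias
        outputs.append(sigmoid(total))
    return outputs


def forward(inputs, network):
    """Pass the values through every layer in turn."""
    values = inputs
    for weights, biases in network:
        values = layer_forward(values, weights, biases)
    return values


# Hypothetical two-layer network: 3 inputs -> 2 cells -> 1 cell.
network = [
    ([[0.5, -0.5, 0.1], [0.3, 0.8, -0.2]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]

pixels = [0.0, 0.5, 1.0]  # three pixel intensities scaled to 0..1
output = forward(pixels, network)
print(output)
```

The output of the last layer is a single number between 0 and 1, which can be read as a score for one class.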
Each layer is, in fact, a set of filters, ranging from basic geometric filters (edges, angles, circles) to more sophisticated ones capable of detecting body parts, animals or even animal breeds. The result is a probability that the object in the picture belongs to a predefined class.
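As a rough sketch of what a basic geometric filter does, the code below slides a small vertical-edge filter over an image and then turns some class scores into probabilities with a softmax. The filter values, the image, and the class scores are all made up for illustration; in a real network the filters are learned during training rather than written by hand.

```python
import math


def apply_filter(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(
                kernel[i][j] * image[y + i][x + j]
                for i in range(k)
                for j in range(k)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
response = apply_filter(image, vertical_edge)

# Hypothetical final-layer scores for three predefined classes.
class_scores = {"dog": 2.0, "cat": 1.0, "cookie": 0.1}
probabilities = dict(zip(class_scores, softmax(list(class_scores.values()))))
print(response)
print(probabilities)
```

The filter response is zero over the flat region and positive where the edge lies, and the softmax gives the highest probability to the class with the highest score.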
A neural network becomes better the more it is trained, as it learns from the examples it has already seen. Neural networks are, in fact, statistical models aimed at comparing matrices of pixels for similarities.
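Training can be illustrated with a deliberately tiny model: a single cell that learns to tell bright two-pixel "images" from dark ones. This is a sketch under stated assumptions (made-up data, a logistic cell, plain gradient descent with an arbitrary learning rate), not the procedure used by any particular system.

```python
import math


def predict(weights, bias, pixels):
    """Probability that the image belongs to the 'bright' class."""
    z = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def average_loss(weights, bias, examples):
    """Cross-entropy loss: lower means predictions match the labels better."""
    total = 0.0
    for pixels, label in examples:
        p = predict(weights, bias, pixels)
        total += -(label * math.log(p) + (1 - label) * math.log(1 - p))
    return total / len(examples)


def train(examples, epochs=500, learning_rate=1.0):
    """Repeatedly nudge the weights to reduce the prediction error."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for pixels, label in examples:
            error = predict(weights, bias, pixels) - label
            for i, p in enumerate(pixels):
                grad_w[i] += error * p
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Made-up labeled examples: label 1 = bright, label 0 = dark.
examples = [
    ([0.9, 0.8], 1),
    ([0.1, 0.2], 0),
    ([0.8, 0.9], 1),
    ([0.2, 0.1], 0),
]

loss_before = average_loss([0.0, 0.0], 0.0, examples)
weights, bias = train(examples)
loss_after = average_loss(weights, bias, examples)
print(loss_before, loss_after)
```

The loss after training is lower than before, which is what "getting better with training" means in practice.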
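Efficient training requires a massive supply of labeled training images. The best-known database is ImageNet, which contained 3.2 million pictures when it was first described in 2009 and has since grown to more than 14 million. AlexNet, which followed in 2012, is not a database but a neural network trained on ImageNet.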
Uses of Deep Learning in Image Recognition
There are a few areas which can benefit directly from the advancements of AI image recognition. Here are a few ideas about how this technology can solve daily challenges in business, healthcare, and logistics.
Image Classification
Most of the time, the need for computer vision goes beyond mere tagging of friends on social media. It can help decide if a tumor is cancerous or not, or enhance optical character recognition tools necessary for invoice processing.
An enhanced version of image classification is the one with localization. This does the same task as the simpler version, but once it identifies an object, it also draws a border around it and attaches a label. It is beneficial when there are multiple objects in a picture, and each needs to be assigned its class. A direct application of this technique is helping vehicles keep a distance from other traffic participants.
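The "border plus label" output can be sketched as follows. This simplified example assumes a hypothetical upstream step has already marked which pixels belong to the object; real detectors usually predict the box coordinates directly, and the label `car` is made up.

```python
def bounding_box(mask):
    """Return (top, left, bottom, right) of the marked pixels, or None."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (rows[0], cols[0], rows[-1], cols[-1])


# 1 marks pixels that the (hypothetical) model attributes to the object.
object_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

detection = {"label": "car", "box": bounding_box(object_mask)}
print(detection)
```

With several objects in one picture, the result would be a list of such label-and-box records, one per object.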
Object Segmentation
Once an object is identified, it can be necessary to split it into smaller parts, which could help with further identification. For example, if an image is labeled “animal,” the identification of the ears, tail, and paws can point to what kind of animal it is.
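A segmentation result is commonly represented as a grid with one label per pixel. The sketch below uses a made-up 3x5 label map and hypothetical part names to show how individual parts can be counted or isolated.

```python
from collections import Counter

# Hypothetical part IDs and names.
PART_NAMES = {0: "background", 1: "ear", 2: "body", 3: "tail"}

# One label per pixel, as a segmentation model might produce.
segmentation = [
    [0, 1, 0, 0, 0],
    [0, 2, 2, 2, 3],
    [0, 2, 2, 2, 0],
]


def pixels_per_part(seg):
    """Count how many pixels were assigned to each part."""
    return dict(Counter(PART_NAMES[v] for row in seg for v in row))


def part_mask(seg, part_id):
    """Binary mask that keeps only the pixels of one part."""
    return [[1 if v == part_id else 0 for v in row] for row in seg]


counts = pixels_per_part(segmentation)
tail_only = part_mask(segmentation, 3)
print(counts)
print(tail_only)
```

The presence and size of parts such as ears or a tail is the kind of evidence that can narrow "animal" down to a specific kind of animal.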
Image Colorization and Reconstruction
Colorization mostly serves artistic and historical purposes, but it could also help fight crime, for example when evidence photos or footage are available only in black and white.
An image recognition algorithm can be trained to fill in missing parts of pictures based on the existing objects and identified colors. Training is done on deliberately damaged versions of a picture, and the algorithm's output is compared with the original. Applications here could include airport security, where CCTV cameras create fragmented images which can then be used to re-create faces.
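One way to picture this training setup: blank out a patch of a known image, then measure how far a candidate result is from the original. The sketch below only builds the damaged input and the error measure; the repair model itself is omitted, and the helper names and pixel values are illustrative assumptions.

```python
def mask_region(image, top, left, height, width, fill=0):
    """Return a copy of the image with a rectangular patch blanked out."""
    damaged = [row[:] for row in image]
    for y in range(top, top + height):
        for x in range(left, left + width):
            damaged[y][x] = fill
    return damaged


def mean_squared_error(a, b):
    """Average squared pixel difference between two same-sized images."""
    diffs = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs)


original = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]

# Training input: the same picture with its center pixel removed.
damaged = mask_region(original, top=1, left=1, height=1, width=1)

# Error if nothing is repaired; a trained model should score lower than this.
error_unrepaired = mean_squared_error(damaged, original)
print(damaged, error_unrepaired)
```

During training, the model's filled-in output would replace `damaged` in the error calculation, and the weights would be adjusted to push that error down.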
Deep learning can go further and improve the resolution of a given image. Such an algorithm is trained on scaled-down versions of original images and learns to reverse the downscaling by comparing its output with the originals.
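The training pairs for this task can be produced by shrinking a known image. The sketch below downscales by averaging 2x2 blocks and uses naive pixel repetition as a stand-in for the model; both helpers and the pixel values are assumptions chosen for illustration.

```python
def downscale_2x(image):
    """Average each 2x2 block into one pixel."""
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4
            for x in range(0, len(image[0]), 2)
        ]
        for y in range(0, len(image), 2)
    ]


def upscale_2x_nearest(image):
    """Naive baseline: repeat every pixel in a 2x2 block."""
    out = []
    for row in image:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(wide[:])
    return out


def mean_squared_error(a, b):
    """Average squared pixel difference between two same-sized images."""
    diffs = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs)


original = [
    [0, 4, 8, 8],
    [4, 8, 8, 8],
    [2, 2, 6, 10],
    [2, 2, 10, 6],
]

low_res = downscale_2x(original)        # training input
restored = upscale_2x_nearest(low_res)  # stand-in for the model's output
baseline_error = mean_squared_error(restored, original)
print(low_res, baseline_error)
```

A trained network would take the place of `upscale_2x_nearest`, and training would aim to bring its error well below this naive baseline.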
Deepfake or Image Synthesis
Probably the scariest application of deep learning for images is the deepfake, an AI-powered way of merging multiple image sources into a single product that seems believable but does not exist in reality.
AI Limitations at the Core
The major limitation of neural networks compared to the human brain is that there is no way to transfer knowledge between classes of objects. Neural networks learn characteristics of each object type by scanning thousands of tagged images belonging to that class. If there is not enough training material, or content variations are not significant enough, the algorithm will make funny or dangerous mistakes, depending on the context.
Another issue with deep learning in image processing is that it doesn’t understand context. It can identify individual objects in an image, but it can’t make sense of what they mean together. For example, in a family picture, AI can identify three persons and an animal. It may even go further and detect an adult man, an adult woman, a child, and a dog, but unless it was trained on images tagged as family, it will not assign this class the way a human naturally would.
There is still a long way to go before computer vision fully understands what it looks at, but until then we can cheer the fact that, in some studies, it has matched or outperformed human doctors at detecting certain cancers.
Final Thoughts
The amount of money poured into AI in general has soared since 2000, by a factor of 5 or even 10, and a large chunk of that investment has gone to the computer vision sector.
While investment in machine learning was $1.58 billion in 2017, it is projected to reach $20.83 billion by 2024, according to Zion Market Research. Companies should set aside some cash for AI tools when drafting their budgets now, to seize the moment while these tools are still a competitive advantage.