1. Deep Learning is not magic
Deep Learning (DL) covers a very broad area and can be complex to understand. It is often compared to the learning process that occurs in the human brain, and for good reason: DL is based on artificial neural networks, which try to imitate the brain’s activity.
A baby only a few months old learns to identify visual content by viewing examples and absorbing the labels she is exposed to from the people around her. For example, as time passes she understands that a man usually has short hair, as opposed to a woman, whose hair is usually longer and whose facial features tend to be more delicate. With exposure to a growing number of examples, she learns to recognize ever finer details that differentiate males from females.
Eventually, an adult is able to understand visual content according to the ‘training’ process he has undergone during his childhood and life. For example, someone who has seen many different types of birds will be able to name them properly, as opposed to someone who has seen a smaller sample and will therefore be limited in his ability to classify birds.
A major problem in teaching a network using Deep Learning is providing a sample collection that is both high quality and as extensive as possible. This is actually one of the major challenges in the training of neural networks. Another difficulty lies at the more basic level of deciding what the model should understand.
Many visual categories are similar to one another; a few simple examples demonstrate this:
- A Labrador is very similar to a Golden Retriever. Do we have to differentiate between them, or is it enough to identify both simply as dogs?
- A drinking glass – since there are many types of glasses that differ greatly from each other in shape and color, it can be difficult to teach the model what a ‘glass’ is, and we may have to split the category into wine glass, beer glass, etc.
It is very important to settle these questions at the early stages, as each such decision determines what is being taught and which training set is used, and this has far-reaching implications for the model’s behavior and accuracy.
2. Good news for Deep Learning – you don’t have to start from scratch (use existing Frameworks instead!)
The DL world is evolving at a meteoric pace, and there is a wide range of platforms and infrastructures for training models. Most tools are free and the documentation is relatively comprehensive. Some central frameworks are Caffe, Theano, Google’s TensorFlow, etc. Another positive point is that both academia and industry use the same tools, so it is relatively easy to adopt tools and work coming out of the academic world.
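To give a feel for how little code these frameworks require, here is a minimal sketch of a small image classifier in TensorFlow/Keras; the input size, layer widths, and 10-class output are arbitrary illustrative assumptions, not a recommended architecture.

```python
import tensorflow as tf

# Minimal image-classifier sketch. The 64x64 RGB input and the
# 10 output classes are illustrative assumptions only.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Given a labeled set of images, training is then a single call:
# model.fit(train_images, train_labels, epochs=5)
```

The point is not the specific layers, but that the framework handles the heavy lifting (gradients, optimization, device placement) for you.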
3. The not-so-good news – GPU is better than CPU
Training a neural model for deep learning requires substantial hardware resources, mainly serious processing power to perform the repeated iterations that update the model according to the samples it is working on. The mathematical calculations that take place during training are relatively simple, but a huge number of them are going on at the same time.
The difference between a CPU and a GPU is mainly in the number and strength of their cores. A CPU usually has a few (a few dozen at best) strong cores, as opposed to a GPU, which has many (thousands of) relatively weak cores. For this reason, it is better to run this highly parallel training on the GPU. Just to illustrate the point, training that takes only 5 days on a GPU could take months on a CPU.
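Most frameworks make it easy to check which devices they can actually use. Here is a hedged sketch, assuming TensorFlow 2.x (one of the frameworks mentioned above), of listing devices and pinning a large matrix multiply to the GPU if one is available.

```python
import tensorflow as tf

# List the devices TensorFlow can see; an empty GPU list means training
# will silently fall back to the (much slower) CPU.
gpus = tf.config.list_physical_devices("GPU")
cpus = tf.config.list_physical_devices("CPU")
print(f"CPUs visible: {len(cpus)}, GPUs visible: {len(gpus)}")

# Operations can be pinned to a specific device explicitly. A large matrix
# multiply is exactly the kind of work that dominates training time.
with tf.device("/GPU:0" if gpus else "/CPU:0"):
    a = tf.random.normal((1000, 1000))
    b = tf.random.normal((1000, 1000))
    c = tf.matmul(a, b)
```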
The bad news is that GPUs are expensive. A GPU with thousands of processing units can cost thousands of dollars for the graphics card alone, and dedicated servers may be needed to host it, which involves additional expenditure.
4. Using existing models
One of the burning questions in the field of DL is: can an existing model be used to train another model? The answer is – it depends…
In certain situations, a model that specializes in a specific field can be adapted to understand a different field. The closer the fields are, the easier it is to ‘retrain’ the model for the new field. For example, a model that specializes in facial recognition can serve as the basis for a model that recognizes gender. This is called ‘transfer learning’. Similarly, in the human brain, areas that are near each other handle related visual content.
The field of transfer learning is very important and interesting, as it can help decrease the number of samples needed to train similar or closely related models. A minimal sketch of the idea follows.
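Here is a minimal transfer-learning sketch in TensorFlow/Keras: a network pretrained on ImageNet is frozen and reused as a feature extractor, and only a small new classification head is trained. MobileNetV2 and the two-class head (for example, male/female) are illustrative assumptions, not the setup described in the article.

```python
import tensorflow as tf

# Reuse a network pretrained on ImageNet as a frozen feature extractor.
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False  # keep the pretrained features unchanged

# Only this small head is trained for the new, related task.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g. two classes: male / female
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Far fewer labeled samples are needed, since only the head's weights are learned:
# model.fit(new_task_images, new_task_labels, epochs=5)
```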
5. The challenge: preparing the training set
The most challenging part of training a model is collecting the large amount of data necessary to teach the model how to understand a certain kind of picture.
For example, if we want to teach the concept of ‘basketball’ to a model, the first problem would be defining ‘basketball’, as there are many kinds of basketball – professional, children’s, women’s, men’s, street ball, etc. So it is necessary to work out what it takes to understand a specific category.
Only then can we collect as many pictures as possible that represent the category we want to teach. This leads us to the question: which pictures represent the category and which do not? The answer is not always clear. A sketch of how such a labeled collection is typically organized is shown below.
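As an illustration, a common way to organize a labeled training set is one folder per category. The folder names and the TensorFlow loading call below are assumptions for the sketch (it requires a recent TensorFlow 2.x), not PicScout’s actual pipeline.

```python
import tensorflow as tf

# Hypothetical layout of a labeled training set, one folder per category:
#   dataset/
#     basketball/       img_001.jpg, img_002.jpg, ...
#     not_basketball/   img_101.jpg, ...
# The folder names become the labels, so every ambiguous picture forces an
# explicit decision about whether it belongs to the category or not.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset",
    validation_split=0.2,   # hold out 20% of the samples for validation
    subset="training",
    seed=42,
    image_size=(160, 160),
    batch_size=32,
)

print("Categories found:", train_ds.class_names)
```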
Written by Lior Cohen, PicScout VP R&D, published in the Israeli tech blog – Geektime.