The saying “a picture is worth a thousand words” never rang so true in the world of machine vision as it does today. With deep learning, thousands, even millions of lines of code can be replaced by a simple neural network trained with images and very little coding.
The great news is that deep learning is no longer available only to researchers or to people with highly specialized skills and big budgets. Today, many tools are free, tutorials are easy to find, hardware costs are low, and even training data is available at no cost. This presents both opportunities and threats, as new players emerge to disrupt established names and spur innovation. It also lets your machine vision systems do things that were previously unimaginable. For example, deep learning can be used to recognize unexpected anomalies, which is typically very difficult or almost impossible to achieve with traditional coding.
A noteworthy benefit of deploying deep learning neural networks is that complex decisions can be made on the edge with minimal hardware and very little processing power. Low-cost ARM or FPGA based systems and new inference cameras like the FLIR Firefly DL make this possible.
In this article, you will learn:
Deep learning is a subset of machine learning inspired by how the human brain works. What makes deep learning “deep” is that a neural network contains multiple “layers” of neurons, connected by weights, which together help the network make its decision. Deep learning can be broken into two stages: training and inference.
During the training phase, you define the number of neurons and layers your neural network will comprise and expose it to labeled training data. With this data, the neural network learns on its own what is ‘good’ or ‘bad’. For example, if you are grading fruits, you would show the neural network images of fruits labeled “Grade A”, “Grade B”, “Grade C”, and so on. The neural network then figures out the properties of each grade, such as size, shape, color, and consistency of color. You don’t need to manually define these characteristics or even program what is too big or too small; the neural network trains itself. Once the training stage is over, the outcome is a trained neural network.
The process of using a trained neural network to evaluate new images and make decisions is called inference. When you present the trained neural network with a new image, it will provide an inference (i.e. an answer), such as “Grade A with 95% confidence.”
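To make the idea of an inference result more concrete, the sketch below turns the raw output scores of a hypothetical fruit-grading network into a label and a confidence value using the softmax function. The scores, the label list and the function names are assumptions chosen for illustration; a real framework would supply the scores from the trained network.

```python
import math

def softmax(scores):
    """Convert raw network scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def infer(scores, labels):
    """Return the most likely label and its confidence."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical raw scores for one image, as produced by the final layer.
labels = ["Grade A", "Grade B", "Grade C"]
label, confidence = infer([4.0, 1.0, 0.2], labels)
print(f"{label} with {confidence:.0%} confidence")
```

The confidence is simply the probability assigned to the winning class, so an application can reject or flag results that fall below a threshold of its choosing.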
The development of a deep learning application broadly entails 5 steps. A high-level description of each step follows:
Step 1] Identify the appropriate deep learning function
In the world of deep learning, tasks are classified into several functions. The ones we consider most common to machine vision are classification, detection and localization, segmentation, and anomaly detection.
Classification
The first and most basic application of deep learning is classification. The process involves sorting images into different classes by grouping them based on common properties. For example, you can use classification to separate a flawed part from a good one on a production line for quality control, or to conduct PCB solder inspections, as illustrated in Fig. 1 below (using a FLIR Firefly DL inference camera).
Detection and Localization
Another deep learning task ideal for machine vision is detection and localization. Using this function, you can identify features in an image and obtain bounding box coordinates that give their position and size. For example, it can be used to detect a person breaching a safety perimeter around robots on a production line, or to identify a single bad part on a production or assembly line conveyor system.
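As a minimal sketch of how bounding box output might be used, the example below checks whether any detected person overlaps a rectangular safety zone. The detection format (a dictionary with a label, a confidence and a box), the zone coordinates and all names are hypothetical; actual detectors return results in their own formats.

```python
from typing import NamedTuple

class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates."""
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int

def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share any area."""
    return (a.x < b.x + b.width and b.x < a.x + a.width and
            a.y < b.y + b.height and b.y < a.y + a.height)

def breaches(detections, zone: Box, min_confidence: float = 0.5):
    """Return confident 'person' detections that intersect the safety zone."""
    return [d for d in detections
            if d["label"] == "person"
            and d["confidence"] >= min_confidence
            and overlaps(d["box"], zone)]

# Hypothetical safety zone and detector output for one frame.
safety_zone = Box(x=400, y=100, width=200, height=300)
detections = [
    {"label": "person", "confidence": 0.91, "box": Box(350, 150, 80, 200)},
    {"label": "person", "confidence": 0.88, "box": Box(50, 120, 70, 190)},
    {"label": "pallet", "confidence": 0.97, "box": Box(450, 200, 100, 100)},
]
alerts = breaches(detections, safety_zone)
print(f"{len(alerts)} breach(es) detected")
```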
Segmentation
The third type of deep learning task is segmentation, typically used to identify which pixels in an image belong to which objects. Segmentation is ideal for applications that need to determine the context of objects and their relationship to each other (for example, autonomous vehicles and Advanced Driver Assistance Systems, popularly referred to as ADAS).
Anomaly Detection
Anomaly detection can be used to identify regions that do not match a pattern. A typical example of an application where anomaly detection can add value is stock control and inventory management at grocery stores, as illustrated in Fig. 4. This application involves using an inference camera to detect and highlight shelves that are empty or about to need replenishment, providing real-time notifications and improving efficiency.
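The per-pixel nature of segmentation can be sketched as follows, assuming the network outputs one score per class for every pixel. The class names, the tiny 2 x 3 "image" and the function names are illustrative assumptions only.

```python
def to_mask(scores):
    """scores[row][col] holds one score per class for that pixel.
    Returns the index of the highest-scoring class at each pixel."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row]
            for row in scores]

def pixel_counts(mask, class_names):
    """Count how many pixels were assigned to each class."""
    counts = {name: 0 for name in class_names}
    for row in mask:
        for idx in row:
            counts[class_names[idx]] += 1
    return counts

# Hypothetical per-pixel class scores for a 2 x 3 image.
class_names = ["road", "vehicle", "pedestrian"]
scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]],
    [[0.8, 0.1, 0.1],   [0.7, 0.2, 0.1], [0.1, 0.2, 0.7]],
]
mask = to_mask(scores)
print(mask)
print(pixel_counts(mask, class_names))
```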
Once you determine the deep learning function you intend to use, you’ll need a toolset (developers call this a “framework”) best suited to your needs. These frameworks will provide a choice of starter neural networks and tools for training and testing the network.
With some of the world’s biggest technology companies vying for dominant positions in the deep learning market, frameworks like TensorFlow by Google, Caffe2 by Facebook and OpenVINO by Intel (all free) demonstrate the scale of the investment and resources flowing into deep learning. On the other end of that spectrum, you also have PyTorch, an open-source solution that’s now part of Facebook. These tools are easy to use and provide great documentation (including examples), so even a novice user can train and deploy a neural network with minimal effort.
Discussing all available frameworks would warrant a separate article, but the following pointers list key advantages and disadvantages for 3 of the most popular frameworks:
The neural network you choose ultimately depends on the complexity of the task at hand and how fast your inference needs to run. For example, you can choose a neural network with more layers and more neurons, but inference will run slower. Typically, a trained neural network requires very little computing power and can deliver results in a matter of milliseconds. This allows complex deep learning inference to be performed on the edge, either with low-power ARM boards or with purpose-built inference cameras like the FLIR Firefly DL.
Furthermore, even companies that lack the resources and employees to learn and implement a deep learning solution can rely on third-party consultants, who can help stakeholders through the entire development cycle, from conceptualization to deployment. One such example is Enigma Pattern (https://www.enigmapattern.com).
Depending on the type of data you want to evaluate, you’ll require a repository of images covering as many of the characteristics you hope to use in your evaluation as possible, and the images need to be labeled appropriately. As an example, if your neural network needs to distinguish a good solder joint from a bad one, it would need hundreds of variations of what a good solder joint looks like and a similar set of what bad solder joints look like, with labels that identify them as such.
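The trade-off between network size and inference speed can be approximated by counting the multiply-accumulate operations in one forward pass. The sketch below does this for fully connected layers only, which is a simplifying assumption: real vision networks are usually convolutional, so treat the numbers as a rough proxy rather than a benchmark. The two layer layouts are hypothetical.

```python
def dense_macs(layer_sizes):
    """Multiply-accumulate operations for one forward pass through
    fully connected layers with the given neuron counts."""
    return sum(n_in * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical networks: 1024 inputs, 3 output classes.
small = [1024, 64, 3]
large = [1024, 512, 512, 256, 3]

print(dense_macs(small))   # 65728
print(dense_macs(large))   # 918272
```

In this toy comparison the larger network needs roughly fourteen times as many operations per image, which is the kind of cost that matters on a low-power edge device.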
There are several avenues to obtain a dataset of images:
In the process of building their own deep learning code, many developers open-source their solutions and are happy to share them for free. One such tool, particularly useful if your dataset is not pre-labeled, is LabelImg, a graphical image annotation tool that helps label objects with bounding boxes within images (https://github.com/tzutalin/labelImg). Alternatively, the entire process can be outsourced to a third party.
Another way to shorten the data preparation stage is to augment one image into many different images by performing image processing on it (rotating, resizing, stretching, brightening or darkening, and so on). This also frees up development time, as the process of capturing training data and labeling the images can be offloaded to a novice user.
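The augmentation idea can be sketched in a few lines. The example below works on a tiny grayscale image stored as nested lists so that it has no dependencies; in practice an image library or the framework's own augmentation tools would do this work. The function names and the brightness offsets are illustrative assumptions.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def adjust_brightness(image, delta):
    """Add delta to every pixel, clamping to the 0-255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in image]

def augment(image):
    """Produce several variants of one labeled image."""
    return [
        image,
        flip_horizontal(image),
        rotate_90(image),
        adjust_brightness(image, 40),
        adjust_brightness(image, -40),
    ]

# Hypothetical 2 x 3 grayscale image (pixel values 0-255).
original = [
    [10, 20, 30],
    [40, 50, 60],
]
variants = augment(original)
print(f"1 image became {len(variants)} training samples")
```

Each variant keeps the label of the original image, so one labeled capture yields several labeled training samples.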
Furthermore, if you have specific hardware limitations or preferences, choosing the right tools becomes even more important, because the deep learning tools discussed in the next section support a finite set of hardware and are often not interchangeable.
After the data is prepared, you will need to train, test, and validate the accuracy of your neural network. This stage involves configuring and running the scripts on your computer until the training process delivers acceptable levels of accuracy for your specific use case. It is a recommended best practice to keep training and test data separate, so that the data you evaluate with is never used during training.
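One simple way to keep training and test data separate is to shuffle the labeled samples once and split them into two disjoint sets, as sketched below. The 80/20 ratio, the fixed seed, the file names and the function names are assumptions for illustration; frameworks and libraries offer their own utilities for this.

```python
import random

def split_dataset(samples, test_fraction=0.2, seed=0):
    """Shuffle labeled samples and split them into disjoint train and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predict, test_set):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(1 for image, label in test_set if predict(image) == label)
    return correct / len(test_set)

# Hypothetical dataset of (file name, label) pairs.
samples = [(f"solder_{i:03d}.png", "good" if i % 2 else "bad")
           for i in range(100)]
train_set, test_set = split_dataset(samples)
print(len(train_set), len(test_set))
```

The `accuracy` helper accepts any callable that maps an image to a label, so the same held-out test set can be reused to compare successive versions of a trained network.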
This process can be accelerated by taking advantage of transfer learning, which involves taking a pre-trained network and repurposing it for another task. Since many layers in a deep neural network perform feature extraction, these layers do not need to be retrained to classify new objects. As such, you can use a pre-trained network as a starting point and retrain only a few layers rather than training the entire network. Popular frameworks like Caffe2 and TensorFlow provide pre-trained networks for free.
Furthermore, teaching an already trained neural network to detect new features is as easy as adding images to the defective image set and applying transfer learning to retrain the network. This is significantly easier and faster than logic-based programming, where you must add the new logic to the code, recompile and execute, while ensuring the newly added code does not introduce unwanted errors.
If you lack the coding expertise to train your own neural network, there are several GUI (Graphical User Interface) based software packages that work with different frameworks. These tools make the training and deployment process very intuitive, even for less experienced users. Matrox MIL is one such example from the machine vision world.
The last step entails deploying your trained neural network on the selected hardware to test performance and collect data in the field. The first few phases of inference should ideally be used in the field to collect additional test data, which can be used as training data for future iterations.
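The principle of retraining only the last layers can be shown with a deliberately small stand-in. In the sketch below, `frozen_features` plays the role of the pre-trained feature-extraction layers and is never updated, while `train_head` trains only a final logistic classifier on top of it. Everything here is an assumption for illustration: a real workflow would load an actual pre-trained network from a framework and freeze its early layers instead of using a hand-written feature function.

```python
import math

def frozen_features(image):
    """Stand-in for pre-trained layers: fixed and never updated.
    Returns mean brightness and contrast of a flat list of pixels (0-1)."""
    mean = sum(image) / len(image)
    spread = max(image) - min(image)
    return [mean, spread]

def train_head(samples, epochs=500, lr=0.5):
    """Train only the final layer (a logistic classifier) on frozen features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for image, target in samples:
            feats = frozen_features(image)
            z = sum(w * f for w, f in zip(weights, feats)) + bias
            pred = 1.0 / (1.0 + math.exp(-z))
            error = pred - target  # gradient of the log loss with respect to z
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias

def predict(image, weights, bias):
    """Classify an image with the frozen features and the trained head."""
    feats = frozen_features(image)
    z = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1 if z > 0 else 0

# Hypothetical tiny dataset: 1 = defective (dark, high contrast), 0 = good.
samples = [
    ([0.9, 0.8, 0.9, 0.85], 0),
    ([0.8, 0.9, 0.95, 0.9], 0),
    ([0.2, 0.9, 0.1, 0.3], 1),
    ([0.1, 0.2, 0.8, 0.15], 1),
]
weights, bias = train_head(samples)
print([predict(image, weights, bias) for image, _ in samples])
```

Because only the three parameters of the head are updated, training is fast and needs few samples, which mirrors why transfer learning shortens development.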
The following section provides a brief summary of typical methods for deployment with some associated advantages and disadvantages:
Edge (Standard PC)
Edge (ARM, FPGA & Inference Cameras like FLIR Firefly DL)
Now that we’ve covered an overview of the development and deployment process, it would be pertinent to look at some shortcomings too.
In addition to these shortcomings, many applications are conceptually better suited to a logic-based solution. For instance, if your problem is well defined, deterministic and predictable, a logic-based solution may provide better results than deep learning. Typical examples include barcode reading, part alignment and precise measurements.
Even with the shortcomings highlighted above, the potential benefits of deep learning far outweigh the negatives (rapid development, the ability to solve complex problems, and ease of use and deployment, to name a few). Furthermore, constant improvements in the field of deep learning are overcoming these shortcomings. For instance, activation maps can be used to visually check which pixels in the image are being considered when a neural network makes its decision, so that we can better understand how the network arrived at its conclusion. Also, with wider adoption, many companies are now developing their own neural networks instead of relying on transfer learning, improving performance and customizing the solution for specific problems. Even in applications that are suited to logic-based programming, deep learning can assist the underlying logic to increase the overall accuracy of the system. As a parting note, it is easier and cheaper than ever before to get started on developing your own deep learning system: a DL classification system can be built for less than $600.