In this article, we will see why it is important to have high energy efficiency in embedded ML systems and demonstrate how to achieve it using the Renesas RZ technologies.
Years ago, when you wanted to apply machine learning (ML) to data, you had just one option available: the cloud. Nowadays, however, you can implement a wide variety of ML models in embedded systems. Applications include:
As you can see, there is a broad range of uses for ML in embedded systems.
Of the solutions listed above, the ones involving image and video processing consume the most energy. Reducing the energy use of these devices makes video analytics possible even in very constrained situations, such as cameras powered by solar panels or batteries. Moreover, video analytics are used not just in video surveillance systems, but also as an important part of more complex systems. Think about robots that need to move around while avoiding obstacles. These robots depend on batteries, and running ML models can have a great impact on their operating life. Other examples are drones equipped with different types of cameras, including thermal and day/night vision. In this case, battery life determines the flight time of the device.
Executing ML models can consume a considerable amount of energy for an embedded system, because running a model is CPU and memory intensive. However, keeping power consumption low is important. Consider the following factors:
Now that we have explored the reasons to implement an energy-efficient device, let’s see how you can achieve it.
There are two major approaches to reducing power consumption in embedded systems: hardware-related, and software-related. An efficient device will combine both strategies.
This is the case with cutting-edge microprocessors and microcontrollers.
There is a wide range of ML applications and each of them requires a different amount of processing power and memory resources. The following figure shows the type of ML applications that you can run across each family of microcontrollers from Renesas.
As you can see, you can deploy anything from simple anomaly detection models to advanced video analytics. Signal processing, anomaly detection, character recognition, simple classification, or even simple object detection require little processing power. Voice recognition is a harder task, and you will have to choose at least the RX, RA, or RZ/A-T family. Then, for complex scenarios where you want to apply classification or object detection, you will have to select the RZ/G family.
Before seeing the solution for embedded video analysis developed by Renesas, let’s refresh the general procedure of implementing ML in embedded systems.
The pipeline of ML for embedded systems is as follows:
In this article, we focus on the RZ/V family and its capabilities to reduce power consumption for intensive ML uses, like image and video processing and analytics. Let’s see how Renesas addresses this problem.
Renesas trains the ML models in the cloud or on your infrastructure and then deploys them to the embedded devices.
With this approach, the devices don’t need to connect to the cloud to run the inference. This removes the delays involved with cloud communications and allows the model to run in a very fast and efficient way.
An interaction between a device and the cloud could take as long as one second, while running the inference locally only takes a few milliseconds. This is a great improvement in time and use of energy.
You can build your models using any of the common ML frameworks, like “PyTorch”, “Keras”, “TensorFlow”, or the 8-bit quantized model of “TensorFlow Lite”. Then you can convert the model to the format supported by the hardware, as you can see in the following figure.
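To make the idea of an 8-bit quantized model more concrete, here is a minimal sketch of affine quantization in plain Python. It is a generic illustration of mapping floating-point values to 8-bit integers with a scale and a zero point, not the exact procedure used by TensorFlow Lite or by the Renesas translators. The function names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(values):
    """Map floats to int8 using real ~= (q - zero_point) * scale.

    Generic affine quantization sketch (an assumption for illustration).
    """
    # The represented range must include 0.0 so that zero maps to an integer.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all values are 0
    zero_point = round(-128 - lo / scale)
    quantized = [
        max(-128, min(127, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate float values from the int8 representation."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-1.0, 0.0, 0.5, 1.0]
q, scale, zero_point = quantize_int8(weights)
print(q, dequantize(q, scale, zero_point))
```

Each value is stored in one byte instead of four, at the cost of a small rounding error on the order of `scale`.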
The step of deploying the models to the MPUs involves the use of the e-AI translator or the DRP-AI translator from Renesas, depending on the family of products that you are using. These translators perform very efficient programming of the devices, drastically reducing the energy consumption.
In particular, the e-AI translator lets you implement the model across many MCUs. On the other hand, the Dynamically Reconfigurable Processor for AI (DRP-AI) translator allows you to convert ONNX models by using the AI accelerator "DRP-AI" provided in RZ/V devices.
Typical applications for RZ/V with embedded AI are:
RZ/V uses the DRP-AI to accelerate the inference and save energy in the process. To reduce the power consumption on AI operations, RZ/V microprocessors apply the following techniques.
Besides the power consumed by matrix operations in the accelerator, the transactions between the accelerator and the external memory consume a considerable amount of energy. This matters because applying ML to images is a memory-intensive process.
The embedded AI accelerator reduces the number of memory accesses by reusing the data obtained from the memory. This leads to an important reduction in power consumption. In particular, the number of transactions between the accelerator and the memory is reduced by a factor of nine compared to a GPU. Also, the accelerator can use the same data as output data and weight information.
One important feature of AI models is that a high percentage, typically 50 percent, of input/output data and weights are equal to zero. The DRP-AI uses a technology that detects zero inputs and skips their computation, saving processor cycles and energy.
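The effect of data reuse can be illustrated with a toy model. The sketch below runs the same small convolution twice: once fetching every input value from a simulated external memory each time it is needed, and once copying the image into a local buffer first. The `ExternalMemory` class and both functions are hypothetical, and the read counts describe only this toy model, not the DRP-AI hardware or the figures quoted above.

```python
class ExternalMemory:
    """Hypothetical stand-in for off-chip memory that counts every read."""

    def __init__(self, image):
        self.image = image
        self.reads = 0

    def read(self, row, col):
        self.reads += 1
        return self.image[row][col]


def conv_naive(mem, height, width, kernel):
    """Re-fetch the full k x k window from external memory per output."""
    k = len(kernel)
    return [
        [
            sum(mem.read(r + i, c + j) * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(width - k + 1)
        ]
        for r in range(height - k + 1)
    ]


def conv_with_reuse(mem, height, width, kernel):
    """Fetch each input once into a local buffer, then reuse it."""
    local = [[mem.read(r, c) for c in range(width)] for r in range(height)]
    k = len(kernel)
    return [
        [
            sum(local[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(width - k + 1)
        ]
        for r in range(height - k + 1)
    ]


image = [[r * 8 + c for c in range(8)] for r in range(8)]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

naive_mem, reuse_mem = ExternalMemory(image), ExternalMemory(image)
out_naive = conv_naive(naive_mem, 8, 8, kernel)
out_reuse = conv_with_reuse(reuse_mem, 8, 8, kernel)
print(naive_mem.reads, reuse_mem.reads)  # 324 external reads vs 64
```

Both versions produce the same output; only the number of external reads differs.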
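As a rough software analogy of zero skipping, the sketch below compares a dense dot product with one that skips any multiply-accumulate (MAC) operation where the input or the weight is zero. The function names are hypothetical, and the DRP-AI performs this detection in hardware; this is only meant to show why the result is unchanged while the work shrinks.

```python
def dot_dense(inputs, weights):
    """Dot product that performs every multiply-accumulate."""
    acc, macs = 0, 0
    for x, w in zip(inputs, weights):
        acc += x * w
        macs += 1
    return acc, macs


def dot_skip_zeros(inputs, weights):
    """Dot product that skips terms known to contribute nothing."""
    acc, macs = 0, 0
    for x, w in zip(inputs, weights):
        if x == 0 or w == 0:
            continue  # the product would be zero, so no work is needed
        acc += x * w
        macs += 1
    return acc, macs


inputs = [0, 2, 0, 3]
weights = [1, 0, 5, 4]
print(dot_dense(inputs, weights))       # (12, 4)
print(dot_skip_zeros(inputs, weights))  # (12, 1)
```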
The third technique is related to the synchronization of operations. This involves the management of computation operations, external memory access, etc.
Correct scheduling of tasks avoids waiting times, improving the efficiency of the entire process. For example, by queuing the access to the memory after the weight information has been read and stored in the buffer, the memory access latency can be optimized. The best part is that you don’t need to take care of it yourself, because this optimized schedule is generated automatically by the DRP-AI translator.
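To see why scheduling matters, consider a simplified timing model in which the next block of data is fetched from memory while the current block is being processed. The sketch below compares that overlapped schedule with a strictly sequential one. The function names and the cycle counts are hypothetical, and the model (one memory channel, two buffers) is an assumption for illustration, not a description of the schedule produced by the DRP-AI translator.

```python
def sequential_time(fetch, compute):
    """Total time when each fetch and each computation run one by one."""
    return sum(fetch) + sum(compute)


def overlapped_time(fetch, compute):
    """Total time when block i+1 is fetched while block i is computed.

    Assumes two buffers and a single memory channel, so each stage
    lasts as long as the slower of the two overlapping activities.
    """
    total = fetch[0]  # the first block cannot be overlapped with anything
    for i in range(len(compute)):
        next_fetch = fetch[i + 1] if i + 1 < len(fetch) else 0
        total += max(compute[i], next_fetch)
    return total


fetch_cycles = [4, 4, 4]    # hypothetical memory latency per block
compute_cycles = [3, 3, 3]  # hypothetical compute time per block

print(sequential_time(fetch_cycles, compute_cycles))  # 21
print(overlapped_time(fetch_cycles, compute_cycles))  # 15
```

In this model the compute units wait only when a fetch takes longer than the computation it overlaps with.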
In the following figure, you can see thermal images that compare the heating of a DRP-AI system and a GPU.
Brighter areas indicate a higher surface temperature, caused by more heating. As you can see, there is a big difference between the heat dissipated by the GPU and the DRP-AI system. This is clear evidence of the outstanding energy efficiency of DRP-AI compared with a typical GPU system.
Keeping power consumption low and achieving an energy-efficient embedded ML device is important for the following reasons.
First, many embedded systems today depend on batteries and autonomous energy systems. At the same time, the execution of ML models is part of their intrinsic functionality. Some examples are video surveillance cameras with analytics, robots, and drones.
Also, power delivery technologies like Power over Ethernet (PoE) require keeping the power consumption of the devices within the levels supported by the standards.
Finally, having an energy-efficient device leads to a simpler and more compact design, avoiding big heat sinks and air-forced coolers.
To achieve the required level of efficiency you can use energy-efficient technologies, like the DRP-AI provided by Renesas.
This technology allows deploying the ML models in the embedded devices taking into consideration three key factors:
These techniques produce an improvement in the use of energy up to a factor of nine, compared to traditional GPU systems.
At Renesas we continuously strive to drive innovation with a comprehensive portfolio of microcontrollers, analog and power devices. Our mission is to develop a safer, healthier, greener, and smarter world by providing intelligence to our four focus growth segments: Automotive, Industrial, Infrastructure, and IoT that are all vital to our daily lives, meaning our products and solutions are embedded everywhere.