Today, more and more appliances in our homes and offices are “smart”. We call this the Internet of Things (IoT). But IoT is much more than just “things” connected to the internet. It is about devices that can understand their surroundings, interact with us, and help create safer, smarter, more efficient, and more fun lives.
Many of us grew up with remote-controlled devices: we pointed one object at another to change the temperature, the channel, the volume, and so on. The next generation of devices is smart enough to understand voice commands. We simply ask for the lights to be switched on or off, or for the volume to go up or down. A voice interface is an intuitive, simple, and smart way to interact with devices. Today, devices such as the Amazon Echo understand a set of basic commands that we are all familiar with, but the demand for voice user interfaces and other audio-related applications is growing.
This requires special hardware: application-specific chips that can understand language. In the visual domain, it requires neural accelerators that identify and classify objects, for example when we ask our home camera system where person X is in the house at the moment. These speech and video accelerators therefore need to be fast, accurate, and low-power. We want them to monitor their environment continuously without draining the battery, and we want security and privacy built in as well. How could such a smart future be achieved technically? What would the chip architecture look like?
This article explores one opportunity for a smarter future via the Ensemble family of microcontrollers and fusion processors from Alif Semiconductor.
Traditionally, devices rely on general-purpose microcontrollers or microprocessors. Their application range is wide, but this comes with a significant downside: they are slow at application-specific tasks. Complex mathematical operations execute largely sequentially, which takes a lot of time. Today's AI algorithms are math-heavy and therefore consume a lot of time and power when running on a general-purpose processor.
A multi-core processor could be much faster for such applications. However, if one uses, for example, four cores, one also consumes four times as much power to be, at best, four times faster, so the energy needed per task does not improve. Hence, application-specific ICs such as the Alif Ensemble family of devices use a dedicated hardware accelerator for AI: the Arm Ethos-U55 microNPU.
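To make the multi-core trade-off concrete, here is a minimal Python sketch of the energy arithmetic (energy = power × time). It assumes, hypothetically, that the workload parallelizes perfectly, so four cores finish four times sooner while drawing four times the power. The function names and the power and time figures are illustrative placeholders, not measurements of any real device.

```python
# Hypothetical illustration: energy = power x time, under ideal parallel scaling.


def energy_per_task(power_w: float, time_s: float) -> float:
    """Energy in joules for a task that draws power_w watts for time_s seconds."""
    return power_w * time_s


def ideal_multicore(single_power_w: float, single_time_s: float, cores: int):
    """Return (time_s, power_w, energy_j) for an ideal N-core run of one task.

    Assumes perfect scaling: N cores finish N times sooner (the best case)
    while drawing N times the power of a single core.
    """
    time_s = single_time_s / cores
    power_w = single_power_w * cores
    return time_s, power_w, energy_per_task(power_w, time_s)


if __name__ == "__main__":
    # Placeholder figures for illustration only, not measurements.
    p1, t1 = 0.05, 2.0  # one core: 50 mW for 2 s
    print("1 core :", t1, "s", p1, "W", energy_per_task(p1, t1), "J")
    t4, p4, e4 = ideal_multicore(p1, t1, cores=4)
    print("4 cores:", t4, "s", p4, "W", e4, "J")
```

Under this assumption the energy per task is the same with one core or four: parallelism buys speed, not efficiency. Real workloads rarely scale perfectly, so in practice the multi-core energy figure tends to be worse, not better.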
Alif Semiconductor was founded in 2019 with a vision to address the rapidly growing market demand for broad, scalable, and connected AI-enabled embedded computing solutions that are genuinely power efficient. In the Ensemble devices, an Ethos-U55 accelerator combined with a 160 MHz Cortex-M55 core forms the high-efficiency subsystem, which is designed to consume as little power as possible. It can remain always on and listen for voice commands, which makes natural language processing (NLP) an ideal application for this subsystem.
For general-purpose applications that require maximum performance at low power, an Arm Cortex-M55 running at 400 MHz is available in a separate power domain. This high-performance subsystem also has its own Ethos-U55 accelerator, with twice the compute bandwidth of the one in the high-efficiency subsystem. By default, both the accelerator and the M55 sit in an extremely low-power idle state. When the application requires them, the subsystem is powered on, and as soon as the task finishes, it returns to its low-power idle state.
Both subsystems need a manager that schedules tasks and switches off all unused parts of the chip. Thanks to an intuitive and simple software management layer, applications are straightforward to port to this flexible combination of high-performance and high-efficiency processor-accelerator pairs.
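The on-demand power scheme and its manager can be modeled as a small state machine. The following Python sketch is a hypothetical illustration, not Alif's power-management API: `PowerState`, `Subsystem`, and `PowerManager` are invented names, and real power-domain control happens in firmware and hardware. It captures the two behaviors described in the text: the high-efficiency subsystem stays always on, while the high-performance subsystem is powered on for a task and returned to idle as soon as the task finishes.

```python
from enum import Enum
from typing import Any, Callable, Dict


class PowerState(Enum):
    IDLE = "idle"      # default, extremely low-power state
    ACTIVE = "active"  # powered on


class Subsystem:
    """Hypothetical model of a CPU + NPU pair in its own power domain."""

    def __init__(self, name: str, always_on: bool = False) -> None:
        self.name = name
        self.always_on = always_on
        self.state = PowerState.ACTIVE if always_on else PowerState.IDLE
        self.wakeups = 0  # number of IDLE -> ACTIVE transitions

    def power_on(self) -> None:
        if self.state is PowerState.IDLE:
            self.state = PowerState.ACTIVE
            self.wakeups += 1

    def power_off(self) -> None:
        # An always-on subsystem (e.g. the keyword listener) is never gated.
        if not self.always_on:
            self.state = PowerState.IDLE


class PowerManager:
    """Hypothetical manager that powers a subsystem only while a task runs."""

    def __init__(self, *subsystems: Subsystem) -> None:
        self.subsystems: Dict[str, Subsystem] = {s.name: s for s in subsystems}

    def run(self, name: str, task: Callable[..., Any], *args: Any) -> Any:
        sub = self.subsystems[name]
        sub.power_on()
        try:
            return task(*args)
        finally:
            # Return to the low-power idle state as soon as the task finishes,
            # even if the task raised an exception.
            sub.power_off()


if __name__ == "__main__":
    listener = Subsystem("high_efficiency", always_on=True)
    booster = Subsystem("high_performance")
    manager = PowerManager(listener, booster)
    result = manager.run("high_performance", lambda x: x * 2, 21)
    print(result, booster.state, booster.wakeups)
```

Using `try`/`finally` means the subsystem goes back to idle even if the task fails, in keeping with the goal of never leaving unused parts of the chip powered.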
To understand the benefit of this hybrid system, we should compare it to a CPU-only system. As an example, consider a voice-activated camera system. The application listens for keywords, and when it hears one (in a security application, this could be the sound of breaking glass), it triggers the camera to take a picture. It then runs an object detection and classification algorithm on this image. In the first, CPU-only system, two M55 cores run the edge inference models. One M55 core operates at 160 MHz and runs the audio inference model, a convolutional neural network (CNN). The other M55 core operates at 400 MHz and runs the image inference model, MobileNet V2.
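The keyword-triggered flow of this application can be sketched in a few lines of Python. This is a minimal illustration of the control flow only, not Alif's software: the three callables (`spot_keyword`, `capture_image`, `detect_objects`) are hypothetical stand-ins for the audio CNN, the camera driver, and the image model. The point is that the expensive image stage runs only when the cheap, always-on audio stage fires.

```python
from typing import Any, Callable, Iterable, List


def keyword_triggered_camera(
    audio_frames: Iterable[Any],
    spot_keyword: Callable[[Any], bool],
    capture_image: Callable[[], Any],
    detect_objects: Callable[[Any], List[str]],
) -> List[List[str]]:
    """Run the image model only for audio frames in which a keyword is spotted.

    All three callables are hypothetical stand-ins: spot_keyword for the
    always-on audio model, capture_image for the camera, and detect_objects
    for the image model (MobileNet V2 in the example above).
    """
    detections: List[List[str]] = []
    for frame in audio_frames:
        # Cheap, continuous stage: inspect every audio frame.
        if spot_keyword(frame):
            # Expensive stage: runs only after a keyword (e.g. breaking glass).
            image = capture_image()
            detections.append(detect_objects(image))
    return detections


if __name__ == "__main__":
    calls = {"image_model": 0}

    def fake_detect(image: Any) -> List[str]:
        # Toy stand-in that counts how often the image model is invoked.
        calls["image_model"] += 1
        return ["person"]

    frames = ["silence", "speech", "glass_break", "silence"]
    results = keyword_triggered_camera(
        frames,
        spot_keyword=lambda frame: frame == "glass_break",
        capture_image=lambda: "snapshot",
        detect_objects=fake_detect,
    )
    print(results, "image model calls:", calls["image_model"])
```

The same control flow applies to both systems being compared; what differs is which hardware executes each stage.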
In the second system, the high-efficiency subsystem continuously samples the audio, and its Ethos-U55 microNPU scans the samples for keywords. When it spots a keyword, the system runs the image object detection and classification model. The results are striking. For the audio task, the Ensemble chip with its two subsystems is almost 29 times faster than the CPU-only system and 33 times more energy efficient. For image object detection and classification, the Ensemble chip is 75 times faster and 76 times more energy efficient.
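These ratios can be unpacked with the relation E = P·t (energy equals average power times time). Assuming that the speed and energy figures refer to the same inference task, and treating “almost 29” as 29, the implied ratio of average power draw between the Ensemble path and the CPU-only path is:

```latex
\begin{aligned}
\frac{P_{\mathrm{Ens}}}{P_{\mathrm{CPU}}}
  &= \frac{E_{\mathrm{Ens}}/t_{\mathrm{Ens}}}{E_{\mathrm{CPU}}/t_{\mathrm{CPU}}}
   = \frac{t_{\mathrm{CPU}}/t_{\mathrm{Ens}}}{E_{\mathrm{CPU}}/E_{\mathrm{Ens}}} \\
\text{audio: } \frac{29}{33} &\approx 0.88, \qquad
\text{image: } \frac{75}{76} \approx 0.99
\end{aligned}
```

Under that reading, the accelerated path draws roughly the same average power as the CPU-only path (about 12% less for audio, about 1% less for images), and nearly all of the energy saving comes from finishing each inference far sooner. This is the opposite of the multi-core approach, where extra speed is paid for with proportionally more power.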
Depending on the application, the Ensemble family is available in one-, two-, three-, or four-core configurations, so it scales to AI applications with different requirements for energy-efficient processor cores. Maximum battery life depends on extreme power management, and that is one of the major benefits of the Ensemble family. Furthermore, the chip has security built in from the ground up and integrates seamlessly with cellular networks.
Another advantage is that image and audio processing happen locally. This enables privacy measures that cloud-based solutions cannot offer: a cloud-based solution must send audio and/or images to a data center, which adds noticeable delay and exposes the data to potential privacy breaches. If we need a fast response to a keyword, the round trip to the data center can be unacceptably long, making local processing the only viable solution.
On top of that, the device can anonymize images or videos locally by using human gait analysis: instead of identifying faces, it tracks pose and gait. This method can uniquely identify individuals without associating gender, age, or other privacy-sensitive information with an image. The same applies to NLP: keywords are identified locally to trigger an action, while the audio always remains on the device and is never exposed to the outside world.
Cloud-based inference has serious drawbacks for privacy, security, and response time. Local edge inference is the logical way forward. To conserve energy, we have to look at application-specific accelerators that offload inference to a dedicated neural processing unit (NPU). These are much faster and require less energy than running inference on the general-purpose Arm Cortex-M55 cores.
The next generation of IoT devices will take full advantage of the capabilities offered by new application-specific, low-power chips such as the Ensemble family.
Alif Semiconductor was founded in 2019 as a result of our co-founders' longtime observation of a rapidly growing market need, and an inadequate response by suppliers, for broad, scalable, connected, AI-enabled embedded computing solutions that are genuinely power efficient. That’s why we created a new class of products in the industry, which we call fusion processors, to enable our customers to do much more through innovative low-power techniques, extreme integration, accelerated AI/ML, high security, ubiquitous wireless connectivity, and operating system diversity.