The deployment and operation of AI systems and models at the edge offer many benefits to industrial organizations, yet they also pose a host of challenges. For instance, the limited processing power of edge devices compared to conventional centralized systems still needs to be addressed. Edge deployments also limit the amount of data that can be centrally aggregated, leaving certain applications without adequate data points. There are also open issues in scaling edge AI solutions across diverse environments and in ensuring seamless interoperability between heterogeneous devices.
To address these limitations, edge AI technology is attracting significant traction, driving continuous improvement of the main technological enablers behind the different edge AI paradigms. These enablers are advancing in directions that tackle the scaling, interoperability, and distribution challenges of edge AI deployments, and their development spans many aspects of AI systems, from hardware and software infrastructure to novel AI models and paradigms.
Hybrid Edge-Cloud AI: Optimized Intelligence and Resource Management
The emergence of edge AI has been driven by the well-known limitations of traditional cloud-centric AI systems, such as latency, privacy concerns, and bandwidth constraints. Edge AI addresses these issues by processing data locally, close to where it is generated, such as within devices at the network's edge. This approach reduces latency through immediate data processing and decision-making while enhancing privacy, since sensitive data are not shared with the cloud.
Furthermore, due to the reduced data transfers, edge AI decreases the attack surface and the bandwidth usage of AI applications at the edge. Nevertheless, there are still several AI-based use cases where cloud AI is needed. This is, for example, the case with applications that require many data points and are compute-intensive, such as training and using large language models with tens of billions of parameters. To address such use cases while retaining the benefits of edge AI for real-time applications with sensitive data, the preferred AI deployment model combines cloud and edge computing infrastructures.
Hybrid Edge-Cloud reference architecture for IoT systems (Image credit: Wevolver, adapted from: M. Ashouri et al.)
The integration of edge and cloud computing has created a hybrid model that leverages the strengths of both approaches. Edge AI provides real-time processing capabilities, while cloud computing offers the computational power necessary for training complex models and handling very large-scale data analytics. This synergy allows businesses to optimize resource use: edge devices handle immediate, latency-sensitive tasks while the cloud manages more intensive computations and long-term data storage. Also, hybrid models enable continuous learning and model updates. Data processed at the edge can be aggregated in the cloud to refine AI models, which are then redeployed to edge devices for improved performance. This combination enhances scalability, flexibility, and efficiency for a wide range of use cases in different sectors.
During the early days of hybrid cloud/edge AI deployments, companies had to statically define the most appropriate placement of AI models, functions, and workloads, considering the compute, energy efficiency, security, privacy, and latency requirements of their use cases. This placement was intended to optimally resolve performance and security trade-offs across heterogeneous cloud/edge infrastructure. Nowadays, edge and cloud AI infrastructures come with intelligent resource management and AI service orchestration functions that optimize the placement of AI workloads between cloud and edge based on infrastructure parameters, application profiles, and use case requirements.
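As a rough illustration of the logic involved, the following Python sketch scores a single placement decision against latency, memory, and data-sensitivity constraints. The workload profile, thresholds, and function names are illustrative assumptions, not the interface of any particular orchestrator.

```python
from dataclasses import dataclass

@dataclass
class WorkloadProfile:
    """Illustrative profile of an AI workload to be placed."""
    name: str
    latency_budget_ms: float   # end-to-end latency the use case tolerates
    model_memory_mb: float     # memory footprint of the model
    data_sensitivity: bool     # True if raw data must stay on-premise

@dataclass
class EdgeNode:
    """Illustrative description of an available edge device."""
    free_memory_mb: float
    round_trip_to_cloud_ms: float

def place_workload(w: WorkloadProfile, node: EdgeNode) -> str:
    """Return 'edge' or 'cloud' using simple, static rules.

    Real orchestrators weigh many more signals (energy, cost, load
    forecasts) and often learn the placement policy, as noted above.
    """
    # Privacy-sensitive data should not leave the edge.
    if w.data_sensitivity:
        return "edge"
    # If the cloud round trip alone exceeds the latency budget,
    # the workload must run at the edge.
    if node.round_trip_to_cloud_ms >= w.latency_budget_ms:
        return "edge"
    # Otherwise fall back to the cloud when the model does not fit locally.
    if w.model_memory_mb > node.free_memory_mb:
        return "cloud"
    return "edge"

if __name__ == "__main__":
    detector = WorkloadProfile("defect-detector", latency_budget_ms=50,
                               model_memory_mb=300, data_sensitivity=False)
    gateway = EdgeNode(free_memory_mb=512, round_trip_to_cloud_ms=80)
    print(place_workload(detector, gateway))  # -> "edge"
```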
The state of the art (SOTA) in cloud/edge resource management for AI applications involves cross-layer orchestration of AI workflows, as well as deployment of edge functions based on stateless cloud/edge paradigms like Function as a Service (FaaS). Emerging resource management approaches also employ machine learning to provision and place AI workloads dynamically and intelligently, which promises tangible benefits to both infrastructure providers (e.g., cloud providers) and the operators and deployers of AI applications. At the same time, to fully realize the benefits of edge computing, dedicated AI hardware is evolving to provide high-performance computing in compact, power-efficient form factors.

The Next Generation of Specialized Edge Hardware
In the years to come, edge AI will enable almost all organizations to access their own unique layer of intelligence by leveraging their own data. This will be empowered by the evolution of edge hardware, which will enable highly scalable applications. It is, therefore, no accident that during 2024, an AI hardware company, namely NVIDIA, recorded the fastest-growing market capitalization among publicly listed US companies.
Specialized hardware will accelerate edge AI by providing the necessary computational power in a compact form. As a prominent example, NVIDIA Jetson modules are delivering high-performance capabilities in edge applications like computer vision. To this end, they integrate Central Processing Units (CPUs), Graphics Processing Units (GPUs), memory, and interfaces into small form factors, which makes them ideal for deploying complex deep learning models on edge devices. As another example, Qualcomm's processors (e.g., chips used in Wi-Fi hubs) incorporate powerful Neural Processing Units (NPUs) that enable efficient on-device AI processing.
Energy efficiency is another concern when it comes to edge AI deployments, which must be power-efficient; this drives the development of ultra-low-power hardware. For instance, Ambiq’s Subthreshold Power Optimized Technology (SPOT) platform exemplifies ultra-low-power hardware designed for edge applications. SPOT enables devices to operate at significantly reduced voltage levels, which is key to extending battery life without sacrificing performance. This technology is, for example, important for digital health devices that require continuous operation without frequent recharging.
While specialized hardware enables efficient computation at the edge, its full potential is only realized when paired with edge-native algorithms optimized for real-time inference, minimal data dependencies, and energy efficiency.
Ambiq's Apollo510 EVB (Image Credit: Ambiq)
Scalable Edge NPU IP for SoC integration, from Embedded ML and computer vision up to Generative AI
The market for edge AI chips in multiple applications is rapidly expanding, driven by the increasing demand for power-efficient, low-latency AI processing directly on devices. Licensable NPU IP (Intellectual Property) is a crucial enabling technology for edge AI chip designers targeting consumer devices, industrial automation, and vehicle safety. Ceva is leading the way by developing scalable NPU IPs that accelerate the deployment of Smart Edge chips and devices.
Ceva-NeuPro-Nano: Highly Efficient, Self-Sufficient Edge NPU for Embedded ML Applications
With over 4 billion inference chips for Embedded ML (TinyML) devices forecasted to ship annually by 2029, this Edge NPU IP is the smallest of Ceva’s NeuPro NPU product family. It delivers the optimal balance of ultra-low power and high performance in a small area to efficiently execute Embedded ML workloads across AIoT product categories, including hearables, wearables, home audio, smart home, smart factory, and more. Ranging from 10 GOPS up to 400 GOPS per core, Ceva-NeuPro-Nano enables energy-efficient, always-on audio, voice, vision, and sensing use cases in battery-operated devices across a wide array of end markets.
Ceva-NeuPro-Nano is a standalone, fully programmable NPU, not an AI accelerator, and therefore does not require a host CPU/DSP to operate. The IP core includes all the processing elements of a standalone NPU, including code execution and memory management. Its architecture is fully programmable and efficiently executes neural networks, feature extraction, control code, and DSP code. It also supports the most advanced machine-learning data types and operators, including native transformer computation, sparsity acceleration, and fast quantization, delivering a highly optimized solution with excellent performance.
Ceva-NeuPro-M: Scalable NPU Architecture for Transformers and Generative AI Applications
Ceva-NeuPro-M is a scalable NPU architecture with exceptional power efficiency of up to 3500 Tokens per Second/Watt for Llama 2 and 3.2 models. With 30% of generative AI inference predicted to be on-device in the next 2 years, the Ceva-NeuPro-M NPU IP delivers exceptional energy efficiency tailored for edge computing while offering scalable performance to handle AI models with over a billion parameters. Its award-winning architecture introduces significant advancements in power efficiency and area optimization, enabling it to support massive machine-learning networks, advanced language and vision models, and multimodal generative AI.
Even mid-range AI workloads, such as computer vision (object detection and classification), speech recognition, and small-scale NLP (keyword spotting), are becoming dominated by the use of transformers (e.g., ViT, BERT). Transformer support in edge NPUs is becoming mandatory for local text generation, context-aware AI assistants, and multimodal models for AR/VR, robotics, and advanced user-interface applications. With a processing range of 400 GOPS to 200 TOPS per core, leading area efficiency, advanced transformer support, sparsity, and compression, the Ceva-NeuPro-M optimizes key AI models seamlessly. Thanks to its highly scalable design, it provides an ideal IP solution for embedding high-performance AI processing in SoCs across a wide range of edge AI applications.
The Ceva NeuPro-M and NeuPro-Nano (Image Credit: Ceva)
AI SDK for Ceva-NeuPro NPUs
The Ceva-NeuPro Studio is a robust tool suite that complements the Ceva-NeuPro NPUs by streamlining the development and deployment of AI models. It includes tools for network optimization, graph compilation, simulation, and emulation, ensuring that developers can train, import, optimize, and deploy AI models with the highest efficiency and precision. There are limitless possibilities to build Edge AI chips with diverse AI capabilities, from Embedded ML in consumer and industrial IoT to multimodal and edge generative AI in personal computing and automotive. Learn more about how Ceva’s licensable NeuPro NPUs and wireless connectivity IPs are helping to build chips that power Smart Edge devices - www.ceva-ip.com
Edge-Native Models and Algorithms
Since the early days of edge AI, emphasis has been put on shrinking conventional machine learning models so that they fit on and can be deployed to edge devices. In recent years, there has been a surge of interest in edge-native AI algorithms, which enable real-time inference in applications like computer vision and speech recognition.
Edge-native algorithms are tailored for real-time inference on resource-constrained devices. They are optimized to balance accuracy with computational efficiency for edge deployment. In this direction, they employ techniques such as model quantization and pruning, which reduce the size of AI models without a significant drop in performance. Edge-native algorithms are therefore suitable for deployment on edge devices with limited resources (e.g., CPU and memory resources).
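The sketch below illustrates one common form of this optimization, post-training quantization with TensorFlow's TFLite converter. The model path and file names are hypothetical; pruning or quantization-aware training would follow a similar workflow.

```python
import tensorflow as tf

# Load a trained Keras model (the path is a placeholder for illustration).
model = tf.keras.models.load_model("models/defect_detector")

# Post-training quantization: let the converter use 8-bit weights where it
# can, trading a small amount of accuracy for a much smaller binary.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# The resulting flatbuffer is typically around 4x smaller than the float32
# model and can be shipped to mobile-class or microcontroller-class devices.
with open("models/defect_detector_int8.tflite", "wb") as f:
    f.write(tflite_model)
```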
Some of the edge-native algorithms are also classified as “data-efficient” techniques, as they can perform well with smaller datasets, which makes them particularly useful in scenarios where data is limited or costly to obtain. Data-efficient algorithms maintain high-performance levels without the need for large volumes of data that are typically used in traditional machine-learning methods.
To support scalable and resource-efficient deployment of edge-native algorithms, technologies like Docker and Kubernetes are being adapted for the edge. These technologies simplify application management across diverse devices, boosting scalability and resource optimization. Furthermore, there is a rise in DevEdgeOps, i.e., DevOps practices adapted specifically for edge computing environments to address challenges like connectivity issues and diverse hardware requirements, ensuring efficient deployment and maintenance of edge-native applications.
Nowadays, there are also edge AI frameworks that facilitate the development and deployment of edge-native AI models and algorithms. For example, tools like TensorFlow Lite and OpenVINO make it easier to deploy AI models optimized for edge environments. Moreover, edge AI supports hyper-personalized services, such as dynamic traffic management or smart retail experiences, by processing data locally and responding instantly.
TensorFlow Lite helps deploy AI models at the edge (Image Credit: Wevolver, adapted from: SeedStudio)
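As a minimal illustration, the following sketch loads a quantized TensorFlow Lite model and runs a single on-device inference. The model file and input are placeholders; on tightly constrained devices, the lighter tflite-runtime package would typically replace the full TensorFlow dependency.

```python
import numpy as np
import tensorflow as tf

# Load the quantized model produced earlier (the path is illustrative).
interpreter = tf.lite.Interpreter(model_path="models/defect_detector_int8.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Fabricate an input matching the model's expected shape and dtype;
# on a real device this would come from a camera or sensor driver.
dummy_input = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])

interpreter.set_tensor(input_details[0]["index"], dummy_input)
interpreter.invoke()                       # runs entirely on the device
prediction = interpreter.get_tensor(output_details[0]["index"])
print(prediction)
```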
There are also opportunities for improving edge-native algorithms through their integration with state-of-the-art edge networking infrastructures like 5G networks. Specifically, the synergy between 5G and edge computing enhances the performance of edge algorithms by providing ultra-low latency and high-speed data transmission. This class of 5G-enhanced algorithms will play a considerable role in applications like autonomous vehicles, remote surgeries, and various immersive augmented reality applications.
Edge-native models will be increasingly deployed on a smaller scale in miniaturized devices, giving rise to the micro-edge and thin-edge AI paradigm. The latter refers to deploying lightweight AI models on minimal hardware resources, which is essential for extending AI capabilities to smaller devices like sensors or wearables.
Moving LLMs and Generative AI to the Edge
For over two years following the emergence of OpenAI’s ChatGPT, Large Language Models (LLMs) and Generative AI (GenAI) have been considered among the most promising developments of the AI community, especially with the recent surge of competing models like DeepSeek R1, Anthropic’s Claude 3.5, Google’s Gemini 1.5 and 2.0, and many more. Yet, as detailed in Wevolver’s recent report titled “Edge AI Technology Report: Generative AI Edition,” GenAI is increasingly moving away from cloud servers and toward the edge. Local execution of generative models enables devices to provide personalized experiences without relying heavily on cloud resources. This local processing reduces latency and enhances privacy, which are key considerations in environments with intermittent connectivity or stringent security requirements.
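As a rough sketch of what local generative inference can look like, the snippet below runs a small quantized model with llama-cpp-python, one of several community runtimes for on-device LLM execution. The model file and parameters are illustrative assumptions.

```python
# A minimal sketch of local LLM inference with llama-cpp-python; the model
# file name and settings below are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="models/small-instruct-q4.gguf",  # a small, quantized model
    n_ctx=2048,        # modest context window to fit edge memory budgets
)

# Everything runs on the local device: no prompt or response leaves it.
result = llm(
    "Summarize today's sensor anomalies in one sentence:",
    max_tokens=64,
)
print(result["choices"][0]["text"])
```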
Companies like Qualcomm and Arm are leading efforts to make GenAI models smaller and more efficient, in order to make them usable in real-time applications like autonomous vehicles, smart homes, and industrial Internet of Things (IoT) applications.
In the coming years, the integration of GenAI into edge devices will become more ubiquitous across consumer electronics and industrial systems. Furthermore, advances in hardware (e.g., specialized processors) and software frameworks (e.g., ExecuTorch) will continue to drive improvements in AI model performance and accuracy. Most importantly, GenAI at the edge will enable autonomous AI agents capable of solving problems, handling complex tasks, and collaborating with other agents. These agents will act as digital workers at the edge, completing complex tasks close to the field in near real time.
Speaking of mimicking humans, a different paradigm is emerging: neuromorphic computing. Inspired by the brain’s event-driven processing, neuromorphic chips aim to unlock ultra-low-power AI capabilities, addressing efficiency concerns that traditional architectures struggle with.

The Role of Neuromorphic Chips
Neuromorphic chips represent an emerging technology that is designed to mimic the human brain's neural architecture. These chips are inherently efficient at processing sensory data in real time due to their event-driven nature. Therefore, they hold promise to advance edge AI based on a new wave of low-power solutions that will be handling complex tasks like pattern recognition or anomaly detection.
In the next few years, neuromorphic chips will become embedded in smartphones, enabling real-time AI capabilities without relying on the cloud. This will allow tasks like speech recognition, image processing, and adaptive learning to be performed locally on these devices with minimal power consumption. Companies like Intel and IBM are advancing neuromorphic chip designs (e.g., Loihi 2 and TrueNorth, respectively) that consume 15–300 times less energy than traditional chips while at the same time delivering exceptional performance.
Emerging technologies like memristors and 3D architectures will improve the scalability and efficiency of neuromorphic chips. Memristors emulate synaptic behavior for more brain-like processing, while 3D integration reduces latency and enhances computational density. Furthermore, event-driven processing models (e.g., spiking neural networks) will be used to further optimize energy efficiency by mimicking the asynchronous nature of biological neurons. As AI models evolve toward brain-inspired architectures, interpretability becomes a key consideration. Ensuring that decisions made by both conventional and neuromorphic AI systems remain transparent and understandable is critical, especially in high-risk domains.
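The following minimal sketch of a leaky integrate-and-fire neuron, the basic building block of spiking neural networks, illustrates the event-driven idea: computation (a spike) happens only when accumulated input crosses a threshold. All values are illustrative.

```python
import numpy as np

def simulate_lif(input_current, threshold=1.0, leak=0.9):
    """Simulate a single leaky integrate-and-fire (LIF) neuron.

    The neuron integrates input while slowly leaking charge, and emits a
    binary spike only when its membrane potential crosses the threshold,
    so no work is done while the input is quiet. This sparsity is the
    source of neuromorphic hardware's energy efficiency.
    """
    potential = 0.0
    spikes = []
    for current in input_current:
        potential = leak * potential + current   # integrate with leak
        if potential >= threshold:
            spikes.append(1)                     # fire a spike ...
            potential = 0.0                      # ... and reset
        else:
            spikes.append(0)
    return np.array(spikes)

# A brief burst of input produces a few spikes; silence produces none.
inputs = np.concatenate([np.full(5, 0.4), np.zeros(10), np.full(5, 0.6)])
print(simulate_lif(inputs))
```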
Explainability in Edge AI: Building Trust and Transparency
The comparatively simple and domain-specific nature of edge AI models often renders them more interpretable than large cloud-based models. However, explainability remains a key requirement for regulatory compliance, trust, and real-world deployment, especially in industries like healthcare, finance, and industrial automation. In a recent book on “Advancing Edge Artificial Intelligence,” the authors emphasized that edge AI must provide transparent, comprehensible decisions to gain adoption in safety-critical applications, where end-users require the reasoning behind a certain prediction.
Explainable AI (xAI) at the edge refers to the ability of edge AI models to provide transparent, interpretable, and justifiable decisions while operating under resource constraints. It ensures that AI-driven predictions can be understood, audited, and trusted by users, regulators, and stakeholders, particularly in high-risk applications.
Unlike cloud-based AI, which can rely on compute-intensive explainability methods like Shapley additive explanations (SHAP) or local interpretable model-agnostic explanations (LIME), explainability in edge AI must balance interpretability with real-time performance. Hardware constraints, processing power, and latency requirements can impact the feasibility of such explainability methods at the edge. Lightweight variants of such techniques, including saliency maps (such as Grad-CAM), precomputed feature attribution (i.e., SHAP or LIME results sent from cloud to edge), and context-aware explanations (i.e., rule-based, lightweight interpretable models), can help make decisions more transparent without compromising efficiency.
For example, the explainability technique Grad-CAM helped manufacturing engineers verify defect detection models by ensuring the AI focuses on actual product flaws rather than irrelevant background features. In healthcare, it assisted in medical imaging by confirming that models focus on relevant areas, such as lung regions in pneumonia detection, thereby enhancing trust among medical professionals.
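For readers who want a concrete picture of the technique, the following is a minimal Grad-CAM sketch in TensorFlow/Keras, assuming a convolutional classifier; the layer name and class index in the usage comment are illustrative placeholders.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index):
    """Compute a Grad-CAM heatmap showing which regions drove a prediction."""
    # Build a model that maps the input to the last conv feature maps
    # and to the final predictions.
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        class_score = preds[:, class_index]
    # Gradient of the class score w.r.t. the conv feature maps.
    grads = tape.gradient(class_score, conv_out)
    # Channel weights = global average of the gradients.
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weighted sum of feature maps, then ReLU: keep only positive evidence.
    heatmap = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (heatmap / (tf.reduce_max(heatmap) + 1e-8)).numpy()

# Example usage (hypothetical layer name for a ResNet-style backbone):
# heatmap = grad_cam(model, image, "conv5_block3_out", class_index=0)
```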
On the regulatory front, edge AI must meet regulatory requirements, particularly in healthcare, finance, and autonomous systems, where legal frameworks like the EU AI Act and GDPR mandate transparency. A 2025 paper on responsible AI highlights explainability as critical for high-stakes applications such as surgical planning and risk assessment, ensuring that AI-generated decisions remain auditable and justifiable. Similarly, Trustful AI underscores that traceability is just as vital as accuracy, enabling regulators and stakeholders to scrutinize AI outcomes.
Beyond compliance, explainability is essential for bias detection and debugging. A study on edge security cameras revealed that dataset biases can cause models to misidentify objects, such as individuals in wheelchairs, because the cameras were trained mostly on standing figures. Researchers used xAI techniques like D-RISE, chosen for its adaptability to diverse models, to identify feature dependencies, leading to targeted dataset augmentation and improved fairness. They also demonstrated how feature attribution allowed engineers to pinpoint which sensor readings influenced predictive maintenance models, making AI-driven insights more actionable.
Autonomous vehicles and industrial robots are other applications for edge explainable AI (xEdgeAI), requiring transparency and explanations for their actions, particularly in failure scenarios where human oversight is necessary. However, achieving this without compromising performance remains a challenge. Emerging concept-based explanations and real-time saliency maps are improving interpretability while maintaining efficiency. As edge AI adoption grows in 2025, explainability will become an operational necessity. Organizations must implement explainability frameworks that balance transparency, performance, and trust, ensuring AI-driven decisions remain both reliable and actionable.
Privacy-Preserving Distributed Learning Paradigms for Edge AI
Distributed learning paradigms like federated learning (FL) and swarm learning (SL) enable different actors to collaborate on AI model training and execution in a way that preserves the privacy of their data. As a prominent example, FL enables decentralized model training across distributed data sources while preserving data privacy and security. The FL paradigm allows multiple entities (e.g., IoT devices and edge servers) to collaboratively learn a shared model without exchanging their local data.
Specifically, federated learning mitigates the issues related to data silos and residency requirements through collaborative learning that does not centralize sensitive information. Currently, FL deployments are in their infancy, as this distributed learning paradigm faces challenges such as data heterogeneity, communication overhead, and vulnerability to attacks. To alleviate these issues, ongoing research focuses on the development of robust federated learning frameworks and secure aggregation protocols.
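A minimal sketch of the core aggregation step, often referred to as federated averaging (FedAvg), is shown below. The client weights and dataset sizes are illustrative, and real FL frameworks add secure aggregation, compression, and client scheduling on top of this.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Federated averaging (FedAvg): combine client model weights into a
    global model, weighting each client by its local dataset size.

    Only model parameters travel to the aggregator; the raw training data
    never leaves the clients, which is the privacy argument made above.
    """
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    global_weights = []
    for layer in range(num_layers):
        layer_avg = sum(
            (size / total) * weights[layer]
            for weights, size in zip(client_weights, client_sizes)
        )
        global_weights.append(layer_avg)
    return global_weights

# Three hypothetical edge clients, each holding a tiny two-layer model.
clients = [
    [np.ones((2, 2)) * 1.0, np.ones(2) * 0.1],
    [np.ones((2, 2)) * 2.0, np.ones(2) * 0.2],
    [np.ones((2, 2)) * 3.0, np.ones(2) * 0.3],
]
sizes = [100, 200, 700]
print(federated_average(clients, sizes)[0])  # weighted toward larger clients
```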
Swarm learning is another innovative distributed learning paradigm, which is inspired by swarm intelligence. It employs a decentralized network of nodes, each with its own data and AI model, to collaboratively learn from one another without exposing the underlying data. Many practical implementations of the SL approach leverage blockchain technology to ensure trust, security, and consensus among participating nodes.
Specifically, SL implementations allow nodes to exchange encrypted model updates through a blockchain-based protocol, which enhances security and reduces the risk of a single point of failure. In the scope of an SL deployment, nodes can form dynamic swarms based on shared interests or goals, which allows them to benefit from collective intelligence without centralized control. SL is expected to offer numerous advantages for AI collaboration. However, it is also associated with various technical, social, and ethical challenges that call for further research prior to the widespread deployment and adoption of this technology.
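The conceptual sketch below shows the swarm-style merge step in its simplest form, with peers averaging their parameters without a central server. The encryption and blockchain consensus layers used in real SL deployments are deliberately omitted here.

```python
import numpy as np

def gossip_merge(peer_params):
    """Conceptual swarm-style merge: each node averages its parameters with
    those received from its current swarm peers, with no central server.

    Real swarm learning implementations exchange encrypted updates and rely
    on a blockchain layer for membership and consensus; both are omitted.
    """
    return np.mean(peer_params, axis=0)

# Each node holds its own parameter vector after a round of local training.
node_a, node_b, node_c = np.array([1.0, 2.0]), np.array([3.0, 2.0]), np.array([2.0, 5.0])

# A dynamic swarm of {A, B, C} merges without exposing any node's raw data.
print(gossip_merge([node_a, node_b, node_c]))  # -> [2.0, 3.0]
```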
Overall, the technological enablers of edge AI in 2025 are transforming how data is processed across networks. These enablers address the main limitations of cloud-centric approaches through local processing capabilities, specialized hardware, optimized algorithms, and innovative chip designs. In the years to come, they will enable edge AI to play a pivotal role in offering a unique layer of intelligence to organizations across almost all sectors.
