Why Data Movement, Not Compute, Is Now the Limiting Factor in Modern System-on-Chip (SoC) Design
Compute Scaling Has Exposed a New Bottleneck
For decades, system-on-chip (SoC) design has been driven by improvements in compute performance. Advances in transistor scaling enabled higher clock speeds, greater parallelism, and increasingly specialized processing units. As a result, architectural priorities focused on maximizing compute throughput.
That paradigm is now shifting. As SoCs grow in complexity—driven by AI workloads, heterogeneous integration, and evolving system-level requirements—performance is no longer determined solely by compute capability. Instead, it is increasingly limited by the efficiency with which data moves across the system.
This reflects a broader transition: modern SoCs are becoming data-movement-bound rather than compute-bound. Delivering data with the required bandwidth, latency, and energy efficiency has become a central design challenge. In many cases, adding more compute resources does not translate into higher performance if those resources cannot be fed with data efficiently.
This shift has been observed across multiple generations of SoC design, particularly by companies focused on on-chip connectivity, such as Arteris, where evolving system requirements have increasingly exposed data movement as a limiting factor.
What Changed Inside Modern SoCs
Modern SoCs are highly heterogeneous systems, integrating CPUs, GPUs, NPUs, DSPs, and domain-specific accelerators. Each of these blocks has distinct compute characteristics and data access patterns, increasing coordination complexity across the chip.
This complexity is further amplified by the adoption of chiplet-based and multi-die architectures. Instead of a single monolithic die, functionality is often distributed across multiple dies within a package. While this approach improves scalability, yield, and design flexibility, it also introduces additional communication layers—both on-chip and inter-die—making data movement more complex and resource-intensive.
At the same time, memory is becoming more distributed and physically distant from compute. Data may traverse multiple interconnect layers, including on-chip networks, inter-die links, and off-chip memory interfaces, before reaching processing elements. Each step adds latency and energy overhead.
As a result, workloads increasingly stall waiting for data rather than executing instructions. This trend is particularly evident in AI systems, where performance is strongly influenced by memory access patterns, bandwidth availability, and data reuse efficiency [1]. In such systems, the ability to move data efficiently can matter as much as compute capability itself.
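The compute-bound versus data-movement-bound distinction can be made concrete with a roofline-style check: a kernel's attainable throughput is capped by either peak compute or by memory bandwidth times the kernel's arithmetic intensity (FLOPs performed per byte moved), whichever is lower. The sketch below uses hypothetical peak figures, not those of any specific chip:

```python
# Illustrative roofline-style check: is a kernel compute-bound or
# memory-bound? The peak figures below are hypothetical, not tied to
# any particular SoC.

PEAK_FLOPS = 100e12  # 100 TFLOP/s of compute (assumed)
PEAK_BW = 2e12       # 2 TB/s of memory bandwidth (assumed)

def attainable_flops(arithmetic_intensity):
    """Attainable throughput (FLOP/s) for a kernel that performs
    `arithmetic_intensity` FLOPs per byte moved from memory."""
    return min(PEAK_FLOPS, PEAK_BW * arithmetic_intensity)

# Ridge point: below this intensity, the kernel stalls on data
# delivery no matter how much compute is available.
ridge = PEAK_FLOPS / PEAK_BW  # 50 FLOPs/byte here

for ai in (1, 10, 50, 200):
    bound = "memory-bound" if ai < ridge else "compute-bound"
    print(f"intensity {ai:>3} FLOPs/byte -> "
          f"{attainable_flops(ai) / 1e12:.0f} TFLOP/s ({bound})")
```

With these assumed numbers, a kernel at 1 FLOP/byte reaches only 2 of the 100 available TFLOP/s; adding more compute would change nothing, which is the stall behavior described above.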
The Real Cost of Moving Data
The cost of moving data is now a dominant factor in system design, both in terms of energy and performance.
Accessing data from memory can consume significantly more energy than performing arithmetic operations on that data [2] [3]. As system complexity increases, this imbalance becomes more pronounced, making data movement a primary contributor to overall power consumption.
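A back-of-envelope calculation shows how quickly this imbalance dominates. The per-operation energies below are ballpark 45 nm figures often quoted from Horowitz's ISSCC talk [2]; exact values vary widely by process node and design:

```python
# Back-of-envelope energy split for a simple kernel, using ballpark
# per-operation energies in the spirit of Horowitz's ISSCC figures [2]
# (45 nm-era values; actual numbers vary by process and design).

ENERGY_PJ = {
    "fp32_add": 0.9,     # one 32-bit floating-point add
    "sram_read": 5.0,    # 32-bit read from a small local SRAM
    "dram_read": 640.0,  # 32-bit read from off-chip DRAM
}

def kernel_energy_pj(n_ops, dram_fraction):
    """Energy (pJ) for n_ops adds, each needing one 32-bit operand
    fetch; `dram_fraction` of fetches go to DRAM, the rest hit SRAM."""
    compute = n_ops * ENERGY_PJ["fp32_add"]
    movement = n_ops * (dram_fraction * ENERGY_PJ["dram_read"]
                        + (1 - dram_fraction) * ENERGY_PJ["sram_read"])
    return compute, movement

compute, movement = kernel_energy_pj(n_ops=1_000_000, dram_fraction=0.1)
print(f"compute: {compute / 1e6:.2f} uJ, movement: {movement / 1e6:.2f} uJ")
# Even with only 10% of fetches going to DRAM, data-movement energy
# exceeds compute energy by well over an order of magnitude.
```

Note that the conclusion is insensitive to the exact figures: as long as a DRAM access costs hundreds of times more than an add, movement energy dominates unless data reuse keeps almost all traffic on-chip.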
This leads to several interconnected system-level challenges:
Energy consumption: Data movement can account for a large share of total power usage, particularly in data-intensive workloads
Latency: Delays in data delivery reduce effective compute utilization and increase idle cycles
Throughput limitations: Bandwidth contention between multiple agents constrains overall system performance
Traditional interconnect approaches, such as shared buses or simple point-to-point links, do not scale effectively under these conditions. As the number of components grows, these approaches introduce contention, congestion, and inefficiencies that limit performance gains.
Industry discussions increasingly frame this as a “data movement problem,” where optimizing compute alone is insufficient. Instead, improving how data flows through the system becomes essential for achieving performance, efficiency, and scalability [4].
Network-on-Chip as System Infrastructure
To address these challenges, modern SoCs rely on network-on-chip (NoC) architectures. A NoC provides structured communication between components through routing, arbitration, and flow control mechanisms.
Rather than serving as simple wiring, the NoC functions as core system infrastructure—much like a biological circulatory system distributes resources within an organism. It enables data to move efficiently between distributed compute and memory elements while managing contention and prioritization.
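As a minimal illustration of the routing a NoC performs, the sketch below implements dimension-ordered (XY) routing on a 2D mesh: each packet first travels along the X dimension to the destination column, then along Y. The topology and routing policy are illustrative only, not a description of any particular product:

```python
# Minimal sketch of dimension-ordered (XY) routing on a 2D mesh NoC.
# Packets route fully in X, then in Y; this simple policy is popular
# because it is deadlock-free on a mesh. Illustrative only.

def xy_route(src, dst):
    """Return the list of (x, y) router coordinates a packet visits
    from src to dst under XY routing."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                # traverse X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                # then traverse Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

path = xy_route((0, 0), (3, 2))
print(path)  # [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
print(len(path) - 1, "hops")  # 5 hops: 3 in X, then 2 in Y
```

Each hop adds latency and link energy, which is why the placement of compute and memory relative to the NoC topology feeds directly into the system-level metrics a NoC governs.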
The design of the NoC directly affects:
End-to-end latency and quality of service
Achievable bandwidth and throughput
Power and energy efficiency of communication
Scalability as system complexity increases
Because of this, NoC design is increasingly treated as an early architectural decision rather than a late-stage integration task. Early consideration allows designers to align compute placement, memory hierarchy, and communication topology more effectively.
Companies specializing in interconnect IP, such as Arteris, have highlighted this shift across multiple SoC generations, emphasizing that NoC design is foundational to achieving performance, power, and scalability targets in complex systems [5].
Why This Matters Across Markets
The constraints imposed by data movement are not theoretical: across industries, they translate directly into measurable trade-offs in cost, performance, and efficiency:
AI systems: Performance is frequently limited by memory bandwidth and data reuse efficiency. Even highly optimized accelerators see reduced utilization when data cannot be delivered fast enough, leaving compute resources underused [6].
Automotive systems: In addition to latency and isolation requirements, systems must comply with functional safety standards such as ISO 26262, including Automotive Safety Integrity Levels (ASIL). These requirements impose strict constraints on communication predictability, determinism, and fault isolation, making reliable data movement critical [7].
Edge and mobile devices: Tight power and thermal budgets mean inefficient data movement directly impacts battery life, thermal limits, and system responsiveness.
Data centers: Energy efficiency and total cost of ownership are increasingly influenced by how effectively data is moved and managed, rather than by raw compute scaling alone.
Across these domains, inefficient communication infrastructure leads to higher costs, reduced throughput, and underutilized compute resources. Conversely, architectures that optimize data movement enable better system performance, improved energy efficiency, and more predictable behavior. This is where specialized interconnect solutions, such as those developed by Arteris, help teams manage complexity, optimize data flow, and meet performance and efficiency targets.
Takeaway: Data Movement Is Now an Architectural Concern
The transition from compute-centric to data-movement-centric design represents a fundamental shift in SoC architecture.
Data movement is no longer a secondary concern—it is a primary determinant of system performance, efficiency, and scalability. As compute resources continue to scale and diversify, the ability to deliver data efficiently has become just as important as the ability to process it.
This makes interconnect design, and particularly network-on-chip architecture, a strategic decision that must be addressed early in the design process. Choices made at this stage influence not only performance and power, but also system flexibility, scalability, and long-term design viability.
In this context, effective data movement becomes a key enabler and competitive differentiator for SoC teams, allowing them to better utilize compute resources, meet performance targets, and manage power and system complexity. Systems that can efficiently connect distributed compute and memory resources are better positioned to meet the demands of modern workloads across AI, automotive, edge, and data center applications.
As SoCs continue to scale in complexity, the ability to design for efficient data movement will play a central role in defining next-generation system architectures.
References
[1] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 6th ed. Morgan Kaufmann, 2019.
[2] M. Horowitz, "Computing's energy problem (and what we can do about it)," IEEE ISSCC, 2014.
[3] NVIDIA, "NVIDIA Hopper Architecture In-Depth," Mar. 22, 2022. [Online]. Available: https://resources.nvidia.com/en-us-hpc-ai/nvidia-hopper-architecture
[4] EE Times Asia, "The Data Dilemma: Cracking the Code of Data Movement for the Next Wave of Semiconductor Innovation."
[5] Arteris, "Data Movement in SoCs: NoC Performance, Power & Scalability."
[6] N. P. Jouppi et al., "In-Datacenter Performance Analysis of a Tensor Processing Unit," ISCA, 2017. https://doi.org/10.1145/3079856.3080246
[7] ISO 26262:2018, Road vehicles – Functional safety. International Organization for Standardization.