The Evolution of SoC Architectures: The AI-Introduced Paradigm Shift
- Daniel Ezekiel
- Mar 25
- 3 min read
Updated: Mar 27

Introduction
As David Patterson put it, we are entering a "Golden Age of Computer Architecture", one that is driving advancements across technology and business sectors powered by computational evolution. With Moore's Law slowing and evolving into concepts like chiplets, several critical inflection points are shaping the future of semiconductors:
1. New ISA: RISC-V
2. HW Accelerators: AI Inferencing & Multi-Modal Sensing
3. Leading Nodes: Entering the Angstrom Era
4. Logic and Memory Integration on the same advanced node (die)
5. Alternatives to the von Neumann architecture:
- Neuromorphic computing
- Analog Computing
- Wafer Scale Engines (Cerebras)
- At-Memory-Compute (Untether AI)
- CPU-GPU-TPU-XPU on a common ISA (Libre-SOC)
SoC Pillars
The rise of mobile phones transformed the SoC landscape, extending its reach from IoT devices on the lower end to laptops, tablets, and even robotics, drones, automotive and servers on the higher end. This shift enabled a unified SoC solution catering to diverse market segments.
Initially, mobile SoCs were defined by three key pillars:
Compute-Centric: General-purpose processing and application execution.
Visual-Centric: Graphics, imaging, display, and video processing.
Modem-Centric: Cellular and non-cellular connectivity (Wi-Fi, Bluetooth, RFID, etc.).
Over the past decade, a fourth pillar has emerged: Sensor-Centric AI Inferencing, integrating multimodal sensing and inferencing. These advancements are reshaping mobile platforms, demanding higher computational efficiency and real-time processing.
Breaking away from the von Neumann architecture and moving toward an architecture where memory and compute are co-located is the paradigm shift needed for the future. Equally important is moving the different elements of compute, viz. CPU, GPU, (X)PU, and IP accelerators, onto a common memory so data does not have to be copied between them.
While mobile SoCs evolved toward greater integration, PC-class chipsets continued catering to lower-end server SoCs. The demand for AI inferencing and distributed (hybrid) AI training and inferencing led to architectures requiring high data movement with minimal latency. This shift drove the need for heterogeneous compute models.
Comparing Traditional vs. Emerging SoC Architectures for AI
Since roughly 90% of processor energy consumption is attributed to moving data between the processor and memory, the paradigm shift rests on the following principles (a back-of-the-envelope sketch follows the list):
Collocate memory and compute to minimize data movement - either in-memory compute or at-memory compute
Ease data movement between the various elements of compute - a common memory shared across CPU/GPU/etc.
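To make the energy argument concrete, here is a minimal back-of-the-envelope sketch in Python. The per-operation energy figures are assumed, order-of-magnitude ballpark values (an off-chip DRAM access costing far more than an on-chip multiply-accumulate), not measurements for any specific device.

```python
# Back-of-the-envelope: why data movement dominates energy.
# Per-operation energy figures below are rough, illustrative ballpark
# values (order of magnitude), not vendor data.

PJ_PER_FP32_MAC  = 4.0      # ~pJ for a 32-bit multiply-accumulate
PJ_PER_SRAM_READ = 10.0     # ~pJ to read 32 bits from on-chip SRAM
PJ_PER_DRAM_READ = 640.0    # ~pJ to read 32 bits from off-chip DRAM

def layer_energy_pj(macs: int, dram_words: int, sram_words: int) -> dict:
    """Split a layer's energy into compute vs. data-movement components."""
    compute = macs * PJ_PER_FP32_MAC
    movement = dram_words * PJ_PER_DRAM_READ + sram_words * PJ_PER_SRAM_READ
    total = compute + movement
    return {
        "compute_pj": compute,
        "movement_pj": movement,
        "movement_share": movement / total,
    }

# Example: a small fully-connected layer, 1M MACs, weights streamed from DRAM.
print(layer_energy_pj(macs=1_000_000, dram_words=1_000_000, sram_words=2_000_000))
# Data movement dominates the total, which is exactly the motivation for
# at-memory and in-memory compute: keep operands next to the MAC units.
```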
Companies like Cerebras, Groq, IBM, and Untether AI are pioneering architectures that rethink traditional memory and compute paradigms to overcome existing bottlenecks.
Traditional AI Accelerator Architecture (NVIDIA, AMD, Intel, TPU)
Most AI workloads today rely on GPUs (e.g., NVIDIA A100/H100, AMD MI300X) or custom AI accelerators (Google TPU, AWS Inferentia, Intel Gaudi). These architectures optimize matrix multiplication for AI training and inference but are limited by memory bandwidth constraints.
Key Characteristics:
Memory Hierarchy: L1/L2/L3 cache, HBM2e/HBM3, DDR5
Interconnect: PCIe 5.0, NVLink, UCIe
Processing Units: thousands of GPU cores, tensor cores, and AI accelerators
Bottleneck: Expensive HBM and slow off-chip data movement
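To see where the HBM bottleneck bites, a small roofline-style check is sketched below. The peak-throughput and bandwidth figures are assumed round numbers for a generic HBM-class accelerator, not the published specs of any particular GPU or TPU part.

```python
# Simple roofline check: is a matmul compute-bound or HBM-bandwidth-bound?
# Peak numbers below are assumed, round figures for an HBM-class accelerator.

PEAK_TFLOPS = 300.0          # assumed peak low-precision throughput (TFLOP/s)
HBM_GB_S    = 3000.0         # assumed HBM bandwidth (GB/s)

def roofline(m: int, n: int, k: int, bytes_per_elem: int = 2) -> str:
    flops = 2 * m * n * k                                   # multiply-adds
    traffic = bytes_per_elem * (m * k + k * n + m * n)      # A, B, C read/written once
    intensity = flops / traffic                             # FLOPs per byte moved
    ridge = (PEAK_TFLOPS * 1e12) / (HBM_GB_S * 1e9)         # FLOPs/byte at the ridge point
    bound = "compute-bound" if intensity >= ridge else "HBM-bandwidth-bound"
    return f"intensity={intensity:.1f} FLOP/B, ridge={ridge:.1f} -> {bound}"

# Large square GEMM (training-style): high reuse, compute-bound.
print(roofline(8192, 8192, 8192))
# Skinny GEMM typical of small-batch inference: little reuse, bandwidth-bound.
print(roofline(8, 8192, 8192))
```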
Groq’s Tensor Streaming Processor (TSP) Architecture
Groq introduces a dataflow-based AI inference model, eliminating traditional memory hierarchies. Instead of frequent data fetches from external memory (HBM), Groq’s Tensor Streaming Processor (TSP) executes AI workloads with near-deterministic latency.
Key Characteristics:
Eliminates Memory Bottlenecks: Data moves through compute elements in a structured pipeline.
Lower Power Consumption: Reduces energy spent on memory accesses.
Highly Deterministic Latency: Ideal for real-time AI applications like robotics and industrial automation.
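A toy sketch of the streaming idea follows (illustrative only; it does not model the actual TSP): weights are placed next to each pipeline stage once, and activations flow through a fixed chain of stages instead of each layer re-fetching operands from external memory. The fixed schedule is what makes end-to-end latency predictable.

```python
# Toy dataflow (streaming) execution model. All names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Weights are "pre-placed" next to each stage once, at compile/load time.
STAGE_WEIGHTS = [rng.standard_normal((64, 64)).astype(np.float32) for _ in range(4)]

def stage(w, upstream):
    """One pipeline stage: consume a stream of tiles, emit transformed tiles."""
    for tile in upstream:
        yield np.maximum(tile @ w, 0.0)   # matmul + ReLU; weights already local

def input_stream(n_tiles):
    for _ in range(n_tiles):
        yield rng.standard_normal((8, 64)).astype(np.float32)

# Build the pipeline once; every tile takes the same, statically known path,
# which is what makes latency deterministic in a dataflow machine.
pipeline = input_stream(16)
for w in STAGE_WEIGHTS:
    pipeline = stage(w, pipeline)

outputs = list(pipeline)
print(len(outputs), outputs[0].shape)     # 16 tiles of shape (8, 64)
```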
Libre-SOC (OpenPOWER): Unified CPU-GPU ISA
Libre-SOC aims to eliminate CPU-GPU memory transfer bottlenecks by adopting a common memory and instruction set architecture (ISA). This allows:
Transparent parallelization handled by GPU units.
Efficient sequential execution managed by the CPU.
By integrating heterogeneous compute elements under a unified ISA and a common memory, Libre-SOC tackles the challenge of scaling AI workloads efficiently, optimizing power by reducing data movement between compute elements and lowering overall latency.
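The sketch below (hypothetical; not Libre-SOC code) counts only bytes copied, to show what a common memory removes: in the discrete model every offload pays a host-to-device and a device-to-host copy over an interconnect, whereas with a shared address space the CPU and parallel units dereference the same buffers.

```python
# Illustrative byte-counting model of discrete vs. unified memory.

def discrete_memory_bytes(tensor_bytes: int, n_offloads: int) -> int:
    """Classic model: each offload copies inputs to device memory and
    results back over a PCIe-style interconnect."""
    return n_offloads * 2 * tensor_bytes      # host->device + device->host

def unified_memory_bytes(tensor_bytes: int, n_offloads: int) -> int:
    """Unified ISA / common memory: CPU and parallel units share the same
    physical buffers, so no per-offload copies are needed."""
    return 0

size = 512 * 1024 * 1024        # a 512 MiB activation/weight buffer
print("copied (discrete):", discrete_memory_bytes(size, n_offloads=10) / 2**30, "GiB")
print("copied (unified): ", unified_memory_bytes(size, n_offloads=10) / 2**30, "GiB")
```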
Future Trends in AI Hardware
The AI landscape is rapidly shifting from traditional GPU/CPU-based acceleration to innovative architectures like dataflow processing (Groq), unified ISA & memory models (Libre-SOC), and At-memory computing (Untether AI). These approaches promise lower latency, higher efficiency, and scalable AI inferencing, reshaping the next generation of semiconductor designs.
Key Developments On The Horizon:
Hybrid AI Chips: A combination of GPUs, CPUs, XPUs, dataflow accelerators, and near-memory computing to balance training and inference.
Memory-Centric Computing: Moving beyond traditional von Neumann architectures to enable compute-in-memory solutions.
AI-Optimized Interconnects: New interfaces like NVLink, UCIe, Optical interconnects, and advanced chiplet designs to reduce bottlenecks.
As AI workloads continue to evolve, the industry will see a transition from traditional power-hungry models to more efficient, specialized hardware tailored for real-time inference and large-scale AI applications. The rise of diverse processor architectures is clearly enabled by the advent of AI workloads.
PS: On a similar thread, refer to my earlier article on Chiplet Evolution and Neuromorphic Computing.