Google Unveils Ironwood TPU with Record 1.77PB Shared Memory

Key Points
- Ironwood is Google’s seventh‑generation TPU, featuring a dual‑die architecture.
- Each chip delivers 4,614 TFLOPs of FP8 performance and 192 GB of HBM3e memory.
- A full pod of 9,216 chips provides a record 1.77 PB of directly addressable shared memory.
- The system achieves 42.5 exaflops of performance while improving power efficiency twofold over the previous generation.
- Advanced RAS features include on‑chip root of trust, self‑test functions, and silent‑data‑corruption mitigation.
- Liquid‑cooling with cold‑plate technology supports the high‑density design.
- AI‑assisted design optimized the chip’s ALU and floor plan, and a fourth‑gen SparseCore accelerates recommendation workloads.
- Ironwood is already being rolled out in Google Cloud data centers for large‑scale inference.
Google introduced its seventh‑generation Tensor Processing Unit, dubbed Ironwood, at a recent Hot Chips event. The dual‑die chip delivers 4,614 TFLOPs of FP8 performance and pairs each die with eight stacks of HBM3e, providing 192 GB of memory per chip. When scaled to a 9,216‑chip pod, the system reaches 1.77 PB of directly addressable memory—the largest shared‑memory configuration ever recorded for a supercomputer. The architecture includes advanced reliability features, liquid‑cooling infrastructure, and AI‑assisted design optimizations, and is already being deployed in Google Cloud data centers for large‑scale inference workloads.
Google revealed its latest Tensor Processing Unit, named Ironwood, as the company’s first TPU built primarily for massive inference workloads rather than training. The chip integrates two compute dies, each delivering 4,614 TFLOPs of FP8 performance. Eight stacks of HBM3e memory provide 192 GB per chip, delivering 7.3 TB/s bandwidth. The dual‑die design enables the system to scale without glue logic, supporting up to 9,216 chips per pod.
Record‑Setting Shared Memory
When fully assembled, the Ironwood pod offers 1.77 PB of directly addressable HBM memory, setting a new world record for shared‑memory supercomputers. The massive memory pool is linked via optical circuit switches that connect the racks, allowing the system to maintain high bandwidth while scaling.
Performance and Efficiency
Across the full pod, the configuration reaches 42.5 exaflops of performance. Google claims a two‑fold improvement in performance per watt compared with its previous generation, Trillium, thanks to dynamic voltage‑frequency scaling and a cold‑plate liquid‑cooling solution that leverages the company’s third‑generation cooling infrastructure.
Reliability, Availability, and Serviceability (RAS)
Ironwood incorporates several on‑chip reliability features, including a root of trust, built‑in self‑test functions, and mechanisms to mitigate silent data corruption. Logic‑repair functions improve manufacturing yield, and the system can reconfigure around failed nodes, restoring workloads from checkpoints.
AI‑Assisted Design and SparseCore
Google used AI techniques to optimize the ALU circuits and floor plan of the Ironwood chip. A fourth‑generation SparseCore is added to accelerate embeddings and collective operations, targeting workloads such as recommendation engines.
Deployment and Availability
Google has begun deploying Ironwood in its hyperscale cloud data centers, though the TPU remains an internal platform not directly offered to external customers. The design reflects Google’s long‑term strategy to build high‑end AI compute across chip, interconnect, and physical infrastructure layers.