Google Ironwood TPU Powers Advanced AI Inference

High-Throughput Inference with Ironwood: A Technical Overview

Google has unveiled Ironwood, its seventh-generation Tensor Processing Unit (TPU) and the latest demonstration of its strength in artificial intelligence hardware. Rather than an incremental update, the chip is a purpose-built design meant to meet the demands of Google’s most capable Gemini models. Ironwood is specialized for simulated reasoning workloads, which Google calls “thinking,” and the company presents it as the start of a new era in AI.

Ironwood’s Design and Purpose

Ironwood’s capabilities come from major improvements in both performance and architecture. Compared with earlier TPU generations, it delivers much higher throughput and is designed to run in large liquid-cooled clusters. A cluster can contain up to 9,216 chips, connected through an improved Inter-Chip Interconnect (ICI) that allows fast, efficient communication between them. The architecture scales, so both Google’s internal research teams and external Google Cloud developers can use configurations ranging from 256 chips to a full 9,216-chip cluster.

Google’s Vision for AI

Google expects Ironwood’s improved speed and the power efficiency of its memory architecture to enable substantial advances across its AI ecosystem. By giving advanced models a powerful computational base, the chip is meant to support progress in natural language processing, machine learning more broadly, and agentic AI. Agentic systems are intended to act proactively: they gather data on their own, reason about it, and carry out tasks for users with minimal explicit instruction. Google positions Ironwood as central to its effort to advance AI technology.

The Driving Force Behind Ironwood

Ironwood reflects Google’s belief that advanced AI models need dedicated infrastructure to reach their full potential. The chip is more than a faster processor: it is the foundation of a strategy to raise inference speeds and widen model context windows, both of which agentic AI depends on. Google calls this shift the “age of inference,” in which AI systems act proactively on behalf of users.

Ironwood’s Technical Specifications

Ironwood’s core specifications show how much compute it offers. A fully configured Ironwood pod reaches 42.5 exaflops of inference performance. Each chip has a peak throughput of 4,614 TFLOPS, a major advance over earlier TPU generations. This compute is backed by a substantially improved memory architecture. Each chip includes 192 GB of high-bandwidth memory (HBM), six times the capacity of the Trillium TPU. Memory bandwidth reaches 7.2 TB/s per chip, 4.5 times that of Trillium.
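The pod-level figure can be cross-checked against the per-chip numbers with simple arithmetic. The sketch below is a back-of-the-envelope calculation rather than anything Google has published: the constant names are illustrative, decimal (SI) prefixes are assumed, and the Trillium capacity is only what the quoted six-fold ratio implies.

```python
# Back-of-the-envelope check of the pod-level figures quoted in the text.
# All inputs are the article's numbers; decimal (SI) prefixes are an assumption.

CHIPS_PER_POD = 9_216          # full pod size quoted in the article
PEAK_TFLOPS_PER_CHIP = 4_614   # peak per-chip throughput
HBM_GB_PER_CHIP = 192          # high-bandwidth memory per chip
HBM_RATIO_VS_TRILLIUM = 6      # the "six times" capacity claim

# 1 exaflop = 1,000,000 teraflops
pod_exaflops = CHIPS_PER_POD * PEAK_TFLOPS_PER_CHIP / 1_000_000

# 1 petabyte = 1,000,000 gigabytes
pod_hbm_pb = CHIPS_PER_POD * HBM_GB_PER_CHIP / 1_000_000

# Trillium capacity implied by the ratio (not stated directly in the article)
implied_trillium_hbm_gb = HBM_GB_PER_CHIP / HBM_RATIO_VS_TRILLIUM

print(f"Pod peak compute: {pod_exaflops:.2f} exaflops")
print(f"Pod HBM capacity: {pod_hbm_pb:.2f} PB")
print(f"Implied Trillium HBM per chip: {implied_trillium_hbm_gb:.0f} GB")
```

Multiplying 9,216 chips by 4,614 TFLOPS gives roughly 42.52 exaflops, which matches the quoted 42.5 exaflops pod figure. The same multiplication puts total pod memory at about 1.77 PB.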

Benchmarking Ironwood

Google’s published benchmark figures for Ironwood use FP8 precision as the main yardstick. The company claims that an Ironwood pod is 24 times faster than comparable segments of top supercomputers, but the claim needs careful interpretation: Google acknowledges that not all supercomputing systems support FP8 natively, which skews the comparison. The figures also omit a direct performance comparison with Google’s TPU v6 (Trillium). Google does claim that Ironwood delivers twice the performance per watt of Trillium, an improvement in energy efficiency. A Google representative said that Ironwood is the successor to TPU v5p, while Trillium succeeded TPU v5e. For reference, Trillium’s peak FP8 throughput is about 918 TFLOPS.
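Although no direct Ironwood-versus-Trillium benchmark was published, the two peak figures quoted in this article allow a rough per-chip ratio. The sketch below assumes that the 4,614 TFLOPS figure for Ironwood is also an FP8 peak. It compares theoretical peaks only, so it should not be read as a measured speedup. The function name `peak_ratio` is illustrative.

```python
# Rough per-chip ratio from the peak figures quoted in the article.
# Assumption: both numbers are FP8 peaks; this is not a measured benchmark.

IRONWOOD_PEAK_TFLOPS = 4_614      # per chip, from the specifications section
TRILLIUM_PEAK_FP8_TFLOPS = 918    # approximate figure quoted in the article


def peak_ratio(new_tflops: float, old_tflops: float) -> float:
    """Return how many times larger the new peak is than the old one."""
    if old_tflops <= 0:
        raise ValueError("old_tflops must be positive")
    return new_tflops / old_tflops


ratio = peak_ratio(IRONWOOD_PEAK_TFLOPS, TRILLIUM_PEAK_FP8_TFLOPS)
print(f"Ironwood peak is about {ratio:.1f}x Trillium's per chip")
```

On these numbers the ratio is roughly 5x per chip. Real workloads will differ, because sustained throughput depends on memory bandwidth, interconnect, and software as well as peak compute.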