
URL: https://tech-insider.org/google-tpu-8t-8i-broadcom-mediatek-nvidia-2026/

Google TPU 8t and 8i: 121 Exaflops, $21B Nvidia Challenge


April 23, 2026

Google split its future silicon in two this week. At Cloud Next ’26 on April 22, 2026, the company unveiled its eighth-generation Tensor Processing Units – the TPU 8t for training and the TPU 8i for inference – ending a decade of general-purpose AI accelerators in favor of a bifurcated architecture designed specifically for the agentic era. The announcement came just six months after Ironwood, the seventh-generation TPU, reached general availability, and it arrives as Anthropic pours up to 1 million chips into Claude training and Broadcom’s Google-linked AI revenue races toward a projected $42 billion in 2027.

The TPU 8t superpod packs 9,600 liquid-cooled chips to deliver 121 FP4 exaflops of peak compute – roughly a 3x leap over Ironwood’s 42.5 exaflops – while the TPU 8i targets the latency-sensitive inference workloads that now account for more than 70% of AI accelerator cycles. Each TPU 8i carries 288 GB of HBM, 8.6 TB/s of memory bandwidth, 384 MB of on-chip SRAM, and 10.1 petaflops of FP4 compute. Google says the chip delivers 80% better inference performance per dollar than Ironwood, a metric that matters more with every Gemini query, YouTube recommendation, and Anthropic Claude request.

The rollout is also a supply-chain story. Broadcom designs TPU 8t (codename Sunfish), MediaTek designs TPU 8i (codename Zebrafish), and TSMC will fabricate both on its 2nm process, with late-2027 availability targeted for customers outside Google. The structural shift redistributes billions of dollars in silicon contracts and positions Google’s custom-chip empire as the most credible counterweight to Nvidia’s data center GPU monopoly – which still controls more than 80% of the $400 billion AI accelerator market in 2026.

Inside the TPU 8t: 121 Exaflops and the 9,600-Chip Superpod

The TPU 8t is Google’s most aggressive bet on training scale. Each superpod aggregates 9,600 chips through a 3D torus interconnect with 19.2 Tbps of bidirectional scale-up bandwidth and 400 Gbps of scale-out networking, producing 121 FP4 exaflops of peak performance and 2 petabytes of shared unified high-bandwidth memory. That is 2.85x the compute of an Ironwood superpod at 42.5 exaflops, a jump Google attributes to a new networking fabric that supports near-linear scaling to one million chips in a single logical cluster.
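The superpod figure is chip count multiplied by per-chip peak. Here is a quick Python check. It assumes the ~12.6 PFLOPS per-chip FP4 estimate from the comparison table (an analyst estimate, not a Google-published spec) and uses Google's 42.5-exaflop Ironwood figure:

```python
# Peak pod compute = chips x per-chip peak.
# The 12.6 PFLOPS per-chip figure for TPU 8t is an estimate, not a published spec.
PFLOPS_PER_EFLOPS = 1_000


def pod_exaflops(chips: int, pflops_per_chip: float) -> float:
    """Aggregate peak compute of a pod, in exaflops."""
    return chips * pflops_per_chip / PFLOPS_PER_EFLOPS


tpu8t = pod_exaflops(9_600, 12.6)  # ~120.96 EFLOPS, quoted as 121
ironwood_eflops = 42.5             # Google's published Ironwood superpod figure

print(f"TPU 8t superpod: {tpu8t:.1f} EFLOPS")
print(f"Gain over Ironwood: {tpu8t / ironwood_eflops:.2f}x")
```

Peak numbers ignore utilization, so sustained throughput on real training jobs will be lower for both generations.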

Google’s announcement stresses price-performance rather than raw speed: TPU 8t offers up to 2.8x better training price-performance than Ironwood, a KPI that matters more than peak flops once hyperscaler customers move from research clusters to amortized production training. Storage access is also 10x faster, an under-discussed feature that cuts idle accelerator time during checkpointing of trillion-parameter models – the dominant efficiency killer in LLM training runs according to internal MLPerf data shared by Google engineers.

Memory bandwidth per chip is roughly 30% higher than Ironwood’s 7.37 TB/s, and Google has doubled the inter-chip bandwidth of the prior generation. Broadcom is the design partner – continuing the relationship that began with TPU v5 in 2023 – and sources at Moor Insights & Strategy estimate Broadcom’s ASIC content per TPU 8t exceeds $4,000, meaning a single 9,600-chip superpod carries more than $38 million in Broadcom silicon alone. The broader Mizuho estimate, cited in Google’s own Cloud Next briefing, projects Broadcom’s Google-and-Anthropic AI revenue at $21 billion in 2026 and $42 billion in 2027.
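Both derived numbers in the paragraph above come from simple multiplication. The sketch takes the 30% uplift and the $4,000 per-chip content estimate at face value. The article describes the $4,000 as a floor ("exceeds"), so the superpod total is a lower bound:

```python
# Implied TPU 8t HBM bandwidth from "roughly 30% higher" than Ironwood.
IRONWOOD_HBM_TBPS = 7.37
UPLIFT = 0.30

tpu8t_hbm_tbps = IRONWOOD_HBM_TBPS * (1 + UPLIFT)
print(f"Implied TPU 8t HBM bandwidth: {tpu8t_hbm_tbps:.2f} TB/s")

# Broadcom silicon content per superpod (per-chip figure is an analyst estimate).
CONTENT_PER_CHIP_USD = 4_000
CHIPS_PER_SUPERPOD = 9_600
superpod_content_usd = CONTENT_PER_CHIP_USD * CHIPS_PER_SUPERPOD
print(f"Broadcom content per superpod: ${superpod_content_usd / 1e6:.1f}M")
```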

TPU 8i: The Inference Chip Built for the Agentic Era

The TPU 8i (codename Zebrafish) is the more technically distinctive of the two chips. Rather than chase training flops, it is engineered around latency – the metric that decides whether an AI agent feels real-time or sluggish. Each TPU 8i delivers 10.1 FP4 petaflops of compute, fed by 384 MB of on-chip SRAM (triple the prior generation), 288 GB of HBM, and 8.6 TB/s of memory bandwidth. A pod aggregates 1,152 chips for 11.6 FP8 exaflops of compute, a 9.6x increase over Ironwood’s 1.2 FP8 exaflops on a 256-chip pod.
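The balance between 10.1 petaflops of compute and 8.6 TB/s of HBM bandwidth can be expressed as a roofline ridge point. That is the arithmetic intensity a kernel needs before compute, rather than memory, becomes the limit. The sketch pairs it with a hypothetical batch-1 decode of a 70B-parameter model with 4-bit weights. It ignores KV-cache traffic, batching, and any weights held in on-chip SRAM:

```python
# Roofline ridge point for one TPU 8i chip, using the article's peak figures.
PEAK_FLOPS = 10.1e15      # FP4 FLOP/s
HBM_BYTES_PER_S = 8.6e12  # HBM bandwidth in bytes/s

# FLOPs a kernel must perform per byte moved from HBM to be compute-bound.
ridge = PEAK_FLOPS / HBM_BYTES_PER_S
print(f"Ridge point: {ridge:.0f} FLOP/byte")

# Hypothetical workload: batch-1 decode of a 70B-parameter model in FP4.
params = 70e9
bytes_per_param = 0.5                        # 4-bit weights
flops_per_token = 2 * params                 # one multiply-add per weight
bytes_per_token = params * bytes_per_param   # every weight streamed once

intensity = flops_per_token / bytes_per_token
min_ms_per_token = bytes_per_token / HBM_BYTES_PER_S * 1e3
print(f"Decode intensity: {intensity:.0f} FLOP/byte "
      f"(memory-bound: {intensity < ridge})")
print(f"Bandwidth lower bound per token: {min_ms_per_token:.2f} ms")
```

A decode stream sitting hundreds of times below the ridge point is memory-bound. That gap is why on-chip SRAM capacity and batching matter so much for inference silicon.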

The real breakthrough is topology. Google introduced a new Boardfly interconnect design that doubles inter-chip bandwidth to 19.2 Tbps while cutting the ICI (inter-chip interconnect) network diameter by more than 50%. A dedicated Collectives Acceleration Engine reduces on-chip collective latency by up to 5x, a feature engineered specifically for agentic workloads where thousands of small tool-use calls dominate latency budgets. The net result: 80% better inference performance per dollar versus Ironwood.
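Network diameter is the worst-case hop count between any two chips. Google has not published the Boardfly topology, so the sketch below only illustrates the kind of baseline it improves on: a wraparound torus, where the diameter is the sum of half of each dimension. The 8 × 12 × 12 arrangement of 1,152 chips is a hypothetical layout chosen for illustration. A breadth-first search over the same grid confirms the closed-form result:

```python
from collections import deque


def torus_diameter_formula(dims):
    """Worst-case hops in a wraparound torus: sum of floor(k/2) per dimension."""
    return sum(k // 2 for k in dims)


def torus_diameter_bfs(dims):
    """Same quantity via breadth-first search from the origin.

    A torus is vertex-transitive, so the eccentricity of any single node
    equals the diameter of the whole network.
    """
    origin = tuple(0 for _ in dims)
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for axis, size in enumerate(dims):
            for step in (-1, 1):
                nxt = list(node)
                nxt[axis] = (nxt[axis] + step) % size  # wraparound link
                nxt = tuple(nxt)
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
    return max(dist.values())


layout = (8, 12, 12)  # hypothetical 3D arrangement of 1,152 chips
print(8 * 12 * 12, torus_diameter_formula(layout), torus_diameter_bfs(layout))
```

If a prior-generation layout resembled this torus, a diameter cut of more than 50% would mean a worst case of fewer than 8 hops. Boardfly's actual structure may differ.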

MediaTek is the surprise design partner. The Taiwanese firm, best known for smartphone SoCs, has spent two years building a data-center silicon team and now owns the physical implementation of TPU 8i. The deal makes MediaTek the second-largest beneficiary of Google’s custom-silicon spend and signals that Broadcom’s near-monopoly on hyperscaler AI ASICs has ended. Hyperframe Research called the dual-partner model “the end of general-purpose silicon” in a research note published the same day as the announcement.

TPU Specification Comparison: Ironwood, 8t, 8i, and Nvidia Blackwell

Specification | TPU v7 Ironwood | TPU 8t (Training) | TPU 8i (Inference) | Nvidia B200 (reference)
--- | --- | --- | --- | ---
Announced | April 2025 (GA Nov 2025) | April 22, 2026 | April 22, 2026 | March 2024
Design Partner | Broadcom | Broadcom (Sunfish) | MediaTek (Zebrafish) | Nvidia / TSMC
Fab Process | TSMC 3nm | TSMC 2nm | TSMC 2nm | TSMC 4NP
FP4 per Chip | ~4.6 PFLOPS | ~12.6 PFLOPS | 10.1 PFLOPS | ~20 PFLOPS
Pod Compute | 42.5 FP4 EFLOPS | 121 FP4 EFLOPS | 11.6 FP8 EFLOPS | ~1.44 FP4 EFLOPS (GB200 NVL72)
Chips per Pod | 9,216 | 9,600 | 1,152 | 72 (NVL72)
HBM per Chip | 192 GB HBM3E | ~250 GB (est.) | 288 GB | 192 GB HBM3E
HBM Bandwidth | 7.37 TB/s | ~9.5 TB/s | 8.6 TB/s | 8 TB/s
On-chip SRAM | 128 MB | 256 MB (est.) | 384 MB | ~256 MB
Interconnect (per chip) | 9.6 Tbps | 19.2 Tbps | 19.2 Tbps | 1.8 TB/s NVLink (14.4 Tbps)
Customer Availability | GA now | GA targeted late 2027 | GA targeted late 2027 | Shipping

The table exposes Google’s strategic choice. Nvidia’s B200 still leads TPU 8t on raw per-chip FP4 throughput by roughly 1.6x (about 20 versus 12.6 petaflops). But Google wins on pod-scale aggregation thanks to its proprietary ICI interconnect – a fabric that scales to 9,600 chips without the NVLink switching overhead that caps Nvidia NVL72 systems at 72 chips before dropping to slower InfiniBand. For trillion-parameter training runs, that scale advantage compresses wall-clock training time by an estimated 35-45% relative to equivalently priced Nvidia systems.

Anthropic’s 1 Million-TPU Commitment and the Compute Pipeline

Anthropic is the anchor customer that validates Google’s silicon roadmap. In October 2025, the Claude maker committed to use up to 1 million Ironwood TPUs, with access to more than 1 gigawatt of compute in 2026 and 3.5 gigawatts by 2027. The deal is structured as a multi-year Google Cloud commitment reportedly valued in the tens of billions of dollars. It makes Anthropic the single largest external customer of Google’s custom silicon and the only external tenant with guaranteed access to TPU 8t when it ships.

That commitment is a lifeline for Google Cloud, which still trails AWS’s 31% market share and Azure’s 24% share in the $200-billion-plus infrastructure-as-a-service market. Google Cloud’s share hovered around 12% in Q1 2026, but its TPU-driven AI workload growth has outpaced rivals: AI compute revenue at Google Cloud grew 71% year-over-year in the quarter, and TPU utilization has run above 90% at every major data center region since January. Google CEO Sundar Pichai told investors on the most recent earnings call: “We are seeing substantial demand for our AI infrastructure products, including TPU-based and GPU-based solutions. It’s been one of the key drivers of our growth over the past year, and we continue to see very strong demand ahead.”

Anthropic is not alone. Meta has a standing TPU rental arrangement with Google Cloud, an unusual concession from Mark Zuckerberg’s company given Meta’s own $35 billion MTIA deal with Broadcom. Salesforce, Midjourney, and Replit also operate production workloads on Ironwood. Google declined to disclose new customer commitments for TPU 8t or 8i, but Cloud CEO Thomas Kurian told reporters on April 20 that the dual-chip design “is a natural evolution – we felt that power efficiency would become a constraint as people continue to scale both training and inference,” a hint that several existing hyperscale customers have already pre-committed capacity.

The Four-Partner Supply Chain Reshaping AI Silicon

Google’s decision to split design work across Broadcom and MediaTek is part of a broader four-partner silicon strategy disclosed at Cloud Next. Beyond the two TPU design houses, Google has added Marvell as a networking ASIC partner and Intel Foundry Services as a secondary fabrication option for next-generation chips starting in 2028. Each partner owns a specific layer of the stack, a deliberate diversification away from the single-supplier model that left Nvidia with pricing clout across the industry.

Broadcom remains the biggest winner. Mizuho analysts estimate Broadcom’s AI custom-silicon revenue from Google and Anthropic hits $21 billion in 2026, more than doubling to $42 billion in 2027 as TPU 8t ramps. The firm’s total AI revenue guidance now implies that custom ASICs will surpass merchant networking chips as Broadcom’s largest growth engine by 2028. Shares of Broadcom climbed 4.2% in after-hours trading on April 22, 2026, after the TPU 8t announcement confirmed its role as the training-chip design partner.

MediaTek’s entry into hyperscale AI silicon is arguably the bigger disruption. The company’s data-center business was a rounding error in 2024 but is projected to reach $3.2 billion in 2027 revenue, according to analyst estimates cited by DataCenterDynamics. MediaTek shares jumped 8.7% in Taipei trading the morning after the announcement, the largest single-day move in two years. Marvell – which also designs Amazon’s Trainium2 – extends its hyperscaler ASIC footprint across every major cloud except Microsoft Azure, which still relies on its internal Maia team and AMD Instinct MI400 deployments.

Performance vs Nvidia Blackwell: The Inference Showdown

Nvidia’s Blackwell line (B200 and GB300), along with the Vera Rubin R100 generation that follows it, remains the performance benchmark Google measures itself against. On raw FP4 per-chip compute, B200 still wins: roughly 20 petaflops of FP4 versus 10.1 petaflops for TPU 8i and an estimated 12.6 petaflops for TPU 8t. The GB200 NVL72 rack delivers approximately 1.44 exaflops of FP4 compute across 72 chips, a configuration that has dominated generative AI training since mid-2024.

But Google’s answer is scale. A 9,600-chip TPU 8t superpod hits 121 FP4 exaflops – roughly 84x the FP4 compute of a single NVL72 rack – and does it inside a single coherent memory domain. Nvidia customers who need that scale must stitch together multiple NVL72 racks through slower InfiniBand or Spectrum-X networking, typically losing 20-30% of theoretical throughput to synchronization overhead. For the largest foundation model training runs – think GPT-6, Gemini 4, and Claude 5 scale – Google’s architectural choice meaningfully changes the economics.
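The 84x figure and the 20-30% synchronization loss can be combined into a rough rack count. This is a simplified model. It assumes the TPU superpod runs at its full 121-exaflop peak while each stitched NVL72 rack loses a flat fraction of its 1.44-exaflop peak. Neither assumption holds exactly in practice:

```python
import math

TPU8T_SUPERPOD_EFLOPS = 121.0
NVL72_EFLOPS = 1.44  # GB200 NVL72 peak FP4, per the article

print(f"Raw ratio: {TPU8T_SUPERPOD_EFLOPS / NVL72_EFLOPS:.0f}x")


def racks_needed(target_eflops: float, rack_eflops: float, sync_loss: float) -> int:
    """Racks required to match a target once a fraction of throughput is lost."""
    effective = rack_eflops * (1 - sync_loss)
    return math.ceil(target_eflops / effective)


for loss in (0.20, 0.30):
    n = racks_needed(TPU8T_SUPERPOD_EFLOPS, NVL72_EFLOPS, loss)
    print(f"{loss:.0%} overhead -> {n} NVL72 racks")
```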

Pierre Ferragu of New Street Research summarized the competitive picture for clients on April 22: “Google’s TPU 8t changes the trillion-parameter training cost curve. It does not end Nvidia’s dominance – Jensen Huang still owns the merchant market and the CUDA moat is very real – but it gives hyperscalers a credible alternative for workloads where vertical integration matters more than software ecosystem portability.” Stacy Rasgon at Bernstein echoed the view, telling clients that Nvidia’s data center revenue growth rate would “decelerate meaningfully” by 2027 as Google, AWS, Microsoft, and Meta all scale custom silicon.

Ironwood General Availability: The Product Shipping Today

While TPU 8t and 8i headline the announcement, the production chip shipping today is Ironwood – the seventh-generation TPU that reached general availability on November 6, 2025. Ironwood delivers 10x the peak performance of TPU v5p, packs 192 GB of HBM3E per chip, and scales to 9,216 chips per superpod for 42.5 FP4 exaflops and 1.77 petabytes of shared HBM. Google projects 4.3 million TPU shipments in 2026, scaling to 35 million by 2028. That is roughly an eightfold increase in two years, and it requires TSMC to allocate a meaningful share of its 2nm and 3nm capacity to Google.

Ironwood powers every major Google service in production today: Gemini 3.5, Search AI Overviews, YouTube’s recommendation stack, Gmail’s smart features, and Google Photos’ cloud-side models. Externally, Ironwood is the compute foundation for Anthropic’s Claude Opus 4.6, Claude Sonnet 4.6, and the upcoming Claude Opus 5, as well as several Salesforce Einstein workloads. Google says TPU utilization exceeded 91% network-wide in March 2026. That figure would be commercially implausible if the chip did not deliver on its performance-per-dollar claims.

The price-performance gap versus Nvidia is central to Ironwood’s adoption. Independent benchmarks published by SemiAnalysis in February 2026 put Ironwood’s total cost of ownership at $0.18 per million tokens for Gemini 3.5 inference, versus $0.31 per million tokens on comparable B200 configurations – a 42% TCO advantage that compounds at hyperscaler volume. Google has not disclosed list pricing for TPU 8t or 8i, but analyst estimates suggest an additional 25-35% TCO improvement when the eighth generation ships.
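The 42% figure is the per-token cost reduction implied by the SemiAnalysis numbers. The sketch also applies the analysts' 25-35% eighth-generation range to Ironwood's cost. It then prices a hypothetical volume of one trillion tokens a day, a number chosen only for illustration:

```python
# Cost per million tokens, from the SemiAnalysis figures quoted above.
IRONWOOD_USD = 0.18
B200_USD = 0.31

advantage = 1 - IRONWOOD_USD / B200_USD
print(f"Ironwood TCO advantage: {advantage:.1%}")

# Analyst range for a further 8th-generation improvement (not a Google figure).
for gain in (0.25, 0.35):
    print(f"{gain:.0%} better -> ${IRONWOOD_USD * (1 - gain):.3f} per million tokens")

# Hypothetical volume: 1 trillion tokens per day, expressed in millions.
daily_millions = 1e12 / 1e6
daily_saving = (B200_USD - IRONWOOD_USD) * daily_millions
print(f"Daily saving vs B200 at that volume: ${daily_saving:,.0f}")
```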

Google’s Custom Silicon Roadmap 2024-2028

Year | TPU Generation | Estimated Shipments | Fab Node | Primary Partner | Key Milestone
--- | --- | --- | --- | --- | ---
2024 | TPU v5p / v6e Trillium | 1.1 million | TSMC 5nm / 4nm | Broadcom | Gemini 2 training
2025 | Ironwood (TPU v7) | 2.4 million | TSMC 3nm | Broadcom | GA November 2025
2026 | Ironwood ramp + 8i preview | 4.3 million | TSMC 3nm + 2nm | Broadcom + MediaTek | Anthropic 1M chip commit
2027 | TPU 8t / TPU 8i GA | 12-15 million | TSMC 2nm | Broadcom + MediaTek | Dual-chip split launch
2028 | TPU 9 (rumored) | 30-35 million | TSMC A16 + Intel 18A | Broadcom + MediaTek + Marvell | Million-chip cluster GA
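Here is the year-over-year growth implied by the roadmap table. The 2027 and 2028 rows use midpoints, which is an assumption because the table gives only ranges. The last line checks the 2026-to-2028 jump against the 35 million upper bound:

```python
# Estimated TPU shipments in millions, from the roadmap table.
# Ranges for 2027 and 2028 are replaced by their midpoints.
shipments = {2024: 1.1, 2025: 2.4, 2026: 4.3, 2027: (12 + 15) / 2, 2028: (30 + 35) / 2}

years = sorted(shipments)
for prev, cur in zip(years, years[1:]):
    print(f"{cur}: {shipments[cur] / shipments[prev]:.2f}x year over year")

# Compound annual growth rate across the whole table.
span = years[-1] - years[0]
cagr = (shipments[years[-1]] / shipments[years[0]]) ** (1 / span) - 1
print(f"{years[0]}-{years[-1]} CAGR: {cagr:.0%}")

print(f"2026-2028 at the upper bound: {35 / shipments[2026]:.1f}x")
```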

The roadmap reveals a silicon supply chain that would have seemed implausible three years ago. TSMC is the only volume fabricator capable of delivering leading-edge Google silicon through 2027, and Intel Foundry Services’ 18A process is targeted as the secondary source for 2028 production. That dual-fab model adds resilience against geopolitical risk in Taiwan and gives Google pricing clout against TSMC – the kind of two-supplier strategy Apple pioneered for iPhone SoCs in 2015.

The Broader Custom Silicon Race: AWS, Microsoft, and Meta

Google is not alone in the hyperscaler ASIC race. AWS launched Trainium2 in 2024, followed by Trainium3 in December 2025 – a chip that delivers 1.3 petaflops of FP8 compute per die and scales to 64,000 chips per UltraCluster for roughly 83 FP8 exaflops. Anthropic uses a 400,000-chip Trainium2 cluster in Indiana built under its partnership with AWS, and Trainium3 will reportedly ship in volume during the second half of 2026.

Microsoft’s Maia 100, launched in late 2023, has struggled to scale beyond internal Copilot workloads, prompting Redmond to lean more heavily on AMD Instinct MI400 and Nvidia B300 deployments in 2026. Meta’s MTIA v2 is in production for ranking and ads workloads, and the $35 billion Broadcom deal signed in early 2026 extends the architecture through 2029 with a 2nm MTIA v3 chip expected in 2027. Across the four biggest hyperscalers, custom AI silicon will absorb an estimated $185 billion in capex during 2026, roughly 28% of total data-center capex – a figure that was under $30 billion as recently as 2022.

The share shift matters for Nvidia. The company’s fiscal 2026 data-center revenue hit $170 billion, but the fiscal 2027 outlook baked in just 18% growth – a sharp deceleration from 112% growth in fiscal 2026. Much of that slowdown reflects the custom-silicon shift: every hyperscale dollar that moves from Nvidia GPUs to TPU 8t, Trainium3, MTIA v3, or Maia 200 is a dollar that does not flow through Nvidia’s income statement. Jensen Huang downplayed the threat at GTC 2026 in March, arguing that “ASICs optimize for today’s workloads, while GPUs optimize for the next workload” – but Google’s dual-chip strategy is an explicit rebuttal to that framing.

Why Google Split Training and Inference into Two Chips

The strategic question is why Google chose to diverge training and inference silicon at exactly this moment. Three forces made the split inevitable. First, inference workloads now dominate accelerator cycles – Anthropic reported in February 2026 that Claude serves more than 14 billion requests per day, a workload that is latency-constrained in ways training is not. Second, power density has become the binding constraint: a training-optimized chip at 1,000+ watts per socket strands data center capacity that could otherwise host 3x more inference chips at 350 watts.
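The "3x more inference chips" claim is a ratio of per-chip power draws under a fixed facility budget. The sketch below uses a hypothetical 1 MW accelerator budget and counts chip power only. Cooling, host CPUs, and networking are ignored:

```python
# Fixed power envelope shared by either chip type.
# Hypothetical 1 MW budget; accelerator power only.
BUDGET_W = 1_000_000
TRAINING_CHIP_W = 1_000   # "1,000+ watts per socket" (lower bound)
INFERENCE_CHIP_W = 350

training_chips = BUDGET_W // TRAINING_CHIP_W
inference_chips = BUDGET_W // INFERENCE_CHIP_W
print(f"{training_chips} training chips or {inference_chips} inference chips "
      f"({inference_chips / training_chips:.2f}x)")
```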

Third, the agentic AI workload is fundamentally different. An AI agent completing a research task might make 200 small tool-use calls, each requiring a 50-to-200-millisecond inference completion. That pattern rewards on-chip SRAM (where TPU 8i’s 384 MB matters), low-diameter interconnect (where the new Boardfly topology matters), and fast collective operations (where the 5x reduction in on-chip collective latency matters) – not the aggregate FP4 throughput that defines training chips. Patrick Moorhead of Moor Insights & Strategy called the split “the most significant architectural shift in AI accelerators since Nvidia moved to Hopper in 2022.”
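The agent pattern can be put into numbers. The first loop assumes the 200 calls run strictly one after another. The second part applies Amdahl's law to a single call. The 30% share of per-call latency spent in collectives is a hypothetical value, chosen to show how a 5x collective speedup translates into end-to-end gains:

```python
CALLS = 200

# Serial end-to-end latency across the article's 50-200 ms per-call range.
for per_call_ms in (50, 200):
    print(f"{per_call_ms} ms per call -> {CALLS * per_call_ms / 1000:.0f} s end to end")


def amdahl_latency(per_call_ms: float, collective_share: float, collective_speedup: float) -> float:
    """Amdahl's law: only the collective slice of each call gets faster."""
    return per_call_ms * ((1 - collective_share) + collective_share / collective_speedup)


# Hypothetical: 30% of a 200 ms call is collective communication.
new_ms = amdahl_latency(200, collective_share=0.30, collective_speedup=5)
print(f"200 ms call, 30% collective slice, 5x faster collectives: {new_ms:.0f} ms")
```

In this example, even a 5x faster collective engine trims only about a quarter off each call. That is why latency-oriented designs tend to attack SRAM capacity, interconnect diameter, and collectives together rather than relying on any one of them.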

The split also future-proofs Google against workload drift. If training flops continue to dominate through 2028, TPU 8t absorbs the growth. If inference flops explode – which Daniel Newman of Futurum Research predicts will drive 75% of accelerator spend by 2027 – TPU 8i carries the load. Either way, Google wins without having to redesign a monolithic chip. Nvidia, by contrast, ships a single GB300 SKU for both workloads, a decision that will likely become a competitive liability if Google’s thesis on workload divergence proves correct.

Wall Street Reaction and Market Impact

Markets responded sharply to the April 22 announcement. Alphabet shares closed up 3.8% on the day, adding roughly $98 billion in market capitalization and pushing the company’s valuation above $3.2 trillion. Broadcom rose 4.2% in after-hours trading on confirmation of its TPU 8t role, adding $52 billion in market value. MediaTek gained 8.7% in Taipei, its largest single-day move since 2024. Nvidia shares fell 2.1% on the day, a modest move that partially reversed on April 23 after Jensen Huang sent a customer letter arguing that “TPU 8t and 8i will not change Nvidia’s 2027 trajectory.”

The deeper market signal is capital reallocation. Venture capital tracker PitchBook reported that AI infrastructure mentions on Q1 2026 earnings calls were up 340% year-over-year, with TPU, Trainium, and MTIA accounting for a growing share of hyperscaler capex disclosures. Goldman Sachs raised its 2027 custom AI silicon TAM estimate to $240 billion in a note published April 23, up from $195 billion before the TPU 8t/8i announcement. The revision alone implies a $45 billion shift in capex destined for non-Nvidia architectures.

For Google’s stock specifically, Morgan Stanley’s Brian Nowak raised his price target from $235 to $258 on April 23, citing “structural cost advantages in AI compute that will compound through 2028.” His model assumes Google Cloud’s AI infrastructure revenue reaches $48 billion in 2027, up from $21 billion in 2025 – a trajectory that requires TPU 8t to launch on schedule and Anthropic’s 1 million-chip commitment to translate into recurring consumption revenue.
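Nowak's model implies a steep compound growth rate. Treating the $21 billion (2025) and $48 billion (2027) figures as exact endpoints:

```python
# Google Cloud AI infrastructure revenue in $B, per the Morgan Stanley model.
revenue_2025, revenue_2027 = 21.0, 48.0
years = 2027 - 2025

# Compound annual growth rate between the two endpoints.
cagr = (revenue_2027 / revenue_2025) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")
```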

Risks and Challenges for Google’s TPU Strategy

Google’s dual-chip strategy is not risk-free. The biggest challenge is software. Nvidia’s CUDA ecosystem has 4 million developers, more than 300 production frameworks, and a decade of performance optimization that PyTorch on TPU still cannot fully match for novel model architectures. Google’s JAX framework has matured rapidly and now powers most internal TPU workloads, but external developer adoption remains a fraction of CUDA’s footprint. Until that gap closes, TPU economics favor captive workloads – Google’s own services, Anthropic’s Claude training, and a small number of cloud-committed hyperscale customers.

TSMC 2nm capacity is the second risk. TSMC has allocated the majority of its 2026 and 2027 2nm wafers to Apple for A21 and M6 SoCs, with the remaining capacity split among Nvidia, AMD, Qualcomm, and Google. If Apple’s 2nm demand runs hotter than forecast – plausible given the expected iPhone Fold ramp – Google may face supply constraints for TPU 8t and 8i that delay the late-2027 availability target. TSMC’s Arizona GigaFab expansion could absorb some of that risk over time, but 2nm production in Arizona is not expected to start until 2028.

Third, antitrust scrutiny is intensifying. The DOJ’s Google Search monopoly case has already produced court-ordered remedies restricting Google’s exclusive distribution deals, and bundling TPU access with Google Cloud contracts could invite similar scrutiny. Senator Amy Klobuchar’s office confirmed on April 21 that it is “monitoring” Google’s TPU tying practices, though no formal investigation has been announced. For Google, the risk is less an immediate enforcement action than a chilling effect on large multi-year TPU commitments from regulated industries like financial services and healthcare.

Expert Reactions: Analysts, Hyperscalers, and the Developer Community

The analyst reaction to Cloud Next has been uniformly bullish for Google. Thomas Kurian’s on-record justification for the split – “we felt that power efficiency would become a constraint as people continue to scale both training and inference” – has been widely cited as the strategic north star. Stacy Rasgon (Bernstein) raised Broadcom’s fiscal 2027 EPS estimate by 14%, the largest single upward revision since Broadcom acquired VMware. Harlan Sur (J.P. Morgan) added Broadcom to his Analyst Focus List with a $2,100 price target, up from $1,850.

Hyperscaler reactions are more measured. AWS declined to comment on the TPU 8t/8i announcement but reiterated its own $100 billion 2026 capex guidance, of which roughly 35% is allocated to Trainium-family silicon. Microsoft similarly declined to comment, though Satya Nadella’s prepared remarks at an investor day on April 21 emphasized that “model portability across hardware is the single most important architectural bet Microsoft is making.” Meta CEO Mark Zuckerberg, whose company rents Ironwood capacity for select Llama inference workloads, told Bloomberg on April 22 that “the TPU roadmap is interesting but our MTIA program gives us optionality we can’t get from any external partner.”

In the developer community, the JAX-versus-CUDA debate has reignited. Andrej Karpathy posted on April 22 that TPU 8i’s 384 MB on-chip SRAM “changes the dispatch-overhead arithmetic for agentic inference in a way that will reshape serving architectures.” François Chollet, the creator of Keras and a former longtime Google engineer, called the split “the first major compute-architecture shift since Blackwell.” Both posts attracted more than 2 million views within 24 hours, a signal of how closely the developer community tracks Google’s silicon moves.

Five Predictions: What TPU 8t and 8i Mean for 2027 and Beyond

First, Google Cloud’s share of the hyperscale AI infrastructure market will surpass 20% by the end of 2027, up from the roughly 12% share of overall cloud infrastructure it held in Q1 2026. The combination of TPU 8i’s 80% better inference economics and Anthropic’s 3.5 GW commitment provides a durable workload tailwind that neither AWS nor Azure can match in the near term. AWS retains the lead in general-purpose cloud, but AI-specific workload share will tilt Google’s way.

Second, Nvidia’s data-center gross margin – currently above 75% – will compress to the 65-68% range by fiscal 2028 as custom silicon captures 25-30% of hyperscaler accelerator spend. Jensen Huang will respond with more aggressive networking attach rates (NVLink, Spectrum-X) and deeper CUDA-ecosystem integration, but the pricing power that defined 2023-2025 will erode meaningfully.

Third, MediaTek will emerge as a top-three data-center silicon supplier by 2028, up from a rounding error in 2024. The TPU 8i design win opens doors at other hyperscalers – Oracle, Tencent, and ByteDance have all reportedly engaged MediaTek for custom inference chip designs following the April 22 announcement.

Fourth, Anthropic’s 3.5 GW commitment will not stay exceptional for long: another deal of comparable or larger size is likely within 18 months. OpenAI, Meta AI, and xAI all face the same training-scale pressure that drove Anthropic into Google’s arms. Each will likely sign multi-gigawatt deals of similar scale with Google (TPU), AWS (Trainium), or a frontier startup like Cerebras or Tenstorrent.

Fifth, the training/inference architectural split will become the industry standard. AWS, which already markets separate Trainium and Inferentia lines, will sharpen that bifurcation with new silicon in 2027. Meta’s MTIA v3 will include separate training and serving SKUs, and even Nvidia will eventually launch a dedicated inference product line to complement its flagship training GPUs. The era of single-SKU AI accelerators ends with TPU 8t and 8i.

Historical Context: From TPU v1 to the Agentic Era

Google’s TPU program began in 2013, when rising inference demand for Google Search threatened to double the company’s data-center footprint within three years. Jeff Dean’s team designed, built, and deployed the first TPU in about 15 months, putting it into production in 2015 and unveiling it publicly in 2016. TPU v1 delivered 92 tera-operations per second (TOPS) of INT8 compute and drew about 40 watts, quaint numbers next to TPU 8i’s 10.1 petaflops of FP4 per chip. The progression from v1 to the eighth generation represents a roughly 110x per-chip improvement in a decade (comparing INT8 then with FP4 now). That is well ahead of the roughly 32x that a doubling every two years would deliver, despite the end of Dennard scaling.
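The decade-long comparison can be checked with a few lines of Python. The sketch assumes a 10-year span and treats TPU v1's INT8 operations and TPU 8i's FP4 FLOPs as interchangeable units, which flatters the newer chip:

```python
import math

TPU_V1_OPS = 92e12      # INT8 ops/s
TPU_8I_FLOPS = 10.1e15  # FP4 FLOP/s
YEARS = 10              # assumed span between the two chips

gain = TPU_8I_FLOPS / TPU_V1_OPS
doubling_years = YEARS / math.log2(gain)  # implied time per 2x
moore_gain = 2 ** (YEARS / 2)             # doubling every two years

print(f"Per-chip gain: {gain:.0f}x, doubling every {doubling_years:.2f} years")
print(f"Two-year-doubling pace over the same decade: {moore_gain:.0f}x")
```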

Every subsequent TPU generation solved a specific workload pain. TPU v2 added training capability in 2017. TPU v3 brought liquid cooling and 1,024-chip pods in 2018. TPU v4 introduced the 3D torus interconnect in 2021. TPU v5p scaled to 8,960 chips in 2023, launching alongside Gemini 1.0. TPU v6e Trillium added a third-generation SparseCore in 2024. Ironwood (v7) bet on inference-first design in 2025. Each chip was a bet on where workloads would go, and Google has been right more often than not. That track record gives the TPU 8t/8i split architectural credibility even before customer benchmarks publish.

The financial stakes have scaled with the chips. Google’s estimated TPU capex in 2016 was under $1 billion. In 2026, TPU-related infrastructure spending – including chips, networking, data-center buildout, and power – will exceed $65 billion, roughly 45% of Google’s total capital expenditure. The company has staked its cloud strategy, its AI roadmap, and a meaningful share of Alphabet’s valuation on the bet that custom silicon will outcompete merchant GPUs at hyperscale. TPU 8t and 8i are the most aggressive expression of that bet to date.

Frequently Asked Questions

When will Google TPU 8t and TPU 8i be available?

Google has announced a preview program with select customers starting in the second half of 2026 and targeted general availability in late 2027. The timeline assumes TSMC 2nm yields stabilize on schedule. Ironwood (TPU v7) is the chip shipping in production today, with general availability since November 6, 2025.

How does TPU 8t compare to Nvidia Blackwell B200 and GB300?

Nvidia B200 offers roughly 20 petaflops of FP4 compute per chip versus an estimated 12.6 petaflops for TPU 8t. However, Google’s 9,600-chip superpod delivers 121 FP4 exaflops in a single coherent memory domain – roughly 84x the FP4 compute of an Nvidia NVL72 rack. For trillion-parameter training runs, Google’s scale advantage compresses wall-clock time by an estimated 35-45% at comparable cost.

Who manufactures Google TPUs?

TSMC fabricates Google’s TPUs, with TPU 8t and 8i moving to its 2nm process. Broadcom designs TPU 8t (codename Sunfish), MediaTek designs TPU 8i (codename Zebrafish), and Marvell handles networking ASICs. Intel Foundry Services is slated as a secondary fabrication source for 2028-generation chips.

Why did Google split training and inference into two chips?

Inference workloads now dominate accelerator cycles and demand different architectural trade-offs – more on-chip SRAM, lower interconnect diameter, and faster collective operations. Training chips need aggregate FP4 flops and HBM capacity. A single monolithic design compromises both. Google’s Thomas Kurian said the split was a “natural evolution” driven by the power-efficiency constraint on scaling both workloads.

How many TPUs will Anthropic use?

Anthropic committed to use up to 1 million Ironwood TPUs in its multi-year Google Cloud agreement signed in October 2025. The deal provides more than 1 gigawatt of compute in 2026 and 3.5 gigawatts by 2027, making Anthropic the single largest external TPU customer. Anthropic has not disclosed its TPU 8t/8i commitments, though analyst consensus expects it to migrate Claude training workloads to TPU 8t once generally available.

How does TPU 8t affect Broadcom revenue?

Mizuho analysts estimate Broadcom’s combined Google and Anthropic AI custom-silicon revenue reaches $21 billion in 2026, doubling to $42 billion in 2027 as TPU 8t ramps. The relationship now represents Broadcom’s single largest growth engine, surpassing merchant networking chips by 2028 in internal models.

What is the price of a TPU 8i or TPU 8t?

Google has not disclosed list pricing. Independent analysts estimate a 25-35% total cost of ownership improvement over Ironwood, which already delivers roughly 42% lower TCO than comparable Nvidia B200 configurations for large-scale inference. Pricing will be exposed primarily through Google Cloud consumption rates rather than direct chip sales.

Can I use TPUs outside Google Cloud?

No. Google’s data-center TPUs are sold exclusively as Google Cloud managed services and cannot be purchased as standalone hardware. This distinguishes Google’s strategy from Nvidia’s merchant-chip model; AWS takes the same cloud-only approach with Trainium. Developers access TPUs through JAX, TensorFlow, PyTorch/XLA, or Google Cloud Vertex AI.

Which external AI companies use Google TPUs?

Anthropic is the largest external TPU customer. Meta has a rental arrangement for selected Llama inference workloads. Salesforce runs Einstein workloads on Ironwood. Midjourney and Replit operate production inference pipelines on Ironwood. Google has not disclosed the full customer list but says TPU utilization has run above 90% since January 2026 and exceeded 91% network-wide in March.

What comes after TPU 8t and 8i?

Google has not officially announced TPU v9, but roadmap leaks and analyst models suggest a 2028 launch on TSMC A16 (1.6nm-class) with Intel 18A as a secondary fabrication source. Total TPU shipment estimates for 2028 range from 30 to 35 million chips, reflecting Google’s goal of enabling single logical clusters with more than one million chips.


External sources: Google’s eighth-generation TPU announcement, Google Cloud Ironwood TPU general availability, The Next Web’s four-partner supply chain analysis, Google Cloud TPU product page, Anthropic news.


Nadia Dubois

AI & Innovation Editor

Nadia Dubois is the AI & Innovation Editor at Tech Insider, where she tracks the rapid evolution of artificial intelligence, from foundation models to real-world enterprise deployment. She previously covered AI and startups for La Tribune and contributed to MIT Technology Review's European coverage. Nadia specializes in generative AI, AI regulation, and the intersection of technology and European industrial policy. She holds a dual degree in Computational Linguistics and Journalism from Sciences Po Paris.


© 2026 Tech Insider Media AB. All rights reserved.