Spheron GPU Catalog

NVIDIA A100 GPU: 80GB Specs, Pricing & Rental. Rent A100 GPU from $0.82/hr

80GB HBM2e, 2.0 TB/s bandwidth, NVLink 600 GB/s, MIG. Per-minute billing on A100 GPU rentals, live in under 2 minutes.

At a glance

Renting an NVIDIA A100 80GB on Spheron starts at $0.82 per GPU per hour (spot), the lowest live marketplace rate. There is no minimum commit, billing is per minute, and most instances are live inside two minutes. The A100 has 80GB of HBM2e and 2.0 TB/s of memory bandwidth, enough to fine-tune models up to about 30B parameters on a single card with parameter-efficient methods such as LoRA and to serve quantized 70B models at production latency. SXM variants add 600 GB/s NVLink between GPUs for multi-GPU training. Hyperscaler on-demand A100 80GB pricing runs roughly $3.43 per GPU per hour on AWS p4de, $4.10 on Azure ND A100 v4, and about $5.07 on GCP a2-ultragpu.

GPU Architecture: NVIDIA Ampere

VRAM: 80 GB HBM2e

Memory Bandwidth: 2.0 TB/s

NVIDIA A100 specifications

Specification	Value
GPU Architecture	NVIDIA Ampere
VRAM	80 GB HBM2e
Memory Bandwidth	2.0 TB/s
Tensor Cores	432 (3rd Gen)
CUDA Cores	6,912
FP64 Performance	9.7 TFLOPS
FP32 Performance	19.5 TFLOPS
TF32 Performance	156 TFLOPS
FP16 Performance	312 TFLOPS
INT8 Performance	624 TOPS
NVLink Bandwidth	600 GB/s (SXM)
MIG Instances	Up to 7 per GPU
System RAM	100 GB DDR4
vCPUs	14 vCPUs
Storage	625 GB NVMe SSD
Form Factor	SXM4 / PCIe Gen4
TDP	400W SXM / 300W PCIe

NVIDIA A100 pricing

Provider	Price/hr	Cost vs Spheron spot
Spheron (your price)	$1.48/hr dedicated, $0.82/hr spot	-
Jarvislabs	$1.49/hr	1.8x more expensive
TensorDock	$1.57/hr	1.9x more expensive
Lambda Labs	$2.49/hr	3.0x more expensive
AWS p4de	$3.43/hr	4.2x more expensive
Azure ND A100 v4	$4.10/hr	5.0x more expensive
Google Cloud	$5.07/hr	6.2x more expensive
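The multipliers in the table can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the comparison baseline is the $0.82/hr spot rate; the helper name `price_multiple` is illustrative and not part of any Spheron tooling.

```python
# Baseline: Spheron spot rate in $/GPU-hour (assumed to be the comparison base).
SPHERON_SPOT = 0.82

# On-demand $/GPU-hour figures copied from the pricing table.
PROVIDERS = {
    "Jarvislabs": 1.49,
    "TensorDock": 1.57,
    "Lambda Labs": 2.49,
    "AWS p4de": 3.43,
    "Azure ND A100 v4": 4.10,
    "Google Cloud": 5.07,
}


def price_multiple(price, baseline=SPHERON_SPOT):
    """Return how many times more expensive `price` is than `baseline`."""
    return round(price / baseline, 1)


multiples = {name: price_multiple(price) for name, price in PROVIDERS.items()}
for name, multiple in multiples.items():
    print(f"{name}: {multiple}x the Spheron spot rate")
```

Swapping the baseline to the $1.48/hr dedicated rate gives the like-for-like comparison against other non-interruptible offerings.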

Custom & Reserved

Need More A100 Than What's Listed?

Large A100 clusters, custom configs, or guaranteed long-term capacity.

Reserved Capacity

Commit to a duration, lock in availability and better rates

Custom Clusters

8 to 512+ GPUs, specific hardware, InfiniBand configs on request

Supplier Matchmaking

Spheron sources from its certified data center network, negotiates pricing, handles setup

Need more A100 capacity? Tell us your requirements and we'll source it from our certified data center network.

Typical turnaround: 24–48 hours

When to pick the A100

Scenario 01

Pick the A100 if

You are training or fine-tuning a 7B to 30B parameter model, serving a quantized 70B model, or running classic workloads like BERT, ResNet, recommender systems, and RAPIDS analytics. The A100 is also the right call when you want the most mature ML stack on the market and are happy trading a bit of FP8 throughput for 40 to 60 percent lower hourly cost than H100.

Recommended fit

Scenario 02

Pick the H100 instead if

Your workload is FP8-native (Llama 3 / DeepSeek inference, FP8 training runs) or you need Transformer Engine speedups. H100 is roughly 2.5 to 3x faster on Tensor Core math and has about 1.7x the memory bandwidth, but it costs about 2x as much. If the speedup pays for itself, make the jump.

Recommended fit

Scenario 03

Pick the L40S instead if

You are running pure inference on sub-30B models, or batch image and video generation. L40S has 48GB GDDR6 and a much lower hourly cost, with strong FP8 and Ada Lovelace Tensor Cores. It has no NVLink, so it is not the right pick for multi-GPU training.

Recommended fit

Scenario 04

Pick the RTX 4090 instead if

You are doing development, small-scale fine-tuning, or sub-13B inference on a budget. The 4090 has 24GB VRAM and no NVLink, but it is the cheapest way to run modern AI stacks. Step up to A100 once you need more memory or multi-GPU scaling.

Recommended fit

NVIDIA A100 use cases

Use case / 01

Optimized

🤖

LLM training and fine-tuning

Train or fine-tune models in the 7B to 30B range with mixed precision. FSDP and DeepSpeed ZeRO scale cleanly across 8x A100 with NVLink, and QLoRA (LoRA adapters on a 4-bit base model) brings 70B fine-tuning within reach on a single card.

Continued pre-training on Llama 3.1 8B / Mistral 7B
Supervised fine-tunes on Qwen 14B, CodeLlama 13B
LoRA and QLoRA fine-tunes of Llama 2/3 70B
Multi-GPU ZeRO-3 training up to 30B parameters
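To see why 30B is a sensible ceiling for full training on one 8x A100 node, a rough estimate helps. The sketch below assumes the common mixed-precision Adam accounting of 16 bytes per parameter (2 for weights, 2 for gradients, 12 for FP32 optimizer state) and ideal ZeRO-3 sharding. It ignores activations, buffers, and fragmentation, so real usage will be higher. The function name is illustrative.

```python
# Mixed-precision Adam accounting (an assumption, not a measured figure):
# 2 bytes weights + 2 bytes gradients + 12 bytes FP32 optimizer state
# (master weights, momentum, variance).
BYTES_PER_PARAM = 2 + 2 + 12


def zero3_model_state_gb(params_billions, num_gpus):
    """Per-GPU model-state memory in GB when ZeRO-3 shards everything evenly."""
    total_gb = params_billions * BYTES_PER_PARAM  # 1e9 params * bytes = GB
    return total_gb / num_gpus


print(zero3_model_state_gb(30, 8))  # 30B across one 8x A100 node
print(zero3_model_state_gb(7, 1))   # 7B on a single card, unsharded
```

Under these assumptions a 30B model needs about 60 GB of model state per GPU on 8 cards, which fits in 80 GB with some room for activations. A 7B model trained unsharded on one card needs about 112 GB, which is why single-card work leans on LoRA or QLoRA.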

Use case / 02

Optimized

⚡

Production LLM inference

Serve models at steady latency with vLLM, TensorRT-LLM, or Triton. INT8 and FP16 paths are well optimized, and MIG lets you carve one A100 into up to 7 isolated inference slots.

Llama 3.1 8B / Mistral 7B at high concurrency
Quantized Llama 2 70B (INT4) serving on single A100
Multi-model serving via MIG partitioning
BERT / T5 embedding and reranker pipelines

Use case / 03

Optimized

🎯

Classic ML and computer vision

The A100 still holds the line on computer vision and recommender workloads that predate the LLM wave. Mature CUDA kernels, stable ecosystem, predictable throughput.

ResNet, EfficientNet, ViT, DETR training
Recommender systems (DLRM, two-tower)
Speech recognition and TTS pipelines
AutoML, NAS, and hyperparameter sweeps

Use case / 04

Optimized

📊

GPU data analytics and HPC

RAPIDS, cuDF, cuGraph, and GPU-accelerated SQL engines all target A100 first. FP64 throughput is 9.7 TFLOPS, enough for most simulation work that does not need Hopper-class double precision.

ETL and feature engineering with cuDF
Large-scale graph analytics with cuGraph
Molecular dynamics and bioinformatics
Signal processing and time-series analytics

NVIDIA A100 benchmarks

METRIC 01

BERT time-to-solution (TF32)

up to 5x faster

vs V100 FP32

METRIC 02

TF32 cross-network speedup

~2.6x avg

23 networks vs V100 FP32

METRIC 03

Llama 2 70B inference (INT4)

fits single A100 80GB

~35 GB weights

METRIC 04

FP16 Tensor throughput

312 TFLOPS

624 TFLOPS with sparsity

METRIC 05

TF32 Tensor throughput

156 TFLOPS

~10x V100 FP32 (15.7 TFLOPS)

METRIC 06

Memory bandwidth

2.0 TB/s

vs 1.55 TB/s on A100 40GB

Serve Llama 3.1 8B on an A100 in under 2 minutes

Spin up a Spheron A100 80GB, pull the vLLM image, and serve Llama 3.1 8B with an OpenAI-compatible API. Point any OpenAI SDK client at the endpoint and you are done.

bash

# 1. Provision an A100 80GB from the Spheron CLI (or use the dashboard)
spheron deploy --gpu a100-80gb --image vllm/vllm-openai:latest

# 2. Inside the instance, serve Llama 3.1 8B Instruct
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.92 \
  --port 8000

# 3. Hit the endpoint from any OpenAI-compatible client
curl http://<instance-ip>:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Summarize MIG partitioning on A100."}]
  }'

For 70B inference, add --tensor-parallel-size 2 and rent 2x A100 80GB with NVLink. For multi-node training, contact us for InfiniBand-connected clusters.
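The same request as the curl call can be made from Python. This is a minimal sketch using only the standard library; it assumes the server exposes the OpenAI-style `/v1/chat/completions` route and response shape, as in the example above. The helper names `build_chat_request` and `chat` are illustrative.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build (but do not send) a POST to the chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, prompt, timeout=60):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes the OpenAI-style response layout: choices -> message -> content.
    return body["choices"][0]["message"]["content"]
```

Calling `chat("http://<instance-ip>:8000", "meta-llama/Llama-3.1-8B-Instruct", "Summarize MIG partitioning on A100.")` with the real instance address substituted should return the model's reply as a string.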

Interconnect fabric

Multi-GPU A100 with NVLink and InfiniBand

A100 SXM4 nodes on Spheron link 8 GPUs with NVLink at 600 GB/s intra-node, and multi-node jobs use 200 Gb/s HDR InfiniBand with GPUDirect RDMA. That is the same fabric NVIDIA ships in DGX A100 systems, so PyTorch DDP, DeepSpeed ZeRO, and Megatron-LM run at close to linear scaling.

01. 600 GB/s NVLink between GPUs inside a node
02. 200 Gb/s HDR InfiniBand across nodes
03. GPUDirect RDMA for zero-copy GPU-to-GPU transfers
04. NCCL pre-tuned for A100 topology
05. MIG support for splitting into up to 7 instances per GPU
06. 8x A100 per node, multi-node clusters on request
07. Tested with PyTorch DDP, DeepSpeed ZeRO-3, and Megatron-LM
08. Both SXM4 and PCIe Gen4 form factors available
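To get a feel for what these link speeds mean for gradient synchronization, the ideal ring all-reduce cost can be estimated by hand. The sketch below is a lower bound under stated assumptions, not a benchmark: it assumes roughly 300 GB/s usable per direction on NVLink (600 GB/s is the bidirectional total), a single 200 Gb/s InfiniBand link (25 GB/s) between two nodes, and no latency or overlap effects. The payload size and function name are hypothetical.

```python
def ring_allreduce_seconds(payload_gb, num_ranks, link_gb_per_s):
    """Ideal ring all-reduce time: each rank moves 2*(N-1)/N of the payload."""
    traffic_gb = 2 * (num_ranks - 1) / num_ranks * payload_gb
    return traffic_gb / link_gb_per_s


GRADS_GB = 14.0  # hypothetical payload: 7B parameters in BF16

# 8 GPUs inside one node over NVLink (assumed 300 GB/s per direction).
intra_node = ring_allreduce_seconds(GRADS_GB, 8, 300.0)

# 2 nodes over one HDR InfiniBand link: 200 Gb/s / 8 = 25 GB/s.
inter_node = ring_allreduce_seconds(GRADS_GB, 2, 25.0)

print(f"intra-node: {intra_node:.3f} s, inter-node: {inter_node:.3f} s")
```

The gap between the two numbers is the reason frameworks overlap communication with compute and keep the heaviest traffic inside a node where possible.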

Scale

Need a custom multi-node cluster or reserved capacity? Talk to us about topology, regions, and committed pricing.

A100 vs alternatives

A100 vs V100

A100 is roughly 2.5 to 3x faster on training and inference, with 2.5x the memory of the 32GB V100. V100 is effectively end-of-life for modern LLM work.

A100 vs L40S

L40S is cheaper and strong at single-GPU inference with FP8, but has no NVLink. A100 wins for multi-GPU training and 70B INT4 serving that needs 80GB.

NVIDIA A100 guides and resources

01. NVIDIA A100 vs V100: Specs, Benchmarks, and When to Upgrade
Side-by-side Ampere vs Volta comparison with benchmarks and migration guidance.

02. A100 Deployment Guide: SXM vs PCIe, Spot vs Dedicated, MIG
Deep dive on A100 configurations, interconnects, MIG partitioning, and deployment patterns on Spheron.

03. Best NVIDIA GPUs for LLMs
Framework for matching GPU choice to model size, from 7B on A100 to 670B on B200.

04. GPU Memory Requirements for Large Language Models
Calculate VRAM needs across precision levels and KV-cache pressure for every major model class.

05. How a 12-Person Startup Trained a 70B Model for $11,200
Cost breakdown for training a 70B model using spot A100 instances with aggressive checkpointing.

06. GPU Cost Optimization Playbook
Practical tactics to cut A100 spend: spot scheduling, MIG, batching, right-sizing.

Technical Brief 01

NVIDIA A100 Release Date and Cloud Availability

The NVIDIA A100 Tensor Core GPU was announced at GTC in May 2020 and started shipping that same year. The 80GB HBM2e variant followed in November 2020, doubling the VRAM of the original 40GB launch SKU. Cloud availability arrived quickly: Google Cloud A2 in mid-2020, AWS p4d (A100 40GB) in November 2020, and Azure NDv4 in May 2021. By 2022 every major cloud and every GPU-focused neo-cloud offered the A100 80GB in both SXM4 and PCIe form factors.

On Spheron the A100 80GB is available in both SXM4 and PCIe with per-minute billing and no contract. Capacity is sourced from data center partners across multiple regions. Spot pricing is meaningfully below the dedicated rate for fault-tolerant workloads. Live availability and current pricing are on the pricing page. The A100's successor is the H100 (Hopper, 80GB HBM3 at 3.35 TB/s, FP8 Transformer Engine); for teams on a budget the A100 80GB remains the cost-efficient choice for sub-70B training and INT8 inference.

Technical Brief 02

A100 VRAM and Memory Bandwidth: 80GB HBM2e at 2.0 TB/s

The A100 80GB ships with 80GB of HBM2e memory at 2.0 TB/s of bandwidth on the SXM4 variant (1.94 TB/s on PCIe). That is the same VRAM as the H100 80GB but with roughly 60% of the bandwidth (3.35 TB/s on H100 HBM3). LLM decode is memory-bandwidth-bound: each generated token reads every weight once, so the per-sequence ceiling is roughly memory bandwidth divided by weight bytes. For a 70B model in FP16 (about 140GB of weights, which needs more than one 80GB card, so treat this as a per-GPU bandwidth comparison) that works out to around 14 tokens per second at A100 bandwidth, versus 24 at H100 and 34 at H200 bandwidth. For training, where compute is more often the bottleneck, the gap is smaller and the A100's price-per-hour advantage often wins.
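The tokens-per-second ceilings quoted here follow from dividing memory bandwidth by the bytes of weights read per generated token. The sketch below reproduces that arithmetic. It assumes a batch size of one, ignores KV-cache reads and kernel overheads, and takes 4.8 TB/s as the H200 bandwidth; the function name is illustrative.

```python
def decode_ceiling_tokens_per_s(bandwidth_tb_s, params_billions, bytes_per_param):
    """Upper bound on single-sequence decode speed when weight reads dominate."""
    weight_gb = params_billions * bytes_per_param  # 1e9 params * bytes = GB
    return bandwidth_tb_s * 1000 / weight_gb       # TB/s -> GB/s, then per token


# Memory bandwidth in TB/s. The H200 figure is an assumption added here.
GPUS = {"A100 80GB": 2.0, "H100 80GB": 3.35, "H200 141GB": 4.8}

for name, bandwidth in GPUS.items():
    ceiling = decode_ceiling_tokens_per_s(bandwidth, params_billions=70, bytes_per_param=2)
    print(f"{name}: ~{ceiling:.0f} tokens/s for 70B FP16")
```

Dropping `bytes_per_param` to 1 (INT8) or 0.5 (INT4) shows why quantization raises the decode ceiling as well as shrinking the memory footprint.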

Where the 80GB VRAM matters: a Llama 3 70B model fits comfortably in INT4 (about 35GB of weights) and only tightly in INT8 (about 70GB, leaving little KV cache headroom), a 30B-class model fits in BF16, and most LoRA / QLoRA fine-tuning workloads under 70B parameters run comfortably on a single GPU. Multi-Instance GPU (MIG) support lets you partition one A100 into up to 7 isolated GPU slices, useful for multi-tenant inference platforms. For FP8 inference workloads the H100 80GB is the step up; for anything requiring more than 80GB on a single GPU, it is the H200 141GB.
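A quick way to check whether a model fits is to compute the weight footprint at a given precision and see what is left for KV cache. The sketch below counts weights only. Activations, CUDA context, and framework overhead are ignored, so treat the headroom as an upper bound. The function names are illustrative.

```python
A100_VRAM_GB = 80


def weight_gb(params_billions, bits_per_param):
    """Weight footprint in GB for a model at the given precision."""
    return params_billions * bits_per_param / 8


def kv_headroom_gb(params_billions, bits_per_param, vram_gb=A100_VRAM_GB):
    """VRAM left after weights; negative means the weights alone do not fit."""
    return vram_gb - weight_gb(params_billions, bits_per_param)


for label, params, bits in [
    ("70B INT4", 70, 4),
    ("70B INT8", 70, 8),
    ("70B FP16", 70, 16),
    ("30B BF16", 30, 16),
]:
    print(f"{label}: {weight_gb(params, bits):.0f} GB weights, "
          f"{kv_headroom_gb(params, bits):.0f} GB headroom on one A100 80GB")
```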

FAQ / 14

NVIDIA A100 FAQ

On Spheron the A100 80GB starts at $0.82 per GPU per hour, the lowest live marketplace rate. There is no minimum commit and billing is per minute. For reference, Lambda Labs runs ~$2.49/hr, AWS p4de ~$3.43/hr per GPU, Azure ND A100 v4 ~$4.10/hr per GPU, and Google Cloud a2-ultragpu around $5/hr.

Spot instances on Spheron are the cheapest path, often 50 to 70 percent below the dedicated rate. The trade-off is that the instance can be reclaimed when demand spikes, so checkpoint every 15 to 30 minutes and treat spot as a fit for fault-tolerant training, batch jobs, and experimentation. For steady production serving, stay on dedicated (99.99% SLA, non-interruptible). Both are on-demand tiers with per-minute billing.
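The checkpointing advice can be sketched as a time-based save inside the training loop, with an atomic write so that a reclaim in the middle of a save never leaves a corrupt file. This is a minimal illustration: JSON stands in for a real framework checkpoint (for example a saved model and optimizer state), and `train`, `save_checkpoint`, `load_checkpoint`, and `step_fn` are hypothetical names.

```python
import json
import os
import tempfile
import time

CHECKPOINT_INTERVAL_S = 15 * 60  # lower end of the 15 to 30 minute guidance


def save_checkpoint(state, path):
    """Write to a temp file, then rename, so the checkpoint is never half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic on the same filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}


def train(path, total_steps, step_fn, interval_s=CHECKPOINT_INTERVAL_S, clock=time.monotonic):
    """Resume from `path` if present and checkpoint every `interval_s` seconds."""
    state = load_checkpoint(path)
    last_save = clock()
    while state["step"] < total_steps:
        step_fn(state)  # one unit of training work
        state["step"] += 1
        if clock() - last_save >= interval_s:
            save_checkpoint(state, path)
            last_save = clock()
    save_checkpoint(state, path)  # final save on clean completion
    return state
```

Writing the checkpoint to storage that outlives the instance is what makes the resume path useful after a spot reclaim.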

Yes. Spheron bills per minute with no minimum. A one-hour benchmark costs you one hour. No contracts, no reserved-instance lock-in on dedicated or spot, and no commit fees.
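The effect of billing granularity is easy to quantify. The sketch below assumes usage is rounded up to the next whole billing unit, which is an assumption about rounding behavior rather than a documented Spheron rule. The function name is illustrative.

```python
import math


def rental_cost(rate_per_hour, seconds, granularity_s=60):
    """Cost in dollars when usage is rounded up to the billing granularity."""
    billed_units = math.ceil(seconds / granularity_s)
    return round(billed_units * granularity_s / 3600 * rate_per_hour, 4)


ten_minutes = 10 * 60
print(rental_cost(0.82, ten_minutes))                      # per-minute billing
print(rental_cost(0.82, ten_minutes, granularity_s=3600))  # hourly billing, for contrast
```

At the $0.82/hr rate, a ten-minute run costs about $0.14 under per-minute billing and a full $0.82 if it were billed by the hour.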

Most A100 instances are live in 45 to 90 seconds. Hardware is pre-warmed, so provisioning behaves more like a container start than a VM boot. If your Docker image is ready, you can be running a training script inside two minutes of hitting deploy.

SXM4 is the higher-power variant (400W) with NVLink between all GPUs in a node at 600 GB/s, which matters for multi-GPU training and model parallelism. PCIe is lower-power (300W) and easier to mix with standard servers, but it lacks the node-wide NVLink fabric (NVIDIA's optional NVLink bridge links at most two PCIe cards). Pick SXM for distributed training or 70B FP16 inference across 2+ GPUs. Pick PCIe for single-GPU inference or data processing.

The 80GB variant doubles VRAM and bumps memory bandwidth from 1.55 TB/s to 2.0 TB/s. That matters for larger batch sizes, long-context inference, and 70B-class quantized models. Spheron defaults to the 80GB SKU because the memory headroom usually pays for itself.

Yes. A single A100 splits into up to 7 isolated MIG instances, each with dedicated compute, memory, and bandwidth. MIG is perfect for running multiple small inference workloads on one card without noisy-neighbor effects. It is exposed on both SXM and PCIe variants.
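The "up to 7" figure comes from how MIG divides the GPU into compute and memory slices. The sketch below lists the common A100 80GB profiles and computes how many identical instances of each fit. It assumes the usual 7 compute slices and 8 memory slices, and it ignores the placement rules for mixed layouts described in NVIDIA's MIG documentation.

```python
# (compute slices, memory slices) used by each common A100 80GB MIG profile.
MIG_PROFILES = {
    "1g.10gb": (1, 1),
    "2g.20gb": (2, 2),
    "3g.40gb": (3, 4),
    "4g.40gb": (4, 4),
    "7g.80gb": (7, 8),
}

COMPUTE_SLICES = 7  # compute slices available on one A100
MEMORY_SLICES = 8   # memory slices of 10 GB each on the 80GB SKU


def max_instances(profile):
    """How many identical instances of `profile` fit on one GPU."""
    compute, memory = MIG_PROFILES[profile]
    return min(COMPUTE_SLICES // compute, MEMORY_SLICES // memory)


for profile in MIG_PROFILES:
    print(f"{profile}: up to {max_instances(profile)} per GPU")
```

Seven `1g.10gb` instances is the maximum split; larger profiles trade instance count for more memory and compute per slot.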

Yes. Spheron offers 8x A100 per node with NVLink, and multi-node clusters connected by 200 Gb/s HDR InfiniBand with GPUDirect RDMA. Clusters are tested with PyTorch DDP, DeepSpeed ZeRO-3, and Megatron-LM. Larger configurations are available on request.

A100 capacity is online across North America, Europe, and Asia, sourced from data center partners. Availability shifts with demand and the dashboard shows live capacity per region.

PyTorch, TensorFlow, JAX, and the major serving stacks (vLLM, TensorRT-LLM, Triton, SGLang) all ship in the default images. CUDA 12.6+, cuDNN, NCCL, and RAPIDS are pre-tuned for A100. You can also bring your own Docker image.

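For inference, yes. A 70B model in INT4 (~35GB) runs on a single A100 80GB. At INT8 the weights alone take ~70GB, which leaves too little room for KV cache on one card, so plan on two A100 80GBs with tensor parallelism. FP16 inference at 70B (~140GB of weights) needs 2+ A100 80GB with NVLink, and full 70B training needs a multi-GPU node or cluster. The sweet spot for the A100 remains 7B to 30B parameters.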

If your workload is FP8-native or memory-bandwidth-bound, H100 pays for itself. If you are doing classic training or fine-tuning up to 30B parameters, or inference on models that fit in 80GB without FP8, A100 usually wins on dollars per token. Start on A100, move to H100 when the speedup justifies the cost.
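Whether the faster GPU "pays for itself" reduces to comparing the speedup with the price premium. The sketch below shows that comparison and a dollars-per-million-tokens helper. The throughput figure in the example is a hypothetical placeholder, not a benchmark, and the function names are illustrative.

```python
def cost_per_million_tokens(price_per_hour, tokens_per_second):
    """Dollars per one million generated tokens at a sustained throughput."""
    return price_per_hour / (tokens_per_second * 3600) * 1_000_000


def faster_gpu_wins(price_ratio, speedup):
    """True when the speedup exceeds the price premium of the faster GPU."""
    return speedup > price_ratio


# Hypothetical throughput of 1,000 tokens/s on an A100 at $0.82/hr.
print(f"${cost_per_million_tokens(0.82, 1000):.3f} per million tokens")

# H100 at about 2x the price: a 2.5x speedup wins, a 1.7x speedup does not.
print(faster_gpu_wins(price_ratio=2.0, speedup=2.5))
print(faster_gpu_wins(price_ratio=2.0, speedup=1.7))
```

Plugging in measured throughput for your own model on both GPUs gives the actual break-even.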

For 100+ GPU deployments and production-critical workloads, Spheron offers dedicated Slack or Discord support, sourcing assistance, and SLA-backed instances. Smaller deployments are self-serve through the dashboard.


For the same A100 80GB hardware, Spheron is meaningfully cheaper than AWS p4de, Azure ND A100 v4, and GCP a2-ultragpu on-demand. As of April 2026, hyperscaler on-demand A100 80GB pricing runs roughly $3.43/hr per GPU on AWS p4de, $4.10/hr on Azure ND A100 v4, and about $5/hr on GCP. Spheron starts at $0.82/hr. Same silicon, different pricing model.