L40S 48GB · live marketplace — from $0.78 / hr

Rent an L40S 48GB
by the minute.
Not the month.

Rent an NVIDIA L40S for production-grade FP8 70B inference on Ada silicon. 48 GB of ECC memory plus Ada FP8 tensor cores: Llama-3 70B FP8 single-card serving at ~720 tok/s aggregated across 16 concurrent requests, Hunyuan-Video FP8 at ~4.5 min per 5-second 720p clip, and LLaVA-1.6 multimodal serving with the vision encoder resident. Billed per minute, paid in BTC, USDT/USDC, or CLORE. The default 70B serving target when datacenter HBM supply is tight.

Rent an L40S now See pricing

● Per-minute billing ● SSH + Docker + Jupyter ● Spot & on-demand ● 15 regions

INFERENCE · LIVE POD #82144 · L40S ×1 · us-east-2 · UPTIME 14h 22m

THROUGHPUT 2,847 tok/s (▲ 4.2% vs 5m)
LATENCY p50 38 ms (▬ steady)
LATENCY p99 112 ms (▲ 6 ms)
REQUESTS / SEC 274 RPS
GPU UTIL 87%
VRAM 35.2 / 48 GB
QUEUE 3 reqs
MODEL Llama-3.1-70B · INT8
BATCH 32 · cont.
RATE $1.05/hr
COST/1M tok $0.10

workloads

Datacenter Ada,
built to serve.

L40S is what happens when NVIDIA takes 4090-class silicon, doubles the VRAM, adds ECC, locks the clocks for 24/7 operation, and ships it in a passive datacenter form factor.

70B FP8 serving at sub-H100 rates

FP8 quant fits Llama-3 70B in 48 GB with KV cache room — and the L40S delivers it at typically 40–60% of an H100's rental price. The pragmatic pick for production inference teams who don't need HBM3 bandwidth or NVLink fabric, just consistent token throughput.

Llama-3 70B INT4 210 tok/s/user

Generative media

SDXL, Flux, Stable Video Diffusion, HunyuanVideo. 4th-gen tensor cores with FP8 carry the diffusion math, and 3rd-gen RT cores + OptiX cover rendering, making it the fastest non-HBM card for generative media.

SDXL 1024² batch-8 ~1.6× vs 4090

Mid-scale fine-tuning

13B–34B QLoRA on a single card. 48 GB ECC means stable long runs without the corruption risk of consumer GDDR6X.

Llama-3 8B SFT ~17k tok/s

why L40S

FP8 throughput
without HBM pricing.

L40S delivers Hopper-class FP8 inference at a fraction of the H100 hourly rate. The right card when latency matters but you're not training from scratch.

| | L40S 48GB | A100 80GB | RTX 4090 | H100 80GB |
| --- | --- | --- | --- | --- |
| Architecture | Ada Lovelace | Ampere | Ada Lovelace | Hopper |
| CUDA cores | 18,176 | 6,912 | 16,384 | 14,592 |
| VRAM | 48 GB GDDR6 ECC | 80 GB HBM2e | 24 GB GDDR6X | 80 GB HBM3 |
| Memory bandwidth | 864 GB/s | 1,935 GB/s | 1,008 GB/s | 3,350 GB/s |
| FP16 / BF16 (dense) | 362 TFLOPS | 312 TFLOPS | ~165 TFLOPS | 756 TFLOPS |
| From / hr (spot) | $0.78 | $0.92 | $0.31 | $1.89 |

// prices are spot-market lows · refreshed every 60 s
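
To see how the hourly floors in the comparison relate to each other, the short sketch below expresses each card's price as a fraction of the H100 price. The figures are copied from the table at the time of writing and will drift with the live market; the function and variable names are illustrative only.

```python
# Hourly floors from the comparison table, in USD per hour.
# These are a snapshot; live marketplace prices change continuously.
SPOT_FLOOR_USD_PER_HR = {
    "L40S 48GB": 0.78,
    "A100 80GB": 0.92,
    "RTX 4090": 0.31,
    "H100 80GB": 1.89,
}


def price_relative_to(baseline: str, prices: dict[str, float]) -> dict[str, float]:
    """Return each card's hourly price as a fraction of the baseline card's price."""
    base = prices[baseline]
    return {card: price / base for card, price in prices.items()}


if __name__ == "__main__":
    ratios = price_relative_to("H100 80GB", SPOT_FLOOR_USD_PER_HR)
    for card, ratio in ratios.items():
        print(f"{card:<10} {ratio:6.1%} of the H100 hourly rate")
```

With these floors the L40S comes out at roughly 41% of the H100 rate, at the low end of the 40–60% range quoted for inference workloads.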

pricing

Two ways to rent.
Pay only for the minutes you use.

Every server is priced by its host. These are the live floors across the marketplace — you'll see hundreds of variants once you're in.

Spot

$0.78 / hr

≈ 0.0000059 BTC · 433 CLORE

Lowest possible rate
Per-minute billing
Can be interrupted by an on-demand renter
Best for batch training, rendering

Browse spot L40Ss

MOST RENTED

On-demand

$1.05 / hr

≈ 0.0000109 BTC · 800 CLORE

Guaranteed availability
No preemption, ever
Per-minute billing
Best for inference, dev work, demos

Rent on-demand

Pay with

Bitcoin on-chain

CLORE native token

USDT / USDC ERC-20 · BEP-20

workflow

Four steps to a running L40S.

No sales call. No quota request. No three-week procurement. The four steps below are all you need.

01 / FILTER

Pick your card

Filter the marketplace by L40S 48GB, country, GPU count, reliability score, network speed.

02 / RENT

Click rent

Choose a Docker image — PyTorch, vLLM, ComfyUI, Blender — or paste your own.

$ clore rent --gpu "L40S 48GB"

03 / CONNECT

SSH or Jupyter

You get a public endpoint, an SSH key, and Jupyter on port 8888 in under 90 s.

04 / STOP

Stop anytime

Billing is per-minute, prorated to the second. Stop the instance and the meter stops with it.
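
As a rough way to budget a short run, the sketch below estimates what a session costs at the spot and on-demand floors. It assumes the charge scales linearly with elapsed time, which is a simplification; the marketplace's exact rounding rules are not specified here, and `session_cost` is a hypothetical helper, not part of any Clore tooling.

```python
def session_cost(rate_usd_per_hr: float, seconds: int) -> float:
    """Estimate the cost of a session, assuming linear proration by elapsed time."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return rate_usd_per_hr * seconds / 3600


# Hourly floors from the pricing section (a snapshot, not live values).
SPOT_USD_PER_HR = 0.78
ON_DEMAND_USD_PER_HR = 1.05

if __name__ == "__main__":
    run_seconds = 37 * 60  # an example 37-minute job
    print(f"spot:      ${session_cost(SPOT_USD_PER_HR, run_seconds):.4f}")
    print(f"on-demand: ${session_cost(ON_DEMAND_USD_PER_HR, run_seconds):.4f}")
```

Under that assumption a 37-minute job costs about $0.48 on spot and about $0.65 on-demand, before any marketplace fees.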

faq

Questions hosts and renters ask.

Is L40S a substitute for H100?

For inference, often yes — FP8 throughput on Llama-3 70B is competitive at a fraction of the rental price. For training, the H100's HBM3 bandwidth and NVLink fabric still win. Pick L40S for serving, H100 for pretraining.

What's the rough cost-per-million-tokens for Llama-3 70B FP8 on L40S?

An L40S serving Llama-3 70B FP8 with vLLM and continuous batching pushes roughly 3,000-4,500 output tokens/second at batch saturation. At the $0.78/hr spot rate, that works out to about $0.05-$0.07 per million output tokens before the 2.5% spot fee. PoH staking halves the fee; reserved spot floors land you closer to $0.04/M. Numbers vary with prompt length and batch shape, so benchmark on your own traffic.
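
The arithmetic behind that estimate is simple enough to rerun with your own measurements. The sketch below is a minimal version; it assumes the fee is applied as a surcharge on the hourly rate, which may not match how the marketplace actually charges it, and the function name is illustrative.

```python
def cost_per_million_tokens(
    rate_usd_per_hr: float,
    tokens_per_sec: float,
    fee_fraction: float = 0.0,
) -> float:
    """USD per one million output tokens at a sustained throughput.

    fee_fraction is an assumed surcharge on the hourly rate (0.025 for 2.5%).
    """
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    tokens_per_hr = tokens_per_sec * 3600
    return rate_usd_per_hr * (1 + fee_fraction) / tokens_per_hr * 1_000_000


if __name__ == "__main__":
    # The throughput range and spot rate quoted in the answer above.
    for tps in (3000, 4500):
        print(f"{tps} tok/s -> ${cost_per_million_tokens(0.78, tps):.3f} per 1M tokens")
    # Same range with the 2.5% fee included.
    for tps in (3000, 4500):
        print(f"{tps} tok/s + fee -> ${cost_per_million_tokens(0.78, tps, 0.025):.3f}")
```

The same function reproduces the live-pod figure at the top of the page: $1.05/hr at 2,847 tok/s comes to about $0.10 per million tokens.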

Can I run vLLM with continuous batching on these GPUs?

Yes. The inference tier (T4, L4, L40S, A10) is exactly what vLLM's PagedAttention and continuous batching are tuned for. L40S handles 70B FP8 single-card with KV-cache headroom; A10 and L4 serve 7B-13B at high throughput; T4 covers Whisper, embeddings, and 7B INT8. Pull the official vLLM Docker image, point it at your model, expose port 8000.
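
Once the vLLM container is up, its OpenAI-compatible server accepts completion requests over HTTP. The sketch below builds and sends one using only the standard library. The address `203.0.113.10` is a documentation placeholder and `my-model` is a hypothetical model name; substitute the public endpoint of your rental and the model you actually launched.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str, model: str, prompt: str, max_tokens: int = 64
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url: str, model: str, prompt: str,
             max_tokens: int = 64, timeout: float = 60.0) -> str:
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(base_url, model, prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    # Placeholder endpoint and model name; replace both before running.
    print(complete("http://203.0.113.10:8000", "my-model", "Hello from an L40S:"))
```

Continuous batching happens server-side, so concurrent clients can each send plain requests like this one and let vLLM schedule them together.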

Does the GPU support FP8 and INT8 inference?

L40S has Ada FP8 tensor cores, which run the same FP8 formats (E4M3 and E5M2) as H100's Hopper tensor cores, at a fraction of the rental price. L4 also supports FP8. T4 and A10 predate FP8 but support INT8 on their tensor cores (Turing and Ampere, respectively) and excel at quantized 7B-13B serving. Pick L40S when FP8 throughput matters; pick A10 or T4 when $/request matters more.

What's typical p99 latency for 7B-class models on this tier?

On A10 or L4 with vLLM at batch size 1, time-to-first-token for a 7B FP16 model lands around 80-150 ms; p99 inter-token latency is 25-40 ms. L40S with FP8 cuts both roughly in half. T4 roughly doubles them. Real numbers depend on prompt length and concurrent batch size: low-batch interactive serving is fastest, high-batch saturation maximizes throughput.
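
To compare your own measurements against these ranges, collect per-token latencies from your traffic and take percentiles. The sketch below uses the nearest-rank definition, one of several common conventions, and the sample values are made up for illustration only.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical inter-token latencies in milliseconds, not measured data.
    inter_token_ms = [31, 28, 35, 40, 27, 33, 29, 38, 36, 30]
    print("p50:", percentile(inter_token_ms, 50), "ms")
    print("p99:", percentile(inter_token_ms, 99), "ms")
```

With only a handful of samples the p99 is simply the slowest observation, so gather a few thousand tokens' worth before drawing conclusions about tail latency.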

Can I MIG-partition this card for multi-tenant serving?

MIG (Multi-Instance GPU) is supported on A100, A30, and H100/H200, but not on L4, L40S, T4, or A10. For multi-tenancy on the inference tier, run multiple model replicas inside a single Docker container or use container-level resource limits. If you need hardware-isolated MIG slices, rent an A100 40GB and partition it into up to 7 instances.

workload spotlight

Real numbers on the NVIDIA L40S.

48 GB ECC + Ada FP8 + 350 W — the production substitute for H100 inference when supply is tight.

Llama-3 70B FP8 single-card serving

vLLM + TensorRT-LLM FP8

~720 tok/s aggregated, 16 concurrent

FP8 quant + 48 GB fits 70B with room for KV cache, typically at 40–60% of the price of an H100 for inference workloads.

Read the guide →

Hunyuan-Video FP8 production

ComfyUI + Ada FP8 + sequence parallel

~4.5 min per 5 s @ 720p

FP8 path nearly doubles Hunyuan throughput vs A6000 at the same VRAM — production-grade gen-video card.

Read the guide →

LLaVA-1.6 multimodal serving

vLLM + fp16

~24 images/s + 800 tok/s text

Vision-language SaaS pipeline — 48 GB holds vision encoder + Llama backbone + 16-batch KV simultaneously.

Read the guide →

Inference-tier comparison.

Run these on your rented NVIDIA L40S.

Compare with similar cards.

Have an NVIDIA L40S? Earn ~$647/mo by listing it.

Your training run
is 90 seconds away.
