AI RANKINGS
What's best for text, image, video, code, search, reasoning, and more
Showing 29 rankings
Platforms
4 destinations
LMArena / Arena.ai
The central arena where community blind votes shape Elo rankings across text, vision, code, and video.
Artificial Analysis
Independent benchmark matrix for practical model comparison, including latency and provider tradeoffs.
llm-stats.com
Single dashboard that tracks LLM, image, video, TTS, STT, and embedding leaderboards together.
LLM360 Decentralized Arena
Open decentralized arena project hosted on Hugging Face as an alternative ranking ecosystem.
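Several of the arenas above (LMArena and its text, image, and video tracks in particular) rank models by converting blind pairwise votes into Elo-style ratings. A minimal sketch of that mechanism, with an assumed K-factor of 32 and a 1000-point starting rating (hypothetical parameters, not the arenas' actual configuration):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one blind-vote outcome: winner beat loser."""
    ra, rb = ratings[winner], ratings[loser]
    ea = expected_score(ra, rb)
    # Winner gains in proportion to how surprising the win was
    ratings[winner] = ra + k * (1 - ea)
    ratings[loser] = rb - k * (1 - ea)

# Hypothetical votes between anonymized models
ratings = {"model-a": 1000.0, "model-b": 1000.0, "model-c": 1000.0}
votes = [("model-a", "model-b"), ("model-a", "model-c"),
         ("model-b", "model-c"), ("model-a", "model-b")]
for winner, loser in votes:
    update(ratings, winner, loser)

leaderboard = sorted(ratings, key=ratings.get, reverse=True)
```

Because voters judge anonymized outputs side by side, ratings reflect preference strength rather than raw win counts: an upset over a higher-rated model moves the score more than a win over a lower-rated one.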
Text
4 destinations
LMArena Text
Large-scale blind-vote Elo rankings for frontier chat models and reasoning performance.
OpenLM.ai
Aggregates Chatbot Arena with ARC-AGI and AAII-style evaluation signals in one view.
Scale AI SEAL
Expert-led benchmarking for coding and reasoning, designed for enterprise-grade comparisons.
Vellum LLM Leaderboard
Prioritizes non-saturated benchmarks so rankings reflect current differentiation among top models.
Image
4 destinations
LMArena Text-to-Image
Human blind-vote leaderboard focused on text-to-image quality and output preference.
AA Image Arena
Tracks image generation by specialized categories such as anime, typography, UI, and photorealism.
AA Image Edit
Dedicated lane for image editing and inpainting performance, separate from generation-only benchmarks.
HF Image Arena
Hugging Face hosted mirror for quick text-to-image leaderboard access and comparisons.
Video
3 destinations
LMArena Text-to-Video
Text-to-video leaderboard that includes tracks for outputs with and without generated audio.
LMArena Image-to-Video
Ranking track for animating static images into motion clips with prompt control.
AA Video Arena
Pairwise blind-vote video benchmark for practical text-to-video model quality.
Audio
4 destinations
AA Speech Arena
Blind TTS evaluation across assistant, customer service, knowledge, and entertainment scenarios.
HF TTS Arena
Community voting interface for speech models with hidden identities during judgment.
AudioArena.ai
Conversation-first benchmark answering which voice model performs best in realistic calls.
Music Arena
Research-focused blind-vote ranking for text-to-music generation systems.
Code
2 destinations
LMArena WebDev
Head-to-head web app generation benchmark including visual replication and interaction quality.
LMArena Code
General coding leaderboard for implementation quality, correctness, and developer preference.
Vision
1 destination
Search
1 destination
Embeddings
2 destinations
Reasoning
2 destinations
ARC-AGI
Fluid-intelligence benchmark designed to resist simple memorization and benchmark gaming.
LMArena Expert
Focuses on the hardest prompt tier and occupational slices for advanced reasoning evaluation.
Design
2 destinations