Last updated: May 2026

State of AI Models — May 2026

Every major AI model compared in one place. Pricing, capabilities, context windows, and honest ratings — updated monthly. Bookmark this page.

Feature	🟢 ChatGPT	🟠 Claude	🔵 Gemini	🔷 Perplexity	🟦 Copilot	⚫ Grok
Flagship Model	GPT-5.5	Opus 4.8	Gemini 3.5 Flash	Multi-model (GPT-5.5, Claude, etc.)	GPT-5.5 (Microsoft hosted)	Grok 4.3
Consumer Price	$20/mo (Plus) / $200/mo (Pro)	$20/mo (Pro) / $100/mo (Max 5x) / $200/mo (Max 20x)	$19.99/mo (AI Pro) / $99.99/mo (Ultra)	$20/mo (Pro)	$20/mo (Pro) / $30/mo (M365)	Included with X Premium+ ($16/mo)
Free Tier	✓	✓	✓	✓	✓	—
Context Window	256K tokens	200K standard / 1M beta	1M tokens	N/A (search-based)	128K tokens	128K tokens
Best For	Generalists, beginners, creative tasks	Writers, developers, agents, long docs	Google users, speed-sensitive work, multimodal	Researchers, fact-checkers, journalists	Microsoft ecosystem users, Office workers	X/Twitter power users, unfiltered analysis
Writing	4/5	5/5	4/5	3/5	3/5	3/5
Coding	5/5	5/5	4/5	2/5	4/5	4/5
Research	4/5	4/5	4/5	5/5	3/5	4/5
Speed	4/5	4/5	5/5	4/5	4/5	4/5
Value	4/5	4/5	5/5	4/5	3/5	3/5
Web Search	✓	✓	✓	✓	✓	✓
Image Gen	✓	—	✓	✓	✓	✓
Voice Mode	✓	—	✓	—	✓	—
File Upload	✓	✓	✓	✓	✓	✓
Code Exec	✓	✓	✓	—	—	—
API Access	✓	✓	✓	✓	—	✓
Mobile App	✓	✓	✓	✓	✓	✓
Strengths	+ Plugin ecosystem + Native image generation + Voice mode + Broadest feature set + Improved factuality (GPT-5.5)	+ Agentic coding leader + Calibrated honesty + Superior writing + Instruction following + API $5/$25 per M input/output tokens	+ ~4x faster output than other frontier models + Google Workspace integration + Best free tier + Multimodal (video/audio) + API $1.50/$9 per M input/output tokens	+ Real-time citations + Source transparency + Research-grade accuracy + Multi-model access	+ Office 365 integration + Windows native + Free web access + Enterprise-friendly	+ Real-time X/Twitter data + Unfiltered responses + Image generation + Strong reasoning + Competitive API pricing
Weaknesses	– Usage limits on Thinking model – No long-context flat pricing – Generic writing style	– No image generation – Smaller plugin ecosystem – Fewer integrations	– Writing still behind Claude – API price doubles above 200K context – Quality varies by task	– Not ideal for creative writing – Weaker in long conversations – No custom workflows	– Relies on OpenAI models – Less customizable – Aggressive upselling	– Tied to X ecosystem – Less polished UX – Smaller developer ecosystem


Quick Verdicts

🟢 ChatGPT (OpenAI)

ChatGPT has the broadest feature set and the biggest ecosystem. If you want one tool that does everything (search, images, voice, code), it is still the default. GPT-5.5 hallucinates less than GPT-5.4, but its writing is competent rather than great.

🟠 Claude (Anthropic)

Opus 4.8 leads in agentic coding and calibrated honesty: it flags its own uncertainty instead of bluffing. It is the best writer and coder for demanding work, with 200K tokens of standard context (1M in beta). No image generation.

🔵 Gemini (Google)

Gemini 3.5 Flash is Google's default model in 2026, offering roughly 4x faster output than other frontier models, the best free tier, and tight Workspace integration. Its writing still trails Claude's. For a three-way benchmark comparison, see our Opus 4.8 vs GPT-5.5 vs Gemini breakdown.

🔷 Perplexity (Perplexity AI)

Built for research, not conversation. Every answer comes with citations. If you need facts with sources, this is the tool. Not great for creative work or coding.

🟦 Copilot (Microsoft)

Makes sense if you already pay for Microsoft 365: you get Microsoft-hosted GPT-5.5 inside Office, and that integration is its main edge. Otherwise, you're getting repackaged OpenAI models with fewer features.

⚫ Grok (xAI)

Grok 4.3 is the X/Twitter-native option: real-time social data and unfiltered analysis. The ecosystem is small and the UX is rough, but reasoning and API pricing are competitive.

How We Rate

Ratings are scores from 1 to 5 for Writing, Coding, Research, Speed, and Value.

They reflect practical, day-to-day usefulness for knowledge work, not benchmark scores. We review them monthly and revise them when a model ships a major update.


Updated monthly. Last verified: May 2026.