MRCR v2
Long context · Multi-Round Conversational Reasoning — tests whether a model can maintain facts and context across long multi-turn dialogues.
At a glance
🏆 Top score: 50.78 % (Claude Sonnet 4.6)
Total results: 9
Models tested: 9
Providers: 5
Verified · Self-reported: 9 · 0
Average: 28.67 %
Median: 23.76 %
Range: 15.41 – 50.78 %
Latest result: Apr 18, 2026
Score distribution
(Histogram of the 9 scores; chart not reproduced in text.)
Methodology
Conversational chains of 8–32 turns requiring the model to track entities, preferences, and reasoning steps.
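The benchmark's item format is not published on this page, so the following is a minimal sketch of what a multi-round tracking item could look like, assuming a plant-facts-then-query design; all names (`make_item`, `score`, the entity and preference pools) are hypothetical, and the real grading metric is unknown.

```python
import random

def make_item(n_turns=8, seed=0):
    """Build a synthetic dialogue that plants one fact per turn (later turns
    may overwrite earlier preferences), then asks about one planted fact."""
    rng = random.Random(seed)
    entities = ["Alice", "Bob", "Carol", "Dan"]
    prefs = ["tea", "coffee", "cocoa", "juice"]
    facts = {}
    turns = []
    for _ in range(n_turns):
        who, what = rng.choice(entities), rng.choice(prefs)
        facts[who] = what  # the model must track the *latest* value
        turns.append({"role": "user", "content": f"Note: {who} prefers {what}."})
        turns.append({"role": "assistant", "content": "Noted."})
    target = rng.choice(list(facts))
    turns.append({"role": "user", "content": f"What does {target} prefer?"})
    return turns, facts[target]

def score(answer, expected):
    """Exact-substring grading; a stand-in for the benchmark's real metric."""
    return 1.0 if expected in answer.lower() else 0.0

turns, expected = make_item(n_turns=8, seed=42)
print(len(turns), "turns; expected answer:", expected)
print(score(f"I believe they prefer {expected}.", expected))  # 1.0
```

A chain of 8 fact-planting turns yields 17 messages (16 plant/ack pairs plus the final question); scaling `n_turns` toward 32 stretches the distance between the planted fact and the query, which is the axis this benchmark stresses.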
Limitations
Synthetic conversations may not capture organic dialogue patterns.
By provider
- Anthropic · 3 models · Best: 50.78 % (Claude Sonnet 4.6) · Average: 38.32 %
- Google · 1 model · Best: 33.17 % (Gemma 4 31B) · Average: 33.17 %
Full leaderboard
Showing 9 of 9

| # | Model | Provider | Score (%) |
|---|---|---|---|
| 1 | Claude Sonnet 4.6 | Anthropic | 50.78 |
| 2 | Claude Sonnet 4 | Anthropic | 47.54 |
| 3 | Gemma 4 31B | Google | 33.17 |
| 4 | GPT-5.4 | OpenAI | 29.84 |
| 5 | GPT-5.4 mini | OpenAI | 23.76 |
| 6 | Qwen3.5-27B | Alibaba | 20.72 |
| 7 | Grok 4.20 | xAI | 20.19 |
| 8 | Claude Haiku 4.5 | Anthropic | 16.65 |
| 9 | GPT-5.4 nano | OpenAI | 15.41 |