NoLiMa
Long-context information retrieval without literal matching: models must use semantic reasoning to find relevant facts, not string matching.
At a glance
- 🏆 Top score: 83.46 % (Claude Opus 4.7)
- Total results: 6
- Models tested: 6
- Providers: 4
- Verified · Self-reported: 6 · 0
- Average: 31.72 %
- Median: 11.94 %
- Range: 2.92 – 83.46 %
- Latest result: Apr 18, 2026
Score distribution: histogram not reproduced; see the leaderboard below for individual scores.
Methodology
Retrieval tasks in which the query and the relevant passage share no literal token overlap, so the model must rely on semantic understanding rather than keyword matching.
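To make the task concrete, here is a minimal illustrative sketch (not the official NoLiMa harness; the needle and question below are hypothetical examples in the benchmark's style). The question and the hidden fact share no content words, so a literal keyword retriever finds nothing to latch onto; answering requires knowing, semantically, that the entities are related.

```python
# Hypothetical NoLiMa-style item: no lexical overlap between question and needle.
needle = "Yuki lives next to the Semper Opera House."  # hidden fact in a long context
question = "Which character has been to Dresden?"      # needs world knowledge to link

def literal_overlap(a: str, b: str) -> set[str]:
    """Content-word overlap after lowercasing and dropping common stopwords."""
    stop = {"the", "to", "a", "has", "been", "which", "lives", "next"}
    tokens_a = {w.strip(".?,").lower() for w in a.split()} - stop
    tokens_b = {w.strip(".?,").lower() for w in b.split()} - stop
    return tokens_a & tokens_b

# String matching fails: the overlap is empty, yet the needle answers the question.
print(literal_overlap(needle, question))  # -> set()
```

A retriever scoring passages by shared tokens would rank this needle no higher than any distractor, which is exactly the failure mode the benchmark isolates.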
Limitations
Focuses on retrieval — does not measure downstream reasoning on retrieved facts.
By provider
- Anthropic · 2 models: best 83.46 % (Claude Opus 4.7), average 43.19 %
- Alibaba · 1 model: best 73.46 % (Qwen3.5 397B A17B), average 73.46 %
Full leaderboard
Showing 6 of 6

| # | Model | Provider | Score (%) |
|---|---|---|---|
| 1 | Claude Opus 4.7 | Anthropic | 83.46 |
| 2 | Qwen3.5 397B A17B | Alibaba | 73.46 |
| 3 | Grok 4.20 | xAI | 14.02 |
| 4 | GPT-5.4 nano | OpenAI | 9.85 |
| 5 | GPT-5.4 mini | OpenAI | 6.62 |
| 6 | Claude Haiku 4.5 | Anthropic | 2.92 |
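The summary statistics in the "At a glance" section can be recomputed from the leaderboard scores above; a quick sanity check, assuming a simple arithmetic mean and a midpoint median over the six results:

```python
import statistics

# Scores (percent) copied from the leaderboard above.
scores = [83.46, 73.46, 14.02, 9.85, 6.62, 2.92]

mean = statistics.mean(scores)      # ~31.72, matching the reported average
median = statistics.median(scores)  # midpoint of 14.02 and 9.85, ~11.94
lo, hi = min(scores), max(scores)   # 2.92 and 83.46, the reported range

print(mean, median, lo, hi)
```

The wide gap between mean (~31.72 %) and median (~11.94 %) reflects the two outlier scores above 70 % pulling the average up.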