Neomanex Research Lab

Reproducible benchmarks, methodology, and leaderboards for conversational AI and AI operations.

1 publication

Conversational AI · This research · benchmarks · Gnosari
Conversational Extraction: Which LLMs actually capture structured data from real dialogue?
11 LLMs from 3 providers on 44 multi-turn dialogues. gemini-2.5-pro enters at #3 (97.9%), tied with gpt-4o. Long-context stays the biggest discriminator (16.7 pp gap, n=6).
v1.1 · 11 models