xAI's Grok models have shown incremental gains on the demanding FrontierMath benchmark, which features unpublished, research-level math problems where even leading systems like early Grok variants and o1-preview scored below 2% in initial evaluations. Grok-3 mini recently posted one of the stronger results at 6%, while Grok 4 variants hover in the low-to-mid teens in select Tier 4 tests amid competition from GPT-5 Pro and Gemini derivatives. With the June 30 resolution deadline just weeks away and no confirmed xAI model release or Epoch AI re-evaluation scheduled, trader sentiment hinges on whether ongoing training runs or tool-augmented inference can push scores higher before cutoff, in a landscape where math reasoning advances remain uneven across frontier labs.
Experimental AI-generated summary based on Polymarket data. This is not trading advice and does not influence how this market resolves. · Updated
$21,913 Vol.
25%+: 56%
30%+: 40%
40%+: 60%
50%+: 29%
This market will resolve according to Epoch AI's FrontierMath benchmarking leaderboard (https://epoch.ai/frontiermath) for Tiers 1-3. Studies that are not included in the leaderboard (e.g. https://x.com/EpochAIResearch/status/1945905796904005720) will not be considered.
The primary resolution source will be information from Epoch AI; however, a consensus of credible reporting may also be used.
Market opened: Jan 30, 2026, 12:01 AM ET
Resolver
0x65070BE91...
Frequently asked questions