
Despite not yet being a benchmark, the First Proof project is by far the best measure of model usefulness for science and math research available today, and I very much hope that frontier labs continue to take future rounds seriously.

https://www.daniellitt.com/blog/2026/2/20/mathematics-in-the-library-of-babel

#firstProof #mathematics #AI #machineLearning #research

OpenAI publishes attempted solutions for the First Proof competition.

The test consists of unpublished math problems, designed to test reasoning without prior knowledge from training data. According to OpenAI, several problems were solved. External validation of the formal proofs by the initiators is still pending. #OpenAI #FirstProof #JamesRLee
https://www.all-ai.de/news/beitrage2026/mathe-first-proof

Kimon Fountoulakis (@kfountou)

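The author questions whether the result was genuine generalization, and in what sense it counts as generalization. The author points out that when people say 'first proof', it usually means only that they could not find a complete end-to-end proof in the literature themselves, while the key steps very likely already existed, and stresses the importance of defining 'first proof' and of verifying such claims.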

https://x.com/kfountou/status/2022670003191902263

#research #proofs #ai #firstproof

Jakub Pachocki (@merettm)

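The author expresses high hopes for the 'First Proof' challenge and stresses that new frontier research is important for evaluating the capabilities of next-generation AI models. The author states that they ran their own model internally on the 10 proposed problems with limited human supervision, which suggests an important experiment in evaluating AI's mathematical proof ability and autonomy.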

https://x.com/merettm/status/2022517085193277874

#firstproof #ai #theoremproving #research #ml

This really does get you hooked, yes... #firstProof

Please help promote this project called "First Proof" led by Mohammed Abouzaid (Stanford), Nikhil Srivastava (Cal), Rachel Ward (UT Austin), and Lauren Williams (Harvard). The goal is to understand the capabilities of AI systems on problems that come up in math research. We have a collection of research problems for which solutions have not yet been posted online, so it's a good testbed. The solutions will come out in just one week. Take a crack at it! #FirstProof #1stProof

https://arxiv.org/abs/2602.05192

#firstproof
