Top Multi-SWE-bench score in 2025?
19
Ṁ29k
Dec 31

SWE-bench is a great AI benchmark, but it is Python-only. Multi-SWE-bench is the same thing with multiple programming languages: C, C++, Java, JavaScript, TypeScript, Go, Rust.

A Claude 3.7 Sonnet-based agent achieved a score of 19% on 2025-03-29, which is currently the best score. The score used for resolution will be rounded to a whole percent. ("Rounding half up" to be exact, see Rounding.)
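
As a rough illustration of the rounding rule, here is a minimal Python sketch. It assumes scores are rounded to the nearest whole percent, which the 19% figure suggests but the description does not state outright. The helper name `round_half_up` is hypothetical, and the inputs are illustrative apart from 21.62, the figure quoted in the comments.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(score_percent: str) -> int:
    """Round a percentage (given as a string) to a whole number, with ties going up.

    Assumption: resolution uses whole-percent precision.
    """
    # Decimal avoids binary floating-point surprises; quantize to zero decimal places.
    return int(Decimal(score_percent).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


for raw in ["19.49", "19.50", "20.50", "21.62"]:
    print(raw, "->", round_half_up(raw))
```

Python's built-in `round()` is not a substitute here: it rounds ties to the nearest even number, so `round(20.5)` gives 20, while rounding half up gives 21.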

Resolution will be based primarily on the official leaderboard, but announcements from other reputable organizations will also be considered.

See also /SG/top-swebench-verified-score-in-2025

Have you tried gemini 2.5 pro experimental on it yet?

@ian The leaderboard on the website shows something with Gemini 2.5 Pro at 21.62%:

https://multi-swe-bench.github.io/#/

(Not sure what Mopenhands is...)