Project Bench Update: 50/200 Solved — All 100% Correct, EcoKure Blog

A quick update on Project Bench progress.

As of today, Jarvi3 AI has solved 50 of 200 Project Bench tasks. Every solution — all 50 — has been verified as 100% correct.

What is Project Bench?

Project Bench is an internal benchmark of 200 complex, multi-file software engineering tasks. Where SWE-bench Verified tests isolated bug fixes, Project Bench tasks require:

Understanding large, interconnected codebases
Reasoning across multiple files and modules
Making architectural decisions, not just patches
Producing solutions that pass comprehensive test suites

These are the kinds of tasks that take experienced engineers hours — not minutes.

Why 100% Correctness Matters

Some AI benchmarks award partial credit. Under that kind of scoring, a model that gets 60% of the test cases right can still be counted as "solving" the problem.

We don't accept partial solutions. Jarvi3 either solves a Project Bench task completely correctly, or it doesn't solve it at all. The 50 tasks it has solved: all correct. No partial marks.
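To make the difference concrete, here is a minimal sketch of the two scoring rules. It assumes each task reports how many tests passed out of a total; the `TaskResult` type, the task IDs, and the numbers are hypothetical and only illustrate the idea. This is not the actual Project Bench harness.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    task_id: str       # hypothetical identifier, not a real Project Bench ID
    tests_passed: int
    tests_total: int


def pass_fraction(result):
    """Fraction of a task's tests that passed (0.0 if the task has no tests)."""
    if result.tests_total == 0:
        return 0.0
    return result.tests_passed / result.tests_total


def partial_credit_score(results):
    """Partial credit: average the per-task pass fractions."""
    return sum(pass_fraction(r) for r in results) / len(results)


def strict_solved(results):
    """All-or-nothing: a task counts only if every one of its tests passes."""
    return [
        r.task_id
        for r in results
        if r.tests_total > 0 and r.tests_passed == r.tests_total
    ]


# Made-up example data: one full pass, one 60% pass, one total failure.
results = [
    TaskResult("task-a", 10, 10),
    TaskResult("task-b", 6, 10),
    TaskResult("task-c", 0, 10),
]

print(strict_solved(results))         # only the fully passing task is listed
print(partial_credit_score(results))  # the 60% task still contributes here
```

Under the strict rule, the 60% task contributes nothing. Under partial credit, it still raises the score.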

This is a deliberate product decision, not a benchmark gaming strategy. In real engineering work, a half-correct answer is often worse than no answer — it misleads, it breaks production, it wastes review time.

The Architecture Behind It

Moving from SWE-bench to Project Bench required two things:

Deeper codebase traversal — Project Bench tasks span multiple files. We extended the code generation lane to maintain context across files and modules, not just within a single function.
SuperMath Brain integration — A significant portion of Project Bench tasks involve algorithms with provable correctness properties. Routing these through the deterministic logical reasoning layer — rather than generation — eliminates a major source of errors.
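The routing idea in the second point can be sketched roughly as follows. Everything here is an assumption made for illustration: the lane names, the tag set, and the `route` function are hypothetical, and the real routing logic has not been published.

```python
from enum import Enum


class Lane(Enum):
    DETERMINISTIC = "deterministic"  # hypothetical name for the logical reasoning layer
    GENERATION = "generation"        # hypothetical name for the code generation lane


# Hypothetical tags marking tasks whose core algorithm has provable
# correctness properties; the real classification criteria are unknown.
PROVABLE_TAGS = frozenset({"sorting", "graph-search", "arithmetic"})


def route(task_tags):
    """Send a task to the deterministic lane if any of its tags is provable."""
    if PROVABLE_TAGS & set(task_tags):
        return Lane.DETERMINISTIC
    return Lane.GENERATION


print(route({"sorting", "io"}))  # has a provable tag
print(route({"refactor"}))       # no provable tag
```

The point of the sketch is only the split itself: tasks with a checkable algorithmic core bypass free-form generation, and everything else takes the generation path.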

More technical detail on SuperMath Brain is coming. We are evaluating whether to open-source it.

Timeline

We're 25% through the benchmark (50 of 200 tasks) with a 0% error rate on the tasks solved so far. We're on track for 200/200 by June 2026.

When that happens, we'll publish a full technical write-up.

Follow along at jvi3.com.