AI Research Monthly: Feb-Mar 2026 — The Exam Everyone Trusted Was Broken
Your friend who reads AI papers so you don't have to. Only findings with real numbers — no hype, no "vibe coding is a trend".

Source: DEV Community
1. The Most-Used AI Coding Test Had Broken Answer Keys (And Nobody Noticed for Months)

What is SWE-bench Verified? It's a benchmark (standardized test) for measuring how well AI can write code. Here's how it works: it takes 500 real GitHub issues — actual bugs reported by real developers in real open-source projects — gives the AI the buggy source code, and asks it to write a patch that fixes the bug. Then it runs the project's own test suite (automated tests) to check whether the fix actually works. Your score, called the "resolve rate," is the percentage of the 500 bugs you fixed correctly.

Think of it as a coding exam, but instead of textbook exercises, the questions are real bugs from real projects. Every major AI company (OpenAI, Google, Anthropic) used this test to claim "our model is the
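The evaluation loop described above can be sketched in a few lines. This is a minimal illustration of how a resolve rate is computed, not the real SWE-bench harness: the names `Task`, `generate_patch`, and `run_tests` are hypothetical placeholders standing in for the benchmark's actual machinery.

```python
# Sketch of a SWE-bench-style evaluation loop (illustrative only).
# `Task`, `generate_patch`, and `run_tests` are hypothetical placeholders,
# not the real benchmark's API.
from dataclasses import dataclass

@dataclass
class Task:
    issue_text: str     # the GitHub issue describing the bug
    repo_snapshot: str  # the buggy source code at the issue's commit

def evaluate(tasks, generate_patch, run_tests):
    """Return the resolve rate: the fraction of tasks whose
    model-written patch makes the project's own test suite pass."""
    resolved = 0
    for task in tasks:
        # The model sees the issue text and the buggy code...
        patch = generate_patch(task.issue_text, task.repo_snapshot)
        # ...and gets credit only if the patched repo passes its tests.
        if run_tests(task.repo_snapshot, patch):
            resolved += 1
    return resolved / len(tasks)
```

With 500 tasks, a model that passes 250 of the test suites would score a 50% resolve rate — which is exactly why the correctness of those test suites (the "answer keys") matters so much.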