The Laude Institute just dropped its first batch of Slingshots grants, targeting one of AI's thorniest problems: how to actually measure what these systems can do. The accelerator program is backing 15 projects focused on AI evaluation, offering the kind of compute power and engineering support that most academic researchers can only dream of.
The Laude Institute has made a major play in AI evaluation with its debut Slingshots program, and the timing matters. As AI capabilities expand across every sector, the industry is wrestling with a fundamental question: how do you actually measure what these systems can do?
The institute announced 15 projects on Thursday, each tackling different pieces of the AI evaluation puzzle. Unlike traditional academic grants that leave researchers scrambling for compute resources, Slingshots offers the full package - funding, massive compute power, and dedicated engineering support that would make most university labs jealous.
The catch? Recipients need to deliver something concrete, whether that's a startup, open-source code, or another tangible artifact. It's a hybrid model that bridges the gap between academic research and Silicon Valley's move-fast mentality.
Several projects in the cohort should ring bells for anyone following AI development. Terminal Bench is back with its command-line coding benchmark, while the ARC-AGI project continues its long-running quest to create meaningful AGI tests.
But the really interesting action is happening with the newer approaches. Formula Code, a collaboration between Caltech and UT Austin researchers, is building evaluations specifically for AI agents' code optimization skills. Meanwhile, Columbia's BizBench wants to create comprehensive benchmarks for "white-collar AI agents" - the kind that might soon be handling your expense reports or client emails.
The cohort also includes some well-known names. SWE-Bench co-founder John Boda Yang is leading CodeClash, a dynamic competition-based framework that builds on his previous work in AI code evaluation. Yang is worried about something that should concern the entire industry: benchmarks becoming proprietary company tools rather than shared scientific standards.
"I do think people continuing to evaluate on core third-party benchmarks drives progress," Yang told TechCrunch. "I'm a little bit worried about a future where benchmarks just become specific to companies."
That concern hits at the heart of why Slingshots matters. As OpenAI, Google, and Microsoft race to build increasingly capable AI systems, independent evaluation becomes crucial for understanding what these models can actually do - and more importantly, what they can't.
The program's focus on AI evaluation isn't accidental. While flashy demos and headline benchmark scores grab attention, the hard work of rigorous, independent testing often gets overlooked. Yet without reliable evaluation methods, the industry is essentially flying blind, making claims about AI capabilities that may not hold up under scrutiny.
Other Slingshots projects reach beyond evaluation. Some propose new structures for reinforcement learning, while others tackle model compression - the art of making AI systems smaller and more efficient without sacrificing performance. Each represents a different bet on where AI development needs the most help.
The accelerator model itself signals a shift in how AI research gets funded and executed. Traditional academic timelines, measured in years, don't match the breakneck pace of AI development. By offering startup-level resources with academic rigor, Slingshots could become a template for bridging that gap.
The Laude Institute's Slingshots program arrives at a pivotal moment for AI development. As the technology races ahead, the ability to rigorously evaluate AI systems becomes more critical than ever. By funding 15 diverse projects with serious resources, the program could help ensure that AI evaluation keeps pace with AI development - preventing the industry from building systems we can't properly understand or measure.