VCBench: The AGI Benchmark for Quant VC

Paper
VCBench: Benchmarking LLMs in Venture Capital.
Authors
Rick Chen (University of Oxford), Joseph Ternasky, Afriyie Samuel Kwesi, Ben Griffin (University of Oxford), Aaron Ontoyin Yin, Zakari Salifu, Kelvin Amoaba, Xianling Mu (University of Oxford), Fuat Alican, Yigit Ihlamur (Vela Research).
Venue
arXiv preprint, September 2025.
Status
Public benchmark at vcbench.com. arXiv:2509.14448.
Research program
The evaluation infrastructure of Vela's quant VC research. Every Vela model is measured against VCBench as the decisive public test.

What VCBench contributes to quant VC

VCBench is the first AGI benchmark for venture capital. It is a standardized, anonymized, publicly hosted dataset of 9,000 founder profiles with ground-truth success labels, designed so that language models, human investors, and specialized quant VC models can all be measured on the same task under the same conditions. Before VCBench, quant VC research had no shared yardstick. Every paper defined its own dataset, its own success definition, and its own evaluation protocol. Comparison across papers was essentially impossible.

VCBench closes that gap. It specifies a fixed founder-success prediction task, holds the label schema constant ($500M+ IPO, acquisition, or total funding), scales the prevalence to a statistically stable 9 percent, and releases the data in two formats: structured JSON for custom ML models, and anonymized prose for direct LLM input. The benchmark paper evaluates nine state-of-the-art LLMs and establishes that several of them surpass human investor baselines on the task, including Y Combinator and tier-1 VC firms. The leaderboard is public and open to new submissions at vcbench.com.
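The structured JSON format might look like the following record. This is an illustrative sketch only: the field names, bucket labels, and values below are assumptions, not VCBench's actual schema.

```python
import json

# Hypothetical anonymized founder record. Field names and bucket labels are
# illustrative assumptions, not the benchmark's published schema.
profile = {
    "founder_id": "f_00421",
    "education": [
        {"degree_cluster": "MS Computer Science", "qs_rank_bucket": "top-50"}
    ],
    "jobs": [
        {"industry_cluster": "enterprise software", "duration_bucket": "3-5 years"}
    ],
    "num_companies_founded": 2,
    "label": 0,  # 1 = success ($500M+ outcome), 0 = unsuccessful
}

serialized = json.dumps(profile, indent=2)
assert json.loads(serialized) == profile  # round-trips cleanly
```

The anonymized-prose variant would render the same fields as a short narrative paragraph for direct LLM input.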

What is quant VC, and where does VCBench fit?

Quant VC is the application of quantitative, reproducible, empirically validated methods to venture capital decision-making. It treats venture capital as a rare-event prediction problem that can be modeled, measured, and improved with the same rigor that quantitative finance brings to credit risk or quantitative medicine brings to diagnostic screening. The discipline requires quantitative scoring against honest baselines, reproducible methodology, and interpretability that allows every prediction to be audited.

VCBench is the measurement infrastructure of quant VC. Every Vela model, from the multi-agent systems line (Founder-GPT, SSFF) to the Think-Reason-Learn family (GPTree, Random Rule Forest, Reasoned Rule Mining, Policy Induction), is evaluated on VCBench as a decisive public test. VCBench also defines the human baselines that every quant VC model must clear to justify deployment: the market index at 1.9 percent real-world precision, Y Combinator at approximately 1.7x the index, and tier-1 VC firms at approximately 2.9x the index.

How is VCBench constructed?

VCBench is built from LinkedIn and Crunchbase founder data through a four-stage pipeline: coverage improvement, format standardization and filtering, anonymization at the entry and dataset level, and iterative field selection driven by adversarial re-identification testing. Every stage is designed to serve quant VC evaluation specifically, where the dual goal is maximum predictive signal with minimum contamination from name-based LLM recall.

Scale and success definition. The benchmark contains 9,000 founder profiles, 810 of which are labeled successful (9 percent prevalence). Success means the founder's most recently founded company reached a $500M+ IPO, was acquired above $500M, or raised more than $500M in total funding. Unsuccessful means the company raised $100K to $4M but did not achieve an exit, IPO, or substantial follow-on funding within eight years of founding. Companies were mostly founded in the United States between 2010 and 2018. The 9 percent success prevalence is higher than the 1.9 percent real-world base rate, which is an intentional design choice to stabilize statistical testing at a manageable sample size.
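The labeling rule above can be sketched as a small function. The dollar thresholds and eight-year horizon come from the benchmark's definitions; the function signature and the exclusion of in-between cases are illustrative assumptions.

```python
# Sketch of VCBench's labeling rule. Thresholds ($500M success, $100K-$4M
# failure band, eight-year horizon) follow the definitions above; the
# record fields and the None-for-excluded convention are assumptions.
def label_founder(ipo_usd=0, acquisition_usd=0, total_funding_usd=0,
                  years_since_founding=0):
    """Return 'success', 'unsuccessful', or None (fits neither definition)."""
    if max(ipo_usd, acquisition_usd, total_funding_usd) >= 500_000_000:
        return "success"
    if (100_000 <= total_funding_usd <= 4_000_000
            and years_since_founding >= 8):
        return "unsuccessful"
    return None  # e.g. mid-sized raise, or outcome not yet observable

assert label_founder(ipo_usd=750_000_000) == "success"
assert label_founder(total_funding_usd=2_000_000,
                     years_since_founding=9) == "unsuccessful"
assert label_founder(total_funding_usd=50_000_000,
                     years_since_founding=9) is None
```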

Format standardization and filtering. LinkedIn contributes dense but irregular text. Crunchbase contributes structured but incomplete records. The cleaning pipeline reduced the industry vocabulary from 314 to 61 clusters (an 80.6 percent reduction), education degrees from 2,155 to 404 unique entries (81.3 percent), education records from 20,573 to 15,620 (24.1 percent), and job records from 45,975 to 41,183 (10.4 percent).
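The quoted reduction percentages follow directly from the before/after counts:

```python
# Recompute the vocabulary-reduction percentages quoted above from the
# before/after counts reported in the paper.
stages = {
    "industry clusters": (314, 61),
    "education degrees": (2155, 404),
    "education records": (20573, 15620),
    "job records":       (45975, 41183),
}
reductions = {name: round(100 * (1 - after / before), 1)
              for name, (before, after) in stages.items()}
print(reductions)
# {'industry clusters': 80.6, 'education degrees': 81.3,
#  'education records': 24.1, 'job records': 10.4}
```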

Anonymization. Founder names, company names, locations, and exact dates are removed. Industry labels are clustered into 61 groups, each containing at least 10 founders. Education prestige is preserved through QS university rankings (bucketed or unbucketed, depending on the leakage trade-off), and career durations are bucketed into year ranges. Adversarial re-identification testing was run with two classes of attacker: DeepSeek-R1 offline and Gemini-2.5-Pro with web search online. The final format reduces online re-identification from 77.0 percent (on the raw JSON) to 15.1 percent, and offline re-identification from 17.2 percent to 1.3 percent (a 92 percent reduction).
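Two of these transformations are easy to make concrete. The sketch below buckets career durations into year ranges and merges industry labels whose cluster would fall under the 10-founder minimum; the specific bucket edges and the "other" fallback label are assumptions, not the paper's exact choices.

```python
from collections import Counter

# Illustrative anonymization helpers. Bucket edges and the "other"
# fallback are assumptions; only the 10-founder minimum is from the paper.
def bucket_duration(months):
    years = months / 12
    if years < 1:
        return "<1 year"
    if years < 3:
        return "1-3 years"
    if years < 5:
        return "3-5 years"
    return "5+ years"

def enforce_min_cluster(labels, k=10):
    """Replace any industry label held by fewer than k founders."""
    counts = Counter(labels)
    return [lab if counts[lab] >= k else "other" for lab in labels]

assert bucket_duration(18) == "1-3 years"
assert enforce_min_cluster(["ai"] * 12 + ["niche"] * 3) == \
    ["ai"] * 12 + ["other"] * 3
```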

Iterative feature selection. Every anonymization change was proposed, tested against re-identification attacks, and accepted only if it reduced leakage while preserving predictive structure. Founders identified by the adversary on two or more occasions were removed entirely, further reducing residual contamination.

Private test fold. Only half of the 9,000 founders (4,500) are publicly released. The remaining half is held back as a private evaluation fold. Leaderboard submissions are scored on the private fold, so contamination in the pre-training corpus of future LLMs does not corrupt the benchmark.

Who's on the VCBench leaderboard?

The paper's primary evaluation covers nine vanilla LLMs against two human-expert baselines. Results are reported as precision, recall, and F0.5, which weights precision twice as heavily as recall to reflect the asymmetric false-positive cost of venture investing. Average performance across six folds of 1,500 founders each:

  • GPT-4o: 29.1 percent precision, 16.2 percent recall, F0.5 = 25.1 (highest F0.5)
  • DeepSeek-R1: 37.6 percent precision, 8.4 percent recall, F0.5 = 22.1
  • GPT-4o-mini: 29.5 percent precision, 10.1 percent recall, F0.5 = 21.2
  • o3: 42.4 percent precision, 7.0 percent recall, F0.5 = 20.9
  • Gemini-2.5-Pro: 17.2 percent precision, 59.0 percent recall, F0.5 = 20.1
  • Claude-3.5-Haiku: 16.9 percent precision, 48.6 percent recall, F0.5 = 19.4
  • GPT-5: 53.7 percent precision, 4.3 percent recall, F0.5 = 16.2
  • Gemini-2.5-Flash: 12.6 percent precision, 69.1 percent recall, F0.5 = 15.1
  • DeepSeek-V3: 59.1 percent precision, 3.0 percent recall, F0.5 = 11.8 (highest precision)

Human baselines on the same VCBench-scaled task: a random classifier matches the 9.0 percent prevalence, Y Combinator delivers roughly 1.7x the index, and tier-1 VC firms deliver roughly 2.9x. GPT-4o's 29.1 percent precision is approximately 3.2x the index and therefore exceeds both Y Combinator and tier-1 VC performance. The headline finding: vanilla LLMs already match or beat human experts on standardized quant VC screening, once you evaluate them on a clean benchmark that prevents identity leakage.
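The F0.5 scores and index multipliers above follow from the standard F-beta formula with beta = 0.5, which weights precision twice as heavily as recall. Checked here against GPT-4o's reported numbers:

```python
# F-beta with beta = 0.5 (the benchmark's metric) weights precision
# twice as heavily as recall.
def f_beta(precision, recall, beta=0.5):
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# GPT-4o's reported averages: 29.1% precision, 16.2% recall.
p, r = 0.291, 0.162
print(round(100 * f_beta(p, r), 1))  # 25.1, matching the table
print(round(p / 0.09, 1))            # 3.2x the 9% random-classifier index
```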

Vela's specialized quant VC models extend this frontier further. The Think-Reason-Learn family is reported on VCBench as part of standard evaluation, and the Vela production stack reaches 19 to 38 percent precision when scaled back to the real-world 1.9 percent base rate, a 10x to 20x lift over the index. VCBench is what makes that kind of claim verifiable rather than rhetorical.
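One textbook way to translate benchmark precision to a different base rate, not necessarily the procedure used in the paper, is to hold the classifier's true-positive rate (recall) and false-positive rate fixed and recompute precision at the new prevalence:

```python
# Base-rate adjustment of precision: assume the classifier's TPR (recall)
# and FPR are unchanged between the benchmark (9% prevalence) and the
# real world (1.9%). This is a standard sketch, not the paper's exact method.
def rescale_precision(precision, recall, prev_bench, prev_real):
    # Recover the false-positive rate implied by benchmark precision/recall.
    tp = prev_bench * recall
    fp = tp * (1 - precision) / precision
    fpr = fp / (1 - prev_bench)
    # Reapply the same TPR/FPR at the real-world prevalence.
    tp_real = prev_real * recall
    fp_real = (1 - prev_real) * fpr
    return tp_real / (tp_real + fp_real)

# Illustration with GPT-4o's benchmark numbers, rescaled from 9% to 1.9%:
print(round(100 * rescale_precision(0.291, 0.162, 0.09, 0.019), 1))  # ~7.4
```

Under these assumptions, GPT-4o's benchmark precision corresponds to a single-digit real-world precision, which is why the paper's 1.9 percent index is the honest yardstick for deployment claims.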

How VCBench measures quant VC progress

Every claim about quant VC capability should reduce to a VCBench score. A method that cannot be evaluated on VCBench cannot be compared to prior work, cannot be compared to human baselines, and cannot be trusted in deployment. VCBench standardizes six things at once:

  • The task: predict, from a founder's LinkedIn and Crunchbase trail, whether their most recent company clears the $500M outcome threshold.
  • The data: 9,000 anonymized founder profiles with matched company outcome labels.
  • The success definition: $500M+ IPO, acquisition, or total funding.
  • The failure definition: $100K to $4M raised, no qualifying outcome within eight years.
  • The metric: F0.5, which weights precision twice as heavily as recall to reflect the asymmetric cost of false positives in venture.
  • The baselines: market index, Y Combinator, tier-1 VC firms, and the full vanilla-LLM frontier.

This shared infrastructure is what lets quant VC research compound. A new reasoning model, a new retrieval-augmented pipeline, or a new feature-engineering technique can be dropped into the evaluation loop and immediately scored against every prior result.
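That evaluation loop can be sketched in a few lines: score binary predictions fold by fold with F0.5, then average. The six-fold structure mirrors the paper; the toy folds below are made-up data for illustration.

```python
# Minimal sketch of a per-fold evaluation harness. Fold-by-fold scoring
# mirrors the paper's protocol; the toy predictions below are invented.
def precision_recall_f05(preds, labels):
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f05 = (1.25 * prec * rec / (0.25 * prec + rec)) if prec + rec else 0.0
    return prec, rec, f05

def evaluate(folds):
    """Average precision, recall, and F0.5 across folds."""
    scores = [precision_recall_f05(preds, labels) for preds, labels in folds]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))

# Toy example: two tiny folds instead of six folds of 1,500 founders.
folds = [
    ([1, 0, 1, 0], [1, 0, 0, 0]),
    ([0, 1, 0, 0], [0, 1, 1, 0]),
]
print(evaluate(folds))  # averaged (precision, recall, F0.5)
```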

What makes VCBench auditable for quant VC research

VCBench's auditability story has three parts. First, the construction pipeline is fully described in the paper, including vocabulary-reduction counts at every stage, so researchers can trace exactly how the raw data became the benchmark. Second, every anonymization choice is justified by adversarial testing with reported identification rates across five candidate formats. Third, the per-fold scores across all six folds are reported for each of the nine evaluated LLMs, so submissions can be replicated fold-by-fold rather than only on the aggregate. Auditability at the benchmark level is as essential to quant VC as auditability at the model level.

How VCBench fits into Vela's quant VC research program

VCBench is the common evaluation infrastructure for everything else Vela publishes. Each of the four Think-Reason-Learn papers reports VCBench results. The VCBench paper explicitly cites Policy Induction (Mu et al., 2025) and Random Rule Forest (Griffin et al., 2025) as motivating work that established founder profiles alone carry strong predictive signal. SSFF, the multi-agent architecture that became V, uses VCBench-compatible evaluation protocols. Founder-GPT predates VCBench, but its descendants all report VCBench numbers.

Externally, VCBench is the quant VC equivalent of SWE-bench for software engineering or SDBench for medical diagnosis. It is designed as a living benchmark, updated as new founder cohorts mature and as new re-identification attacks appear, with an open leaderboard at vcbench.com.

Limitations

The paper is explicit about what VCBench does not resolve. The 9 percent prevalence is higher than the 1.9 percent real-world base rate, so precision multipliers reported on VCBench (GPT-4o's 3.2x, for example) may not hold exactly under the true real-world distribution. Human baselines are normalized for like-for-like comparison, but VC firms self-select their deal flow in ways a benchmark cannot replicate, so the machine-versus-human gap on VCBench may differ in practice from live deployment. The dataset covers US founders from 2010 to 2018 and inherits LinkedIn and Crunchbase coverage biases (stronger for tech startups and publicly visible founders). The eight-year success horizon introduces right-censoring, penalizing more recent cohorts whose outcomes have not yet had time to materialize. Residual noise remains after multistage cleaning.

Read the paper, use the benchmark

VCBench: Benchmarking LLMs in Venture Capital.
Rick Chen, Joseph Ternasky, Afriyie Samuel Kwesi, Ben Griffin, Aaron Ontoyin Yin, Zakari Salifu, Kelvin Amoaba, Xianling Mu, Fuat Alican, Yigit Ihlamur.
arXiv preprint, September 2025.
arXiv:2509.14448.

The live benchmark and public leaderboard: vcbench.com.

VCBench is the evaluation infrastructure for Vela's quant VC research program. For the reasoning-model line evaluated on it, see Think-Reason-Learn, GPTree, Random Rule Forest, Reasoned Rule Mining, and Policy Induction. For the multi-agent systems line, see Founder-GPT and SSFF.

Authored by members of the Vela team. See the full roster of contributors.

For research collaboration in quant VC benchmarking, LLM evaluation for venture capital, or leaderboard submissions, email engage@vela.partners.
