Learning What to Ask: Cost-Aware Sequential Founder Evaluation for Quant VC

Paper
Learning What to Ask and When to Stop: Cost-Aware Sequential Founder Evaluation.
Authors
Yuhang Ye (University of Oxford), Fuat Alican (Vela Research), Ben Griffin (University of Oxford), Aaron Ontoyin Yin (Vela Research), Yigit Ihlamur (Vela Research).
Venue
Preprint, 2026 (venue to be confirmed).
Status
Preprint under review.
Research program
Part of Vela's LLM-Augmented ML thread, a core pillar of quant VC at Vela.

What Learning What to Ask contributes to quant VC

Most quant VC models treat founder evaluation as a static classification problem where every feature is observed at once. That assumption breaks in the real world. Investors gather information progressively through interviews, reference calls, and background checks, and each additional inquiry costs time and money. Learning What to Ask reformulates quant VC founder screening as a sequential, cost-aware decision process: at each step, the system decides which founder attribute to query next, and learns when to stop investigating and commit to a prediction. The result is a quant VC pipeline that produces per-founder decision trajectories rather than one-shot predictions.

On VCBench, Vela's standardized quant VC benchmark of 9,000 anonymized founder-company pairs with a 9% base rate, Learning What to Ask reaches an F0.5 score of 37.1%, the highest F0.5 reported on the benchmark at the time of writing. Precision is 43.9% and recall is 23.0%, a 4.88x lift over base rate, achieved while using only 54.0% of the available founder information on average.

What is quant VC, and where does Learning What to Ask fit?

Quant VC is the application of quantitative, reproducible, empirically validated methods to venture capital decision-making. It treats venture capital as a rare-event prediction problem that can be modeled, measured, and improved with the same rigor that quantitative finance brings to credit risk or that quantitative medicine brings to diagnostic screening. It demands scoring against honest baselines, reproducible methodology, and interpretability that allows every prediction to be audited.

Learning What to Ask sits in Vela's LLM-Augmented ML research thread, where the LLM is a component inside a classical machine learning pipeline rather than the reasoning substrate itself. The final classifier is a small multi-layer perceptron trained inside a reinforcement learning loop. The LLM (ChatGPT-5.2) acts as an uncertainty-triggered supervisor that biases the policy toward sensible actions during early training, but never overrides it. This distinguishes Learning What to Ask from two adjacent Vela threads: the Think-Reason-Learn family uses LLMs to emit the decision logic itself, while the Multi-Agent Framework behind Vela's V agent orchestrates many LLM calls across a pipeline. Here the LLM advises, and the RL policy decides.

How does Learning What to Ask work as a quant VC decision system?

The paper formalizes founder evaluation as a finite-horizon Markov Decision Process. At each step, the agent picks either a query action (reveal one founder attribute such as education, role, or industry) or a stop action (commit to a prediction from the classifier). Five components work together.
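As a concrete sketch, the query/stop loop can be written as a finite-horizon episode over a mask of revealed attributes. Everything below (the attribute count K, the toy policy, and the toy classifier) is illustrative scaffolding, not the paper's implementation:

```python
import numpy as np

K = 6  # hypothetical number of queryable founder attributes

def run_episode(policy, classifier, founder_features, rng):
    """Finite-horizon episode: reveal attributes one at a time until the
    policy emits the stop action, then commit to a prediction."""
    revealed = np.zeros(K, dtype=bool)           # which attributes are visible
    trajectory = []
    for _ in range(K):                           # horizon is at most K queries
        action = policy(founder_features * revealed, revealed, rng)
        if action == K:                          # stop action: commit now
            break
        revealed[action] = True                  # query action: reveal one field
        trajectory.append(int(action))
    prediction = classifier(founder_features * revealed, revealed)
    return prediction, trajectory

# Toy policy and classifier so the sketch runs end to end.
def random_policy(obs, revealed, rng):
    unrevealed = [a for a in range(K) if not revealed[a]]
    return rng.choice(unrevealed + [K])          # K is the stop action

def mean_classifier(obs, revealed):
    return 1 if revealed.any() and obs[revealed].mean() > 0.5 else 0

rng = np.random.default_rng(0)
pred, traj = run_episode(random_policy, mean_classifier, rng.random(K), rng)
```

The per-founder `trajectory` is exactly the audit artifact the paper emphasizes: a record of which attributes were queried, in what order, before the commit.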

Cost-aware reward design. A terminal reward encodes the confusion outcome, with separate magnitudes for true positives, false positives, true negatives, and false negatives. This turns the RL objective into a cost-sensitive classification problem that reflects the asymmetric costs of venture investment: false positives carry capital risk, false negatives carry opportunity cost. Shaping penalties on each query and on redundant queries model the real cost of due diligence in time and money.
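A minimal version of such a reward, with hypothetical magnitudes standing in for the paper's actual values, combines the confusion outcome with shaping penalties:

```python
def terminal_reward(pred, label, n_queries, n_redundant,
                    r_tp=5.0, r_fp=-4.0, r_tn=1.0, r_fn=-2.0,
                    query_cost=0.1, redundant_cost=0.5):
    """Cost-sensitive terminal reward: one magnitude per confusion outcome
    (values here are illustrative), minus shaping penalties for each query
    and each redundant query."""
    if pred == 1 and label == 1:
        outcome = r_tp          # true positive: successful investment
    elif pred == 1 and label == 0:
        outcome = r_fp          # false positive: capital risk
    elif pred == 0 and label == 0:
        outcome = r_tn          # true negative: correct pass
    else:
        outcome = r_fn          # false negative: opportunity cost
    return outcome - query_cost * n_queries - redundant_cost * n_redundant
```

The asymmetry between `r_fp` and `r_fn` is where a firm's cost preferences enter the objective directly.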

Monte Carlo rollouts for non-myopic action evaluation. With only a terminal reward, the learning signal is sparse. The paper estimates action utilities via Monte Carlo rollouts starting from each candidate action and simulating the rest of the trajectory under the current policy. This reveals which information queries have long-horizon value in quant VC workflows, where the signal from any one diligence item only pays off when combined with later ones.
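The rollout estimator itself is simple: take the candidate action, then simulate to termination under the current policy and average terminal rewards. The sketch below uses a toy environment to keep it self-contained; the class and function names are illustrative, not the paper's:

```python
import random

class ToyEnv:
    """Stand-in environment: state is ("go"|"stop", remaining budget);
    action 0 queries (costs 1 unit of budget), action 1 stops."""
    def step(self, s, a):
        return ("stop", s[1]) if a == 1 else ("go", s[1] - 1)
    def is_terminal(self, s):
        return s[0] == "stop" or s[1] == 0
    def terminal_reward(self, s):
        return float(s[1])               # more leftover budget is better

def rollout_value(env, state, first_action, policy, n_rollouts=8, seed=0):
    """Estimate the long-horizon utility of `first_action` by simulating
    the rest of the trajectory under the current policy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        s = env.step(state, first_action)    # take the candidate action first
        while not env.is_terminal(s):
            s = env.step(s, policy(s, rng))  # then follow the current policy
        total += env.terminal_reward(s)
    return total / n_rollouts

def always_stop(s, rng):
    return 1
```

Because the value of `first_action` is scored by where the whole trajectory ends up, not by its immediate effect, the estimator is non-myopic by construction.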

Dual-scale policy head. Information actions and the stop action are not normalized together. Information actions are reversible and meaningful relative to one another (softmax over action values), while stopping is irreversible and must be evaluated on an absolute confidence scale (sigmoid over the stop value). This dual-scale construction is what makes the system risk-aware: it distinguishes “which thing to ask next” from “do I have enough to commit.”
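A minimal sketch of that head, assuming scalar action values and an illustrative stop threshold `tau`:

```python
import numpy as np

def dual_scale_head(info_values, stop_value, tau=0.5):
    """Information actions compete on a relative scale (softmax over their
    values); the stop action is judged on an absolute confidence scale
    (sigmoid over its value). Illustrative sketch, not the paper's code."""
    z = info_values - info_values.max()          # numerically stable softmax
    info_probs = np.exp(z) / np.exp(z).sum()
    stop_prob = 1.0 / (1.0 + np.exp(-stop_value))
    if stop_prob > tau:                          # absolute test: enough evidence?
        return "stop", info_probs, stop_prob
    return int(info_probs.argmax()), info_probs, stop_prob
```

Normalizing the stop action into the same softmax would make "stop" compete with "ask" on a relative scale, which is exactly what the paper avoids: an irreversible commitment should clear an absolute bar.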

Supervised distillation in place of policy gradient. The paper shows that PPO collapses to a degenerate majority-class strategy (0% precision) under sparse, asymmetric rewards. Instead, the rollout-improved teacher policy is distilled into the policy network via cross-entropy loss, while the classifier is trained in parallel via binary cross-entropy on terminal states. This is lower-variance and more stable than policy gradient for this setting.
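The two losses can be sketched for a single sample as follows (shapes and names are illustrative; the paper trains these in parallel over batches):

```python
import numpy as np

def distillation_losses(policy_logits, teacher_probs, clf_logit, label):
    """Cross-entropy distillation of the rollout-improved teacher into the
    policy network, plus binary cross-entropy for the terminal classifier."""
    # Policy loss: CE between teacher distribution and policy log-probs.
    z = policy_logits - policy_logits.max()          # stable log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    policy_loss = -(teacher_probs * log_probs).sum()
    # Classifier loss: BCE on the terminal-state prediction.
    p = 1.0 / (1.0 + np.exp(-clf_logit))
    clf_loss = -(label * np.log(p) + (1 - label) * np.log(1 - p))
    return policy_loss, clf_loss
```

Both targets are dense supervised signals, which is why this avoids the high-variance gradient estimates that cause PPO to collapse under the sparse, asymmetric reward.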

LLM supervisor as uncertainty-triggered advisor. When the top-1 and top-2 actions under the current policy are close in probability, the LLM is queried for a recommendation and the policy distribution is biased multiplicatively toward that action. The LLM never sets rewards, outputs stopping decisions, or modifies the MDP. Ablation shows the supervisor shapes training dynamics and pushes the final model to use 54.0% of available information instead of 66.6%.
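The advise-don't-override mechanism reduces to a small amount of arithmetic. In this sketch, `margin` and `boost` are hypothetical hyperparameters, not values from the paper:

```python
import numpy as np

def apply_llm_bias(action_probs, llm_choice, margin=0.05, boost=1.5):
    """When the top-2 actions are nearly tied, multiplicatively bias the
    policy distribution toward the LLM-recommended action and renormalize.
    When the policy is already confident, the LLM is not consulted."""
    top2 = np.sort(action_probs)[-2:]
    if top2[1] - top2[0] > margin:       # confident policy: no LLM call
        return action_probs
    biased = action_probs.copy()
    biased[llm_choice] *= boost           # advise: a nudge, not a veto
    return biased / biased.sum()
```

Because the bias is multiplicative and renormalized, the LLM can flip a near-tie but cannot install an action the policy assigns negligible probability, which is what keeps the RL policy the final decision-maker.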

How accurate is Learning What to Ask?

Evaluation runs on VCBench, the world's first AGI benchmark for venture capital. The dataset contains 9,000 anonymized U.S. founder-company pairs from 2010 to 2018 with a 9% success rate, where success is defined as an IPO, an acquisition, or cumulative fundraising above $500M within eight years of founding.

Headline numbers from the paper, averaged across 10 random test seeds:

  • Precision: 43.9%.
  • Recall: 23.0%.
  • F0.5: 37.1%, the highest reported on VCBench at the time of writing.
  • Information used per founder: 54.0%.
  • Lift over the 9% VCBench base rate: 4.88x.
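The headline F0.5 and lift follow directly from the reported precision, recall, and base rate, which makes them easy to sanity-check:

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision above recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

precision, recall, base_rate = 0.439, 0.230, 0.09
f05 = f_beta(precision, recall)     # matches the reported 37.1%
lift = precision / base_rate        # matches the reported 4.88x
```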

Baseline comparisons on VCBench at 9% prevalence:

  • Random classifier: 9.0% precision, F0.5 of 9.0%.
  • Tier-1 VCs: 23.0% precision, F0.5 of 10.7%.
  • Reasoned Rule Mining: 87.5% precision, F0.5 of 21.0% (extreme precision but very low recall).
  • GPT-4o: 30.0% precision, F0.5 of 25.7%.
  • Policy Induction: 41.0% precision, F0.5 of 34.0%.
  • Learning What to Ask: 43.9% precision, F0.5 of 37.1%.

Ablation studies on VCBench confirm each design choice:

  • One-shot neural net with all features: 34.8% F0.5, 100% of information used.
  • PPO (policy gradient baseline): 0% precision, collapses to majority class under sparse rewards.
  • Myopic RL (one-step lookahead): 22.8% F0.5, 21.7% of information used.
  • No LLM supervisor: 36.5% F0.5, 66.6% of information used.
  • Full model: 37.1% F0.5, 54.0% of information used.

The full model beats the static neural net (37.1% vs 34.8% F0.5) while using roughly half the information, and improves on Policy Induction, the closest interpretable sibling on the benchmark, by +3.1 F0.5 points.

At Vela's real-world founder-screening prevalence of 1.9%, the production quant VC stack (Think-Reason-Learn and LLM-Augmented ML families combined) reaches 19% to 38% precision, a 10x to 20x lift over the 1.9% US unicorn base rate. Learning What to Ask contributes an adaptive, cost-aware layer to that stack.

Why cost-aware sequential evaluation matters for quant VC

A quant VC firm that screens thousands of founders a year cannot afford full due diligence on every candidate. The marginal cost of each additional query is non-trivial. Learning What to Ask makes that cost explicit in the objective function. Founders with clear negative signals are rejected after two or three queries, borderline cases get investigated more deeply, and obvious positives trigger exploration into execution history and industry fit. The policy learns to allocate due-diligence effort in proportion to evidential return, which gives the firm a principled way to set diligence budgets instead of a global “always collect X data points” rule.

What makes Learning What to Ask auditable for quant VC decisions

Every prediction comes with a full query trajectory: which attributes were requested, in what order, and where the policy decided to stop. The paper reports representative trajectories where positive predictions explore deep into execution experience and industry fit, while negative predictions stop early after seeing education and role history alone. A partner can read these trajectories directly. The system does not hide its reasoning in an attention map or a single score. Because the classifier is a small MLP over observed fields, individual predictions can also be probed with counterfactual inputs without additional LLM calls, and the decision threshold τ is an explicit hyperparameter exposed for tuning against a firm's own cost preferences rather than being hard-coded at 0.5.
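Since τ is exposed rather than fixed, a firm can tune it against held-out scores under its own cost preferences. A minimal sweep (a hypothetical helper, not the paper's code) might look like:

```python
import numpy as np

def sweep_threshold(scores, labels, taus, beta=0.5):
    """Pick the decision threshold tau that maximizes F-beta on held-out
    classifier scores, instead of hard-coding tau = 0.5."""
    best_tau, best_f = None, -1.0
    for tau in taus:
        preds = (scores >= tau).astype(int)
        tp = int(((preds == 1) & (labels == 1)).sum())
        fp = int(((preds == 1) & (labels == 0)).sum())
        fn = int(((preds == 0) & (labels == 1)).sum())
        if tp == 0:
            continue                    # precision/recall undefined: skip
        p, r = tp / (tp + fp), tp / (tp + fn)
        f = (1 + beta**2) * p * r / (beta**2 * p + r)
        if f > best_f:
            best_tau, best_f = tau, f
    return best_tau, best_f
```

Under low prevalence, the F0.5-optimal τ typically sits well away from 0.5, which is consistent with the paper's observation that a default τ = 0.5 yields almost no true positives at 9% prevalence.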

How Learning What to Ask fits into Vela's quant VC research program

Learning What to Ask extends Vela's LLM-Augmented ML thread by showing that LLMs can be useful as soft supervisors inside reinforcement learning systems, not only as feature generators or rule writers. The papers in this thread are:

  • LLM-AR: neural-symbolic screening that converts LLM heuristics into ProbLog rules, reaching 59.5% precision at a 5.9x lift over random.
  • GPT-HTree: hierarchical clustering combined with LLM-derived founder personas for interpretable segmentation.
  • Rare-event prediction: LLM-powered feature engineering combined with an XGBoost, Random Forest, and Logistic Regression ensemble, reaching an 11.1x lift over random.
  • Verifiable Reasoning: LLMs as deterministic code generators, reaching 37.5% precision on VCBench with 99% lower API cost than per-sample LLM evaluation.
  • Learning What to Ask (this paper): LLM as uncertainty-triggered supervisor inside a cost-aware RL policy for founder screening.

The broader quant VC program at Vela also includes the Think-Reason-Learn family (GPTree, Random Rule Forest, Reasoned Rule Mining, Policy Induction), the VCBench benchmark, and the Multi-Agent Framework that became Vela's V agent. Policy Induction is the closest methodological sibling on the benchmark, and Learning What to Ask improves on it by +3.1 F0.5 points while using roughly half the information that a static full-information model would require.

Limitations

The paper is explicit about several limitations. Performance varies meaningfully across random seeds: F0.5 ranges from 26.3% to 40.8% across the 10 test splits reported in Table 1, indicating that the small positive class in VCBench creates substantial sampling variance. Monte Carlo rollouts are computationally expensive at training time, though they are not required at inference. The decision threshold τ is a hyperparameter that must be tuned to a firm's cost preferences, and the paper notes that a default τ = 0.5 yields almost no true positives under 9% prevalence. Evaluation is limited to VCBench, so generalization to other datasets or success definitions requires further validation, and the framework assumes a fixed set of K queryable information fields rather than supporting arbitrary new fields at inference time.

Read the paper

Learning What to Ask and When to Stop: Cost-Aware Sequential Founder Evaluation.
Yuhang Ye, Fuat Alican, Ben Griffin, Aaron Ontoyin Yin, Yigit Ihlamur.
Preprint, 2026. Venue to be confirmed.

Learning What to Ask is part of Vela's quant VC research program, anchored by Think-Reason-Learn and benchmarked on VCBench. For adjacent work in sequential and LLM-augmented quant VC screening, see Policy Induction (the closest sibling on the benchmark), Verifiable Reasoning, LLM-AR, and GPT-HTree.

Authored by members of the Vela team. See the full roster of contributors.

For research collaboration on quant VC, cost-aware decision systems, reinforcement learning for founder screening, or sequential information acquisition, email engage@vela.partners.
