Learning What to Ask: Cost-Aware Sequential Founder Evaluation for Quant VC

Paper
Learning What to Ask and When to Stop: Cost-Aware Sequential Founder Evaluation.
Authors
Yuhang Ye (University of Oxford), Fuat Alican (Vela Research), Ben Griffin (University of Oxford), Aaron Ontoyin Yin (Vela Research), Yigit Ihlamur (Vela Research).
Venue
Preprint, 2026 (venue to be confirmed).
Status
Preprint under review.
Research program
Part of Vela's LLM-Augmented ML thread, a core pillar of quant VC at Vela.

What Learning What to Ask contributes to quant VC

Most quant VC models treat founder evaluation as a static classification problem where every feature is observed at once. That assumption breaks in the real world. Investors gather information progressively through interviews, reference calls, and background checks, and each additional inquiry costs time and money. Learning What to Ask reformulates quant VC founder screening as a sequential, cost-aware decision process: at each step, the system decides which founder attribute to query next, and learns when to stop investigating and commit to a prediction. The result is a quant VC pipeline that produces per-founder decision trajectories rather than one-shot predictions.

On VCBench, Vela's standardized quant VC benchmark of 9,000 anonymized founder-company pairs with a 9% base rate, Learning What to Ask reaches an F0.5 score of 37.1%, the highest F0.5 reported on the benchmark at the time of writing. Precision is 43.9% and recall is 23.0%, a 4.88x lift over base rate, achieved while using only 54.0% of the available founder information on average.

What is quant VC, and where does Learning What to Ask fit?

Quant VC is the application of quantitative, reproducible, empirically validated methods to venture capital decision-making. It treats venture capital as a rare-event prediction problem that can be modeled, measured, and improved with the same rigor that quantitative finance brings to credit risk or that quantitative medicine brings to diagnostic screening. It demands scoring against honest baselines, reproducible methodology, and interpretability that allows every prediction to be audited.

Learning What to Ask sits in Vela's LLM-Augmented ML research thread, where the LLM is a component inside a classical machine learning pipeline rather than the reasoning substrate itself. The final classifier is a small multi-layer perceptron trained inside a reinforcement learning loop. The LLM (ChatGPT-5.2) acts as an uncertainty-triggered supervisor that biases the policy toward sensible actions during early training, but never overrides it. This distinguishes Learning What to Ask from two adjacent Vela threads: the Think-Reason-Learn family uses LLMs to emit the decision logic itself, while the Multi-Agent Framework behind Vela's V agent orchestrates many LLM calls across a pipeline. Here the LLM advises, and the RL policy decides.

How does Learning What to Ask work as a quant VC decision system?

The paper formalizes founder evaluation as a finite-horizon Markov Decision Process. At each step, the agent picks either a query action (reveal one founder attribute such as education, role, or industry) or a stop action (commit to a prediction from the classifier). Five components work together.
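As a concrete sketch, the query/stop loop can be written as a finite-horizon episode over a mask of revealed attributes. Everything below (the attribute count K, the toy policy, and the toy classifier) is illustrative scaffolding, not the paper's implementation:

```python
import numpy as np

K = 6  # hypothetical number of queryable founder attributes

def run_episode(policy, classifier, founder_features, rng):
    """Finite-horizon episode: reveal attributes one at a time until the
    policy emits the stop action, then commit to a prediction."""
    revealed = np.zeros(K, dtype=bool)           # which attributes are visible
    trajectory = []
    for _ in range(K):                           # horizon is at most K queries
        action = policy(founder_features * revealed, revealed, rng)
        if action == K:                          # stop action: commit now
            break
        revealed[action] = True                  # query action: reveal one field
        trajectory.append(int(action))
    prediction = classifier(founder_features * revealed, revealed)
    return prediction, trajectory

# Toy policy and classifier so the sketch runs end to end.
def random_policy(obs, revealed, rng):
    unrevealed = [a for a in range(K) if not revealed[a]]
    return rng.choice(unrevealed + [K])          # K is the stop action

def mean_classifier(obs, revealed):
    return 1 if revealed.any() and obs[revealed].mean() > 0.5 else 0

rng = np.random.default_rng(0)
pred, traj = run_episode(random_policy, mean_classifier, rng.random(K), rng)
```

The per-founder `trajectory` is exactly the audit artifact the paper emphasizes: a record of which attributes were queried, in what order, before the commit.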

Cost-aware reward design. A terminal reward encodes the confusion outcome, with separate magnitudes for true positives, false positives, true negatives, and false negatives. This turns the RL objective into a cost-sensitive classification problem that reflects the asymmetric costs of venture investment: false positives carry capital risk, false negatives carry opportunity cost. Shaping penalties on each query and on redundant queries model the real cost of due diligence in time and money.
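A minimal version of such a reward, with hypothetical magnitudes standing in for the paper's actual values, combines the confusion outcome with shaping penalties:

```python
def terminal_reward(pred, label, n_queries, n_redundant,
                    r_tp=5.0, r_fp=-4.0, r_tn=1.0, r_fn=-2.0,
                    query_cost=0.1, redundant_cost=0.5):
    """Cost-sensitive terminal reward: one magnitude per confusion outcome
    (values here are illustrative), minus shaping penalties for each query
    and each redundant query."""
    if pred == 1 and label == 1:
        outcome = r_tp          # true positive: successful investment
    elif pred == 1 and label == 0:
        outcome = r_fp          # false positive: capital risk
    elif pred == 0 and label == 0:
        outcome = r_tn          # true negative: correct pass
    else:
        outcome = r_fn          # false negative: opportunity cost
    return outcome - query_cost * n_queries - redundant_cost * n_redundant
```

The asymmetry between `r_fp` and `r_fn` is where a firm's cost preferences enter the objective directly.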

Monte Carlo rollouts for non-myopic action evaluation. With only a terminal reward, the learning signal is sparse. The paper estimates action utilities via Monte Carlo rollouts starting from each candidate action and simulating the rest of the trajectory under the current policy. This reveals which information queries have long-horizon value in quant VC workflows, where the signal from any one diligence item only pays off when combined with later ones.
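The rollout estimator itself is simple: take the candidate action, then simulate to termination under the current policy and average terminal rewards. The sketch below uses a toy environment to keep it self-contained; the class and function names are illustrative, not the paper's:

```python
import random

class ToyEnv:
    """Stand-in environment: state is ("go"|"stop", remaining budget);
    action 0 queries (costs 1 unit of budget), action 1 stops."""
    def step(self, s, a):
        return ("stop", s[1]) if a == 1 else ("go", s[1] - 1)
    def is_terminal(self, s):
        return s[0] == "stop" or s[1] == 0
    def terminal_reward(self, s):
        return float(s[1])               # more leftover budget is better

def rollout_value(env, state, first_action, policy, n_rollouts=8, seed=0):
    """Estimate the long-horizon utility of `first_action` by simulating
    the rest of the trajectory under the current policy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        s = env.step(state, first_action)    # take the candidate action first
        while not env.is_terminal(s):
            s = env.step(s, policy(s, rng))  # then follow the current policy
        total += env.terminal_reward(s)
    return total / n_rollouts

def always_stop(s, rng):
    return 1
```

Because the value of `first_action` is scored by where the whole trajectory ends up, not by its immediate effect, the estimator is non-myopic by construction.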

Dual-scale policy head. Information actions and the stop action are not normalized together. Information actions are reversible and meaningful relative to one another (softmax over action values), while stopping is irreversible and must be evaluated on an absolute confidence scale (sigmoid over the stop value). This dual-scale construction is what makes the system risk-aware: it distinguishes “which thing to ask next” from “do I have enough to commit.”
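A minimal sketch of that head, assuming scalar action values and an illustrative stop threshold `tau`:

```python
import numpy as np

def dual_scale_head(info_values, stop_value, tau=0.5):
    """Information actions compete on a relative scale (softmax over their
    values); the stop action is judged on an absolute confidence scale
    (sigmoid over its value). Illustrative sketch, not the paper's code."""
    z = info_values - info_values.max()          # numerically stable softmax
    info_probs = np.exp(z) / np.exp(z).sum()
    stop_prob = 1.0 / (1.0 + np.exp(-stop_value))
    if stop_prob > tau:                          # absolute test: enough evidence?
        return "stop", info_probs, stop_prob
    return int(info_probs.argmax()), info_probs, stop_prob
```

Normalizing the stop action into the same softmax would make "stop" compete with "ask" on a relative scale, which is exactly what the paper avoids: an irreversible commitment should clear an absolute bar.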

Supervised distillation in place of policy gradient. The paper shows that PPO collapses to a degenerate majority-class strategy (0% precision) under sparse, asymmetric rewards. Instead, the rollout-improved teacher policy is distilled into the policy network via cross-entropy loss, while the classifier is trained in parallel via binary cross-entropy on terminal states. This is lower-variance and more stable than policy gradient for this setting.
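The two losses can be sketched for a single sample as follows (shapes and names are illustrative; the paper trains these in parallel over batches):

```python
import numpy as np

def distillation_losses(policy_logits, teacher_probs, clf_logit, label):
    """Cross-entropy distillation of the rollout-improved teacher into the
    policy network, plus binary cross-entropy for the terminal classifier."""
    # Policy loss: CE between teacher distribution and policy log-probs.
    z = policy_logits - policy_logits.max()          # stable log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    policy_loss = -(teacher_probs * log_probs).sum()
    # Classifier loss: BCE on the terminal-state prediction.
    p = 1.0 / (1.0 + np.exp(-clf_logit))
    clf_loss = -(label * np.log(p) + (1 - label) * np.log(1 - p))
    return policy_loss, clf_loss
```

Both targets are dense supervised signals, which is why this avoids the high-variance gradient estimates that cause PPO to collapse under the sparse, asymmetric reward.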

LLM supervisor as uncertainty-triggered advisor. When the top-1 and top-2 actions under the current policy are close in probability, the LLM is queried for a recommendation and the policy distribution is biased multiplicatively toward that action. The LLM never sets rewards, outputs stopping decisions, or modifies the MDP. Ablation shows the supervisor shapes training dynamics and pushes the final model to use 54.0% of available information instead of 66.6%.
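The advise-don't-override mechanism reduces to a small amount of arithmetic. In this sketch, `margin` and `boost` are hypothetical hyperparameters, not values from the paper:

```python
import numpy as np

def apply_llm_bias(action_probs, llm_choice, margin=0.05, boost=1.5):
    """When the top-2 actions are nearly tied, multiplicatively bias the
    policy distribution toward the LLM-recommended action and renormalize.
    When the policy is already confident, the LLM is not consulted."""
    top2 = np.sort(action_probs)[-2:]
    if top2[1] - top2[0] > margin:       # confident policy: no LLM call
        return action_probs
    biased = action_probs.copy()
    biased[llm_choice] *= boost           # advise: a nudge, not a veto
    return biased / biased.sum()
```

Because the bias is multiplicative and renormalized, the LLM can flip a near-tie but cannot install an action the policy assigns negligible probability, which is what keeps the RL policy the final decision-maker.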

How accurate is Learning What to Ask?

Evaluation runs on VCBench, the world's first AGI benchmark for venture capital. The dataset contains 9,000 anonymized U.S. founder-company pairs from 2010 to 2018 with a 9% success rate, where success is defined as an IPO, an acquisition, or cumulative fundraising above $500M within eight years of founding.

Headline numbers from the paper, averaged across 10 random test seeds:

  • Precision: 43.9%.
  • Recall: 23.0%.
  • F0.5: 37.1%, the highest reported on VCBench at the time of writing.
  • Information used per founder: 54.0%.
  • Lift over the 9% VCBench base rate: 4.88x.
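The headline F0.5 and lift follow directly from the reported precision, recall, and base rate, which makes them easy to sanity-check:

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision above recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

precision, recall, base_rate = 0.439, 0.230, 0.09
f05 = f_beta(precision, recall)     # matches the reported 37.1%
lift = precision / base_rate        # matches the reported 4.88x
```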

Baseline comparisons on VCBench at 9% prevalence:

  • Random classifier: 9.0% precision, F0.5 of 9.0%.
  • Tier-1 VCs: 23.0% precision, F0.5 of 10.7%.
  • Reasoned Rule Mining: 87.5% precision, F0.5 of 21.0% (extreme precision but very low recall).
  • GPT-4o: 30.0% precision, F0.5 of 25.7%.
  • Policy Induction: 41.0% precision, F0.5 of 34.0%.
  • Learning What to Ask: 43.9% precision, F0.5 of 37.1%.

Ablation studies on VCBench confirm each design choice:

  • One-shot neural net with all features: 34.8% F0.5, 100% of information used.
  • PPO (policy gradient baseline): 0% precision, collapses to majority class under sparse rewards.
  • Myopic RL (one-step lookahead): 22.8% F0.5, 21.7% of information used.
  • No LLM supervisor: 36.5% F0.5, 66.6% of information used.
  • Full model: 37.1% F0.5, 54.0% of information used.

The full model beats the static neural net (37.1% vs 34.8% F0.5) while using roughly half the information, and improves on Policy Induction, the closest interpretable sibling on the benchmark, by +3.1 F0.5 points.

At Vela's real-world founder-screening prevalence of 1.9%, the production quant VC stack (Think-Reason-Learn and LLM-Augmented ML families combined) reaches 19% to 38% precision, a 10x to 20x lift over the 1.9% US unicorn base rate. Learning What to Ask contributes an adaptive, cost-aware layer to that stack.

Why cost-aware sequential evaluation matters for quant VC

A quant VC firm that screens thousands of founders a year cannot afford full due diligence on every candidate. The marginal cost of each additional query is non-trivial. Learning What to Ask makes that cost explicit in the objective function. Founders with clear negative signals are rejected after two or three queries, borderline cases get investigated more deeply, and obvious positives trigger exploration into execution history and industry fit. The policy learns to allocate due-diligence effort in proportion to evidential return, which gives the firm a principled way to set diligence budgets instead of a global “always collect X data points” rule.

What makes Learning What to Ask auditable for quant VC decisions

Every prediction comes with a full query trajectory: which attributes were requested, in what order, and where the policy decided to stop. The paper reports representative trajectories where positive predictions explore deep into execution experience and industry fit, while negative predictions stop early after seeing education and role history alone. A partner can read these trajectories directly. The system does not hide its reasoning in an attention map or a single score. Because the classifier is a small MLP over observed fields, individual predictions can also be probed with counterfactual inputs without additional LLM calls, and the decision threshold τ is an explicit hyperparameter exposed for tuning against a firm's own cost preferences rather than being hard-coded at 0.5.
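Since τ is exposed rather than fixed, a firm can tune it against held-out scores under its own cost preferences. A minimal sweep (a hypothetical helper, not the paper's code) might look like:

```python
import numpy as np

def sweep_threshold(scores, labels, taus, beta=0.5):
    """Pick the decision threshold tau that maximizes F-beta on held-out
    classifier scores, instead of hard-coding tau = 0.5."""
    best_tau, best_f = None, -1.0
    for tau in taus:
        preds = (scores >= tau).astype(int)
        tp = int(((preds == 1) & (labels == 1)).sum())
        fp = int(((preds == 1) & (labels == 0)).sum())
        fn = int(((preds == 0) & (labels == 1)).sum())
        if tp == 0:
            continue                    # precision/recall undefined: skip
        p, r = tp / (tp + fp), tp / (tp + fn)
        f = (1 + beta**2) * p * r / (beta**2 * p + r)
        if f > best_f:
            best_tau, best_f = tau, f
    return best_tau, best_f
```

Under low prevalence, the F0.5-optimal τ typically sits well away from 0.5, which is consistent with the paper's observation that a default τ = 0.5 yields almost no true positives at 9% prevalence.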

How Learning What to Ask fits into Vela's quant VC research program

Learning What to Ask extends Vela's LLM-Augmented ML thread by showing that LLMs can be useful as soft supervisors inside reinforcement learning systems, not only as feature generators or rule writers. The papers in this thread are:

  • LLM-AR: neural-symbolic screening that converts LLM heuristics into ProbLog rules, reaching 59.5% precision at a 5.9x lift over random.
  • GPT-HTree: hierarchical clustering combined with LLM-derived founder personas for interpretable segmentation.
  • Rare-event prediction: LLM-powered feature engineering combined with an XGBoost, Random Forest, and Logistic Regression ensemble, reaching an 11.1x lift over random.
  • Verifiable Reasoning: LLMs as deterministic code generators, reaching 37.5% precision on VCBench with 99% lower API cost than per-sample LLM evaluation.
  • Learning What to Ask (this paper): LLM as uncertainty-triggered supervisor inside a cost-aware RL policy for founder screening.

The broader quant VC program at Vela also includes the Think-Reason-Learn family (GPTree, Random Rule Forest, Reasoned Rule Mining, Policy Induction), the VCBench benchmark, and the Multi-Agent Framework that became Vela's V agent. Policy Induction is the closest methodological sibling on the benchmark, and Learning What to Ask improves on it by +3.1 F0.5 points while using roughly half the information that a static full-information model would require.

Limitations

The paper is explicit about several limitations. Performance varies meaningfully across random seeds: F0.5 ranges from 26.3% to 40.8% across the 10 test splits reported in Table 1, indicating that the small positive class in VCBench creates substantial sampling variance. Monte Carlo rollouts are computationally expensive at training time, though they are not required at inference. The decision threshold τ is a hyperparameter that must be tuned to a firm's cost preferences, and the paper notes that a default τ = 0.5 yields almost no true positives under 9% prevalence. Evaluation is limited to VCBench, so generalization to other datasets or success definitions requires further validation, and the framework assumes a fixed set of K queryable information fields rather than supporting arbitrary new fields at inference time.

Read the paper

Learning What to Ask and When to Stop: Cost-Aware Sequential Founder Evaluation.
Yuhang Ye, Fuat Alican, Ben Griffin, Aaron Ontoyin Yin, Yigit Ihlamur.
Preprint, 2026. Venue to be confirmed.

Learning What to Ask is part of Vela's quant VC research program, anchored by Think-Reason-Learn and benchmarked on VCBench. For adjacent work in sequential and LLM-augmented quant VC screening, see Policy Induction (the closest sibling on the benchmark), Verifiable Reasoning, LLM-AR, and GPT-HTree.

Authored by members of the Vela team. See the full roster of contributors.

For research collaboration on quant VC, cost-aware decision systems, reinforcement learning for founder screening, or sequential information acquisition, email engage@vela.partners.
