LLM-AR: Neural-Symbolic Reasoning for Quant VC Founder Evaluation

Paper
LLM-AR: LLM-powered Automated Reasoning Framework.
Authors
Rick Chen (University of Oxford), Joseph Ternasky, Aaron Ontoyin Yin, Xianling Mu (University of Oxford), Fuat Alican, Yigit Ihlamur (Vela Research).
Venue
arXiv preprint, October 2025.
Status
Preprint. arXiv:2510.22034.
Research program
Part of the LLM-Augmented ML line of Vela's quant VC research, which embeds LLMs as components inside classical ML pipelines rather than as standalone reasoners.

What LLM-AR contributes to quant VC

LLM-AR introduces a neural-symbolic architecture for quant VC founder evaluation. The LLM proposes prediction rules in natural language, those rules are translated into ProbLog (a probabilistic extension of Prolog), and ProbLog executes them as a deterministic automated reasoning engine to produce the final success or failure prediction. The LLM contributes heuristic pattern-finding. ProbLog contributes transparent and reproducible inference.

On a 4-fold cross-validated dataset of 6,000 US founders with a 10 percent success prevalence, LLM-AR reaches 59.5 percent precision at 8.7 percent recall, a 5.9x lift over the random baseline. It outperforms every vanilla LLM tested on the same data (GPT-4o mini at 49.5 percent, GPT-4o at 32.3 percent, DeepSeek-V3 at 31.0 percent, o3-mini at 21.6 percent) and outperforms the tier-1 seed-fund baseline of 29.5 percent. Ablations confirm both halves of the architecture are load-bearing: removing iterative training drops precision to 36.1 percent, and replacing ProbLog with a vanilla LLM (GPT-4o mini) at inference drops precision to 46.2 percent.

What is quant VC, and where does LLM-AR fit?

Quant VC is the application of quantitative, reproducible, empirically validated methods to venture capital decision-making. It treats venture capital as a rare-event prediction problem that can be modeled, measured, and improved with the same rigor that quantitative finance brings to credit risk or quantitative medicine brings to diagnostic screening. It requires quantitative scoring against honest baselines, reproducible methodology, and interpretability that allows every prediction to be audited.

LLM-AR sits in the LLM-Augmented ML strand of Vela's quant VC research. This strand wraps LLMs inside classical ML pipelines (neural-symbolic architectures, ensemble classifiers, feature engineering), treating the LLM as a specialized component rather than the final decision-maker. This is distinct from both the Multi-Agent Systems line (Founder-GPT, SSFF), which orchestrates the LLM as multiple reasoning agents, and the Think-Reason-Learn family, which uses the LLM to induce auditable symbolic rules evaluated directly. LLM-AR splits the difference: the LLM induces rules, but classical symbolic AI executes them.

How does LLM-AR predict founder success?

LLM-AR has four conceptual components: policy generation, statistical calibration, iterative refinement, and ProbLog execution. A policy is a set of prediction rules of the form “IF condition_1 AND condition_2 THEN success/failure”, each with an associated probability.
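A policy of this shape can be sketched as a small data structure. The field names, example rules, and probabilities below are illustrative assumptions, not the paper's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    conditions: dict    # feature name -> required value, ANDed together
    outcome: str        # "success" or "failure"
    probability: float  # LLM-assigned, later recalibrated against data

def rule_fires(rule: Rule, founder: dict) -> bool:
    # A rule applies when every precondition holds for the founder.
    return all(founder.get(k) == v for k, v in rule.conditions.items())

# Illustrative policy: these rules and probabilities are invented.
policy = [
    Rule({"prior_exit": True, "technical_degree": True}, "success", 0.45),
    Rule({"num_prior_startups": 0, "solo_founder": True}, "failure", 0.70),
]

founder = {"prior_exit": True, "technical_degree": True, "solo_founder": False}
fired = [r for r in policy if rule_fires(r, founder)]
```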

Policy generation. For each founder in a training batch, the LLM (DeepSeek-V3 in the paper's implementation) writes a short natural-language explanation of why the founder succeeded or failed. The batch-level insights are then summarized into a list of logical prediction rules, each assigned a probability by the LLM.

Statistical calibration. Every generated rule is audited against data. Association-rule mining identifies feature combinations that are statistically associated with success or failure. Rule probabilities are then recalibrated by sampling 1,000 training founders and computing the empirical success rate conditional on the rule preconditions. Rules with too few matching founders are flagged “not enough samples” and reviewed for removal.
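A minimal sketch of that recalibration step, on synthetic data (the paper samples 1,000 real training founders; the feature name and the support cutoff below are assumptions):

```python
import random

random.seed(0)
# Synthetic sample standing in for 1,000 real training founders.
founders = [
    {"prior_exit": random.random() < 0.2, "success": random.random() < 0.1}
    for _ in range(1000)
]

def calibrate(precondition, sample, min_support=30):
    # Empirical success rate among founders matching the rule preconditions.
    matches = [f for f in sample if precondition(f)]
    if len(matches) < min_support:
        return None  # flagged "not enough samples", reviewed for removal
    return sum(f["success"] for f in matches) / len(matches)

p = calibrate(lambda f: f["prior_exit"], founders)
```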

Iterative refinement. The LLM is shown the statistical analysis of its own policy and asked to reflect: refine probabilities, remove rules with weak predictive power, and incorporate newly surfaced rules from association-rule mining. This reflection loop runs for multiple iterations. Training-time policy evaluation every five iterations guards against regression, using an LLM evaluator to analyze how rule changes affected the F0.25 score over time.
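The shape of that loop can be sketched as follows. Here llm_reflect, evaluate_policy, and the statistics function are stubs standing in for the LLM reflection call, the F0.25 evaluation, and the rule audit; the paper does not publish this as code, so everything below is an assumption about structure only:

```python
def llm_reflect(policy, stats):
    # Stub: the real system asks the LLM to refine probabilities,
    # drop weak rules, and absorb newly mined rules.
    return policy

def evaluate_policy(policy):
    # Stub: the real system scores the policy by F0.25 over training data.
    return 0.5

def train(policy, stats_fn, iterations=20, eval_every=5):
    best_policy, best_score, history = policy, float("-inf"), []
    for i in range(1, iterations + 1):
        stats = stats_fn(policy)             # audit current rules against data
        policy = llm_reflect(policy, stats)  # reflection step
        if i % eval_every == 0:              # periodic guard against regression
            score = evaluate_policy(policy)
            history.append(score)
            if score > best_score:
                best_policy, best_score = policy, score
    return best_policy, history

final_policy, scores = train(["rule placeholder"], lambda p: {})
```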

ProbLog execution. At inference, the final policy is translated into ProbLog syntax, founder traits are converted to ProbLog facts with probabilities reflecting strength of evidence, and two queries are executed: one for success probability, one for failure probability. The founder is classified as successful only if the success probability clears a tuned threshold AND the failure probability falls below a second threshold. Both thresholds are optimized on a held-out validation set against F0.25, which weights precision four times more heavily than recall to match the asymmetric cost of false positives in quant VC.
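The dual-threshold decision rule and the F0.25 objective can be sketched as below; the threshold values are illustrative placeholders, not the paper's tuned values:

```python
T_SUCCESS = 0.6  # illustrative; the paper tunes both on a validation set
T_FAILURE = 0.3

def classify(p_success: float, p_failure: float) -> bool:
    # Predict success only if the success query clears one threshold
    # AND the failure query stays below the other.
    return p_success >= T_SUCCESS and p_failure <= T_FAILURE

def f_beta(precision: float, recall: float, beta: float = 0.25) -> float:
    # F_beta = (1 + b^2) * P * R / (b^2 * P + R); beta < 1 favors precision.
    b2 = beta * beta
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With beta = 0.25, a precision-heavy operating point scores higher than the mirror-image recall-heavy one, which is exactly the asymmetry the text describes.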

How accurate is LLM-AR?

LLM-AR was evaluated through 4-fold cross-validation on a 6,000-founder dataset with 10 percent success prevalence (600 successful, 5,400 unsuccessful). Success means the founder's company reached a $500M+ IPO, $500M+ acquisition, or raised more than $500M. Unsuccessful means the company raised between $100K and $4M.

Performance on test folds, against baselines on the same 10 percent prevalence data:

  • LLM-AR: 59.5 percent precision, 8.7 percent recall, F0.25 = 41.6 percent.
  • GPT-4o mini: 49.5 percent precision, 8.0 percent recall, F0.25 = 37.8 percent.
  • GPT-4o: 32.3 percent precision, 18.7 percent recall, F0.25 = 30.9 percent.
  • DeepSeek-V3: 31.0 percent precision, 15.2 percent recall, F0.25 = 29.1 percent.
  • Tier-1 seed funds (scaled from real-world performance): 29.5 percent precision.
  • o3-mini: 21.6 percent precision, 31.3 percent recall, F0.25 = 22.0 percent.
  • Random classifier: 10.0 percent precision.

The 59.5 percent precision represents a 5.9x lift over the random baseline. Validation-set performance is higher (66.2 percent precision, 9.7 percent recall, F0.25 = 47.7 percent), confirming that most of the signal generalizes to unseen folds but that threshold optimization is partition-dependent.

Ablations confirm each component contributes. Without iterative training, precision drops to 36.1 percent. With GPT-4o mini substituting for ProbLog at inference, precision drops to 46.2 percent. Both architectural choices are doing real work.

LLM-AR is also tunable without retraining. By optimizing the threshold pair against different F-beta values, the same policy produces a spectrum of precision-recall trade-offs: at beta = 0.125, the model reaches 100 percent precision at 2 percent recall; at beta = 4, it reaches 92 percent recall at 12.5 percent precision. Venture investors can dial this to match their own portfolio construction.
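That retraining-free tuning amounts to re-running the threshold search under a different beta. A sketch on synthetic validation scores (the grid resolution, score distribution, and labels below are all invented):

```python
import itertools
import random

random.seed(1)
# Synthetic validation set: (p_success, p_failure, is_success) triples.
val = [(random.random(), random.random(), random.random() < 0.1)
       for _ in range(500)]

def f_beta(p, r, beta):
    b2 = beta * beta
    return 0.0 if p == 0.0 or r == 0.0 else (1 + b2) * p * r / (b2 * p + r)

def tune(beta, grid=tuple(i / 10 for i in range(11))):
    # Sweep both thresholds; keep the pair maximizing F-beta.
    best_score, best_pair = -1.0, None
    for t_s, t_f in itertools.product(grid, grid):
        preds = [ps >= t_s and pf <= t_f for ps, pf, _ in val]
        tp = sum(pred and y for pred, (_, _, y) in zip(preds, val))
        fp = sum(pred and not y for pred, (_, _, y) in zip(preds, val))
        fn = sum(not pred and y for pred, (_, _, y) in zip(preds, val))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        score = f_beta(p, r, beta)
        if score > best_score:
            best_score, best_pair = score, (t_s, t_f)
    return best_pair

precision_mode = tune(0.125)  # precision-heavy operating point
recall_mode = tune(4.0)       # recall-heavy operating point
```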

Vela's full production quant VC stack, across the Think-Reason-Learn family and related research, reaches 19 to 38 percent precision when scaled back to the 1.9 percent real-world base rate, a 10x to 20x lift over the index. LLM-AR contributes the neural-symbolic architecture to that broader program.

Why neural-symbolic reasoning matters for quant VC

Pure LLMs are unreliable decision-makers at high-stakes precision. The LLM-AR experiments show why: GPT-4o, DeepSeek-V3, and o3-mini all produced either inconsistent precision or unacceptable recall on the same task, and even the best vanilla LLM (GPT-4o mini at 49.5 percent precision) lagged LLM-AR by 10 percentage points. The bottleneck is not reasoning capability. It is reasoning reproducibility. The same prompt can yield different answers, and the decision path is not inspectable.

Neural-symbolic architectures address both problems. The LLM does what it is good at (surfacing heuristic patterns from unstructured founder data) and a symbolic engine does what it is good at (executing those patterns deterministically and transparently). For quant VC, where a partner has to be able to inspect the reasoning chain and override a recommendation, neural-symbolic is the natural integration point.

What makes LLM-AR auditable for quant VC decisions

Every LLM-AR decision decomposes into three inspectable artifacts: a human-readable policy (the set of prediction rules), per-rule probabilities that were calibrated against real data, and a ProbLog execution trace showing which rules fired with what probabilities. A partner can open the policy, edit a rule by hand, redeploy the model, and get a new prediction immediately. The paper explicitly calls out “expert-in-the-loop” as a design goal: human experts can modify the policy with domain knowledge between iterations. This is the auditability standard Vela applies across its quant VC stack.

How LLM-AR fits into Vela's quant VC research program

LLM-AR sits in the LLM-Augmented ML family at Vela, alongside related papers that embed LLMs into classical ML pipelines:

  • Same family: LLM-AR, GPT-HTree (hierarchical clustering plus LLM personas), rare-event prediction with LLM feature engineering, and verifiable reasoning (LLMs as code generators) all share the thesis that the LLM is most useful as a component inside a tested ML pipeline.
  • Adjacent via method: LLM-AR's rule-induction step is conceptually close to the Think-Reason-Learn family (GPTree, Random Rule Forest, Reasoned Rule Mining, Policy Induction), which also uses LLMs to induce interpretable decision rules. The key difference: TRL models evaluate rules directly; LLM-AR hands rules off to ProbLog for probabilistic symbolic execution.
  • Adjacent via pattern: The multi-agent Founder-GPT and SSFF papers orchestrate LLMs as reasoning agents. LLM-AR instead pairs a single LLM with a symbolic engine.
  • Benchmarking: Related methods developed at Vela are evaluated on VCBench, the public benchmark for quant VC.

Limitations

The paper is explicit about what LLM-AR does not yet achieve. The 10 percent success prevalence of the evaluation dataset is inflated from the 1.9 percent real-world base rate, so absolute precision numbers should be interpreted cautiously when extrapolating to deployment. Policy size and evaluation speed are constrained by the Python ProbLog implementation, which is slower than ProbLog's own reference engine. The training loop does not guarantee monotone improvement across iterations, which the authors mitigate by testing multiple candidate policies rather than only the final one. The LLM can only reason over the 52 engineered features in the fixed dataset, because the statistical calibration step requires numeric representations. All features are existing founder attributes rather than newly discovered ones.

Read the paper

LLM-AR: LLM-powered Automated Reasoning Framework.
Rick Chen, Joseph Ternasky, Aaron Ontoyin Yin, Xianling Mu, Fuat Alican, Yigit Ihlamur.
arXiv preprint, October 2025.
arXiv:2510.22034.

LLM-AR is part of the LLM-Augmented ML family of Vela's quant VC research. For related work within the same family, see GPT-HTree, rare-event prediction, and verifiable reasoning. For the adjacent rule-induction line, see the Think-Reason-Learn family, including GPTree, Random Rule Forest, Reasoned Rule Mining, and Policy Induction. For the multi-agent line, see Founder-GPT and SSFF. For the benchmark that measures progress across all three families, see VCBench.

Authored by members of the Vela team. See the full roster of contributors.

For research collaboration in quant VC, neural-symbolic AI, and automated reasoning for founder evaluation, email engage@vela.partners.
