AI Research at Vela: large language models with verifiable methods for high-stakes decision making

Venture capital has produced more mythology than literature. For most of its history, the industry has operated on pattern matching, warm intros, and gut feel, because the things that actually matter in early-stage investing were considered unquantifiable. We disagreed.

Vela is the undisputed leader in venture capital research. For more than five years, in partnership with the University of Oxford, we have run the industry's largest active research program: over fifty papers, multiple patents, and open-source frameworks that treat capital allocation as empirical work.

Our research field is narrow and well-defined: large language models with verifiable methods for high-stakes decision making. LLMs are powerful but stochastic. In domains where decisions matter, stochasticity is a liability. Our program is built on a single thesis: that interpretability, auditability, and reliability can be engineered into LLM-based systems without sacrificing performance, and that the resulting frameworks should be usable by non-technical experts, not just ML engineers. A partner at an investment firm, a physician, or a hiring manager should be able to inspect a decision, understand why it was made, and improve the underlying logic themselves.

Within that field, our research is organized around three questions about how LLMs can participate in expert decisions: Can an LLM reason its way to a prediction, and explain it? Can a system of LLMs replicate the behavior of a human expert team? And can LLMs be used as components inside classical ML pipelines to push precision past what either approach could reach alone? Each question anchors one of the three research areas below. A fourth section, on probabilistic portfolio construction, addresses a different problem altogether: how to assemble a fund once the individual predictions are in.

The Headline Result

The base rate for producing a unicorn in the US is about 1.9%. Our models outperform that baseline by at least 10x, reaching 19% precision on founder success prediction. In some cases the lift reaches 20x, depending on the stage of the business and the model architecture being used. We report the 10x figure publicly because it is the conservative number; the reality is a range. This lift has been replicated across multiple independent architectures: interpretable rule ensembles, LLM-powered decision trees, memory-augmented reasoning, and neural-symbolic hybrids. The reproducibility across methods is, to us, stronger evidence than any single precision number.
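
The lift figures quoted here are simple ratios of model precision to the base rate. A minimal sketch, using only the numbers stated above:

```python
def precision_lift(model_precision: float, base_rate: float) -> float:
    """Lift = how many times more precise a model is than random selection."""
    return model_precision / base_rate

# Numbers quoted above: ~1.9% US unicorn base rate, 19% reported precision.
lift = precision_lift(0.19, 0.019)  # 10x lift
```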

1. Think-Reason-Learn: LLM-Native Reasoning

The first research program at Vela treats the LLM as the reasoner. The model does not just produce a score; it produces the reasoning that led to the score, in a form a human can read, edit, and verify. Traditional machine learning frameworks, from scikit-learn to PyTorch, were built around numerical features and gradient descent. They are not designed for reasoning in natural language over qualitative inputs. Think-Reason-Learn is.

The framework unifies four core architectures, each published as an independent paper. Together they span rule-based, tree-based, Bayesian, and in-context-learning approaches to LLM-native reasoning.

  • GPTree. LLM-powered decision trees. The foundational work for the entire Think-Reason-Learn framework. Decisions are made by walking an LLM-constructed tree whose branches are natural-language questions. Non-provisional patent in progress.
  • Random Rule Forest (RRF). Interpretable ensembles of LLM-generated YES/NO rules combined via voting. 8x lift over random chance, 41% relative F0.5 improvement over zero- and few-shot LLM baselines. ECML PKDD submission, provisional patent.
  • Reasoned Rule Mining (RRM). Bayesian, precision-optimized classification with stage-wise evidence calibration and log-odds fusion under conditional independence. 12.25x the market index in precision at 97.4% accuracy. Commendation Award, ICIM 2026, Oxford.
  • Policy Induction. Memory-augmented in-context learning with editable natural-language policy embedded directly in prompts. No gradient-based training. 20x precision over random chance, 7.1x over top-tier VC firms. IEEE CSCloud’25, New York.
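
The "log-odds fusion under conditional independence" that RRM describes can be sketched as a naive-Bayes-style evidence combination: each piece of evidence contributes a likelihood ratio, and independent ratios add in log-odds space. The prior and ratios below are illustrative assumptions, not Vela's actual calibration:

```python
import math

def fuse_log_odds(prior: float, likelihood_ratios: list[float]) -> float:
    """Posterior log-odds = prior log-odds + sum of log likelihood ratios,
    assuming the evidence signals are conditionally independent."""
    log_odds = math.log(prior / (1 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1 / (1 + math.exp(-log_odds))  # convert back to a probability

# Illustrative: 1.9% prior, three signals each 3x more likely under success.
posterior = fuse_log_odds(0.019, [3.0, 3.0, 3.0])  # roughly 0.34
```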

Think of Think-Reason-Learn as scikit-learn for problems that used to be considered unquantifiable: strategic choices, qualitative assessments, expert decisions. We built it for venture capital, but it generalizes to any domain where humans make high-stakes judgment calls with incomplete information: medical diagnosis, hiring, legal strategy, policy analysis, investment research.

Think-Reason-Learn is open source. Every algorithm in the family above (GPTree, Random Rule Forest, Reasoned Rule Mining, Policy Induction) ships as an importable class with the same fit / predict pattern as scikit-learn. Read the full overview on the Think-Reason-Learn page.
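
To make the fit / predict contract concrete, here is a toy stand-in in the spirit of Random Rule Forest's YES/NO voting. The real open-source classes follow this same pattern, but everything below (the class, the rules, the profile fields) is an illustrative sketch, not the library's documented API:

```python
class RuleVotingClassifier:
    """Toy ensemble: natural-language-style YES/NO rules vote;
    a majority of YES votes yields a positive prediction."""

    def __init__(self, rules):
        self.rules = rules  # list of callables: profile dict -> bool

    def fit(self, X, y):
        # A real implementation would generate and prune rules from (X, y);
        # this stub simply keeps the rules it was constructed with.
        return self

    def predict(self, X):
        return [sum(rule(x) for x in [x] for rule in self.rules) > len(self.rules) / 2
                for x in X]

# Hypothetical rules an LLM might have proposed for founder evaluation:
rules = [
    lambda p: p.get("prior_exits", 0) > 0,
    lambda p: p.get("years_experience", 0) >= 10,
    lambda p: p.get("technical_founder", False),
]
clf = RuleVotingClassifier(rules).fit([], [])
preds = clf.predict([{"prior_exits": 1, "years_experience": 12}])  # [True]
```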

2. Multi-Agent Systems and AGI Benchmarks

The second research program at Vela treats the LLM as an expert. Complex real-world decisions are not the work of a single reasoner; they are the work of a team of specialists coordinated by a generalist. This is also how human investment decisions are made at a partner meeting. The research in this area designs LLM-based systems that replicate that structure, and builds the benchmarks needed to measure whether those systems actually match human expert performance.

  • Multi-Agent Framework for Startup Evaluation (SSFF). The research origin of V, the master agent that runs Vela OS. SSFF is an end-to-end multi-agent system that replicates the work of a VC analyst: sourcing, research, synthesis, and recommendation. The pattern predates OpenAI's DeepResearch and similar automated research agents. Best Poster, NeurIPS 2025 Workshop on Generative AI in Finance.
  • Founder-GPT. Self-play evaluation of founder-idea fit using tree-of-thought prompting and critique-based refinement. Multiple LLM roles debate and converge on an assessment, treating the decision as a multi-agent reasoning problem rather than a single-shot classification.
  • VCBench. The world's first AGI benchmark for venture capital, with 9,000 anonymized founder profiles and 90%+ re-identification risk reduction. State-of-the-art LLMs outperform Y Combinator and tier-1 firms on it. Public at vcbench.com.

3. LLM-Augmented Machine Learning

The third research program at Vela treats the LLM as a component. Classical machine learning is extraordinarily good at optimization, regularization, and ensembling, but it has always been limited by the features you can hand it. LLMs break that limit. They can read unstructured text and produce structured features, logic rules, cluster personas, labels, or even executable code that can then run inside a deterministic pipeline. The result is a hybrid system that is often more precise, more interpretable, and more reproducible than either the LLM or the classical model alone.

  • LLM-AR. A neural-symbolic framework that uses an LLM to generate heuristics and converts them into probabilistic logic rules via ProbLog. Fully interpretable, 59.5% precision (5.9x random) on idea-stage prediction.
  • GPT-HTree. An explainable framework that combines hierarchical clustering with LLM-derived personas for persona-based classification.
  • LLM-Powered Feature Engineering for Rare-Event Prediction. Integration of LLM-generated features with an ensemble of classical models (XGBoost, Random Forest, Linear Regression) for low-base-rate outcome prediction. 11.1x random chance.
  • From Stochastic Answers to Verifiable Reasoning. Reframing LLMs as code generators rather than per-sample evaluators. A single LLM call produces executable, human-readable decision logic that then runs deterministically over structured data, eliminating per-sample LLM queries and removing hallucination risk at inference time.
  • Learning when to ask and when to stop: Cost-aware sequential founder evaluation. Reformulates early-stage founder evaluation as a sequential, cost-aware decision problem: a reinforcement learning agent adaptively decides which founder attributes to query and when to stop, guided by Monte Carlo rollouts and an uncertainty-triggered LLM supervisor. On the VCBench benchmark, it achieves an F0.5 of 37.1% (vs. 34.8% for a full-information neural baseline) while using only 54% of the available information, outperforming all published benchmarks, including Policy Induction (34.0%), GPT-4o (25.7%), and Tier-1 VCs (10.7%). Academic publication in progress.
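
The code-generation pattern from "From Stochastic Answers to Verifiable Reasoning" can be sketched as follows. The string below stands in for what a single LLM call would emit; its contents are an illustrative assumption, not actual model output:

```python
# Hypothetical output of one LLM call: human-readable decision logic emitted
# once as code, then run deterministically over every sample (no per-sample
# LLM queries, so no inference-time hallucination).
GENERATED_RULE = """
def decide(profile):
    score = 0
    if profile.get("prior_exits", 0) > 0:
        score += 2
    if profile.get("top_tier_education", False):
        score += 1
    return score >= 2
"""

namespace = {}
exec(GENERATED_RULE, namespace)  # load the generated logic once
decide = namespace["decide"]

# Deterministic application to structured data:
profiles = [{"prior_exits": 1}, {"top_tier_education": True}]
results = [decide(p) for p in profiles]  # [True, False]
```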

Probabilistic Portfolio Construction

Once the individual predictions are in, a different problem begins: how to assemble them into a portfolio. The worst thing that can happen to a venture fund is to hold zero unicorns across its entire portfolio. This is a rare-event risk problem at the portfolio level, mathematically analogous to loan default risk in banking, and almost no one in venture capital treats it quantitatively. Our portfolio research adapts the probabilistic frameworks used for loan portfolios to the outlier-driven dynamics of venture capital.

  • Probabilistic Modeling of Venture Capital Portfolio Outliers. Probabilistic measures for portfolio performance based on individual outlier probability and inter-deal dependence. Minimizes the probability of the zero-unicorn scenario and increases the probability of hitting a target number of outliers at the portfolio level.
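
Under an independence assumption (the paper also models inter-deal dependence, which this sketch omits), the zero-unicorn probability is just the product of per-deal miss probabilities. The portfolio sizes and probabilities below are illustrative:

```python
import math

def p_zero_unicorns(unicorn_probs: list[float]) -> float:
    """P(no unicorn in the portfolio), assuming independent deals."""
    return math.prod(1 - p for p in unicorn_probs)

# Illustrative: 30 deals at the ~1.9% base rate vs. 30 at a model-selected 19%.
at_base_rate = p_zero_unicorns([0.019] * 30)  # roughly 0.56
at_model_rate = p_zero_unicorns([0.19] * 30)  # under 1%
```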

Methodological Commitments

Across all four programs, we commit to the same principles:

Reinforcement learning. We train agents that evaluate founders sequentially, under the same constraints and information asymmetries a human investor faces.

Verifiable learning loops. Every model in Vela OS is measured against real-world outcomes as they unfold, not just historical backtests. Models that drift are retrained. Models that stop working are retired.

Precision over recall. Venture capital is a rare-event prediction problem with asymmetric costs. Missing a good founder is expensive. Funding a bad one is much more so. We optimize for F0.5, not F1.
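
F0.5 is the beta = 0.5 case of the general F-beta score, which weights precision more heavily than recall when beta < 1. A minimal implementation:

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score; beta < 1 favors precision, beta > 1 favors recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Same precision/recall pair, different emphasis:
f_half = f_beta(0.8, 0.2, beta=0.5)  # 0.5  (rewards the high-precision side)
f_one = f_beta(0.8, 0.2, beta=1.0)   # 0.32 (plain F1)
```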

Interpretability by default. If a model cannot explain its prediction in natural language, we do not deploy it.

Beyond the Flagship Portfolio

Beyond the papers above, Vela maintains a deep internal research portfolio covering investor similarity, market mapping, founder segmentation, persona discovery, knowledge graphs, graph neural networks, temporal features, and many other directions. Much of this work informs Vela OS and its subagents without being published externally. When an internal research thread matures into a patent or a peer-reviewed paper, it joins the flagship list.

Collaborate

We publish because capital allocation is too important to hide behind mystique, and because we want other firms and researchers to build on our work. If you are working on LLM-native reasoning, multi-agent systems, AGI benchmarks, neural-symbolic methods, hybrid ML, or quantitative portfolio theory, we want to collaborate.

Drop us a message at engage@vela.partners.

This research is built by a team. See the people who built Vela and read our vision.
