Random Rule Forest: Interpretable Ensembles for Quant VC

Paper
Random Rule Forest (RRF): Interpretable and Manageable Ensembles of LLM-Generated Questions for Predicting Success from Unstructured Data.
Authors
Ben Griffin (University of Oxford), Ugur Koyluoglu (Oliver Wyman), Diego Vidaurre Henche (University of Oxford), Joseph Ternasky (Vela Research), Fuat Alican (Vela Research), Aaron Ontoyin Yin (Vela Research), Yigit Ihlamur (Vela Research).
Venue
ECML PKDD submission, 2026.
Preprint
arXiv:2505.24622
Status
Provisional patent. Code and anonymized founder dataset publicly released.
Research program
Part of Think-Reason-Learn at Vela, the quant VC research program.

What Random Rule Forest contributes to quant VC

Quant VC is the application of quantitative, reproducible, empirically validated methods to venture capital decision-making. Random Rule Forest (RRF) is one of the core prediction architectures behind Vela's quant VC program, developed by Vela Research in collaboration with the University of Oxford and Oliver Wyman. It operationalizes a specific claim about how quant VC should work: a large language model is most useful as a feature generator, not as an end-to-end judge, and a deliberately simple voting rule on top of those features yields high-precision, fully auditable investment screening under the extreme class imbalance that defines venture capital.

RRF achieves 13.1% precision on unicorn identification at a 1.9% base rate, a 6.9x lift over random chance and well above what the best human venture capital firms achieve. With a small set of expert-informed questions added to the LLM-generated pool, precision rises to 15.3%. These are headline quant VC results on a dataset of 9,892 US founders, evaluated with 10-fold nested cross-validation repeated 10 times.

What is quant VC, and how does RRF exemplify it?

Quant VC treats venture capital as a rare-event prediction problem that can be modeled, measured, and improved the way quantitative finance treats loan portfolios or quantitative hiring treats candidate evaluation. The defining characteristics of a quant VC approach are: quantitative scoring rather than qualitative consensus; reproducible methodology rather than case-by-case judgment; honest baselines rather than cherry-picked wins; and interpretability that allows every prediction to be audited by a partner, a founder, or an LP.

RRF exemplifies all four. It produces a numerical precision score benchmarked against a 1.9% base rate. Its methodology is fully specified in a public paper with open code. Its baselines include three LLM prompting strategies, a learned-weight elastic net, Y Combinator, and nine tier-1 VC firms. And every RRF prediction decomposes into a short list of YES/NO questions that a human can read top to bottom. This is what quant VC looks like when it is actually built, rather than asserted.

How does RRF predict startup success?

RRF is a seven-stage pipeline. An LLM generates a large pool of natural-language YES/NO questions from labeled founder profiles (“Has the founder previously raised a large institutional financing round?”; “Is the founder's university ranked among the top 50 globally?”). Each question is evaluated on every founder, producing a binary response matrix. Questions whose answer vectors are nearly identical to an already-kept question (small Hamming distance) are filtered out as predictively redundant. The remaining questions are ranked by individual F0.5 score, and the top N are combined via a unit-weight threshold rule: if at least T of the N questions are answered YES for a given founder, the founder is classified as likely to succeed. N and T are tuned via nested cross-validation.
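The core mechanics of the pipeline can be sketched in a few functions. This is a minimal illustration, not the paper's released code: the function names, the redundancy cutoff, and the tiny answer matrices are all assumptions made for the example.

```python
def hamming(a, b):
    """Fraction of founders on which two questions' YES/NO answers differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def f_beta(preds, labels, beta=0.5):
    """F-beta on binary predictions; beta=0.5 weights precision over recall."""
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * prec * rec / (b2 * prec + rec)

def filter_redundant(answers, min_dist=0.05):
    """Drop questions whose answer vector nearly duplicates a kept question."""
    kept = []
    for q, vec in answers.items():
        if all(hamming(vec, answers[k]) >= min_dist for k in kept):
            kept.append(q)
    return kept

def rrf_predict(answer_rows, threshold):
    """Unit-weight vote: YES on at least `threshold` questions => success."""
    return [sum(row) >= threshold for row in answer_rows]
```

In the full pipeline, `f_beta` ranks the surviving questions on held-out folds, and the best (N, T) pair for `rrf_predict` is chosen inside nested cross-validation rather than fixed by hand.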

The paper uses GPT-4o-mini as the backend LLM. Question generation runs once on a held-out design set of 500 founders. All model selection and evaluation happens on the remaining 9,392 founders. The modal ensemble size is 7 to 8 questions with a voting threshold of 5 to 6, a classifier that a partner at a quant VC firm can literally read in under a minute.

How accurate is RRF in quant VC terms?

RRF was evaluated on 9,892 US-founded companies from 2010 to 2016 that raised $100K to $4M in seed funding. Success is defined as an IPO, acquisition, or funding round of at least $500M. The base rate is 1.9%, matching the real-world US unicorn formation rate.

On this dataset, RRF achieves the following precision results:

  • Indexing strategy (random selection): 1.9%
  • Y Combinator: 3.2%
  • Nine tier-1 VC firms (Khosla, Sequoia, Mayfield, Foundation, Benchmark, a16z, Kleiner Perkins, Accel, GV): 5.6%
  • Best LLM prompting baseline (o3 few-shot): 7.7%
  • RRF (LLM-only): 13.1%, a 6.9x lift
  • RRF (LLM + expert-informed questions): 15.3%, an 8x lift
  • RRF (LLM + expert, peak precision across folds): 17.6%

On F0.5, RRF scores 0.118 versus 0.088 for the best prompting baseline, a 34% relative improvement that is statistically significant at p < 0.001. RRF optimizes for F0.5 rather than F1 because in quant VC, the cost of funding a bad founder is much higher than the cost of missing a good one, and F0.5 weights precision twice as heavily as recall.
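The asymmetry F0.5 encodes can be shown with a worked example. The numbers below are illustrative, not from the paper: two hypothetical screeners with mirrored precision/recall profiles tie under F1 but diverge sharply under F0.5.

```python
def f_beta(precision, recall, beta):
    """Standard F-beta from precision and recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A picky screener (few bad picks, many misses) vs. a greedy one (the reverse).
picky_f05  = f_beta(precision=0.80, recall=0.20, beta=0.5)   # 0.50
greedy_f05 = f_beta(precision=0.20, recall=0.80, beta=0.5)   # ~0.24
picky_f1   = f_beta(precision=0.80, recall=0.20, beta=1.0)   # 0.32
greedy_f1  = f_beta(precision=0.20, recall=0.80, beta=1.0)   # 0.32
```

F1 cannot distinguish the two screeners; F0.5 prefers the picky one by more than 2x, which matches the stated cost structure of funding a bad founder versus missing a good one.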

Current Vela quant VC production models built on RRF and its siblings in the Think-Reason-Learn family reach 19% to 38% precision on the same scaled real-world basis, a 10x to 20x lift over the US unicorn base rate.

Why RRF works as a quant VC architecture

RRF's performance rests on a specific empirical hypothesis that the paper tests directly: LLMs are most valuable as feature generators under extreme class imbalance, not as end-to-end predictors. Two baselines test the hypothesis from opposite sides.

The first baseline is direct LLM prompting (zero- and few-shot). These approaches ask the LLM to make a single-shot success judgment from raw founder text. They underperform RRF on precision and produce erratic precision-recall trade-offs that cannot be tuned to a specific operating point. In quant VC terms, an unaided LLM is too noisy to use for allocation.

The second baseline is a learned-weight elastic net logistic regression trained on the same binary question-response matrix RRF uses. This baseline tests whether the gain comes from the LLM-generated features alone. It does not: elastic net achieves F0.5 of 0.080, comparable to the prompting baselines and well below RRF.

The gain therefore comes from the combination: LLMs generate interpretable, conceptually diverse binary features, and a deliberately simple unit-weight voting rule aggregates them. Under extreme class imbalance, estimating non-uniform weights is high-variance, and unit weights act as regularization. This echoes classical results on “improper linear models” (Dawes 1979, Einhorn and Hogarth 1975): in noisy, low-data regimes, simple tallying rules often beat estimated coefficients.

This is a quant VC result, but it is also a general result about LLMs in high-stakes prediction.
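The regularization intuition behind unit weights can be illustrated with a small synthetic simulation. Everything here is an assumption made for the sketch: the YES probabilities, the eight equally informative questions, and the naive per-question weight estimator stand in for the real data and the elastic net.

```python
import random

random.seed(0)

def sample(n_pos, n_neg, p_pos=0.6, p_neg=0.3, k=8):
    """Synthetic answers: each of k questions is YES with prob p_pos for
    successes and p_neg for failures, so all k are equally informative."""
    X = [[int(random.random() < p_pos) for _ in range(k)] for _ in range(n_pos)]
    X += [[int(random.random() < p_neg) for _ in range(k)] for _ in range(n_neg)]
    y = [1] * n_pos + [0] * n_neg
    return X, y

def naive_weights(X, y):
    """Per-question weight = P(YES | success) - P(YES | failure)."""
    pos = [row for row, lab in zip(X, y) if lab]
    neg = [row for row, lab in zip(X, y) if not lab]
    return [sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
            for j in range(len(X[0]))]

# With only 10 positives (the rare-event regime), estimated weights scatter
# widely even though every question deserves the same weight; unit weights
# are exactly right by construction.
spreads = []
for _ in range(200):
    X, y = sample(n_pos=10, n_neg=500)
    w = naive_weights(X, y)
    spreads.append(max(w) - min(w))
avg_spread = sum(spreads) / len(spreads)
```

The true spread between the best and worst question weight is zero, yet the estimated spread is large on every resample: exactly the estimation variance that unit-weight tallying removes.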

What makes RRF auditable for quant VC decision-making

Every RRF prediction decomposes into a short, human-readable list of questions and answers. A founder classified as likely to succeed has answered YES to at least T of N questions. A partner at a quant VC firm can read each question, see each answer, and agree or disagree with the prediction on the specific grounds the model used. There is no latent space to interpret, no post-hoc explanation layer, and no rationalization.

The ensemble is small enough to audit completely. In the paper's modal configuration, the classifier is 7 to 8 questions with a threshold of 5 to 6. A quant VC partnership can adopt this scorecard, use it, and defend it to an LP committee without invoking an external interpretability tool.
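What "audit completely" means in practice can be shown as a rendered scorecard. The first two questions below are the paper's own examples; the remaining five, the founder name, and the report format are hypothetical fillers for illustration.

```python
# Illustrative scorecard in the paper's modal shape: 7 questions, threshold 5.
SCORECARD = [
    "Has the founder previously raised a large institutional financing round?",
    "Is the founder's university ranked among the top 50 globally?",
    "Has the founder previously founded a venture-backed company?",
    "Does the founder have 5+ years of experience in the startup's domain?",
    "Has the founder held a senior role at a high-growth technology company?",
    "Does the founding team include a technical co-founder?",
    "Has the founder published or patented work relevant to the product?",
]
THRESHOLD = 5

def audit(founder_name, answers):
    """Render the full decision as a human-readable report."""
    lines = [f"Founder: {founder_name}"]
    for q, a in zip(SCORECARD, answers):
        lines.append(f"  [{'YES' if a else ' NO'}] {q}")
    yes = sum(answers)
    verdict = "LIKELY SUCCESS" if yes >= THRESHOLD else "PASS"
    lines.append(f"  {yes}/{len(SCORECARD)} YES (threshold {THRESHOLD}) -> {verdict}")
    return "\n".join(lines)
```

The entire model state is the question list and the threshold; editing the scorecard (the expert-in-the-loop extension below) is a one-line change.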

The expert-in-the-loop extension makes the model directly editable. Domain experts can add their own questions to the candidate pool, and those expert-written questions are scored, ranked, and filtered by the same procedure as LLM-generated ones. Experts can also audit the ranked list and remove questions they consider misaligned with the firm's investment criteria. In the paper, expert questions often rank highly on predictive performance and contribute directly to the final ensemble, raising precision from 13.1% to 15.3%. Quant VC is strongest when it combines quantitative methods with domain knowledge; RRF's architecture builds that combination in.

Does RRF generalize beyond quant VC?

Yes. The same pipeline was evaluated on the Trial Outcome Prediction (TOP) Phase I benchmark for clinical trial success prediction (Fu et al., 2022). On the standard time-based split, RRF achieves PR-AUC 0.638 and ROC-AUC 0.596, outperforming all published baselines including HINT (PR-AUC 0.567), COMPOSE (0.564), DeepEnroll (0.568), and FFNN (0.547).

This is a direct demonstration that an architecture designed for quant VC transfers to high-stakes prediction in other text-heavy expert domains. The same four components generalize: LLM as feature generator, binary question representation, predictive redundancy filtering, and threshold-based voting. Any domain with rare events, unstructured text, and a need for auditable predictions, including grant review, legal case triage, hiring, and medical screening, is a candidate.

How RRF fits into Vela's quant VC research program

RRF is one of four core architectures in Vela's Think-Reason-Learn research program. The others are:

  • GPTree, the foundational paper, which introduced LLM-powered decision trees and established that LLMs could generate the structure of a decision system.
  • Reasoned Rule Mining (RRM), which adds Bayesian calibration and log-odds fusion to the rule-based approach; it won the Commendation Award at ICIM 2026, Oxford.
  • Policy Induction, which moves the reasoning into editable natural-language policies embedded in prompts. Reaches 20x precision over random chance on the same founder evaluation task.

GPTree uses hierarchical trees. RRF uses flat ensembles. RRM adds probabilistic calibration. Policy Induction uses in-context memory. Each attacks the quant VC prediction problem from a different architectural angle, and each contributes different strengths to the production models that Vela deploys.

All four papers are implemented as modules inside Think-Reason-Learn, the open-source framework Vela built to generalize these architectures beyond venture capital.

Limitations

The paper identifies four limitations. First, question generation is stochastic: reruns with different prompts, temperatures, or seeds can produce different question pools. Second, while the final RRF predictor is interpretable, the LLM question-generation step is not; the paper does not characterize how prompt choices systematically shape the induced rule set. Third, learned signals may shift across time, geography, or data-collection pipelines and require re-validation on the target distribution. Fourth, RRF is evaluated as a screening task and does not model downstream constraints such as deal access, allocation limits, or portfolio-level trade-offs; reported metrics should be interpreted as quant VC screening performance, not end-to-end fund return.

Read the paper

Random Rule Forest (RRF): Interpretable and Manageable Ensembles of LLM-Generated Questions for Predicting Success from Unstructured Data.
Ben Griffin, Ugur Koyluoglu, Diego Vidaurre Henche, Joseph Ternasky, Fuat Alican, Aaron Ontoyin Yin, Yigit Ihlamur.
ECML PKDD 2026 submission. Provisional patent filed.

Preprint: arXiv:2505.24622.

Code and anonymized founder dataset: github.com/rrfanon2026/rrf_submission.

RRF is part of Vela's quant VC research program, Think-Reason-Learn. Related papers: GPTree, Reasoned Rule Mining, Policy Induction.

Authored by members of the Vela team. See the full roster of contributors.

For research collaboration in quant VC, expert decision systems, or interpretable machine learning, email engage@vela.partners.
