Founder-GPT: Self-Play and Tree-of-Thought for Quant VC Founder Evaluation

Paper
Founder-GPT: Self-Play to Evaluate the Founder-Idea Fit.
Authors
Sichao Xiong (University of Oxford), Yigit Ihlamur (Vela Partners).
Venue
arXiv preprint, December 2023.
Status
Preprint. arXiv:2312.12037. The methodological root of Vela's Multi-Agent Systems research program.
Research program
First-generation quant VC research at Vela. The self-play and tree-of-thought techniques introduced here became load-bearing components of SSFF and V, Vela's master agent.

What Founder-GPT contributes to quant VC

Founder-GPT (Xiong and Ihlamur, December 2023) is the methodological origin of Vela's multi-agent approach to quant VC. It introduced three ideas that now run through Vela's production stack: (1) explicit scoring of founder-idea fit as a separate quantity from founder quality and idea quality, (2) self-play among simulated VC analysts to generate diverse reasoning paths before aggregation, and (3) tree-of-thought prompting with critique-based refinement to prevent the LLM from collapsing to safe middle-of-the-range scores.

These three ideas now appear, in evolved form, across Vela's quant VC research program. SSFF, the multi-agent framework that won Best Poster at the NeurIPS 2025 Workshop on Generative AI for Finance, extends Founder-GPT into a full three-block decision system. The Founder-Idea Fit Score first defined in Founder-GPT became the FIF-Network component inside SSFF.

What is quant VC, and where does Founder-GPT fit?

Quant VC is the application of quantitative, reproducible, empirically validated methods to venture capital decision-making. Quant VC treats venture capital as a rare-event prediction problem that can be modeled, measured, and improved with the same rigor that quantitative finance brings to credit risk or quantitative medicine brings to diagnostic screening. Quant VC requires quantitative scoring against honest baselines, reproducible methodology, and interpretability that allows every prediction to be audited.

Founder-GPT sits in the Multi-Agent Systems strand of Vela's quant VC research, alongside SSFF. The Multi-Agent strand treats the LLM as an orchestrator of specialized reasoning roles. This is distinct from Vela's Think-Reason-Learn family (GPTree, Random Rule Forest, Reasoned Rule Mining, Policy Induction), which treats the LLM as a reasoner that induces explicit auditable rules. Founder-GPT's contribution is an early demonstration that multi-agent self-play could deliver the calibration and auditability quant VC needs.

How does Founder-GPT evaluate founder-idea fit?

Founder-GPT takes a founder's LinkedIn profile and a startup description, retrieves the most similar cases from a curated dataset, and runs a structured three-step reasoning process to output a calibrated probability of success.

Step 1: Retrieve similar cases. Founder profiles are embedded using the all-MiniLM-L6-v2 sentence-transformers model (384 dimensions) applied to founder descriptions and prior employment. The paper defines a composite founder similarity formula that combines description cosine similarity, employment cosine similarity, major overlap, highest degree, and top-university flag. The top three most similar successes and top three most similar failures are both retrieved. Sampling from both outcome classes is a deliberate design choice, because quant VC reasoning requires exposure to near-miss failures as well as confirmed successes.
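The retrieval step can be sketched as follows. The real system embeds text with all-MiniLM-L6-v2 (384 dimensions); here the embeddings are plain vectors, and the composite-similarity weights and the exact handling of the discrete signals are illustrative assumptions, not the paper's coefficients:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def founder_similarity(query, case, weights=(0.4, 0.3, 0.1, 0.1, 0.1)):
    """Composite founder similarity. The weights and the exact form of the
    major/degree/top-university terms are illustrative assumptions; the
    paper combines these same five signals, but its coefficients are not
    reproduced here."""
    w_desc, w_emp, w_major, w_degree, w_univ = weights
    return (
        w_desc * cosine(query["desc_emb"], case["desc_emb"])
        + w_emp * cosine(query["emp_emb"], case["emp_emb"])
        + w_major * float(query["major"] == case["major"])
        + w_degree * float(query["degree"] == case["degree"])
        + w_univ * float(case["top_university"])
    )

def retrieve(query, cases, k=3):
    """Return the top-k most similar successes and top-k failures,
    sampling both outcome classes as the paper prescribes."""
    ranked = sorted(cases, key=lambda c: founder_similarity(query, c),
                    reverse=True)
    successes = [c for c in ranked if c["success"]][:k]
    failures = [c for c in ranked if not c["success"]][:k]
    return successes, failures
```

Retrieving from both outcome classes, rather than nearest neighbors overall, guarantees the downstream self-play step always sees near-miss failures alongside confirmed successes.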

Step 2: Induce success features via self-play. Three simulated VC analysts examine the retrieved cases together, brainstorm pros and cons of each, critique each other's reasoning, and backtrack when they identify flaws. The self-play converges on 4 to 6 bullet-point features characterizing what separates the successful from the unsuccessful cases in that specific neighborhood. The induction is unsupervised: the features are not predefined by the analyst team but derived from the retrieved data at inference time.
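A minimal sketch of the self-play loop, under loud assumptions: `complete` is a stand-in for a chat-completion call (stubbed here so the sketch runs), and the round structure and consensus rule are one plausible implementation, not the paper's exact prompt schedule:

```python
def complete(prompt, top_p=0.3):
    """Stand-in for an LLM chat-completion call with nucleus sampling.
    A real system would send `prompt` to a model with top-p 0.3; here
    we return a canned feature list so the sketch is runnable."""
    return [
        "technical depth matches the problem domain",
        "prior 0-to-1 operating experience",
        "domain-specific network from previous employment",
        "evidence of shipping product, not only researching",
    ]

def self_play_features(success_cases, failure_cases, n_analysts=3, rounds=2):
    """Simulated analysts brainstorm, critique each other's feature lists,
    and converge on 4-6 features separating successes from failures."""
    features = []
    for _ in range(rounds):
        proposals = []
        for analyst in range(n_analysts):
            prompt = (
                f"You are VC analyst {analyst + 1}. Given successful cases "
                f"{success_cases} and unsuccessful cases {failure_cases}, "
                f"critique the current feature list {features} and propose "
                f"an improved list of 4-6 distinguishing features."
            )
            proposals.append(complete(prompt))
        # Naive consensus rule (an assumption): keep only features every
        # analyst proposed this round; the next round critiques this list.
        features = [f for f in proposals[0] if all(f in p for p in proposals)]
    return features[:6]
```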

Step 3: Score the founder and the idea against the induced features. Each of the three analysts rates the input founder or idea against each induced feature on a 0 to 1 scale, discusses until consensus, and emits a final score. Nucleus sampling with top-p set to 0.3 is used to maintain reasoning coherence across the self-play transcript.

The final aggregated score combines the founder score, the idea score, and the explicit founder-idea fit score through a formula that zeros out the aggregated prediction whenever the idea score or the fit score is zero. This is an ethics and viability gate: an unethical or unviable idea cannot score high no matter how strong the founder.
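A minimal sketch of the gate, assuming a simple mean as the base combiner (the paper's exact aggregation formula is not reproduced here; only the zeroing behavior is taken from the text above):

```python
def aggregate(founder: float, idea: float, fit: float) -> float:
    """Combine the three scores, zeroing the result whenever the idea
    score or the founder-idea fit score is zero (the ethics/viability
    gate). The mean used below is an illustrative base combiner, not
    the paper's formula."""
    if idea == 0.0 or fit == 0.0:
        return 0.0  # an unethical or unviable idea cannot score high
    return (founder + idea + fit) / 3
```

The gate is what lets a strong founder score (for example 0.85) be driven to an aggregated 0.00 when the idea score is zeroed, as in the unethical-idea case study below.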

What does Founder-GPT demonstrate?

Founder-GPT is a methodological and proof-of-concept contribution rather than a benchmark evaluation. The paper runs its pipeline on four illustrative cases and reports the aggregated scores:

  • Unethical business idea (“selling implausible ideas to deceive customers”): aggregated score 0.00. The idea score was correctly driven to 0 by the ethics gate, zeroing out an otherwise high founder score of 0.85.
  • Low-likelihood real startup (Mercaris, an organic commodities trading platform): aggregated score 0.66, founder 0.71, idea 0.66, fit 0.63.
  • Moderate-likelihood real startup (Noah, a home equity co-investment platform): aggregated score 0.78, founder 0.78, idea 0.68, fit 0.75.
  • High-likelihood hypothetical (a deeply credentialed NLP founder paired with an NLP enterprise idea): aggregated score 0.90, founder 0.855, idea 0.81, fit 0.85.

The dataset underlying the retrieval step contained 2,180 successful companies (greater than $500M valuation through IPO, M&A, or a funding round above $150M) and 3,901 unsuccessful companies (raised between $4M and $10M, founded 2010 to 2016). This was the first iteration of Vela's founder-outcome corpus and the direct ancestor of the 9,892-founder dataset later used by GPTree, Random Rule Forest, and Reasoned Rule Mining.

Large-scale benchmarked evaluation of the Founder-GPT research direction came later. On VCBench, which did not exist at the time Founder-GPT was published, descendant methods developed at Vela reach 19 to 38 percent precision at real-world base rates, a 10x to 20x lift over the 1.9 percent US unicorn rate. Founder-GPT is the methodological seed those numbers grew from.
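The lift figures follow directly from the cited base rate:

```python
base_rate = 0.019  # 1.9 percent US unicorn rate cited in the text
for precision in (0.19, 0.38):
    lift = precision / base_rate
    print(f"{precision:.0%} precision -> {lift:.0f}x lift over base rate")
# 19% precision is a 10x lift; 38% precision is a 20x lift.
```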

Why founder-idea fit matters for quant VC

Founder quality and idea quality are necessary but not sufficient conditions for startup success. A deeply credentialed AI researcher paired with a pet-grooming marketplace is a weaker bet than the same founder paired with an AI infrastructure company, even though the founder and the idea might score identically in isolation. Quant VC pipelines that score founder and idea separately and then take a linear combination miss this interaction effect entirely.

Founder-GPT formalized founder-idea fit as a distinct scalar quantity, computed from the compatibility between the founder's actual experience and the specific idea being evaluated. This single conceptual move, making fit a first-class variable, is now standard across Vela's quant VC stack and is the reason SSFF's FIF-Network exists.

What makes Founder-GPT auditable for quant VC decisions

Every Founder-GPT decision is a readable transcript. The three simulated analysts produce named reasoning traces. The induced 4 to 6 founder features are explicit and inspectable. The founder score, idea score, and founder-idea fit score are each visible before aggregation. A partner reviewing a Founder-GPT output can identify exactly which feature drove the recommendation up or down, which retrieved case anchored the reasoning, and where the analysts disagreed. This auditability standard carries forward into SSFF and the rest of Vela's quant VC stack.

How Founder-GPT fits into Vela's quant VC research program

Founder-GPT is the methodological root of Vela's Multi-Agent Systems research. Its direct descendants and cross-family relatives include:

  • Downstream within Multi-Agent Systems: SSFF (Wang, Alican, Ihlamur, 2024, NeurIPS 2025 Workshop Best Poster) extends the Founder-GPT reasoning pattern into a full three-block multi-agent decision framework, grounded by a live retrieval pipeline and a classical ML backbone. V, Vela's master agent, is the productionized descendant of this line of work.
  • Adjacent: The Think-Reason-Learn family (GPTree, Random Rule Forest, Reasoned Rule Mining, Policy Induction) pursues a complementary thesis: induce auditable symbolic rules rather than orchestrate agent conversations. Both families share the same quant VC auditability commitment.
  • Benchmarking: Descendants of Founder-GPT are evaluated on VCBench, the world's first AGI benchmark for venture capital.

Founder-GPT is the paper that established the pattern. Everything multi-agent at Vela traces back to it.

Limitations

The paper is explicit about what it does not yet establish. The underlying dataset is heavily biased toward US-based companies, so observed success and failure patterns may not generalize internationally. The method was not backtested against a held-out test set at the time of publication, so the four reported case study scores are illustrative rather than statistically validated. Scraped LinkedIn profiles contain quality issues that propagate into the similarity-based retrieval step. The subject mapping used to categorize majors is coarse and manually defined, and some category choices (grouping political science, sociology, law, and consulting together, for example) are debatable. Founder-idea fit features are manually specified rather than learned. These limitations motivated the more rigorous evaluation protocols that followed in SSFF and across the Think-Reason-Learn family.

Read the paper

Founder-GPT: Self-Play to Evaluate the Founder-Idea Fit.
Sichao Xiong, Yigit Ihlamur.
arXiv preprint, December 2023.
arXiv:2312.12037.

Founder-GPT is the seed of Vela's Multi-Agent Systems research. The full family of work it grew into includes SSFF and V, Vela's master agent. For Vela's complementary rule-induction line, see the Think-Reason-Learn family, including GPTree, Random Rule Forest, Reasoned Rule Mining, and Policy Induction. For the benchmark Vela uses to measure quant VC progress, see VCBench.

Authored by members of the Vela team. See the full roster of contributors.

For research collaboration in quant VC, multi-agent systems, and LLM-based founder evaluation, email engage@vela.partners.
