SSFF: The Multi-Agent Architecture Behind Vela's Quant VC Master Agent
What SSFF contributes to quant VC
SSFF (Startup Success Forecasting Framework) is the first multi-agent architecture designed for early-stage venture evaluation. Published in May 2024 and honored with the Best Poster Award at the NeurIPS 2025 Workshop on Generative AI for Finance, SSFF is the research foundation of V, Vela's master agent for quant VC deal sourcing and founder evaluation. Vela shipped this orchestrated multi-agent pattern before systems like OpenAI's Deep Research popularized it.
The paper exposes a specific failure mode that matters for any quant VC pipeline: when applied directly to startup evaluation, vanilla large language models systematically over-predict success. On a realistic test set with a 10 percent success rate, GPT-4o-mini reached only 10.1 percent precision while classifying nearly every startup as a winner. SSFF corrects this bias through structured multi-agent reasoning, reaching 67.8 percent weighted F1 against 18.3 for the vanilla baseline, a roughly 3.7x improvement.
What is quant VC, and where does SSFF fit?
Quant VC is the application of quantitative, reproducible, empirically validated methods to venture capital decision-making. It treats venture capital as a rare-event prediction problem that can be modeled, measured, and improved with the same rigor that quantitative finance brings to credit risk or quantitative medicine brings to diagnostic screening. The discipline requires quantitative scoring against honest baselines, reproducible methodology, and interpretability that allows every prediction to be audited.
SSFF occupies a distinct position in Vela's quant VC research program. While the Think-Reason-Learn family (GPTree, Random Rule Forest, Reasoned Rule Mining, Policy Induction) treats the LLM as a reasoner that induces auditable decision rules, SSFF treats the LLM as an orchestrator of specialized agents augmented by classical ML. It is the architectural blueprint that Vela uses to compose multi-agent quant VC workflows end to end, from sourcing through dossier generation to committee-ready investment memos.
How does SSFF evaluate a startup?
SSFF decomposes evaluation into three cooperating blocks. Each block is modular and independently auditable, which is a non-negotiable design constraint for any production quant VC system at Vela.
Analyst Block. A VC Scout agent categorizes each startup along 18 dimensions (growth rate, market size, regulatory posture, patents, timing, and more). The work is then routed to three domain specialists: a Market Analyst (competitive dynamics, timing, product-market fit), a Product Analyst (innovation, scalability, user traction), and a Founder Analyst (leadership, track record, vision alignment). A Chief Analyst synthesizes the sub-reports into a unified assessment, mirroring how a venture partnership actually operates.
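The Analyst Block's fan-out/fan-in pattern can be sketched in a few lines. The specialist names mirror the text, but the data shapes, sub-scores, and the equal-weight synthesis are illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass

# Hypothetical sub-report shape: each specialist agent returns a
# sub-score and a rationale for the Chief Analyst to synthesize.
@dataclass
class SubReport:
    analyst: str
    score: float      # illustrative 0-10 sub-score
    rationale: str

def chief_synthesis(reports: list[SubReport]) -> dict:
    """Fan-in step: combine specialist sub-reports into one assessment.
    Equal weighting is an assumption for illustration only."""
    overall = sum(r.score for r in reports) / len(reports)
    return {
        "overall_score": round(overall, 2),
        "breakdown": {r.analyst: r.score for r in reports},
    }

reports = [
    SubReport("market", 7.0, "growing TAM, favorable timing"),
    SubReport("product", 6.0, "scalable, early traction"),
    SubReport("founder", 8.0, "repeat founder, strong track record"),
]
print(chief_synthesis(reports)["overall_score"])  # 7.0
```

In production each `SubReport` would be produced by an LLM call grounded in the VC Scout's 18-dimension categorization; here they are stubbed to keep the sketch self-contained.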
Prediction Block. Two quantitative models provide statistical ballast. The LLM-Enhanced Random Forest (LLM-RF) uses GPT-4o to extract 14 categorical features (industry growth, product-market fit, investor backing, timing, and others) and then trains a Random Forest on those features. LLM-RF reaches 77 percent accuracy on a balanced dataset and 80 percent accuracy at a 1:4 imbalance ratio with an F1 score of 54.6. The Founder-Idea Fit (FIF) Network embeds founder and startup descriptions with text-embedding-3-large and learns a nonlinear mapping from embedding cosine similarity to founder-idea fit, reducing mean squared error from 0.718 (linear baseline) to 0.041.
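The FIF idea, mapping embedding cosine similarity through a learned nonlinearity to a fit score in [-1, 1], can be sketched as follows. The tanh head and its parameters are stand-ins for the trained network, and the toy vectors stand in for text-embedding-3-large embeddings:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def founder_idea_fit(sim, w=2.5, b=-1.0):
    """Hypothetical nonlinear head: the paper's FIF network learns this
    mapping from data; tanh keeps the output in [-1, 1], matching the
    fit-score range SSFF reports."""
    return math.tanh(w * sim + b)

founder_emb = [0.2, 0.7, 0.1]   # toy stand-ins for real embeddings
idea_emb = [0.25, 0.6, 0.2]
sim = cosine_similarity(founder_emb, idea_emb)
fit = founder_idea_fit(sim)
```

The point of the learned nonlinearity is exactly what the MSE drop (0.718 to 0.041) suggests: raw cosine similarity is a weak linear proxy for fit, and the network learns a better-shaped mapping.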
External Knowledge Block. A retrieval-augmented generation pipeline issues targeted queries to SerpAPI, synthesizes live market intelligence from news, reports, and competitive data, and injects the synthesized reports into the Analyst Block as grounded context. This prevents quant VC scoring from drifting into parametric hallucination.
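The retrieval step can be sketched as query construction plus context injection. The query templates and prompt wording below are hypothetical (the paper's actual prompts are not reproduced here), and the live SerpAPI call is omitted so the sketch stays self-contained:

```python
# Hypothetical query templates for targeted market retrieval.
QUERY_TEMPLATES = [
    "{company} competitors {year}",
    "{company} market size {sector}",
    "{sector} industry trends {year} report",
]

def build_queries(company: str, sector: str, year: int) -> list[str]:
    """Expand templates into concrete search queries (sent to SerpAPI in SSFF)."""
    return [t.format(company=company, sector=sector, year=year)
            for t in QUERY_TEMPLATES]

def inject_context(analyst_prompt: str, snippets: list[str]) -> str:
    """Ground the Analyst Block in retrieved snippets rather than
    parametric memory -- the anti-hallucination step described above."""
    context = "\n".join(f"- {s}" for s in snippets)
    return (f"Use ONLY the evidence below.\n\nEvidence:\n{context}"
            f"\n\nTask:\n{analyst_prompt}")

queries = build_queries("Acme Robotics", "warehouse automation", 2024)
```

In deployment the snippets would be synthesized search results; the grounding contract (evidence first, task second) is what keeps scoring anchored to live data.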
A final Integration Agent combines the three blocks through a quantitative decision layer that explicitly weighs founder segmentation level, founder-idea fit, and the Random Forest prediction before issuing an Invest or Hold recommendation with a calibrated probability.
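A minimal sketch of that decision layer follows. The weights, the threshold, and the linear combination are illustrative assumptions, not the paper's calibration:

```python
def integrate(founder_level: int, fif_score: float, rf_prob: float,
              w=(0.4, 0.2, 0.4), threshold=0.5):
    """Combine founder segmentation, founder-idea fit, and the Random
    Forest probability into an Invest/Hold call. Weights and threshold
    are hypothetical, chosen for illustration only."""
    level_signal = (founder_level - 1) / 4   # map L1..L5 to [0, 1]
    fif_signal = (fif_score + 1) / 2         # map [-1, 1] to [0, 1]
    p = w[0] * level_signal + w[1] * fif_signal + w[2] * rf_prob
    return ("Invest" if p >= threshold else "Hold", round(p, 3))

decision, prob = integrate(founder_level=4, fif_score=0.3, rf_prob=0.7)
print(decision, prob)  # Invest 0.71
```

The structural point survives the toy weights: the final recommendation is a deterministic function of three inspectable signals, which is what makes it auditable.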
How accurate is SSFF?
SSFF was evaluated on a stratified test set of 1,000 unseen startups with a realistic 10 percent success rate, matching the observed prevalence of successful outcomes in early-stage venture. Success was defined as raising $500M+, being acquired for $500M+, or reaching a $500M+ IPO valuation, consistent with all Vela founder-outcome datasets.
Performance against baselines on the 10 percent prevalence test set:
- Vanilla GPT-4o-mini: 10.8 percent accuracy, 10.1 percent precision, 18.3 weighted F1.
- Chain-of-Thought prompting: 11.8 percent accuracy, 10.0 percent precision, 18.5 weighted F1.
- R.A.I.S.E. (2024): 11.9 percent accuracy, 10.1 percent precision, 18.4 weighted F1.
- Founder-GPT (2023): 90.0 percent accuracy but 0.0 percent recall (the model collapses to predicting universal failure).
- SSFF-REG Basic: 28.2 percent accuracy, 34.3 weighted F1.
- SSFF-REG Pro: 59.7 percent accuracy, 67.8 weighted F1.
Founder segmentation, one of the Pro-only signals, is predictive enough to structure early quant VC screens on its own: in the training data, L1 founders succeeded 24.2 percent of the time versus 92.1 percent for L5 founders, a 3.79x lift. SSFF-REG Pro's 67.8 weighted F1 is roughly 3.7x the vanilla GPT baseline's 18.3, and close to 6x on the per-success prediction dimension. Vela's production models (the Think-Reason-Learn family running on real-world base rates) reach 19 to 38 percent precision, a 10x to 20x lift over the 1.9 percent US unicorn base rate; SSFF is the orchestration architecture that makes that range reachable in deployment.
Why multi-agent reasoning matters for quant VC
The headline finding in the paper is that vanilla LLMs fail open. On the 10 percent prevalence test set, GPT-4o-mini achieves 100 percent recall but only 10.1 percent precision, meaning it recommends nearly every startup. FounderGPT fails closed, rejecting nearly every startup. Neither is useful for quant VC, because quant VC requires calibrated rankings across thousands of candidates, not binary enthusiasm or binary skepticism.
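The fail-open arithmetic is easy to verify. A degenerate classifier that recommends every startup at 10 percent prevalence gets precision equal to the base rate and perfect recall, and its positive-class F1 lands right next to the vanilla baseline's reported score (a back-of-envelope illustration, not the paper's exact weighted-F1 computation):

```python
# "Always invest" classifier on a 10 percent prevalence test set.
prevalence = 0.10
precision = prevalence  # every positive call is right 10% of the time
recall = 1.0            # no successful startup is ever missed
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.182 -- essentially the vanilla baseline's 18.3
```

This is why raw accuracy and recall are meaningless at this base rate: the metric that separates the baselines from SSFF is precision under imbalance.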
Multi-agent reasoning corrects both failure modes through specialization and quantitative grounding. When the Market Analyst, Product Analyst, and Founder Analyst each produce independent sub-scores, and when classical ML components add statistical discipline, the combined system recovers calibrated precision. Removing the quantitative signals (the Basic variant) drops accuracy from 59.7 to 28.2 percent, confirming that pure agentic reasoning without ML grounding is insufficient.
What makes SSFF auditable for quant VC decisions
Every SSFF decision decomposes into inspectable intermediate artifacts: 14 LLM-extracted categorical features, three agent sub-reports with sub-scores, a founder segmentation level (L1 through L5), a founder-idea fit score in the [-1, 1] range, and a Random Forest class prediction. A non-technical partner can read the sub-reports, interrogate the segmentation, challenge the founder-idea fit score, and override the final Invest or Hold recommendation without re-running any model. This matches the auditability standard Vela applies to all quant VC systems. A prediction a partner cannot audit is not a Vela-grade prediction.
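The artifact list above amounts to a decision trace. A minimal sketch of such a trace, with hypothetical field names mirroring the artifacts (the paper does not publish a schema), shows how a partner override can be recorded without re-running any model:

```python
from dataclasses import dataclass, field

@dataclass
class DecisionTrace:
    # Hypothetical fields mirroring the artifacts listed above.
    categorical_features: dict   # 14 LLM-extracted features
    agent_reports: dict          # market/product/founder sub-scores
    founder_level: int           # segmentation, L1..L5
    fif_score: float             # founder-idea fit in [-1, 1]
    rf_prediction: int           # Random Forest class (1 = success)
    recommendation: str = "Hold"
    overrides: list = field(default_factory=list)

    def override(self, partner: str, new_rec: str, reason: str):
        """Record a human override; no model is re-run."""
        self.overrides.append({"by": partner, "from": self.recommendation,
                               "to": new_rec, "reason": reason})
        self.recommendation = new_rec

trace = DecisionTrace({"timing": "favorable"}, {"market": 7.0},
                      founder_level=4, fif_score=0.3, rf_prediction=1,
                      recommendation="Invest")
trace.override("partner_a", "Hold", "segmentation evidence unconvincing")
```

Keeping the override log alongside the model artifacts is what turns a prediction into an auditable decision.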
How SSFF fits into Vela's quant VC research program
SSFF is the multi-agent orchestration backbone for Vela's quant VC stack and the direct ancestor of V, Vela's master agent. It connects to the rest of the research program along three axes:
- Upstream: SSFF extends Founder-GPT (Xiong and Ihlamur, 2023), which introduced self-play with tree-of-thought for founder-idea fit evaluation. Founder-GPT provides the founder-fit signal. SSFF wraps it in a full multi-agent decision system.
- Adjacent: SSFF benchmarks against R.A.I.S.E. (Preuveneers et al., 2025), a memory-augmented reasoning baseline, and outperforms it substantially under realistic class imbalance.
- Downstream: SSFF is the agent substrate on top of which the Think-Reason-Learn family plugs in as reasoning modules. The Prediction Block can accept any TRL-family model (GPTree, Random Rule Forest, Reasoned Rule Mining, Policy Induction) as a drop-in replacement for the LLM-enhanced Random Forest.
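The drop-in property can be sketched as a shared interface for prediction modules. This is an assumed interface for illustration, not a published API; the module bodies are stubs standing in for trained models:

```python
from typing import Protocol

class PredictionModule(Protocol):
    """Minimal contract a TRL-family model would need to satisfy to
    replace the LLM-enhanced Random Forest (assumed, not published)."""
    def predict_proba(self, features: dict) -> float: ...

class LLMRandomForest:
    def predict_proba(self, features: dict) -> float:
        # Stub: a trained forest would score the 14 categorical features.
        return 0.62

class RuleForest:
    def predict_proba(self, features: dict) -> float:
        # Stub: a Random Rule Forest would vote over induced rules.
        votes = [1, 0, 1]
        return sum(votes) / len(votes)

def prediction_block(model: PredictionModule, features: dict) -> float:
    """The Prediction Block only depends on the interface, so any
    conforming module plugs in unchanged."""
    return model.predict_proba(features)
```

Because the orchestrator programs against the interface rather than a concrete model, swapping GPTree or Policy Induction into the Prediction Block touches no other block.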
The Vela production stack reaches 19 to 38 percent precision on real-world base rates by combining the SSFF orchestration pattern with TRL-family reasoning models and real-time market retrieval. SSFF is the connective tissue. Progress on any single module propagates through the whole quant VC pipeline.
Limitations
The paper is explicit about what SSFF does not yet do:
- Company names appear in prompts during market research synthesis, which may allow LLM recall of memorized outcomes for high-profile startups (estimated to affect fewer than 5 percent of the test set).
- Temporal regime awareness is not modeled: the same startup may warrant different treatment in a 2021 zero-rate bull market than in a 2023 correction.
- Prospective validation against live venture deal flow has not been completed, so all reported metrics are retrospective on historical outcomes.
- Fairness auditing across founder demographics and geographic regions has not been conducted and is a prerequisite for responsible deployment at scale.
Read the paper
Startup Success Forecasting Framework: A Multi-Agent Framework for Startup Success Prediction.
Xisen Wang, Fuat Alican, Yigit Ihlamur.
NeurIPS 2025 Workshop on Generative AI for Finance. Best Poster Award.
arXiv:2405.19456.
SSFF is the multi-agent origin of V, Vela's master agent, and the architectural backbone of the Think-Reason-Learn research program. For the benchmark Vela uses to measure quant VC progress, see VCBench. For the reasoning modules that plug into SSFF, see GPTree, Random Rule Forest, Reasoned Rule Mining, and Policy Induction.
Authored by members of the Vela team. See the full roster of contributors.
For research collaboration in quant VC, multi-agent systems, and LLM-augmented ML for venture capital, email engage@vela.partners.