GPT-HTree: Hierarchical Clustering and LLM Personas for Quant VC

Paper
GPT-HTree: A Decision Tree Framework Integrating Hierarchical Clustering and Large Language Models for Explainable Classification.
Authors
Te Pei (University of Oxford), Fuat Alican, Aaron Ontoyin Yin, Yigit Ihlamur (Vela Research).
Venue
arXiv preprint, January 2025.
Status
Preprint. arXiv:2501.13743.
Research program
Part of the LLM-Augmented ML line of Vela's quant VC research, which embeds LLMs as components inside classical ML pipelines rather than as standalone reasoners.

What GPT-HTree contributes to quant VC

GPT-HTree addresses a specific failure mode in quant VC: a single global decision tree has to find common success patterns across fundamentally different types of founders, and ends up oversimplifying. A serial-exit entrepreneur and an academic researcher have different predictors of success, so treating them uniformly loses signal. GPT-HTree fixes this by first segmenting founders into personas through hierarchical clustering, then fitting a localized decision tree within each persona, and finally using an LLM (GPT-4) to generate human-readable descriptions of each persona from the cluster's z-score feature profile.

On Vela's 8,800-founder dataset with 64 features, GPT-HTree surfaces eight founder personas with success rates ranging from 0.8 percent (Early Professionals) to 17.4 percent (Serial Exit Founders). The best cluster is 9x the 1.9 percent real-world baseline, and the spread between the top and bottom cluster is 22x. For quant VC, this is the difference between a screening model that flags every founder equally and one that tells the deal team exactly which sourcing channels yield the highest conditional probability of success.

What is quant VC, and where does GPT-HTree fit?

Quant VC is the application of quantitative, reproducible, empirically validated methods to venture capital decision-making. It treats venture capital as a rare-event prediction problem that can be modeled, measured, and improved with the same rigor that quantitative finance brings to credit risk or evidence-based medicine brings to diagnostic screening. That demands quantitative scoring against honest baselines, reproducible methodology, and interpretability that allows every prediction to be audited.

GPT-HTree sits in the LLM-Augmented ML strand of Vela's quant VC research, alongside LLM-AR (neural-symbolic reasoning with ProbLog), rare-event prediction with LLM feature engineering, and verifiable reasoning (LLMs as code generators). This strand wraps LLMs inside classical ML pipelines rather than making the LLM the decision-maker. The LLM's job in GPT-HTree is persona description, not prediction. Prediction happens inside standard hierarchical clustering and decision trees, which means the quantitative outputs are as reproducible as any traditional ML pipeline.

How does GPT-HTree evaluate founders?

GPT-HTree has four main stages: class-balanced resampling, hierarchical clustering, per-cluster decision trees, and LLM persona generation.

Resampling with CTGAN. Startup success is rare. A decision tree trained on the raw 1.9 percent real-world base rate collapses to predicting universal failure. GPT-HTree uses a Conditional Tabular GAN (CTGAN) to synthesize additional examples of successful founders, producing realistic rather than simply duplicated positives. After resampling, the top-VC-experience cluster separates into subclusters with success rates spanning 3.9 percent to 50.0 percent, where the pre-resampling spread was only 4.9 percent to 19.75 percent. Resampling is what makes the downstream decision trees discriminative.
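The resampling step can be sketched as follows. This is a minimal stand-in, not the paper's implementation: where GPT-HTree uses CTGAN to model inter-feature dependencies, this sketch draws synthetic positives from a per-feature Gaussian fit to the real positives, which captures only marginal distributions. The `target_ratio` parameter and function name are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def resample_positives(X, y, target_ratio=0.5):
    """Oversample rare positives with synthetic rows until they make up
    target_ratio of the data. Stand-in for CTGAN: samples a per-feature
    Gaussian fit to the real positives, so unlike CTGAN it ignores
    dependencies between features."""
    pos = X[y == 1]
    # solve (n_pos + n_new) / (n_total + n_new) = target_ratio for n_new
    n_new = int((target_ratio * len(X) - len(pos)) / (1 - target_ratio))
    if n_new <= 0:
        return X, y
    mu, sigma = pos.mean(axis=0), pos.std(axis=0) + 1e-9
    synth = rng.normal(mu, sigma, size=(n_new, X.shape[1]))
    return np.vstack([X, synth]), np.concatenate([y, np.ones(n_new, dtype=int)])

# toy data at a ~2 percent base rate, mirroring the rare-event setting
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.02).astype(int)
X_bal, y_bal = resample_positives(X, y)
```

In practice the CTGAN step would replace the Gaussian sampler; the class-balancing arithmetic around it stays the same.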

Hierarchical clustering. On the resampled data, agglomerative hierarchical clustering produces eight main founder personas, each defined by a distinctive combination of features. The clustering does not require a predefined number of clusters or labeled outcomes.
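A minimal sketch of this stage using SciPy's agglomerative routines, on a toy stand-in for the founder feature matrix. The Ward linkage choice and the cut into eight flat clusters are assumptions for illustration; the paper reports eight main personas but its exact linkage settings may differ.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# toy stand-in: 300 founders x 6 standardized features
X = rng.normal(size=(300, 6))

# Ward linkage builds the full bottom-up merge tree; no cluster count
# or outcome labels are needed to construct it
Z = linkage(X, method="ward")

# cut the dendrogram into 8 flat clusters (the paper reports 8 personas)
labels = fcluster(Z, t=8, criterion="maxclust")
```

The dendrogram `Z` encodes every merge, so the number of personas is a downstream cut decision rather than an input to the clustering itself.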

Per-cluster decision trees. Within each persona, a separate decision tree is trained to classify successful versus unsuccessful founders, using Gini impurity as the split criterion. Feature importance is computed per cluster, which means the most predictive features differ across personas. For the VC Experience cluster, the top three features are previous startup funding experience as CEO (0.655), worked at consultancy (0.177), and big tech position (0.167). A global decision tree cannot surface persona-specific feature importance like this.
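The per-cluster training loop can be sketched with scikit-learn. The data here is synthetic and the depth limit is an assumption; what matters is the structure: one Gini-criterion tree per persona, each exposing its own feature-importance vector.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)

# hypothetical stand-ins: X = founder features, labels = persona
# assignments from the clustering stage, y = success outcome
X = rng.normal(size=(400, 5))
labels = rng.integers(0, 4, size=400)
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 1).astype(int)

trees, importances = {}, {}
for c in np.unique(labels):
    mask = labels == c
    # Gini impurity is the split criterion, matching the paper
    t = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
    t.fit(X[mask], y[mask])
    trees[c] = t
    importances[c] = t.feature_importances_  # per-persona feature ranking
```

Because each tree sees only its persona's rows, the importance vectors in `importances` can rank features differently across personas, which is exactly the signal a single global tree averages away.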

LLM persona descriptions. For each cluster, the paper computes per-feature z-scores against the global founder distribution, selects the features with the largest deviations, and hands them to GPT-4 with a structured prompt. The LLM outputs a persona summary, distinguishing traits, success factors, risk factors, and recommendations. An example: a cluster with high tier-1 VC experience, high angel experience, and strong prior-startup investor quality is described as “Elite entrepreneurial founders with VC-backed success.”
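The z-score selection and prompt assembly can be sketched as below. The feature names, toy data, and prompt wording are illustrative assumptions, not the paper's exact template; only the mechanism (cluster mean versus global distribution, largest |z| deviations fed to the LLM) follows the text.

```python
import numpy as np

def persona_prompt(X, labels, feature_names, cluster, top_k=3):
    """Rank a cluster's features by the |z-score| of the cluster mean
    against the global distribution, then format the largest deviations
    as an LLM prompt. Wording is illustrative, not the paper's template."""
    mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-9
    z = (X[labels == cluster].mean(axis=0) - mu) / sigma
    top = np.argsort(-np.abs(z))[:top_k]
    traits = ", ".join(f"{feature_names[i]} (z={z[i]:+.1f})" for i in top)
    return (f"Describe the founder persona whose most distinctive features "
            f"are: {traits}. Cover traits, success factors, and risk factors.")

# hypothetical features: cluster 0 is high on VC and angel experience
features = ["tier1_vc_experience", "angel_experience", "prior_exit"]
X = np.array([[5.0, 4.0, 0.0], [5.0, 4.0, 0.0],
              [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
labels = np.array([0, 0, 1, 1])
prompt = persona_prompt(X, labels, features, cluster=0, top_k=2)
```

The returned string would then be sent to the LLM, which contributes only the natural-language description; no prediction flows through it.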

How accurate is GPT-HTree?

GPT-HTree was evaluated on 8,800 founders with 64 features. Success follows the standard Vela convention (unicorn status or significant exit milestones, corresponding to the 1.9 percent baseline). The paper reports success rates per persona, calibrated to the real-world base rate rather than to an inflated test prevalence:

  • Serial Exit Founders: 17.4 percent success rate (9.2x the 1.9 percent baseline).
  • IPO Experience Founders: 16.3 percent (8.6x).
  • Venture Capitalists: 13.1 percent (6.9x).
  • Tech Leaders: 3.7 percent (1.9x).
  • Industry Influencers: 2.7 percent (1.4x).
  • Professional Athletes: 1.9 percent (at baseline).
  • Career Professionals: 1.4 percent (below baseline).
  • Early Professionals: 0.8 percent (below baseline).

The headline numbers: the top cluster is approximately 9x the baseline, and the top-vs-bottom spread is 22x. Within the VC Experience main cluster, resampling additionally separated subclusters with success rates ranging from 3.9 percent to 50.0 percent, a granularity that a global model cannot produce.
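The lift figures above follow directly from the reported per-persona rates; a quick arithmetic check:

```python
# persona success rates as reported, with lift over the 1.9 percent baseline
baseline = 0.019
rates = {
    "Serial Exit Founders": 0.174,
    "IPO Experience Founders": 0.163,
    "Venture Capitalists": 0.131,
    "Tech Leaders": 0.037,
    "Industry Influencers": 0.027,
    "Professional Athletes": 0.019,
    "Career Professionals": 0.014,
    "Early Professionals": 0.008,
}
lifts = {name: rate / baseline for name, rate in rates.items()}
top_vs_bottom = rates["Serial Exit Founders"] / rates["Early Professionals"]
```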

Vela's full production quant VC stack, across the Think-Reason-Learn family and related research, reaches 19 to 38 percent precision when scaled to the 1.9 percent real-world base rate, a 10x to 20x lift over the index. GPT-HTree's 9x cluster-level lift sits inside that program as a complementary segmentation layer: it tells the deal team which founder archetypes to prioritize, while the Think-Reason-Learn reasoning models tell them which individual founders within a given archetype are most likely to succeed.

Why cluster-level personas matter for quant VC

A global decision tree fits one decision rule across the entire founder population. GPT-HTree rejects that premise and asks a different question: what decision rule applies to this specific type of founder? The answer is not uniform. Serial exit founders succeed because of their acquisition track record and network density. VC-experience founders succeed because of tier-1 backing and deal flow access. Big-tech-experience founders succeed because of execution credibility and technical depth. Collapsing these onto a single decision path loses the signal each persona actually provides.

For quant VC deployment, this matters because different sourcing channels yield different founder distributions. A channel that produces serial exit founders has a 17.4 percent conditional success rate. A channel that produces early professionals has a 0.8 percent conditional success rate. Knowing which persona each channel yields is directly actionable at the portfolio-construction level.

What makes GPT-HTree auditable for quant VC decisions

Every GPT-HTree prediction decomposes into three inspectable artifacts: a persona assignment (which cluster the founder falls into, computed from centroid distance), a decision path (which rule in that cluster's tree classified them), and an LLM-generated description of the persona in plain language. A partner who disagrees with the classification can look at which persona was assigned and why, trace the decision tree split that led to the prediction, and read the LLM description to check whether the persona label matches their own intuition. If any of those three fails inspection, the partner can override the decision. The reasoning is never hidden in an LLM's weights.
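The three-artifact decomposition suggests a simple audit record per prediction. This is a sketch of how such a record could be assembled, not a structure from the paper; the field names and the example split conditions (drawn from the VC Experience cluster's top features) are illustrative.

```python
from dataclasses import dataclass

@dataclass
class AuditRecord:
    """The three inspectable artifacts behind one prediction.
    Field names are illustrative, not from the paper."""
    persona: str           # cluster assignment (nearest centroid)
    decision_path: list    # ordered splits from that cluster's tree
    persona_summary: str   # LLM-generated plain-language description

    def explain(self) -> str:
        return (f"Persona: {self.persona}. "
                f"Rule: {' AND '.join(self.decision_path)}. "
                f"Context: {self.persona_summary}")

record = AuditRecord(
    persona="Serial Exit Founders",
    decision_path=["prev_funding_as_ceo > 0.5", "big_tech_position <= 0.5"],
    persona_summary="Elite entrepreneurial founders with VC-backed success.",
)
```

A partner overriding a classification inspects exactly these three fields: the assignment, the rule, and the description.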

How GPT-HTree fits into Vela's quant VC research program

GPT-HTree connects to Vela's broader quant VC research along four axes:

  • Same family: GPT-HTree, LLM-AR, rare-event prediction with LLM feature engineering, and verifiable reasoning all treat the LLM as a component inside a classical ML pipeline rather than as the final decision-maker.
  • Adjacent via method: GPT-HTree extends GPTree (Xiong et al., 2024) from the Think-Reason-Learn family. GPTree uses an LLM to build a single global decision tree. GPT-HTree adds hierarchical clustering so that different founder personas get different trees. Related TRL work includes Random Rule Forest, Reasoned Rule Mining, and Policy Induction.
  • Adjacent via pattern: The multi-agent Founder-GPT and SSFF papers orchestrate LLMs as reasoning agents. GPT-HTree uses the LLM only for persona interpretation.
  • Benchmarking: Descendants of GPT-HTree and related methods are evaluated on VCBench, the public benchmark for quant VC.

Limitations

The paper is explicit about what GPT-HTree does not yet resolve. False positives cluster around surface-level signals such as media presence without substantive achievements, inflated valuation histories, and shallow industry connections. False negatives tend to concentrate on domain experts in emerging sectors and first-time founders whose profiles do not match traditional success patterns. The model inherits biases from historical data, including the possibility of LinkedIn profile omissions or exaggerations, and may reinforce conventional success metrics at the expense of newer patterns. Market timing is not incorporated as a feature. The reported results were generated with GPT-4o, and newer reasoning models have not yet been evaluated. LLM hallucination during feature engineering remains a data-quality concern.

Read the paper

GPT-HTree: A Decision Tree Framework Integrating Hierarchical Clustering and Large Language Models for Explainable Classification.
Te Pei, Fuat Alican, Aaron Ontoyin Yin, Yigit Ihlamur.
arXiv preprint, January 2025.
arXiv:2501.13743.

GPT-HTree is part of the LLM-Augmented ML family of Vela's quant VC research. For related work within the same family, see LLM-AR, rare-event prediction, and verifiable reasoning. For the adjacent rule-induction line, see GPTree (the direct precursor), Random Rule Forest, Reasoned Rule Mining, and Policy Induction, all part of the Think-Reason-Learn family. For the multi-agent line, see Founder-GPT and SSFF. For the benchmark that measures progress across all three families, see VCBench.

Authored by members of the Vela team. See the full roster of contributors.

For research collaboration in quant VC, explainable ML for founder evaluation, and persona-based startup screening, email engage@vela.partners.
