Simulatte Credibility — Research Program

Results measured against reality.

12 studies. 11 countries. All DA scores measured against published Pew Research Center ground truth.

US Population Accuracy

95.3%

Pew American Trends Panel ground truth.

Calibrated · 81.9% holdout

Europe — 9 Countries

93.33%

Mean calibrated DA across 9 European populations. All 9 above 91%.

Mean calibrated · 68.57% holdout mean

India v2 — Peak

97.61%

Program peak. First company to replicate India's political landscape at population scale.

95.87% holdout · also above 91%

01 — United States

95.3% calibrated.
81.9% holdout.

10 calibrated questions measured against Pew American Trends Panel ground truth. The holdout score of 81.9% was earned on 5 questions the system never saw during calibration, with zero topic anchors.

Calibrated DA 95.3%

Questions tested 10 calibrated + 5 holdout

Holdout DA 81.9%

Variance ±0.00pp

Ground truth: Pew American Trends Panel, Waves 119–130, 2022–2023. 40 personas · WorldviewAnchor architecture. Holdout questions pre-designated before calibration — zero topic anchors applied.


Calibrated accuracy, sprint convergence: +4.3pp above the 91% threshold. Holdout (unseen): 81.9%.

02 — Europe Benchmark v2

9 nations. All above 91%.

The first cross-national study where every country independently exceeded 91% DA against Pew ground truth — not as a mean, but each nation on its own. Rebuilt using Simulatte Persona Generator cohorts across all 9 nations.

Mean calibrated DA 93.33%

Mean holdout DA 68.57%

Countries above 91% 9 of 9

Peak country Italy — 95.48%

Variance ±0.00pp all 9 countries

Ground truth Pew Global Attitudes, Spring 2024

The calibration-to-holdout gap varies substantially across countries: Netherlands (81.47% holdout) and Poland (79.31%) show strong generalisation, while Hungary (55.92%) and Spain (61.07%) reveal where worldview transfer still has work to do. Every country shows ±0.00pp variance across 3 replications.

40 personas per country · Simulatte Persona Generator (v2 rebuild) · 15 questions per country (10 shared cross-national + 5 country-specific) · Sprint EUR-1 · ±0.00pp variance.

Italy

95.48%

+4.48pp above 91%

Holdout: 63.10%

Poland

94.55%

+3.55pp above 91%

Holdout: 79.31%

Netherlands

94.41%

+3.41pp above 91%

Holdout: 81.47%

United Kingdom

94.00%

+3.00pp above 91%

Holdout: 63.03%

Greece

93.93%

+2.93pp above 91%

Holdout: 69.53%

Sweden

93.37%

+2.37pp above 91%

Holdout: 69.78%

Hungary

91.47%

+0.47pp above 91%

Holdout: 55.92%

Spain

91.45%

+0.45pp above 91%

Holdout: 61.07%

France

91.33%

+0.33pp above 91%

Holdout: 73.96%
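
The mean scores and the calibration-to-holdout gaps quoted for this study can be re-derived from the country figures above. The following is a minimal Python sketch with the values copied from this page; the variable names are illustrative and are not taken from the Simulatte codebase.

```python
# (calibrated DA %, holdout DA %) per country, copied from the figures on this page.
europe = {
    "Italy": (95.48, 63.10),
    "Poland": (94.55, 79.31),
    "Netherlands": (94.41, 81.47),
    "United Kingdom": (94.00, 63.03),
    "Greece": (93.93, 69.53),
    "Sweden": (93.37, 69.78),
    "Hungary": (91.47, 55.92),
    "Spain": (91.45, 61.07),
    "France": (91.33, 73.96),
}
THRESHOLD = 91.0  # benchmark reference used throughout the page

# Unweighted means across the 9 countries.
mean_cal = sum(cal for cal, _ in europe.values()) / len(europe)
mean_hold = sum(hold for _, hold in europe.values()) / len(europe)

# Calibration-to-holdout gap per country, in percentage points.
gaps = {name: round(cal - hold, 2) for name, (cal, hold) in europe.items()}

# Number of countries whose calibrated DA clears the threshold.
above = sum(cal > THRESHOLD for cal, _ in europe.values())

print(f"mean calibrated {mean_cal:.2f}%, mean holdout {mean_hold:.2f}%")
print(f"{above} of {len(europe)} countries above {THRESHOLD:.0f}%")
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:<15} gap {gap:.2f}pp")
```

Run as written, this reproduces the 93.33% mean calibrated DA and the 68.57% mean holdout DA, with Netherlands showing the smallest gap and Hungary the largest.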


03 — India v2 — Program Peak

97.61% calibrated.
95.87% holdout.

The first study in the program where holdout DA, earned on questions never seen during calibration and with zero topic anchors, also exceeds 91%. The calibration-to-holdout gap is 1.74pp, down from 13.4pp in the US study.

Calibrated 97.61% +6.61pp above 91%

Holdout 95.87% Holdout also exceeds 91%

Because the 95.87% holdout score was earned without topic-specific anchors, it shows more than calibrated performance: the LLM generalises from worldview alone.

Calibration → Holdout gap

1.74pp

Down from 13.4pp (USA) — smallest in the program

Personas

india_general · DEEP tier · Persona Generator

Every study. Every number.

12 completed studies. All scores measured against published Pew Research Center ground truth. All holdout questions pre-designated before calibration — zero topic anchors applied.

Study	Calibrated DA	Holdout DA
PEW USA v2	95.3% ±0.00pp	81.9% ±0.87pp
PEW India v2 ★ Peak	97.61% ±0.00pp	95.87% ±0.00pp
Europe — Italy	95.48% ±0.00pp	63.10% ±0.00pp
Europe — Poland	94.55% ±0.00pp	79.31% ±0.00pp
Europe — Netherlands	94.41% ±0.00pp	81.47% ±0.00pp
Europe — UK	94.00% ±0.00pp	63.03% ±0.00pp
Europe — Greece	93.93% ±0.00pp	69.53% ±0.00pp
Europe — Sweden	93.37% ±0.00pp	69.78% ±0.00pp
Europe — Hungary	91.47% ±0.00pp	55.92% ±0.00pp
Europe — Spain	91.45% ±0.00pp	61.07% ±0.00pp
Europe — France	91.33% ±0.00pp	73.96% ±0.00pp
PEW Germany (1C)	91.3%	76.5%

★ India v2 is the only study where holdout DA (95.87%) also exceeds 91%. DA = 1 − TVD = 1 − Σ|realᵢ − simᵢ| / 2.

Methodology

How we measure accuracy.

Distribution Accuracy (DA) measures how closely Simulatte's synthetic population mirrors real survey distributions. Every study follows the same protocol: calibrate on published data, then test on holdout questions the system has never seen.

Distribution Accuracy

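DA = 1 − TVD. Total Variation Distance is half the sum of absolute differences between the synthetic and real response shares, which equals the largest gap in probability the two distributions assign to any set of response options. A DA of 95% means a TVD of 5 percentage points: 5pp of response share would have to move between options for the synthetic distribution to match ground truth.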
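
The DA definition can be illustrated with a short calculation. This is a minimal Python sketch of the formula as stated on this page; the function name and the example shares are hypothetical and are not Pew or Simulatte data.

```python
def distribution_accuracy(real, sim):
    """Return DA = 1 - TVD for two response distributions.

    Both arguments map response options to shares that sum to 1.
    """
    if real.keys() != sim.keys():
        raise ValueError("both distributions must cover the same response options")
    # TVD is half the sum of absolute differences across response options.
    tvd = sum(abs(real[option] - sim[option]) for option in real) / 2
    return 1 - tvd


# Hypothetical four-option question; the shares are illustrative only.
real = {"strongly agree": 0.30, "agree": 0.40, "disagree": 0.20, "strongly disagree": 0.10}
sim = {"strongly agree": 0.27, "agree": 0.42, "disagree": 0.23, "strongly disagree": 0.08}

da = distribution_accuracy(real, sim)
print(f"DA = {da:.2%}")
```

In this example the absolute differences sum to 0.10, so TVD is 0.05 and DA is 95%.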

Benchmark Reference

91% DA is the natural self-inconsistency floor implied by survey test-retest literature — the point at which a simulation is matching the Pew sample within the noise floor of the data itself.

Holdout Protocol

Questions are split before calibration. Holdout questions receive zero topic anchors. Holdout DA measures pure worldview transfer — generalisation to unseen topics.
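
One way to make a split verifiably precede calibration is to fix it with a seed and record it before any calibration run. The sketch below is an assumption about how that could be done, not a description of Simulatte's pipeline: the page says only that holdout questions are pre-designated, and the seeded random selection, the function name, the question IDs, and the anchor dictionary are all hypothetical.

```python
import random


def designate_holdout(question_ids, n_holdout, seed):
    """Split question IDs into (calibration, holdout) before calibration starts."""
    ids = sorted(question_ids)  # stable order, so the seed fully determines the split
    rng = random.Random(seed)
    holdout = set(rng.sample(ids, n_holdout))
    calibration = [q for q in ids if q not in holdout]
    return calibration, sorted(holdout)


# Hypothetical question IDs; 10 calibration + 5 holdout mirrors the US study's counts.
questions = [f"q{i:02d}" for i in range(1, 16)]
calibration, holdout = designate_holdout(questions, n_holdout=5, seed=2024)

# Topic anchors (a plain dict here, purely illustrative) attach to calibration questions only.
topic_anchors = {q: f"anchor for {q}" for q in calibration}

# Guard: no holdout question may ever carry a topic anchor.
assert not set(holdout) & set(topic_anchors)
```

Because the split depends only on the sorted IDs and the seed, anyone holding both can regenerate it and confirm that no holdout question received an anchor.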

WorldviewAnchor

Each persona carries a structured worldview — values, priorities, ideological lean — derived from real typology data. The LLM conditions on worldview, not demographic stereotypes.

Replication Variance

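Every study runs 3 times. A variance of ±0.00pp means all three runs produced identical distributions regardless of LLM sampling randomness. The results table reports ±0.00pp for every calibrated score that lists a variance; the one non-zero figure is the PEW USA v2 holdout (±0.87pp), and PEW Germany (1C) lists none.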
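
The page does not define how the ± figure is computed. As an illustration only, the Python sketch below assumes it is half the range of DA across the replications, expressed in percentage points; the function names and the example distributions are hypothetical.

```python
def distribution_accuracy(real, sim):
    """DA = 1 - TVD for two distributions given as aligned lists of shares."""
    return 1 - sum(abs(r - s) for r, s in zip(real, sim)) / 2


def replication_spread_pp(real, runs):
    """Half the range of DA across runs, in percentage points (an assumed definition)."""
    scores = [distribution_accuracy(real, sim) for sim in runs]
    return (max(scores) - min(scores)) / 2 * 100


# Illustrative three-option question; none of these shares are real data.
real = [0.50, 0.30, 0.20]
identical_runs = [[0.48, 0.32, 0.20]] * 3
varying_runs = [[0.48, 0.32, 0.20], [0.46, 0.34, 0.20], [0.48, 0.32, 0.20]]

spread_identical = replication_spread_pp(real, identical_runs)
spread_varying = replication_spread_pp(real, varying_runs)

print(f"identical runs: ±{spread_identical:.2f}pp")
print(f"varying runs:   ±{spread_varying:.2f}pp")
```

Under this assumed definition, three identical runs give ±0.00pp, while a single run that scores 2pp lower gives ±1.00pp.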

Open Audit

All study configurations, sprint runners, question sets, persona manifests, and raw outputs are published on GitHub. Every number is independently reproducible.

