Chapter 8: Evidence Assessment
From Opinion to Probability: The Evidence Foundation
This study’s credibility rests on one foundation: evidence. Not speculation, not expert hunches, but systematically evaluated evidence from 120 sources. This chapter reveals how we transformed qualitative insights into quantitative probabilities.
The Evidence Collection Protocol
Inclusion Criteria
Every piece of evidence met strict standards:
1. Source Quality
- Peer-reviewed journals (Impact Factor > 3.0)
- Government reports from major economies
- Research institutions with established credibility
- Industry leaders with demonstrated expertise
2. Methodological Rigor
- Quantitative analysis preferred
- Systematic reviews valued
- Controlled experiments weighted heavily
- Large-scale data studies prioritized
3. Temporal Relevance
- Published 2020-2025
- Explicit future projections
- Current data (not just historical)
- Trend analysis included
4. Direct Relevance
- Clear connection to hypotheses
- Specific rather than general
- Measurable implications
- Falsifiable claims
Quality Scoring Framework
Each evidence piece received scores across four dimensions:
Authority (40% weight)
- Source credibility
- Author expertise
- Institutional backing
- Track record
Methodology (30% weight)
- Research design quality
- Sample size/scope
- Statistical rigor
- Reproducibility
Recency (20% weight)
- Publication date
- Data currency
- Trend stability
- Update frequency
Replication (10% weight)
- Citation count
- Independent confirmation
- Consensus alignment
- Contradictory evidence
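The sketch below shows one way the four weighted dimensions could combine into a single quality score. It assumes each dimension has already been scored on a 0 to 1 scale; the chapter does not state the scale or how the sub-criteria roll up into a dimension score. The function name and example scores are hypothetical.

```python
# Weights from the Quality Scoring Framework (they sum to 1.0)
WEIGHTS = {"authority": 0.40, "methodology": 0.30, "recency": 0.20, "replication": 0.10}

def composite_quality(scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score {value} is outside [0, 1]")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())

# Hypothetical evidence item: strong source, solid method, recent, little replication
example = {"authority": 0.9, "methodology": 0.8, "recency": 0.7, "replication": 0.4}
print(round(composite_quality(example), 3))  # 0.36 + 0.24 + 0.14 + 0.04 = 0.78
```

Because Authority carries 40% of the weight, a weak replication score changes the composite far less than a weak source would.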
The 120 Evidence Pieces
Distribution by Hypothesis:
- H1 (AI Progress): 20 pieces
- H2 (AGI Achievement): 17 pieces
- H3 (Employment): 19 pieces
- H4 (Safety): 20 pieces
- H5 (Development Model): 20 pieces
- H6 (Governance): 24 pieces
Distribution by Source Type:
- Academic papers: 45 (37.5%)
- Industry reports: 28 (23.3%)
- Government studies: 22 (18.3%)
- Think tank analysis: 15 (12.5%)
- Technical benchmarks: 10 (8.3%)
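As a quick arithmetic check, both breakdowns account for all 120 pieces, and the source-type percentages follow from the counts. The counts come from the lists above; the variable names are illustrative.

```python
# Counts as reported in this chapter
by_hypothesis = {"H1": 20, "H2": 17, "H3": 19, "H4": 20, "H5": 20, "H6": 24}
by_source = {
    "Academic papers": 45,
    "Industry reports": 28,
    "Government studies": 22,
    "Think tank analysis": 15,
    "Technical benchmarks": 10,
}

total = sum(by_hypothesis.values())
# Both breakdowns should cover the same 120 pieces
assert total == sum(by_source.values()) == 120

for source, count in by_source.items():
    # One decimal place, matching the percentages in the list above
    print(f"{source}: {count} ({100 * count / total:.1f}%)")
```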
Bayesian Synthesis Method
The Bayesian Framework
We use Bayesian inference to combine evidence:
P(H|E) = [P(E|H) × P(H)] / P(E)
Where:
- P(H|E) = Posterior probability given evidence
- P(E|H) = Likelihood of evidence if hypothesis true
- P(H) = Prior probability (started at 0.5)
- P(E) = Marginal probability of evidence
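For a binary hypothesis (option A versus option B), the marginal expands to P(E) = P(E|H) × P(H) + P(E|¬H) × P(¬H). The sketch below uses made-up likelihoods to check that Bayes' rule in this form gives the same posterior as the log-odds form that Step 2 relies on. In log-odds form, the posterior log-odds equal the prior log-odds plus the log likelihood ratio.

```python
import math

def posterior_direct(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' rule with the marginal P(E) expanded over H and not-H."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

def posterior_log_odds(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Same update in log-odds form: add the log likelihood ratio to the prior log-odds."""
    log_odds = math.log(prior / (1 - prior)) + math.log(p_e_given_h / p_e_given_not_h)
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical numbers: the evidence is twice as likely if H is true
print(posterior_direct(0.5, 0.6, 0.3))    # 0.6 * 0.5 / 0.45, about 0.667
print(posterior_log_odds(0.5, 0.6, 0.3))  # log(1) + log(2) -> about 0.667
```

The log-odds form turns repeated multiplications into additions, which is why the sequential update in Step 2 accumulates a running sum.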
Evidence Integration Process
Step 1: Initialize Priors
- All hypotheses start at 50% (maximum uncertainty)
- No assumption about outcomes
- Equal weight to A and B options
Step 2: Sequential Update
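```python
import math

def sequential_update(evidence_list, calculate_quality, assess_relevance, prior=0.5):
    # Convert the prior to log-odds once, before the loop, for numerical stability
    # (prior = 0.5 gives log-odds 0); each piece of evidence then adds to the total
    log_odds = math.log(prior / (1 - prior))
    for evidence in evidence_list:
        quality_score = calculate_quality(evidence)
        relevance = assess_relevance(evidence)
        # Update based on evidence strength: quality above 0.5 moves the odds
        # toward the option the evidence supports, scaled by relevance
        shift = (quality_score - 0.5) * relevance
        if evidence.supports_A:
            log_odds += shift
        else:
            log_odds -= shift
    # Convert back to the probability of option A
    posterior = math.exp(log_odds) / (1 + math.exp(log_odds))
    return posterior
```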
Step 3: Uncertainty Quantification
- Bootstrap resampling (1,000 iterations)
- Generate confidence intervals
- Account for evidence quality variance
- Propagate through causal network
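A minimal percentile-bootstrap sketch for Step 3 follows. It assumes each evidence item reduces to a (quality, relevance, supports_A) triple and applies the Step 2 update rule, (quality − 0.5) × relevance in log-odds. It does not attempt the propagation through the causal network. The evidence values and function names are hypothetical.

```python
import math
import random

def update(evidence, prior=0.5):
    """Step 2 heuristic: each (quality, relevance, supports_a) item shifts the log-odds."""
    log_odds = math.log(prior / (1 - prior))
    for quality, relevance, supports_a in evidence:
        shift = (quality - 0.5) * relevance
        log_odds += shift if supports_a else -shift
    return 1 / (1 + math.exp(-log_odds))

def bootstrap_ci(evidence, iterations=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample evidence with replacement, recompute the posterior."""
    rng = random.Random(seed)
    samples = sorted(
        update(rng.choices(evidence, k=len(evidence))) for _ in range(iterations)
    )
    k = round(alpha / 2 * iterations)  # 25 draws in each tail for the defaults
    return samples[k], samples[iterations - 1 - k]

# Hypothetical evidence: (quality score, relevance, supports option A?)
evidence = [(0.82, 1.0, True), (0.77, 0.8, True), (0.71, 0.9, True),
            (0.66, 0.7, False), (0.79, 1.0, True), (0.64, 0.6, False)]
print(update(evidence), bootstrap_ci(evidence))
```

Resampling whole evidence items lets the interval reflect how much the posterior depends on which pieces happened to be collected. A fixed seed keeps the interval reproducible.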
Evidence Highlights by Hypothesis
H1: AI Progress Trajectory
Strong Evidence for Acceleration (A):
- GPT-3 (2020) to GPT-4 (2023): 10x improvement in under three years
- Investment growing 50% annually
- Compute availability doubling every 6 months
- No fundamental barriers identified
Quality Score: 0.774 average for A evidence
Weak Evidence for Barriers (B):
- Scaling may plateau (theoretical)
- Energy constraints possible
- Data limitations suggested
Quality Score: 0.650 average for B evidence
Result: 91.1% probability of continued acceleration
H2: AGI Achievement
Mixed Evidence: Genuine Uncertainty
For AGI (A):
- Emergent abilities in large models
- Transfer learning improving
- Reasoning capabilities expanding
- Quality: 0.765
Against AGI (B):
- Current systems still brittle
- True understanding absent
- Combinatorial explosion remains
- Quality: 0.753
Result: 44.3% probability of AGI achievement, a true toss-up
H3: Employment Impact
Strong Evidence for Displacement (B):
- McKinsey: 400M workers potentially displaced by automation by 2030
- Oxford study: 47% of US jobs at high risk of automation
- MIT: Replacement exceeding creation
- Quality: 0.792
Weaker Evidence for Complementarity (A):
- Historical precedents of adaptation
- New job categories emerging
- Augmentation tools growing
- Quality: 0.737
Result: 74.9% probability of net displacement
H4: Safety and Control
Moderate Evidence for Safety (A):
- Alignment research progressing
- Safety culture strengthening
- Regulatory frameworks emerging
- Quality: 0.787
Significant Evidence for Risks (B):
- Control problem unsolved
- Misalignment examples accumulating
- Dual-use concerns growing
- Quality: 0.760
Result: 59.7% probability of safe development (slight lean)
H5: Development Paradigm
Strong Evidence for Centralization (B):
- Compute costs escalating exponentially
- Network effects dominant
- Data moats expanding
- Winner-take-all dynamics clear
- Quality: 0.787
Weaker Evidence for Distribution (A):
- Open source movement
- Academic research continues
- Some startups succeeding
- Quality: 0.693
Result: 77.9% probability of centralization
H6: Governance Evolution
Evidence for Authoritarian Drift (B):
- Surveillance capabilities expanding
- Emergency powers normalizing
- Democratic norms eroding globally
- Tech-state fusion accelerating
- Quality: 0.789
Evidence for Democratic Resilience (A):
- Historical adaptation precedents
- Civil society mobilizing
- Regulatory efforts underway
- Public awareness growing
- Quality: 0.746
Result: 63.9% probability of authoritarian outcomes
Confidence Intervals and Uncertainty
Uncertainty by Hypothesis
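Hypothesis | P(option A) | Uncertainty | 95% CI Width |
---|---|---|---|
H1 | 91.1% | ±5.7% | 22.9% |
H2 | 44.3% | ±16.9% | 65.5% |
H3 | 25.1% | ±9.9% | 37.0% |
H4 | 59.7% | ±13.3% | 49.7% |
H5 | 22.1% | ±12.7% | 46.9% |
H6 | 36.1% | ±13.3% | 48.7% |

Probabilities in this table are for option A. Where the evidence favors option B (H3, H5, H6), the results above report the complement. H3's 25.1% corresponds to 74.9% net displacement, H5's 22.1% to 77.9% centralization, and H6's 36.1% to 63.9% authoritarian outcomes.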
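The sketch below reads the table's probability column as P(option A) and maps each value to the leading option and its probability. This reproduces the Result lines given earlier in the chapter. For H2 the leading option is B at 55.7%, which is consistent with calling it a toss-up. The function name is illustrative.

```python
# P(option A) as reported in the table above
p_a = {"H1": 0.911, "H2": 0.443, "H3": 0.251, "H4": 0.597, "H5": 0.221, "H6": 0.361}

def leading_outcome(p: float) -> tuple[str, float]:
    """Return the favored option and its probability for a binary hypothesis."""
    return ("A", p) if p >= 0.5 else ("B", 1 - p)

for hypothesis, p in p_a.items():
    option, prob = leading_outcome(p)
    print(f"{hypothesis}: option {option} at {100 * prob:.1f}%")
```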
What Uncertainty Tells Us
High Certainty (H1):
- Overwhelming evidence consensus
- Trend unmistakable
- Plan accordingly
Maximum Uncertainty (H2):
- Evidence closely balanced
- Genuine unknown
- Prepare for both
Medium Uncertainty (Others):
- Direction clear but magnitude uncertain
- Confidence in trends
- Details remain fuzzy
Evidence Quality Patterns
Strongest Evidence Categories
- Technical benchmarks (avg quality: 0.812)
- Large-scale empirical studies (0.798)
- Systematic reviews (0.785)
- Government assessments (0.771)
Weakest Evidence Categories
- Expert opinions (0.652)
- Theoretical arguments (0.668)
- Historical analogies (0.691)
- Single case studies (0.703)
Geographic Bias Assessment
- US sources: 45%
- European: 25%
- Chinese: 15%
- Other: 15%
Implication: Western bias may affect global applicability
Key Evidence-Based Insights
1. Progress Is Nearly Certain
The evidence for continued AI advancement is overwhelming. Planning for slow AI is planning for a future that won’t happen.
2. AGI Remains Unknowable
Despite intense research, AGI timing remains genuinely uncertain. The evidence supports both outcomes roughly equally.
3. Displacement Dominates
Employment evidence strongly favors displacement over complementarity. The question isn’t if but how much and how fast.
4. Centralization Accelerating
Economic forces driving concentration are powerful and accelerating. Distributed development increasingly unlikely.
5. Democracy Under Pressure
Governance evidence shows authoritarian drift across multiple indicators. Democratic preservation requires active effort.
Validation and Robustness
Cross-Validation Tests
- Leave-one-out analysis: Results stable
- Random subsampling: Core findings persist
- Time-based splits: Trends consistent
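The leave-one-out test can be sketched by rerunning the Step 2 update rule with each evidence item dropped in turn. The chapter does not define what counts as "stable". This sketch assumes two checks: the largest shift any single removal causes, and whether any removal flips the leading option across 50%. The evidence triples are hypothetical.

```python
import math

def posterior(evidence, prior=0.5):
    """Step 2 heuristic log-odds update over (quality, relevance, supports_a) items."""
    log_odds = math.log(prior / (1 - prior))
    for quality, relevance, supports_a in evidence:
        shift = (quality - 0.5) * relevance
        log_odds += shift if supports_a else -shift
    return 1 / (1 + math.exp(-log_odds))

def leave_one_out(evidence):
    """Recompute the posterior with each item removed; report the largest shift and flips."""
    full = posterior(evidence)
    drops = [posterior(evidence[:i] + evidence[i + 1:]) for i in range(len(evidence))]
    max_shift = max(abs(p - full) for p in drops)
    # A flip means removing one item moves the result to the other side of 50%
    flips = sum((p >= 0.5) != (full >= 0.5) for p in drops)
    return full, max_shift, flips

# Hypothetical evidence: (quality score, relevance, supports option A?)
evidence = [(0.82, 1.0, True), (0.77, 0.8, True), (0.71, 0.9, True),
            (0.66, 0.7, False), (0.79, 1.0, True), (0.64, 0.6, False)]
full, max_shift, flips = leave_one_out(evidence)
print(f"full={full:.3f}  max shift={max_shift:.3f}  direction flips={flips}")
```

A result with zero flips and a small maximum shift is what "Results stable" would look like under these assumed criteria.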
Contradiction Analysis
Where evidence conflicts, we:
- Weight by quality scores
- Examine temporal patterns
- Consider source bias
- Maintain uncertainty
Missing Evidence
We acknowledge gaps:
- China’s internal development
- Classified government research
- Proprietary industry data
- Social movement dynamics
The Evidence Message
The evidence tells a clear story:
- Technical progress will continue (very high confidence)
- Economic disruption is coming (high confidence)
- Power will concentrate (high confidence)
- Governance will struggle (moderate confidence)
- Outcomes remain shapeable (but window closing)
This isn’t speculation; it’s what the evidence says. The question isn’t whether these trends exist, but how we respond to them.