Chapter 8: Evidence Assessment

From Opinion to Probability: The Evidence Foundation

This study’s credibility rests on one foundation: evidence. Not speculation or unexamined hunches, but 120 pieces of evidence, each systematically scored for quality. This chapter describes how we collected and scored that evidence and how we converted it into probabilities for the six hypotheses.

The Evidence Collection Protocol

Inclusion Criteria

Every piece of evidence met strict standards:

1. Source Quality

  • Peer-reviewed journals (Impact Factor > 3.0)
  • Government reports from major economies
  • Research institutions with established credibility
  • Industry leaders with demonstrated expertise

2. Methodological Rigor

  • Quantitative analysis preferred
  • Systematic reviews valued
  • Controlled experiments weighted heavily
  • Large-scale data studies prioritized

3. Temporal Relevance

  • Published 2020-2025
  • Explicit future projections
  • Current data (not just historical)
  • Trend analysis included

4. Direct Relevance

  • Clear connection to hypotheses
  • Specific rather than general
  • Measurable implications
  • Falsifiable claims

Quality Scoring Framework

Each evidence piece received scores across four dimensions:

Authority (40% weight)

  • Source credibility
  • Author expertise
  • Institutional backing
  • Track record

Methodology (30% weight)

  • Research design quality
  • Sample size/scope
  • Statistical rigor
  • Reproducibility

Recency (20% weight)

  • Publication date
  • Data currency
  • Trend stability
  • Update frequency

Replication (10% weight)

  • Citation count
  • Independent confirmation
  • Consensus alignment
  • Contradictory evidence
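The four weights above sum to 100%, so one natural reading is that the overall quality score is a weighted average of the four dimension scores. The sketch below assumes each dimension is scored on a 0-1 scale. The `DimensionScores` class and the example values are hypothetical and do not come from the study's data.

```python
from dataclasses import dataclass

# Weights from the scoring framework; they sum to 1.0.
WEIGHTS = {
    "authority": 0.40,
    "methodology": 0.30,
    "recency": 0.20,
    "replication": 0.10,
}

@dataclass
class DimensionScores:
    """Per-dimension scores; the 0-1 scale is an assumption."""
    authority: float
    methodology: float
    recency: float
    replication: float

def calculate_quality(scores: DimensionScores) -> float:
    """Weighted average of the four dimension scores."""
    return sum(getattr(scores, name) * weight for name, weight in WEIGHTS.items())

# Hypothetical evidence piece, for illustration only.
example = DimensionScores(authority=0.9, methodology=0.8, recency=0.7, replication=0.5)
quality = calculate_quality(example)
print(round(quality, 3))  # 0.36 + 0.24 + 0.14 + 0.05 = 0.79
```

Because Authority carries four times the weight of Replication, a well-credentialed but unreplicated source can still outscore a widely replicated result from a weaker source.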

The 120 Evidence Pieces

Distribution by Hypothesis:

  • H1 (AI Progress): 20 pieces
  • H2 (AGI Achievement): 17 pieces
  • H3 (Employment): 19 pieces
  • H4 (Safety): 20 pieces
  • H5 (Development Model): 20 pieces
  • H6 (Governance): 24 pieces

Distribution by Source Type:

  • Academic papers: 45 (37.5%)
  • Industry reports: 28 (23.3%)
  • Government studies: 22 (18.3%)
  • Think tank analysis: 15 (12.5%)
  • Technical benchmarks: 10 (8.3%)

Bayesian Synthesis Method

The Bayesian Framework

We use Bayesian inference to combine evidence:

P(H|E) = [P(E|H) × P(H)] / P(E)

Where:
- P(H|E) = Posterior probability given evidence
- P(E|H) = Likelihood of evidence if hypothesis true
- P(H) = Prior probability (started at 0.5)
- P(E) = Marginal probability of evidence
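For a binary hypothesis (option A versus option B, written here as H and ¬H), the marginal P(E) expands by the law of total probability, and Bayes' rule can be rewritten in odds form. Taking logarithms turns each update into an addition, which is why the integration process works in log-odds. The numbers in the last line are illustrative only and are not taken from the study.

```latex
P(E) = P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)

\log\frac{P(H \mid E)}{P(\neg H \mid E)}
  = \log\frac{P(H)}{P(\neg H)} + \log\frac{P(E \mid H)}{P(E \mid \neg H)}

% Illustration: P(H) = 0.5,\; P(E \mid H) = 0.8,\; P(E \mid \neg H) = 0.4
P(H \mid E) = \frac{0.8 \times 0.5}{0.8 \times 0.5 + 0.4 \times 0.5}
            = \frac{0.4}{0.6} \approx 0.667
```

With a prior of 0.5 the prior log-odds term is zero, so the posterior log-odds equal the accumulated evidence terms alone.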

Evidence Integration Process

Step 1: Initialize Priors

  • All hypotheses start at 50% (maximum uncertainty)
  • No assumption about outcomes
  • Equal weight to A and B options

Step 2: Sequential Update

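import math

# calculate_quality() and assess_relevance() are the scoring steps described above
def update_posterior(evidence_list, prior=0.5):
    # Convert the prior to log-odds once, for numerical stability
    log_odds = math.log(prior / (1 - prior))

    for evidence in evidence_list:
        quality_score = calculate_quality(evidence)
        relevance = assess_relevance(evidence)

        # Shift the running log-odds by the evidence strength,
        # so each piece builds on the previous update
        if evidence.supports_A:
            log_odds += (quality_score - 0.5) * relevance
        else:
            log_odds -= (quality_score - 0.5) * relevance

    # Convert back to probability
    return 1 / (1 + math.exp(-log_odds))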
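To see the update run end to end, here is a self-contained sketch with three hypothetical evidence items. The `Evidence` class, the stand-in versions of `calculate_quality` and `assess_relevance`, and all numeric values are assumptions made for illustration. The term `(quality_score - 0.5) * relevance` acts as a heuristic substitute for a log likelihood ratio, so evidence scored below 0.5 would push the estimate away from the option it nominally supports.

```python
import math
from dataclasses import dataclass

@dataclass
class Evidence:
    """Hypothetical evidence record; field names and 0-1 scales are assumptions."""
    quality: float
    relevance: float
    supports_A: bool

def calculate_quality(evidence: Evidence) -> float:
    # Stand-in: returns a pre-assigned score instead of scoring the source.
    return evidence.quality

def assess_relevance(evidence: Evidence) -> float:
    # Stand-in: returns a pre-assigned relevance.
    return evidence.relevance

def update_posterior(evidence_list, prior=0.5):
    # Start from the prior in log-odds (0.0 for a 50% prior).
    log_odds = math.log(prior / (1 - prior))
    for evidence in evidence_list:
        shift = (calculate_quality(evidence) - 0.5) * assess_relevance(evidence)
        log_odds += shift if evidence.supports_A else -shift
    # The logistic function maps log-odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))

evidence_list = [
    Evidence(quality=0.8, relevance=1.0, supports_A=True),
    Evidence(quality=0.7, relevance=0.5, supports_A=False),
    Evidence(quality=0.9, relevance=1.0, supports_A=True),
]
posterior = update_posterior(evidence_list)
print(f"{posterior:.3f}")  # log-odds 0.3 - 0.1 + 0.4 = 0.6, giving 0.646
```

Because the shifts are simply added, the result does not depend on the order in which the evidence is processed.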

Step 3: Uncertainty Quantification

  • Bootstrap resampling (1,000 iterations)
  • Generate confidence intervals
  • Account for evidence quality variance
  • Propagate through causal network
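A percentile bootstrap is one common way to produce such intervals: resample the evidence list with replacement, recompute the posterior each time, and read off the 2.5th and 97.5th percentiles. The sketch below assumes that approach and covers only the resampling step, not the propagation through the causal network. The function names, the tuple layout, and the sample items are hypothetical.

```python
import math
import random

def posterior_from(items, prior=0.5):
    """items: list of (quality, relevance, supports_a) tuples."""
    log_odds = math.log(prior / (1 - prior))
    for quality, relevance, supports_a in items:
        shift = (quality - 0.5) * relevance
        log_odds += shift if supports_a else -shift
    return 1 / (1 + math.exp(-log_odds))

def bootstrap_ci(items, n_iter=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the posterior."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    estimates = sorted(
        posterior_from(rng.choices(items, k=len(items))) for _ in range(n_iter)
    )
    alpha = (1 - level) / 2
    lo_idx = round(alpha * (n_iter - 1))
    hi_idx = round((1 - alpha) * (n_iter - 1))
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical evidence items, for illustration only.
items = [
    (0.80, 1.0, True), (0.75, 0.8, True), (0.70, 0.5, False), (0.90, 1.0, True),
    (0.65, 0.6, False), (0.78, 0.9, True), (0.72, 0.7, False), (0.85, 1.0, True),
]
low, high = bootstrap_ci(items)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

Resampling whole evidence items means the interval reflects how much the result depends on which pieces happened to be collected, which is why hypotheses with mixed evidence get wider intervals.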

Evidence Highlights by Hypothesis

H1: AI Progress Trajectory

Strong Evidence for Acceleration (A):

  • GPT-3 (2020) to GPT-4 (2023): 10x improvement in under 3 years
  • Investment growing 50% annually
  • Compute availability doubling every 6 months
  • No fundamental barriers identified

Quality Score: 0.774 average for A evidence

Weak Evidence for Barriers (B):

  • Scaling may plateau (theoretical)
  • Energy constraints possible
  • Data limitations suggested

Quality Score: 0.650 average for B evidence

Result: 91.1% probability of continued acceleration

H2: AGI Achievement

Mixed Evidence - Genuine Uncertainty

For AGI (A):

  • Emergent abilities in large models
  • Transfer learning improving
  • Reasoning capabilities expanding
  • Quality: 0.765

Against AGI (B):

  • Current systems still brittle
  • True understanding absent
  • Combinatorial explosion remains
  • Quality: 0.753

Result: 44.3% probability - a true toss-up

H3: Employment Impact

Strong Evidence for Displacement (B):

  • McKinsey: 400M jobs at risk by 2030
  • Oxford study: 47% of jobs automatable
  • MIT: Replacement exceeding creation
  • Quality: 0.792

Weaker Evidence for Complementarity (A):

  • Historical precedents of adaptation
  • New job categories emerging
  • Augmentation tools growing
  • Quality: 0.737

Result: 74.9% probability of net displacement

H4: Safety and Control

Moderate Evidence for Safety (A):

  • Alignment research progressing
  • Safety culture strengthening
  • Regulatory frameworks emerging
  • Quality: 0.787

Significant Evidence for Risks (B):

  • Control problem unsolved
  • Misalignment examples accumulating
  • Dual-use concerns growing
  • Quality: 0.760

Result: 59.7% probability of safe development (slight lean)

H5: Development Paradigm

Strong Evidence for Centralization (B):

  • Compute costs escalating exponentially
  • Network effects dominant
  • Data moats expanding
  • Winner-take-all dynamics clear
  • Quality: 0.787

Weaker Evidence for Distribution (A):

  • Open source movement
  • Academic research continues
  • Some startups succeeding
  • Quality: 0.693

Result: 77.9% probability of centralization

H6: Governance Evolution

Evidence for Authoritarian Drift (B):

  • Surveillance capabilities expanding
  • Emergency powers normalizing
  • Democratic norms eroding globally
  • Tech-state fusion accelerating
  • Quality: 0.789

Evidence for Democratic Resilience (A):

  • Historical adaptation precedents
  • Civil society mobilizing
  • Regulatory efforts underway
  • Public awareness growing
  • Quality: 0.746

Result: 63.9% probability of authoritarian outcomes

Confidence Intervals and Uncertainty

Uncertainty by Hypothesis

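| Hypothesis | Probability (option A) | Uncertainty | 95% CI Width |
|---|---|---|---|
| H1 | 91.1% | ±5.7% | 22.9% |
| H2 | 44.3% | ±16.9% | 65.5% |
| H3 | 25.1% | ±9.9% | 37.0% |
| H4 | 59.7% | ±13.3% | 49.7% |
| H5 | 22.1% | ±12.7% | 46.9% |
| H6 | 36.1% | ±13.3% | 48.7% |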
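The table values for H3, H5, and H6 differ from the headline results in the hypothesis sections because the table appears to report the probability of option A throughout, while those three results are stated for option B. The sketch below shows the conversion. The `reported` mapping is transcribed from the result lines in this chapter, and the function name is hypothetical.

```python
# Headline results as stated per hypothesis: (option reported, probability in %).
reported = {
    "H1": ("A", 91.1),  # continued acceleration
    "H2": ("A", 44.3),  # AGI achieved
    "H3": ("B", 74.9),  # net displacement
    "H4": ("A", 59.7),  # safe development
    "H5": ("B", 77.9),  # centralization
    "H6": ("B", 63.9),  # authoritarian outcomes
}

def prob_of_a(option: str, percent: float) -> float:
    """Return P(A) in percent; A and B are complementary outcomes."""
    return percent if option == "A" else round(100.0 - percent, 1)

table = {h: prob_of_a(option, pct) for h, (option, pct) in reported.items()}
for hypothesis, p_a in table.items():
    print(f"{hypothesis}: P(A) = {p_a}%")
```

Read this way, the 25.1% in the H3 row and the 74.9% displacement result describe the same estimate from opposite sides.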

What Uncertainty Tells Us

High Certainty (H1):

  • Overwhelming evidence consensus
  • Trend unmistakable
  • Plan accordingly

Maximum Uncertainty (H2):

  • Evidence nearly balanced (quality 0.765 for vs. 0.753 against)
  • Genuine unknown
  • Prepare for both

Medium Uncertainty (Others):

  • Direction clear but magnitude uncertain
  • Confidence in trends
  • Details remain fuzzy

Evidence Quality Patterns

Strongest Evidence Categories

  1. Technical benchmarks (avg quality: 0.812)
  2. Large-scale empirical studies (0.798)
  3. Systematic reviews (0.785)
  4. Government assessments (0.771)

Weakest Evidence Categories

  1. Expert opinions (0.652)
  2. Theoretical arguments (0.668)
  3. Historical analogies (0.691)
  4. Single case studies (0.703)

Geographic Bias Assessment

  • US sources: 45%
  • European: 25%
  • Chinese: 15%
  • Other: 15%

Implication: with roughly 70% of sources from the US and Europe, a Western bias may limit global applicability

Key Evidence-Based Insights

1. Progress Is Nearly Certain

The evidence for continued AI advancement is overwhelming. At a 91.1% probability of acceleration, planning for slow AI means planning for a future the evidence gives roughly a 9% chance.

2. AGI Remains Unknowable

Despite intense research, whether AGI will be achieved remains genuinely uncertain. At 44.3%, with the widest confidence interval of any hypothesis, the evidence supports both outcomes almost equally.

3. Displacement Dominates

Employment evidence strongly favors displacement over complementarity. The question isn’t if but how much and how fast.

4. Centralization Accelerating

The economic forces driving concentration are powerful and accelerating, making distributed development increasingly unlikely.

5. Democracy Under Pressure

Governance evidence shows authoritarian drift across multiple indicators. Democratic preservation requires active effort.

Validation and Robustness

Cross-Validation Tests

  • Leave-one-out analysis: Results stable
  • Random subsampling: Core findings persist
  • Time-based splits: Trends consistent
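Leave-one-out analysis can be sketched as follows: drop each evidence piece in turn, recompute the posterior, and record how far it moves from the full-sample value. This is a minimal illustration that assumes the additive log-odds update from the integration process. The function names, tuple layout, and sample items are hypothetical.

```python
import math

def posterior_from(items, prior=0.5):
    """items: list of (quality, relevance, supports_a) tuples."""
    log_odds = math.log(prior / (1 - prior))
    for quality, relevance, supports_a in items:
        shift = (quality - 0.5) * relevance
        log_odds += shift if supports_a else -shift
    return 1 / (1 + math.exp(-log_odds))

def leave_one_out(items):
    """Return the full posterior and the change caused by removing each item."""
    full = posterior_from(items)
    shifts = [
        posterior_from(items[:i] + items[i + 1:]) - full
        for i in range(len(items))
    ]
    return full, shifts

# Hypothetical evidence items, for illustration only.
items = [(0.8, 1.0, True), (0.7, 0.5, False), (0.9, 1.0, True)]
full, shifts = leave_one_out(items)
most_influential = max(range(len(items)), key=lambda i: abs(shifts[i]))
print(f"full posterior: {full:.3f}, most influential item index: {most_influential}")
```

A result counts as stable in this sense when no single removal moves the posterior by more than a small, pre-agreed tolerance.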

Contradiction Analysis

Where evidence conflicts, we:

  1. Weight by quality scores
  2. Examine temporal patterns
  3. Consider source bias
  4. Maintain uncertainty

Missing Evidence

We acknowledge gaps:

  • China’s internal development
  • Classified government research
  • Proprietary industry data
  • Social movement dynamics

The Evidence Message

The evidence tells a clear story:

  1. Technical progress will continue (very high confidence)
  2. Economic disruption is coming (high confidence)
  3. Power will concentrate (high confidence)
  4. Governance will struggle (moderate confidence)
  5. Outcomes remain shapeable (but window closing)

This isn’t speculation—it’s what the evidence says. The question isn’t whether these trends exist, but how we respond to them.

