Purpose:
This starter dossier ingests publicly released materials (the House Oversight release, DOJ declassifications, estate batch releases, and mainstream media coverage) to produce a public-interest, survivor-protective dataset for analysis. The goal is to extract names, organizations, dates, and context from these documents and produce a timeline and co-mention graph for investigative review.
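
A minimal sketch of the co-mention step in Python. It assumes NER (or manual review) has already reduced each document to a set of entity names. The document IDs and entity names below are placeholders, not drawn from the corpus. Two entities share an edge once for every document that mentions both, so edge weight counts documents rather than raw mentions.

```python
from collections import Counter
from itertools import combinations


def co_mention_edges(doc_entities: dict[str, set[str]]) -> Counter[tuple[str, str]]:
    """Count how many documents mention each unordered pair of entities.

    doc_entities maps a document ID to the set of entity names found in it.
    Each document contributes at most one count per pair, however often
    the names repeat inside it.
    """
    edges: Counter[tuple[str, str]] = Counter()
    for entities in doc_entities.values():
        # sorted() gives a canonical order so (A, B) and (B, A) share one key
        for pair in combinations(sorted(entities), 2):
            edges[pair] += 1
    return edges


if __name__ == "__main__":
    # Hypothetical document IDs and placeholder entity names
    docs = {
        "doc-001": {"Org A", "Person B", "Person C"},
        "doc-002": {"Org A", "Person B"},
        "doc-003": {"Person C"},
    }
    for (a, b), weight in co_mention_edges(docs).most_common():
        print(f"{a} -- {b}: {weight}")
```

With this sample input, the pair ("Org A", "Person B") gets weight 2 and the other two pairs get weight 1. Counting per document rather than per mention keeps a single long filing from dominating the graph. The resulting weighted pairs can then be loaded into any graph tool for layout and review.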

Scope & Seed Corpus:

  • House Oversight public release (~33,295 pages; Sept 2025)
  • DOJ declassification packets (DocumentCloud collections; Feb 27, 2025)
  • Estate batch releases (including album pages; recent estate disclosures)
  • Media coverage and reporting (Reuters, CBS, Al Jazeera)
  • Legislative & agency notices (Treasury/FinCEN Suspicious Activity Report (SAR) agreement; Sen. Wyden bill)

Guardrails & Ethics:

  • Do NOT publish or surface unredacted survivor PII or any child sexual abuse material (CSAM).
  • Focus on public figures, institutions, and document-level citations.
  • Redact or omit sensitive personal identifiers before sharing.
  • Use publicly released documents only; do not attempt to access sealed records.
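
Part of the redaction guardrail can be automated. The following is a minimal sketch that assumes identifiers appear in common US formats. The patterns and bracketed labels are illustrative choices, not a complete PII detector. Pattern matching also cannot tell a survivor's name from a public figure's, so names still need human review before anything is shared.

```python
import re

# Illustrative patterns only (an assumption, not a complete PII detector):
# they catch common US-style formats and will miss many others.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # Matches (555) 123-4567, 555-123-4567, 555.123.4567, 555 123 4567
    "PHONE": re.compile(r"(?:\(\d{3}\)\s?|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace every match with a bracketed label such as [EMAIL].

    Patterns run in dict order (EMAIL, SSN, PHONE). Once a span has been
    replaced with a label, the later patterns no longer see it.
    """
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

For example, `redact("Call (555) 123-4567 or email j.doe@example.com.")` returns `"Call [PHONE] or email [EMAIL]."`. Replacing identifiers with typed labels, instead of deleting them, keeps the sentence readable and shows reviewers what kind of item was removed.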

Immediate Deliverables:

  1. CSV: seed entities and doc references (epstein_starter_entities.csv)
  2. One-page brief: this document (epstein_starter_brief.pdf)
  3. Next steps: run named-entity recognition (NER) across the full corpus, build the timeline and co-mention graph, and ingest SARs when available.
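
The brief names the entities CSV but does not define its columns, so the layout below (entity, entity_type, doc_id, doc_date, source) is hypothetical. This is a minimal sketch that assumes dates are stored in ISO format (YYYY-MM-DD). It writes seed rows to CSV and builds a first-pass timeline by sorting dated rows. Undated rows are set aside for manual review rather than dropped.

```python
import csv
import io
from datetime import date
from typing import TextIO

# Hypothetical column layout for epstein_starter_entities.csv; the brief does
# not define one, so rename these to match the real file.
FIELDS = ["entity", "entity_type", "doc_id", "doc_date", "source"]


def write_entities(rows: list[dict], fh: TextIO) -> None:
    """Write seed entity rows (dicts keyed by FIELDS) as CSV with a header."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def build_timeline(csv_text: str) -> tuple[list[dict], list[dict]]:
    """Split rows into (dated, undated); dated rows are sorted oldest first."""
    dated, undated = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            # Assumes ISO dates; blank or malformed values raise ValueError
            row["doc_date"] = date.fromisoformat(row["doc_date"].strip())
        except (ValueError, AttributeError):
            # AttributeError covers a missing value (None) in a short row
            undated.append(row)
            continue
        dated.append(row)
    # Tie-break on doc_id so rows sharing a date come out in a stable order
    dated.sort(key=lambda r: (r["doc_date"], r["doc_id"]))
    return dated, undated


if __name__ == "__main__":
    # Placeholder rows for illustration only
    sample = [
        {"entity": "Org A", "entity_type": "ORG", "doc_id": "doc-002",
         "doc_date": "2002-03-04", "source": "placeholder"},
        {"entity": "Person B", "entity_type": "PERSON", "doc_id": "doc-001",
         "doc_date": "2001-01-02", "source": "placeholder"},
        {"entity": "Person C", "entity_type": "PERSON", "doc_id": "doc-003",
         "doc_date": "", "source": "placeholder"},
    ]
    buf = io.StringIO()
    write_entities(sample, buf)
    dated, undated = build_timeline(buf.getvalue())
    for row in dated:
        print(row["doc_date"].isoformat(), row["doc_id"], row["entity"])
    print("needs manual dating:", [r["doc_id"] for r in undated])
```

Returning undated rows separately keeps gaps visible. A document with no parseable date is often exactly the one that needs a reviewer's attention.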