Purpose:
This starter dossier ingests publicly released materials (the House Oversight release, DOJ declassifications, estate batch releases, and mainstream media coverage) to produce a public-interest, survivor-protective dataset for analysis. The goal is to extract names, organizations, dates, and context from these documents, then build a timeline and co-mention graph for investigative review.
Scope & Seed Corpus:
- House Oversight public release (~33,295 pages; Sept 2025)
- DOJ declassification packets (DocumentCloud collections; Feb 27, 2025)
- Estate batch releases (including album pages; recent estate disclosures)
- Media coverage and reporting (Reuters, CBS, Al Jazeera)
- Legislative & agency notices (Treasury/FinCEN SARs agreement; Sen. Wyden bill)
Guardrails & Ethics:
- Do NOT publish or surface unredacted survivor PII or any CSAM.
- Focus on public figures, institutions, and document-level citations.
- Redact or omit sensitive personal identifiers before sharing.
- Use publicly released documents only; do not attempt to access sealed records.
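As an illustration of the redaction guardrail above, here is a minimal sketch of a pattern-based scrubbing pass. The pattern set, the `redact` helper, and the `[REDACTED ...]` placeholder format are assumptions made for this example, not part of the dossier's tooling. Regular expressions only catch well-formed identifiers (US-style phone numbers, SSN-shaped strings, email addresses); names and free-text details still need human review before anything is shared.

```python
import re

# Hypothetical patterns for illustration only; they cover a few common
# US-style identifier formats and will miss many real-world variants.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"),
}


def redact(text: str) -> str:
    """Replace each matched identifier with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text


if __name__ == "__main__":
    sample = "Call 212-555-0100 or mail a.b@example.org, SSN 123-45-6789"
    print(redact(sample))
```

A pass like this is best treated as a first filter that runs before manual review, not as a replacement for it.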
Immediate Deliverables:
- CSV: seed entities and doc references (epstein_starter_entities.csv)
- One-page brief: this document (epstein_starter_brief.pdf)
- Next steps: run named-entity recognition (NER) across the full corpus, build the timeline and co-mention graph, and ingest SARs when available.
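The timeline and co-mention graph named in the deliverables can be sketched with the standard library alone. This is a rough outline under stated assumptions: the input shapes (a mapping of document ID to extracted entity names, and a list of date/document/note tuples), the function names, and the placeholder IDs and entities are all hypothetical, and the NER step that would produce the entities is not shown.

```python
from collections import Counter
from itertools import combinations


def co_mention_edges(doc_entities):
    """Count, for each pair of entities, how many documents mention both.

    doc_entities: dict mapping a document ID to an iterable of entity names.
    Returns a Counter keyed by alphabetically ordered (entity_a, entity_b).
    """
    edges = Counter()
    for entities in doc_entities.values():
        # De-duplicate within a document so each pair counts once per document.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


def build_timeline(events):
    """Sort (iso_date, doc_id, note) tuples chronologically.

    Assumes dates are ISO 8601 strings (YYYY-MM-DD), which sort correctly
    as plain text.
    """
    return sorted(events, key=lambda event: event[0])


if __name__ == "__main__":
    # Placeholder document IDs and entity names, not real corpus data.
    docs = {
        "doc-001": ["Org A", "Org B"],
        "doc-002": ["Org B", "Org A", "Agency C"],
    }
    for pair, weight in co_mention_edges(docs).most_common():
        print(pair, weight)
```

Keying edges on document-level co-occurrence keeps every edge traceable to a citable document, which fits the guardrail of focusing on document-level citations.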
