Processing Capstone Email Using Predictive Coding

Introduction

The Illinois State Archives, in partnership with the University of Illinois and with three-year funding from the National Historical Publications and Records Commission (NHPRC), is launching a project called Processing Capstone Email Using Predictive Coding (a.k.a. the Capstone Email Project). The project seeks to develop and demonstrate a reliable and sustainable method of identifying, and providing appropriate access to, state agency email messages that have enduring value.

Following the lead of the National Archives and Records Administration, we will start by using a Capstone approach to identify email messages that have enduring value. This means the project will identify and secure the email messages of senior administrative officers from state agencies according to the priorities of the Director of the State Archives. Once the email is secured, the project will work with experts in text analytics and electronic discovery to explore tools that use technology-assisted review techniques (predictive coding in particular) to parse and classify the email.

We envision that the tools will assist in identifying and prioritizing the review of sensitive content, generating descriptive metadata, aggregating email threads, identifying near-duplicates, and providing some level of automatic appraisal and redaction. Once the tools have been selected and configured, we will batch process the email so it may be ingested into a digital repository. From there, the email will be made available to the public in person at an offline computer terminal.

Plan of Work

✔ Phase 1 – Kick-Off and Initial Explorations
✔ Phase 2 – De-duplication and Assessment
✔ Phase 3 – Auto-categorization Tools Assessment
✔ Phase 4 – Restrictions and Redaction Tool Assessment
✔ Phase 5 – Enhancement Tools Assessment
✔ Phase 6 – Batch Email Processing
✔ Phase 7 – Search and Access Tools Evaluation
 Phase 8 – Rollout Process

Performance Objectives

1. Establish proven workflows for the processing of Capstone email.
2. Process at least 20 GB of email, including the email of at least one senior state official.
3. Demonstrate processing efficiency exceeding manual human review.
4. Provide public access to Capstone email.

CoSA NHPRC Email Symposium 2017 - Brent West presenting

2017 SAA Team Presentation

iPRES Conference - September 2018

iPRES Conference – September 2018

iPRES Conference – September 2019

Brent West (standing) and Josh Hackel (seated) of the University of Illinois explain to Illinois State Archives staff how to access state agency emails using specially developed software. The State Archives and the University of Illinois have been collaborating on a three-year project to develop a reliable and sustainable method to provide access to email records that have enduring value. Funding for the project, "Processing Capstone Email Using Predictive Coding," was provided by the National Historical Publications and Records Commission.

Research Assistant Tara Trentalange tests out the public access computer

Team Members

  • 1. Project Director – David Joens
    E-Records Archivist and Director | Illinois State Archives
    (217) 782-3492, djoens@ilsos.gov
  • 2. Co-Principal Investigator – Joanne Kaczmarek
    Associate Professor and Archivist for Electronic Records | University of Illinois
    (217) 333-6834, jkaczmar@illinois.edu
  • 3. Co-Principal Investigator – Brent West
    Asst. Director for Records and Information Management Services | University of Illinois
    (217) 265-9190, bmwest@uillinois.edu
  • 4. Project Manager – Amanda Hartman
    Records Archivist | Illinois State Archives
    (217) 524-7528, ahartman@ilsos.gov
  • 5. Text Analytics Expert (October 2016 - May 2017) – Dan Roth
    Professor of Computer Science | University of Illinois
    (217) 244-7068, danr@illinois.edu
  • 6. IT Infrastructure Expert (October 2016 - February 2019) – Tom Habing
    Software Development Manager | University of Illinois
    (217) 244-4425, thabing@illinois.edu
  • 7. Archival Email Expert (January 2017 - October 2019) – Chris Prom
    Assistant Archivist | University of Illinois
    prom@illinois.edu
  • 8. Archival Advisor (June 2017 - October 2019) – William Maher
    Director of Archives | University of Illinois
    w-maher@illinois.edu
  • 9. Research Assistant (October 2016 - May 2017) – Jiayue Niu
    Tools Assessment, Workflow Development, and Email Processing | University of Illinois
    jniu6@illinois.edu
  • 10. Research Assistant (June 2017 - November 2018) – Mei Mei
    Tools Assessment and Workflow Development | University of Illinois
    meim2@illinois.edu
  • 11. Research Assistant (June 2017 - August 2018) – Aarthi Shankar
    Tools Assessment and Workflow Development | University of Illinois
    shankar9@illinois.edu
  • 12. Research Assistant (January 2019 - June 2019) – Tara Trentalange
    Tools Assessment and Workflow Development | University of Illinois
    taralt2@illinois.edu
  • 13. Research Assistant (January 2019 - Present) – Joshua Hackel
    Tools Assessment and Workflow Development | University of Illinois
    jhackel2@illinois.edu