SEC/EDGAR Data

  1. 10X Summaries - a file containing summary data for all 10-X filings, including header information, sentiment word counts, and file statistics.

  2. 10X Document Dictionaries - a file containing header information and word counts for all 10-X filings (14.4 GB).

  3. All SEC/EDGAR Filings Tabulation - an Excel spreadsheet tabulating all SEC/EDGAR filings from 1993 to 2023.

  4. Master Index Data - the SEC/EDGAR master index files used to create the 10X data archives and to tabulate all form filings.

  5. Cleaned and Raw 10-X Files - all 10-X filings for all years. The cleaned files have extraneous characters removed, which substantially reduces their size. The raw files are the filings exactly as downloaded from the SEC/EDGAR website.

  6. 10-X Header Data - this dataset captures all of the information in the header section of 10-K/Qs (and all variants) filed on EDGAR. The headers appear in the required ".txt" filing, which contains the complete submission, and are demarcated by the tags <SEC-Header></SEC-Header> or <IMS-Header></IMS-Header>. There is one row of data for each "FILER" field in each filing (about 1.5 million observations).
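As a rough illustration of how one row per "FILER" field could be produced from a ".txt" filing, here is a minimal Python sketch. It is not the procedure used to build the dataset. It assumes the header tags may appear in any letter case, that each filer section starts with a line reading "FILER:", and that each section contains "COMPANY CONFORMED NAME:" and "CENTRAL INDEX KEY:" lines. The helper names extract_header and filer_rows are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Header block delimited by <SEC-Header>...</SEC-Header> or <IMS-Header>...</IMS-Header>.
# Case-insensitive matching is an assumption made for robustness.
HEADER_RE = re.compile(
    r"<(?:SEC|IMS)-HEADER>(.*?)</(?:SEC|IMS)-HEADER>",
    re.IGNORECASE | re.DOTALL,
)
# Assumption: each filer section begins with a line containing only "FILER:".
FILER_RE = re.compile(r"^[ \t]*FILER:[ \t]*$", re.MULTILINE)
NAME_RE = re.compile(r"COMPANY CONFORMED NAME:\s*(.+)")
CIK_RE = re.compile(r"CENTRAL INDEX KEY:\s*(\d+)")


def extract_header(filing_text: str) -> Optional[str]:
    """Return the text between the header tags, or None if no header is found."""
    match = HEADER_RE.search(filing_text)
    return match.group(1) if match else None


def filer_rows(header: str) -> List[Dict[str, Optional[str]]]:
    """Return one dict per FILER section with its company name and CIK."""
    rows = []
    # The first split piece is the text before the first FILER line; skip it.
    for block in FILER_RE.split(header)[1:]:
        name = NAME_RE.search(block)
        cik = CIK_RE.search(block)
        rows.append({
            "name": name.group(1).strip() if name else None,
            "cik": cik.group(1) if cik else None,
        })
    return rows
```

Usage with a synthetic header containing two filers (the names and CIKs are made up):

```python
sample = (
    "<SEC-HEADER>\n"
    "CONFORMED SUBMISSION TYPE:\t10-K\n"
    "FILER:\n"
    "\tCOMPANY DATA:\n"
    "\t\tCOMPANY CONFORMED NAME:\t\t\tEXAMPLE CORP\n"
    "\t\tCENTRAL INDEX KEY:\t\t\t0000000001\n"
    "FILER:\n"
    "\tCOMPANY DATA:\n"
    "\t\tCOMPANY CONFORMED NAME:\t\t\tEXAMPLE SUB LLC\n"
    "\t\tCENTRAL INDEX KEY:\t\t\t0000000002\n"
    "</SEC-HEADER>\n"
)
rows = filer_rows(extract_header(sample))  # two rows, one per FILER section
```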
     

Note: We use the label "10X" to refer to all 10-K/Q filings. Specifically, the label covers the following forms:

f_10K = {'10-K', '10-K405', '10KSB', '10-KSB', '10KSB40'}

f_10KA = {'10-K/A', '10-K405/A', '10KSB/A', '10-KSB/A', '10KSB40/A'}

f_10KT = {'10-KT', '10KT405', '10-KT/A', '10KT405/A'}

f_10Q = {'10-Q', '10QSB', '10-QSB'}

f_10QA = {'10-Q/A', '10QSB/A', '10-QSB/A'}

f_10QT = {'10-QT', '10-QT/A'}
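The form groups above are written as Python sets, so they can be used directly to classify EDGAR form types. The following minimal sketch maps each form type to its group and counts 10X filings in master index rows. It assumes the master index layout of a few description lines, a dashed separator line, and then pipe-delimited rows of the form CIK|Company Name|Form Type|Date Filed|Filename. The names classify_form and tabulate_master_index and the group labels ("10K", "10KA", and so on) are hypothetical; form types not listed in the sets are simply ignored.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set

# Form groups as listed in the note above.
F_10K = {'10-K', '10-K405', '10KSB', '10-KSB', '10KSB40'}
F_10KA = {'10-K/A', '10-K405/A', '10KSB/A', '10-KSB/A', '10KSB40/A'}
F_10KT = {'10-KT', '10KT405', '10-KT/A', '10KT405/A'}
F_10Q = {'10-Q', '10QSB', '10-QSB'}
F_10QA = {'10-Q/A', '10QSB/A', '10-QSB/A'}
F_10QT = {'10-QT', '10-QT/A'}

# Hypothetical group labels derived from the set names.
GROUPS: Dict[str, Set[str]] = {
    "10K": F_10K, "10KA": F_10KA, "10KT": F_10KT,
    "10Q": F_10Q, "10QA": F_10QA, "10QT": F_10QT,
}
# Reverse lookup: form type -> group label (the sets do not overlap).
FORM_TO_GROUP = {form: label for label, forms in GROUPS.items() for form in forms}


def classify_form(form_type: str) -> Optional[str]:
    """Return the 10X group label for a form type, or None if it is not a 10X form."""
    return FORM_TO_GROUP.get(form_type.strip().upper())


def tabulate_master_index(lines: Iterable[str]) -> Counter:
    """Count 10X filings by group in the rows of a master index file."""
    counts: Counter = Counter()
    in_body = False
    for line in lines:
        if not in_body:
            # Assumption: data rows start after the dashed separator line.
            in_body = line.startswith("----")
            continue
        fields = line.rstrip("\n").split("|")
        if len(fields) != 5:
            continue  # skip blank or malformed rows
        group = classify_form(fields[2])
        if group is not None:
            counts[group] += 1
    return counts
```

For example, with synthetic rows for a 10-K, a 10-Q/A, an 8-K, and a 10-Q, the function would count one filing each in the "10K", "10QA", and "10Q" groups and skip the 8-K. In practice the lines would come from iterating over an opened master index file.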