SEC/EDGAR Data
-
10X Summaries - file containing all summary data for all 10-X filings, including header information, sentiment word counts, and file statistics.
-
10X Document Dictionaries - file containing header information and word counts for all 10-X filings (14.4 GB).
-
All SEC/EDGAR Filings Tabulation - an Excel spreadsheet with a tabulation of all SEC/EDGAR filings from 1993-2023.
-
Master Index Data - the SEC/EDGAR master index files used to create the 10X data archives and tabulate all form filings.
-
Cleaned and Raw 10-X Files - all 10-X filings for all years. The cleaned files have extraneous characters removed, which allows substantial compression. The raw files are those downloaded directly from the SEC/EDGAR website.
-
10-X Header Data - all of the information in the header section of 10-K/Q filings (and all variants) filed on EDGAR. The header appears in the required ".txt" version of each filing, which contains the complete filing, and is demarcated by the tags <SEC-Header></SEC-Header> or <IMS-Header></IMS-Header>. There is one row of data for each "FILER" field in each filing (about 1.5 million observations).
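The header layout described above can be illustrated with a minimal Python sketch. It is not the code used to build the dataset. The function names `extract_header` and `count_filers` are hypothetical, the tags are matched case-insensitively, and the sketch assumes each filer block begins with a line reading "FILER:".

```python
import re

# Tag names come from the description above; case-insensitive matching is an
# assumption so that either capitalization of the tags is accepted.
HEADER_RE = re.compile(
    r"<(SEC-HEADER|IMS-HEADER)>(.*?)</\1>",
    re.IGNORECASE | re.DOTALL,
)

def extract_header(filing_text):
    """Return the text between the header tags, or None if no header is found."""
    match = HEADER_RE.search(filing_text)
    return match.group(2).strip() if match else None

def count_filers(header_text):
    """Count lines that start a FILER block (assumed to read 'FILER:')."""
    return len(re.findall(r"^[ \t]*FILER:", header_text, re.MULTILINE))
```

A filing whose header has two "FILER:" blocks would, under these assumptions, correspond to two rows in the header dataset.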
Note: We use the label "10X" to refer to all 10-K/Q filings. Specifically, this includes the following forms:
f_10K = {'10-K', '10-K405', '10KSB', '10-KSB', '10KSB40'}
f_10KA = {'10-K/A', '10-K405/A', '10KSB/A', '10-KSB/A', '10KSB40/A'}
f_10KT = {'10-KT', '10KT405', '10-KT/A', '10KT405/A'}
f_10Q = {'10-Q', '10QSB', '10-QSB'}
f_10QA = {'10-Q/A', '10QSB/A', '10-QSB/A'}
f_10QT = {'10-QT', '10-QT/A'}
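As a rough sketch of how these groupings might be used, the Python below collects the six sets into one lookup and classifies a form-type string. The set contents are copied from the list above. The names `FORM_GROUPS`, `ALL_10X`, and `form_group` are hypothetical, as is the choice to normalize input with `strip()` and `upper()`.

```python
# Form groups as listed above; keys reuse the f_* labels from the note.
FORM_GROUPS = {
    "f_10K": {"10-K", "10-K405", "10KSB", "10-KSB", "10KSB40"},
    "f_10KA": {"10-K/A", "10-K405/A", "10KSB/A", "10-KSB/A", "10KSB40/A"},
    "f_10KT": {"10-KT", "10KT405", "10-KT/A", "10KT405/A"},
    "f_10Q": {"10-Q", "10QSB", "10-QSB"},
    "f_10QA": {"10-Q/A", "10QSB/A", "10-QSB/A"},
    "f_10QT": {"10-QT", "10-QT/A"},
}

# Union of every form type covered by the "10X" label.
ALL_10X = set().union(*FORM_GROUPS.values())

def form_group(form_type):
    """Return the group label for a form type, or None if it is not a 10X form."""
    normalized = form_type.strip().upper()  # assumed normalization
    for label, forms in FORM_GROUPS.items():
        if normalized in forms:
            return label
    return None
```

For example, `form_group("10-K405/A")` falls in `f_10KA`, while a form type outside the list, such as "8-K", returns `None`.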