This dataset, built from the Stage One parsed files (described immediately below), contains summary data for each individual 10-X filing. See here.
Stage One 10-X Parse files
All text filings for 10-Ks, 10-Qs, and their variants are distilled into cleaned text files. This process substantially reduces file sizes by excluding extraneous material such as HTML, ASCII-encoded segments, and tables. The parsed files are provided in zipped archives, one per year. Together they comprise more than one million files and occupy about 75-100 GB of storage when compressed.
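To make the cleaning step concrete, here is a minimal Python sketch of this kind of distillation. It is not the procedure actually used to build the Stage One files, whose rules are not given here. The function name `clean_filing` and all three patterns are illustrative assumptions, and the sketch further assumes that "ASCII-encoded segments" are uuencoded blocks delimited by `begin <mode> <name>` and `end` lines.

```python
import html
import re

# Hypothetical patterns for illustration; the real Stage One rules may differ.
# Assumed form of an ASCII-encoded (uuencoded) block: "begin 644 name" ... "end".
UUENCODE_RE = re.compile(r"^begin \d{3} .*?^end$", re.MULTILINE | re.DOTALL)
# Whole tables, including their contents, are dropped.
TABLE_RE = re.compile(r"<table\b.*?</table>", re.IGNORECASE | re.DOTALL)
# Any remaining markup tag.
TAG_RE = re.compile(r"<[^>]+>")


def clean_filing(raw: str) -> str:
    """Return the text of a filing with encoded blocks, tables, and tags removed."""
    text = UUENCODE_RE.sub(" ", raw)   # drop encoded binary segments
    text = TABLE_RE.sub(" ", text)     # drop tables before stripping other tags
    text = TAG_RE.sub(" ", text)       # strip remaining HTML tags
    text = html.unescape(text)         # turn entities such as &amp; into characters
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace


sample = (
    "<html><body><p>Item 1. Business &amp; Strategy</p>"
    "<table><tr><td>123</td></tr></table>\n"
    "begin 644 logo.gif\nM1TE&.#EA\nend\n"
    "<p>Risk factors</p></body></html>"
)
print(clean_filing(sample))
```

Regular expressions keep the sketch short; a production parser would more likely use a proper HTML parser and filing-specific rules.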
EDGAR Server Log
The EDGAR Server Log records all page requests to the EDGAR web server in daily files. We have compressed the data into a more useful format and also provide a summary file with total counts by day.
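As a rough illustration of how totals by day can be derived from request-level records, the Python sketch below counts rows per date. It assumes CSV input with a header row containing a `date` column. The function name `daily_request_counts` and the sample rows are made up for the example, and the layout of the summary file provided here may differ.

```python
import csv
import io
from collections import Counter


def daily_request_counts(lines):
    """Count requests per day from CSV lines that include a 'date' column (assumed)."""
    reader = csv.DictReader(lines)
    counts = Counter(row["date"] for row in reader)
    return dict(sorted(counts.items()))  # order the result by date string


# Made-up sample rows; the column names here are assumptions for illustration.
sample_log = io.StringIO(
    "ip,date,time,cik\n"
    "1.2.3.abc,2015-01-01,00:00:01,1000045\n"
    "1.2.3.abd,2015-01-01,00:00:02,1000180\n"
    "1.2.3.abe,2015-01-02,00:00:01,1000045\n"
)
totals = daily_request_counts(sample_log)
print(totals)
```

Because `csv.DictReader` accepts any iterable of lines, the same function can be passed an open file object for a single daily log file.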
GAAP data in Stata format for Table 1 of our 2016 Journal of Accounting Research paper.
Augmented 10-X Header Data
All information contained in the header section of every 10-K/Q filing (and variants) on EDGAR, plus additional data such as longitude, latitude, and population based on the business address zip code. The sample begins in 1994.
Company Code of Ethics Archive
Collection of Ethics Codes for S&P 500 firms and "small" firms for 2008 and 2019.