This dataset contains summary data for each individual 10-X filing, built from the Stage One parsed files (described immediately below). See here.
Stage One 10-X Parse files
All text filings for 10-Ks, 10-Qs and their variants are distilled into cleaned text files. This process substantially decreases the file sizes by excluding extraneous material such as HTML, ASCII-encoded segments, and tables. The parsed data files are provided in zipped archives by year. The data consists of more than one million files and takes about 75-100 GB of storage (compressed).
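As a rough illustration of the kind of cleaning described above, the following Python sketch removes HTML tables and tags from a raw filing string. It is not the parser used to build the Stage One files. The function name `distill` and the regular expressions are assumptions made for this example, and it does not attempt to handle ASCII-encoded segments.

```python
import html
import re

# Hypothetical patterns for illustration; the actual Stage One parser is not shown here.
TABLE_RE = re.compile(r"<table\b.*?</table>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"[ \t]+")


def distill(raw: str) -> str:
    """Return the text of a filing with tables and HTML markup removed."""
    no_tables = TABLE_RE.sub(" ", raw)      # drop whole <table>...</table> blocks
    no_tags = TAG_RE.sub(" ", no_tables)    # drop any remaining tags
    text = html.unescape(no_tags)           # turn entities such as &amp; into characters
    # Collapse runs of spaces/tabs and discard lines that end up empty.
    lines = [WS_RE.sub(" ", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "<html><p>Risk &amp; return</p><table><tr><td>1</td></tr></table><p>End</p></html>"
    print(distill(sample))
```

Regular expressions are used here only to keep the sketch dependency-free; a real pipeline would likely need more careful handling of malformed markup.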
EDGAR Server Log
The EDGAR Server Log provides all page requests for the EDGAR web server in daily files. We have compressed the data into a more useful format and also provide a summary file with total counts by day.
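A summary of total counts by day can be pictured with the short Python sketch below. It assumes each daily log is a CSV file with a header row and a column holding the request date. The column name `date`, the sample rows, and the function name `daily_counts` are assumptions for this example and do not describe the layout of the files provided here.

```python
import csv
import io
from collections import Counter


def daily_counts(csv_text: str, date_column: str = "date") -> Counter:
    """Count page requests per day in CSV log text (column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[date_column] for row in reader)


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = (
        "ip,date,cik\n"
        "10.0.0.a,2016-01-01,1000\n"
        "10.0.0.b,2016-01-01,2000\n"
        "10.0.0.c,2016-01-02,1000\n"
    )
    for day, total in sorted(daily_counts(sample).items()):
        print(day, total)
```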
GAAP data in Stata format for Table 1 of our 2016 Journal of Accounting Research paper.
Augmented 10-X Header Data
All information contained in the header section of every 10-K/Q filing (and variants) on EDGAR, plus additional data such as longitude, latitude, and population based on the zip code of the business address. The sample begins in 1994.