  1. Stage One 10-X Parse files

    All text filings for 10-Ks, 10-Qs, and their variants are distilled into cleaned text files. This process substantially reduces file sizes by excluding extraneous material such as HTML, ASCII-encoded segments, and tables. The parsed files are provided in zipped archives by year. The data consist of more than one million files and take about 75-100 GB of storage (compressed).
  2. EDGAR Server Log

    The EDGAR Server Log records all page requests to the EDGAR web server in daily files. We have compressed the data into a more useful format and also provide a summary file with total counts by day.
  3. JAR nonGAAP.dta

    Non-GAAP data in Stata format for Table 1 of our 2016 Journal of Accounting Research paper.
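
Because the Stage One parse files in item 1 are distributed as zipped archives by year and are large in total, it may be convenient to stream the cleaned text out of an archive rather than unpack everything to disk. The following is a minimal sketch using only the Python standard library. The function name `iter_parsed_filings`, the `.txt` suffix, the UTF-8 encoding, and the archive name in the usage note are assumptions made for illustration, not details confirmed by the descriptions above.

```python
import zipfile


def iter_parsed_filings(archive, suffix=".txt", encoding="utf-8"):
    """Yield (member_name, text) for each text file in one yearly archive.

    `archive` may be a path or a binary file-like object.
    The suffix and encoding defaults are assumptions, not documented facts.
    """
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # Skip directory entries and anything that is not a text file.
            if info.is_dir() or not info.filename.lower().endswith(suffix):
                continue
            with zf.open(info) as fh:
                # Replace undecodable bytes instead of failing on one bad file.
                yield info.filename, fh.read().decode(encoding, errors="replace")
```

A call such as `for name, text in iter_parsed_filings("10-X_2015.zip"): ...` (hypothetical archive name) would then process one filing at a time, keeping memory use bounded by the largest single file.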