Augmented 10-X Header Data

 

Download the data 

Description

This dataset captures all of the information in the header section of 10-K/Qs (and all variants) filed on EDGAR and appends some additional features. The headers appear in the required ".txt" file, which contains the complete filing, and are demarcated with the tags <SEC-Header> </SEC-Header> or <IMS-Header></IMS-Header>. The additional features--such as latitude, longitude, and population--are based on the business address zip code as reported in the filing. Appended data are missing for non-U.S. addresses. Although the filing format is essentially stable over its entire history, a few structural quirks are worth noting:

  • Early periods (pre-1999) included a field labeled SROS, presumably self-regulatory organizations, that indicated exchange listing (e.g., NASD, NYSE). I have not included this field in the dataset.
     
  • The ACCEPTANCE-DATETIME field did not begin appearing until 20020515 (and therefore is missing in prior periods).
     
  • In cases where data are taken directly from the form, missing values are simply blank. In other cases involving derived data, -99 is used for missing values. For the boolean variables, 0 is used for missing.
     
  • Certain companies, primarily regulated utilities, file the same form for multiple related firms, with every filer listed in its own <FILER> section of each document. Descriptive data for each <FILER> section are recorded for each filing, which creates records that are unique with respect to the CIK of the specific filing but duplicated across the multiple filings. The number_of_filers and filer_number fields described in the Data section below account for this phenomenon.
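
The header demarcation described above can be sketched in Python as follows. This is a minimal illustration, not the procedure used to build the dataset: the function names extract_header and parse_simple_fields are hypothetical, and the parser only handles "LABEL: value" lines, so tag-style entries such as <ACCEPTANCE-DATETIME> and repeated labels (for example STREET 1 in both the business and mailing address blocks, or multiple <FILER> sections) would need extra handling.

```python
import re

# Matches <SEC-HEADER>...</SEC-HEADER> or <IMS-HEADER>...</IMS-HEADER>.
# The backreference \1 requires the closing tag to agree with the opening tag.
HEADER_RE = re.compile(
    r"<((?:SEC|IMS)-HEADER)>(.*?)</\1>",
    re.IGNORECASE | re.DOTALL,
)


def extract_header(filing_text):
    """Return the text between the header tags, or None if no header is found."""
    match = HEADER_RE.search(filing_text)
    return match.group(2).strip() if match else None


def parse_simple_fields(header_text):
    """Collect 'LABEL: value' lines into a dict; the first occurrence of a label wins."""
    fields = {}
    for line in header_text.splitlines():
        label, sep, value = line.partition(":")
        if sep and value.strip():
            fields.setdefault(label.strip(), value.strip())
    return fields
```

Given the full text of a ".txt" filing as a string, parse_simple_fields(extract_header(text)) would return a dictionary keyed by EDGAR labels such as ACCESSION NUMBER, provided a header is present.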

Data

The file is in CSV format, which is easily imported into Excel, Stata, or other software. The first record is a header with the variable names. The following list describes each variable in the order it appears in a record. There are more than 1.4 million records. Each line below gives the sequence number, variable name, variable label, EDGAR field, and source.
 

1,f_cik,CIK,,Filename
2,f_fdate,File Date,,Filename
3,f_ftype,Form Type,,Filename
4,f_year,File Year,,Filename
5,f_quarter,File Quarter,,Filename
6,f_month,File Month,,Filename
7,f_day,File Day,,Filename
8,f_yymm,File YYMM,,Filename
9,f_yq,File YearQtr,,Filename
10,f_dw,File Day of Week (0=Monday),,Filename
11,accpt_datetime,Acceptance DateTime,<ACCEPTANCE-DATETIME>,EDGAR
12,acc_num,Accession Number,ACCESSION NUMBER,EDGAR
13,conf_subtype,Conformed Submission Type,CONFORMED SUBMISSION TYPE,EDGAR
14,pdoc_cnt,Public Document Count,PUBLIC DOCUMENT COUNT,EDGAR
15,conf_per_rpt,Conformed Period of Report,CONFORMED PERIOD OF REPORT,EDGAR
16,file_date,Filed as of Date,FILED AS OF DATE,EDGAR
17,date_of_chg,Date as of Change,DATE AS OF CHANGE,EDGAR
18,comp_conf_name,Company Conformed Name,COMPANY CONFORMED NAME,Filer
19,cik,Central Index Key,CENTRAL INDEX KEY,Filer
20,sic_label,Standard Industrial Classification (label),STANDARD INDUSTRIAL CLASSIFICATION,Filer
21,sic_num,Standard Industrial Classification (4-digit code),STANDARD INDUSTRIAL CLASSIFICATION,Filer
22,irs_num,IRS Number,IRS NUMBER,Filer
23,state_of_incorp,State of Incorporation,STATE OF INCORPORATION,Filer
24,fye,Fiscal Year End,FISCAL YEAR END,Filer
25,form_type,Form Type,FORM TYPE,Filer
26,sec_act,SEC Act,SEC ACT,Filer
27,sec_file_num,SEC File Number,SEC FILE NUMBER,Filer
28,film_num,Film Number,FILM NUMBER,Filer
29,ba_street1,Business Address: Street 1,STREET 1,Filer
30,ba_street2,Business Address: Street 2,STREET 2,Filer
31,ba_city,Business Address: City,CITY,Filer
32,ba_state,Business Address: State,STATE,Filer
33,ba_zip9,Business Address: Zip+4,ZIP,Filer
34,ba_phone,Business Phone,BUSINESS PHONE,Filer
35,ma_street1,Mailing Address: Street 1,STREET 1,Filer
36,ma_street2,Mailing Address: Street 2,STREET 2,Filer
37,ma_city,Mailing Address: City,CITY,Filer
38,ma_state,Mailing Address: State,STATE,Filer
39,ma_zip9,Mailing Address: Zip+4,ZIP,Filer
40,former_name,Former Conformed Name,FORMER CONFORMED NAME,Filer
41,date_of_name_chg,Date of Name Change,DATE OF NAME CHANGE,Filer
42,number_of_filers,Number of filers associated with this filing,,Derived
43,filer_number,Sequence number for filer (>1 for mult filers),,Derived
44,ba_zip5,Busn Address 5-digit Zip (numeric),,Derived
45,ma_zip5,Mailing Address 5-digit Zip (numeric),,Derived
46,ba_latitude,Busn Address Latitude,,Derived (World Geodetic System of 1984)
47,ba_longitude,Busn Address Longitude,,Derived (World Geodetic System of 1984)
48,ba_population,Busn Address Population,,Derived
49,ba_is_state,Busn Address US State (0/1),,Derived
50,ba_contiguous_state,Busn Address Contiguous US State (0/1),,Derived
51,ba_state_fips,Busn Address State FIPS Code,,Derived
52,ba_county_fips,Busn Address County FIPS Code,,Derived
53,ba_msa,Busn Address Metropolitan Statistical Area Code,,Derived
54,ba_cbsa,Busn Address Core Based Statistical Area Code,,Derived
55,ba_csa,Busn Address Combined Statistical Area Code,,Derived
56,ba_num_businesses,Busn Address Number of Businesses,,Derived
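
As a rough illustration of how the missing-value conventions and the multiple-filer fields might be handled when reading the file, the sketch below uses only the Python standard library. It assumes that number_of_filers is always populated with an integer and that the derived numeric columns hold either a number, -99, or a blank; the function name clean_rows and the added multi_filer key are hypothetical and not part of the dataset.

```python
import csv
import io

# Derived numeric columns that use -99 for missing values (illustrative subset;
# extend it to whichever derived columns you need).
DERIVED_NUMERIC = ("ba_latitude", "ba_longitude", "ba_population")


def clean_rows(handle):
    """Yield one dict per record with -99 or blank derived values replaced by None."""
    for row in csv.DictReader(handle):
        for name in DERIVED_NUMERIC:
            raw = (row.get(name) or "").strip()
            value = float(raw) if raw else None
            # Caution: -99 is also a valid longitude, so an exact match is
            # treated as missing only because the documentation says so.
            row[name] = None if value is None or value == -99 else value
        # Flag records that belong to a filing listing more than one filer.
        row["multi_filer"] = int(row["number_of_filers"]) > 1
        yield row
```

For the actual file, pass a handle opened with newline="" (as the csv module recommends) to clean_rows; boolean variables such as ba_is_state need separate care, since 0 there can mean either "no" or "missing".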