Augmented 10-X Header Data
Download the data
Description
This dataset captures all of the information in the header section of 10-K/Qs (and all variants) filed on EDGAR and appends some additional features. The headers appear in the required ".txt" filing, which includes the complete filing, and are demarcated with the tags <SEC-Header> </SEC-Header> or <IMS-Header></IMS-Header>. The additional features--such as latitude, longitude, and population--are based on the business address zip code as reported in the filing. Appended data for non-U.S. addresses is missing. Although the filing format is essentially stable over its entire history, a few structural quirks are worth noting:
- Early periods (pre-1999) included a field labeled SROS, presumably self-regulatory organizations, that indicated exchange listing (e.g., NASD, NYSE). I have not included this field in the dataset.
- The ACCEPTANCE-DATETIME field did not begin appearing until 20020515 (and therefore is missing in prior periods).
- In cases where data are taken directly from the form, missing values are simply blank. In other cases involving derived data, -99 is used for missing values. For the boolean variables, 0 is used for missing.
- Certain companies, primarily regulated utilities, file the same form for multiple related firms, with each filer listed in its own <FILER> field in the document. Descriptive data for each FILER field is recorded for each filing, which creates records that are unique with respect to the cik of the specific filing but duplicated across the multiple filings. I account for this in the "number_of_filers" and "filer_number" fields described in the Data section below.
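The missing-value and multiple-filer conventions above can be handled when the file is read. The following is a minimal sketch using only the Python standard library, not the method used to build the dataset. It assumes that filer_number is 1 for the first filer of every filing and that the columns listed in DERIVED_NUMERIC are among those that use -99. The function names and the sample rows are made up for illustration.

```python
import csv
import io

# Assumption: these derived numeric columns use -99 as the missing marker.
DERIVED_NUMERIC = ("ba_latitude", "ba_longitude", "ba_population")


def clean_record(row):
    """Return a copy of row with blanks and -99 sentinels mapped to None."""
    cleaned = {}
    for name, value in row.items():
        value = (value or "").strip()
        if value == "":
            cleaned[name] = None  # blank: missing in the filing itself
        elif name in DERIVED_NUMERIC and float(value) == -99:
            cleaned[name] = None  # -99: missing derived value
        else:
            cleaned[name] = value
    # Boolean columns are left alone: 0 means both "false" and "missing",
    # so the two cases cannot be told apart from the value.
    return cleaned


def one_record_per_filing(rows):
    """Keep only the first filer of each filing (assumes filer_number == 1)."""
    return [r for r in rows if int(r["filer_number"]) == 1]


# Made-up sample rows, not real records from the dataset.
SAMPLE = """acc_num,cik,number_of_filers,filer_number,ba_latitude,ba_population
0000000000-00-000001,1,2,1,40.7,21000
0000000000-00-000001,2,2,2,-99,-99
0000000000-00-000002,3,1,1,,
"""

rows = [clean_record(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
filings = one_record_per_filing(rows)
print(len(rows), len(filings))
```

Filtering on filer_number gives one record per filing. Keeping all records instead gives one record per filer, which is the better choice when the unit of analysis is the firm.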
Data
The file is in csv format, which is easily imported into Stata or most other statistical software. The first record is a header with the variable names. There are more than 1.4 million records, so the full file exceeds Excel's limit of 1,048,576 rows per worksheet. The following list describes the variables in the order they appear in each record. The fields below are sequence number, variable name, variable label, EDGAR field, and source.
1,f_cik,CIK,,Filename
2,f_fdate,File Date,,Filename
3,f_ftype,Form Type,,Filename
4,f_year,File Year,,Filename
5,f_quarter,File Quarter,,Filename
6,f_month,File Month,,Filename
7,f_day,File Day,,Filename
8,f_yymm,File YYMM,,Filename
9,f_yq,File YearQtr,,Filename
10,f_dw,File Day of Week (0=Monday),,Filename
11,accpt_datetime,Acceptance DateTime,<ACCEPTANCE-DATETIME>,EDGAR
12,acc_num,Accession Number,ACCESSION NUMBER,EDGAR
13,conf_subtype,Conformed Submission Type,CONFORMED SUBMISSION TYPE,EDGAR
14,pdoc_cnt,Public Document Count,PUBLIC DOCUMENT COUNT,EDGAR
15,conf_per_rpt,Conformed Period of Report,CONFORMED PERIOD OF REPORT,EDGAR
16,file_date,Filed as of Date,FILED AS OF DATE,EDGAR
17,date_of_chg,Date as of Change,DATE AS OF CHANGE,EDGAR
18,comp_conf_name,Company Conformed Name,COMPANY CONFORMED NAME,Filer
19,cik,Central Index Key,CENTRAL INDEX KEY,Filer
20,sic_label,Standard Industrial Classification (label),STANDARD INDUSTRIAL CLASSIFICATION,Filer
21,sic_num,Standard Industrial Classification (4-digit code),STANDARD INDUSTRIAL CLASSIFICATION,Filer
22,irs_num,IRS Number,IRS NUMBER,Filer
23,state_of_incorp,State of Incorporation,STATE OF INCORPORATION,Filer
24,fye,Fiscal Year End,FISCAL YEAR END,Filer
25,form_type,Form Type,FORM TYPE,Filer
26,sec_act,SEC Act,SEC ACT,Filer
27,sec_file_num,SEC File Number,SEC FILE NUMBER,Filer
28,film_num,Film Number,FILM NUMBER,Filer
29,ba_street1,Business Address: Street 1,STREET 1,Filer
30,ba_street2,Business Address: Street 2,STREET 2,Filer
31,ba_city,Business Address: City,CITY,Filer
32,ba_state,Business Address: State,STATE,Filer
33,ba_zip9,Business Address: Zip+4,ZIP,Filer
34,ba_phone,Business Phone,BUSINESS PHONE,Filer
35,ma_street1,Mailing Address: Street 1,STREET 1,Filer
36,ma_street2,Mailing Address: Street 2,STREET 2,Filer
37,ma_city,Mailing Address: City,CITY,Filer
38,ma_state,Mailing Address: State,STATE,Filer
39,ma_zip9,Mailing Address: Zip+4,ZIP,Filer
40,former_name,Former Conformed Name,FORMER CONFORMED NAME,Filer
41,date_of_name_chg,Date of Name Change,DATE OF NAME CHANGE,Filer
42,number_of_filers,Number of filers associated with this filing,,Derived
43,filer_number,Sequence number for filer (>1 for mult filers),,Derived
44,ba_zip5,Busn Address 5-digit Zip (numeric),,Derived
45,ma_zip5,Mailing Address 5-digit Zip (numeric),,Derived
46,ba_latitude,Busn Address Latitude,,Derived (World Geodetic System of 1984)
47,ba_longitude,Busn Address Longitude,,Derived (World Geodetic System of 1984)
48,ba_population,Busn Address Population,,Derived
49,ba_is_state,Busn Address US State (0/1),,Derived
50,ba_contiguous_state,Busn Address Contiguous US State (0/1),,Derived
51,ba_state_fips,Busn Address State FIPS Code,,Derived
52,ba_county_fips,Busn Address County FIPS Code,,Derived
53,ba_msa,Busn Address Metropolitan Statistical Area Code,,Derived
54,ba_cbsa,Busn Address Core Based Statistical Area Code,,Derived
55,ba_csa,Busn Address Combined Statistical Area Code,,Derived
56,ba_num_businesses,Busn Address Number of Businesses,,Derived
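
The calendar fields in rows 4 through 7 and row 10 can be recomputed from the file date, which is a useful consistency check after import. The sketch below assumes that f_fdate is stored as a YYYYMMDD string, the same style as the 20020515 date cited in the Description; that storage format is an assumption, and the function name file_date_fields is hypothetical. The 0=Monday convention for f_dw matches Python's date.weekday(). The f_yymm and f_yq fields are omitted because their exact encoding is not described here.

```python
from datetime import datetime


def file_date_fields(fdate):
    """Derive calendar fields from a file date given as a YYYYMMDD string."""
    d = datetime.strptime(fdate, "%Y%m%d").date()
    return {
        "f_year": d.year,
        "f_quarter": (d.month - 1) // 3 + 1,  # calendar quarter, 1 to 4
        "f_month": d.month,
        "f_day": d.day,
        "f_dw": d.weekday(),  # Monday is 0, Sunday is 6
    }


fields = file_date_fields("20020515")
print(fields)
```

For 20020515, a Wednesday, the sketch returns f_dw of 2 and f_quarter of 2.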