10-X Header Data



This dataset captures all of the information in the header section of 10-K/Qs (and all variants) filed on EDGAR. The headers appear in the required ".txt" file, which contains the complete filing. They are demarcated with the tags <SEC-Header></SEC-Header> or <IMS-Header></IMS-Header>. Although the filing format has been essentially stable over its entire history, a few structural quirks are worth noting:

  • Early periods (pre-1999) included a field labeled SROS, presumably self-regulatory organizations, that indicated exchange listing (e.g., NASD, NYSE). We have not included this field in the derived data.
  • A field labeled ITEM INFORMATION occurs in only two documents and seems to be an artifact of a required field in 8-K filings. We have not included this field in the derived data.
  • The ACCEPTANCE-DATETIME field did not appear consistently until May 15, 2002 (20020515).
  • Missing values are blank.
  • Certain companies, primarily regulated utilities, file the same form for multiple related firms, listing every filer in a separate <FILER> field in each document. Each row of data in this dataset represents one "filer" within a given filing, and the descriptive data for each "filer" field are recorded for each filing. These documents are often also filed separately under each of the "filers" included in the document, so the data can contain what are essentially duplicate observations. The "filer_count" field gives the filer's sequence number followed by the total number of filers in the filing. For example, 3of6 is the third "filer" in a filing that lists 6 filers.
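For readers who want to reproduce the header extraction from raw filings, here is a minimal sketch. It pulls the header block out of a complete ".txt" submission and splits its "KEY: value" lines. It is an illustration under stated assumptions and not necessarily the procedure used to build this dataset. The function names `extract_header` and `header_fields` and the sample text are hypothetical. The parser also deliberately ignores the nesting of sub-paragraphs such as FILER and COMPANY DATA, so repeated keys from multi-filer headers simply appear in order.

```python
import re
from typing import Optional

# Header blocks are wrapped in <SEC-HEADER>...</SEC-HEADER> or <IMS-HEADER>...</IMS-HEADER>.
# Tag case varies (e.g. <SEC-Header>), so match case-insensitively; \1 reuses SEC/IMS.
HEADER_RE = re.compile(r"<(SEC|IMS)-HEADER>(.*?)</\1-HEADER>", re.IGNORECASE | re.DOTALL)

# Assumption: header fields look like "UPPERCASE KEY:<whitespace>value".
# Lines with no value (e.g. "FILER:", "COMPANY DATA:") and tag lines do not match.
FIELD_RE = re.compile(r"^\s*([A-Z][A-Z0-9 \-]*):\s*(.+?)\s*$")


def extract_header(filing_text: str) -> Optional[str]:
    """Return the text between the header tags of a complete .txt filing, or None."""
    match = HEADER_RE.search(filing_text)
    return match.group(2).strip() if match else None


def header_fields(header: str) -> list[tuple[str, str]]:
    """Split header lines into (KEY, value) pairs, in document order."""
    pairs = []
    for line in header.splitlines():
        match = FIELD_RE.match(line)
        if match:
            pairs.append((match.group(1).strip(), match.group(2)))
    return pairs


# Made-up filing text; note the mixed-case closing tag.
sample = (
    "<SEC-DOCUMENT>example.txt\n"
    "<SEC-HEADER>\n"
    "CONFORMED SUBMISSION TYPE:\t10-K\n"
    "FILER:\n"
    "\tCOMPANY DATA:\n"
    "\t\tCOMPANY CONFORMED NAME:\tEXAMPLE CORP\n"
    "</SEC-Header>\n"
    "<DOCUMENT>...\n"
)
print(header_fields(extract_header(sample)))
# [('CONFORMED SUBMISSION TYPE', '10-K'), ('COMPANY CONFORMED NAME', 'EXAMPLE CORP')]
```

Matching both tag names with one case-insensitive pattern covers the SEC and IMS variants mentioned above without a separate code path.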
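Each row is one "filer" within a filing. A common first step is therefore to split filer_count into its two numbers and to drop rows that repeat the same filer within the same filing. The sketch below assumes that copies of a multi-filer document filed under each filer share one accession number (g_accession_number). That assumption and the helper names `parse_filer_count` and `drop_repeated_filers` are illustrative and do not come from the dataset documentation. Rows that describe the same filer under different accession numbers are not caught by this rule.

```python
import re
from typing import Iterable, Optional

# Matches values such as "3of6": sequence number, the literal "of", total filers.
FILER_COUNT_RE = re.compile(r"^\s*(\d+)\s*of\s*(\d+)\s*$", re.IGNORECASE)


def parse_filer_count(value: Optional[str]) -> Optional[tuple[int, int]]:
    """Split a filer_count value such as '3of6' into (3, 6); None if blank or malformed."""
    match = FILER_COUNT_RE.match(value or "")
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def drop_repeated_filers(rows: Iterable[dict]) -> list[dict]:
    """Keep the first row for each (g_accession_number, filer_cik) pair.

    Assumption (not from the dataset documentation): copies of a multi-filer
    document filed under each filer share one accession number.
    """
    seen = set()
    kept = []
    for row in rows:
        key = (row["g_accession_number"], row["filer_cik"])
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Made-up rows: one two-filer filing that shows up once under each filer.
rows = [
    {"g_accession_number": "0000012345-02-000001", "filer_cik": "111", "filer_count": "1of2"},
    {"g_accession_number": "0000012345-02-000001", "filer_cik": "222", "filer_count": "2of2"},
    {"g_accession_number": "0000012345-02-000001", "filer_cik": "111", "filer_count": "1of2"},
    {"g_accession_number": "0000012345-02-000001", "filer_cik": "222", "filer_count": "2of2"},
]
print(parse_filer_count("3of6"))        # (3, 6)
print(len(drop_repeated_filers(rows)))  # 2
```

Keeping the first occurrence preserves the original row order. Whether that choice matters depends on which copy of the filing a study wants to retain.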


The file is in csv format and contains more than 1.4 million records. It can be imported into Stata or most other software. Note that a single Excel worksheet is limited to 1,048,576 rows, which is fewer than the number of records here. The first record is a header with the variable names. The following list describes each variable in the order in which it appears in a record.

  1. Variables derived from the file name and accession number.
    1. filing_firm_cik - the CIK number of the firm filing the document.
    2. filing_date - the date when the document was filed (YYYYMMDD).
    3. filer_cik - the CIK number of the "filer" that the row of data reports on. (Not necessarily the same as filing_firm_cik, since multiple "filers" can appear in one filing.)
    4. filer_count - the sequence of filers reported within one filing (sequence # of total filers, e.g., 3of6 - see above).
    5. filing_agent_cik - the CIK number of the agent that submitted the document (the company itself or a third-party filing agent).
    6. filing_agent_tot - each accession number includes a running total of filings for a given agent. This total is usually reset to zero at the beginning of the year. In some cases, it can be useful in determining the sequence of filings.
  2. GENERAL - variables not included in a specific header sub-paragraph. These variable names are preceded with "g_".
    1. g_abs_asset_class_1 - field for asset-backed securities. This is only reported for ABS beginning in 2017 and has occurred up to two times in some filings. Thus, we have two of these variables.
    2. g_abs_asset_class_2 - the second occurrence of the field described under g_abs_asset_class_1 (item 2.1).
    3. g_acceptance-datetime - occurs primarily for larger firms before 2002, but is reported consistently from 200205 onward.
    4. g_accession_number - SEC-defined accession number (xxxxxxxxxx-xx-xxxxxx). The first 10 digits are the CIK of the entity submitting the filing, which may be the company itself or a third-party filing agent. The next two digits are the year. The final six digits are the sequential count of filings submitted from that CIK (usually, but not always, reset to zero at the start of each calendar year).
    5. g_confirming_copy - this field appears in only 1,020 filings and defines the submission as being a confirming copy to a paper submission.
    6. g_conformed_period_of_report - End date of reporting period of filing (optional).
    7. g_conformed_submission_type - form type. For this sample any 10-K/Q variant.
    8. g_date_as_of_change - date when the last posted acceptance occurred (optional).
    9. g_filed_as_of_date - EDGAR assigned official filing date, or post-acceptance new filing date (required).
    10. g_public_document_count - number of public documents in the submission.
  3. COMPANY DATA - sub-paragraph appearing in all filing headers. These variable names are preceded with "cd_".
    1. cd_company_conformed_name - company name (required). (Commas have been replaced with a space in names where appropriate.)
    2. cd_central_index_key - filer Central Index Key which is assigned by the SEC for this filer.
    3. cd_standard_industrial_classification - company's self-assigned SIC code, which is initially reviewed for approval by the SEC.
    4. cd_organization_name - this field was first reported in Q4 of 2023. The SEC defines it as: "The EDGAR CIK Owner Organization Name when it is associated with the CF branch."
    5. cd_irs_number - the company's IRS number. I assume this corresponds to their Employer Identification Number (EIN).
    6. cd_state_of_incorporation - state where the company was incorporated (optional, but generally reported).
    7. cd_fiscal_year_end - company fiscal year end (optional, but generally reported).
  4. FILING VALUES - sub-paragraph appearing in all filing headers. These variable names are preceded with "fv_".
    1. fv_form_type - for this sample 10-K/Q and variants of these forms (e.g., 10-K405).
    2. fv_sec_act - act under which filings are made. Almost always "1934 Act".
    3. fv_sec_file_number - SEC Conformed File Number.
    4. fv_film_number - also known as the Document Control Number (DCN); the microfilm number assigned to the submission.
  5. BUSINESS ADDRESS - sub-paragraph appearing in almost all filing headers. These variable names are preceded with "ba_".
    1. ba_street_1
    2. ba_street_2
    3. ba_city
    4. ba_state
    5. ba_zip
    6. ba_business_phone
  6. MAIL ADDRESS - sub-paragraph appearing in most filing headers. These variable names are preceded with "ma_".
    1. ma_street_1
    2. ma_street_2
    3. ma_city
    4. ma_state
    5. ma_zip
  7. FORMER COMPANY - the former companies reported for this "filer". The maximum in the data is five, thus there are five separate entries. The variable names are preceded by "fc_".
    1. fc_former_conformed_name_1
    2. fc_date_of_change_1
    3. fc_former_conformed_name_2
    4. fc_date_of_change_2
    5. fc_former_conformed_name_3
    6. fc_date_of_change_3
    7. fc_former_conformed_name_4
    8. fc_date_of_change_4
    9. fc_former_conformed_name_5
    10. fc_date_of_change_5
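Here is a minimal Python sketch for reading the file with the standard library. The header record supplies the variable names, blank cells become None (missing values are blank), and filing_date is converted from YYYYMMDD to a date. UTF-8 encoding is an assumption, not something documented above. The function names `read_headers`, `load_headers` and `parse_yyyymmdd` are hypothetical. Other date-like columns can be added to `DATE_COLUMNS` if they turn out to use the same YYYYMMDD format.

```python
import csv
import io
from datetime import date, datetime
from typing import Optional, TextIO

# Columns documented as YYYYMMDD; extend if other date columns use the same format.
DATE_COLUMNS = ("filing_date",)


def parse_yyyymmdd(value: Optional[str]) -> Optional[date]:
    """Convert 'YYYYMMDD' to a date; blank (missing) values become None."""
    if value is None or not value.strip():
        return None
    return datetime.strptime(value.strip(), "%Y%m%d").date()


def read_headers(handle: TextIO) -> list[dict]:
    """Read header records; the first line of the file supplies the variable names."""
    rows = []
    for row in csv.DictReader(handle):
        # Missing values are blank in the file, so map "" to None.
        clean = {key: (value if value != "" else None) for key, value in row.items()}
        for column in DATE_COLUMNS:
            if column in clean:
                clean[column] = parse_yyyymmdd(clean[column])
        rows.append(clean)
    return rows


def load_headers(path: str) -> list[dict]:
    # Assumption: the file is UTF-8; change the encoding if it fails to decode.
    with open(path, newline="", encoding="utf-8") as handle:
        return read_headers(handle)


# Tiny in-memory example standing in for the real file.
sample = io.StringIO("filing_firm_cik,filing_date,filer_cik\n1234,20020515,\n")
rows = read_headers(sample)
print(rows[0]["filing_date"], rows[0]["filer_cik"])  # 2002-05-15 None
```

CIK numbers are left as strings, so any leading zeros survive. With more than 1.4 million records, holding every row as a dict is memory-hungry, and processing rows one at a time may be preferable.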
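The accession number layout described under g_accession_number (10-digit submitter CIK, two-digit year, six-digit sequence) can be split mechanically. This sketch does that and uses the parts to order one submitter's filings, which is the use suggested for filing_agent_tot. The mapping of the submitter CIK and sequence onto filing_agent_cik and filing_agent_tot is presumed rather than confirmed here. The names `AccessionParts` and `parse_accession` and the sample accession numbers are hypothetical.

```python
from typing import NamedTuple


class AccessionParts(NamedTuple):
    submitter_cik: str  # first 10 digits: the company or a third-party filing agent
    year: int           # two-digit year, expanded to four digits
    sequence: int       # running count of submissions from that CIK


def parse_accession(accession: str) -> AccessionParts:
    """Split an accession number of the form xxxxxxxxxx-xx-xxxxxx."""
    parts = accession.strip().split("-")
    if [len(p) for p in parts] != [10, 2, 6] or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected accession number: {accession!r}")
    cik, yy, seq = parts
    # Assumption: EDGAR filings start in the 1990s, so 90-99 -> 19xx, otherwise 20xx.
    year = 1900 + int(yy) if int(yy) >= 90 else 2000 + int(yy)
    return AccessionParts(cik, year, int(seq))


accessions = ["0000012345-02-000010", "0000012345-02-000002", "0000012345-01-000950"]
# Sorting on the parsed parts orders one submitter's filings by year, then sequence.
for acc in sorted(accessions, key=parse_accession):
    print(acc)
# 0000012345-01-000950
# 0000012345-02-000002
# 0000012345-02-000010
```

Because the sequence is usually, but not always, reset each calendar year, this ordering is only reliable within a single submitter and year. Comparing `submitter_cik` with filing_firm_cik is one way to spot filings submitted by a third-party agent.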
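The FORMER COMPANY block is stored "wide", with up to five name/date pairs per row. For tracing name histories it is often easier to work with one record per former name. This sketch reshapes the fc_ columns that way and keeps the original slot number. The function name `former_names_long` and the sample values are made up. Dates are left as stored, since their format is not documented above.

```python
from typing import Optional

MAX_FORMER_NAMES = 5  # the data hold at most five former-company slots per filer


def former_names_long(rows: list[dict]) -> list[dict]:
    """Return one record per non-empty former-name slot across all rows."""
    out = []
    for row in rows:
        for i in range(1, MAX_FORMER_NAMES + 1):
            name: Optional[str] = row.get(f"fc_former_conformed_name_{i}")
            if not name:
                continue  # empty slot: blank in the CSV, or None after cleaning
            out.append({
                "filer_cik": row.get("filer_cik"),
                "slot": i,
                "former_name": name,
                "date_of_change": row.get(f"fc_date_of_change_{i}") or None,
            })
    return out


# Made-up row with one filled slot and one blank slot.
row = {
    "filer_cik": "111",
    "fc_former_conformed_name_1": "OLD NAME INC",
    "fc_date_of_change_1": "19990101",
    "fc_former_conformed_name_2": "",
    "fc_date_of_change_2": "",
}
print(former_names_long([row]))
# [{'filer_cik': '111', 'slot': 1, 'former_name': 'OLD NAME INC', 'date_of_change': '19990101'}]
```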