CRSP Flat Files

** Click here to download a zipfile containing all of the CRSP access programs. **

 

Recently CRSP began making their data files directly accessible in flat-file format in addition to their proprietary legacy CRSPAccess format. A general description of the new format is provided at this website. The new format allows users to more directly input, store, and manipulate the original data. I have created Python programs to access some of the more frequently used files. The software is provided "as is" without warranty of any kind.

For CRSP subscribers, the flat files are located in the \SI\sizYYYY12_ascii directory, where YYYY is the current year. (The CRSP TR folder provides supplementary flat-file data for items such as the CPI and treasuries.  At this point, I have not created any software for these files.)  

The programs that access the price/return history for all stocks have two entry points (in addition to daily/monthly versions). In one case, you provide the program a PERMNO and it returns a dictionary containing the history of data items. To use this method, you must first run the program LOCAL_TSIndexFiles to create a pointer file for random access.  You only need to create this file once (and recreate it each year when the data are updated). 

Alternatively, you can call a version of the access program that loads all of the data (all price data for all firms) into a dictionary keyed on PERMNO. Most machines should be able to do this with the monthly data without memory issues. Because of the memory requirements, loading all of the daily data will only work on machines with more than about 64 GB of RAM. (You could try modifying the program to keep only the data items of interest and so reduce this requirement.) Running the program to load all of the daily data in the debug mode of some development environments can also be problematic. Give it a try, but be aware of this limitation.

Click on the program names below to view a brief description.

  1. CRSP_MonthlyIndex_TimeSeries.py / CRSP_DailyIndex_TimeSeries.py
  2. CRSP_TimeSeries_Monthly.py / CRSP_TimeSeries_DailyPrimary.py
  3. LOCAL_TSIndexFiles.py
  4. CRSP_Delist_Monthly_Return.py / CRSP_Delist_Daily_Return.py
  5. CRSP_Distributions.py
  6. CRSP_IndexHeader.py
  7. CRSP_IndexMembership.py
  8. CRSP_Name_History.py
  9. CRSP_SecurityHeaderInformation.py
  10. CRSP_Shares_History.py
  11. LOCAL_Converter.py
  12. SAMPLE_Betas.py

 

  • CRSP_MonthlyIndex_TimeSeries.py / CRSP_DailyIndex_TimeSeries.py - programs to load the CRSP Monthly/Daily Index Time Series files (sfz_mind.dat and sfz_dind.dat) into a dictionary keyed on dates for a requested CRSP INDNO identifier. The functions have three arguments:
     
    • dailyindex_timeseries(path, target_index, missing_values=None)
       
    • monthlyindex_timeseries(path, target_index, missing_values=None)
       
    1. path - the file path pointing to the folder containing the CRSP flat files.
    2. target_index - the CRSP INDNO index identifier.  The CRSP INDNO's are documented here.
    3. missing_values - user-assigned value for missing data (numeric or None).

    The function returns a dictionary keyed on date. For the daily data the date is in YYYYMMDD format; for the monthly data the format is YYYYMM. Each date key in the dictionary is associated with a class object whose attributes represent the variables described in the CRSP documentation on p. 7 (here).

The programs return:

dind_ts or mind_ts - a dictionary of DITSClass or MITSClass objects (see code) keyed on date.
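
Because the keys are dates in YYYYMMDD or YYYYMM form, a date range can be selected by comparing keys directly. The following is a minimal sketch, assuming the keys are stored as integers (check the code, they may be strings) and using a stand-in dictionary of floats in place of the actual MITSClass objects. The helper names select_range and key_to_date are hypothetical and are not part of the CRSP access programs.

```python
from datetime import datetime

# Stand-in for the dictionary returned by monthlyindex_timeseries():
# YYYYMM integer keys (an assumption) mapped to placeholder values.
mind_ts = {199911: 0.031, 199912: 0.048, 200001: -0.020, 200002: 0.004}


def select_range(ts, begin, end):
    """Return the entries of ts whose date key lies in [begin, end], in date order."""
    return {key: ts[key] for key in sorted(ts) if begin <= key <= end}


def key_to_date(key):
    """Convert a YYYYMM or YYYYMMDD key to a datetime.date (monthly keys map to day 1)."""
    text = str(key)
    fmt = "%Y%m" if len(text) == 6 else "%Y%m%d"
    return datetime.strptime(text, fmt).date()


subset = select_range(mind_ts, 199912, 200001)
print(list(subset))
print(key_to_date(199912), key_to_date(19991231))
```

Integer comparison works here only because both formats put the most significant date component first.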

 

  • CRSP_TimeSeries_Monthly.py / CRSP_TimeSeries_DailyPrimary.py - programs to load the stock time-series files (sfz_mth.dat and sfz_dp_dly.dat). Each module has two functions.
  1. Load all data:

    timeseries_dailyprimary_all(path, logfile=None, missing_values=None, limit=None)

    timeseries_monthly_all(_path, logfile=None, missing_values=None, limit=None)

    timeseries_dailyprimary_all and timeseries_monthly_all load the full data set into a dictionary keyed on PERMNOs. For monthly data, this takes about 1 minute, requires 3 GB of RAM, and loads about 4.3 million records, each containing all of the data items for a given date and PERMNO. For daily data, this takes about 12 minutes, requires 40 GB of RAM, and loads about 90.5 million records. Each function has four arguments:

        path - the file path pointing to the folder containing the CRSP flat files.
        logfile - <optional> file object for creating a log file.
        missing_values - <optional> user-assigned missing value (numeric or None).
        limit - <optional> limits the number of PERMNO's loaded into the data dictionary.

    The programs return:
        ts_mthly/ts_dp - dictionaries of TSMthlyClass/TSDPClass objects (see code) keyed on CRSP PERMNO.
     
  2. Load data for a dictionary keyed on PERMNO, with a list of tuples containing begin/end dates:

    timeseries_daily_filtered(path, permno_filter_dictionary, _ptr_dictionary, missing_values=None)

    timeseries_monthly_filtered(path, permno_filter_dictionary, _ptr_dictionary, missing_values=None)

    These functions load data only for the requested PERMNOs.  Before running them you must run the load_ptr_dictionary function in LOCAL_TSIndexFiles.py to facilitate random access of the file. Each function has four arguments:

        path - the file path pointing to the folder containing the CRSP flat files.
        permno_filter_dictionary - user-supplied dictionary with PERMNO key and list of tuple begin/end dates. Note that the list of tuples allows the user to request more than one date interval for a given PERMNO.
        ptr_dictionary - dictionary of file pointers that must be loaded in the calling program (pointer file is created with LOCAL_TSIndexFiles.py).
        missing_values - <optional> - user-assigned value for missing data (numeric or None).

    The programs return:
          ts_mthly/ts_dp - dictionaries of TSMthlyClass/TSDPClass objects (see code) keyed on CRSP PERMNO and requested dates.
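
As an illustration of the permno_filter_dictionary argument, here is a sketch of one way to build the dictionary and test whether a date falls inside a requested interval. The PERMNOs and dates are illustrative, the dates are assumed to be YYYYMMDD integers with inclusive endpoints, and date_requested is a hypothetical helper rather than a function in the modules.

```python
# Illustrative request: two date intervals for one PERMNO and one for another.
# Dates are assumed to be YYYYMMDD integers (daily file convention).
permno_filter_dictionary = {
    12490: [(19900101, 19951231), (20000101, 20051231)],
    10107: [(19860313, 19901231)],
}


def date_requested(filter_dict, permno, date):
    """Return True if date falls inside any requested interval for permno.

    Endpoints are treated as inclusive, which is an assumption of this sketch.
    """
    return any(begin <= date <= end for begin, end in filter_dict.get(permno, []))


print(date_requested(permno_filter_dictionary, 12490, 19930615))
print(date_requested(permno_filter_dictionary, 12490, 19970615))
```

Using a list of tuples per PERMNO keeps each request compact while still allowing gaps between the intervals.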
     
  • LOCAL_TSIndexFiles.py  

    As noted above, one version of the stock time-series programs loads only the data for a requested PERMNO, which allows quick access to only those stocks you wish to analyze. In order to achieve efficient access, you must first create a file that contains pointers for each PERMNO record. (You only need to do this once with each new version of the CRSP data.) By modifying the inputs/outputs in the __main__ section of the program, you can create monthly and/or daily files.
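
The idea behind a pointer file can be sketched as follows: scan the flat file once, record the byte offset at which each PERMNO's records begin, and later seek directly to that offset. This is only an illustration of the technique, not the code in LOCAL_TSIndexFiles.py. The pipe delimiter, the three-field layout, the toy records, and the function names are all assumptions.

```python
import os
import tempfile

# Toy flat file: pipe-delimited records with PERMNO in the first field.
# The delimiter and field layout are assumptions, not the CRSP layout.
records = [
    "10001|19860131|0.0123\n",
    "10001|19860228|-0.0045\n",
    "12490|19860131|0.0310\n",
    "12490|19860228|0.0072\n",
]

with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as handle:
    handle.writelines(records)
    data_path = handle.name


def build_pointer_dictionary(path):
    """Map each PERMNO to the byte offset of its first record."""
    pointers = {}
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            permno = int(line.split(b"|", 1)[0])
            pointers.setdefault(permno, offset)  # keep the first offset only
    return pointers


def read_permno(path, pointers, permno):
    """Seek to the first record of permno and read until the PERMNO changes."""
    rows = []
    with open(path, "rb") as f:
        f.seek(pointers[permno])
        for line in f:
            fields = line.decode("ascii").strip().split("|")
            if int(fields[0]) != permno:
                break
            rows.append(fields)
    return rows


ptr_dictionary = build_pointer_dictionary(data_path)
rows_12490 = read_permno(data_path, ptr_dictionary, 12490)
os.remove(data_path)
print(ptr_dictionary)
print(rows_12490)
```

The sketch relies on all records for a PERMNO being stored contiguously, and on the file being opened in binary mode so that the offsets are exact byte positions.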

     
  • CRSP_Delist_Monthly_Return.py / CRSP_Delist_Daily_Return.py - programs to load the CRSP Delist Monthly/Daily Return Information files (sfz_mdel.dat and sfz_del.dat) into a dictionary keyed on PERMNO. The functions have two arguments:
  1. path - the file path pointing to the folder containing the CRSP flat files.
  2. missing_values - <optional> user-assigned missing value (numeric or None).

 

  • CRSP_Distributions.py - program to load the CRSP Distributions file (sfz_dis.dat) into a dictionary keyed on PERMNO's. Each PERMNO entry contains sub-levels of dictionaries keyed on distcd, exdt, and acperm. For example, to retrieve IBM's (PERMNO=12490) dividend that was declared on 19540211, with an ex-distribution date of 19540217, the dictionary lookup would be distributions[12490][1232][19540217][0].
    • load_distributions(path, missing_values=None):

  1. path -  the file path pointing to the folder containing the CRSP flat files.

  2. missing_values - <optional> - user-assigned value for missing data (numeric or None).

The program returns:

distributions{permno, {distcd, {exdt, {acperm, DISTClass}}}}

where DISTClass is a class object with attributes corresponding to the variables listed in the CRSP documentation for the Distributions file.
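
The nested layout can be illustrated with a small stand-in. In this sketch, DistRecord is a hypothetical substitute for DISTClass with made-up attribute names, add_distribution is a hypothetical helper, and the dividend amount is a placeholder rather than CRSP data. Only the four keys come from the IBM example above.

```python
class DistRecord:
    """Stand-in for DISTClass; the attribute names here are hypothetical."""

    def __init__(self, amount, declared):
        self.amount = amount
        self.declared = declared


def add_distribution(distributions, permno, distcd, exdt, acperm, record):
    """Insert record at distributions[permno][distcd][exdt][acperm]."""
    level = distributions.setdefault(permno, {})
    level = level.setdefault(distcd, {})
    level = level.setdefault(exdt, {})
    level[acperm] = record


distributions = {}
# Keys follow the IBM example in the text; the amount is a placeholder.
add_distribution(distributions, 12490, 1232, 19540217, 0, DistRecord(1.0, 19540211))

# Direct lookup with all four keys.
record = distributions[12490][1232][19540217][0]
print(record.declared)

# Walk every distribution stored for one PERMNO.
for distcd, by_exdt in distributions[12490].items():
    for exdt, by_acperm in by_exdt.items():
        for acperm, rec in by_acperm.items():
            print(distcd, exdt, acperm, rec.amount)
```

A lookup with a key that is absent at any level raises KeyError, so chained .get() calls are a safer choice when a PERMNO or distribution code may be missing.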

 

  • CRSP_IndexHeader.py - program to load CRSP Index Header data into a dictionary keyed on indno.
     
    • load_index_header(path, missing_values=None)

      • The function has two arguments:

        1. path -  the file path pointing to the folder containing the CRSP flat files.

        2. missing_values - <optional> - the value assigned to missing data (numeric or None).

          The program returns a dictionary keyed on indno with IDXHDRClass objects having attributes corresponding to the variables listed in the CRSP documentation for the Index Header file.

    • index_header_print(_index_hdr, target, logfile=None)

      • Utility routine to output header description from index_hdr dictionary for a given target. Output will print to console unless a logfile is specified.

  1. _index_hdr - the dictionary object from load_index_header.

  2. target - the number of the target index.
  3. logfile - <optional> the file where the output is written. Default is print to console.

     
  • CRSP_IndexMembership.py - Module with two access points. One is to load the index_membership data into a dictionary keyed on PERMNO and date, the second is to determine if a security is in the S&P500 on a target date.
     
    • Load index_membership into a dictionary keyed on PERMNO and date.

      index_membership(path, missing_values=None)

      1. path -  the file path pointing to the folder containing the CRSP flat files.

      2. missing_values - <optional> - the value assigned to missing data (numeric or None).

        The program returns a dictionary keyed on permno with IDXMBRClass objects having attributes corresponding to the variables listed in the CRSP documentation for the Index Membership file.

    • Boolean function that returns True if the targeted permno is in the S&P500 on the targeted date.

      index_membership_query(_index_mbr, _permno, target_date)

      1. index_mbr - the dictionary from the index_membership program.

      2. permno - targeted PERMNO

      3. target_date - targeted date (YYYYMMDD)

 

  • CRSP_Name_History.py - Module with two access points. One is to load the name_history data into a dictionary keyed on PERMNO and date and the second is to retrieve a specific name history (NMHISTClass) for a given PERMNO and target date.
     
    • load_name_history(path, logfile=None, missing_values=None)

      Loads name_history into a dictionary keyed on PERMNO and date. Arguments:

      1. path -  the file path pointing to the folder containing the CRSP flat files.

      2. logfile - a file object for writing internal counts to a logfile.

      3. missing_values - <optional> - the value assigned to missing data (numeric or None).

        The program returns a dictionary keyed on PERMNO with NMHISTClass objects having attributes corresponding to the variables listed in the CRSP documentation for the Name History file.

    • get_namehistory(_namehistory, _permno, _target_date)

      Function that returns a name history class object of a targeted PERMNO and date. Arguments:

      1. name_history - the dictionary from the load_name_history program.

      2. permno - targeted PERMNO

      3. target_date - targeted date (YYYYMMDD)
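
One common way to implement this kind of lookup is a binary search over the sorted effective dates, returning the latest record that starts on or before the target date. The sketch below assumes a simplified structure (PERMNO mapped to a dictionary of effective date to company name) with fictitious PERMNO, names, and dates. The real module stores NMHISTClass objects and may resolve dates differently, and name_on_date is a hypothetical helper.

```python
from bisect import bisect_right

# Hypothetical structure: PERMNO -> {effective_date: company name}.
# The PERMNO, names, and dates are fictitious.
name_history = {
    99999: {19800102: "OLD NAME CORP", 19950103: "NEW NAME INC"},
}


def name_on_date(history, permno, target_date):
    """Return the entry in effect on target_date, or None if none applies."""
    by_date = history.get(permno, {})
    dates = sorted(by_date)
    position = bisect_right(dates, target_date)
    if position == 0:
        return None  # unknown PERMNO, or target_date precedes the first record
    return by_date[dates[position - 1]]


print(name_on_date(name_history, 99999, 19900615))
print(name_on_date(name_history, 99999, 19950103))
```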
         

  • CRSP_SecurityHeaderInformation.py - Program to load security header information into a dictionary keyed on PERMNO.
    • load_security_header_information(path, missing_values=None)


       Arguments:

      1. path -  the file path pointing to the folder containing the CRSP flat files.

      2. missing_values - <optional> - the value assigned to missing data (numeric or None).

        The program returns a dictionary keyed on PERMNO with SECHDRINFOClass objects having attributes corresponding to the variables listed in the CRSP documentation for the Security Header Info file.

         
  • CRSP_Shares_History.py - Module with two access points. One to load the name_history data into a dictionary keyed on PERMNO and date and a second to retrieve a specific name history (NMHISTClass) for a given PERMNO and target date.
     
    • load_name_history(path, logfile=None, missing_values=None)

      Loads name_history into a dictionary keyed on PERMNO and date. Arguments:

      1. path -  the file path pointing to the folder containing the CRSP flat files.

      2. logfile - a file object for writing internal counts to a logfile.

      3. missing_values - <optional> - the value assigned to missing data (numeric or None).

        The program returns a dictionary keyed on PERMNO with NMHISTClass objects having attributes corresponding to the variables listed in the CRSP documentation for the Name History file.
         
  • LOCAL_Converter.py - Routine used by other programs to convert missing values based on variable type. Must be imported.
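
A type-based conversion of this kind might look like the sketch below. It assumes that a missing item appears as an empty or blank field in the flat file, which should be checked against the CRSP documentation. The function name convert_field and the kind labels are hypothetical and do not describe the actual interface of LOCAL_Converter.py.

```python
def convert_field(raw, kind, missing_values=None):
    """Convert a raw text field to int, float, or str.

    Blank fields are treated as missing (an assumption of this sketch) and
    are replaced by the user-assigned missing_values.
    """
    text = raw.strip()
    if text == "":
        return missing_values
    if kind == "int":
        return int(text)
    if kind == "float":
        return float(text)
    return text


print(convert_field("12490", "int"))
print(convert_field("   ", "float", missing_values=-99.0))
```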

     
  • SAMPLE_Betas.py - A sample program that calculates betas using monthly data for all firms with complete data for all five-year intervals.
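
For reference, the usual market-model beta is the covariance of the stock's returns with the market's returns divided by the variance of the market's returns. The sketch below computes that ratio for a single window. It is not taken from SAMPLE_Betas.py, whose estimation details are in its code, and the return series are made-up numbers. A five-year window of monthly data would contain 60 observations.

```python
def beta(stock_returns, market_returns):
    """Slope of stock returns on market returns: cov(r_i, r_m) / var(r_m)."""
    n = len(stock_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns have zero variance")
    return cov / var


# Made-up returns: the stock moves twice as much as the market plus a constant.
market = [0.01, -0.02, 0.03, 0.00, 0.015]
stock = [2 * m + 0.001 for m in market]
print(round(beta(stock, market), 6))
```

Requiring complete data for the window, as the sample program does, avoids having to decide how to treat missing months inside the sums.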