CRSP Flat Files

Download a zipfile containing all of the CRSP access programs

The CRSP 2024 data (released in January 2025) is the last release produced in the Flat File Format 1.0 (FIZ) and CRSP's proprietary legacy CRSPAccess format before the move to the new FF 2.0 (CIZ) format. Documentation for the changes is provided here. The new format also modifies the dividend reinvestment assumption, which slightly changes the historical return series (see Schwarz, Walter, and Weiss 2025). For CRSP subscribers, the flat files are located in the cisYYYYMM_ascii directory, where YYYY is the current data year.

I appreciate that CRSP is conveniently accessible on WRDS, but being able to directly input, store, and manipulate all of the returns within your own code has its advantages. I prefer not to have any intermediary between my code and the raw data. As a result, I have created programs to access some of the more frequently used files. The software is provided "as is" without warranty of any kind.

I provide Python modules to access the daily and monthly returns files, the index file, the distribution file, and the info_hist file, along with a sample application program. There are three modules for accessing the returns: one for daily returns, one for monthly returns, and a utility program that indexes the data files for random access. The daily and monthly return programs are separate for two reasons: (1) the variables in each file are not the same, and (2) loading all of the daily data into a dictionary requires a bit more brute force. Descriptions of each program are provided below.

  • CRSP_StkMthSecurityData_FF2.0.py / CRSP_StkDlySecurityData_FF2.0.py - programs to load the CRSP monthly and daily stock time-series files (StkMthSecurityData.dat and StkDlySecurityData.dat) into a dictionary keyed on PERMNO (str) and date (int), e.g., ts_mthly[PERMNO][YYYYMM] or ts_dly[PERMNO][YYYYMMDD]. Each module has two access points: one function loads all of the data, and the other loads only the data for a dictionary of PERMNOs and dates. Missing values are "" for strings, None for int, and NaN for float.
    1. Load all data:

      timeseries_daily_all(path, logfile=None, limit=None)

      timeseries_monthly_all(path, logfile=None, limit=None)

      timeseries_daily_all and timeseries_monthly_all load the full dataset into a dictionary keyed on PERMNO. For monthly data, this takes about 1 minute and 3 GB of RAM, and loads about 5.1 million records, each containing all of the data items for a given date and PERMNO. For daily data, loading everything on a machine with around 64 GB of RAM requires selecting a subset of the available variables, which you can see in the CRSPDataobj code. This takes somewhat more than 15 minutes and loads about 107.7 million records. The functions have three arguments:

      path - the file path pointing to the folder containing the CRSP flat files.
      logfile - <optional> file object for creating a log file.
      limit - <optional> limits the number of PERMNOs loaded into the data dictionary.

      The programs return:
          ts_mthly/ts_dly - dictionaries of data objects (see code) keyed on PERMNO and date.

    2. Load data for a dictionary keyed on PERMNO, with a list of tuples containing begin/end dates:

      timeseries_daily_filtered(path, permno_filter_dictionary, _ptr_dictionary)

      timeseries_monthly_filtered(path, permno_filter_dictionary, _ptr_dictionary)

      These functions load only the data for a given set of PERMNOs. Before running them, you must run the load_ptr_dictionary function in LOCAL_TSIndexFiles.py to facilitate random access of the file. The filtered returns functions have three arguments:

      path - the file path pointing to the folder containing the CRSP flat files.
      permno_filter_dictionary - user-supplied dictionary keyed on PERMNO, where each value is a list of (begin, end) date tuples. The list of tuples allows the user to request more than one date interval for a given PERMNO. See the example in _test_module().
      _ptr_dictionary - dictionary of file pointers that must be loaded in the calling program (the pointer file is created with LOCAL_TSIndexFiles.py).

      The programs return:
          ts_mthly/ts_dly - dictionaries of data objects (see code) keyed on PERMNO and requested dates.
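
As a rough illustration of the filter argument described above, the sketch below builds a permno_filter_dictionary and checks whether a date falls inside any requested interval. The helper date_requested() and the date ranges are hypothetical and not part of the modules. It assumes string PERMNO keys, YYYYMMDD integer dates (YYYYMM would be used for monthly data), and inclusive interval endpoints; the actual programs may treat endpoints differently.

```python
# Hypothetical filter: PERMNO (str) -> list of (begin, end) date tuples (YYYYMMDD ints).
permno_filter_dictionary = {
    "12490": [(19900101, 19941231), (20100101, 20141231)],  # two intervals for one PERMNO
    "10107": [(20000101, 20001231)],
}


def date_requested(permno, date, filter_dictionary):
    """Return True if date lies in any interval requested for permno.

    Assumes inclusive begin/end dates; this is an illustration only.
    """
    intervals = filter_dictionary.get(permno, [])
    return any(begin <= date <= end for begin, end in intervals)


print(date_requested("12490", 19920615, permno_filter_dictionary))  # inside the first interval
print(date_requested("12490", 20000103, permno_filter_dictionary))  # between the two intervals
```

A dictionary built this way is what would be passed as the second argument of timeseries_daily_filtered() or timeseries_monthly_filtered().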
       
  • LOCAL_TSIndexFiles.py  

    As noted above, one version of the stock time-series programs loads only the data for requested PERMNOs, which allows quick access to only those stocks you wish to analyze. To achieve efficient access, you must first create a file that contains pointers for each PERMNO record. (You only need to do this once with each new version of the CRSP data.) By modifying the inputs/outputs in the __main__ section of the program, you can create monthly and/or daily index files. The function create_index_file() has two arguments: in_crsp, the path to the CRSP stock time-series file (either daily or monthly, i.e., StkXXXSecurityData.dat), and the path/file for the output index file. Run the program separately for the daily and monthly data. Once the index file has been created, it is loaded into the programs above using the load_ptr_dictionary() access point, which requires the path to the index file created for the daily or monthly data you're accessing.
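
The general idea of a pointer file can be sketched as follows. This is not the code in LOCAL_TSIndexFiles.py. The functions build_offset_index() and read_permno(), the toy column names, and the file layout (a header line, pipe-delimited fields, PERMNO in the first field, each PERMNO's records stored contiguously) are all assumptions made for illustration.

```python
import os
import tempfile


def build_offset_index(data_path, delimiter="|"):
    """Map each PERMNO to the byte offset of its first record.

    Assumes a header line, PERMNO as the first delimited field, and
    contiguous records per PERMNO (assumptions about the file layout).
    """
    offsets = {}
    with open(data_path, "rb") as f:
        f.readline()  # skip the assumed header line
        while True:
            position = f.tell()  # byte offset of the line about to be read
            line = f.readline()
            if not line:
                break
            permno = line.split(delimiter.encode(), 1)[0].decode()
            offsets.setdefault(permno, position)  # keep only the first offset
    return offsets


def read_permno(data_path, offsets, permno, delimiter="|"):
    """Seek to a PERMNO's first record and read until the PERMNO changes."""
    rows = []
    with open(data_path, "rb") as f:
        f.seek(offsets[permno])
        for line in f:
            fields = line.decode().rstrip("\r\n").split(delimiter)
            if fields[0] != permno:
                break
            rows.append(fields)
    return rows


# Toy file with made-up column names and values.
sample = "PERMNO|DATE|RET\n10001|202401|0.01\n10001|202402|-0.02\n10002|202401|0.03\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.dat")
    with open(path, "w", newline="") as f:
        f.write(sample)
    offsets = build_offset_index(path)
    rows_10002 = read_permno(path, offsets, "10002")

print(rows_10002)
```

Because the offsets depend on the exact bytes of the data file, an index of this kind has to be rebuilt whenever the underlying file changes, which is consistent with regenerating it for each new CRSP release.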

     
  • CRSP_Index_TimeSeries_FF2.0.py - program to load the CRSP data for a specified index contained in the daily or monthly index file (IndDlySeriesData.dat or IndMthSeriesData.dat). The program is accessed using:

    • index_timeseries(path, target_index,  series_type=X) 

      path: Directory containing CRSP data files
      target_index: INDNO to retrieve (must be in the valid range); INDNOs are listed in IndSeriesInfoHdr.dat
      series_type: Either "Monthly" or "Daily"
       
      The program returns three items: 
      (1) data objects for the requested index stored in a dictionary keyed on date (YYYYMMDD or YYYYMM),
      (2) the beginning year of the data, and
      (3) the ending year of the data
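
To make the shape of the returned items concrete, here is a small sketch of how a beginning and ending year can be derived from a dictionary keyed on YYYYMM or YYYYMMDD integers. The function year_range() and the toy values are hypothetical; index_timeseries() returns the years itself, so this only illustrates the date-key convention.

```python
def year_range(series):
    """Return (begin_year, end_year) for a dict keyed on YYYYMM or YYYYMMDD ints."""

    def year_of(date_key):
        # 8-digit keys are YYYYMMDD, 6-digit keys are YYYYMM.
        return date_key // 10000 if date_key >= 10_000_000 else date_key // 100

    keys = sorted(series)
    return year_of(keys[0]), year_of(keys[-1])


# Toy series with made-up values standing in for the index data objects.
toy_monthly = {192601: 0.01, 192602: -0.02, 202412: 0.03}
toy_daily = {19260102: 0.001, 20241231: -0.002}

print(year_range(toy_monthly))
print(year_range(toy_daily))
```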
       
  • CRSP_Distributions.py - program to load the CRSP Distributions file (StkDistributions.dat). The program loads the data into a dictionary keyed on PERMNO. Each PERMNO entry contains sub-levels of dictionaries keyed on distype, disexdt, and disseqnbr. For example, to retrieve IBM's (PERMNO=12490) dividend that was declared on 19540211, with an ex-distribution date of 19540217, the dictionary lookup would be distributions[12490]["CDIV"][19540217][1].
    • load_distributions(path)

      path - the file path pointing to the folder containing the CRSP flat files.

      The program returns:
          distributions{permno, {distype, {disexdt, {disseqnbr, DISTClass}}}}

      where DISTClass is a class object with attributes corresponding to the variables listed in the CRSP documentation for the Distributions file.
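
The four-level nesting described above can be sketched with a stand-in class. ToyDistribution, its field names, the nest() helper, the second ex-distribution date, and the dividend amounts are all made up for illustration; the real DISTClass carries the variables from the CRSP Distributions file.

```python
from dataclasses import dataclass


@dataclass
class ToyDistribution:
    """Hypothetical stand-in for DISTClass; field names are illustrative only."""
    permno: int
    distype: str
    disexdt: int
    disseqnbr: int
    amount: float


def nest(records):
    """Build distributions[permno][distype][disexdt][disseqnbr] -> record."""
    distributions = {}
    for rec in records:
        by_exdt = (
            distributions.setdefault(rec.permno, {})
            .setdefault(rec.distype, {})
            .setdefault(rec.disexdt, {})
        )
        by_exdt[rec.disseqnbr] = rec
    return distributions


records = [
    ToyDistribution(12490, "CDIV", 19540217, 1, 1.0),  # amount is a made-up value
    ToyDistribution(12490, "CDIV", 19540518, 1, 1.0),  # made-up date and amount
]
distributions = nest(records)

# Same lookup pattern as in the text.
print(distributions[12490]["CDIV"][19540217][1])

# All ex-distribution dates of one type for a PERMNO.
print(sorted(distributions[12490]["CDIV"]))
```

Keying the innermost level on the sequence number keeps multiple distributions of the same type on the same ex-date from overwriting one another.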

  • CRSP_StkInfoHist_FF2.py - two functions for use with the StkSecurityInfoHist.dat file. The first, load_info_history(), loads all of the data into dataclass objects keyed on PERMNO and begdate for the information series. Once the data is loaded, you can retrieve the info_history data for a specific PERMNO and target date using the second function, get_infohistory(). See the program for details.
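
One plausible way to perform a lookup by target date, given records keyed on begdate, is an "as of" search: take the record with the latest begdate on or before the target. The sketch below is an assumption about the general technique, not the logic inside get_infohistory(). ToyInfo, as_of(), and the sample tickers are hypothetical, and the sketch ignores any end date the real records may carry.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class ToyInfo:
    """Hypothetical record; the real dataclass holds the CRSP info-history variables."""
    begdate: int
    ticker: str


def as_of(history, target_date):
    """Return the record with the latest begdate <= target_date, else None.

    history: dict keyed on begdate (YYYYMMDD int) for a single PERMNO.
    """
    begdates = sorted(history)
    i = bisect_right(begdates, target_date)  # number of begdates <= target_date
    return history[begdates[i - 1]] if i else None


toy_history = {
    19900101: ToyInfo(19900101, "AAA"),
    20050601: ToyInfo(20050601, "BBB"),
}

print(as_of(toy_history, 20001231))  # record that began on 19900101
print(as_of(toy_history, 19800101))  # None: before the first begdate
```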

  • SAMPLE_Betas.py - sample program (in the UtilityPrograms subfolder) that demonstrates how these programs can be used to calculate betas for all stocks over five-year intervals.
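
For readers unfamiliar with the calculation, a market-model beta is the slope from regressing a stock's returns on the market's returns, i.e., cov(stock, market) / var(market). The sketch below shows that formula on made-up returns. It is not taken from SAMPLE_Betas.py, which may use a different estimator, return definition, or windowing; the function beta() here is hypothetical.

```python
def beta(stock_returns, market_returns):
    """OLS slope of stock returns on market returns: cov(r_i, r_m) / var(r_m)."""
    n = len(stock_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two equal-length series with at least 2 observations")
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    # The 1/(n-1) factors cancel in the ratio, so raw sums are enough.
    cov = sum((s - mean_s) * (m - mean_m) for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    return cov / var


# Made-up returns: the stock moves exactly twice as much as the market.
market = [0.01, -0.02, 0.03, 0.00]
stock = [2 * m for m in market]

print(beta(stock, market))  # close to 2 by construction
```

In practice the two series would first be aligned on date and any missing returns (NaN) dropped before the calculation.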