Loughran-McDonald Master Dictionary w/ Sentiment Word Lists

Master Dictionary

Updated: February 2024

CSV Format: Loughran-McDonald_MasterDictionary_1993-2023.csv

XLSX Format: Loughran-McDonald_MasterDictionary_1993-2023.xlsx

The dictionary/sentiment lists are free for use in academic research. For commercial licenses, please contact us at loughranmcdonald@gmail.com.

 

  • The base dictionary is derived from release 4.0 of 2of12inf. This is a fairly common baseline dictionary and is oriented towards common words. The 2of12inf dictionary contains word inflections but does not contain abbreviations, acronyms, British English, hyphenated words, names, or phrases.
     
  • We extend the 2of12inf baseline dictionary to include words that appear in 10-K documents and earnings calls but are not in the original 2of12inf word list. To find them, we examine tokens from all 10-K-type filings in the full EDGAR 10-K archive and from earnings calls obtained from CapIQ. A word is added if it is either an inflection of a more commonly appearing word or appears in more than a trivial number of documents.
     
  • For each word, the dictionary reports counts, proportion of total, average proportion per document, standard deviation of proportion per document, document count (i.e., the number of documents containing at least one occurrence of the word), seven sentiment category identifiers, complexity, number of syllables, and source (either 2of12inf or the year in which the word was added).
     
  • The sentiment categories are: negative, positive, uncertainty, litigious, strong modal, weak modal, and constraining. The sentiment words are flagged with a number indicating the year in which they were added to the list. Note: A year preceded by a negative sign indicates the year/version when the word was removed from the sentiment category.  

  • The Complexity column flags the words included in the lexicon developed in our 2024 JFQA paper on measuring firm complexity.
     
  • Although the dictionary does not, in general, include abbreviations, in the post-2018 versions we have added a limited number of abbreviations that commonly occur in periodic filings.
     
  • Detailed documentation appears here.
     
  • A Python module that will load the dictionary and its components (and optionally separate sentiment dictionaries) is here.
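The per-word statistics listed above can be illustrated with a short sketch. This is not the authors' code or their Python module; it assumes the documents are already tokenized, that the average and standard deviation of per-document proportions are taken over all documents (including those where the word does not appear), and that the standard deviation is the population version. The names `word_statistics` and `WordStats` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class WordStats:
    count: int             # total occurrences across the corpus
    proportion: float      # count / total tokens in the corpus
    avg_proportion: float  # mean per-document proportion (assumed: over all docs)
    std_proportion: float  # population std dev of per-document proportions (assumed)
    doc_count: int         # number of documents containing the word at least once


def word_statistics(docs: list[list[str]]) -> dict[str, WordStats]:
    """Compute dictionary-style statistics from pre-tokenized documents."""
    doc_counters = [Counter(doc) for doc in docs]
    doc_lengths = [len(doc) for doc in docs]
    total_tokens = sum(doc_lengths)

    # Corpus-wide occurrence counts for every token.
    corpus_counts = Counter()
    for c in doc_counters:
        corpus_counts.update(c)

    stats = {}
    for word, count in corpus_counts.items():
        # Share of each document's tokens that are this word; empty docs give 0.0.
        props = [c[word] / n if n else 0.0 for c, n in zip(doc_counters, doc_lengths)]
        stats[word] = WordStats(
            count=count,
            proportion=count / total_tokens,
            avg_proportion=mean(props),
            std_proportion=pstdev(props),
            doc_count=sum(1 for c in doc_counters if word in c),
        )
    return stats
```

For example, with the toy input `[["loss", "gain", "loss"], ["gain"]]`, "loss" has a count of 2, a proportion of total of 0.5 (2 of 4 tokens), and a document count of 1, while "gain" appears in both documents. Averaging over all documents rather than only those containing the word is a design choice; the published columns may be defined differently.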

 

Sentiment and Complexity Word Lists

As noted above, the Master Dictionary tabulates all of the sentiment and complexity word lists.
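Pulling a single word list out of the Master Dictionary mostly comes down to reading the category flags: a positive year means the word was added to the category in that year, and a negative year marks the version in which it was removed. The sketch below is a minimal illustration under stated assumptions, not a replacement for the authors' Python module. The column names (`Word`, `Negative`, `Positive`, `Uncertainty`, `Litigious`, `Strong_Modal`, `Weak_Modal`, `Constraining`, `Complexity`) are assumptions to check against the header row of the downloaded file; a flag of 0 or blank is assumed to mean the word was never in the category; the Complexity column is assumed to follow the same convention; and `in_category` and `load_word_lists` are hypothetical names.

```python
import csv
import io
from collections.abc import Iterable

# Assumed column names; verify against the header row of the file you download.
CATEGORIES = ["Negative", "Positive", "Uncertainty", "Litigious",
              "Strong_Modal", "Weak_Modal", "Constraining", "Complexity"]


def in_category(flag: str) -> bool:
    """Return True if a category flag marks the word as currently in the list.

    Positive year: added in that year. Negative year: removed in that
    version. 0 or blank: assumed never in the list.
    """
    flag = flag.strip()
    if not flag:
        return False
    # float() first so values stored as "2009.0" are also accepted.
    return int(float(flag)) > 0


def load_word_lists(csv_file: Iterable[str]) -> dict[str, set[str]]:
    """Build one set of (uppercased) words per category from a dictionary CSV."""
    lists = {cat: set() for cat in CATEGORIES}
    for row in csv.DictReader(csv_file):
        word = row["Word"].upper()
        for cat in CATEGORIES:
            # Missing columns or short rows are treated as "not in category".
            if in_category(row.get(cat) or ""):
                lists[cat].add(word)
    return lists


if __name__ == "__main__":
    # Invented rows for illustration only; not taken from the real dictionary.
    sample = io.StringIO(
        "Word,Negative,Positive\n"
        "word_a,2009,0\n"
        "word_b,-2020,0\n"
        "word_c,0,2011\n"
    )
    lists = load_word_lists(sample)
    print(sorted(lists["Negative"]), sorted(lists["Positive"]))
```

To read the actual file, open it with `open("Loughran-McDonald_MasterDictionary_1993-2023.csv", newline="", encoding="utf-8")` and pass the file object to `load_word_lists`; the UTF-8 encoding is an assumption. Treating a negative year as "not in the list" gives the current version of each category; a historical version would instead require comparing the absolute year against the version date.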

The word lists are described in:

  • Tim Loughran and Bill McDonald, 2011, When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks, Journal of Finance, 66:1, 35-65. (Available at SSRN: http://ssrn.com/abstract=1331573.)

  • Andriy Bodnaruk, Tim Loughran and Bill McDonald, 2015, Using 10-K Text to Gauge Financial Constraints, Journal of Financial and Quantitative Analysis, 50:4, 1-24. (Available at SSRN: http://ssrn.com/abstract=2331544.)

  • Tim Loughran and Bill McDonald, 2016, Textual Analysis in Accounting and Finance: A Survey, Journal of Accounting Research, 54:4, 1187-1230. (Available at SSRN: http://ssrn.com/abstract=2504147.)

  • Tim Loughran and Bill McDonald, 2024, Measuring Firm Complexity, Journal of Financial and Quantitative Analysis, forthcoming. (Available at SSRN: https://ssrn.com/abstract_id=3645372.)

 

We thank Cam Harvey and others who have suggested some of the modifications and updates we’ve included in these lists.

For WordStat users: WordStat .cat and .NFO files (2018 version)