Loughran-McDonald Master Dictionary w/ Sentiment Word Lists
Updated: January 2022
CSV Format: Loughran-McDonald_MasterDictionary_1993-2021.csv
XLSX Format: Loughran-McDonald_MasterDictionary_1993-2021.xlsx
The dictionary/sentiment lists are free for use in academic research. For commercial licenses, please contact us at email@example.com.
- The base dictionary is derived from release 4.0 of 2of12inf. This is a fairly common baseline dictionary and is oriented towards common words. The 2of12inf dictionary contains word inflections but does not contain abbreviations, acronyms, British English, hyphenated words, names, or phrases.
- We extend the 2of12inf baseline dictionary with words that appear in 10-K documents and earnings calls but are not found in the original 2of12inf word list. To identify them, we examine tokens from all 10-K type filings in the full EDGAR 10-K archive and from earnings calls from CapIQ. A word is added if it is either an inflection of a more commonly appearing word or appears in more than a trivial number of the documents.
- For each word, the dictionary reports counts, proportion of total, average proportion per document, standard deviation of proportion per document, document count (i.e., number of documents containing at least one occurrence of the word), seven sentiment category identifiers, number of syllables, and source (source is either 2of12inf or the year in which the word was added).
- The sentiment categories are: negative, positive, uncertainty, litigious, strong modal, weak modal, and constraining. The sentiment words are flagged with a number indicating the year in which they were added to the list. Note: A year preceded by a negative sign indicates the year/version when the word was removed from the sentiment category.
- Although the dictionary does not, in general, include abbreviations, in the post-2018 versions we have added a limited number of abbreviations that commonly occur in the periodic filings.
- Detailed documentation appears here.
- A Python module that will load the dictionary and its components (and optionally separate sentiment dictionaries) is here.
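The flag convention described above (a year marks when a word was added to a category, a negative year marks when it was removed) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' loader module. It assumes the CSV has a `Word` column and one column per sentiment category with the names shown, and that a value of 0 means the word was never in the category; check both assumptions against the header and contents of the actual file. The sample rows are made up and are not copied from the dictionary.

```python
import csv
import io

# Assumed column names for the seven sentiment categories.
# Verify these against the header row of the real CSV file.
SENTIMENT_COLUMNS = [
    "Negative", "Positive", "Uncertainty", "Litigious",
    "Strong_Modal", "Weak_Modal", "Constraining",
]


def decode_flag(value):
    """Interpret one sentiment flag.

    positive year -> word was added to the category in that year
    negative year -> word was removed from the category in that year
    0 (assumed)   -> word was never in the category
    """
    year = int(value)
    if year > 0:
        return ("added", year)
    if year < 0:
        return ("removed", -year)
    return ("absent", None)


def read_flags(handle):
    """Map each word to its decoded flags, skipping categories it never belonged to."""
    result = {}
    for row in csv.DictReader(handle):
        flags = {}
        for column in SENTIMENT_COLUMNS:
            status = decode_flag(row[column])
            if status[0] != "absent":
                flags[column] = status
        result[row["Word"]] = flags
    return result


# Made-up rows for illustration only; not real dictionary entries.
SAMPLE = """Word,Negative,Positive,Uncertainty,Litigious,Strong_Modal,Weak_Modal,Constraining
ALPHA,2009,0,0,0,0,0,0
BETA,0,-2020,0,0,0,0,0
GAMMA,0,0,0,0,0,0,0
"""

flags = read_flags(io.StringIO(SAMPLE))
for word, decoded in flags.items():
    print(word, decoded)
```

To try this on the real file, pass `open("Loughran-McDonald_MasterDictionary_1993-2021.csv", newline="")` to `read_flags` in place of the in-memory sample, after confirming the column names.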
Sentiment Word Lists
As noted above, the Master Dictionary also tabulates all of the sentiment word lists.
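Because the sentiment lists are tabulated inside the Master Dictionary, a separate word list per category can be derived from its rows. The sketch below is one possible way to do that, under the assumption that each row is a mapping with a `Word` key and one key per category, and that a word currently belongs to a category only when its flag is a positive year. The function name `current_word_lists`, the category names, and the example rows are hypothetical.

```python
from collections import defaultdict


def current_word_lists(rows, categories):
    """Group words by sentiment category, keeping only current members.

    A word counts as a current member when its flag is a positive year;
    zero (never listed, assumed) and negative years (removed) are skipped.
    """
    lists = defaultdict(set)
    for row in rows:
        for category in categories:
            if int(row[category]) > 0:
                lists[category].add(row["Word"])
    return dict(lists)


# Hypothetical rows using two of the seven categories, for illustration only.
rows = [
    {"Word": "ALPHA", "Negative": "2009", "Uncertainty": "0"},
    {"Word": "BETA", "Negative": "-2014", "Uncertainty": "2011"},
    {"Word": "GAMMA", "Negative": "2012", "Uncertainty": "2012"},
]

word_lists = current_word_lists(rows, ["Negative", "Uncertainty"])
for category in sorted(word_lists):
    print(category, sorted(word_lists[category]))
```

Sets are used so that membership checks against a tokenized document stay fast; a word flagged in several categories simply appears in each corresponding set.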
The word lists are described in:
Tim Loughran and Bill McDonald, 2011, When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks, Journal of Finance, 66:1, 35-65. (Available at SSRN: http://ssrn.com/abstract=1331573.)
Andriy Bodnaruk, Tim Loughran and Bill McDonald, 2015, Using 10-K Text to Gauge Financial Constraints, Journal of Financial and Quantitative Analysis, 50:4, 1-24. (Available at SSRN: http://ssrn.com/abstract=2331544.)
Tim Loughran and Bill McDonald, 2016, Textual Analysis in Accounting and Finance: A Survey, Journal of Accounting Research, 54:4, 1187-1230. (Available at SSRN: http://ssrn.com/abstract=2504147.)
We thank Cam Harvey and others who have suggested some of the modifications and updates we’ve included in these lists.
For WordStat users: WordStat .cat and .NFO files (2018 version)