Computerized Language Measures

This page describes the computerized language measures available for use with the Discourse Attributes Analysis Program (DAAP). They are of two main types: measures based on weighting and smoothing, and unweighted measures.

Bernard Maskit, Wilma Bucci and Sean Murphy

A major limitation in conducting natural language and psychotherapy process research can be the labor-intensive nature of transcribing large amounts of material and having raters judge it. To address this issue, computerized language measures that operationalize the referential process have been developed. Maskit's Discourse Attributes Analysis Program (DAAP) (2012) is a computerized text analysis program designed to compute scores for measures of the referential process, as well as other language measures.

Like other text analysis programs, DAAP computes a score by matching words in a text against a dictionary and dividing the number of matches by the total number of words, giving the proportion of the text's language that falls within the category the dictionary represents. In addition, DAAP has several unique features that are of particular value to the study of psychotherapy process, including its ability to use weighted dictionaries and to smooth language use data over the course of a transcript.
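The basic proportion score described above can be sketched in a few lines of Python. This is a minimal illustration, not DAAP's code: the tokenizer (lowercase letters and apostrophes) is an assumption, since DAAP's transcript preparation and tokenization rules are not described here, and the word set contains only the three negation examples given on this page, not the full NEG dictionary.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def proportion_score(text: str, dictionary: set[str]) -> float:
    """Number of dictionary matches divided by the total number of tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # avoid dividing by zero on an empty text
    matches = sum(1 for token in tokens if token in dictionary)
    return matches / len(tokens)


# Illustrative subset only: the negation examples from this page,
# not the full DAAP NEG dictionary.
NEG_EXAMPLES = {"no", "not", "never"}

print(proportion_score("I never said that. No, I did not.", NEG_EXAMPLES))  # 0.375
```

Here 3 of the 8 tokens match, giving 0.375.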

Weighted dictionaries allow a weight to be assigned to each word in a dictionary. This feature allows more sophisticated models of language use to be developed than could be obtained by creating dictionaries based on categories of language (e.g., positive or negative emotion words) alone. By weighting words, models can be constructed that use high-frequency words, including function words, to predict a psychological variable. A major value of this approach is that relatively few function word types account for the vast majority of words used in English and several other languages. The use of such words is implicit in communication and difficult to control consciously. By capitalizing on these features of function words, weighted dictionaries can cover a wider range of language use, thus improving the reliability of measures based on these word types. These measures are also better able to represent language style, as opposed to content, which may be of greater utility when one is trying to measure a psychological process. The Weighted Referential Activity Dictionary (WRAD) (Bucci & Maskit, 2006) (described further below) is, to our knowledge, the only such dictionary that has yet been constructed and validated; however, other weighted measures could be created, and early tests of weighted measures of arousal and valence have been conducted by Murphy (2012).

DAAP also provides the ability to smooth the data from each dictionary over the words of a text or over time. When graphed, the smoothed data give a quick overview of how language is being used over the course of a psychotherapy session and can help to point out key moments of interest in the psychotherapy process. This feature also allows language measures to be compared with one another, or with other measures such as vocal features, as they unfold over the course of a session. Measures such as Mean High WRAD (MHW) and High WRAD Proportion (HWP) (described further below) depend on these smoothed data to compute their scores. Covariation measures, which show the degree to which pairs of language measures move together, also depend on smoothing. A positive covariation shows that two measures move in the same direction most of the time, whereas a negative covariation shows that they move in opposite directions most of the time.
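To make the idea of smoothing concrete, here is a hedged sketch in Python. It assumes a plain centered moving average over a word-by-word series (1 where a word matches a dictionary, 0 otherwise); DAAP's actual smoothing function and window width are not given on this page and may differ, for example by using a weighted kernel instead of equal weights. The `half_width` parameter is hypothetical.

```python
def smooth(values: list[float], half_width: int) -> list[float]:
    """Centered moving average of a word-by-word series.

    Each position is replaced by the mean of itself and up to `half_width`
    neighbours on each side; the window is truncated at the ends of the text.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        window = values[max(0, i - half_width): min(n, i + half_width + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Word-by-word indicator series for one dictionary: 1 if the word matched, else 0.
hits = [0, 1, 0, 0, 1, 1, 0, 0]
print([round(v, 2) for v in smooth(hits, half_width=1)])
# [0.5, 0.33, 0.33, 0.33, 0.67, 0.67, 0.33, 0.0]
```

The tiny window is only for readability; a realistic window would span many words so that the curve reflects stretches of talk rather than single words.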

A detailed discussion of the DAAP program, together with its technical documentation, may be found at the link below:

Measures Based on Weighting and Smoothing

The Weighted Referential Activity Dictionary (WRAD): The WRAD is a weighted computerized dictionary based on an empirically derived model of judges’ ratings of referential activity (RA) (Bucci & Maskit, 2006). RA is a construct used to identify moments in language when the speaker is immersed in narrative. High WRAD language marks the Symbolizing Phase of the referential process. The dictionary includes 697 words that account for 85% of the word tokens in the corpus used to construct the measure. Each of these 697 words has a weight, and the weights yield scores between 0 and 1, with a neutral value of .5. The mean of the WRAD weights in a sequence of text predicts how a judge would score the overall referential activity of that segment. (Detailed information about the creation of WRAD may be found at this link.)
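The scoring rule above (the mean of per-word weights on a 0 to 1 scale with a neutral value of .5) can be sketched as follows. The weights are hypothetical values invented for the example, not actual WRAD weights. It is also an assumption that words absent from the dictionary contribute the neutral value .5; DAAP may handle them differently. The `coverage` helper shows how a figure such as "85% of the word tokens" would be measured.

```python
def wrad_score(tokens: list[str], weights: dict[str, float],
               neutral: float = 0.5) -> float:
    """Mean WRAD-style weight over all tokens.

    ASSUMPTION: tokens not in the dictionary contribute the neutral value,
    so unmatched words pull the score toward .5 rather than being dropped.
    """
    if not tokens:
        return neutral
    return sum(weights.get(t, neutral) for t in tokens) / len(tokens)


def coverage(tokens: list[str], weights: dict[str, float]) -> float:
    """Fraction of tokens that are dictionary words."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in weights) / len(tokens)


# HYPOTHETICAL weights invented for this example; they are NOT real WRAD values.
TOY_WEIGHTS = {"she": 0.8, "the": 0.7, "door": 0.9, "think": 0.2, "maybe": 0.1}

tokens = "she opened the door".split()
print(round(wrad_score(tokens, TOY_WEIGHTS), 3))  # 0.725
print(coverage(tokens, TOY_WEIGHTS))              # 0.75
```

With these toy weights, a concrete narrative fragment scores above .5, while a fragment such as "maybe i think" scores below it.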

Mean High WRAD (MHW): The extent to which the smoothed WRAD data are above their neutral midpoint of .5; that is, a measure of how high WRAD gets, on average, when it is high. As a metaphor, consider a person’s speed in miles per hour. WRAD would be analogous to the person’s average speed over a given time period, however they move: running, walking, crawling, etc. MHW would be how fast they move, on average, when they are running. Higher MHW scores are interpreted as indicating deeper immersion of the speaker in a narrative.
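One reading of this definition is the average amount by which the smoothed WRAD curve exceeds .5, counted only over the words where it is above .5 (the "running" stretches in the speed metaphor). The sketch below implements that reading. It is an assumption for illustration; DAAP's exact formula may differ.

```python
def mean_high_wrad(smoothed: list[float], neutral: float = 0.5) -> float:
    """Average excess of the smoothed WRAD curve over the neutral midpoint,
    taken only over the words where the curve is above the midpoint.

    ASSUMPTION: this is one reading of the definition, not DAAP's own code.
    Returns 0.0 if the curve never rises above the midpoint.
    """
    excesses = [v - neutral for v in smoothed if v > neutral]
    if not excesses:
        return 0.0
    return sum(excesses) / len(excesses)


# Hypothetical smoothed WRAD curve.
curve = [0.40, 0.60, 0.80, 0.50, 0.30]
print(round(mean_high_wrad(curve), 3))  # 0.2
```

Only the values 0.60 and 0.80 count, with excesses of about 0.1 and 0.3, so the low stretches do not dilute the score.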

High WRAD Proportion (HWP): The proportion of words in a text at which the smoothed WRAD curve is above its neutral midpoint of .5. HWP can be understood as a measure of sustained RA above the midpoint within a segment, although RA does not necessarily reach extremely high levels during that time (those are better captured by MHW, above). A segment with a higher HWP score is likely to consist of a story, a description of a memory, or a detailed discussion of a mental image.
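HWP is a straightforward proportion, sketched below in Python. The two input curves are hypothetical: a brief, very high peak and a long, moderately elevated stretch. They show that HWP rewards duration above the midpoint rather than height.

```python
def high_wrad_proportion(smoothed: list[float], neutral: float = 0.5) -> float:
    """Proportion of words at which the smoothed WRAD curve is above the midpoint."""
    if not smoothed:
        return 0.0
    return sum(1 for v in smoothed if v > neutral) / len(smoothed)


# Hypothetical curves: a brief, very high peak vs. a long, moderately high stretch.
spike = [0.45] * 8 + [0.95] * 2
sustained = [0.60] * 8 + [0.45] * 2
print(high_wrad_proportion(spike))      # 0.2
print(high_wrad_proportion(sustained))  # 0.8
```

The spike reaches much higher values but scores low on HWP. Its height is what MHW is described as capturing.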

The Reflection WRAD Covariation (Ref_WRAD): A measure of narrative immersion. Reflection (REF) and WRAD typically move in opposite directions. A strong negative Ref_WRAD covariation indicates periods of immersion in a story, when WRAD is high, alternating with periods of distancing, when REF is high.

The Disfluency Ref Covariation (DF_Ref): The extent to which Disfluency and Reflection move together or apart. A positive covariation may indicate avoidance in a speaker, whereas a negative covariation may indicate that the speaker is reflecting on something.

The Disfluency WRAD Covariation (DF_WRAD): The extent to which Disfluency and WRAD move together or apart. Disfluency and WRAD tend to move in opposite directions, indicating that speakers are generally less disfluent (more fluent) when they are actively engaged in a narrative.
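The sign convention for these covariation measures can be illustrated with a standard Pearson correlation between two smoothed curves. This is a stand-in chosen for familiarity: this page does not give DAAP's covariation formula, which may differ, for example in how it centers each curve. The input curves are hypothetical.

```python
import math


def covariation(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two smoothed series of equal length.

    STAND-IN: used here only to illustrate the sign convention; DAAP's own
    covariation formula may differ.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0  # a flat series does not co-vary with anything
    return sum(a * b for a, b in zip(dx, dy)) / denom


# Hypothetical smoothed curves: REF falls exactly when WRAD rises.
wrad = [0.70, 0.60, 0.40, 0.30]
ref = [0.10, 0.20, 0.40, 0.50]
print(round(covariation(wrad, ref), 3))  # -1.0
```

A strongly negative value, as here, corresponds to the alternation between immersion and distancing described for Ref_WRAD.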

Unweighted Measures

The unweighted measures (lists of words with a common theme) that are commonly used in DAAP analyses are listed below, along with their definitions.

These content dictionaries were originally developed by first compiling a list of all words (types) in all texts in our archive and eliminating the function words. Each remaining word was then scored by three judges as to whether it belonged in any of these dictionaries. For each dictionary, if all three judges agreed that a word should be included, it was included; if two of the three judges agreed that it should be included, the final decision was left to a fourth judge, and the word was included if and only if the fourth judge agreed that it should be.
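The inclusion rule for a single word and a single dictionary can be written as a small decision function. It is a sketch of the procedure as described. The final branch (fewer than two judges in favour means the word is not included) is implied by the procedure rather than stated explicitly. The function name and signature are illustrative.

```python
from typing import Optional


def include_word(votes: list[bool], fourth_judge: Optional[bool] = None) -> bool:
    """Apply the three-judge inclusion rule for one word and one dictionary.

    votes: the three judges' include (True) / exclude (False) decisions.
    fourth_judge: the fourth judge's decision, needed only on a 2-of-3 split.
    """
    if len(votes) != 3:
        raise ValueError("expected exactly three votes")
    in_favour = sum(votes)
    if in_favour == 3:
        return True  # unanimous: include
    if in_favour == 2:
        if fourth_judge is None:
            raise ValueError("2-of-3 split: the fourth judge must decide")
        return fourth_judge  # included if and only if the fourth judge agrees
    return False  # fewer than two votes in favour: not included (implied)


print(include_word([True, True, False], fourth_judge=False))  # False
```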

These content dictionaries are all dynamic. That is, for each new project using these dictionaries, a list of words that had not yet been judged for inclusion was generated, and each word in this new list was judged for inclusion by three judges. If all three judges agreed, the word was added to the existing dictionary; if two of the three judges agreed that the word should be included, the final decision was made by a fourth judge.
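The update cycle can be sketched with Python sets. This sketch makes two assumptions. First, a record is kept of every word already judged, including rejected words, so that they are not judged again. Second, function words are excluded from the new list, as they were when the dictionaries were first built. All names and data are hypothetical.

```python
def new_candidates(project_vocab: set[str], judged: set[str],
                   function_words: set[str]) -> list[str]:
    """Word types in a new project that have not yet been judged, sorted."""
    return sorted(project_vocab - judged - function_words)


def apply_decisions(dictionary: set[str], judged: set[str],
                    decisions: dict[str, bool]) -> None:
    """Record final decisions in place: mark each word as judged and add
    the accepted ones to the existing dictionary."""
    for word, accepted in decisions.items():
        judged.add(word)
        if accepted:
            dictionary.add(word)


# Hypothetical example data.
affect = {"sad", "angry"}
judged = {"sad", "angry", "table"}  # "table" was judged earlier and rejected
vocab = {"sad", "table", "furious", "the"}

todo = new_candidates(vocab, judged, function_words={"the"})
print(todo)  # ['furious']
apply_decisions(affect, judged, {"furious": True})
print(sorted(affect))  # ['angry', 'furious', 'sad']
```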

Unweighted Measures Thought to Operationalize the Referential Process

Reflection (REF): REF words concern how people think and communicate thoughts. This dictionary includes words referring to cognitive or logical functions (e.g., assume, think, plan) or entities (e.g., reason, cause, consequence); problems or failures of cognitive or logical functions (e.g., confuse); complex verbal communicative functions (e.g., comment, convince, argue, obfuscate); and features of mental functioning (e.g., creative, logical). High REF language marks the Reorganizing Phase of the referential process.

Negation (NEG): A limited set of items that people use when negating in communication (e.g., no, not, never). Because negations are a kind of logical operator and are therefore likely to co-occur with REF words, higher proportions of such terms are expected to mark the Reorganizing Phase of the referential process (Murphy, Bucci & Maskit, 2011, June).

Disfluency (DF): A limited set of six items that people use when struggling to communicate. These words are kind, well, like, mean, know and the designation mm, which is used to represent all occurrences of um, uh, hmmm, etc. As part of preparation for use with DAAP, the words like, kind, well, know and mean are disambiguated. The DF dictionary uses the word ‘know’ as in ‘well, you know, I uh, I uh, said…’ as opposed to ‘know’ as in ‘I know the answer.’ Similarly, ‘like’ as in ‘well, like, I like, went to the like store you know’ is distinguished from ‘I like you’ or ‘more like that.’ DAAP also counts incomplete words, repeated words and repeated two-word phrases as disfluencies. Disfluent language is associated with cognitive load and effort in speech planning (Bortfeld, Leon, Bloom, Schober, & Brennan, 2001), and higher use of disfluency may mark the Arousal Phase of the referential process (Kingsley, 2009).
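The disfluency count combines dictionary items with three structural patterns. The sketch below assumes hypothetical transcript conventions that are not DAAP's documented format. Disfluent uses of kind, well, like, mean and know are assumed to have been marked with a `_df` suffix during preparation. Filled pauses are assumed to be already normalized to `mm`, and incomplete words are assumed to end with a hyphen. The code also counts overlapping events separately, which may not match DAAP's rules.

```python
# HYPOTHETICAL transcript conventions assumed for this sketch:
#   - disfluent uses of kind/well/like/mean/know were marked "_df" during preparation;
#   - um, uh, hmmm, etc. were already normalized to "mm";
#   - an incomplete word ends with a hyphen, e.g. "wen-".
DF_ITEMS = {"kind_df", "well_df", "like_df", "mean_df", "know_df", "mm"}


def count_disfluencies(tokens: list[str]) -> int:
    """Count DF items, incomplete words, repeated words and repeated two-word phrases.

    Overlapping events (e.g. "mm mm") are each counted, which may differ from DAAP.
    """
    count = 0
    for i, tok in enumerate(tokens):
        if tok in DF_ITEMS or tok.endswith("-"):
            count += 1  # disfluency word or incomplete word
        if i >= 1 and tok == tokens[i - 1]:
            count += 1  # word repeated immediately
        if i >= 3 and tokens[i - 1:i + 1] == tokens[i - 3:i - 1]:
            count += 1  # two-word phrase repeated immediately
    return count


# "well, you know, I uh, I uh, said" after the assumed preparation step
prepared = ["well_df", "you", "know_df", "i", "mm", "i", "mm", "said"]
print(count_disfluencies(prepared))  # 5
```

The five events are well, know, the two filled pauses, and the repeated phrase "I uh". By contrast, "I know the answer" contains no disfluencies, because that use of know is not marked.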

Other Unweighted Measures

Affect (AFF): Words that concern how people feel and communicate feelings directly. These include emotion labels (e.g., angry, sad, happy); functions associated with affective arousal (e.g., cried, screams, dare, fight, giggled); functions indicating motivation (e.g., need, try); words implicitly associated with affect (e.g., alone, against); and evaluations indicating an affective response, either positive or negative (e.g., cute, gross, lousy, terrific, wonderful, important). The global measure of these words is the Affect Sum Dictionary (AFFS), which includes all of the affect words that have been identified for use with DAAP. These are further classified as Positive (AFFP), Negative (AFFN) and Mixed Affect (AFFZ) words. Negative affect words are further classified into four dimensions: Depression (AFFND), Hostility (AFFNH), Pain (AFFNP) and Fear (AFFNF). The definitions of Positive and Negative Affect are self-explanatory; AFFZ words are words that seem to have an affective or emotional loading but are neither Positive nor Negative (e.g., anticipate, attitude, evoked, idealizing, overwhelmed, serious). Use of these Mixed Affect words may be a measure of disturbance and/or defensiveness, since they refer to emotion, but in an abstract way. The global measure AFFS is used as an index of how often affect words are being used overall, without respect to their valence.
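The hierarchy of affect dictionaries can be sketched with Python sets. AFFS is built as the union of the Positive, Negative and Mixed lists, and the four negative dimensions are treated as subsets of AFFN. That subset structure is an assumption, since the page does not say whether every negative word falls into a dimension. The word lists are toy examples invented for illustration and are not the actual DAAP dictionaries.

```python
import re

# TOY word lists invented for illustration; NOT the actual DAAP dictionaries.
AFFP = {"happy", "wonderful", "terrific"}            # Positive
AFFN = {"sad", "lonely", "angry", "hurt", "afraid"}  # Negative
AFFZ = {"anticipate", "serious", "overwhelmed"}      # Mixed
AFFN_DIMENSIONS = {
    "AFFND": {"sad", "lonely"},  # Depression
    "AFFNH": {"angry"},          # Hostility
    "AFFNP": {"hurt"},           # Pain
    "AFFNF": {"afraid"},         # Fear
}
AFFS = AFFP | AFFN | AFFZ  # global affect measure, regardless of valence


def affect_profile(text: str) -> dict[str, float]:
    """Proportion of tokens falling in each affect (sub-)dictionary."""
    tokens = re.findall(r"[a-z']+", text.lower())
    categories = {"AFFS": AFFS, "AFFP": AFFP, "AFFN": AFFN, "AFFZ": AFFZ,
                  **AFFN_DIMENSIONS}
    n = len(tokens)
    return {name: (sum(t in words for t in tokens) / n if n else 0.0)
            for name, words in categories.items()}


profile = affect_profile("I was sad and angry, but the day was wonderful")
print(profile["AFFS"], profile["AFFN"], profile["AFFP"])  # 0.3 0.2 0.1
```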

Sensory Somatic (SENS): A set of words pertaining to bodily and/or sensory experience (e.g., dizzy, eye, face, listen). This dictionary has been further classified into Body (SENS1), Sensory (SENS2), Motion (SENS3), Food (SENS4), Misc. (SENS5), Relationship (SENS7), Sex (SENS8) and Illness (SENS9) sub-dictionaries.

Sensory Somatic Affect Sum (SAS): A master list combining all words contained in the Sensory Somatic (SENS) and Affect Sum (AFFS) dictionaries. Because it includes more words than either the SENS or the AFFS dictionary alone, it is more reliable than either dictionary on its own in samples with few words. This measure may be useful in identifying moments when emotionally arousing material is being discussed.
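A simple binomial model helps explain why a larger combined list such as SAS (the union of SENS and AFFS) is more reliable in short samples. It treats each word as an independent draw with a fixed chance of matching the dictionary. Under that model, a dictionary with a higher base rate has a smaller relative standard error for the same number of words. This is a simplification offered for intuition, since words in speech are not independent, and it is not the authors' analysis. The base rates below are hypothetical.

```python
import math


def relative_standard_error(base_rate: float, n_words: int) -> float:
    """Relative standard error of a dictionary proportion under a simple
    binomial model (words treated as independent, true rate = base_rate)."""
    if not 0 < base_rate < 1 or n_words <= 0:
        raise ValueError("base_rate must be in (0, 1) and n_words positive")
    return math.sqrt(base_rate * (1 - base_rate) / n_words) / base_rate


# HYPOTHETICAL base rates; SAS is below SENS + AFFS if the two lists overlap.
rates = {"SENS": 0.04, "AFFS": 0.05, "SAS": 0.08}
for name, p in rates.items():
    print(name, round(relative_standard_error(p, n_words=100), 2))
# SENS 0.49
# AFFS 0.44
# SAS 0.34
```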

Theme Measures

The DAAP measures also include a set of 26 theme measures that cover a range of topics relevant to psychotherapy and psychoanalysis. These are listed below.

  1. Oral
  2. Anal
  3. Sex
  4. Male sexuality
  5. Female sexuality
  6. Maternal
  7. Medical terms
  8. Psychiatric terms
  9. Psychoanalytic terms
  10. Annihilation
  11. Narcissism
  12. Anti-social
  13. Guilt-Shame
  14. Family
  15. Other People
  16. Neutral relationships
  17. Negative relationships
  18. Positive relationships
  19. Submission
  20. Idealism
  21. Mastery
  22. Career
  23. Arts
  24. Development
  25. State of consciousness
  26. Defense


References

Bortfeld, H., Leon, S. D., Bloom, J. E., Schober, M. F., & Brennan, S. E. (2001). Disfluency rates in conversation: Effects of age, relationship, topic, role, and gender. Language and Speech, 44(2), 123-147.

Bucci, W., & Maskit, B. (2006). A weighted dictionary for Referential Activity. In J. G. Shanahan, Y. Qu, & J. Wiebe (Eds.), Computing attitude and affect in text (pp. 49-60). Dordrecht, The Netherlands: Springer.

Kingsley, G. (2009). The clinical validation of measures of the Referential Process. Ph.D. dissertation, Adelphi University, The Institute of Advanced Psychological Studies, United States -- New York. Retrieved September 3, 2010, from Dissertations & Theses @ Adelphi University. (Publication No. AAT 3377938).

Maskit, B. (2012, September). The Discourse Attributes Analysis Program (DAAP) (Series 8) [Computer software]. Unpublished computer software.

Murphy, S. M., Bucci, W., & Maskit, B. (2011, June). The language of psychotherapy process: Cross-linguistic markers of narrative and Referential Activity using the Linguistic Inquiry Word Count (LIWC). In S. Murphy (Moderator), Cross-linguistic studies in narrative, emotional expression and the Referential Process. Panel presented at the 42nd International Annual Meeting of The Society for Psychotherapy Research, Bern, Switzerland.