The Weighted Referential Activity Dictionary (WRAD)

Bernard Maskit and Sean Murphy

This page concerns the computerized measurement of Referential Activity (RA). For an overview of RA, information on judge-based scoring of RA, or information about other computerized language measures, please see the pages below:

The Weighted Referential Activity Dictionary (WRAD) is a list of words with weights that is run using The Discourse Attributes Analysis Program (DAAP) to yield a Referential Activity (RA) score for any natural language segment.  DAAP also produces a smoothed continuous RA score across the segment that may be compared to a variety of other language measures (please read about DAAP for more information). The WRAD measure was developed based on the reliable scoring of trained judges, using scales derived conceptually from the psychological features of the referential process as defined by Bucci (1997, 2002).
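As a rough illustration of how a weighted dictionary can yield a single score for a segment, the sketch below averages per-word weights over all tokens. It is not the DAAP implementation: the weights in TOY_WEIGHTS are invented for the example and are not WRAD values, and treating unmatched words as contributing 0 is an assumption.

```python
from typing import Mapping, Sequence

# Hypothetical weights for illustration only; these are NOT real WRAD values.
TOY_WEIGHTS = {"the": 0.5, "and": 0.25, "she": 0.5, "i": -0.25, "think": -0.5}


def mean_weight(tokens: Sequence[str], weights: Mapping[str, float]) -> float:
    """Average dictionary weight over all tokens in a segment.

    Tokens missing from the dictionary count as 0 (an assumption of this sketch).
    """
    if not tokens:
        return 0.0
    return sum(weights.get(t.lower(), 0.0) for t in tokens) / len(tokens)


segment = "She picked up the decanter and I think".split()
score = mean_weight(segment, TOY_WEIGHTS)
print(score)
```

Averaging over all tokens, rather than only matched ones, keeps scores comparable across segments with different proportions of dictionary words; other normalizations are possible.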

The development of computerized RA measures has both practical and theoretical significance.  On the applied level, while the scales are reliably and easily scored after brief training, computerized procedures permit assessment of RA in large samples and longitudinal studies.  Such measures also permit the reliable micro-analytic tracking of fluctuations in RA within various forms of communicative discourse.

The first method for computerized scoring of RA was the Computerized Referential Activity (CRA) measure of Mergenthaler and Bucci (1999). This measure was based on two dictionaries, comprising a total of 181 types. The CRA measure includes a set of items that are characteristic of high RA speech, the High CRA dictionary, and a set of items characteristic of low RA speech, the Low CRA dictionary.

CRA has generally been applied using one of two text analysis systems, the UNIX-based TAS/C (Mergenthaler, 1985) and the Windows-based CM (Mergenthaler, 1998). Neither permits use of weighted dictionaries, and both track fluctuations using either arbitrary segmentation into units of fixed size or labor-intensive, operator-scored segmentation.  These limitations are overcome by the Discourse Attributes Analysis Program (DAAP), which automatically produces a continuous measure without segmentation into arbitrary units. The DAAP produces a mathematically smooth local averaging that starts anew with each change of speaker, and was specifically designed to permit use of weighted as well as unweighted dictionaries. The availability of the DAAP system was a necessary condition for the application of the Weighted Referential Activity Dictionary (WRAD), which was built to model the RA scores.
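The idea of a smooth local average that starts anew with each change of speaker can be sketched as follows. This is only an illustration under stated assumptions: the Gaussian kernel and the sigma parameter are stand-ins chosen for the example, since the actual DAAP weighting function is not described on this page, and the function names are hypothetical.

```python
import math
from itertools import groupby
from typing import List, Sequence, Tuple


def smooth_turn(values: Sequence[float], sigma: float = 2.0) -> List[float]:
    """Kernel-weighted local average of per-word values within one speaker turn.

    The Gaussian kernel is an assumption of this sketch, not DAAP's function.
    """
    smoothed = []
    for i in range(len(values)):
        kernel = [math.exp(-((i - j) ** 2) / (2 * sigma ** 2))
                  for j in range(len(values))]
        smoothed.append(sum(k * v for k, v in zip(kernel, values)) / sum(kernel))
    return smoothed


def smooth_by_speaker(words: Sequence[Tuple[str, float]],
                      sigma: float = 2.0) -> List[float]:
    """Smooth (speaker, weight) pairs, restarting at each change of speaker."""
    result: List[float] = []
    for _, turn in groupby(words, key=lambda w: w[0]):
        result.extend(smooth_turn([weight for _, weight in turn], sigma))
    return result


# Two turns with very different weights: no averaging happens across the boundary.
stream = [("A", 1.0)] * 3 + [("B", 0.0)] * 3
smoothed = smooth_by_speaker(stream)
```

Because each turn is smoothed separately, the values for speaker B are not pulled upward by speaker A's preceding words, and no fixed-size segmentation is needed.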

To the authors’ knowledge, most dictionaries thus far produced for computerized text analysis are unweighted; that is, an item either is in the dictionary or it is not. Weighted rather than binary dictionaries are particularly important for assessment of stylistic variables, which tend to vary in degree, in contrast to content features, which can usually be defined as present or absent. We anticipate that weighted dictionaries for other psychological and linguistic variables that are more closely related to style than to content can also be produced using this technique. WRAD dictionaries in Spanish and Italian are currently in use; WRAD dictionaries in other languages are under consideration.

Our procedure used a principle of modeling RA scale scores as rated by judges, similar to that introduced by Mergenthaler and Bucci (1999), but we used new techniques specifically designed to produce weighted dictionaries.  For details concerning the method of creation of the WRAD, summary data, and linguistic implications, please see Bucci & Maskit's 2006 chapter, 'A weighted dictionary for Referential Activity', in Shanahan, Qu, & Wiebe (Eds.), 'Computing attitude and affect in text.'

Description of the WRAD 

The WRAD is a list of 697 item types, of which 675 are ordinary words. (The dictionary actually contains 706 items, as the numbers 1,...,9 appear as both numbers (1, 2, ...) and words (one, two, ...).) Of the other 22, 12 are beginnings of contractions, such as “couldn” or “didn”; 7 are ends of contractions, such as “s” or “t”; and two are artificial words used for disambiguation, “knowD” and “likeV.” The remaining item is the neutral sound often written as “mm”, “um” or “hm”; all of these are written, following our transcription rules, as “MM”.
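The item types described above suggest a tokenization in which contractions are split at the apostrophe and neutral sounds are normalized to “MM”. The sketch below shows one way this might look; the regular expression, the lowercasing, and the function name are assumptions made for illustration, and the disambiguation items “knowD” and “likeV” are not handled because their rules are not given here.

```python
import re
from typing import List

# Neutral sounds that the text above says are all transcribed as "MM".
FILLERS = {"mm", "um", "hm"}


def tokenize(text: str) -> List[str]:
    """Split text into lowercase items, breaking contractions at the apostrophe.

    For example, "didn't" becomes "didn" and "t"; "um" becomes "MM".
    """
    tokens: List[str] = []
    for raw in re.findall(r"[A-Za-z0-9]+(?:['’][A-Za-z]+)?", text):
        for part in re.split(r"['’]", raw):
            tokens.append("MM" if part.lower() in FILLERS else part.lower())
    return tokens


items = tokenize("Um, I didn't see it; she's 2 blocks away")
```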

The complete dictionary with weights may be downloaded for use with the Discourse Attributes Analysis Program (DAAP) here.

Correlations with Scale Scores

The WRAD measure correlates with the scoring of judges, with correlations ranging from .38 to .60 for the six samples used to create the model.  The coverage of the WRAD list is very high; the 697 types account for between 83% and 87% of all tokens in the data sets used to create the model. This contrasts with coverage ranging from 50% to 56% for CRA. The greater coverage is made possible primarily by the weighting procedure; without weighting, as in the construction of CRA, only items associated with RA extremes could be included, whereas the weighting procedures permit inclusion of mid-range items with appropriate weights.  Work to establish normative and other psychometric properties of the WRAD is currently under way; some normative data for psychotherapy sessions have been produced.
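Coverage in this sense is the proportion of word tokens in a text whose type appears in the dictionary. A minimal sketch of that calculation follows; the small dictionary used here is made up for the example and is not the WRAD list.

```python
from typing import Collection, Sequence


def coverage(tokens: Sequence[str], dictionary: Collection[str]) -> float:
    """Fraction of tokens (not types) that are found in the dictionary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in dictionary) / len(tokens)


# Toy dictionary for illustration only.
toy_dictionary = {"the", "was", "all", "in", "and", "it"}
tokens = "the decanter was all trimmed in gold and it was".split()
cov = coverage(tokens, toy_dictionary)  # 7 of the 10 tokens are matched
```

Note that a repeated word such as "was" counts once per occurrence, which is why a few hundred frequent types can cover most of the tokens in a text.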

The World According to WRAD

The example below shows how a text looks to the WRAD dictionary.  In this example, words that are highly weighted predictors of high RA are marked in dark red and words that are highly weighted predictors of low RA are in dark blue.  More neutral predictors are indicated in lighter shades of these colors.

I don't remember how old I was but my grandmother came to live with us.  Her husband had died and we had been in a two bedroom apartment and moved to a three bedroom but my sister and I still had to share a room.  Grandmother got her own room and just at the time she came to live with us, she started to develop arthritis in her hands.  And there was a decanter and glasses set I was very fond of.  The decanter was all trimmed in gold and it was a beautiful shape and the glasses were very delicate all trimmed in the same gold.  And she picked it up one night.  She was having an argument with my parents.  She used to fight with my father.  This was my mother's mother and between her being upset and the fight, and what they told me was it was her arthritis, but now I wonder if she threw it.  She broke this set, and it had always been my favorite.  If I were home sick, my mother would fill up the glasses and I would have my juice out of the glasses and on special occasions the decanter would be on the table and I was very angry at her that it was broken and they kept saying it was her arthritis, her hand had a spasm.  And I wasn't allowed to be angry at her about this.

The Mystery of Simple Words

Most people's initial reaction to the above example is to say something like, "You mean the words in black, right?"  The majority of the words marked in the example above (those included in the WRAD dictionary) are the words we pay the least attention to when we listen to a story.  Words such as "I", "the", "and", "it" are often thought of as "garbage words" that are not particularly meaningful.  Our tendency is to focus instead on the content words like "arthritis" or "decanter" above.  Yet, these simple words make up roughly 60% of all of the words spoken in the English language. 

To understand how much meaning is carried in these simple words consider the popular children's game MadLibs.  MadLibs are stories with blanks left for children to fill in content words.  The other words that are included in the MadLib are for the most part simple function words.  Yet, the arrangement of these words is such that no matter what words the child fills in, the finished MadLib will make a story.  For example, if we were to make a MadLib type example from the above it might look something like:

The _______ was all ______ in _____ and it was a _____ _____ and the ______ were very _______ all ______ in the same _____. And she picked it up one night. She was having an _______ with my parents.

MadLibs are more carefully designed than the above; however, this sort of example demonstrates just how much we know from simple words.  While we don't know much about the particulars of the story, we can tell 1) that we are listening to a story and 2) that it is about some particular kind of object and actions associated with it.
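A MadLib-type example like the one above can be produced mechanically by blanking every word that is not in a list of simple words. The sketch below assumes a small, made-up list of function words (not the WRAD list) and a hypothetical helper name; it keeps trailing punctuation attached to each word.

```python
from typing import Collection


def blank_content_words(text: str, function_words: Collection[str]) -> str:
    """Replace every word not in function_words with a blank, keeping punctuation."""
    out = []
    for word in text.split():
        core = word.strip(".,;:!?")
        if not core or core.lower() in function_words:
            out.append(word)  # keep simple words and bare punctuation as-is
        else:
            out.append(word.replace(core, "_____"))
    return " ".join(out)


# Toy function-word list for illustration only.
simple_words = {"the", "was", "all", "in", "and", "it", "a"}
madlib = blank_content_words("The decanter was all trimmed in gold.", simple_words)
print(madlib)
```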

The Validity of WRAD

In addition to its relationship to judges' ratings of referential activity (Bucci & Maskit, 2006), WRAD has been shown, as predicted by theory, to have a robust positive correlation with judges' ratings of the extent to which a given text segment represents the symbolizing phase of the referential process.  WRAD has similarly been shown to be negatively correlated with judges' ratings of the extent to which a segment represents the arousal or reorganizing phases of the referential process (Kingsley, 2010).  These results are consistent with Bucci's theory and, in combination with other measures produced by the Discourse Attributes Analysis Program (DAAP), serve as the foundation for increasingly sophisticated models of the referential process.

Recent studies have shown that WRAD shares substantial variance with measures of episodic memory (Bucci, Maskit & Murphy, 2009) and that populations with demonstrated episodic memory impairments, such as persons with schizophrenia (Lewis, Murphy & Hanakawa, 2009) and Alzheimer's dementia (Nelson & Polignano, 2009), are distinguishable from subjects without these disorders by the WRAD scores of their memory narratives. The WRAD has been found to be positively related to variation in voice pitch (Campanelli, 2008), and the measure has been shown to relate to measures of temporal sequences in narratives (Nelson, Moskovitz & Steiner, 2008).

Ongoing research has shown WRAD to have moderate temporal stability over a six-week period.  Other ongoing studies have demonstrated that readers of narratives measured as having high and low WRAD scores tend to respond to these narratives with remarkably similar WRAD scores of their own. Similarly, conversational partners have significantly correlated WRAD scores when participating in the same conversation and no correlation in control conditions (Murphy, 2010).

Taken together, these results show WRAD to be a strong predictor of various components of Bucci's theories of referential activity and the referential process.  According to these theories, language high in referential activity should represent moments when a speaker is emotionally engaged in an act of memory, fantasy, imagining, or some similar internal process, and is communicating this state to the listener.  Results from the above studies show WRAD to be a measure that: relates to judges' ratings of such moments (Kingsley, 2010); relates to independent measures of narrative engagement and episodic memory strength (Bucci et al., 2009); is related to physiological characteristics of the speaker, namely variation of voice pitch (Campanelli, 2008); has an influence on the listener, as demonstrated by the listener's response with a similar degree of WRAD; and seems to have some trait-like stability over a period of time, suggesting that some people tend to engage in more high RA activities than others (Murphy, 2010).

Other Languages

The RA scoring manual (Bucci et al. 1992) has been translated into Italian and Spanish, and substantial corpora of texts have been scored for RA in these languages.

Please see the paper by Roussos and O'Connell (2005), in Spanish, for details of the construction of the Spanish WRAD.

The Italian WRAD (IWRAD) was constructed using a new technique; in contrast to the English WRAD, it contains close to 10,000 words. We expect that this new technique, or something closely related to it, will be used for future versions of WRAD and similar weighted dictionaries.

See Mariani, Maskit, Bucci & DeCoro (2013) for information concerning the Italian DAAP and other referential process dictionaries in Italian.


Bucci, W., Kabasakalian, R. & the RA Research Group (1992). Instructions for scoring Referential Activity (RA) in transcripts of spoken narrative texts. Ulm, Germany; Ulmer Textbank.

Bucci, W. (1997). Psychoanalysis and Cognitive Science: A multiple code theory. NY: Guilford Press.

Bucci, W. (2002). Referential Activity (RA): Scales and computer procedures. In Fonagy et al. (Eds.) An Open Door Review of Outcome Studies in Psychoanalysis; Second Edition. 192-195. International Psychoanalytical Association, London.

Bucci, W. (2014). Weighted Referential Activity Dictionary (WRAD). figshare.

Bucci, W. & Maskit, B. (2006). A weighted dictionary for Referential Activity. In J.G. Shanahan, Y. Qu, & J. Wiebe (Eds.) Computing Attitude and Affect in Text; Dordrecht, The Netherlands: Springer; pp. 49-60.

Bucci, W., Maskit, B. & Murphy, S. (2009, May). Measures of referential activity as indicators of episodic memory in different age groups and time contexts. Poster session at the Association for Psychological Science Annual Convention in San Francisco, CA.

Campanelli, L. (2008, June). Acoustic analysis of the voice: An implicit measure of emotional communication in the psychotherapeutic exchange. Paper presented at the 39th International Annual Meeting of The Society for Psychotherapy Research, Barcelona, Spain.

Kingsley, G. (2010). The clinical validation of measures of the Referential Process. Dissertation Abstracts International: Section B: The Sciences and Engineering, 5827.

Lewis, K., Murphy, S., & Hanakawa, Y. (2009, May). Uncovering episodic memory through linguistic measures in schizophrenia. Poster session at the Association for Psychological Science Annual Convention in San Francisco, CA.

Mergenthaler, E. (1985). Textbank Systems: Computer science applied in the field of psychoanalysis. Springer, Heidelberg & New York.

Mergenthaler, E. (1998). CM - the Cycles Model software. (Version 1.0) Universität Ulm, Ulm, Germany.

Mergenthaler, E. & Bucci, W. (1999). Linking verbal and nonverbal representations: Computer analysis of Referential Activity. British Journal of Medical Psychology, 72, 339-354.

Murphy, S. (2010, June). Understanding basic psychological processes in psychotherapy: The properties of measures of the Referential Process in varied contexts. Paper presented at the 41st International Annual Meeting of The Society for Psychotherapy Research, Asilomar, CA.

Nelson, K. L., Moskovitz, D. J., & Steiner, H. (2008). Narration and vividness as measures of event-specificity in autobiographical memory. Discourse Processes, 45, 195-209.

Nelson, K. & Polignano, M. (2009, May). Referential activity in negative episodic 'flashbulb' memories from patients. Poster session at the Association for Psychological Science Annual Convention in San Francisco, CA.

Roussos, A. & O'Connell, M. (2005). Construcción de un diccionario ponderado en español para medir la Actividad Referencial. Revista del Instituto de Investigaciones de la Facultad de Psicología / UBA, 10 (2), pp. 99-119.