The Italian WRAD (IWRAD)

Bernard Maskit

A new method of constructing empirically derived weighted dictionaries was developed by Bernard Maskit for the Italian version of the WRAD. This method has the potential to yield dictionaries of greater sensitivity and will be used in future constructions. Linguistic differences between Italian and English also suggest that the Italian WRAD (IWRAD) is likely to contain different categories of words than those selected for the English WRAD.

The sample texts

The first step in developing the IWRAD was to model judges' ratings of RA based on the four scales. Two sets of Italian text segments that had been scored for RA by judges were used: one from a group based in Milan, the other from a group based in Rome. The leaders of the two groups of judges, Anna Bonfanti and Rachele Mariani, had been trained in RA scale scoring in New York, in English, and had reached reliability with each other on scoring the RA scales in Italian.

There were 11 data sets of texts from Milan, consisting of responses to TAT and Blacky images, early memories, reports of dreams, interviews and psychoanalytic session material. There were 21 data sets of texts from Rome consisting primarily of AAI and RAP interviews; there were also interviews with pregnant women. These data sets were separated into two groups. The first group, consisting of about three-quarters of the text material, was used for making the dictionary; the second group of texts was used to test the final IWRAD dictionary. There were 1,301 text segments, containing a total of 138,609 words (tokens), used to construct the dictionary. Then 508 text segments, containing 47,952 words (tokens), were used to test it.

The RA scale scores vary between 0 and 10, with 5 as the natural neutral value. In order to control for the differences in the scale scores from the two sets of scorers, the two sets of RA scores were rescaled so that, for each of them, the mean RA scale score became equal to 5.
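The rescaling step can be illustrated with a minimal sketch. The text does not say whether the scores were shifted or stretched, so the additive shift used here is an assumption, and the function name and sample scores are hypothetical.

```python
from statistics import mean

def recenter_scores(scores, target_mean=5.0):
    """Shift every score by one constant so the group mean equals target_mean.

    Assumption: an additive shift. The source only states that the mean
    of each set of scores became 5, not how that was achieved.
    """
    shift = target_mean - mean(scores)
    return [s + shift for s in scores]

# Hypothetical RA scale scores from two groups of judges (illustrative only).
group_a = [3.0, 4.5, 6.0, 2.5]
group_b = [5.5, 7.0, 6.5, 8.0]

rescaled_a = recenter_scores(group_a)
rescaled_b = recenter_scores(group_b)
```

An additive shift keeps the spread of each judge group unchanged, but it can move extreme scores outside the 0 to 10 range; a real implementation would need to decide how to handle that case.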

The new method for building the dictionary

The text files used to make the dictionary were read by computer and separated into distinct words (types). For each occurrence of each word, the RA scale score of the segment in which the word occurred was recorded, giving a list of RA scale scores for that word. For each word, we then computed three numbers: the number of occurrences of the word in the text sample, the mean of its RA scale scores, and the variance of its RA scale scores. As described below, a word was included in the dictionary if its number of occurrences lay above a certain number, called the token cut-off, and if its variance lay below another number, called the variance cut-off. The weight assigned to each word included in the dictionary is its mean RA scale score.
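The construction just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program actually used: tokenization by lowercasing and splitting on whitespace, the use of population variance, and all names and sample segments are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance

def build_weighted_dictionary(segments, token_cutoff, variance_cutoff):
    """Build {word: weight} from (text, ra_score) pairs.

    Assumptions: whitespace tokenization and population variance;
    the source does not specify either.
    """
    # Collect, for every word, the RA score of each segment it occurs in
    # (one entry per occurrence, so repeated words count repeatedly).
    scores_by_word = defaultdict(list)
    for text, ra_score in segments:
        for word in text.lower().split():
            scores_by_word[word].append(ra_score)

    dictionary = {}
    for word, scores in scores_by_word.items():
        frequent_enough = len(scores) > token_cutoff
        consistent_enough = pvariance(scores) < variance_cutoff
        if frequent_enough and consistent_enough:
            # The weight is the mean RA score over all occurrences.
            dictionary[word] = mean(scores)
    return dictionary

# Hypothetical scored segments (illustrative only).
toy_segments = [
    ("il cane corre", 7.0),
    ("il gatto dorme", 5.0),
    ("il cane dorme", 6.0),
]
toy_dictionary = build_weighted_dictionary(
    toy_segments, token_cutoff=1, variance_cutoff=1.0
)
```

With these toy values, words that occur only once ("corre", "gatto") fall below the token cut-off and are left out, while "il", "cane" and "dorme" are kept with their mean scores as weights.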

We chose the cut-offs by building tentative dictionaries with different cut-off values for both tokens and variance, computing the correlation between the IWRAD scores generated by each tentative dictionary and the RA scale scores, and selecting the values that maximized this correlation. The resulting IWRAD dictionary contained 9,918 items (words) and covered over 99% of the words in the texts.
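The cut-off selection amounts to a grid search, which might look like the sketch below. Several details are assumptions not stated in the text: the candidate cut-off values, scoring a segment by the mean weight of its matched words, use of the Pearson correlation, and evaluation on the same segments used to build the tentative dictionaries. All names and data are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import sqrt
from statistics import mean, pvariance

def build_dictionary(segments, token_cutoff, variance_cutoff):
    """Tentative dictionary: word -> mean RA score, filtered by both cut-offs."""
    scores = defaultdict(list)
    for text, ra in segments:
        for word in text.lower().split():
            scores[word].append(ra)
    return {w: mean(s) for w, s in scores.items()
            if len(s) > token_cutoff and pvariance(s) < variance_cutoff}

def mean_score(text, dictionary):
    """Mean weight of the words found in the dictionary, or None if none match."""
    weights = [dictionary[w] for w in text.lower().split() if w in dictionary]
    return mean(weights) if weights else None

def pearson(xs, ys):
    """Pearson correlation; returns None when either series is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)

def choose_cutoffs(segments, token_cutoffs, variance_cutoffs):
    """Return (correlation, token_cutoff, variance_cutoff) with the highest correlation."""
    best = None
    for t, v in product(token_cutoffs, variance_cutoffs):
        dictionary = build_dictionary(segments, t, v)
        pairs = [(mean_score(text, dictionary), ra) for text, ra in segments]
        pairs = [(s, ra) for s, ra in pairs if s is not None]
        if len(pairs) < 2:
            continue  # too few scored segments to correlate
        r = pearson([s for s, _ in pairs], [ra for _, ra in pairs])
        if r is not None and (best is None or r > best[0]):
            best = (r, t, v)
    return best

# Hypothetical scored segments and candidate cut-offs (illustrative only).
toy_segments = [
    ("il cane corre", 7.0), ("il gatto dorme", 5.0), ("il cane dorme", 6.0),
    ("la casa", 3.0), ("la casa dorme", 4.0),
]
best = choose_cutoffs(toy_segments, token_cutoffs=[0, 1], variance_cutoffs=[0.5, 2.0])
```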

The next step was the removal of all words that are idiosyncratic to our particular set of texts, such as names. We also removed numbers, dates, etc., for which there was no consistent dictionary value. After these adjustments were made, we obtained a final IWRAD dictionary containing 9,596 items, which covered about 98% of the text material used to construct and test it.

As a final test, we used this dictionary to compute the correlations between the RA scale scores and the Mean IWRAD scores for five groups of texts: the texts used to make the dictionary, the texts that had been reserved for this test, the texts scored for RA in Milan, the texts scored for RA in Rome, and all texts together. These results are reported in Table 1. The correlation of .32 (n = 508) for the texts reserved for this test is good, and the correlation of .75 (n = 1,809) with all texts is very high. We note that the corresponding correlations obtained in testing the English WRAD were .38 (n = 113) for the texts reserved for the test, and .54 (n = 763) for the texts used to make the dictionary (Bucci and Maskit, 2006).

Table 1. IWRAD Test Data

| Text | WRAD-RA correlation | Number of Texts | Coverage |
|---|---|---|---|
| Texts Used to Make Dictionary | .86 | 1,301 | .99 |
| Texts Reserved for Test | .32 | 508 | .94 |
| All Texts from Milan | .65 | 910 | .98 |
| All Texts from Rome | .79 | 899 | .97 |
| All Texts | .75 | 1,809 | .98 |
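For readers who want to reproduce figures of the kind shown in Table 1, the following sketch shows one way the correlation and coverage for a group of texts could be computed from a finished dictionary. It assumes that coverage is the proportion of all tokens found in the dictionary and that a segment's Mean IWRAD score is the mean weight of its matched tokens; neither definition is spelled out in the text, and the dictionary and segments below are hypothetical.

```python
from math import sqrt

def evaluate(segments, dictionary):
    """Return (Pearson correlation, coverage) for (text, ra_score) pairs.

    Assumptions: whitespace tokenization; coverage is matched tokens
    divided by all tokens across the whole group of texts.
    """
    total_tokens = matched_tokens = 0
    iwrad_scores, ra_scores = [], []
    for text, ra in segments:
        tokens = text.lower().split()
        weights = [dictionary[t] for t in tokens if t in dictionary]
        total_tokens += len(tokens)
        matched_tokens += len(weights)
        if weights:  # segments with no matched words cannot be scored
            iwrad_scores.append(sum(weights) / len(weights))
            ra_scores.append(ra)

    coverage = matched_tokens / total_tokens if total_tokens else 0.0

    n = len(iwrad_scores)
    if n < 2:
        return None, coverage
    mx, my = sum(iwrad_scores) / n, sum(ra_scores) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(iwrad_scores, ra_scores))
    sxx = sum((x - mx) ** 2 for x in iwrad_scores)
    syy = sum((y - my) ** 2 for y in ra_scores)
    if sxx == 0 or syy == 0:
        return None, coverage  # correlation undefined for a constant series
    return sxy / sqrt(sxx * syy), coverage

# Hypothetical dictionary and reserved test segments (illustrative only).
toy_dictionary = {"il": 6.0, "cane": 6.5, "dorme": 5.5}
toy_test_segments = [
    ("il cane dorme", 6.5),
    ("il gatto dorme", 5.0),
    ("cane cane", 7.0),
]
correlation, coverage = evaluate(toy_test_segments, toy_dictionary)
```

In this toy case seven of the eight tokens are in the dictionary, so the coverage is 0.875.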

See http://sites.google.com/site/italiandaap for information concerning the Italian DAAP and other referential process dictionaries in Italian.

REFERENCES

Bucci, W., & Maskit, B. (2006). A weighted dictionary for Referential Activity. In J. G. Shanahan, Y. Qu, & J. Wiebe (Eds.), Computing Attitude and Affect in Text (pp. 49-60). Dordrecht, The Netherlands: Springer.