Measuring formality through word frequencies
The degree of formality of a text can be measured by adding the frequencies of context-independent words, subtracting the frequencies of context-dependent (deictic) words, and normalizing the sum.
Grouping words into the traditional grammatical categories (nouns, verbs, prepositions, etc.) produces the following formula for formality (F):
F = (noun frequency + adjective freq. + preposition freq. + article freq. - pronoun freq. - verb freq. - adverb freq. - interjection freq. + 100)/2
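As an illustration only, the formula can be sketched in Python. The sketch assumes that each frequency is the percentage of all words in the text that fall in the category, and that the text has already been tagged by some other tool. The function name `formality` and the tag names are hypothetical, chosen for this example.

```python
from collections import Counter

# Categories that add to formality (context-independent) and categories
# that subtract from it (deictic), as listed in the formula above.
# The tag names are hypothetical; map your tagger's labels onto them.
PLUS = ("noun", "adjective", "preposition", "article")
MINUS = ("pronoun", "verb", "adverb", "interjection")


def formality(tags):
    """Return F for a sequence of part-of-speech tags, one tag per word."""
    tags = list(tags)
    if not tags:
        raise ValueError("need at least one tagged word")
    counts = Counter(tags)
    total = len(tags)
    # Assumption: frequencies are percentages of all words, including
    # categories (such as conjunctions) that the formula does not use.
    freq = {tag: 100.0 * counts[tag] / total for tag in PLUS + MINUS}
    score = sum(freq[t] for t in PLUS) - sum(freq[t] for t in MINUS)
    # Adding 100 and halving maps the range -100..100 onto 0..100.
    return (score + 100.0) / 2.0
```

Under these assumptions a text consisting only of nouns scores 100, a text consisting only of verbs scores 0, and a text with equal shares of both groups scores 50.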
Such a formula provides an easily applicable measure for ordering language from different sources, genres or styles according to their formality. The calculated formality generally agrees well with intuitive expectations: official documents and scientific texts are more formal than personal letters, speeches are more formal than conversations, and so on. For example, data for Dutch yield the following ordering, from least to most formal:
(Nouns, Articles, Prep. and Adject. are the context-independent categories; Pron., Verbs and Adv. are the deictic categories; Form. is the formality F.)
Source | Nouns | Articles | Prep. | Adject. | Pron. | Verbs | Adv. | Conj. | Form. |
Oral Female | 10.40 | 6.89 | 5.86 | 8.09 | 16.95 | 19.35 | 17.45 | 7.47 | 38.7 |
Oral N.Acad. | 12.75 | 8.50 | 6.34 | 6.71 | 16.01 | 18.80 | 19.31 | 6.34 | 40.1 |
Oral Male | 11.48 | 8.16 | 6.69 | 7.63 | 15.84 | 18.45 | 16.53 | 7.05 | 41.6 |
Oral Acad. | 13.16 | 9.58 | 7.91 | 7.13 | 13.96 | 17.75 | 17.88 | 7.13 | 44.1 |
Novels | 18.52 | 10.48 | 10.26 | 10.00 | 13.25 | 20.62 | 10.47 | 6.06 | 52.5 |
Fam. Magaz. | 21.78 | 9.77 | 12.21 | 11.14 | 10.09 | 18.71 | 9.74 | 6.39 | 58.2 |
Magazines | 24.20 | 11.61 | 13.90 | 10.93 | 8.55 | 17.68 | 8.73 | 4.34 | 62.8 |
Scientific | 23.10 | 15.00 | 13.75 | 10.75 | 6.71 | 16.58 | 7.98 | 5.98 | 65.7 |
Newspapers | 25.97 | 14.68 | 14.54 | 10.57 | 5.62 | 16.69 | 7.21 | 4.70 | 68.1 |
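The Form. column can be recomputed from the other columns as a check. The following Python sketch does this under one assumption: the table has no interjection column, so the interjection frequency is taken to be 0. The conjunction column is left out because the formula does not use it. The names `ROWS` and `formality_from_row` are hypothetical.

```python
# Rows copied from the table above, without the Conj. column:
# (source, nouns, articles, prep., adject., pron., verbs, adv., reported Form.)
ROWS = [
    ("Oral Female", 10.40, 6.89, 5.86, 8.09, 16.95, 19.35, 17.45, 38.7),
    ("Oral N.Acad.", 12.75, 8.50, 6.34, 6.71, 16.01, 18.80, 19.31, 40.1),
    ("Oral Male", 11.48, 8.16, 6.69, 7.63, 15.84, 18.45, 16.53, 41.6),
    ("Oral Acad.", 13.16, 9.58, 7.91, 7.13, 13.96, 17.75, 17.88, 44.1),
    ("Novels", 18.52, 10.48, 10.26, 10.00, 13.25, 20.62, 10.47, 52.5),
    ("Fam. Magaz.", 21.78, 9.77, 12.21, 11.14, 10.09, 18.71, 9.74, 58.2),
    ("Magazines", 24.20, 11.61, 13.90, 10.93, 8.55, 17.68, 8.73, 62.8),
    ("Scientific", 23.10, 15.00, 13.75, 10.75, 6.71, 16.58, 7.98, 65.7),
    ("Newspapers", 25.97, 14.68, 14.54, 10.57, 5.62, 16.69, 7.21, 68.1),
]


def formality_from_row(nouns, articles, prep, adject, pron, verbs, adv,
                       interj=0.0):
    """Apply the formula for F; interj defaults to 0 (an assumption)."""
    plus = nouns + adject + prep + articles
    minus = pron + verbs + adv + interj
    return (plus - minus + 100.0) / 2.0


# Map each source to its recomputed formality and to the reported value.
computed = {name: formality_from_row(*vals) for name, *vals, _ in ROWS}
reported = {row[0]: row[-1] for row in ROWS}

for name in computed:
    print(f"{name:13s} computed {computed[name]:6.2f}  reported {reported[name]:5.1f}")
```

With interjections set to 0, each recomputed value differs from the reported one by less than 0.05, which is consistent with rounding to one decimal.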
Copyright © 1995 Principia Cybernetica
Author: F. Heylighen & J-M. Dewaele
Date: Jul 13, 1995