Without rules, languages would be cumbersome to learn, use, and understand. Rules provide a single set of instructions that can generate many valid utterances. For example, in English, instead of memorising a separate singular and plural form for every noun, we only need to know singular forms and the general rule “add –s” to make plural nouns. All languages have these sorts of rules at all levels, from how a language uses sounds (phonology) to how a language forms sentences (syntax).
Despite the obvious advantages of linguistic rules, rules very often have exceptions: nouns like ox and goose do not adhere to the rule for forming plurals, and verbs like be and go form their tenses differently to most other verbs. Since rules are so efficient, why do we have exceptions? How do they emerge, how do they survive, and how are they related to rules?
How individuals manage rules and exceptions in language has long been a point of debate in cognitive science. Rule dynamics zooms away from the individual and focuses on the language as a dynamic system. Instead of asking how an individual learns rules and exceptions, we seek to understand how a language used across an entire population sustains exceptions among efficient, dominant rules.
The role of frequency and similarity
To examine how a language system maintains rules and exceptions, we made a detailed study of past-tense verbs in a historical corpus. Data from the Corpus of Historical American English (CoHA) confirm that irregular verbs are more likely to have high frequency: we use irregular verbs like be more often than regular verbs like walk. Some irregular verbs have unusually low frequency, but they appear able to sustain their irregularity at least in part because they are phonologically close to higher-frequency irregulars (e.g., spit can stay irregular because it sounds very similar to another irregular verb, sit).
Changes in the regularity or irregularity of verbs over time occur in a particular band of mid-range frequencies, but these changes are relatively minor. For the most part, the system is highly stable: verbs that are regular remain regular, and verbs that are irregular remain irregular. Only a minority of “active” verbs are changing, and roughly the same number of verbs are moving from irregular to regular as vice versa.
The figure below shows verbs in the 1980-1989 decade of CoHA. Irregular verbs are classed by similarity (larger classes contain more verbs), and each class is placed according to the proportion of irregular tokens (I) and the summed frequency. For a few active classes, the temporal dynamics are also shown. For more, see the full paper.
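As a rough illustration of the two quantities used to place each class, the Python sketch below computes a proportion of irregular tokens (I) and a summed frequency from past-tense token counts. The function name, the input layout, and the counts are all invented for illustration and are not CoHA values; it also assumes that "summed frequency" means the total number of past-tense tokens across the verbs in a class, which may differ from the definition in the full paper.

```python
def class_summary(token_counts):
    """Summarise one similarity class of verbs.

    token_counts maps each verb to a pair
    (irregular_past_tokens, regular_past_tokens).
    Returns (proportion of irregular tokens, summed frequency).
    """
    irregular = sum(irr for irr, _ in token_counts.values())
    total = sum(irr + reg for irr, reg in token_counts.values())
    return irregular / total, total


if __name__ == "__main__":
    # Toy counts, invented for illustration only.
    toy_class = {"sneak": (30, 70), "sit": (100, 0)}
    proportion_irregular, summed_frequency = class_summary(toy_class)
    print(proportion_irregular, summed_frequency)
```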
Population turnover as a mechanism
Corpus data show that more frequent verbs are more likely to be irregular, and that for the most part, irregular verbs are stable despite pressure from the regular rule. This sheds light on how the dynamics of verb (ir)regularity work, but does not address the question of why they work in this way.
An agent-based model is ideal for addressing this question: why does irregularity exist primarily at high frequencies? The Naming Game is a well-established model wherein agents converge on shared names for meanings. To extend this to rules, we created a “regularity game” where agents play paired games to converge on shared rules for verbs. At each time step, two randomly chosen agents interact according to rules.
These rules could take many specific forms, but assuming they do not favour either the regular or irregular rule, frequency-dependent irregularity only emerges under specific conditions. First, agents must be capable of having not just two states (regular and irregular), but three potential regularity states. The third state is a “mixed” regularity state where both regular and irregular forms are acceptable, much like some speakers of English might use both sneaked and snuck for the past tense of sneak. Second, there must be some rate at which “child” agents, who have a natural preference for the regular rule, enter the population and replace “adult” agents. This model not only provides a basic framework for understanding how frequency-dependent irregularity is sustained in language, but can also generalise to any three-state system with biased replacement. For more, read the full paper.
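The two conditions above can be sketched as a small simulation of a single verb. This is a minimal sketch under stated assumptions, not the model from the full paper: the interaction rule is borrowed from the standard two-word Naming Game (a successful exchange collapses both agents onto the uttered form, a failed one moves the hearer to the mixed state), children are assumed to enter fully regular, and the function name, parameters, and default values are hypothetical.

```python
import random

REGULAR, MIXED, IRREGULAR = "R", "M", "I"


def play_regularity_game(frequency, replacement_rate,
                         n_agents=50, steps=20000, seed=0):
    """Simulate one verb and return the final share of irregular agents.

    frequency: chance per time step that the verb is used at all.
    replacement_rate: chance per time step that a random adult is
    replaced by a child (assumed here to start in the regular state).
    """
    rng = random.Random(seed)
    agents = [IRREGULAR] * n_agents  # the verb starts out irregular

    for _ in range(steps):
        if rng.random() < frequency:
            speaker, hearer = rng.sample(range(n_agents), 2)
            # A mixed speaker picks either form with equal probability,
            # so the interaction itself favours neither rule.
            if agents[speaker] == MIXED:
                form = rng.choice([REGULAR, IRREGULAR])
            else:
                form = agents[speaker]
            if agents[hearer] in (form, MIXED):
                # Success: both agents settle on the uttered form.
                agents[speaker] = agents[hearer] = form
            else:
                # Failure: the hearer now accepts both forms.
                agents[hearer] = MIXED
        if rng.random() < replacement_rate:
            # Population turnover with a bias towards the regular rule.
            agents[rng.randrange(n_agents)] = REGULAR

    return agents.count(IRREGULAR) / n_agents


if __name__ == "__main__":
    for freq in (0.9, 0.1, 0.01):
        print(freq, play_regularity_game(freq, replacement_rate=0.001))
```

In this sketch, turnover happens on a clock that is independent of how often the verb is used, so a rare verb goes through more replacements per interaction than a common one. Running the loop across a range of frequencies at a fixed replacement rate is one way to explore whether irregularity persists mainly at high frequencies under these assumed rules.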