Some lexical items show remarkable properties: they may lack internal consistency, they may be externally inconsistent and – in a fascinating minority – these two characteristics may be linked. This last type is rare, but it deserves special attention for what it tells us about the nature of lexical information and the way this is stored. I first review briefly the typology of lexeme-internal splits; these reflect phenomena such as suppletion, periphrasis, heteroclisis and deponency, and they reveal higher-level patterns. Then I summarize the types of external splits (notably different case and agreement requirements), and the conditions on such splits. This outline of internal and external splits forms the essential basis for analysing the unusual instances where internal and external splits are linked. Demonstrating that the splits are linked is not straightforward, and so I give four types of argument to establish that particular splits are indeed linked. The examples, from a range of languages, show subtle variations, and these differences allow us to see clearly the hierarchy of ways in which featural information is associated with lexemes.

1 Essentials: internal and external splits

The best dictionary entries embody the secure accumulated results of linguistic research. If the entry for enjoy gives no information about the past tense, we infer that this verb is regular, there is no split in its morphological paradigm, and hence its past tense form is enjoyed. Conversely, the past of go is specified as went. This verb has a split in its paradigm, an internal split, induced by suppletion. Besides full suppletion, we might find the paradigm split by lesser alternations, augments, heteroclisis, defectiveness or semi-deponency. Turning to external requirements, a Russian dictionary can state simply that the preposition k ‘towards’ takes the dative case value; this is invariably true, which means that this preposition is externally consistent. Contrast this with Russian po ‘for, about, by’, for which much more must be specified, since it governs different case values (accusative, dative and locative) according to complex conditions. That is, it shows a split in its external requirements.

The underlying notion here may be expressed as the Lexeme Consistency Principle:

A lexeme’s internal structuring and its external requirement are both consistent.

Of course, there are exceptions to this principle, as in the examples already given. Its useful function is as a baseline from which we can calibrate the interesting diversity we actually find. This method, setting unambiguous baselines, and calibrating carefully from them, is a hallmark of Canonical Typology.Footnote 1 So we start from the assumption that, given a basic lexical entry, there is nothing more to be stated: lexemes are internally and externally consistent. That is the baseline set by the principle; it is simply like agreeing to measure length from zero. We shall revisit briefly the typology of exceptions to this principle, internal splits in §1.1 and external splits in §1.2. For clarity, each type of split will be justified independently, before we go on to the main topic, the relations between such internal and external splits.

1.1 Internal splits

Here is a clear instance of an internal split.

  (1) [Italian andare ‘go’: paradigm with two suppletive stems; figure not reproduced]

It is evident that (1) is not internally consistent.Footnote 2 It shows two quite different stems, hence it is an instance of suppletion. There is a canonical typology of inflectional phenomena, which allows us to integrate suppletion and other phenomena such as defectiveness and periphrasis into a comprehensive scheme (Corbett, 2015a:149–158). But we can go further, taking a more abstract view, and ask more generally about how paradigms are split, irrespective of the phenomenon which induces the split. Four criteria have been proposed, which are introduced briefly here (see Corbett, 2015a:158–179 for the detail):

  1. C1:

    Form versus composition/feature signature: example (1) concerns just the form. The paradigm has the same featural scheme (in terms of person and number) as other Italian verbs. Only the forms are unpredictable from the inflectional system of Italian. This contrasts with paradigms where the split is a matter of the feature signature: it involves different features (e.g. Russian verbs in the past tense mark number and gender while the present marks person and number).

  2. C2:

    Justification: morphomic versus motivated: the pattern in (1), namely first and second persons plural versus the remainder, is one that is internal to the morphology (morphomic); it contrasts with patterns which are motivated from outside the paradigm (for instance, in terms of semantics).

  3. C3:

    Specification of pattern: lexically specified to fully regular: this criterion highlights the interest of our example. In terms of the forms, Italian andare ‘go’ is exceptional, being the only verb with these suppletive stems (apart from the derived riandare ‘to go again’). More abstractly, however, there are other verbs with irregular patterning involving the same cells, and so Italian andare ‘go’ must be lexically specified but it is not unique.

  4. C4:

    Relevance: ‘internal’ and ‘external’ splits: in the canonical case such splits are internal to the lexeme and have no effect in syntax. That is what we find with our Italian example; in this respect it behaves like other verbs. We do find internal splits which have external relevance, and these will be our main focus.

This is not an arbitrary list of criteria; rather there is an underlying logic, a ‘substruction’ (Lazarsfeld, 1937:133–136; Round & Corbett, 2020:493–496), which establishes how the criteria cover the theoretical space (Corbett, 2015a:177–178). We have seen, then, that there are lexemes with internal splits in their paradigm, independent of any external split. But a key point from that typology for present purposes is Criterion 4. This criterion allows for the situation where a split may not just be a morphology-internal split but may correspond to an external split too.

1.2 External splits

We now turn to external splits, setting them out in their own right first, before looking at possible connections to internal splits. Turkish (tur) provides a clear instance:

Turkish (Kornfilt, 1997:423–424)Footnote 3

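  (2) [Turkish: gibi ‘like’ with a genitive governee; figure not reproduced]
  (3) [Turkish: gibi ‘like’ with a nominative governee; figure not reproduced]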

The postposition gibi ‘like’ has an external requirement, namely the case value of its governee. And there is a split in this requirement: in (2) we find the genitive, but in (3) the nominative. The case value to be used is determined by specific conditions (I give fuller detail on this split in Corbett, under review). One of these conditions is that the part of speech of the governee is crucial: with nouns the nominative is used, with pronouns there is a more complex choice between nominative and genitive. There is a split too in the governors. While various postpositions in Turkish govern a single case value, gibi ‘like’ and three other postpositions (için ‘for’, -(y)lA/ile ‘with, by’ and kadar ‘as…as’) stand out as exceptional; these four are split from the remaining postpositions, since they have a split in their external requirement. This is true whether one takes a syntactic perspective, expecting adpositions to have consistent government requirements, or a semantic perspective, expecting case values (nominative and genitive here) to have consistent semantics.

This use of the term ‘split’ shares a unifying notion with other uses. We partition the lexicon into parts of speech (lexical categories); then there are regularities which can cross-cut the parts of speech, namely features. When we require an additional partition, in either dimension (parts of speech or features), we term it a split. Further, for given external relations we divide lexemes into primary (governor for case, controller for agreement) versus secondary (governee for case, target for agreement). The additional partitioning (split) may involve the primary lexeme and/or the secondary lexeme. To be specific, in the Turkish example, there is first a split involving the primary lexeme (the postposition which is the case governor). Turkish has a set of postpositions, justified by their common syntactic properties. We need to partition this part of speech further, since the special case government we are interested in is restricted to four postpositions (hence this is an inter-lexemic split). The secondary lexeme (the governee) is also affected: there is a split here since nouns behave differently from pronouns (an inter-lexemic split), and furthermore within the pronouns the number feature is invoked (a featural split). More generally, lexemes may be referenced in their entirety (thus a verb may govern a particular case whatever its own featural specification), or according to their featural specification (nouns require attributive modifiers to agree according to their number). The secondary lexeme equally has these two possibilities (for instance, nouns and pronouns are governed differently, or governees in the singular are treated differently from plurals). These four possibilities (induced by the oppositions entire lexeme vs feature specification, and primary vs secondary lexeme) allow us to build an elegant typology using this minimal machinery (Corbett, under review).

Clearly then, there are instances of external splits, independent of internal splits, and earlier we examined internal splits, also independently (§1.1). It is time to ask whether and how they may be related.

2 Internal and external splits: possible relations between them

We started from the idea that, given a lexical entry, there is nothing more to be stated: lexemes are internally and externally consistent. We then reviewed the interesting phenomena that deviate from these canonical baselines. Given that there are internal and external splits, and a typology for each, the next logical step is to ask about the relation between them. It is here that we call on the Extended Lexeme Consistency Principle, which has an additional final clause:

A lexeme’s internal structuring and its external requirement are both consistent.

Furthermore, they are consistent with each other.

This is the baseline against which we shall calibrate the examples we find. Since splits can be internal (§1.1) or external (§1.2), there are four logical possibilities for the relation, as in Table 1.

Table 1 Possible relations of internal and external splits

Let us take the possibilities in turn. The Russian preposition k ‘towards’ has no internal split; it cannot have, since Russian prepositions do not inflect. Its external requirements are consistent: it always governs the dative case. In these respects it is canonical and of no further interest here. That is the simplest type. More interestingly, Russian komnata ‘room’ inflects, and it does so without any split in its paradigm. Furthermore, it is externally consistent (the relevant external property is agreement, and this requirement is consistent). Thus it too has neither an internal nor an external split, though in principle both would be possible. In the second type, the verb go is split internally, having the suppletive stems go and went; but externally, it is consistent. Third, we consider items which have external splits, but no internal splits. There are various types of example here (to compare with type 1). We discussed Turkish gibi ‘like’ in §1.2. Like Russian k ‘towards’, the Russian preposition po ‘for’ cannot show an internal split, since it does not inflect; however, in contrast to k ‘towards’, po ‘for’ is a wonderful example of complex external splits, governing different cases according to the part of speech and number of the governee, as well as its particular function (see Muravenko, 2014; Corbett, under review, and references in both). Still, none of these could be internally split. A more interesting example therefore is Serbo-Croat gazda ‘landlord’ and many similar nouns. Internally they are consistent; they have no internal split. Yet externally they are masculine when singular, but hybrid when plural; that is, their agreements vary (masculine vs feminine) according to the agreement target (Marković, 1954; Corbett, 2015b:205–207 and references there; Franks, 2020:448–464). Thus they have an external split, but no internal one.

It is the fourth type which we shall concentrate on, lexemes which are split both internally and externally. The splits may or may not be linked. The less interesting type is that where there is simply no relation between the splits, and hence we cannot ask whether they are consistent with each other. As an example, Russian verbs have an internal split involving present vs past tense; the present realizes person and number, and the past has number and gender. The split involves the feature signature (see Criterion 1 in §1.1, and Corbett, 2015a:146, 157). Some also have an external split; thus ždat′ ‘wait for’ can take an accusative or genitive object. However, there is no relation between the internal, featural split and the external, government split. This may seem obvious, yet there are splits in government which are sensitive to the TAM properties of the verb. Hence we have to demonstrate that internal and external splits are linked, instance by instance; we cannot assume a link, nor indeed the lack of one. Where we can demonstrate a link, then we have the interesting issue of consistency between the splits.
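The four logical possibilities can be stated as a simple two-way classification. The following is only a minimal sketch: the function `relation_type` and the yes/no encoding are hypothetical, and the numbering of the types follows the order in which they are discussed in the text above. Note that type 4 covers both Russian ždat′ ‘wait for’ (two unrelated splits) and the linked cases of interest, so placing a lexeme in type 4 does not by itself show that its splits are linked.

```python
# Toy classification of the four logical possibilities for a lexeme,
# described only by whether it has an internal and/or an external split.
def relation_type(internal_split: bool, external_split: bool) -> int:
    """Return 1 (neither split), 2 (internal only), 3 (external only)
    or 4 (both; whether they are linked must be shown separately)."""
    if internal_split and external_split:
        return 4
    if internal_split:
        return 2
    if external_split:
        return 3
    return 1


# Examples discussed in the text: (internal split?, external split?)
EXAMPLES = {
    "Russian k 'towards'": (False, False),
    "Russian komnata 'room'": (False, False),
    "English go": (True, False),
    "Russian po 'for'": (False, True),
    "Serbo-Croat gazda 'landlord'": (False, True),
    "Russian ždat′ 'wait for'": (True, True),
}

for lexeme, (internal, external) in EXAMPLES.items():
    print(f"{lexeme}: type {relation_type(internal, external)}")
```

The classification only records presence or absence of each split; the substantive question, whether the two splits in a type 4 lexeme are linked, needs the further tests set out in this section.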

It is worth stressing at this point that there are many examples of splits with an apparent internal-external link, but where (disappointingly) the relation proves to be at best indirect, on closer examination. An example of just such a mistaken link that is often cited by statisticians concerns Dutch storks: statistics for a series of springs suggest a correlation between the number of storks nesting and the number of human births (see, for instance, Sapsford, 2006). But there is no direct causal relationship. Rather there is a third variable, the weather nine months earlier. We can find similar examples involving splits. Consider the following data on nouns from inflection class I in Serbo-Croat:Footnote 4

  (4) [Serbo-Croat inflection class I nouns, including grad ‘city’ with the augment -ov-; figure not reproduced]

The paradigm of the noun grad ‘city’ is split by the absence or presence of the augment -ov- as shown (Browne, 1993:319–320). The conditions determining which nouns take this augment are complex, interesting and changing (see Baerman et al., 2017:93–98). What is relevant here is that for all cells of the paradigm which take this augment, agreement is plural. We might be tempted to suggest that the agreements are causally linked to the augment. But this would be falling into the Dutch stork trap. There are other nouns, like prozor ‘window’, which do not take the augment. But they take exactly the same set of agreements as those with the augment. In other words, nouns like grad ‘city’ in (4) have an internal split in their paradigm, conditioned by number. Agreement is also conditioned by number, and equally so for nouns like grad ‘city’ and for prozor ‘window’. Number is the third variable here: there is no direct link between the augment (a purely morphological phenomenon) and the agreement; we would not argue that grad ‘city’ has an external split directly linked to its internal split. Rather both are linked to number. The effect of the third variable, number here, does mean that the data fit with the Extended Lexeme Consistency Principle. This is to be expected, since morphological and morphosyntactic factors align in many instances. But this is not our prime concern: rather we are interested in examples where there is demonstrably a direct linkage between internal and external splits, and where it therefore makes sense to analyse whether the splits are consistent.Footnote 5

To clarify what is needed, here is an operationalization of how to identify the phenomenon of interest. The operationalization consists of the following steps for analysing a given lexeme:

  (1) specify the lexeme’s internal split: that is, give a morphosyntactic feature specification (e.g. singular vs plural), or list the cells (e.g. genitive singular vs remaining cells), for internal behaviour A vs internal behaviour B (e.g. suppletive stem A vs suppletive stem B);

  (2) specify the lexeme’s external split: that is, give the different external behaviours, external behaviour A vs external behaviour B (e.g. takes feminine agreement vs takes masculine agreement);

  (3) establish that external behaviour A is found together with internal behaviour A, and external behaviour B is found together with internal behaviour B;

  (4) check that this external split is not found with lexemes which lack the internal split (the example of the Serbo-Croat augment (4) fails at this point).
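The four steps can be sketched as a check over a toy lexicon. This is a minimal sketch under stated assumptions, not a procedure proposed in the literature: the `Lexeme` record, the functions `is_split` and `linked_split`, and the cell and behaviour labels are all hypothetical, and the toy entries simplify the Serbo-Croat facts discussed in this section to singular vs plural.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Lexeme:
    """Hypothetical toy record: for each cell ('sg', 'pl'), the internal
    behaviour (e.g. inflection class) and the external behaviour (e.g. agreement)."""
    name: str
    internal: dict[str, str]
    external: dict[str, str]


def is_split(behaviour: dict[str, str]) -> bool:
    """A lexeme is split if its cells show more than one behaviour."""
    return len(set(behaviour.values())) > 1


def linked_split(lex: Lexeme, lexicon: list[Lexeme]) -> bool:
    # Steps 1 and 2: there must be both an internal and an external split.
    if not (is_split(lex.internal) and is_split(lex.external)):
        return False
    # Step 3: internal and external behaviours must pair off one-to-one.
    pairs = {(lex.internal[c], lex.external[c]) for c in lex.internal}
    if len({i for i, _ in pairs}) != len(pairs) or len({e for _, e in pairs}) != len(pairs):
        return False
    # Step 4: every lexeme showing the same external split must also be internally split.
    return all(is_split(o.internal) for o in lexicon if o.external == lex.external)


# Toy entries (labels are simplifications for illustration only).
oko = Lexeme("oko 'eye'", {"sg": "IV", "pl": "III"}, {"sg": "neuter", "pl": "feminine"})
uho = Lexeme("uho 'ear'", {"sg": "IV", "pl": "III"}, {"sg": "neuter", "pl": "feminine"})
grad = Lexeme("grad 'city'", {"sg": "I", "pl": "I + -ov-"},
              {"sg": "singular agreement", "pl": "plural agreement"})
prozor = Lexeme("prozor 'window'", {"sg": "I", "pl": "I"},
                {"sg": "singular agreement", "pl": "plural agreement"})
gazda = Lexeme("gazda 'landlord'", {"sg": "II", "pl": "II"},
               {"sg": "masculine", "pl": "masculine/feminine"})
lexicon = [oko, uho, grad, prozor, gazda]

print(linked_split(oko, lexicon))    # True
print(linked_split(grad, lexicon))   # False: prozor has the same agreements without the augment
print(linked_split(gazda, lexicon))  # False: external split only
```

The toy grad entry fails only at step 4, which mirrors the point made about the augment above: its agreements pattern with number, exactly as they do for prozor ‘window’.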

A worked example is presented in §3.1.

So far we have discussed how to demonstrate a linkage between an internal and external split, namely that they co-vary. When such a linkage can be demonstrated, we can ask further whether there is evidence to demonstrate directionality, namely that one split determines the other. As we shall see, there are instances where there is such evidence.

Given that these linkages need careful analysis, we should ask what type of evidence would be relevant, for examples which are less straightforward than the augment in (4). In order to establish which data count, there are four lines of argument. The first is general plausibility: alternative explanations would require a degree of coincidence which is highly unlikely (§3). The other three arguments are more specific versions of this one. In §4 we look at overabundance, where alternative inflectional forms link to different external requirements; in §5 we move on to variation in time and space, and consider instances where differences in inflection are linked to different external requirements, as both vary over time or space. In §6 we consider the specific and revealing types of split in pluralia tantum nouns. §7 is devoted to the significance of all these splits, including a constraint on featural information, and an extension to the scope of the Agreement Hierarchy. As we analyse the examples, we shall find some evidence for internal splits determining external splits. Given this, we should bear in mind the Principle of Morphology-Free Syntax, according to which rules of syntax do not have access to purely morphological features (such as inflection class) nor to the internal structure of a word (such as whether it is split); it might appear that our instances represent challenges to the principle. As we shall see when we review the data in §7.4, the principle remains secure. General conclusions are presented in §8.

3 Argument 1: plausibility

In several instances, the relation of internal inflectional form with external requirement is simply too unlikely to be coincidental. The relation may be seen as unusual within the confines of the particular language and/or cross-linguistically. Some of our examples are heteroclites, and these can usefully be introduced together. ‘Heteroclite’ was used by classicists to mean different or irregular; thus Matthews (2007:323): ‘a noun whose inflection follows something other than the regular pattern is traditionally “heteroclite”’. In modern linguistics its range has been narrowed to indicate specifically those items whose paradigms are split between two or more inflection classes (see Kaye, 2015:1–7, 15–29).Footnote 6 Typically heteroclites have only internal splits: we shall be interested in the small subset of them which also have external splits. However, whatever the type of internal split, the key point in this section is that the relation between the internal and the external split is striking.

3.1 Serbo-Croat oko ‘eye’

We begin with a clear-cut example of a noun with an internal split and a split in its external requirements:

  (5) [Serbo-Croat oko ‘eye’ (plural oči): paradigm; figure not reproduced]

The noun oko ‘eye’ has a clear internal split between singular and plural, and in two interrelated ways. First, it has an irregular stem alternation: the change of consonant k to č before -i is not a synchronic alternation in the inflectional morphology; a different alternation would be expected in the plural dative, instrumental and locative, as we see in klupko ‘skein of yarn’, namely k to c. Second, it is a heteroclite: we find inflections from two different inflection classes, class IV in the singular and class III in the plural.Footnote 7 While there are several thousand nouns in inflection class IV and in III (Tošović, 2016), there are just two nouns like oko ‘eye’; the only other noun that behaves in the same way is uho / uvo ‘ear’, with plural uši.

What then of the external requirement? Gender assignment in Serbo-Croat depends first on semantics (nouns denoting females are feminine and those denoting males are masculine). That does not apply here. Second, it follows inflection class (nouns of inflection class I are masculine, II and III feminine, and IV neuter). This is what we find with oko ‘eye’ and uho ‘ear’. We expect class IV nouns to take neuter agreement, and class III nouns to take feminine agreement, and so these two nouns switch gender between singular and plural (as Wechsler & Zlatić, 2003:40 put it, they co-vary ‘in lockstep across the singular/plural divide’):

Serbo-Croat

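  (6) [Serbo-Croat agreement example with oko ‘eye’; figure not reproduced]
  (7) [Serbo-Croat agreement example with oko ‘eye’; figure not reproduced]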

We should now consider the strength of the evidence provided by the co-occurrence of these two splits. The first point is that we can find the two types of split independently. We can have an internal split according to number, with no external effect. Within Serbo-Croat, in (4), we saw the split in the inflection of grad ‘city’, which is conditioned by number; this internal split has no additional external effect, and for agreement purposes grad behaves like any normal noun. Similar examples outside Serbo-Croat are not hard to find; for example the split in Slovenian človek ‘man, person’, plural ljudje (Corbett, 2007a:30), has no external effect. We can also find the external split without the internal. Within Serbo-Croat, as noted earlier (Table 1), there are nouns like gazda ‘landlord’ and vladika ‘bishop’, which are masculine in the singular but masculine and feminine in the plural (with Agreement Hierarchy effects, that is, agreements vary according to the agreement target, see §7.2). These have an external split but no internal split.

We see that the particular internal split of oko ‘eye’ has analogues both within Serbo-Croat and cross-linguistically. Similarly, parallels to the external split exist within and outside Serbo-Croat. The linkage of this internal and this external split is, however, rare within Serbo-Croat. There are just two nouns which behave this way, oko ‘eye’ and uho ‘ear’. It is not even all the paired body parts, since ruka ‘hand, arm’, and noga ‘foot, leg’ behave normally (internally in terms of inflection and externally in terms of agreement).

To check that this example meets our definition, let us run through the operationalization from §2.

  (1) specify the lexeme’s internal split:
      singular A vs plural B (evidence: stem alternation and inflection class);

  (2) specify the lexeme’s external split:
      neuter agreement A vs feminine agreement B;

  (3) establish that external behaviour A is found together with internal behaviour A, and external behaviour B is found together with internal behaviour B:
      yes: neuter agreement A is found with singular inflection A, and feminine agreement B is found with plural inflection B;

  (4) check that this external split is not found with lexemes which lack the internal split:
      yes: one other item shares the external split (uho ‘ear’), and it also has the internal split.

Thus oko ‘eye’ is indeed an example of what we are looking for; the operationalization makes that clear. A further argument that this link between the internal and external split is not coincidental is that it has persisted over time. The link goes back to the earliest attestations of Slavonic: in Old Church Slavonic (chu), oko ‘eye’ and uxo ‘ear’ were neuter in the singular; oči and uši were then the corresponding dual forms, and they took mainly feminine agreement (Vaillant, 1964:111–112, 168–169; Olander, 2015:189–191). According to Vaillant, with paired body parts the dual forms tended to supplant the plurals, which were rare. It is the dual forms oči and uši which have survived in Serbo-Croat as the plural forms. Thus this internal-external split is stable: it has persisted for some thousand years.Footnote 8

We can go further; it is not just that the two splits are linked. It is reasonable to conclude that the external split is determined by the internal split. The inflection class of the singular (IV) determines the gender of the singular (neuter), and the inflection class of the plural (III) determines the gender of the plural (feminine). This follows the gender assignment system of Serbo-Croat, mentioned earlier, which has been justified independently of the special case of oko ‘eye’ (Corbett, 2009b:152); in the full system, four inflection classes map onto three gender values. With oko ‘eye’ the directionality is particularly clear. We see this if we try the opposite prediction, gender to inflection. Suppose we were to specify the plural as irregularly feminine, and then attempt to predict the inflection class from that. We would expect to find the majority class II, and this would incorrectly predict the plural in -e, namely *oke, like ruke ‘hands, arms’. However, by specifying the plural as irregularly belonging to inflection class III, and predicting the gender value from this, we obtain the right prediction, namely feminine gender.
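The asymmetry between the two directions of prediction can be made concrete with a toy mapping. This is a minimal sketch, assuming only the four-class, three-gender assignment stated earlier in this section (the semantic rule, which does not apply to oko ‘eye’, is left out); the names `CLASS_TO_GENDER`, `gender_from_class` and `classes_for_gender` are hypothetical.

```python
# Serbo-Croat gender assignment by inflection class, as stated in the text:
# class I masculine, II and III feminine, IV neuter (four classes, three genders).
CLASS_TO_GENDER = {"I": "masculine", "II": "feminine", "III": "feminine", "IV": "neuter"}

# Majority feminine class according to the text; used as the default guess.
MAJORITY_FEMININE_CLASS = "II"


def gender_from_class(infl_class: str) -> str:
    """Class -> gender is a function: each class yields exactly one gender."""
    return CLASS_TO_GENDER[infl_class]


def classes_for_gender(gender: str) -> list[str]:
    """Gender -> class is not a function: list every class compatible with the gender."""
    return [c for c, g in CLASS_TO_GENDER.items() if g == gender]


# Direction 1: specify the plural of oko as class III and predict its gender.
print(gender_from_class("IV"))   # neuter (singular oko)
print(gender_from_class("III"))  # feminine (plural oči), the correct result

# Direction 2: specify the plural as feminine and try to predict its class.
print(classes_for_gender("feminine"))  # ['II', 'III']: ambiguous
print(MAJORITY_FEMININE_CLASS)         # II: the default, which would wrongly give *oke
```

Because the mapping is many-to-one, only the class-to-gender direction yields a unique and correct prediction for oko ‘eye’, which is the point made in the paragraph above.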

Serbo-Croat oko ‘eye’ demonstrates a linkage between an internal and an external split; furthermore it shows the directionality: the internal split determines the external split. I have spelled out this example in some detail; this will allow further cases to be presented more briefly.

3.2 Latin balneum ‘bath’

This is a venerable example of a heteroclite, cited by Baerman (2007:16) and discussed more than two millennia earlier by Varro (de Melo, 2019). Balneum ‘bath’ is comparable to Serbo-Croat oko ‘eye’, except that its stem does not change. (The regular nouns in (8) are from Risch, 1977:231.)

  (8) [Latin balneum ‘bath’ compared with regular nouns; figure not reproduced]

Varro’s comments on this heteroclite are given, translated and discussed by de Melo (2019:151, 514, tr. 515, 566, tr. 567; 1068, 1154–1155, 1257).Footnote 9 Varro does not comment on our main concern, the external requirements of balneum ‘bath’. However, the examples he gives illustrate nicely what is needed:

Latin (Varro: de Melo, 2019: 566)Footnote 10

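  (9) [Latin example from Varro with balneum ‘bath’; figure not reproduced]
  (10) [second Latin example from Varro with balneum ‘bath’; figure not reproduced]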

Thus the heteroclite balneum ‘bath’ takes its singular and plural forms from different inflection classes,Footnote 11 and the gender values required are those we would expect according to these inflection classes. The internal split and the external requirement (agreement) co-vary.Footnote 12 Balneum ‘bath’ is not quite unique: epulum ‘banquet, feast’, plural epulae, behaves similarly.

3.3 Sɛlɛɛ inquorate genders

The large Niger-Congo family provides numerous examples of external splits linked to internal splits. The linguistic tradition for these languages is slanted towards diachronic investigation, in a way that tends to mask the regularities of the synchronic systems; for recent discussion see Babou and Loporcaro (2016:3–6) and Bach (2018:225–233). However, the notion of ‘inquorate genders’ (agreement classes which comprise a small number of nouns, and whose agreements can be readily specified as an unusual combination of forms available for agreement with nouns with the normal gender values) is useful here, particularly since researchers working on this family are often careful to give good data on the number of nouns in each gender.

We look particularly at Sɛlɛɛ (snw), based on Agbetsoamedo (2014); see also Di Garbo and Agbetsoamedo (2018:185–186). Sɛlɛɛ is spoken in Santrokofi in the Volta Region of Ghana, and is one of the Ghana-Togo-Mountain languages. The classification of these languages is difficult, but they belong within Atlantic-Congo and ultimately within Niger-Congo. Sɛlɛɛ has five main gender values, as shown by various agreement targets, particularly within the nominal phrase. Table 2 gives the markers on the proximal demonstrative (-mle) for illustration:

Table 2 The main gender values of Sɛlɛɛ

Note the syncretisms in Table 2: genders I and IV share their singular agreement (hence the ordering in the table), while II and III share their plural agreement. Under ‘inflection class’ the numbers indicate the prefixal markers on nouns for singular and plural (and the inflection class is a strong predictor of gender). As example (11) shows, the prefixes on the nouns can be phonologically similar to the agreement marker, but this is not invariably the case (as seen in the plural of (14) and (15)).

Sɛlɛɛ: Agbetsoamedo (2014:110, from Acts 1.14)

  (11) [Sɛlɛɛ example showing noun prefixes and agreement markers; figure not reproduced]

Each of these five gender values comprises a substantial portion of the noun lexicon (at least 10% in Agbetsoamedo’s corpus of 552 nouns). Our interest is in those which do not fit into the main gender values. Consider, for instance, the noun kɔ-nɛɛ ‘hand, arm’. To make clear the similarity to previous examples from other languages, I present it with other regular nouns, with its small paradigm set out in a column (hence rotated from Table 2).

  (12) [Sɛlɛɛ kɔ-nɛɛ ‘hand, arm’ set out in a column alongside regular nouns; figure not reproduced]

When singular, kɔ-nɛɛ ‘hand’ has the same inflection as nouns like kɔ-pa ‘machete’. When plural it takes the same marker as n-futu ‘stomachs’. Inflectionally, then, it is like the examples we have seen from other languages, except that it has a smaller paradigm. What then of the agreements? Here are the key examples:

Sɛlɛɛ: Yvonne Agbetsoamedo (personal communications 27 Oct. 2020 and 1 Dec. 2020)

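  (13) [Sɛlɛɛ agreement example; figure not reproduced]
  (14) [Sɛlɛɛ agreement example; figure not reproduced]
  (15) [Sɛlɛɛ agreement example; figure not reproduced]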

These examples show that kɔ-nɛɛ ‘hand’, when singular, not only has the same marker as nouns like kɔ-pa ‘machete’, but also takes the same agreement. When plural it takes the same inflection as n-futu ‘stomachs’, and the same agreement. In other words, it is like a gender III noun when singular and a gender V noun when plural. Why do we not simply add another gender value? When we look for similar nouns we find just three more: kɔ-kpa ‘leg’ (plural n-kpa) and the derived diminutive/derogatory nouns ka-nɛɛ-nyi or simply ka-nɛɛ ‘tiny hand/arm’ and ka-kpa-nyi or ka-kpɛɛ ‘tiny leg’. Thus there are only four nouns of this type, out of Agbetsoamedo’s corpus of 552 nouns. Seen in this way, the data from Sɛlɛɛ are fully comparable to those from Serbo-Croat and Latin (and there are further examples in Sɛlɛɛ, for which see Agbetsoamedo, 2014). The nouns in question are heteroclites, in that they take material from two different inflection classes (the noun paradigms are small in Sɛlɛɛ), and the resulting gender values are inquorate, with just four members.

The Sɛlɛɛ situation is not unusual; there are numerous comparable examples within Niger-Congo, and I list just three here. In Noni (nhu), an Eastern Beboid language of Cameroon, there are just six nouns with an irregular gender pairing (an external split), as detailed in Hyman (1981:8). We do not recognize an additional gender value (they are inquorate). Of these six, four have irregular plurals (an internal split). For Cicipu (awc), a Benue-Congo language of northwest Nigeria, McGill (2009:241–252) provides a clear discussion, giving the number of nouns in each gender; again the internal and external splits line up. Finally, Sagna (2019:594–597) describes Eegimaa (Banjal, bqj), which has two inquorate genders with one or two members; in these, an unusual internal split (inflection) co-varies with an external split (agreement).

3.4 Scottish Gaelic muir ‘sea’

A remarkable instance is found in Scottish Gaelic (gla). In some dialects, such as that of Lewis, we find a change in gender value, conditioned by case. The item in question is muir ‘sea’ (Lamb, 2008:206):

(16) [figure p]

The paradigm is internally split, with the genitive singular having an unpredictable form (irregularity in Scottish Gaelic nouns usually targets the genitive); the expected genitive singular forms would be a’ mhuir (definite) and muir (indefinite) (William Lamb, personal communication 9.10.2020). This irregular cell brings with it an external split, since it requires a different gender value, as will be demonstrated. Use of the dative is restricted to government by some prepositions, and the gender value there can be established only for older speakers. This is because the evidence for the gender value in the dative is adjectival agreement, and, as William Lamb points out (personal communication), younger speakers tend to lose this particular gender distinction. Hence we concentrate on the nominative and genitive. The stem mar- is also found in the plural. However, Scottish Gaelic does not distinguish gender in the plural; this is why gender values are given after the singular forms in (16).Footnote 13

Consider now the agreement data:

Scottish Gaelic (Joan MacDonald consultant, from Lewis; William Lamb, personal communications, November 2020)

(17) [figure q]
(18) [figure r]

For these examples, the first part is as quoted previously (Corbett, 2015a:170) and here we do indeed see a change of gender from am muir ‘the sea’ (masculine) to na mara ‘of the sea’ (feminine), as shown by the gender agreement of the article. This follows what is traditionally stated. However, I take being of a particular gender to mean controlling agreement in that gender value consistently (that is, under all syntactic conditions, for all agreement targets); this leads us to seek further evidence. Relative clauses provide no data. But when we consider the agreement of personal pronouns, as in the second part of these examples, we see that in both instances we have the masculine pronoun, with the feminine being unacceptable. Thus muir ‘sea’ does not simply change gender; rather, in the genitive (example (18)), it is a hybrid: it is feminine for attributive agreement but masculine for agreement of the personal pronoun.

The data are significant for how featural information is specified. However, it is hard to tease out the different possibilities here. I therefore defer further discussion to §7.1.2 and §7.1.3, by which point further key data will be available.

3.5 Serbo-Croat dete ‘child’, deca ‘children’

The noun dete ‘child’, with the plural deca, is a dramatic instance of linked internal and external splits. This is its paradigm (Ekavian forms):

(19) [figure s]

The singular of dete ‘child’ follows the small class of nouns like dugme ‘button’.Footnote 14 However, its plural is quite different. It has a different stem (and t ~ c is not a regular alternation in inflection). But more significantly, the inflections are not normal plural inflections; rather they match a singular inflection type (class II), as shown by the noun žena ‘woman’ in (19). As a result, the ‘wrong’ distinctions are available: deca ‘children’ has a distinct form of the vocative (something that no regular plural noun has), and its instrumental is also distinct, while for regular plurals it is syncretic with the dative and locative (Corbett, 2007b:39).

This striking internal split is clearly linked to an unusual external split. In the singular the agreements are neuter, since the other nouns which inflect according to this class are neuter in the singular. But in the plural the agreements of deca ‘children’ are unusual and complex (Corbett, 1983:76–93; Wechsler & Zlatić, 2003:50–60, 206–219, 2012; Hristov, 2013:336–341, and references there). There are situations where unambiguously feminine singular agreement is found (there are several instances of these examples on the web):

(20) [figure t]

Then there are others where the agreement is clearly plural:

(21) [figure u]

In (21), the auxiliary is unambiguously plural, and there are good arguments for saying that the gender/number form of the participle došla is neuter plural (first, the evidence from conjoining, and second the constraint of the Predicate Hierarchy, Corbett, 1983:77–78 and 87). If personal pronouns are taken as agreement targets, then here we find neuter plural ona or masculine plural oni, depending on the type of reading, with feminine plural possible if the children are all girls (Wechsler & Zlatić, 2003:51, 200, 205–211). In sum, deca ‘children’ can control different types of agreement; they differ according to the agreement target, and their distribution is subject to the Agreement Hierarchy (Corbett, 1983:76–88). The effect of the case of the agreement target, which is of special interest, is discussed in Corbett (under review). The key point is that the remarkable agreement possibilities of deca ‘children’ are directly attributable to its remarkable internal split.Footnote 15

3.6 What we learn from the five examples for the plausibility argument

The case studies in §3 come from a range of languages. In each there is co-variance between the internal split and the external split. Furthermore, there is evidence as to the nature of this linkage. This is clearest with Serbo-Croat oko ‘eye’. The internal split involves an irregular (semi-suppletive) stem alternation. The resulting set of the noun’s inflected forms does not match any of the regular inflectional patterns. This indicates that it is the internal split which determines the external one, namely the split in gender. Specifying an irregular set of gender values would not lead to the prediction of the formal irregularity, but predictions from inflection class to gender, for the parts of the split, are straightforward. Serbo-Croat dete ‘child’, deca ‘children’ is largely comparable. Again there is an irregular (semi-suppletive) stem alternation, and the set of inflected forms is highly unusual, showing that it is the internal split which determines the external one, namely the split in gender and number. The prediction from inflection class to gender in the singular is normal; in the plural, the strange set of agreements is predictable to a limited degree. Scottish Gaelic muir ‘sea’ belongs next, since it too has an irregular (semi-suppletive) stem alternation, showing a morphomic pattern. The evidence is harder to interpret, but it appears that the internal split determines the external one. Latin balneum ‘bath’ and the Sɛlɛɛ inquorate genders also show co-variance between the internal split and the external split, but they do not provide an additional argument for directionality. That is, in each language there is a good argument that in general gender depends on inflection class (except for nouns whose gender is assigned according to semantic principles); however, the nouns in question, where only the pairing of inflection classes is irregular, offer no additional argument.

4 Argument 2: overabundance

Items which show overabundance (Thornton, 2019a, 2019b) provide an additional type of argument. In the canonical situation each lexeme has a single realization for each featural specification; overabundance is the situation in which a given lexeme has more than one possibility in a particular cell or cells. For example, English burn has the past participle burned and burnt. If a lexeme has forms from different inflection classes, which are cell-mates, and these different possibilities also have different external requirements, this is a strong indicator that the two splits are linked. We focus on two main examples, which are interestingly different in nature.

4.1 Polish ręka ‘hand, arm’

Polish provides a cogent instance of an external split linked to internal overabundance; see the paradigm of ręka ‘hand, arm’ in (22):Footnote 16

(22) [figure v]

We see two instances of overabundance. The one in the instrumental plural need not detain us; while rękami is the regular form and rękoma is highly irregular, the patterns of systematic syncretism mean that there is no relevant choice for the possible agreement targets. That is, there is no feasible external split. The instance of overabundance in the locative singular, on the other hand, is highly significant. The rest of the paradigm implies a noun of inflection class II, like książka ‘book’, and hence feminine. The consonant alternation in ręce is expected in the locative singular. On the other hand, the cell-mate ręku is totally unexpected. Its origin is in the dual, but the form has been taken over as a locative singular; this form would fit into inflection class I (predicting masculine) like rok ‘year’, for instance, or inflection class IV (predicting neuter), like biurko ‘desk’, given that we have a velar-final stem (Rothstein, 1993:698–699). In the locative singular, masculine and neuter share the same agreement form (distinct from the feminine).

We examine each locative singular with an attributive modifier. The numbers to the right are the number of examples in the National Corpus of Polish (http://nkjp.pl/poliqarp/), searched 28 May 2020, for exactly these phrases:Footnote 17

Polish

(23) [figure w]
(24) [figure x]

The form that would be expected for a regular noun, that in (23), has feminine agreement, as would be predicted. Its irregular cell-mate in (24) has agreement forms which are clearly not feminine. Switching the agreements leads to unacceptable variants: consultants do not accept them, and there are no such examples in the corpus (checked 29 October 2020). Thus we see that there is one cell in the paradigm (the locative singular) where there are cell-mates which could induce different agreement; and we do indeed find different agreements. These are determined ultimately by the inflection class of the cell-mates, through the gender assignment rules. Elsewhere in the paradigm agreements are as expected. Thus we have an internal split which determines an external split in agreement requirements.

The situation is actually more complex, in a way which appears not to have been noted previously. The locative singular forms are not simply feminine or masculine/neuter, as we see when we check other agreement targets. The locative cannot stand in subject position, so we cannot look for predicate agreement. We therefore examine the agreement of the relative and personal pronouns:Footnote 18

(25) [figure y]

The personal pronoun is always feminine as in (25), and consultants had the same judgement for the relative pronoun. (A web example of the latter is: w ręku, która służy do pisania ‘in the hand, which (sg.f) serves for writing’.)

We shall consider further the significance of these data in §7.1.2 and §7.1.3. For now, note that the effect within the nominal phrase is not constrained by word order (ręku takes the same agreement for postposed as well as preposed modifiers).
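The cell-level override for ręka, and the hybrid pattern it produces, can be illustrated with a minimal Python sketch. It assumes, for illustration only, that attributive modifiers take their gender from the inflection class of the chosen cell-mate, while relative and personal pronouns take the gender of the lexeme's default class (II). The dictionary and function names are hypothetical, and this is not the author's formalism.

```python
# Hypothetical sketch (not the author's formalism): agreement with the
# locative singular cell-mates of Polish ręka 'hand, arm'.

# Inflection class -> gender, as described in the text.
CLASS_TO_GENDER = {"I": "m", "II": "f", "IV": "n"}

# The rest of the paradigm belongs to class II (like książka 'book').
LEXEME_CLASS = "II"

# Each locative singular cell-mate, with the inflection classes it fits.
LOC_SG_CELL_MATES = {
    "ręce": {"II"},       # regular: the expected class II locative singular
    "ręku": {"I", "IV"},  # irregular: velar-final stem fits class I or IV
}


def agreement(form, target):
    """Return the gender agreement a target shows with a locative singular form."""
    if target == "attributive":
        # Assumption: attributive modifiers follow the class of the cell-mate.
        genders = {CLASS_TO_GENDER[c] for c in LOC_SG_CELL_MATES[form]}
        # Masculine and neuter share one locative singular agreement form.
        return {"m/n" if g in ("m", "n") else g for g in genders}
    if target in ("relative", "personal"):
        # Assumption: pronouns follow the lexeme's class, whichever form is used.
        return {CLASS_TO_GENDER[LEXEME_CLASS]}
    raise ValueError(f"unknown target: {target}")


for form in LOC_SG_CELL_MATES:
    print(form, agreement(form, "attributive"), agreement(form, "personal"))
```

Under these assumptions, ręce yields feminine agreement for every target, while ręku yields masculine/neuter attributive agreement but feminine pronominal agreement: the hybrid pattern seen in (24) and (25).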

4.2 Serbo-Croat dokument ‘document’

Here we investigate a group of nouns with a highly significant split. Tošović (2016:97) lists 15 nouns with similar behaviour; they differ slightly, and we shall focus on dokument ‘document’, since it occurs relatively frequently, and judgements are clear.Footnote 19 The data come mainly from consultant work with Ljubomir Popović (Belgrade), also from Andrija Petrović, Marko Simonović, advice from Wayles Browne, and from corpus work. As will become clear, there are differences according to area, and I shall concentrate on the situation in the east. Here are the forms:

(26) [figure z]

We see that in the nominative and accusative plural, this noun is overabundant. We find the forms expected for a class I noun like prozor ‘window’ (as in (4) above), but also the forms for a class IV noun like klupko ‘skein of yarn’ in (5) above. This internal split is linked to an external split: the internal split and the agreements co-vary. Class I nouns are masculine and dokumenti takes masculine plural agreement, class IV nouns are neuter, and dokumenta takes neuter agreements, as these examples show:

(27) [figure aa]
(28) [figure ab]

We see that the different inflections in (27) and (28) give rise to different agreements: the internal split (overabundance in inflection class) co-varies with an external split (different agreements required). In an early discussion of dokument ‘document’ and the few similar nouns, Ignjatović (1963:216–219) provides textual examples from 1879–1960, in which agreement targets always take the gender value implied by the inflection class, as indicated in (26) above.

We have examined the overabundant cells in (26), and seen that the internal alternatives co-vary with the external alternatives. We must now consider the remaining part of the plural paradigm, the oblique cases, whose forms are shared by inflection classes I and IV. There are two reasonable hypotheses here.

Hypothesis I: the unusual behaviour of this noun (and similar ones) is restricted to the nominative and accusative plural. It is basically a class I noun, and there is nothing unusual about its oblique plural cells.

Hypothesis II: gender is by default a property of the lexeme. If this default is overridden, we then expect gender to be a property of the number sub-paradigms. The implication is that the plural sub-paradigm would be masculine or neuter, and there should be a difference in agreement with the oblique case forms.

For choosing between these two hypotheses, attributive modifiers will not help, since they show systematic syncretism across the genders in the oblique plural. However, we can use the relative pronoun, in a direct case, to tease apart the two hypotheses:

(29) [figure ac]

Both forms are accepted. This means that the noun, when in an oblique case in the plural, can be masculine or neuter, as in the direct cases. This remarkable situation is confirmed by numerous examples found in the Serbian Web Corpus (srWaC, accessed 23.11.2020). Masculine and neuter are thus possible throughout the plural: in the direct cases with overt overabundance, and in the oblique cases with ‘covert’ overabundance. It is not that the oblique cases are simply unspecified for gender, since the feminine relative pronoun *koje is excluded. All this shows that Hypothesis II is correct.

Consultants suggest that the forms in -a and the associated neuter agreement are more prevalent in the east than the west. This makes sense, since Ignjatović (1963:218) points out that the -a plural is ultimately of Latin origin, entering the language through administrative, legal or ecclesiastical terminology.Footnote 20 Serbian was traditionally open to borrowings where Croatian would create neologisms (Klajn, 2001:90–91). Nevertheless, we find several instances also in the west: in the Croatian National Corpus (Hrvatski nacionalni korpus v 3.0 beta, accessed 23.11.2020), and in the Croatian web corpus (hrWaC, accessed 23.11.2020).

From the perspective of borrowing too, the phenomenon is noteworthy. Nouns originally from Latin, with singular in -um and plural in -a, of neuter gender, were borrowed as internationalisms (via French or German, for instance). The singular -um is not a Serbo-Croat inflection, so it was either treated as a part of the stem, or dropped (the latter especially in the west).Footnote 21 Either way this gave a singular stem fitting readily into inflection class I. The plural -a is a regular inflection (from inflection class IV), but not one that previously combined with the singular of inflection class I, hence there arose the combination of heteroclisis and overabundance we have analysed. This is something outside the current typologies of borrowing (Arkadiev & Kozhanov, 2021). Consider, in particular, Gardani’s detailed typology (2020:270–272), based on the established notions of MATerial and PATtern borrowing. In our example we find MAT borrowing of the specific items, but no PAT borrowing. The pattern which results is novel (even though the -a inflection was well established in Serbo-Croat): a consonant-final stem with no inflection in the nominative singular is paired with -a in the nominative plural; given these forms, regular gender assignment gives a masculine-neuter pairing. What is special here is that the borrowing triggered a pattern of covert overabundance: the native oblique plural inflections in the nouns in question are treated as belonging to two different inflection classes, within the same nouns, and controlling the two appropriate gender values.

4.3 What we learn from the examples showing overabundance

The Polish and Serbo-Croat examples are intersections of heteroclisis and overabundance. They differ in ways that will prove crucial for what we can learn from internal-external splits.

In the paradigm of the Polish noun ręka ‘hand, arm’, there is a clear split between the cells belonging to inflection class II and the locative singular variant form. This is ‘distinct’ heteroclisis. The key cell, the locative singular, has two available forms, two ‘cell-mates’, and they belong to two different inflection classes. Those inflection classes assign different gender values: inflection class II assigns feminine, while inflection class I assigns masculine and IV assigns neuter. And this is what we saw in examples (23) and (24): the form belonging to inflection class II, like the rest of the paradigm, induces feminine agreement, and that belonging to inflection class I / IV induces masculine / neuter agreement.

Serbo-Croat dokument ‘document’ is a different type of heteroclite in that there are cells which are ‘shared’ between the contributing inflection classes. The plural oblique case forms could belong to inflection class I, which would induce gender value masculine, or to inflection class IV, which would induce gender value neuter. And as examples (29a) and (29b) show, we find both gender values. Again, there is a straightforward assignment of gender value according to inflection class. Since the mapping of inflection classes to gender values is typically many to one, the evidence from these heteroclite nouns can be readily accommodated. If one tried to analyse gender as the predictor, then it is not clear how specifying an irregular gender value could lead to the right prediction of inflection class. Hence we see again that the internal split determines the external one.

At this point we should ask why such options are not more frequently available. Since the Serbo-Croat plural inflection -ima (for dative, instrumental and locative) is shared by inflection classes I, III and IV (the genitive differs), we can consider why we do not find alternative agreements for all nouns in these classes in examples comparable to (29a) and (29b) (not, of course, for attributive modifiers, since they show syncretism for gender, but for the relative pronoun). The reason is that, by default, gender is assigned to nouns as complete lexemes. Thus Serbo-Croat grad ‘city’ and prozor ‘window’ in (4) above inflect according to inflection class I, and so are exclusively masculine, even for forms that are shared with other inflection classes. This holds for the lexeme, and whether particular inflections are unique to a given inflection class or shared across classes has no effect. Dokument ‘document’ is different in that it has cells with overtly overabundant forms which belong to different inflection classes. These are the nominative and accusative plural, where the forms dokumenti and dokumente belong to inflection class I (assigning masculine), while dokumenta, which can be nominative and accusative plural, belongs to inflection class IV (assigning neuter). The remaining plural cells fit with both inflection class I and class IV. For this noun, then, we have an override to the usual pattern of assigning gender to the entire lexeme: dokument is masculine, except that this specification is overridden for the whole plural sub-paradigm. In the plural sub-paradigm the overtly overabundant nominative and accusative induce masculine or neuter agreement, according to the inflection class selected. The remaining cells of the plural sub-paradigm are covertly overabundant; they fit with both inflection classes and take both the corresponding gender values, as can be seen with an agreement target like the relative pronoun, which makes the relevant distinction. Conversely, in the singular, forms like dokumentom (instrumental) are ambiguous between inflection classes, but corpus work confirms that only masculine agreement is found (for instance, the masculine relative pronoun koji). Thus there is no override in the singular: here the noun is of inflection class I, and hence masculine. All this shows again the general point that agreement deals in feature values, not in particular inflections or morphemes.
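The default-and-override reasoning for grad and dokument can be made concrete as a small lookup. The following is a minimal Python sketch under stated assumptions: the three-level lookup (form-specific class, then number sub-paradigm, then lexeme) and all dictionary and function names are hypothetical illustrations, not the author's formalism. The inflection classes, gender values and forms are those cited in the text; the nominative plural relative pronoun forms koji (masculine), koja (neuter) and koje (feminine) are the standard Serbo-Croat ones.

```python
# Hypothetical sketch (not the author's formalism): gender is looked up at
# three levels, most specific first: a form's own inflection class, then the
# number sub-paradigm, then the lexeme as a whole.

CLASS_TO_GENDER = {"I": "m", "IV": "n"}  # class I -> masculine, IV -> neuter

LEXEMES = {
    # grad 'city': class I, with gender assigned once to the whole lexeme.
    "grad": {"lexeme": {"I"}, "sg": None, "pl": None},
    # dokument 'document': class I by default, but the whole plural
    # sub-paradigm is overridden to fit both class I and class IV.
    "dokument": {"lexeme": {"I"}, "sg": None, "pl": {"I", "IV"}},
}

# Overtly overabundant nominative/accusative plural forms and their classes.
FORM_CLASS = {"dokumenti": "I", "dokumente": "I", "dokumenta": "IV"}

# Nominative plural relative pronoun for each gender value.
REL_PRON_NOM_PL = {"m": "koji", "n": "koja", "f": "koje"}


def gender_values(lexeme, number, form):
    """Return the set of gender values that `form` can control."""
    if form in FORM_CLASS:
        classes = {FORM_CLASS[form]}  # a distinct form fixes its own class
    elif LEXEMES[lexeme][number] is not None:
        classes = LEXEMES[lexeme][number]  # sub-paradigm override
    else:
        classes = LEXEMES[lexeme]["lexeme"]  # default: lexeme-level gender
    return {CLASS_TO_GENDER[c] for c in classes}


def relative_pronouns(lexeme, form):
    """Nominative plural relative pronouns compatible with a plural `form`."""
    return sorted(REL_PRON_NOM_PL[g] for g in gender_values(lexeme, "pl", form))


print(gender_values("dokument", "pl", "dokumenta"))
print(relative_pronouns("dokument", "dokumentima"))
print(gender_values("dokument", "sg", "dokumentom"))
```

Under these assumptions, dokumenta yields only neuter and dokumenti only masculine; an oblique plural form such as dokumentima yields both values (covert overabundance) and so admits koji or koja but never koje; while dokumentom in the singular, and grad in any cell, fall back to masculine alone. Placing the override at the sub-paradigm rather than at individual cells is what lets the shared oblique forms carry both values.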

Polish ręka ‘hand, arm’ is a more restricted instance than Serbo-Croat dokument ‘document’, since the override is for the locative singular only. The rest of the singular sub-paradigm belongs unambiguously to inflection class II. Therefore, only the locative singular shows the split in external gender requirement. The point that we need to take further, once we have examined other relevant examples, is that these two examples differ in the nature of the feature specification. In the plural, the variant dokumenta ‘documents’ is straightforwardly neuter: all agreement targets stand in the neuter. In contrast, with Polish ręka ‘hand’ in the locative singular form ręku, targets differ. Attributive modifiers are masculine / neuter, while others are feminine. Rather than having a straightforward gender specification, it is a hybrid. We return to this issue in §7.1.2 and §7.1.3 below. Further examples, which underpin the evidence provided by overabundance, are given in Appendix 1; these involve Polish, and Italian and Italo-Romance.

5 Argument 3: variation in time and space

Variation through time and across space provides a third type of argument. If, at the level of individual lexemes, we can observe forms from one inflection class replacing those of another, and this goes hand in hand with a change in the external requirement, then this is strong evidence for a linkage. Our evidence here comes from Asia Minor Greek (§5.1). Equally, if there is variation across space, and the two types of split co-vary, this too is good evidence for a linkage, and here we examine data from Old Frisian (§5.2). Scattered remnants, preserving the internal-external link, provide further support, as shown by data from Slovenian (Appendix 2).

5.1 Asia Minor Greek heteroclites

The Greek varieties (grk) spoken in eastern Asia Minor until 1923 provide more extensive evidence of internal-external linkage than most of the examples we have discussed. We find suffixes invading paradigms where they did not figure originally. These suffixes combine a derivational element (originally diminutive, but this meaning had long been lost) and an inflectional element. This creates heteroclitic nouns, and their agreements link to the mix of forms. The data are from Petros Karatsareas (2011a:228–253, 2011b, and personal communications 7 and 8 Aug. 2020 and 21 Jan. 2021). The suffixes in question are the genitive singular -iu, the genitive plural -ion and the nominative / accusative plural -ia; these belong to an inflection class associated with neuter nouns, but they have entered nouns in inflection classes whose members were originally masculine or feminine. The conditions sound ideal for our purposes. However, when we come to the external split, we face two problems in Asia Minor Greek. First, in some varieties, notably in Cappadocian, gender distinctions have been lost. Second, in varieties that preserve gender distinctions to some degree, gender is not distinguished in the genitive plural. This means that the best place to look is the nominative / accusative plural -ia, where it has become part of paradigms that were otherwise masculine or feminine, in those dialects that preserve gender distinctions, for instance Pontic:

Pontic Greek (Drettas, 1997:129, glossing from pp. 110–111, 119, 124)

(30) [figure ad]
(31) [figure ae]

In (30) we have a noun that belongs to an inflection class inducing masculine agreement, but the nominative plural inflection -ia leads to consistent neuter plural agreement (not just of the clitic article as here, but with other agreement targets too, Petros Karatsareas, personal communication). Similarly in (31) the noun belongs to an inflection class that normally leads to feminine agreement, but with -ia in the nominative and accusative plural we again find consistent neuter agreement.

There are two key points to retain from the Greek data. First, the phenomenon is widespread in Pontic, but it is also found in other Greek dialects and also in the standard language (see sources above). So this is a more general phenomenon than several of those discussed earlier. Second, the agreements are consistently neuter plural (in dialects where gender is distinguished), for all agreement targets. No Agreement Hierarchy effects have been observed. Since we are dealing with just the nominative plural (and the accusative, syncretic with the nominative), how then does this fit with the data from Scottish Gaelic and from Polish? The point is that gender is not distinguished in the genitive plural, hence the gender of the nominative / accusative plural is the gender of the plural. For nouns which take -ia, then, this leads to neuter gender being specified for the ‘entire’ plural sub-paradigm (analogous to what we saw for Serbo-Croat dokument ‘document’). Indeed it is part of a trend for the neuter to expand (for non-human nouns) in Asia Minor Greek (Karatsareas, 2011b:114). We should not be misled by the small paradigm here: if a noun is neuter ‘only’ in the nominative and accusative plural, it is neuter in the plural. Hence these Asia Minor Greek nouns fit well into the bigger picture: we see a singular-plural split, albeit in a small paradigm, and the plural is consistently neuter.

5.2 Old Frisian wīf ‘woman, wife’

Old Frisian (ofs) wīf ‘woman, wife’ allows us to see internal splits (the paradigm is split between inflection classes) and external splits (in gender agreement) co-varying in step with each other. The data come from the careful study by Fleischer and Widmer (2016) and personal communications from Jürg Fleischer (4 and 6 October 2020). The texts were selected from classical Old East Frisian legal manuscripts, written originally between the end of the thirteenth century and the middle of the fifteenth century (Bremmer, 2009:13–14). The item of note, Old Frisian wīf ‘woman, wife’, is a hybrid. It takes a feminine personal pronoun. For the relative pronoun and the predicate, Fleischer & Widmer are not aware of any examples with clear gender agreement (2016:221). However, in attributive position there are examples of neuter and of feminine agreement. This is a considerable move to the feminine as compared to the situation in Old High German (goh, Fleischer, 2012:175, 177, 178, discussed in Corbett, 2015b), and indeed as compared to modern German, while staying in accord with the Agreement Hierarchy (§7.2). For the noun’s external requirements, then, we look at attributive agreement and the linkage to the inflection of this noun. The essential information is given in Table 3 (modified from Fleischer & Widmer, 2016:223):Footnote 22

Table 3 Inflection of Old Frisian wīf ‘woman, wife’ and attributive gender agreement

Fleischer and Widmer (2016:223) give the inflections of wīf ‘woman, wife’, and for comparison the forms of the relevant inflection classes (neuter a-stems, feminine ō-stems and feminine i-stems), since the options are neuter and feminine.Footnote 23 In the nominative and accusative (syncretic for wīf ‘woman, wife’ but with some distinct forms elsewhere), the former system is preserved: no inflection (bare stem) and neuter agreement. In the dative case the inflections are ambiguous, shared across different inflection classes. Here we find both the earlier neuter agreement (syntactic agreement) and the innovative feminine agreement (semantic agreement).

The key case for our purposes is the genitive. This case value shows that there is an internal split in the paradigm. We find inflections from different inflection classes, -e from inflection classes that would predict feminine gender, and -es from the original inflection class, leading to neuter gender. Fleischer and Widmer (2016:225) give the following examples of genitives (with unattested forms in standardized orthography):

Old Frisian

a-stem inflection with neuter agreement (not *ō-stem inflection with neuter agreement)

(32) [figure af]

ō-stem inflection with feminine agreement (not *a-stem inflection with feminine agreement)

(33) [figure ag]

When wīf ‘woman, wife’ is inflected according to one inflection class (the a-stem inflection) we find only neuter agreement (32) in the nominal phrase, and when according to a different inflection class (the ō-stem inflection) there is only feminine agreement (33). Thus we see evidence for a linkage between inflection class and gender.

Thus far we have been generalizing over time and space. But we can do better, and look at the different manuscripts investigated. Table 4 is modified from Fleischer and Widmer (2016:227); I have arranged the manuscripts in chronological order (following Bremmer, 2009:13), running from the First Brokmer Manuscript (after 1276 but before circa 1300) to the Second Emsigo Manuscript (shortly after 1450).

Table 4 Old Frisian wīf ‘woman, wife’: examples with unambiguous indication of gender

In Table 4 the nominative/accusative shows a clear picture: the original inflectional form is preserved and attributive agreement is always neuter, in all the manuscripts. In the dative, the inflectional marker could be from different inflection classes, and we find several instances of the conservative neuter and innovative feminine agreement. In the genitive, the key case, we see a clear linkage: the agreement is always that predicted from the inflection class.Footnote 24

In broad-brush terms, the Old Frisian hybrid noun wīf ‘woman, wife’ is moving towards becoming a noun of feminine gender. But the progression is not straightforward, and arranging the examples in date order, as in Table 4, does not suggest a simple development. However, the extant manuscripts are distributed in space as well as time. As Fleischer & Widmer show, the data make better sense if interpreted geographically, as in Table 5, modified from Fleischer and Widmer (2016:229):

Table 5 Old Frisian wīf ‘woman, wife’: geographical distribution of examples from Table 4

In Table 5 the same manuscripts are now arranged according to their presumed place of origin (based on Bremmer, 2009:16). Innovative feminine agreement starts with the dative, where the inflectional morphology is ambiguous, and then may appear also in the genitive. Moreover, the development is least advanced in the manuscripts originating in western areas and most advanced in the east.Footnote 25 There is an apparent issue, in that the easternmost documents, Riustring 1 and 2, do not show instances of feminine agreement in the genitive. This apparent issue is actually grist to our mill in that, as Fleischer and Widmer (2016:229) explain, in these documents we do not find genitives in -e. In harmony with this inflectional difference, we do not find feminine agreement in these documents either.

The careful analysis of the Old Frisian legal texts by Fleischer & Widmer, for all the difficulties which such data present, reveals a clear picture, one which fits well with our typology. We see a noun whose inflectional morphology is split (with variability according to dialect area), and whose hybrid agreements are linked to this split. This is particularly notable in the genitive where forms from different inflection classes are found. These classes induce different gender values and, indeed, the gender of attributive modifiers co-varies with the inflection class. The pattern is seen most clearly when we look at the geographical distribution.

5.3 What we learn from the examples showing variation in time and space

Diachrony, as we saw with Asia Minor Greek heteroclites, and geographical distribution, as in Old Frisian wīf ‘woman, wife’, both provide clear evidence supporting the claim that the internal and external splits co-vary. Both also support the claim that the internal split determines the external. In Asia Minor Greek, it is the incursion of the specific inflection, from outside the paradigm, that leads to the internal split and creates the external split (gender agreement). The fact that this occurs both with nouns which were previously masculine and with others which were feminine shows that this is first a matter of inflection and only then of gender. With Old Frisian wīf ‘woman, wife’ we see a step-by-step shift in inflection class: the inflectional form shared by both inflection classes allows both gender values, while the incursive inflectional form brings the new gender value (feminine). Given how the surviving Old Frisian manuscripts are spread across the region, it is their geography rather than their dating that shows this effect most clearly.

6 Argument 4: pluralia tantum nouns

Several of our case studies have involved few examples (even just one) in a single language; that these rarities nevertheless support significant generalizations (as we see in §7) is all the more striking. We now turn to something found in larger groups of nouns in many languages, namely pluralia tantum nouns. These nouns show a particular type of split, between the existing and the missing part of the paradigm. Their properties have recently been analysed in detail (Corbett, 2019), so this section is short. What is important is that these nouns are of various types (not all are like scissors), and the differences prove significant for our purposes.

Consider first Russian sani ‘sledge’ (Zaliznjak, 1967/2002:57–61, 75–80), one of several similar nouns. Its behaviour is comparable to English scissors, and since it is found within a nominal system which inflects for (at least) six case values and two number values, the lack of singular forms is evident:

(34) [example not reproduced: the paradigm of Russian sani ‘sledge’]

Sani ‘sledge’ has regular forms for the plural, and has an internal split, since it has no regular singular forms. And it has a linked external split, since it requires plural agreement:

Russian

(35) [example not reproduced: sani ‘sledge’ with plural agreement]

Like English scissors, Russian sani ‘sledge’ can be used of one entity or more than one, but it always takes plural agreement. Indeed, the numeral odin ‘one’ is used in the plural with such nouns:

(36) [example not reproduced: plural odin ‘one’ with sani ‘sledge’]

Such nouns lack a morphological singular, and this lack links to the syntax, where they are consistently plural too. Their mismatch is between their morphology and syntax on the one hand (where only plural is possible) and the morphosemantics of number on the other (since reference can be to one or more than one entity). Nouns like scissors and sani ‘sledge’ represent a common type, but it is not the only one. Tsez xexbi ‘child(ren)’, for instance, has only plural forms (like Russian sani ‘sledge’), but externally it behaves differently: for one child it takes singular agreement, and for more than one it takes plural agreement (Comrie, 2001:381–383, discussed in Corbett, 2007b:31–38 and 2019:55–59). It is unremarkable in terms of morphosemantics and syntax; only its inflectional morphology is unusual. Tsez xexbi ‘child(ren)’ therefore has an internal but no external split. Thus pluralia tantum nouns may have an external split linked to an internal split, as shown by English scissors and Russian sani ‘sledge’ (Type 4 in Table 1), or they may have an internal but no external split, as with Tsez xexbi ‘child(ren)’ (Type 2 in Table 1).Footnote 26

There is a further important distinction to be drawn here. Pluralia tantum nouns like scissors and sani ‘sledge’ take consistent plural agreement. But a plurale tantum noun can also be a hybrid. Karlsson (1968) reported that in Finnish (fin), proper nouns like Yhdysvallat ‘the United States’ and Filippiinit ‘the Philippines’ normally took a singular predicate. In attributive position, however, both singular and plural adjectives were possible, with singular on the increase. This trend has continued, so that according to Hannu Tommola (personal communication, 5 June 2017), the singular is now usual, though occasional plurals are found in attributive position. The fact that pluralia tantum nouns can be hybrids will prove useful in §7.3.Footnote 27

Pluralia tantum nouns also provide a different argument bearing on directionality within the linkage of splits. We find examples, such as English scissors and Russian sani ‘sledge’, where there is an internal split and a co-varying external split. Then we find examples like Tsez xexbi ‘child(ren)’, where there is an internal but no co-varying external split. But we do not find the converse of Tsez, that is, a language where some nouns have to be lexically marked as taking only plural agreement while having no internal split (that is, having a normal singular and plural); see Corbett (2000:66–67). This distribution of possibilities implies that it is the internal split that determines the external one.

7 Significance: what we learn about features

When evaluating the significance of the examples we have analysed, we must bear in mind that the data are scarce. Indeed, finding examples is like panning for gold. A majority have been found in Indo-European languages, simply thanks to the long history of previous work, leading to reference grammars and large dictionaries, which greatly improve the chances of finding relevant data. We also saw evidence from Sɛlɛɛ (§3.3); there are further instances in Niger-Congo, but they are of the same general type. We also noted briefly relevant data from Finnish (§6). A second property of the data is that nouns, but not verbs, are represented; related to this, our examples involve agreement rather than government.Footnote 28 We need a fuller range of languages here. Despite these limitations, the examples that have been found are highly significant. First, they help establish the locus of featural information (§7.1), and indicate how the scope of the Agreement Hierarchy can be extended (§7.2). They require us to be clear about the role of frequency (§7.3), after which we can assess the importance of the data for lexicalism and morphology-free syntax (§7.4).

7.1 The locus of featural information

We can make sense of our diverse examples by starting from the assumption that lexical information is specified, by default, at the level of the lexeme. For instance, a lexeme has a particular gender value (and this may often be inferred from its inflection class). This is what we find in items with no external split. The interesting examples involve overrides to this general default, involving either a sub-paradigm (§7.1.1) or individual cells (§7.1.2).
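As a purely illustrative sketch (not a formalism proposed in this article), the default-and-override arrangement can be pictured as a lookup that tries the most specific level first: the cell, then the sub-paradigm, then the lexeme. The names `Lexeme` and `gender_for`, and the abbreviated values ('n', 'f', 'm/n'), are hypothetical. The Polish entry is deliberately simplified to the contrast described in §7.2 (masculine / neuter possible only for attributive targets with the locative singular ręku); a cell override is keyed by target, so that a cell-level override can yield hybrid agreement.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Lexeme:
    """Hypothetical model: gender specified by default at the lexeme level,
    optionally overridden per sub-paradigm (number) or per cell (number, case)."""
    name: str
    gender: str                                              # lexeme-level default
    subparadigm: Dict[str, str] = field(default_factory=dict)  # number -> gender
    # (number, case) -> {agreement target -> gender}; per-target values model hybrids
    cells: Dict[Tuple[str, str], Dict[str, str]] = field(default_factory=dict)


def gender_for(lex: Lexeme, number: str, case: str, target: str) -> str:
    """Return the gender value seen by a target, most specific level first."""
    cell = lex.cells.get((number, case))
    if cell is not None and target in cell:
        return cell[target]                 # cell-level override
    if number in lex.subparadigm:
        return lex.subparadigm[number]      # sub-paradigm override
    return lex.gender                       # lexeme-level default


# Serbo-Croat oko 'eye': neuter by default, plural sub-paradigm feminine.
oko = Lexeme("oko", "n", subparadigm={"pl": "f"})
# Polish ręka 'hand' (simplified): feminine, except attributive targets of locative singular.
reka = Lexeme("ręka", "f", cells={("sg", "loc"): {"attributive": "m/n"}})
```

On this view, oko returns the same value for every target within a number (consistent agreement), whereas ręka returns different values for different targets in one cell (a hybrid), which mirrors the contrast drawn in Preliminary generalizations 1 and 2.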

7.1.1 Featural information at the level of the sub-paradigm

We saw that Serbo-Croat oko ‘eye’ (§3.1) is straightforwardly neuter in the singular and feminine in the plural. This holds for all agreement targets. Similarly, Latin balneum ‘bath’ (§3.2) is neuter in the singular and feminine in the plural. Serbo-Croat dokument ‘document’ (§4.2) has the complication that it is overabundant, having two possible forms for the nominative and accusative plural. As we saw, it is in fact the whole plural sub-paradigm which has the two corresponding gender values: even in the cases with a single (ambiguous) form, both gender values are available for agreement. Pluralia tantum nouns like Russian sani ‘sledge’ (§6) also belong here: they have consistent agreement involving the plural sub-paradigm.

Asia Minor Greek helps motivate the distinction drawn above. We saw examples where the nominative / accusative form is taken from a different paradigm, and induces neuter agreement. However, the remaining form, the genitive plural, does not distinguish gender. Hence once the nominative and accusative are specified as neuter, based on the inflection class of the form, that is essentially the specification of the plural sub-paradigm. Finally, consider Sɛlɛɛ (§3.3). There the paradigm consists of just two cells, so specifying the plural sub-paradigm is, in a sense, the same as specifying the plural cell. But in fact it makes a difference. If we adopt the analysis requiring the lesser override, treating Sɛlɛɛ as overriding the plural sub-paradigm, then we might suggest:

Preliminary generalization 1 (to be refined in §7.1.3)

If there is an internal split at the sub-paradigm level,

and this determines an external split,

then the agreement is straightforward: that is, a single value is involved.

As we shall see, this is in marked contrast to those splits where the cell level is involved. Nevertheless, the proposed generalization needs refining, since it accounts for neither Serbo-Croat deca ‘children’ nor Finnish Yhdysvallat ‘the United States’ (a problem remedied in §7.1.3).

7.1.2 Featural information at the level of the cell

Polish ręka ‘hand’ differs in that the single cell involved, the locative singular, shows overabundance. However, the unexpected gender agreement has also been established to be hybrid: it varies according to target. Old Frisian wīf ‘woman, wife’ was known to be a hybrid, and the important study cited was designed to show the progression of the semantically justified agreement form (the feminine). It proves significant for our story in that the change is proceeding cell by cell, so that we need a different specification for each cell. The nominative and accusative preserve the old form and the old, syntactic agreement. The dative form is ambiguous as to inflection class, and it allows both agreements. The main interest is in the genitive, where two forms are available: syntactic agreement (neuter) is found with one, and semantic agreement (feminine) with the other. Thus the split has to be specified for individual cells. Whenever neuter agreement is possible, this indicates a hybrid, since the personal pronoun takes feminine agreement. These examples are consonant with the following:

Preliminary generalization 2 (to be refined in §7.1.3)

If there is an internal split at the cell level,

and this determines an external split,

then the agreement is hybrid (it varies according to the agreement target).

The surprising Scottish Gaelic example, muir ‘sea’, could be classified in two ways: (i) there is a morphomic split (singular nominative and dative, vs singular genitive and the plural); (ii) since gender is not distinguished in the plural, the plural sub-paradigm should be ignored and the split is singular nominative and dative, vs singular genitive. It is tempting to take the first analysis, thereby expanding the typology. But the simpler solution is the second. For this we look at the genitive singular cell, since the plural cells are not relevant to the external split. As we noted, it had been established earlier that this cell had a different gender value, but it had not previously been pointed out that it is hybrid: the gender value varies across targets. This fits under Preliminary generalization 2, and is the approach we take. If we were to take the other route, the morphomic split, then we would expand the first line of the generalization accordingly.

7.1.3 The constraint on featural information

We must be clear about the different types of split. Let us start from a canonical noun: this takes the same agreements to the degree possible; that is, it always takes the same gender value, and furthermore when singular it takes singular agreements, when plural it takes plural agreements, and so on. This holds true for all possible agreement targets; that is to say, it has a consistent agreement pattern (Corbett, 2006:11–12). Here the canonical type is unremarkable and there is no term for it. We need to distinguish two sorts of deviation from this canonical baseline, two sorts of split. First, there can be a split in that not all cells of the paradigm (not all featural specifications) behave alike; for instance, a noun may be neuter in the singular but feminine in the plural (like Serbo-Croat oko ‘eye’, as in §3.1). Second, there can be a split in that different targets behave differently; a noun may be neuter for attributive modifiers but feminine for the personal pronoun, making it a hybrid (German Mädchen ‘girl’ would be an example). And while the canonical noun has neither of these splits, some nouns have both, like Serbo-Croat gazda ‘landlord’ noted in §2, which has a singular-plural split: the singular is masculine, but the plural is masculine / feminine, according to the target. Thus it is a hybrid, just in the plural (Corbett, 2015b:205–207 and references there).

All of our examples have a split of the first type: there is a restriction to a part of the paradigm. Moreover, given our focus, in all of them I have argued that this split in agreement requirements is linked to their internal morphological split. So the question here is what we find externally: is the agreement consistent (the same for every target) or hybrid (different according to target, also called ‘mixed agreement’)? I shall be cautious, since the examples are relatively few. However, given how difficult they are to find, and how valuable they are, it makes sense to draw tentative conclusions, partly in the hope of stimulating further research. The generalization which covers the available data is that for a lexeme which has externally split requirements, if these are to be for consistent agreement (rather than hybrid), the split must involve a sub-paradigm (it must be motivated). That is to say, it is possible for Serbo-Croat oko ‘eye’, Latin balneum ‘bath’, Serbo-Croat dokument ‘document’, Russian sani ‘sledge’ and Asia Minor Greek heteroclitic nouns (and even Sɛlɛɛ kɔ-nɛɛ ‘hand’, §7.1.1) to have a straightforward external requirement (consistent agreement) for both sides of the split, because the split is motivated (it involves the singular-plural distinction). This is not possible for Scottish Gaelic muir ‘sea’, Polish ręka ‘hand’ or Old Frisian wīf ‘woman, wife’, since here the split is not motivated (it involves individual cells, or arguably a morphomic pattern for Scottish Gaelic).Footnote 29 This is evident in Table 6.

Table 6 The constraint on featural information

Table 6 reveals a pattern: examples of splits with consistent agreement all involve a motivated internal split (sub-paradigm); those with an internal split involving a single cell all have hybrid agreement. What, then, of the remarkable Serbo-Croat deca ‘children’? Here we have, arguably, a singular-plural split, and yet the noun is a hybrid. In brief (see §3.5; the details are in Corbett, 1983:76–81), in attributive position we find feminine singular agreement in all cases, except in the nominative, where it is ambiguous between feminine singular and neuter plural. The predicate requires plural agreement, and the gender agreement there is best analysed as neuter plural. The relative pronoun takes feminine singular or neuter plural (according to case), and the personal pronoun takes masculine, feminine or neuter plural (see Corbett, 1983:84–85, Wechsler & Zlatić, 2003:51).

We find a one-way implication: only lexemes with motivated splits may have straightforward external requirements (in this they are like canonical nouns with no split). The generalization is this:

Generalization 3 (replacing preliminary generalizations 1 and 2)

If there is an internal split,

and this determines an external split,

for agreement to be consistent,

it is a necessary, but not a sufficient, criterion that the split be a motivated one.
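Because Generalization 3 is a one-way implication, it can be checked mechanically. The sketch below encodes the examples exactly as they are classified in the surrounding discussion (level of the internal split; consistent vs hybrid agreement) and tests both directions. The tuple layout and the function names `is_necessary` and `is_sufficient` are illustrative assumptions; the classification of deca as a sub-paradigm split follows the text's own “arguably”.

```python
# (example, level of internal split, external agreement), as classified in §7.1
EXAMPLES = [
    ("Serbo-Croat oko 'eye'", "sub-paradigm", "consistent"),
    ("Latin balneum 'bath'", "sub-paradigm", "consistent"),
    ("Serbo-Croat dokument 'document'", "sub-paradigm", "consistent"),
    ("Russian sani 'sledge'", "sub-paradigm", "consistent"),
    ("Asia Minor Greek heteroclites", "sub-paradigm", "consistent"),
    ("Sɛlɛɛ kɔ-nɛɛ 'hand'", "sub-paradigm", "consistent"),
    ("Scottish Gaelic muir 'sea'", "cell", "hybrid"),
    ("Polish ręka 'hand'", "cell", "hybrid"),
    ("Old Frisian wīf 'woman, wife'", "cell", "hybrid"),
    ("Serbo-Croat deca 'children'", "sub-paradigm", "hybrid"),
    ("Finnish Yhdysvallat 'the United States'", "sub-paradigm", "hybrid"),
]


def is_necessary(data) -> bool:
    """Every lexeme with consistent agreement has a motivated (sub-paradigm) split."""
    return all(level == "sub-paradigm" for _, level, agr in data if agr == "consistent")


def is_sufficient(data) -> bool:
    """Every lexeme with a motivated split has consistent agreement."""
    return all(agr == "consistent" for _, level, agr in data if level == "sub-paradigm")


print(is_necessary(EXAMPLES), is_sufficient(EXAMPLES))  # True False
```

The necessity check passes, while the sufficiency check fails on exactly the two hybrids with motivated splits (deca and Yhdysvallat), which is the pattern the generalization is meant to capture.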

In the case of Serbo-Croat deca ‘children’, this noun could in principle have had consistent agreement, but it is a hybrid. It has highly unusual properties which may underlie its behaviour here: two features (gender and number) are involved in its hybrid behaviour, while hybrids typically involve just one. But mainly, the fact that it denotes humans means that its form-meaning mismatch is likely to make it a hybrid.Footnote 30 Pluralia tantum nouns illustrate the generalization well. Their split involves a sub-paradigm (the singular is missing), which is a necessary condition for consistent agreement: we find this with the Russian plurale tantum sani ‘sledge’. But it is not a sufficient condition, so they may be hybrids, as with Finnish Yhdysvallat ‘the United States’ (which remains a hybrid so long as the plural attributive is a possibility).

Several of our examples involve a split in gender agreement conditioned by number, and these data bear on a further issue. For consistent agreement in gender to be possible, the split must be conditioned by values of number (as the examples in Table 6 show). This generalization fits with various feature hierarchies which have been proposed over the years (e.g. Brown & Hippisley, 2012:57–62). It is also relevant to the claim by Carstairs-McCarthy (1994:771) that gender mixture can be based only on number. By this he means that when there are more controller genders than target genders, the extra value is always based on number (for example, there can be a class of nouns which are masculine in the singular and feminine in the plural, which if large enough gives a third gender value, as with the Romanian neuter). Enger and Corbett (2012:303–306) suggested that this claim does not hold, on the basis of data from Norwegian dialects, where gender agreement is split on the basis of definiteness, and from Scottish Gaelic muir ‘sea’, where gender agreement is split on the basis of case (§3.4 above). It was known that the Norwegian instance was a hybrid, but Scottish Gaelic muir ‘sea’ was believed to have consistent agreement, something we have just shown to be incorrect. Provided we restrict ourselves to normal gender values, those with consistent agreement patterns (which may well be what Carstairs-McCarthy intended), his generalization still holds.

Saving this generalization leads to a new issue. Scottish Gaelic muir ‘sea’, as just discussed, and Polish ręka ‘hand’ both have inconsistent agreement. Their external requirements suggest an Agreement Hierarchy effect, but to capture it would require us to extend the scope of the hierarchy, the issue to which we now turn.

7.2 Extending the scope of the Agreement Hierarchy

The Agreement Hierarchy consists of these positions, which we have already seen as relevant to some of our examples (see Fig. 1).

Fig. 1 The Agreement Hierarchy

attributive < predicate < relative pronoun < personal pronoun

The constraint is:

For any controller that permits alternative agreements, as we move rightwards along the Agreement Hierarchy, the likelihood of agreement with greater semantic justification will increase monotonically (that is, with no intervening decrease). (Corbett 1979, 2006:206–233)
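The constraint amounts to a monotonicity condition over an ordered sequence of target positions, which can be stated as a small check. In this sketch the function name and the likelihood figures used in the usage lines are invented for illustration only; they are not data from any of the languages discussed. “No intervening decrease” is read as non-decreasing, and positions for which a controller offers no evidence are simply skipped.

```python
# Target positions of the Agreement Hierarchy, from left to right.
AGREEMENT_HIERARCHY = ("attributive", "predicate", "relative pronoun", "personal pronoun")


def respects_agreement_hierarchy(likelihood: dict) -> bool:
    """True if the likelihood of the agreement with greater semantic justification
    never decreases as we move rightwards along the hierarchy.
    `likelihood` maps position names to values in [0, 1]; absent positions are skipped."""
    values = [likelihood[p] for p in AGREEMENT_HIERARCHY if p in likelihood]
    return all(left <= right for left, right in zip(values, values[1:]))


# Hypothetical figures, for illustration only:
print(respects_agreement_hierarchy(
    {"attributive": 0.0, "predicate": 0.2, "relative pronoun": 0.5, "personal pronoun": 0.9}))  # True
print(respects_agreement_hierarchy(
    {"attributive": 0.1, "predicate": 0.6, "relative pronoun": 0.3, "personal pronoun": 0.9}))  # False
```

The second pattern violates the constraint because the likelihood dips at the relative pronoun after rising at the predicate; a split hybrid such as Polish ręka ‘hand’ conforms as long as the feminine is counted as the (slightly) more justified option, as argued below.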

Can we use the Agreement Hierarchy to cover the more unusual instances we have met? In terms of the target positions it is perfect. The question is whether the constraint applies. The constraint refers to ‘greater semantic justification’, that is, a relative ranking of alternative agreement forms. This covers instances like Serbo-Croat deca ‘children’ (§3.5), where the possible agreements include feminine singular, neuter plural and masculine plural. We need to be able to say that neuter plural agreement (the ‘right’ number but the ‘wrong’ gender) shows greater semantic justification than feminine singular, but that masculine plural shows even greater semantic justification.

Consider now Polish ręka ‘hand’, where we saw that the unexpected locative singular form ręku can take masculine / neuter agreement, but only in attributive position. Once discovered, this restriction makes sense, since we do not expect case to ‘survive’ over longer syntactic stretches: the fact that the phrase is in the locative would not be a factor for gender and number outside the phrase. Yet this expectation is equally a reflection of the structural constraint of the Agreement Hierarchy. Inside the nominal phrase, then, we find masculine / neuter agreement, and outside it feminine. Is it reasonable to say that one of these has greater semantic justification? I suggest that the feminine has greater semantic justification, in that it facilitates access to the antecedent, which is of feminine gender (apart from one overabundant form). Scottish Gaelic muir ‘sea’ (§3.4) is similar: it is a masculine noun, except when the noun is genitive and the agreement target is within the nominal phrase. In previous analyses, hybrids have been discussed based on two types of mismatch: (i) meaning-meaning mismatches (English committee, where plural agreement matches the more general semantic distinction of plurality of items as opposed to the less general notion of ‘collection’, which does not relate directly to the semantics of number in English); or (ii) form-meaning mismatches (for example, mamma ‘mum’ in the Nordreisa dialect of Norwegian, where the form does not match the meaning); see Corbett (2015b) for sources and for discussion of both types. Instances like Polish ręka ‘hand’ illustrate the third logical possibility, a form-form mismatch. The locative form ręku does not fit with the rest of the forms of this noun, and induces a gender value that is less general than the one which applies to the rest of the noun.

It is important to recall that ‘greater semantic justification’ is a relative notion; it does not imply a large degree (any more than ‘longer than’ implies long); it runs from complete lack of semantic justification to full semantic justification. Can we calibrate the agreements with Polish ręka ‘hand’ according to this scale? The use of feminine agreements with all forms of Polish ręka ‘hand’ has some function, in that this gender value is that of the noun as a whole and it allows ręka ‘hand’ to be tracked as the antecedent. The masculine / neuter does not allow this. If we place feminine agreement with ręka ‘hand’ on the semantic justification scale, clearly it scores very low, but surely higher than masculine / neuter. To this extent, then, the feminine arguably has greater semantic justification than the masculine or neuter. We can treat Polish ręka ‘hand’ as a split hybrid (similarly Scottish Gaelic muir ‘sea’); it falls under the constraint of the Agreement Hierarchy and we do not need to introduce additional machinery for it.

7.3 The role of frequency

The evidence has consisted mainly of small classes (in terms of types); thus Serbo-Croat oko ‘eye’ has just uxo ‘ear’ showing similar behaviour. But even here we must be careful to avoid some plausible but false connections. We start from the observation that the “usual suspects” appear among the evidence, for example certain paired body parts, as in the Serbo-Croat examples just mentioned. An obvious inference here would be that we are dealing with frequent nouns. But we need to distinguish between absolute and relative frequency (e.g. Corbett et al., 2001:202–203, Hippisley et al., 2014:392–393). It is not primarily that oko ‘eye’ is frequent as a lexeme (there are many items which are more frequent); rather, its plural is frequent relative to its singular, and originally its dual was frequent relative to singular and plural. It is this relative frequency, I suggest, that has allowed it to maintain its heteroclitic status and its split in agreement for over 1000 years (Corbett, 2019:67–69). Having suggested that relative frequency is the more important factor here, we must also note a further potential source of confusion, namely that there is a correlation between relative and absolute frequency (Hay, 2001): in our instance, items with high relative frequency of the plural will also tend to be absolutely frequent.

Hay (2001) suggests that different meanings may also be associated with high relative frequency; she is mainly concerned with derivation, but the same argument should be considered for number. Here, again, we must keep the different factors apart: we may link relative frequency to semantic irregularity, and we may link it to morphological irregularity, but we should recall that singular and plural may diverge semantically, irrespective of any morphological irregularity (as with, for instance, ground–grounds, fund–funds, and so on). These singular-plural effects have led some to suggest that plural nouns are distinct lexemes (for instance, Baayen et al., 1997:866–868); these authors were concerned with Dutch, and the arguments are less convincing when we consider larger paradigms (as with oko ‘eye’). While their arguments accommodate nouns like oko ‘eye’ with different gender values in singular and plural, they work less well in explaining why it is so very often the case that nouns have the same gender value across numbers. This issue again illustrates the importance of looking at larger systems, in this instance items with larger paradigms.

Hence we need to keep apart relative and absolute frequency, the possibility of non-compositional use of number, and the potentially confusing properties of small paradigms. If we take account of all these factors, the examples we have examined provide telling evidence bearing on the location of featural specifications.

7.4 Lexicalism and Morphology-free Syntax

Our data involve the specific behaviour of a limited number of lexemes, in some instances a single lexeme. Each one of these can be taken as an argument for lexicalist models. The issues, including the different combinations of principles involved in lexicalism, are carefully laid out by O’Neill (2016). In particular, we have seen quite specific morphology-internal mechanisms at work, which are different in kind from the workings of syntax (for another particularly cogent example see Feist & Palancar, 2021). However, we have gone to some lengths to show a linkage between internal and external splits. Furthermore, some examples have provided evidence that it is the internal split that determines the external one. We must therefore clarify how such instances are consonant with the Principle of Morphology-free Syntax (Zwicky, 1992:354–356). This principle ‘prohibits any rule of syntax from making reference to the internal structure of a word or to purely morphological features’ (O’Neill, 2016:244).

As a representative example, we take Serbo-Croat oko ‘eye’ (§3.1). We saw that this lexeme combines material from two inflection classes and that this internal split determines its external split in its gender requirement. As we saw in §3.1, the language has principles of gender assignment, which depend: (i) on semantics (nouns denoting females are feminine and those denoting males are masculine); and (ii) when these do not apply, on inflection class (inflection class I implies masculine, II and III feminine, and IV neuter). Thus the morphological information required as part of the lexeme allows prediction of its gender, which is what the syntax has access to (the suggestion that the syntax might have access to inflection class in Serbo-Croat has been shown to be unnecessary; see Corbett, 2009b). The particular noun oko ‘eye’ follows exactly the same principles. Its morphology is split, this leads to two outcomes from the gender assignment principles, and these are what the syntax has access to. The interface is a featural one, as is shown particularly clearly by Serbo-Croat dokument ‘document’, where the key data involved the relative pronoun in a different case from its antecedent and with forms which had no phonological resemblance to it (§4.3). Specifically, in terms of the Principle of Morphology-free Syntax as given above, the syntax does not need to make reference to the internal structure of a word; of course, different cells may provide different information, as is trivially found in number agreement, but that does not imply reference to the structure of the word. Nor does the syntax need to reference purely morphological features, such as inflection class. These determine the values of gender, a morphosyntactic feature, to which the syntax always has access. This line of argument holds for the other examples discussed; hence our data are in accord with the Principle of Morphology-free Syntax.

8 Conclusion

Internal and external splits had been documented previously; the new challenge was to build a typology of how they are linked. We set up the Extended Lexeme Consistency Principle as a baseline from which to calibrate a range of largely unreported phenomena. There are clear examples where internal and external splits co-vary, and the evidence in some instances was sufficient to show that other interpretations were highly implausible. More specific versions of the plausibility argument involved lexemes showing overabundance, lexemes showing variation over time and space, and the different types of pluralia tantum nouns: in all of these types of evidence the co-variation of internal and external splits argued strongly for the linkage. Moreover, wherever there was evidence bearing on directionality, it pointed to the internal split determining the external one. Since morphosyntactic features (gender and number) were involved, this result is consonant with the Principle of Morphology-free Syntax. A further specific finding was that the range of the internal split (sub-paradigm or cell) determines the nature of the external split (consistent or hybrid). This result demonstrates that featural information is associated with the lexeme in a default hierarchy, namely at the level of the lexeme by default, unless overridden at the sub-paradigm level, unless in turn overridden at the level of individual cells (but compare the discussion in Stump, 2016:92–95). The examples involving hybrids led to an extension of the scope of the Agreement Hierarchy. Previous controllers subject to the Agreement Hierarchy had involved meaning-meaning mismatches or form-meaning mismatches. Our new examples provide the missing type, namely form-form mismatches. This means that they can be accommodated within the constraint of the Agreement Hierarchy without the need to propose a new mechanism for them. Often the key evidence is found with few lexemes, or even a single one.
In the era of big data, this investigation shows that small data too can have great value.