30 December 2015

Wheels Are Made for Rollin’

Reduplicated nouns certainly existed in Proto-Indo-European, but they are a poorly investigated species. I will leave aside onomatopoeic reduplication, when the echo consists of at least a CVC sequence, as in Proto-Slavic *golgolъ ‘speech’, Greek bárbaros ‘foreign’ (that is, speaking incomprehensibly), Latin murmur (no gloss necessary), or when the whole stem is repeated, as in Hittite harsiharsi- ‘thunderstorm’. There is a more interesting type in which reduplication is “grammatical” rather than purely iconic, the echo template is CV, and only the consonant is copied from the base. The showcase specimen is the celebrated word for ‘wheel’, *kʷékʷlos. It is not attested in the Anatolian subfamily, so its Proto-Indo-European status is uncertain, but it dates back at least to the common ancestor of Core Indo-European.¹

A Bronze Age sun chariot
The ‘wheel’ word is interesting for several reasons. Not all of them need to concern us here. Wheeled transport (in combination with horse domestication) is supposed to have played a crucial role in the early migrations of the Indo-European-speakers, and consequently in the expansion of the Indo-European languages. The appearance of a “technological package” containing terms for ‘wheel’, ‘axle’, ‘cart/wagon’, etc. marks the onset of these historical processes. But I shall concentrate on the linguistic properties of the word, not its cultural importance. The latter is relevant only as an “ecological” factor favouring the frequent use of the word, its successful survival and rich attestation.

*kʷékʷlos is an original masculine – or, if it dates back to Proto-Indo-European after all, an animate, non-neuter noun. One of its forms is conspicuous by its unusually high survival rate – the collective *kʷəkʷláh₂ (see below for details of the reconstruction). It must have been used very frequently, for it tends to occur instead of the expected masculine plural. In Homeric Greek, for example, kúklos has an irregular plural, kúkla (as if the word were neuter rather than masculine). This is quite striking, because the use of the old PIE collective with animate nouns, still productive in Old Hittite, became extremely rare in Core IE. The collective, co-opted already in PIE as the ordinary nominative/accusative plural of the neuter gender, came to be associated exclusively with neuters in most daughter languages. Wheels, however, are more often spoken of  as fixed sets (the two wheels of a chariot, the four wheels of a wagon) than as an arbitrary number of individual objects. The fact that *kʷəkʷláh₂ is preserved so well shows that the word was applied to wheels as vehicle parts when the collective was still a living grammatical category, contrasting with the count plural.³

Let’s take *kʷékʷlos apart into its morphological constituents: *kʷe-kʷl-o-s. The core part is *-kʷl-, in which we can recognise the very common verb root *kʷelh₁- ‘move round, follow one’s course’ (with a variety of secondary meanings, such as ‘become, stay around, inhabit, observe, cultivate, take care of’ and the like). The phonetic reduction of the root, resulting in the loss of the laryngeal segment *h₁, is a normal phenomenon in compounds and reduplications. The reduplicated noun is thematic (has a stem ending in the vocalic suffix *-o-), which suggests adjectival origin. Collectives of o-stems were formed by adding the *-h₂ suffix to the stem-final vowel in the e-grade: *-e-h₂ → *-ah₂. If the singular had initial accent, the collective was accented on the ending (*-áh₂). This accent shift happened early enough to affect the vocalism of some nouns (from a sufficiently old lexical stock). It is therefore probable that the collective was *kʷəkʷláh₂, with a weak prop-vowel rather than a full-grade *e in the first syllable. This would explain the development of the word in Greek: *kʷəkʷ- > *kukʷ- (with the prop-vowel “stealing” lip-rounding from the preceding labiovelar) > kuk- (with a regular delabialisation of *kʷ after /u/).
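For readers who like to see sound changes applied mechanically, the Greek development of the first syllable can be written as two ordered string rewrites. This is only a toy sketch: the rule list and the function name derive are my own illustrative choices, the rules are hard-wired to this one sequence, and nothing here models accent or the rest of Greek phonology.

```python
# Ordered rewrite rules for the pre-Greek treatment of *kʷəkʷ- (illustrative only).
# Each rule is (description, input sequence, output sequence).
RULES = [
    ("prop-vowel takes over the rounding of the preceding labiovelar", "kʷə", "ku"),
    ("delabialisation of kʷ after u", "ukʷ", "uk"),
]


def derive(form):
    """Apply the rules in order and return every intermediate stage."""
    stages = [form]
    for _description, source, target in RULES:
        form = form.replace(source, target)
        stages.append(form)
    return stages


if __name__ == "__main__":
    print(" > ".join(derive("kʷəkʷl")))
```

The order matters: the delabialisation rule can only apply once the first rule has created the /u/ that conditions it.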

As the accentual difference between the singular and the collective became non-productive, the paradigm was levelled out in various ways to eliminate the mismatch; that is why the accent is consistently initial in Greek (generalised from the singular) and consistently final in Vedic (from the collective). Since *kʷ(e)kʷláh₂ looks like a neuter plural, speakers were tempted to supply an innovated neuter singular to match, *kʷ(e)kʷlóm, instead of the inherited masculine (hence e.g. Vedic cakrám beside much rarer cakrás). The function of the “echo” prefix *kʷé-/*kʷə- isn’t entirely clear, but judging from cross-linguistic tendencies we can speculate that reduplication gave the underlying verb root an iterative colouring (‘go round and round and round’ rather than ‘complete a turn’).

While rare, the derivational pattern visible in the ‘wheel’ word (a thematic noun formed from a reduplicated verb root) is not isolated, and can be found also on the Anatolian side of the oldest split in the Indo-European family-tree. For example, the Hittite word for ‘rake’ was hah(ha)ra-, plausibly reconstructed as *h₂áh₂ro- ← *h₂e-h₂rh₃-o-. Here the root is *h₂arh₃- ‘break the soil, plough’, as in Greek  aróō, Proto-Slavic *orjǫ, Old English erian (all meaning ‘to plough’), or in the widespread Neo-Indo-European instrument noun  *h₂árh₃-trom ‘ard, plough’ (Greek árotron, Old Norse arðr).

The behaviour of the ‘wheel’ word in Germanic is so interesting and instructive that it deserves to be covered in a separate post (to appear soon).

¹ My use of the terms “Core Indo-European” and “Neo-Indo-European” is explained here.

² Of course the restoration of *e on the analogy of the singular was possible, and it certainly happened in some branches of Indo-European.

³ Note the semantic development in Tocharian, where *kʷékʷlos > Toch.A kukäl, Toch.B kokále came to mean ‘wagon, chariot’.

28 December 2015

Echoes of the Distant Past: Fossil Reduplications

Modern English has its normal share of nursery words, colloquial interjections, and miscellaneous other onomatopoeic or expressive words involving sound-repetition: daddy, baby, nanny, sissy, pee-pee, bye-bye, ta-ta, goody-goody, ding-dong, pop, riff-raff, hip-hop, bow-wow, cuckoo, hurdy-gurdy, tic-tac-toe, bubble, giggle, mumble, google, etc. English also has reduplicative words borrowed from other languages: dodo, can-can, dum-dum, yo-yo. Some such imports are old and their reduplicative status is no longer obvious to non-specialists: barbarian, purple, turtle-dove. A few echoic words exhibiting a repetitive pattern are at least as old as the English language, whatever their ultimate origin; cock and chicken belong here.

Traces left by a reduplication
Note, however, that the words listed above are not derived by reduplication. For example, giggle cannot be traced back to a simpler verb with only one occurrence of /ɡ/. In the overwhelming majority of cases the repetition is merely phonetic, not morphological. Reduplication in the proper sense of the word (involving a base and an echo) is not used in English to perform any of its typical, cross-linguistically common tasks, such as the formation of plural or collective nouns, verb stems of a particular aspect or tense, intensive verbs or adjectives, deverbal nouns, etc. This is one of those things that make English, together with some other languages of the northerly latitudes, a little weird.

Interestingly, morphological reduplication is given looser rein in some English-based creole languages, for example in Tok Pisin, where it seems to be on the rise as a derivational device  – presumably as a result of contact with the heavily reduplicating indigenous languages of Papua New Guinea. Here are some examples:
kala ‘colour’ → kalakala ‘colourful’
bruk ‘break, fall apart’ → brukbruk ‘fall apart into many small pieces’
pilai ‘play’ → pilaipilai ‘play round’
ron ‘run’ → ronron ‘keep running’
tok ‘talk’ → toktok ‘conversation’
wil ‘wheel’ → wilwil ‘bicycle’¹
Has English preserved any really old reduplications, with cognates in other branches of the Indo-European family? Yes, but there are only a handful left, and most of them show no transparent reduplicative structure any longer. Among those relics there are at least two nouns, wheel and beaver (probably also tetter ‘skin disease’), one adjective, quick (provided that my etymology of PIE *gʷih₃wó- ‘living’ in Gąsiorowski 2007 is correct), and two verbs in the past tense, ate and did. Despite the fact that the two irregular past tenses represent the same modern category, they go back to different Indo-European verb forms, characterised by different reduplication patterns. Perhaps most surprisingly of all, the regular past-tense ending – and not just the -d of loved, watched, waited, but also the -t/-d of kept, brought, sold – vaguely reflects an ancient reduplication as well, and has in fact the same origin as did. I will trace these connections later in this series.


¹ Since wil = Eng. wheel, which itself is an old reduplicated noun, Tok Pisin wilwil is a quadruplication, etymologically speaking.

27 December 2015

How to Stammer Grammatically: Reduplication

Linguistic signs are mostly arbitrary in the sense that their form is not directly related to the concept they express. For example, there is nothing in the phonetic shape of the Malay word ikan to suggest its meaning – ‘fish’, or, by extension, any ‘marine animal’ (turtle, whale, oyster, etc.). The sound of the word is not intended to evoke swimming or splashing. It is just a regular historical reflex of Proto-Austronesian *Sikan (with the same meaning and also an arbitrary phonetic shape). It has cognates in other Austronesian languages, for example Hawaiian i‘a [ˈiʔa]. None of them makes you say to yourself, “Methinks it is like a fish.” Indeed, even if a word starts out as onomatopoeic, sound changes will in the long run alter its pronunciation beyond recognition, eventually reducing or destroying its imitative value (see the etymology of English pigeon).

Affixes and auxiliary words are usually not iconic either. English regularly indicates the plural number of nouns with the suffix -(e)s (pronounced [s, z, ɪz], depending on the context); some nouns (including fish) form endingless plurals. Neither the suffix nor its absence “portrays” plurality, whether by resemblance or by analogy. The same can be said of irregular plurals like goose : geese or child : children. Is it possible at all to express plurality iconically – that is, to make a linguistic sign sound plural? Yes, it can be achieved by amplifying the sign itself to indicate “more of something”; and one simple way to amplify it is to repeat it. Malay nouns are not inflected for number. Plurality, if it matters in a given situation, may be signalled by the use of numerals or quantifiers, or just inferred from the context. But the speaker may also choose to emphasise the multiplicity of referents by doubling the noun: ikan-ikan ‘fish’ (plural). This is similar to emphatic repetition occasionally encountered in all languages, including English, as in:
We rode for miles and miles.
What do you read, my lord? ― Words, words, words.
In English, word repetition is a syntactic phenomenon; in Malay, it is used as a word-formation mechanism. Note, by the way, that many Malay nouns obligatorily consist of a double occurrence of the same sequence and have no simplex counterpart, e.g. biri-biri ‘sheep’ (singular and plural), while others change their meaning if doubled (mata ‘eye’ : mata-mata ‘spy, detective, police officer’). Root-doubling can also be used with adjectives to indicate intensity (her wild, wild eyes could serve as an English analogue), and with verbs to indicate repetitive or prolonged action. In those cases the doubling is definitely iconic. But duplicated verbs may also refer to a sloppy or leisurely execution of an action, e.g. makan ‘eat’ : makan-makan ‘peck at the food’ (showing lack of interest or appetite). Here the iconicity is less self-evident.

The technical term for such morphological doubling is reduplication. In the Malay examples above the entire root is faithfully repeated, but numerous languages also employ partial reduplication in which the repetition is just hinted at rather than applied in full. Typically, a fixed pattern of consonants (C) and vowels (V) is used as a simplified copy of the morphological base – most often a CV or CVC template. Sometimes only the consonants are copied from the base, while the V position is filled by a fixed default vowel (e.g. [ə]). Depending on the language, the copy may be attached before the base (as a prefix) or after it (as a suffix), or even inserted inside it (as an infix). The copy is usually called the reduplicant, but I prefer the handier and less esoteric term echo. We shall be mostly concerned with reduplicative prefixes, that is, cases when the echo is placed before the base. For example, in Yucatec Maya CV reduplication is employed to form intensive adjectives and intensive or iterative verbs:
k’aas ‘bad’ : k’a’-k’aas ‘evil’
p’iik ‘break (something hard)’ : p’i’-p’iik ‘break into many fragments’
Partial reduplication of this kind is not unlike stammering, which may also involve incomplete syllable repetition: b—b—black [bəbəˈblæk]. Of course there is an important difference: reduplication is controlled by the speaker, while stammering is involuntary and has no grammatical function. 
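The difference between full reduplication and a CV echo template is easy to state as a procedure, so here is a minimal Python sketch of both. Everything in it is an assumption made for illustration: the function names are mine, consonants and vowels are treated as single letters, the vowel inventory is just a, e, i, o, u, and the base tala is a made-up form, not a word of any language discussed here. Ejectives and glottal stops, as in the Yucatec examples, are deliberately not handled.

```python
from typing import Optional

VOWELS = set("aeiou")  # simplified inventory, one letter per vowel


def full_reduplication(root: str, sep: str = "-") -> str:
    """Repeat the whole root, as in Malay ikan : ikan-ikan."""
    return f"{root}{sep}{root}"


def cv_echo(base: str, fixed_vowel: Optional[str] = None) -> str:
    """Build a CV echo from a consonant-initial base.

    The C slot copies the base-initial consonant. The V slot copies the
    first vowel of the base, unless a fixed default vowel is supplied.
    """
    if not base or base[0] in VOWELS:
        raise ValueError("this sketch only handles consonant-initial bases")
    if fixed_vowel is None:
        vowel = next((ch for ch in base if ch in VOWELS), "")
    else:
        vowel = fixed_vowel
    return base[0] + vowel


def partial_reduplication(base: str, fixed_vowel: Optional[str] = None,
                          sep: str = "-") -> str:
    """Attach the CV echo before the base, i.e. as a prefix."""
    return f"{cv_echo(base, fixed_vowel)}{sep}{base}"


if __name__ == "__main__":
    print(full_reduplication("ikan"))            # whole root repeated
    print(partial_reduplication("tala"))         # vowel copied from the base
    print(partial_reduplication("tala", "ə"))    # fixed default vowel
```

Passing a fixed vowel models the type in which only the consonant is copied from the base; leaving it out models the type in which the echo also borrows the base vowel.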

Expressing plurality, intensity, repetition or, more generally, “greater degree” is the most natural use of reduplication, with a clear cognitive motivation. However, once adopted as a derivational or inflectional device, reduplication easily acquires secondary functions, gradually dropping its iconic character and evolving into another “arbitrary” morphological tool. Reduplication, in its numerous variants, has a global distribution. It’s only in a circumpolar belt of the northern hemisphere, including Europe, Northern Asia and the northernmost part of North America that reduplication plays little role in derivational and inflectional morphology. From a Eurocentric perspective grammatical reduplication may look exotic; we shall see, however, that it had important functions in Proto-Indo-European and some of the languages descended from it.

17 December 2015

Sex, Greek, and Rix’s Law

A recent comment by David Marjanović made me reflect on a Sanskrit word, yábhati ‘have sexual intercourse’ (as the Monier-Williams dictionary tactfully puts it). The verb is of special interest to speakers of Slavic languages, because its exact cognate – Proto-Slavic *jebe/o- (with a host of Slavic derivatives) – remains one of the most popular obscenities in all the languages belonging to that branch of Indo-European. Interestingly, the verb is only very sparsely attested in Iranian and seems to be completely absent from Baltic. In Modern Indo-Aryan its reflexes are quite numerous, though hard to recognise after more than two millennia of sound change, sometimes combined with euphemistic deformation.

By comparing Indo-Iranian and Slavic cognates, we arrive at the stem *jébʰ-e/o- (3sg. *jébʰeti, 3pl. *jébʰonti) as the most parsimonious reconstruction of their ancestral form. It’s a so-called “simple thematic present” – an imperfective stem built to the root *jébʰ-, with the vowel *e in the root and the “buffer vowel” *-e/o- added before personal endings. If the verb has a deeper origin in Indo-European, its oldest form must have been different. Simple thematic presents occur in large numbers in most of the branches of the family; for example, they account for much of the third conjugation in Latin. However, they are absent from the most outlying lineage of Indo-European (the Anatolian languages), and their low number in Tocharian, the next group that split off before the divergence of the modern branches, shows that they evolved gradually in post-Proto-Indo-European times. Further speculation about the origin of *jébʰ-e/o- via internal reconstruction is difficult because simple thematics have more than one historical source.

I hope it is not all Greek to you.
[Source: Wikipedia]
When we run out of exact cognates, we can focus on the next best thing – plausibly related words with a different morphological structure. Everybody agrees that Ancient Greek oípʰō (with the same meaning) must be a relative of yábhati. Pre-Greek *jébʰ-e/o-, however, would have produced Gk. ˣzepʰō (here, ˣ, not to be confused with the asterisk, marks an unattested, incorrectly predicted form), so the origin of oípʰō must be different. Since the Greek reflex of the root morpheme (oípʰ-) contains an unexpected o, it is justifiable to suspect that one of the Proto-Indo-European “laryngeal” consonants, the one conventionally written *h₃ (probably a voiced pharyngeal fricative [ʕ], if you prefer phonetic symbols) is lurking about. This consonant was vocalised in Greek as o in some positions; it could also (already in PIE) change an adjacent *e into *o. This is why the root we are discussing is often reconstructed as *h₃jébʰ- to accommodate the o-colouring fricative. Unfortunately, most sources just put the laryngeal there and don’t attempt to explain the Greek form in detail.

The trouble is that oípʰō can’t be derived from *h₃jébʰ-e/o- either. According to recent work on PIE syllable structure (Byrd 2015; see also here), the sequence *h₃j- was simplified to *j- in word-initial positions very early in the history of Indo-European, so in this case too we should expect Gk. ˣzepʰō, just as if the *h₃ weren’t there. Some authors propose that *h₃jébʰ-e/o- had a metathetic byform *h₃óibʰ-e/o-, in which *j and *e had swapped places, which caused the latter to get coloured to *o by the preceding *h₃. Such a solution, however, is desperately ad hoc. There is no morphological or phonological motivation for the metathesis, and the wish to see the desired output is not enough.

Another ad hoc solution is adopted in the Lexikon der indogermanischen Verben (Lexicon of Indo-European Verbs, LIV), where the root is listed as *jebʰ-, and its Greek reflex is reconstructed as a present stem with the zero grade of the root and the prefix *o-, that is, *o-ibʰ-e/o-. The problem is that such an alleged verb prefix is vanishingly rare in Greek (so rare that its very reality is questionable), and its function (if any) is unspecified. Solving one mystery by creating another is not sound etymological practice.

A more ingenious suggestion was made by Johnny Cheung in his Etymological Dictionary of the Iranian Verb (2007). Cheung proposes that the Greek present was reduplicated. Grammatical reduplication in PIE involves copying the initial consonant, extending it with the vowel *e or *i, and pasting it back onto the root as a prefix. There are several classes of Indo-European verb stems formed in this way. Following Cheung’s suggestion, we should reconstruct *h₃e-h₃ibʰ-e/o-, which after the laryngeal colouring of the first *e yields *h₃oh₃ibʰ-e/o- and – hey presto – Gk. oípʰō.
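The copy-extend-paste recipe, followed by laryngeal colouring, can be spelled out in a few lines of Python. This is a hedged sketch of the mechanics only: the segment lists, the function names reduplicate and colour, and the decision to let the preceding neighbour win when an *e stands between two laryngeals are all my own simplifying assumptions. The thematic suffix *-e/o- is left out.

```python
# Vowel "colour" associated with each laryngeal (*h₁ leaves *e unchanged).
COLOURING = {"h₁": "e", "h₂": "a", "h₃": "o"}


def reduplicate(root_segments, vowel="e"):
    """Copy the root-initial consonant, add the reduplication vowel,
    and prefix the resulting echo to the root."""
    root = list(root_segments)
    return [root[0], vowel] + root


def colour(segments):
    """Recolour every *e that stands next to a laryngeal."""
    out = list(segments)
    for i, seg in enumerate(out):
        if seg != "e":
            continue
        # Check the preceding segment first, then the following one.
        for neighbour in out[max(i - 1, 0):i] + out[i + 1:i + 2]:
            if neighbour in COLOURING:
                out[i] = COLOURING[neighbour]
                break
    return out


if __name__ == "__main__":
    zero_grade_root = ["h₃", "i", "bʰ"]          # *h₃ibʰ-
    echoed = reduplicate(zero_grade_root, "e")   # *h₃e-h₃ibʰ-
    print("".join(echoed), ">", "".join(colour(echoed)))
    print("".join(reduplicate(["s", "d"], "i")))  # i-reduplication, *si-sd-
```

The reduplication vowel is a parameter precisely because the choice between *e and *i is what the argument turns on.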

Alas, the formation of reduplicated presents is something we understand rather well – well enough to see a couple of problems with this reconstruction. First, although *e may appear as the reduplication vowel in IE present stems, it does so only in so-called “athematic” ones (without the *-e/o- suffix). In thematic presents, i-reduplication occurs instead, as in *si-sd-e/o- ‘sit’ > *sizde/o- > Gk. hízō. Secondly, even in athematics, *e seems to have alternated with *i. The details of the alternation are still debated, but one thing is sure: Greek generalised i-reduplication thoroughly in this class, so that we find it in Ancient Greek present stems (thematic and athematic alike) to the complete exclusion of e-reduplication. Therefore, *h₃e-h₃ibʰ-e/o- just won’t float – not in Greek waters.

There remains another possibility, also considered by Cheung but qualified as less likely than the reduplicated root: a zero-grade thematic present, *h₃ibʰ-é/ó-. Such a stem structure is also well-known; one typical example is *gʷr̥h₃-é/ó- ‘devour, swallow’ (Sanskrit giráti, Slavic *žьreti). Both *h₃ibʰ-é/ó- and *(h₃)jébʰ-e/o- (with an early loss of *h₃) could be independently derived from a still older common prototype, most probably a root verb without any suffixes. Why, then, should *h₃e-h₃ibʰ-e/o- be “more likely” than *h₃ibʰ-é/ó- as the source of oípʰō?

The problem here is that we aren’t really sure what happened to initial *h₃i- in the transition from PIE to Ancient Greek. There was a pre-Greek sound change, known as Rix’s Law, which changed any initial *HR̥- into Greek VR-. In these formulae, R stands for any liquid or nasal (l, r, m, n), R̥ is its syllabic variant, H is any of the three PIE laryngeals, and V is a vowel whose quality matches the phonetic “colour” of the laryngeal (e, a, o for, respectively, *h₁, *h₂, *h₃). To what extent the sequences *Hi- and *Hu- were also affected by Rix’s Law has been a matter of some dispute. PIE *i, *u can be regarded as syllabic variants of the corresponding glides *j and *w; therefore, it is at least thinkable that Rix’s Law could apply to them as well.
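Stated as a rewrite rule, the undisputed core of Rix’s Law (initial laryngeal plus syllabic liquid or nasal) looks like this in Python. It is a schematic sketch under my own assumptions: words are lists of segments, the function name rix_law is invented for the occasion, the inputs are abstract onsets rather than attested reconstructions, and the rule is deliberately limited to l, r, m, n so that the disputed *Hi- and *Hu- sequences pass through untouched.

```python
RING_BELOW = "\u0325"  # combining ring below, marks a syllabic resonant

# Vowel quality matching the "colour" of each laryngeal.
LARYNGEAL_COLOUR = {"h₁": "e", "h₂": "a", "h₃": "o"}

# Syllabic liquids and nasals mapped to their non-syllabic counterparts.
SYLLABIC_RESONANTS = {c + RING_BELOW: c for c in "lrmn"}


def rix_law(segments):
    """Turn word-initial laryngeal + syllabic resonant into vowel + resonant.

    Anything that does not match the initial sequence is returned unchanged.
    """
    segs = list(segments)
    if (len(segs) >= 2
            and segs[0] in LARYNGEAL_COLOUR
            and segs[1] in SYLLABIC_RESONANTS):
        vowel = LARYNGEAL_COLOUR[segs[0]]
        resonant = SYLLABIC_RESONANTS[segs[1]]
        return [vowel, resonant] + segs[2:]
    return segs


if __name__ == "__main__":
    for laryngeal in ("h₁", "h₂", "h₃"):
        onset = [laryngeal, "r" + RING_BELOW]
        print("".join(onset), ">", "".join(rix_law(onset)))
    # The strict rule says nothing about laryngeal + i.
    print("".join(rix_law(["h₃", "i", "bʰ"])))
```

Whether i and u should be added to the table of syllabic resonants, and for which laryngeals, is exactly the open question.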

As for the sequence *Hi-, however, it can be demonstrated with good examples that no initial vowel developed if the laryngeal was *h₁. It has furthermore been suggested that the outcome could be Gk. hi- (with an initial aspirate) rather than simply *i- (Bozzone 2013). For *h₂ and *h₃ the evidence is inconclusive (no unambiguous examples). But there is no clear counterevidence either to rule out *h₂i- > Gk. ai- or *h₃i- > Gk. oi- (pace Peters 1980¹, who argues for *Hi- > Gk. *i- across the board). As for *Hu-, we have several convincing cases showing that *h₂u- > Gk. au-, one or two possible cases of *h₃u- > Gk. ou-, but no examples at all of *h₁u- > Gk. ˣeu-. This may mean that the Greek reflexes of *h₁u- are indistinguishable from *u- since both merged as Gk. hu-, while the other two laryngeals followed the pattern of Rix’s Law. It is therefore possible that *Hi- and *Hu- developed in parallel, and that the expected outcome of *h₃i- is Gk. oi-.

This insight has far-reaching consequences for our understanding of the various combinations of *i/*j and *u/*w with the laryngeals in the prehistory of Greek, but I can only skim the surface of the topic in a blog post. It’s getting too long anyway, so it’s time for the moral. The hero of this little essay is a swear-word so obscene that some old ladies in my country might faint if they saw it printed in a newspaper. On the other hand, you can hear it all the time in the street, adorned with modifying prefixes, converted into derived nouns, adjectives and adverbs, and spawning lots of specialised meanings. It has functioned like that literally for millennia – taboo or no taboo. Living the merry life of an outlaw, it has become a respectable archaism, almost a living fossil, with an impeccable pedigree and aristocratic Vedic connections. Together with its equally naughty Ancient Greek cousin, it may provide a precious piece of crucial evidence needed to solve a vexing problem in Greek historical phonology. Not bad for a dirty little word.


24 November 2015

Boontling “Deek”: A Rovin’ Gypsy Word?

The little town of Boonville (Mendocino County, California) was established in the early 1860s near a slightly older place called The Corners. A local general store was moved from The Corners to the present location of the town centre and then sold to Mr W. W. Boone, who modestly named the settlement after himself (it had briefly been called Kendall City in appreciation of another local businessman). The inhabitants of Boonville (now about 1000 people) refer to their town colloquially as Boont.

What makes Boonville special is its local ‘jargon’ which probably arose in the 1890s among children and young people (who then grew up without abandoning it). The community was quite isolated at the time, and kept no records to inform posterity why they chose to develop an extremely hermetic and highly inventive vocabulary of about 1500 words, known as Boontling (Btl.). Boontling was not originally meant to be written down, but a semi-formalised spelling was developed for it in the 1970s. One of the local words is to boont ‘to speak Boontling’. At present Boontling is dying out (Btl. pikin to the dusties) despite having been discovered by linguists and made known to the general public. Many Boontling words remain in circulation, but there are few fluent users left. Boontling has never been a fully fledged dialect: it has a distinct vocabulary incomprehensible to outsiders, but the accent is a rural variety of Northern California English (with historical affinities to the Midwestern and Border South dialects), and Boontling syntax is in nearly all respects the same as that of mainstream US English.

The Old Machine Boys
Despite its recent origin, Boontling vocabulary is etymologically opaque to a surprising extent. Nevertheless, the vast majority of its words are coined from pre-existing elements rather than made up entirely from scratch. Often you have to know the history of the place and rely on anecdotes collected from elderly locals that “explain” the meaning of some words, especially those derived from personal names. (A professional etymologist has to verify their historicity, of course, and this is likely to be the toughest part of the job.) Some words reflect otherwise forgotten dialectal or slangy vocabulary. Some were coined using Humpty Dumpty’s technique of piecing together broken fragments of ordinary English words. Some hide behind strange pronunciations that appear to have been borrowed from Scottish or Ulster Scots speakers. Some came from Spanish (approximately half the population is of “Hispanic or Latino” descent), and a few from the Pomoan languages indigenous to California (there are a few Native Americans as well).

I’m intrigued by a few of them. For example, one of the most common and persistent Boontling words is deek ‘look, see, stare, notice’ (also used as a deverbal noun). I’m not aware of the use of deek anywhere else in North America. However, deek is a well-known colloquial Northernism in Britain. It’s stereotypically associated with Geordie (the dialect of Newcastle and the Tyneside area), but it actually occurs throughout Northern England (including Cumbria, Liverpool and Yorkshire) and much of Scotland. The word is a loan from Romani or rather Angloromani – the Romani-derived lexicon embedded in the varieties of English used by the British Romanies (see Yaron Matras, 2010, Romani in Britain: The Afterlife of a Language, Edinburgh University Press). The Angloromani verb (no longer inflected) is deek, dik, dikkai [diːk, dɪk, dɪkʰaɪ], reflecting European Romani dikh- ‘see’. There are, by the way, quite a few Romani loans in British dialects (some of them, such as pal ‘brother, friend’, no longer dialectal). The Dictionary of the Scots Language gives, among others, these recent examples of the use of deek:
  • Deek that gadgie. ‘Look at that guy.’ (Edinburgh, 1988)
  • The gaffer wis anither big rough-deeking gadgie... (Aberdeen, 1990)
Here, in addition to deek, gadgie ‘guy, bloke’ is also a Romani loan (Angloromani gadji, gawdjo, gawdja < European Romani gadžo ‘non-Gypsy’).

The root dikh- arrived with the ancestors of the modern Romani all the way from Northwestern India. It is cognate to Hindi dekh- and to Sanskrit dṛś-, dṛkṣ-, all of which continue a well-known Proto-Indo-European root, *derḱ- ‘watch, see’. Incidentally, the Hindi word became independently borrowed into British English via the army slang of British soldiers serving in India, hence have a dekko ‘have a look’.

The Germanic languages also inherited a few words derived from *derḱ-, but English has lost all of them. Old English still had torht ‘bright, splendid, illustrious’ from the PIE deverbal adjective *dr̥ḱ-tó- (cf. Skt. dṛṣṭá- ‘seen, visible’). It was used almost exclusively in poetry, but also served as an element forming personal names. For example, an Old English gadgie called Torhthelm (Totta for friends) owned a farm called Totta’s Homestead (Tottan-hām) in today’s north London. The To- part of Tottenham is about all that has survived of the root *derḱ- in Modern English via direct descent. A number of other reflexes, however, have reached English by horizontal transfer from other Indo-European languages, the most spectacular of them being dragon (ultimately from Greek drákōn ‘starer’ → ‘serpent with a deadly stare’). But I’m digressing.

I have no watertight proof that Btl. deek is the same word as Angloromani, Northern English, Scots and Scottish English deek, but I’d be very surprised if somebody proved that Btl. deek had a different origin. Still, I have no idea how the word could have reached an obscure valley in Northern California and become fixed in the local slang without leaving any other traces in American English. If anyone among my readers comes up with an idea how to explain its trajectory in time and space, I’ll be immensely grateful for sharing it.

21 November 2015

A Normally Weird Language

Every week, the digital magazine Aeon publishes several ambitious essays, by competent writers, on culture, philosophy, science, technology and other interesting subjects. One of last week’s authors is John McWhorter, professor of linguistics and American studies at Columbia University; the topic is the English language. The essay is entitled “English is not normal”. Professor McWhorter not only argues that English is genuinely “weird” (anyone who has followed his publications already knows it) but also makes the stronger claim that it “really is weirder than pretty much every other language”. Now that is a really weird thing to say, so let’s see how it is argued.

English is not normal
McWhorter begins by discussing English spelling and its caprices (with the reservation that writing is secondary with respect to speech). This is of course due to the conservative character of the spelling system, which has not undergone any major reform since Late Middle English. But English is by no means the only language with such a mismatch between its spoken and written form due to the reluctance of its orthography to catch up with sound change. French, for example, is just as weird. It has plenty of ambiguous spellings with more than one possible pronunciation and alternative spellings for one and the same phoneme in one and the same position. It easily beats English when it comes to mute consonants: vin, vins (verb and noun), vain, vains, vint, vaincs, vainc, vingt are all pronounced /vɛ̃/. Massive mergers of this kind would surely have caused any normal language to collapse, so French can’t be normal, can it? Irish spelling was even worse before its mid-20th-c. modernisation, and still remains a pretty complicated affair (regular, but you have to master quite a few rules to figure out how to pronounce bhfaighidh). Lhasa Tibetan has lost many consonants both in initial and final clusters, but has retained their spelling representation. And while we are in Asia, isn’t written Chinese even a little weird? Professor McWhorter says that “in countries where English isn’t spoken, there is no such thing as a ‘spelling bee’ competition”. To my knowledge, national spelling competitions are organised in many countries, including Poland. I have finished runner-up in one of them, and I can testify it was tough going. Is Polish a normal language?

The  next claim is that English is not similar enough even to closely related languages to guarantee partial mutual comprehensibility. Well, this depends on what we regard as a “related language”. If, for example, we treat Scots as a close cousin rather than a variety of English, we have to agree that English and Scots are partly comprehensible to each other’s speakers (more so, I presume, than Standard Dutch and High German). English and Frisian are more closely related to each other than either is to the rest of Germanic, but they became separated geographically more than 1500 years ago and, unlike Dutch and German, or Spanish and Portuguese, have not remained in contact or been connected by a continuum of intermediate dialects. If that makes English weird, Greek, Albanian and Armenian are even weirder (not to mention such orphan languages as Japanese, Burushaski or Basque).

According to McWhorter, English is the only Indo-European language without grammatical gender. This sweeping statement is simply false. Let’s begin with the observation that the “classical” three-way distinction (masculine : feminine : neuter) probably did not exist in Proto-Indo-European itself, which only distinguished neuters from non-neuters (a state of affairs thought to be preserved by the extinct Anatolian languages such as Hittite). Once the three-gender system emerged in the rest of the family, it was reduced again in some branches. For example, although Latin had three genders, all the modern Romance languages descended from it have only two, having eliminated the neuter. Among the Scandinavian languages, Danish and Swedish have merged the feminine and masculine into one “common” (non-neuter) gender. English has gone one step further. Already at the Early Middle English historical stage all morphological markers of gender were abolished in nouns and adjectives. The only trace of the former three-way system is a “natural gender” distinction in the third person singular of personal pronouns (he : she : it). But even within the Germanic group we find the same development in Afrikaans. If anything is “weird” about gender in English and Afrikaans, it isn’t its loss in nouns, but rather the survival of natural gender in pronouns: having pronominal but no nominal gender is very rare cross-linguistically. As for the rest of the Indo-European family, there is no grammatical gender in modern Persian, Balochi, Ossetic, and several other (though not all) Iranian languages. Armenian (also Indo-European) has no gender either. Both the genderless Iranian languages and Armenian are more consistent than English in their elimination of gender: their personal pronouns are genderless too. Armenian na means ‘he/she/it’; literary Persian has u ‘he/she’ (used only of humans) contrasting with ân ‘it’ (non-human), but the latter has taken the place of the former in spoken Persian.
As we can see, English is by no means alone even in Indo-European. And since more than 50% of languages worldwide have no morphological gender or noun-class system, it is in good company.

The next feature is genuinely weird – here I completely agree. No other language known to McWhorter or to me marks the third person singular of present-tense verbs and leaves all the other forms unmarked (the sole exception is the present tense of to be). This is of course due to a historical accident caused by extralinguistic factors – the generalisation of the originally plural polite pronoun ye/you, which led to the disappearance of 2sg. thou/thee together with all the verb forms associated with it (art, wilt, dost, hast, drink(e)st). Nevertheless, it’s strange, though hardly strange enough to justify the claim that English is “deeply peculiar in the structural sense”.

Less convincing is the case for the weirdness of do-support in questions requiring inversion (does she smoke?), in negation (she doesn’t smoke), and in emphatic statements (she does smoke). Professor McWhorter has for a long time argued that the construction is due to Celtic influence and found exclusively in Brittonic Celtic and English. This is doubtful for several reasons. Constructions regarded as precursors of do-support occur sporadically in 14th-c. English, but fully assume their modern functions and begin to spread rapidly after ca. 1500. That’s 1000 years after the initial contact between the Anglo-Saxons and the Brittonic Celts. Why so late? Perhaps the construction existed in informal spoken English and didn’t make it into the written standard until the sixteenth century? Such an explanation could work for Old English, but hardly for the Middle period, from which we have a vast corpus of documents representing different genres, styles, and grammatical registers. There is, furthermore, no evidence of analogous constructions in Celtic pre-dating their début in English, so the direction of influence is uncertain (if it’s influence at all, rather than accidental convergence made likelier by the fact that inversion is used as a syntactic device in both cases). The fact that the Celtic analogue of do-support can also be found in Breton does not prove its great age. Contacts between the Celtic populations of Brittany and Cornwall were regular and intensive until the decline of an independent Duchy of Brittany in the 16th century. Anyway, even if we are dealing with a pattern borrowed from Celtic, English shares it with Welsh, Cornish and Breton, and so can’t be regarded as exceptionally weird in this respect. Again, the claim that such a construction does not occur anywhere else is exaggerated. Do-support analogues have been reported from some Lombard dialects of Northern Italy (the use of the auxiliary fa ‘do’ in questions), and even from Korean (in negation).
A related construction (with Old Norse gera ‘prepare, do’) was used in Old Icelandic negation. Even if the English-specific combination of functions is “special”, its components can be found here and there.

The rest of McWhorter’s essay is devoted to the “mongrel vocabulary” of English (with most of it being actually French, Latin or Scandinavian), the richness of synonymy resulting from layers of borrowing, and the impact of Latinate loans on the development of a complex stress system. Though remarkable, these features are hardly unique or even rare. Plenty of languages have been relexified with foreign elements to a comparable degree, and with equally dramatic consequences for their morphology and phonology.

Of course the essay is pop-linguistics, addressed to a general audience, so the author has every right to simplify things for didactic convenience. He justly debunks the all-too-popular idea of English as the “model” language, so ordinary that it can be regarded as a safe testing-ground for linguistic theories (“let’s consider any language – for example, English”). However, in doing so, he errs on the opposite side, trying to make English look more extraordinary than it really is. English does have its structural idiosyncrasies, but so does just about any other human language. Tsakhur (a Northeast Caucasian language) has ‘turquoise’ as a basic colour term (it’s also weird in having at least about 70 consonant phonemes); Czech is pretty much unique in having a fricative alveolar trill as a phoneme (a sound so rare that the International Phonetic Association has not yet come up with a convenient symbol to transcribe it); Hawaiian has [t] and [k] as variants of the same phoneme in its extremely small inventory of consonants; the West !Xóõ language (in Namibia) has 43-111 different clicks (depending on how you analyse the system) in addition to a few dozen other consonants; Winnebago (Siouan) places the main stress on the third mora in longer words, while Macedonian (Slavic) regularly stresses the antepenultimate syllable; in Imonda (in Papua New Guinea) singular and dual nouns are marked with special endings but plurals are expressed as bare stems; Hungarian has 18 noun cases and two basic colour terms for different kinds of ‘red’. Pirahã (in Amazonas, Brazil) has a dozen phonemes (at most), no numerals, and no basic colour terms; the jury is still out on whether it has embedded clauses. On the other hand, it has a rich verb morphology, with an unusually large number of aspects and several shades of evidentiality (expressing the source/reliability of information). There’s a lot of weirdness out there.

The fact is that the total weirdness of a language is not a quantifiable notion. It makes little sense to say that one language is generally weirder than another (as opposed to being weirder in some particular respect). Caprices of history have elevated English to the status of global lingua franca. It doesn’t owe its unique position to any structural features, although the fact that it has an enormous population of speakers is relevant for its current and future evolution. Yes, it has many eccentric features but hardly represents an extreme type of language. “English is not normal”, while a catchy title, is at best a trivial statement that could be true of any language (if you concentrate exclusively on a few selected oddities).

23 September 2015

Nucg Nucg, Winc Winc: The Anglo-Saxon Dairy Business

Those of my visitors who know something about Old English poetry may have realised that the link between the F-word and churning butter (see the previous post) is not just etymological – it’s a literary allusion.  Among the famous Anglo-Saxon riddles preserved in the Exeter Book we find the following one (Riddle 54):

Hyse cwom gangan,    þær he hie wisse
stondan in wincsele,    stop feorran to,
hror hægstealdmon,    hof his agen
hrægl hondum up,    <hrand> under gyrdels
hyre stondendre    stiþes nathwæt,
worhte his willan;    wagedan buta.
Þegn onnette,    wæs þragum nyt
tillic esne,    teorode hwæþre
æt stunda gehwam    strong ær þon <hio>,
werig þæs weorces.    Hyre weaxan ongon
under gyrdelse    þæt oft gode men
ferðþum freogað   ond mid feo bicgað.

An Anglo-Saxon churn lid, with the Freudian hole
The Exeter Book (written more than one thousand years ago) is the largest extant anthology of Old English poetry. It contains diverse stuff, from solemn religious and allegorical poems, saints’ lives, elegies and fragments of heroic legends to comic, somewhat naughty, light compositions, such as Riddle 54. There are as many as 96 Old English riddles in the manuscript (the genre is hardly documented in any other source). Many of them have very serious religious solutions, but certainly not this one. Good translations of the riddles are hard to come by. Much is lost in translation, and humour is usually the first victim. A specialist can always enjoy the original, but for the sake of those whose Old English is not very fluent I’m going to offer my own translation, for what it’s worth. At least it isn’t a horrible mistranslation (some others are) and it tries to capture the spirit of the original. I also hope it isn’t too stilted (for a piece of Old English verse).

Some things are practically untranslatable. For example, Old English had grammatical gender, and the use of feminine personal pronouns (corresponding to Modern English she and her) doesn’t mean that the pronoun indicates a female human being. It can refer to any object whose Old English name is a feminine noun (e.g. tunge ‘tongue’, bōc ‘book’,  duru ‘door’, etc.). It may suggest a woman, but since the alternative possibility is also probable, the suggestion is much weaker than in Modern English. This subtle ambiguity would be lost completely if she were replaced by it, so I let it stay. Just remember that in Modern English not only ships but also some tools and utensils can be conventionally personified by their users and referred to as “she”. It isn’t quite the same thing as Old English grammatical gender, but must suffice to justify my artistic licence.

Another problem is that Old English is a dead language and its written record is far from perfect. The words in angle brackets represent editorial emendations in places where the text seems to be corrupt. The first of the restored forms, <hrand>, actually reads rand in the manuscript, but this can’t be the word intended by the poet. The rules of Old English poetic alliteration demand something beginning with h in the first stressed position of the second half of the line. The most likely emendation is hrand. Unfortunately, such a word-form does not occur anywhere else in the entire Old English text corpus. The context requires a verb in the past tense here. A past tense like hrand presupposes the infinitive *hrindan, past tense plural *hrundon, past participle *hrunden, etc. But what might they mean? Not only is the verb otherwise unknown from Old English; it has left no Middle or Modern English descendants either. To use a technical Greek term, it’s a hapax legomenon, a word appearing only once.

There’s nothing wrong with being a hapax. It’s the inevitable consequence of the fact that words have wildly different frequencies of use (a common motif in my blog posts). In fact, in any large corpus of texts at least about 40% of the words (types, not tokens) occur only once. The same is true of Old English: more than half of the entries in any more-or-less complete Old English dictionary occur only once or twice in the surviving texts. So hrand is not anything unusual, just a little enigmatic.
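For readers who would like to check the proportion of hapaxes in a text of their own, here is a minimal sketch in Python. The function name hapax_share and the sample sentence are my own illustrative choices, and the tokenisation is deliberately crude (lower-cased runs of letters count as words, with no lemmatisation), so the figure it returns is only a rough estimate.

```python
import re
from collections import Counter


def hapax_share(text: str) -> float:
    """Return the fraction of word types that occur exactly once in the text."""
    # Crude tokenisation (an assumption): lower-cased runs of Unicode letters.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    if not counts:
        return 0.0
    hapaxes = sum(1 for n in counts.values() if n == 1)
    # Divide by the number of types, not tokens.
    return hapaxes / len(counts)


# Hypothetical toy sample; a real test needs a large corpus.
sample = "the dog saw the cat and the cat saw a bird"
print(hapax_share(sample))
```

In the toy sample four of the seven types (dog, and, a, bird) occur only once. A text this short tells us nothing about real corpora, of course; it only shows how the count is made.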

What about possible cognates in other Germanic languages? We have Old Icelandic hrinda (past tense hratt < *hrant < *hrand) whose precise meaning is known: ‘push, hurl down’ and, figuratively, ‘launch’ or ‘expel, get rid of’ (the verb has survived in Modern Icelandic and Faroese). The literal meaning roughly fits the context of Riddle 54. Most Modern English translations use thrust; I prefer shove because of its greater semantic overlap with Scandinavian hrinda, and also for the sake of alliteration. Last but not least, shove is less dignified than push or thrust, and has the kind of colloquial vigour they lack, which is an advantage in this case. All right, I’ve never tried it before, so here goes!*)

A lad came walking    to where, as he knew,
she stood in a corner;    stepped in from afar,
a brisk bachelor,    tucked up his own
shirt with his hands,    shoved under the girdle
of the one standing    a stout thingumajig
and worked his will;    both rocked back and forth.
The servant quickened up:    at times he was of use,
a handy workman,    he grew weaker though
with every stroke,    strengthless too soon,
weary from work.    There began to form
under her girdle    that which good men often
dearly desire    and procure with money.

And the solution is – yes, yes, you’ve guessed correctly! – a butter churn, that is, OE ċyrn. By the way, this word occurs three times in Old English texts: once as cyrin (sg.), once as cyrne (pl.), and once as cirm (misspelt by the scribe). As you can see, even the citation forms that we use for convenience represent “Standard Old English” imposed by modern dictionary editors rather than the actual language of the manuscripts.

An early 20th-century postcard
Needless to point out, ĊYRN [wink wink, nudge nudge, say no more, say no more] is the “formal” solution of the riddle. The informal one is as obvious to us as it was to any Anglo-Saxon audience in the tenth century. Other ambiguous riddles in the Exeter Book exploit the same risqué ambiguity: the alternative interpretation is invariably bawdy. Their innuendo-laden humour may be crude, but it still appeals to the modern reader. For the survival of the whole collection we are indebted to Leofric, Bishop of Exeter, a well-educated bibliophile, who died in 1072, bequeathing his impressive manuscript collection to Exeter Cathedral. He apparently did not regard the riddles as subversive enough to be denied the shelter of the cathedral library. Riddle 54 helps us to understand why, back in 1290, a chap from Ipswich, presumably a local dairyman, was called Simon Fukkebotere. It offers us a glimpse into the secret world of naughty associations that existed in the minds of Anglo-Saxon scribes and their audience (and still exist in ours), so we are not making things up when we hypothesise that the original meaning of fuck was ‘strike repeatedly’. Who knows, perhaps the speakers of Old English could use the same word for churning and, with less innocent intent, for [know what I mean? nudge nudge] the other thing.

13 September 2015

The Middle English Dictionary Needs a Fucking Update

Sorry, but I have to comment on this topic.  The news has already spread across the Internet, arousing the interest of several bloggers:

Somewhere among the indictment rolls of the county court of Chester (1310/11), studied by Dr. Paul Booth of Keele University (Staffordshire), a man whose Christian name was Roger is mentioned three times. His less Christian byname is recorded as well, with minor orthographic variations. The repetition guarantees that what the name contains is not an artefact resulting from a spelling mistake but the real thing: to wit, the man’s full name was Roger Fuckebythenavele. Though Roger was finally outlawed by the court and never heard of again, his legacy will make a lasting impact on English word studies. Not only does his second name move back the earliest attestation of fuck in its modern sense by many decades; it also, for the first time, establishes it as a bona fide Middle English word. Inevitably, the question will be raised again whether fuck is a native English word (a view defended, among others, by Lass 1995) or a relatively late newcomer (as argued e.g. by Liberman 2007: 78-87).

Like dog (attested once about 1050 and then again some 150 years later) and shark (attested once in 1442 and then again in 1569), fuck has a “ghost lineage” – a long attestation gap during which it must have existed, although no record of its use has survived. We do see several occurrences of fucke in 13th-century bynames like Fuckebotere (= “Fuckbutter”, 1290) and Fuckebeggar (1286/87), but in these, the verb seems to mean, respectively, ‘churn, beat’, and ‘punch, hit’ rather than you-know-what. Semantic associations leading from such meanings to the rather obvious sexual connotations of Roger’s bizarre cognomen are pretty natural, though. The coexistence of both ca. 1300 suggests that the use of fuck for sexual intercourse is a semantic specialisation which took place a long time ago. We find it not only in English but also (perhaps independently) in a few other Germanic languages. For details, you may consult the etymological information in the beautifully updated entry in the OED.

Fucking, Austria (probably unconnected)
I side with those who believe that fuck is old and has a respectable Germanic pedigree. The stem *fukkō-, with its characteristic double consonant, is easy to explain as a Germanic iterative verb – one of a large family of similar forms. They originated as combinations of various Indo-European roots with *-nah₂-, a suffix indicating repeated action. The formation is not, strictly speaking, Proto-Indo-European; the suffix owes its existence to the reanalysis of an older morphological structure (reanalysis happens when people fail to analyse an inherited structure in the same way as their predecessors). Still, verbs of this kind are older than Proto-Germanic.

One particularly clear example is English lick from Old English liccian < PGmc. *likkō-. Numerous cognates in other Indo-European languages show unambiguously that the PIE root was *leiǵʰ- ‘lick’. The expected Germanic reflex of *ǵʰ is a voiced fricative or stop (*ɣ/*g) resulting from the operation of Grimm’s Law. A different development in this case was caused by the suffix *-nah₂-, attached to the root in pre-Germanic times to yield *liǵʰ-náh₂- ‘lick (repeatedly)’. The root occurred in the reduced grade since the suffix carried the accent. After an unaccented syllable, the sequence *-ǵʰn- changed into *-gg-, which, as Grimm’s Law completed its course, became Proto-Germanic *-kk- (if the preceding vowel was short).

Many historical linguists don’t accept this development, known as Kluge’s Law (discovered more than a century ago but neglected for many decades). In recent years, however, so much evidence has been collected to support it that it seems unfair to call it “controversial” (if not something worse) any longer. The outcome of Kluge’s Law is the same for originally voiceless, voiced and “aspirated” (breathy-voiced) Indo-European stops: all of them yielded a voiceless geminate (double consonant) in the environment in which the law applied. After a long vowel or diphthong, however, the geminate was simplified, leaving a single voiceless stop.
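The outcome just described can be written down as a toy lookup in Python. This is only a sketch of the end result, not a model of the intermediate stages or of the accent condition. The names VOICELESS and kluge_outcome are hypothetical, the labiovelars are left out, and mapping the palatal series onto plain *k is my own simplification (the two series fall together in Germanic).

```python
# Toy model of the outcome only; all names here are illustrative assumptions.
# Each pre-Germanic stop is mapped to the voiceless stop of its place series.
VOICELESS = {
    "p": "p", "b": "p", "bʰ": "p",
    "t": "t", "d": "t", "dʰ": "t",
    "k": "k", "g": "k", "gʰ": "k",
    "ḱ": "k", "ǵ": "k", "ǵʰ": "k",  # palatals treated as velars (simplification)
}


def kluge_outcome(stop: str, after_long: bool = False) -> str:
    """Return the Proto-Germanic reflex of stop + *n where the law applied.

    after_long=True stands for a preceding long vowel or diphthong,
    in which case the geminate is simplified to a single stop.
    """
    base = VOICELESS[stop]
    return base if after_long else base * 2


print(kluge_outcome("ǵʰ"))  # the stop of *liǵʰ-náh₂-, cf. *likkō-
print(kluge_outcome("g", after_long=True))
```

The first call returns kk and the second a single k, whichever of the three series the input stop belongs to.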

There was a Proto-Indo-European root usually reconstructed as *peug- (or possibly *peuǵ-), meaning ‘stab, hit’ (cf. Latin pungō ‘pierce’, pūgnus ‘fist’, pūgna ‘fight’, pugil ‘boxer’; Greek púgmē ‘fist, fist-fight’). In combination with the *-náh₂- suffix we would get *pug-náh₂- > PGmc. *fukkō- ‘strike repeatedly, beat’ (like, say, “dashing” the cream with a plunger in a traditional butter churn). Note also windfucker and fuckwind – old, obsolete words for ‘kestrel’.

A number of words in other Germanic languages may be related to fuck. One of them is Old Icelandic fjúka ‘to be tossed or driven by the wind’ < *feuka-; cf. also fjúk ‘drifting snowstorm’ (or, as one might put it in present-day English, a fucking blizzard). These words fit a recurrent morphological pattern observed by Kroonen (2012): Germanic iteratives with a voiceless geminate produced by Kluge’s Law often give rise to “de-iterativised” verbs in which the double stop is simplified if the full vocalism of the root (here, *eu rather than *u) is restored.

If the verb is really native (“Anglo-Saxon”), one would expect Old English *fuccian (3sg. *fuccaþ, pl. *fucciaþ, 1/3sg. preterite *fuccode, etc.). If these forms already had “impolite” connotations in Old English, their absence from the Old English literary corpus is understandable. We may be absolutely sure that *feortan (1/3 sg. pret. *feart, pret. pl. *furton, p.p. *forten) existed in Old English, since fart exists today (attested since about 1300, just like fuck) and has an impeccable Indo-European etymology, with cognates in several branches. Still, not a single one of these reconstructed Old English verb forms is actually documented (all we have is the scantily attested verbal noun feorting ‘fart(ing)’).

One has to remember that written records give us a strongly distorted picture of how people really spoke in the past. If you look at the frequency of fuck, fucking and fucker in written English over the last 200 years, you may get the impression that these words disappeared from English completely ca. 1820 and magically reappeared 140 years later. Even the first edition of the Oxford English Dictionary (whose ambition was to be exhaustive) pretended they didn’t exist. The volume that should have contained FUCK was published in 1900, and Queen Victoria was still alive.

Google books Ngram Viewer

06 September 2015

A Jan’s Chance: The Fate of Innovations

Imagine that you start a linguistic innovation. One fine day you decide to replace the English word dog with a new, hitherto unused word — for example, jan. As of now, you will say, “I have to walk the jan”, “My jan’s name is Bruno”, and, “The jan is man’s best friend”. You will substitute jan for dog in set phrases such as “go to the jans” and “every jan has its day”. Jan would do its job neither better nor worse than dog. Both are arbitrary sound sequences (their pronunciation does not suggest what they mean); both are short and easily pronounceable. Dog has only one obvious advantage over jan: it is already an established, familiar, commonly used English word. There is no compelling reason why people should find it a good idea to abandon it just like that and learn to use a different word for the same concept. If you are really determined (and perhaps slightly nuts), you can try persuading your family and close friends to humour you and adopt your innovation when they are talking to you. You can bring up your children informing them that your family pet Bruno is a jan. But sooner or later they will find out that everybody else calls jans (including Bruno) dogs. Your experiment will almost certainly fail. Not because the word jan is useless, but because the function you’d like it to have is already carried out equally well by another word. It makes jan a “neutral” innovation — one that could play its role well enough but has no functional advantage over a preexisting competitor.

On the other hand, something similar to this thought-experiment really happened about one thousand years ago. The word docga (the Old English ancestor of dog), coined by an unknown innovator at an unknown date*), somehow became a widespread synonym of the established Old English word hund, and after a few centuries managed to replace it in the mental lexicon of every English-speaker of the time. Although its dethroned predecessor did not become completely obsolete, its frequency of use dropped by at least an order of magnitude, and it had to undergo narrow semantic specialisation in order to survive. Today, a hound is a special type of hunting dog, not just any dog in general. And if you look at other languages, you will occasionally see similar cases of lexical replacement. French chien and Italian cane go back to Latin canis, as expected, but Spanish perro is an innovation (about as mysterious as dog). It seems some new words for old things do catch on, albeit rarely. The chances are slim but apparently larger than zero.

A selfie with a jan (whose name is not Bruno)
A lexical innovation is more likely to succeed if it finds and conquers a functional niche not yet occupied by any other word. In this way it makes itself useful, which may give people a powerful incentive to adopt it. For example, the word selfie made its first recorded appearance in September 2002, in Australia (or rather in the Australian sector of cyberspace). Within the next few years it grew popular among (mostly young) English-speaking Internet users worldwide, slowly gaining the status of buzzword. Then it infected Facebook communities and its popularity soared to the zenith (as did the number of selfies published online). In 2013 the Oxford English Dictionary declared it the word of the year.

How is it possible for an innovation to become “fixed” in a large speech community? How do the chances of fixation depend on the functional value of the innovation? What is that functional value? What happens to innovations that have enjoyed some success but haven’t yet reached fixation? This is what my next blog posts will be about.

*) Nobody knows for sure where Old English docga came from. My own modest etymological proposal can be found here.