13 May 2013

Mind the Asterisk!


In the previous post I pointed out that biologists do not use hypothetical taxa and their inferred features as data. But wait, linguistics is a different discipline. Perhaps in linguistics it’s perfectly legal to use asterisked reconstructions (like, say, Proto-Germanic *wulfaz ‘wolf’) as data on which higher-order reconstructions can be based? The structure of language families is hierarchical. We traditionally group uncontroversially related languages into “branches”, and for each branch we reconstruct an ancestral protolanguage by applying the comparative method to its members, right? Then we compare the “proto-branch” languages to reconstruct the most recent common ancestor of the whole family, don’t we?

No, we don’t. Proto-Indo-European was not reconstructed by comparing Proto-Indo-Iranian, Proto-Slavic, Proto-Italic, Proto-Celtic, Proto-Germanic, Proto-Anatolian, etc., with one another. It has always been reconstructed by comparing data extracted from a multitude of documented languages such as Vedic, Avestan, Old Church Slavonic, Serbo-Croatian, Latin, Old Irish, Middle Welsh, Albanian, Classical Greek, Biblical Gothic, Old Norse, Old High German, Hittite, Luwian, and so forth. Proto-branch languages are reconstructed first and foremost for the sake of quality control. The nodes of the family tree are where the most conservative features of the whole branch roughly coalesce, and where it is convenient to check the consistency of the reconstruction. Proto-Germanic is not reconstructed just by comparing English with German, Dutch, Icelandic, Gothic, etc. The reconstruction is informed by the rest of the family tree as well. There is considerable feedback from reconstructed PIE to reconstructed PGmc. To give a historically important example (one among many), for more than fifty years in the 19th century a large set of exceptions to Grimm’s Law remained unexplained, until it occurred to Karl Verner to look for evidence in outgroup languages such as Classical Greek and Vedic. The conditioning environment of the Proto-Germanic process now known as Verner’s Law was obliterated in Germanic itself, but preserved elsewhere.

Linguistic reconstruction is not conducted consistently in a bottom-up fashion, by piecing together smaller units before handling larger ones. A phylogeny is assembled as a whole, and if it becomes part of a grander phylogeny, incorporating outgroup evidence, it may have to be reorganised, and hypothetical protolanguages may have to be redefined in order to optimise the enlarged model. This is what happened to Proto-Indo-European after the Tocharian and Anatolian languages had been added to the family tree. The common ancestor of the remaining IE languages is not the familiar Brugmannian reconstruction from the last decades of the 19th century. It has been deeply affected by new discoveries despite the fact that most Indo-Europeanists place both Tocharian and Anatolian outside the “crown group” containing all the modern IE languages.


WARNING: RECONSTRUCTION!
Thus, protolanguage reconstructions are not “data”. They are forever provisional and hypothetical. Using them as data is a category error. There are some thoroughly studied families like Indo-European whose protolanguages are reconstructible in considerable detail. In such cases even a historical linguist may be tempted to believe that the great success of the model makes it “real”, and so a reconstructed PIE word is as valid a piece of data as a documented word from a documented language. Such a belief is perhaps partly justified when a reconstructed form is used as a piece of shorthand intelligible to experts (who do not have to bother to list the obvious cognates) – and only if the reconstruction is straightforward and uncontroversial. Even in IE, however, reconstructions may contain questionable elements or require special explanations. Other “Eurasiatic” families are even worse off. Reconstructible Proto-Uralic lexemes would barely fill a Swadesh list. The very validity of Altaic as a well-defined clade is disputed, and even assuming optimistically that Proto-Altaic is a valid concept, little of its structure can be reconstructed with any precision.

The use of *protoforms in datasets is not justifiable in any way if the reconstructions are highly conjectural, if they might be biassed (“improved” to make a point without sufficient evidence), or if they represent preliminary, speculative research whose quality remains controversial. The Languages of the World Etymological Database (LWED), produced by the Tower of Babel project, is precisely such a pioneering enterprise. Little wonder that the “Eurasiatic” reconstructions therein make liberal use of wildcard symbols, optional segments, variants and reconstructed features poorly supported by the comparative data – hallmarks of questionable comparison. They are also based on material drawn from a haphazard collection of sources, including some hopelessly outdated etymological dictionaries. And yet the compilers of the database claim that those reconstructions “may be used for regular comparative purposes – establishing phonetic correspondences and reconstruction – by future researchers”.

It’s a dangerous declaration: researchers, especially scientists with no linguistic training, may take it literally and believe that the etymologies in principle encapsulate reliable data, so that all the dirty work of actual linguistic analysis can be outsourced to the Tower of Babel team; the scientists can then use the condensed final product and need not worry about the rest. In the PNAS study of the Eurasiatic protolexicon the “final product” is then used as the basis for determining the size of cognate sets with a given meaning. So if one “proto-word” generates a cognate set spanning all the seven putative members of the Eurasiatic superfamily, and another one gives rise to a cognate set of just three, this difference can be expected to correlate with something real – even if the basis for the reconstruction is extremely tenuous, e.g. if on closer inspection it turns out that one of the alleged cognates is misreconstructed, another has been assigned the wrong meaning, still another is a loanword, and most present formal problems obvious to linguists but not necessarily to non-specialists. I will show in the next post that the “cognate set sizes” based on the LWED cannot be realistically determined with any reasonable accuracy, given the information provided in the database. The error may be so gross that the sizes determined in the study are simply fictitious.
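A rough Python sketch of why such a count is fragile, assuming a cognate set is modelled simply as a list of reflexes, each tagged with the putative family it belongs to and with any objection a linguist might raise against it. The class `Reflex`, the function `cognate_set_size`, the family labels "A" to "G" and the flags are hypothetical placeholders invented for illustration; they are not data or code from the LWED or the PNAS study.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical illustration only: labels and flags are invented placeholders.
@dataclass
class Reflex:
    family: str             # putative member family of the superfamily
    problem: Optional[str]  # None if the comparison looks sound, else a reason to doubt it

def cognate_set_size(reflexes, strict=False):
    """Count the distinct families represented in a cognate set.

    With strict=False every listed reflex counts; with strict=True any
    reflex flagged with a problem is discarded before counting.
    """
    families = {r.family for r in reflexes if not (strict and r.problem)}
    return len(families)

proto_word = [
    Reflex("A", None),
    Reflex("B", None),
    Reflex("C", None),
    Reflex("D", "misreconstructed"),
    Reflex("E", "wrong meaning"),
    Reflex("F", "loanword"),
    Reflex("G", "irregular correspondence"),
]

print(cognate_set_size(proto_word))               # 7
print(cognate_set_size(proto_word, strict=True))  # 3
```

Taking every listed comparison at face value yields a set spanning seven families; discarding the four doubtful ones leaves three. The figure depends entirely on cognacy judgments that the bare count does not record.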


39 comments:

  1. Proto-Germanic is not reconstructed just by comparing English with German, Dutch, Icelandic, Gothic, etc.

    Indeed, it cannot be. Ringe, Warnow et al. (the Good Guys in phylogenetic linguistics) tried running their software against the modern Germanic languages only, and the software coughed up a hairball instead of sensible trees. There has been so much homoplasy (undetected borrowing) that although we'd know the languages were related even if we didn't have the older versions, we would have no idea how.

    Replies
    1. It gets worse. Without historical records as a guide, even the position of English within Germanic becomes shaky. The Germanic core vocabulary might decide the question, but not without resistance.

    2. Wasn't the dataset in that study limited to just the presence/absence of vocabulary items?

  2. Whether some data is 'fact' or 'theoretical'('hypothetical') depends on the context: this is what history and philosophy of sciences have shown. Every fact is theory-laden; or facts are facts of a certain theory.

    For instance, A and B are facts within a theory T. This theory T can become a background theory in another theory U, whose facts are C and D. In this context, A and B are treated as facts (without any question).

    When we test some hypothesis (H), we derive a logical consequence with the aid of other background theories, one of which is the theory of Boolean logic itself. If someone disputes H, they don't dispute the truth of the background theories. But what if one uses questionable background theories to derive consequences of the hypothesis in question?

  3. Whatever the theory that interprets a "fact", the fact itself should be the result of empirical observation, not something removed from empirical observation to the degree we see here. Imagine that a biologist wants to use data from a publicly accessible repository of DNA sequences rather than bother to do the sequencing herself. She has every right to do so, even if the data are pre-processed and annotated, i.e. interpreted (in the context of certain theories). But there is an almost 1-1 correspondence between "raw" empirical data and the sequences in the bank (not excluding occasional flaws, mistakes, and poor annotations). Anyone can do the sequencing independently and get the same or almost the same results (and document the conflicts, if any). My problem with the LWED database is that it's overloaded with figments of insufficiently constrained comparative methodology, and so cognacy counts based on it are not very different from randomly generated numbers.

  4. Correction: of course their distribution is far from random, but the pattern in them is a reflection of wishful thinking rather than reality, as I am planning to show.

  5. Whether some data is 'fact' or 'theoretical'('hypothetical') depends on the context: this is what history and philosophy of sciences have shown. Every fact is theory-laden; or facts are facts of a certain theory.

    I see this kind of boilerplate assertion all the time, and all it does, as far as I can tell, is to obfuscate important distinctions and make it harder for non-experts to understand what's going on. Of course, that may be the point -- to demolish the hegemonic idea of "expertise." Every person their own historical linguist, and damn the asterisks!

  6. For an example of an earlier expression of this position, or rather methodology, see my (B. Alpher) "Pama-Nyungan: Phonological reconstruction ..." in H. Koch & C. Bowern eds (2004; Benjamins), Australian Languages: Classification and the Comparative Method (concluding remark to section 3.5, p. 102), and also introductory remarks to "Some Proto-Pama-Nyungan Paradigms: a verb in the hand is worth two in the phylum" in G.N. O'Grady & D. Tryon (eds) Studies in Comparative Pama-Nyungan (Pacific Linguistics C 11; Canberra: ANU, 1990). Please note that I AM IN NO WAY CLAIMING TO BE THE FIRST TO MAKE THESE POINTS.

    Replies
    1. Your comparative work on Pama-Nyungan could be used as the official standard of careful methodology applied to difficult data. I think good historical linguists have always followed similar principles at least implicitly, even if few of them have been as clear as you about it.

  7. "Whatever the theory that interprets a "fact", the fact itself should be the result of empirical observation, not something removed from empirical observation to the degree we see here."

    Let me give you a historical example: clerical contemporaries of Galileo claimed that the mountains on the moon were illusions created by his telescope. Their opponents saw something real there on the moon. Here, one and the same 'empirical' observation is clouded by two different theories, even if modern readers ridicule those clerics. Two competing theories describe the same observation differently.

    As long as both the defenders and opponents agree to the same background theories (like that of telescope), there we can claim that the observation is empirical (no scare quotes, here). Otherwise, such 'empirical' observation depends on the background theory, which many question. So, we need to consider such cases.

    Replies
    1. "Two competing theories describe the same observation differently. "

      And yet the phenomenon they describe is one and the same phenomenon.

      "As long as both the defenders and opponents agree to the same background theories (like that of telescope), there we can claim that the observation is empirical (no scare quotes, here)."

      No, both may be in error. In American languages this has been a huge problem for generations of researchers to deal with. They have faulty records of languages to work from, which misrepresent the phonemic distinctions because none of the European(-American) people recording the languages were hearing the relevant phonetic distinctions. People educated with the Latin-based grammars twisted onto English were often hindered rather than helped by their education when it came to analyzing their field data into grammars for the languages they were describing, and over a period of generations had to develop some pretty fundamentally new schemata.

      Just because different people agree on a mistake doesn't make it correct.

    2. Jim, your point is well taken. Yes, both can be in error. Agreement doesn't guarantee the truth: after all, people once believed that the earth was flat, since that was/is how it is experienced.

  8. I doubt if Galileo's critics actually used a telescope to verify his interpretation. It doesn't take modern equipment to see the shadows of the lunar mountain-ranges. Schiaparelli's Martian canals are perhaps a better analogy for Eurasiatic reconstructions. But anyway, science resolves such conflicts by developing better telescopes or other ways of data-collection (eventually sending out space probes). Today we know that the mountains on the Moon are real and the canals on Mars aren't, because we have gathered more data, not through scholastic debates.

  9. While I have no objections at all to the practical difficulties of using reconstructions (especially tentative ones) as data for further reconstructions, I wonder about categorically rejecting that principle altogether. The reconstruction of PIE has historically involved direct comparison more often than not, but that doesn't necessarily translate directly into an invalidation of a 'bottom up' approach. When dealing with closely related languages, especially ones whose relationships are already established, reconstructing directly from the attested languages without (explicitly or implicitly) considering intermediate nodes might not actually be the best way to go.

    The thematic genitive plural might be a good discussion case. There are a variety of endings in Germanic: Gothic -ē, ON, OE -a, and OHG -o. The decision about which of these forms is actually comparable to IE *-oʜom requires not just comparing each language to other IE languages, but specifically working out the inner-Germanic correspondences. It's this inner-Germanic work that shows that Gothic doesn't fit, and that the OE and OHG forms are more diagnostic than ON, and point to an ending that was some sort of long *ō. This can be refined by looking at other IE languages (and PIE), so that we actually reconstruct a 'trimoric' (whatever that means phonetically) o that might well still have been nasalized: *õⁿ (double macron isn't showing right). Once these inner-Germanic correspondences are worked out, it would be enough to say that the OHG ending is direct comparative evidence for the PIE one, but you can't really do that until you've established how the Germanic sounds relate to each other in descent from PGmc, and that involves a consideration of PGmc, whether or not you directly give the asterisk form.

    I'm not saying that we should reconstruct intermediate proto-languages entirely in isolation and only then compare them (though that might be a helpful *step* for particular problems), but that the two methodologies get used hand-in-hand, and a bottom up approach is and should be part of the regular historical linguistic toolkit.

    (For what it's worth, Verner's Law isn't a straightforward example of PIE in our reconstruction of PGmc, since it's a description of a sound change *between* PIE and PGmc, not part of the latter. It adds some clarification to know the PIE background, but the alternations left by the Law can be reconstructed fairly well from Germanic alone. We don't need to know about PIE accent to get PGmc *faðēr and *brōþēr.)

    (Too long, so I'm continuing in another comment.)

    Replies
    1. When dealing with closely related languages, especially ones whose relationships are already established, reconstructing directly from the attested languages without (explicitly or implicitly) considering intermediate nodes might not actually be the best way to go.

      Far be it from me to advocate such an extreme approach. The reconstruction of a family tree necessarily involves the reconstruction of intermediate nodes (and a combination of "bottom up" and "top down" reasoning). Intermediate nodes are essential for a variety of reasons -- from parsimony (factoring out shared innovations) to being convenient reference points. What I emphasise is the fact that reconstructions are always hypotheses sensitive to any modifications in the family tree. With securely established groupings it makes less difference if one uses asterisked forms for brevity (anyone can supply the actual data and everyone knows the relevant changes). But the same "*" is employed to usher in forms with very little if any factual support.

      For what it's worth, Verner's Law isn't a straightforward example of PIE in our reconstruction of PGmc, since it's a description of a sound change *between* PIE and PGmc, not part of the latter.

      True, but the alternations produced by it were part of the synchronic grammar of the Germanic MRCA. It's hard to tell to what extent the alternations remained transparent. There's little evidence for the dating of the fixation of initial stress, to begin with.

  10. How this leads back to long-range comparisons could be a very long discussion. The problem isn't necessarily basing a hypothesis on hypotheses. Isn't that one good way to test a hypothesis, to see if it is useful in explaining more things? But the hypotheses get increasingly tenuous, and the uncertainties (always there, but very obvious when people start resorting to cover symbols and parentheses) amount to saying 'this hypothesis is too shaky to be used any further'. PGmc *wulfaz is a pretty good hypothesis for the Gmc data (whether you think of it as a hypothetical form that was actually spoken in some manner loosely captured as /wulfaz/, or as a very complicated shorthand meaning something like "enough Germanic languages contain words regularly corresponding to each other in such a way that we feel comfortable using them as a unified data set, with specific phonological-morphological-semantic features implied by the notation *wulfaz 'wolf'", in comparison with other related Indo-European languages). It's a good hypothesis because it is so closely grounded in the data of the Germanic languages. The PIE *wĺkʷos is a less good hypothesis, because even though it matches a lot of the recorded data pretty well, there are more discrepancies. It's still OK, particularly because there's so much corroborating evidence that the languages involved are definitely related anyway, with sound correspondences along the lines of what *wĺkʷos shows by and large. So we tend to accept it, but we have to recognize that *wulfaz is on better footing (having no irregularities in its relation to the linguistic data). If a long-range comparison person were to come along (as I'm sure someone somewhere has) and compare *wĺkʷos itself to some Uralic form, this would amount to proposing yet another hypothesis. This isn't necessarily an invalid proposal right off the bat, but it would inherit all the uncertainties of the previous one, and add new uncertainties as well.
    An Indo-Uralic *form involves a much more complicated implied relationship between the actual linguistic forms, and that complexity needs both a lot of corroboration (to distinguish real regularity from chance) and a recognition that if *wĺkʷos is more uncertain than *wulfaz, *[IUwolf] will inevitably be less certain still, even if someone managed to establish beyond reasonable doubt that those families were related.

  11. I would put it like this: "hypotheses based on hypotheses" are fine at the initial exploratory stage, when we try to establish a preliminary framework (to be abandoned if too many pieces of the jigsaw puzzle seem out of place). When the proposal becomes more solid, we should rethink the whole construction, even its seemingly solid parts. For example, the laryngeal theory was first formulated on the basis of internal reconstruction within PIE (it was "a hypothesis based on a hypothesis"). It seemed at the time that the laryngeals belonged to the pre-PIE stage and did not have to be taken into account when discussing the histories of "the branches". But a rethinking of their role (not just their partial survival in Anatolian) led to regarding them as real consonantal segments with reconstructible phonetic properties, highly relevant for understanding branch-specific developments (capable of blocking Brugmann's Law in Indo-Iranian, causing aspiration in Sanskrit, tonal effects in Balto-Slavic, vowel-breaking in Tocharian, Greek and Armenian, etc.). As such "special effects" were discovered and examined in detail, the laryngeals had to be included in the segmental inventories of some of the proto-branch languages. They are no longer abstract algebraic "coefficients".

  12. "No, we don’t. Proto-Indo-European was not reconstructed by comparing Proto-Indo-Iranian, Proto-Slavic, Proto-Italic, Proto-Celtic, Proto-Germanic, Proto-Anatolian, etc., with one another. It has always been reconstructed by comparing data extracted from a multitude of documented languages such as Vedic, Avestan, Old Church Slavonic, Serbo-Croatian, Latin, Old Irish, Middle Welsh, Albanian, Classical Greek, Biblical Gothic, Old Norse, Old High German, Hittite, Luwian, and so forth. Proto-branch languages are reconstructed first and foremost for the sake of quality control. The nodes of the family tree are where the most conservative features of the whole branch roughly coalesce, and where it is convenient to check the consistency of the reconstruction. Proto-Germanic is not reconstructed just by comparing English with German, Dutch, Icelandic, Gothic, etc. The reconstruction is informed by the rest of the family tree as well. There is considerable feedback from reconstructed PIE to reconstructed PGmc."


    So, do you mean (as Robert A. Hall Jr. says in his work Proto-Romance Phonology) that the proto-language is first reconstructed from the oldest attested sources, and then revised on the basis of new data from the modern languages?

    Replies
    1. PIE is a putative entity which never existed as an actual language spoken by actual people. Although the classical genealogical tree model for representing language relationships is a simplification, in the particular case of the IE family it is a huge one.

    2. Hi, Brendan,

      The oldest attested sources are often far from ideal as evidence. For example, the Mycenaean Greek corpus is wonderfully archaic but deplorably limited (and written in a script that fails to convey many phonological details), so it supplements and informs Ancient Greek rather than the other way round. The same goes for Elder Runic, Ogham Irish, etc. In practice, it's nice to have a reasonably old "classical" stage which can be regarded as a representative documentation of the language in question (or at least some of its varieties, styles and registers). The advantage of using, say, Old English rather than Modern English is that you don't have to take into account more recent losses and gains. The disadvantage is that there are inevitable attestation gaps (from individual words to whole dialects accidentally missing from the historical record). Using later material (including modern languages) often makes it possible to fill in those gaps.

      Excessive emphasis on those "classical" varieties can distort the picture too. People sometimes forget that Old Church Slavonic is not Proto-Slavic, Biblical Gothic is not Proto-Germanic, and Mycenaean is not Common Greek. Some modern Indic languages may be more archaic than Vedic in some selected respects (that is, they preserve conservative traits where Vedic had innovated). It's the total evidence that counts, and which part one should consider at the first stage of reconstruction depends on several factors, age among others.

  13. PIE is a putative entity which never existed as an actual language spoken by actual people. Although the classical genealogical tree model for representing language relationships is a simplification, in the particular case of the IE family it is a huge one.

    Evidence?

    I mean, the PIE reconstructions found in current reference works are certainly time-averaged, as e.g. our esteemed host has pointed out here from slide 22 onwards. But that, ironically, is precisely because people haven't been taking the tree model seriously enough. Once they got to the point of making this kind of map, they mostly just threw up their hands and said "we will never know" instead of trying to find the most parsimonious explanation.

    Replies
    1. The improved PIE tree depicted by Piotr (Rodríguez Adrados has a similar proposal) is of course more accurate than the classical one, but in my opinion it's still insufficient.

      The thing is that among the more than 2,000 lexical items reconstructed for PIE, there are doublets (and possibly also triplets) such as *sa(n)k- 'holy' (Latin, Hittite) vs. *yag´- 'to worship' (Indo-Greek), which could only be explained by language replacements at earlier stages. In other words, the real PIE (if it ever existed) would be conceptually more alike to "Proto-Nostratic" than to classical PIE.

    2. ...What. Why couldn't "holy" and "worship" belong to different word families? They do in English. They also do in German, where I can't find any cognate of worship.

      What is the Hittite form, BTW?

    3. I also don't understand what you mean by "conceptually more alike to 'Proto-Nostratic'". People who try to reconstruct Proto-Nostratic do of course think that they're trying to reconstruct a real language that was spoken by real people. Some of them have even pinpointed the Urheimat with ludicrous precision (the Natufian culture of late Paleolithic Palestine).

    4. ...German, where I can't find any cognate of worship.

      Wert (n. and adj.) are of course cognate to OE weorþ 'worth, value, price' and 'worthy, dignified, respectable', and so they are root-cognates of weorþsċipe 'honour, magnificence, dignity'. It's clear that the religious meaning is secondary.

    5. The Hittite form is sākl(ā)i- ‘custom, rule; rite, ceremony’. My point is the words I quoted are long-range cognates à la Nostratic, coming from different proto-languages. Judging from other correspondences, Eastern IE *y- would come from a former palatal affricate. Other comparanda for this etymology are Altaic *tʰákʰì 'ceremony, sacrifice' (EDAL 2293) and Caucasian *=əqE 'to raise, high' (NCED 142). The semantic shift is motivated because victims were raised over the altar in sacrifices.

    6. Oh, there's "worth" in worship...

      The Hittite form is [...]

      In that case I hope you aren't getting the *(n) from Latin sanctus, the pre-Christian meaning of which wasn't religious. It just meant "Right Honourable"; the city senate of Pompeii called itself sanctus ordo all the time.

      Do you have any other examples of IE *s corresponding to Altaic *tʰ and Caucasian 0?

    7. I don't see any problem in linking sanctus < *sank-to- to sacer < *sak-ro- other than semantic specialization at an earlier date.

      In Starostin's reconstructions, Caucasian *= stands for a class prefix, which would correspond to the initial consonant in Altaic and IE. However, this is a bit odd, because we would expect Altaic *s- ~ Eastern IE *y- from an original palatal affricate, as in e.g. Altaic *sè:gù 'healthy; blood' (EDAL 1976) ~ Eastern IE *yak- 'healthy; treatment, cure' ~ Caucasian *tɕ'a:dɮwV 'blood; life' (NCED 420), while we've got Altaic *tʰ corresponding to Caucasian *ts in *tʰàrba 'a k. of small animal' (EDAL 2314) ~ Caucasian *tsa:rgwV 'weasel, marten' (NCED 1162). Surely we're dealing with different linguistic layers and/or borrowing paths.

    8. As regards your earlier post, I think the "real" Nostratic wasn't the common ancestor of IE and a bunch of other language families, but a language (or a group of closely related languages) spoken in the Taurus-Zagros mountains which were the source of several Neolithic Wanderwörter as well as a relative to Caucasian and Asianic languages such as Hurro-Urartian. This is in line with John Kerns's reflection in a coauthored book with Allan Bomhard (Reconstructing Proto-Nostratic. Comparative Phonology, Morphology, and Vocabulary, vol. I, p. 235-241): "I believe that Nostratic languages did not exist except as a part of Dene-Caucasian until the waning of the Würm glaciation, some 15,000 years ago."

    9. Interesting; I have a book with that name, but it's by Bomhard alone and has different text on those pages. It's from 2007. I suppose the version you have is younger?

    10. It's from their 1994 book, The Nostratic Macrofamily: A Study in Distant Linguistic Relationship, De Gruyter, p. 153. Allan has quoted this passage in extenso in some of his later books (I'm not sure about Reconstructing Proto-Nostratic), but the fragment cited by Octavià surely isn't seven pages long.

    11. P.S. Mystery solved by Detective Google: Tavi must have confused his own references when he was copying and pasting stuff (see here, fn. 5 and 6).

    12. Yes, that's right. Thank you for correcting my mistake, Piotr. :-)

    13. as in e.g. Altaic *sè:gù 'healthy; blood' (EDAL 1976) ~ Eastern IE *yak- 'healthy; treatment, cure' ~ Caucasian *tɕ'a:dɮwV 'blood; life' (NCED 420)

      I would much rather compare the Altaic "blood" word to the PIE "blood" word *h₁esh₂-r/n-. The only obvious obstacle is the PA *-u, which I'd expect to be present in PIE as *-w – but perhaps the PA vowel reconstruction is simply wrong; Altaic vowel correspondences are really hard to figure out.

      The Caucasian word is barely even similar to the others.

      while we've got Altaic *tʰ corresponding to Caucasian *ts in *tʰàrba 'a k. of small animal' (EDAL 2314) ~ Caucasian *tsa:rgwV 'weasel, marten' (NCED 1162). Surely we're dealing with different linguistic layers and/or borrowing paths.

      This is much more interesting, and points to the Altaic word being a loan. Not necessarily from Caucasian specifically, because the word has similar cognates e.g. in Burushaski (/tɕarˈge/, ćargé in Berger's spelling; IIRC it means "squirrel"); I can't remember if a Yeniseian one has been proposed, but I expect so. – Would be an interesting third case of Altaic *b corresponding to some kind of [gʷ] elsewhere.

    14. About Altaic *tʰàrba, I agree it must be a loan, and actually we find this word in Latin talpa (probably an Etruscan loanword) and Occitan darbon 'mole'.

    15. I would much rather compare the Altaic "blood" word to the PIE "blood" word *h₁esh₂-r/n-.
      I formerly thought so, but there are several problems, including the fact that the IE word is a reduplicated form (not of the same kind as Piotr's examples, but rather like *Hok´te-h₃u '8'; obviously it isn't the same "flavour" of IE) corresponding to Kartvelian *zisx-L-.

      The Caucasian word is barely even similar to the others
      Not really, as Caucasian lateral affricates behave similarly to IE "palato-velars". Besides, "Kurganic" *y- (or rather *j-) regularly reflects palatal affricates, as in e.g. *jorko- 'roe deer' ~ Caucasian *ʁHwo:r[ʧˀ]o (˜ -ʨˀ-, -ǝ) 'deer, game', with metathesis: *j ~ *ʧˀ and *k ~ *ʁ.

    16. Possibly Latin sānus 'healthy' would be the reflex of the bare stem *seh₂-n-. In my opinion, this word belongs to a different "flavour" of IE in the same way as, e.g., Anatolian verbal morphology does.

    17. Oh yeah, I had forgotten about the Kartvelian form. But in which branches does *sè:gù have reflexes? Most have merged *s and *z, so I wonder...

      Where's the reduplication in *Hoḱte-h₃u? I can't see it.

    18. But in which branches does *sè:gù have reflexes?
      According to the EDAL, it's attested in all of them.

      Where's the reduplication in *Hoḱte-h₃u? I can't see it.
      Surely you remember Lithuanian kek(e)tà 'detachment, flock', Uralic *kakta ~ *kæktæ '2', Altaic *gàgtà 'one of a pair'. The unreduplicated lexeme is *kʷet- 'to group in a pair', which is the base of the IE numeral '4' (thanks, Piotr!).
