Sunday, January 28, 2007

R1b in Italy

In Italy, the most common Y-haplogroup is R1b. It accounts for 35% of the members of the Italy DNA Project and about 40% of the total Italian population.

As you can see in the map on the left, it is found at notably higher frequencies in northern Italy than in the south. The regions with the highest concentration are Emilia-Romagna, Lombardia, and Le Marche. The regions with the lowest concentration are Sardegna, Campania, Calabria, and Sicilia.

The regions with the highest concentrations correspond roughly with the area of early Celtic influence and with the territory of Cisalpine Gaul, and it is thus quite likely that the high frequency of R1b in these northern regions of Italy is due in part to migratory inflows from Celtic areas of Europe. R1b is the dominant haplogroup in northwest Europe, reaching 90% of the population in many areas.


Another explanation is that the modern populations of northern Italy preserve traces of earlier Italian populations (e.g. Etruscans), though this seems less likely to me.

The Etruscans are thought to have been tied to the Veneti, and R1b is present at comparatively low levels in Croat and Slovenian populations (the most direct descendants of the Veneti). On the other hand, R1a is relatively frequent in Balkan areas and in northeastern Italy. You can see traces of this distribution in the haplogroup frequency map for R1a in Italy, on the right, and an incursion of R1b from the Alps would be consistent with the drop in frequency of R1a in Emilia-Romagna and Le Marche.



The steep north-south cline of R1b is probably also due in part to migratory inflows from the Mediterranean, which resulted in haplogroups E3b and J2 diluting the frequency of R1b in southern Italy.

This can be seen in the map on the left, in which the combined frequency of haplogroups E3b and J2 is shown. E3b and J2 have complex histories, and a full treatment of them is beyond the scope of this discussion. Suffice it to say that southern Italy, Liguria, and Lazio demonstrate a significant impact from the eastern Mediterranean. If there was pre-Neolithic R1b in southern Italy in significant numbers, those numbers are greatly reduced today.

Haplogroup R1b is far from homogeneous, however. In recent years professional and amateur geneticists have achieved a much more nuanced understanding of this group.

First, two distinct subclades of R1b have been widely observed and both are represented in the Italy DNA Project. R1b1b is most often found in Central Asia and R1b1c is most often found in Europe. Within R1b1c there are several further subdivisions recognized in the current ISOGG tree, with the most common ones being R1b1c6, R1b1c7, R1b1c9, and R1b1c10.

R1b1c9 and R1b1c10 are defined by the SNPs S21 and S28, respectively, and both have been observed among our participants. Family Tree DNA does not currently test these two SNPs (EthnoAncestry does, along with some other important novel SNPs), but I expect they will soon. R1b1c9 and R1b1c10 are both associated with continental Europe and the best thinking is that both originated in the Balkans or the Caucasus (though an Italian origin is, theoretically, possible as well).

R1b1c6 is most often observed in Iberia, while R1b1c7 is most often observed in Ireland. No members of the Italy DNA Project have been found to belong to either of these clades. For this reason, I am reluctant to encourage participants who are predicted to be R1b1c to undertake SNP tests until S21 and S28 (aka U106 and U152) are included.

Few academic papers have studied the clades of R1b1c in any depth, and John McEwan and others have done an excellent job in collecting data from genealogists who have tested for these markers. I have created frequency maps based on McEwan's data, which show roughly the geographic associations of the four most common R1b1c clades.

One distinction that is potentially relevant to Italy involves a classification scheme that is different from the one used above. This is a bit arcane, but it involves the separation of R1b1c into two different groups called ht15 and ht35. ht15 is found mostly in western Europe and ht35 is found mostly in eastern Europe and Asia. The ht15 and ht35 tests are not commercially available, and these types don't equate perfectly with the SNP-based trees that are currently used.

However, one paper, by Cinnioglu et al., examined samples that were previously classified as ht35 for a number of SNPs and Y-STRs. It was found that ht35 contains an absurdly high proportion of DYS393=12. Interestingly, a quick glance at our project's results reveals that DYS393=12 is quite prevalent in Italy (especially southern Italy, as you'll see in a minute).

The vast majority of members of the common R1b1c clades have DYS393 values of 13. In Ireland, Scotland, France, and Germany the frequency of DYS393=13 among R1b folks is over 85%, whereas the frequency of DYS393=12 is typically less than 6%.

In the Italy DNA Project, by contrast, the frequency of DYS393=12 is 28%: nearly five times as high as in western Europe.
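For anyone who wants to check the arithmetic, here is a minimal Python sketch of the comparison. The 28% figure and the 3-6% western European range come from this post; the function name `fold_difference` is just something I made up for illustration.

```python
def fold_difference(freq_a, freq_b):
    """Return how many times larger freq_a is than freq_b."""
    if freq_b <= 0:
        raise ValueError("comparison frequency must be positive")
    return freq_a / freq_b


# Frequencies of DYS393=12 quoted in the post
italy_project = 0.28   # Italy DNA Project
western_upper = 0.06   # upper end of the typical western European range
western_lower = 0.03   # lower end of the background range

# Against the upper end of the western range (the conservative comparison)
print(round(fold_difference(italy_project, western_upper), 2))

# Against the lower end of the western range
print(round(fold_difference(italy_project, western_lower), 2))
```

Comparing against the 6% upper bound is the conservative choice, and it is the one behind the "nearly five times" figure. Against the lower end of the range the gap is larger still.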

Looking at Italy by region using a larger data set, as the map on the right does, it becomes immediately clear that the DYS393=12 phenomenon in Italy is largely a southern one. Frequencies are fairly high in regions like Puglia, Basilicata, Calabria, and Campania.

In the north of Italy, where R1b is most prevalent, the frequency of DYS393=12 drops to levels more typical of the rest of Europe. Again, this points to the importance of gene flow from Celtic regions to the population structure of northern Italy.

How does this compare with other places?

Well, the Cinnioglu paper found that DYS393=12 reached frequencies approaching 80% in Anatolia and nearly 70% in ht35 samples which were largely collected from the Balkans and Georgia.

I also did a survey of geographical projects at FTDNA and used that data (plus a little more) to create the DYS393=12 frequency map of Europe you see on the left.

I found high levels of DYS393=12 among the Polish project and the Czech project, which is consistent with the notion that high levels of DYS393=12 are associated with a variant of R1b that arose in the Balkans or in Eastern Europe. I also found high levels of DYS393=12 reported in the Dniester-Carpathian region (near Moldova) in a dissertation by Alexander Varzari.

Even among projects that have expressed a great interest in ht35, like the Border Reivers group, I found the overall frequency of DYS393=12 to be quite low and not statistically different from the rest of western Europe. The background levels of DYS393=12 across Europe (about 3-6%) could represent normal diversity within haplogroups that had DYS393=13 as the founder allele or small amounts of eastern R1b that has migrated west.
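For readers curious what "not statistically different" means in practice, below is a rough sketch of one common way to compare two frequencies, a pooled two-proportion z-test, using only the Python standard library. This is not necessarily the method I used, and the counts in the example are made up purely for illustration; they are not real project data. With small samples, an exact test (such as Fisher's) is usually a better choice than this normal approximation.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled frequency under the null hypothesis of no difference
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts of DYS393=12 carriers (illustration only):
# 5 of 100 haplotypes in one project versus 6 of 100 in another.
z, p = two_proportion_z_test(5, 100, 6, 100)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A large p-value here would mean the two projects' DYS393=12 frequencies cannot be told apart with samples of this size.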

All in all, the study of R1b in Italy clearly suggests that this haplogroup is associated with both western (Iberian/Celtic) and eastern (Balkan/Asian) sources. It is significantly less clear whether there is a particularly "Italian" variant lurking in all this data.

Friday, January 26, 2007

Two Hundred Members!

This week the Italy DNA Project hit a new milestone with the addition of our 200th participant. As I write this, the project has over 140 Y-DNA participants and 68 mtDNA participants! It seems like just yesterday that our participation reached 150.

I've been working on a lot of new research, and I hope to finish it up soon so I can share it. Let me say now, though, that the Italy DNA Project is one of the most genetically diverse projects at Family Tree DNA. We have a tremendous variety of haplogroups represented, including some subclades not yet reported in any other group. Even within the most common European haplogroup, R1b1c, we have greater haplotype diversity than just about any other project I've seen.

For example, among R1b1c in Europe, the most common value for DYS393 is 13. In fact, in many projects DYS393=13 is present in over 90% of all R1b1c haplotypes. By contrast, in the Italy DNA Project DYS393=13 is present in just 64% of all R1b1c haplotypes. We have far more DYS393=12 than other projects and far more DYS393=15 than other projects.

This could reflect the persistence of some ancient forms of R1b1c, remaining in Italy from the Upper Paleolithic age, or it could reflect the multiple migrations into Italy over the millennia from Balkan or Asian sources (which are not generally well represented in FTDNA projects). My bet is on a combination of both factors.

Regardless, the testing our participants are involved in is exciting and the results will almost certainly help shape the future of population genetics.

So, thank you everyone.