
Tuesday, December 11, 2007

Palindromic Testing for ht35 Haplotypes

I've written before about my suspicions that a significant portion of the R1b1c in Italy may be an "eastern" variant of R1b1c called ht35. 

In brief, the idea is that R1b1c originally arose in eastern Europe (perhaps Anatolia) and subsequently spread into western Europe. The western spread of R1b1c is predominantly marked by a TaqI 49a,f RFLP haplotype known as ht15, whereas in eastern Europe both ht15 and ht35 are found. We have very limited data about ht35, though Cinnioglu et al. report some STR haplotype data from which we can infer that R1b1c haplotypes with DYS393=12 and DYS461=11 are more likely to be ht35 than ht15. About 30% of the R1b1c in the Italy DNA Project fits this profile.
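To make the screening rule concrete, here is a minimal Python sketch that flags haplotypes matching the DYS393=12, DYS461=11 profile and reports the matching fraction. It assumes the input has already been limited to R1b1c and that each haplotype is a dictionary of marker names to repeat counts; the function names and sample kits are hypothetical. A match only makes ht35 more likely; it is not a substitute for the TaqI 49a,f test.

```python
from typing import Dict, List

# Hypothetical record layout: marker name -> repeat count, for R1b1c haplotypes only.
Haplotype = Dict[str, int]

# Profile inferred from Cinnioglu et al.: suggests (but does not prove) ht35 rather than ht15.
HT35_PROFILE = {"DYS393": 12, "DYS461": 11}


def fits_ht35_profile(haplotype: Haplotype) -> bool:
    """True if every marker in the profile has the expected value in this haplotype."""
    return all(haplotype.get(marker) == value for marker, value in HT35_PROFILE.items())


def ht35_profile_fraction(haplotypes: List[Haplotype]) -> float:
    """Fraction of haplotypes that fit the profile (0.0 for an empty list)."""
    if not haplotypes:
        return 0.0
    matches = sum(1 for h in haplotypes if fits_ht35_profile(h))
    return matches / len(haplotypes)


if __name__ == "__main__":
    # Made-up example kits, not real project data.
    kits = [
        {"DYS393": 12, "DYS461": 11},
        {"DYS393": 13, "DYS461": 12},
        {"DYS393": 13, "DYS461": 11},
        {"DYS393": 12, "DYS461": 12},
    ]
    print(f"{ht35_profile_fraction(kits):.0%} fit the profile")  # 25% fit the profile
```

Both markers must match; a kit with only DYS393=12 is not counted, which keeps the filter conservative.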

One problem is that the TaqI 49a,f test is not commercially available, so it would be useful to find a proxy. I am therefore proposing a hopeful experiment. Because the TaqI test fragments lie in the Yq11 region, there is a chance that extended testing of STRs in this region (i.e. the FTDNA Panel 5 Palindromic Pack) might reveal interesting and useful haplotype data.

To that end, I propose to test several DYS393=12 and DYS461=11 members of the Italy DNA Project for the Palindromic panel of markers. To the best of my knowledge, no one with that haplotype has yet undertaken the full Panel 5 test.

I have set up a ChipIn to help fund this testing.  People who are interested in furthering this particular goal can "chip in" some money to fund the testing.  I have set the goal so that I can fully test at least two people for the entire panel.  If the goal is not met, I'll use all the funds on the palindromic markers most likely (in my judgement) to provide a useful result.  If more money than the goal is raised, I'll test additional haplotypes.



Tuesday, September 04, 2007

What a difference a year makes. Part Two.

Here is a chart showing the mtDNA haplogroup breakdown of the current Italy DNA Project participants. We now have a total of 116 mtDNA results.


Haplogroup  Count  Frequency
H              50        43%
HV              2         2%
J               5         4%
J1              3         3%
J2              3         3%
K               7         6%
L               2         2%
M               1         1%
N1a             1         1%
N1b             1         1%
T               5         4%
T1              2         2%
T2              6         5%
T3              1         1%
T5              2         2%
U1a             4         3%
U2              1         1%
U3              4         3%
U4              2         2%
U5              3         3%
U6              1         1%
U7              2         2%
V               1         1%
W               2         2%
X               3         3%
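The frequency column is each count divided by the project total. As a minimal check, this Python sketch reproduces a few rows, assuming the denominator is the 116 mtDNA results stated above and that percentages are rounded to whole numbers; the helper name is illustrative.

```python
# Counts copied from a few rows of the mtDNA table; the total is the
# 116 mtDNA results stated in the post.
TOTAL_RESULTS = 116
counts = {"H": 50, "K": 7, "T2": 6, "U3": 4, "N1a": 1}


def frequency_percent(count: int, total: int = TOTAL_RESULTS) -> int:
    """Share of all results, rounded to a whole-number percentage."""
    return round(100 * count / total)


if __name__ == "__main__":
    for haplogroup, count in counts.items():
        print(f"{haplogroup:<5}{count:>4}{frequency_percent(count):>5}%")
```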

What a difference a year makes

Less than a year ago, the Italy DNA Project had 100 members. Over the past week, we have surpassed the 300-member mark. As I've done at past milestones, I want to take a minute to update everyone on the current Y-DNA and mtDNA haplogroup breakdowns.

Here's the current version of the Y-DNA assignments, based on 207 Y-DNA members:


Haplogroup  Calculated frequency
E3b1a       11.6%
E3b1b        1.4%
E3b1c        0.5%
G2           8.7%
I1a          2.9%
I1b*         1.0%
I1b1         1.4%
I1b2         2.9%
J1           4.3%
J2a         15.9%
J2b          1.9%
K2           3.9%
L            1.0%
Q            1.0%
R1a          2.9%
R1b1b        1.0%
R1b1c       34.3%
R1b1c6       0.5%
R1b1c9       1.0%
R1b1c10      1.4%
R2           0.5%



You can see that the classification scheme I used has changed a little since the last update, to favor more precise classification. In many cases, the precise classification has depended on individual analysis and some estimation, but I think the overall presentation is very accurate. One case where more testing is probably necessary is R1b1c: the levels of R1b1c9 and R1b1c10 are probably too low, since few members have tested the crucial S-series SNPs offered by EthnoAncestry.

Tuesday, April 03, 2007

Near Eastern Origin of Etruscans

The New York Times has published a very decent summary of several recent studies on the genetic origin of the Etruscans. Several recent papers, including one using mtDNA from cattle breeds, have presented evidence that supports a Near Eastern origin for the people that became known as Etruscans.

Wednesday, March 21, 2007

mtDNA Haplogroup U3

Because of the tremendous genetic diversity in Italy, our project sometimes accumulates a group of folks that would be hard to find elsewhere.

One example is the proportion of mtDNA haplogroup U3 in our project. At nearly 5% of our mtDNA results, U3 is nearly ten times as heavily represented in our project as in mitosearch, for example. Because U3 is not very well studied, I was asked by one of our members to look into it. Here's some of what I found.

Haplogroup U3 is a subclade of haplogroup U, and can be distinguished by two commonly reported markers: 16343G (in HVR1) and 150T (in HVR2). In addition, there are two coding region markers (14139G and 15454C) which separate U3 from other subclades of U.

Moreover, there are two common subclades of U3. U3a is defined by seven coding region markers and by the HVR1 marker 16390A. U3b is defined by four different coding region markers (and the absence of 16390A, of course). U3a almost always has the HVR1 marker 16519C as well.

By convention, HVR1 results are sometimes reported without the 16000 prefix, so U3b usually has an HVR1 result of simply 343G, and U3a usually has 343G, 390A, and 519C.
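The marker rules above can be turned into a rough screening sketch. This Python example assumes HVR1 results arrive as strings such as "343G" or "16343G" (position followed by the derived base); it restores the 16000 prefix when it is missing and then applies the HVR1 rules described in this post. The function names and input format are hypothetical. Because U3b is really defined by coding region markers, labeling a sample U3b here is only an inference from the absence of 16390A.

```python
import re
from typing import Iterable, Set

# Assumed input format: position digits followed by the derived base, e.g. "343G" or "16343G".
_MUTATION = re.compile(r"^(\d+)([ACGT])$")


def normalize_hvr1(mutations: Iterable[str]) -> Set[str]:
    """Restore the 16000 prefix that HVR1 results sometimes omit ("343G" -> "16343G")."""
    normalized = set()
    for mutation in mutations:
        match = _MUTATION.match(mutation.strip().upper())
        if match is None:
            raise ValueError(f"unrecognized HVR1 mutation: {mutation!r}")
        position = int(match.group(1))
        if position < 16000:  # abbreviated form such as "343G"
            position += 16000
        normalized.add(f"{position}{match.group(2)}")
    return normalized


def guess_u3_subclade(hvr1: Iterable[str]) -> str:
    """Rough HVR1-only screen; coding region SNPs are needed to confirm U3, U3a, or U3b."""
    markers = normalize_hvr1(hvr1)
    if "16343G" not in markers:
        return "not U3"
    if "16390A" in markers:  # U3a marker; U3a usually also carries 16519C
        return "U3a"
    return "U3b"  # inferred only from the absence of 16390A


if __name__ == "__main__":
    print(guess_u3_subclade(["343G", "390A", "519C"]))  # U3a
    print(guess_u3_subclade(["343G"]))                  # U3b
    print(guess_u3_subclade(["16519C"]))                # not U3
```

The HVR2 marker 150T is not checked here; a fuller screen would take HVR2 results as a second input.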

U3 is found at the highest frequency among populations around the Black Sea (e.g. Bulgaria and Georgia), but is found throughout Europe. It most likely spread from the Caucasus as part of the Neolithic expansion into Europe along the Danube River basin, as the map on the right suggests.

U3 (especially U3b) is also found at very high levels among some European Roma populations, likely due to a particularly strong founder effect. U3 is also found throughout the Near East and in North Africa.

So why is U3 disproportionately common in Italy, especially in Sicily? One possibility is that the Neolithic expansion that brought U3 into Europe was particularly successful in Italy. Another possibility is that U3, especially U3b, came to Italy with the Roma people in historic times. Additionally, U3 was likely present among many other peoples that had contact with Italy over the millennia (Phoenicians, Byzantines, etc.). Perhaps, with more research and further testing, a more accurate picture of U3 will be forthcoming.

Sunday, January 28, 2007

R1b in Italy

In Italy, the most common Y-DNA haplogroup is R1b. It accounts for 35% of the members of the Italy DNA Project and about 40% of the total Italian population.

As you can see in the map on the left, it is found at notably higher frequencies in northern Italy than in the south. The regions with the highest concentration are Emilia-Romagna, Lombardia, and Le Marche. The regions with the lowest concentration are Sardegna, Campania, Calabria, and Sicilia.

The regions with the highest concentrations correspond roughly with the area of early Celtic influence and with the territory of Cisalpine Gaul, and it is thus quite likely that the high frequency of R1b in these northern regions of Italy is due in part to migratory inflows from Celtic areas of Europe. R1b is the dominant haplogroup in northwest Europe, reaching 90% of the population in many areas.


Another explanation is that the modern populations of northern Italy preserve traces of earlier Italian populations (e.g. Etruscans), though this seems less likely to me.

The Etruscans are thought to have been tied to the Veneti, and R1b is present at comparatively low levels in Croat and Slovenian populations (the most direct descendants of the Veneti). On the other hand, R1a is relatively frequent in Balkan areas and in northeastern Italy. You can see traces of this distribution in the haplogroup frequency map for R1a in Italy, on the right, and an incursion of R1b from the Alps would be consistent with the drop in frequency of R1a in Emilia-Romagna and Le Marche.



The steep north-south cline of R1b is probably also due in part to migratory inflows from the Mediterranean, which resulted in haplogroups E3b and J2 diluting the frequency of R1b in southern Italy.

This can be seen in the map on the left, in which the combined frequency of haplogroups E3b and J2 is shown. E3b and J2 have complex histories, and a full treatment of them is beyond the scope of this discussion. Suffice it to say that southern Italy, Liguria, and Lazio demonstrate a significant impact from the eastern Mediterranean. If there was pre-Neolithic R1b in southern Italy in significant numbers, those numbers are greatly reduced today.

Haplogroup R1b is far from homogeneous, however. In recent years professional and amateur geneticists have achieved a much more nuanced understanding of this group.

First, two distinct subclades of R1b have been widely observed, and both are represented in the Italy DNA Project. R1b1b is most often found in Central Asia and R1b1c is most often found in Europe. Within R1b1c there are several further subdivisions recognized in the current ISOGG tree, the most common being R1b1c6, R1b1c7, R1b1c9, and R1b1c10.

R1b1c9 and R1b1c10 are defined by the SNPs S21 and S28, respectively, and both have been observed among our participants. Family Tree DNA does not currently test these two SNPs (EthnoAncestry does, along with some other important novel SNPs), but I expect it will soon. R1b1c9 and R1b1c10 are both associated with continental Europe, and the best thinking is that both originated in the Balkans or the Caucasus (though an Italian origin is, theoretically, possible as well).

R1b1c6 is most often observed in Iberia, while R1b1c7 is most often observed in Ireland. No members of the Italy DNA Project have been found to belong to either of these clades. For this reason, I am reluctant to encourage participants who are predicted to be R1b1c to undertake SNP tests until S21 and S28 (aka U106 and U152) are included.

Few academic papers have studied the clades of R1b1c in any depth, and John McEwan and others have done an excellent job in collecting data from genealogists who have tested for these markers. I have created frequency maps based on McEwan's data, which show roughly the geographic associations of the four most common R1b1c clades.

One distinction that is potentially relevant to Italy involves a classification scheme that is different from the one used above. This is a bit arcane, but it involves the separation of R1b1c into two groups called ht15 and ht35. ht15 is found mostly in western Europe and ht35 mostly in eastern Europe and Asia. The ht15 and ht35 tests are not commercially available, and these types don't map perfectly onto the SNP-based trees that are currently used.

However, one paper, by Cinnioglu et al., typed samples that were previously classified as ht35 for a number of SNPs and Y-STRs. It found that ht35 contains an absurdly high proportion of DYS393=12. Interestingly, a quick glance at our project's results reveals that DYS393=12 is quite prevalent in Italy (especially southern Italy, as you'll see in a minute).

The vast majority of members of the common R1b1c clades have DYS393 values of 13. In Ireland, Scotland, France, and Germany the frequency of DYS393=13 among R1b folks is over 85%, whereas the frequency of DYS393=12 is typically less than 6%.

In the Italy DNA Project, by contrast, the frequency of DYS393=12 is 28%: nearly five times as high as in western Europe.
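The comparison is simple arithmetic: the share of R1b haplotypes carrying a given allele, then the ratio between two populations. Here is a minimal Python sketch using the 28% project figure and the roughly 6% western European upper bound quoted above; the helper name and the toy list of DYS393 values are illustrative only.

```python
from collections import Counter
from typing import Iterable


def allele_frequency(values: Iterable[int], allele: int) -> float:
    """Share of haplotypes whose marker value equals `allele` (0.0 if there are none)."""
    counts = Counter(values)
    total = sum(counts.values())
    return counts[allele] / total if total else 0.0


if __name__ == "__main__":
    # Toy DYS393 values, not project data: one 12 among four haplotypes.
    print(allele_frequency([12, 13, 13, 13], 12))  # 0.25

    # Rates quoted in the post.
    italy_rate = 0.28    # DYS393=12 in the Italy DNA Project
    western_rate = 0.06  # typical upper bound in Ireland, Scotland, France, Germany
    print(f"about {italy_rate / western_rate:.1f} times as high")  # about 4.7 times as high
```

Because 6% is an upper bound for the western European figure, the true ratio is, if anything, somewhat higher than 4.7.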

Looking at Italy by region using a larger data set, as the map on the right does, it becomes immediately clear that the DYS393=12 phenomenon in Italy is largely a southern one. Frequencies are fairly high in regions like Puglia, Basilicata, Calabria, and Campania.

In the north of Italy, where R1b is most prevalent, the frequency of DYS393=12 drops to levels more typical of the rest of Europe. Again, this points to the importance of gene flow from Celtic regions to the population structure of northern Italy.

How does this compare with other places?

Well, the Cinnioglu paper found that DYS393=12 reached frequencies approaching 80% in Anatolia and nearly 70% in ht35 samples which were largely collected from the Balkans and Georgia.

I also did a survey of geographical projects at FTDNA and used that data (plus a little more) to create the DYS393=12 frequency map of Europe you see on the left.

I found high levels of DYS393=12 among the Polish project and the Czech project, which is consistent with the notion that high levels of DYS393=12 are associated with a variant of R1b that arose in the Balkans or in Eastern Europe. I also found high levels of DYS393=12 reported in the Dniester-Carpathian region (near Moldova) in a dissertation by Alexander Varzari.

Even among projects that have expressed a great interest in ht35, like the Border Reivers group, I found the overall frequency of DYS393=12 to be quite low and not statistically different from the rest of western Europe. The background levels of DYS393=12 across Europe (about 3-6%) could represent normal diversity within haplogroups that had DYS393=13 as the founder allele, or small amounts of eastern R1b that migrated west.
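The post does not say which test was used to judge the Border Reivers frequency "not statistically different" from western Europe. One common choice for comparing two frequencies is a two-proportion z-test; the Python sketch below implements it with the standard library only. The counts in the usage lines are hypothetical placeholders, not data from any project.

```python
import math
from typing import Tuple


def two_proportion_z_test(hits_a: int, n_a: int, hits_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both groups share one underlying proportion."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)  # proportion assumed under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # both groups all-hits or all-misses: no evidence of a difference
        return 0.0, 1.0
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: DYS393=12 carriers / R1b haplotypes in two groups.
    z, p = two_proportion_z_test(4, 100, 50, 1000)
    print(f"z = {z:.2f}, p = {p:.2f}")
```

The z-test relies on a normal approximation; with only a handful of DYS393=12 carriers in a project, an exact test such as Fisher's exact test would be the safer choice.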

All in all, the study of R1b in Italy clearly suggests that this haplogroup is associated with both western (Iberian/Celtic) and eastern (Balkan/Asian) sources. It is significantly less clear whether there is a particularly "Italian" variant lurking in all this data.