Monday, 16 May 2016

Y-DNA matches with Different Surnames

Why do I have Y-DNA matches that don't have the same surname as me? 

People often ask this when they first get their Y-DNA results, and there are several explanations for it. The Y-DNA test only compares Y chromosome DNA to other Y chromosome DNA. A “match” between two men usually means one of three scenarios (bear in mind there are exceptions to every general rule):

Scenario 1. 
The two men are related via a common ancestor who lived some time since the appearance of surnames (e.g. within the last 1000 years or so in Britain & Ireland). There are two sub-scenarios in this situation:
a) the two men have the same surname - in which case, they are probably related via a common ancestor (who bore that same surname) some time within the last c.1000 years. This is the scenario we are most interested in, and it forms the basis of surname studies.
b) the two men have different surnames - in which case an NPE may be present i.e. Non-Paternity Event (or Not the Parent Expected). In other words, both men have a common ancestor within the last c.1000 years, but the surname on ONE of their lines (we don’t know which one) has changed over the years because of a secret adoption, or infidelity, or illegitimacy, etc. Postscript: as mentioned in the Comments below, there are many other possible causes for "surname discontinuity". For example, some families adopted new surnames after emigrating to the US, changing the name to perhaps sound more English. And of course some societies adopted inherited surnames quite late (e.g. Turkey in 1934) or not at all (e.g. Iceland, Tibet).

Scenario 2. 
The two men are related before the appearance of surnames (e.g. pre-1000 AD) - in this scenario, the two men will have different surnames (with rare exceptions). This scenario can arise where there has been very little mutation in the DNA over the course of the last c.1000 years or so, or where there has been a degree of Convergence (see below).

Scenario 3. 
The two men are related but much further back than they look. This is because of Convergence, where the two genetic profiles were identical 10,000 years ago (for example), then mutated away from each other gradually over the millennia, and then (by chance) started mutating back towards each other, so that it looks like the common ancestor is closer than he is (say 500 years ago rather than 10,000 years ago). Convergence is still being studied and not a huge amount is known about how commonly it is encountered. It is likely that it is more common in some haplogroup subclades than in others.

So when a man matches a man with a different surname, it is a case of either Scenario 1b (NPE), Scenario 2 (pre-surname match), or Scenario 3 (Convergence). How can you distinguish between these three scenarios? Not easily, but there are certain clues that can help.

If one of the men matches other people with his surname, then it is less likely that his particular surname is the result of an NPE. And if the other man matches nobody with his surname (and there are people with his surname in the FTDNA database that he could potentially match), then the likelihood that an NPE has occurred somewhere along that man's direct male line is higher. On the other hand, if both men match others with their surname, then perhaps this is a case of Convergence.

If the two men have tested to 37 markers (or higher) and are exact matches, then this makes Scenario 1b more likely (i.e. an NPE has occurred somewhere in the past). The likelihood increases if there is an exact match at 67 markers or 111 markers. Conversely, the less close the match is (say a genetic distance of 3 or 4 at 37 markers, written 3/37 or 4/37), the more likely this is a case of Scenario 2 (pre-surname match) or Scenario 3 (Convergence).
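The closeness of a match can be illustrated with a small Python sketch that counts how far apart two marker profiles are. This is a simplified step-by-step count, not the calculation any testing company actually uses (real calculations treat some markers differently), and the function name and repeat values are made up for illustration.

```python
def genetic_distance(profile_a, profile_b):
    """Return (distance, markers_compared) for two STR profiles.

    Each profile maps a marker name to its repeat count. The distance
    here is the sum of absolute differences over the markers that both
    men have tested (a simplified stepwise count, assumed for
    illustration only).
    """
    shared = profile_a.keys() & profile_b.keys()
    distance = sum(abs(profile_a[m] - profile_b[m]) for m in shared)
    return distance, len(shared)


# Illustrative values only: four markers instead of 37.
man_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
man_b = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10}

distance, compared = genetic_distance(man_a, man_b)
print(f"Genetic distance {distance}/{compared}")
```

An exact match would give a distance of 0; the larger the distance at a given number of markers, the less close the match.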

Looking at the terminal SNP results of a man's matches may give a clue as to which of the three scenarios is most likely to be present. You can examine the terminal SNPs of a man's matches (at the 111 marker level down to the 25 marker level) and see which SNPs are most common among his matches. Then by plotting these SNPs on the haplotree* you can get some indication whether there is evidence of Convergence (i.e. the SNPs fall onto different branches of the haplotree) or no evidence of Convergence (all of the SNPs fall onto the same branch of the haplotree). If there is no evidence of Convergence, then this makes Scenario 1b or Scenario 2 more likely. In the example below, the terminal SNPs of a man's matches all fall below SNP L226, suggesting that he and his matches all sit on the L226 branch of the haplotree. However, there may be some Convergence further downstream, as two of his matches sit on different branches below SNP FGC5628.
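The idea of plotting terminal SNPs on the haplotree can be sketched in Python as a search for the most downstream SNP that all of the matches share. The tree here is a toy one: L226 and FGC5628 are taken from the example, but intermediate branches are left out, and BRANCH_A, BRANCH_B and OTHER_BRANCH are placeholder names, not real SNPs.

```python
def path_from_root(snp, parent):
    """List the SNPs from the top of the toy tree down to `snp`."""
    path = [snp]
    while snp in parent:
        snp = parent[snp]
        path.append(snp)
    return path[::-1]


def deepest_shared_snp(terminal_snps, parent):
    """Return the most downstream SNP shared by every match, or None."""
    paths = [path_from_root(s, parent) for s in terminal_snps]
    shared = None
    for level in zip(*paths):
        if len(set(level)) != 1:
            break  # the matches split onto different branches here
        shared = level[0]
    return shared


# Toy tree (child -> parent). Placeholder names are hypothetical.
parent = {
    "FGC5628": "L226",
    "OTHER_BRANCH": "L226",
    "BRANCH_A": "FGC5628",
    "BRANCH_B": "FGC5628",
}

matches = ["BRANCH_A", "BRANCH_B", "FGC5628"]
print(deepest_shared_snp(matches, parent))
```

If the shared SNP is far downstream, the matches cluster on one branch; if nothing is shared, or only something far upstream, that points towards Convergence.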

Performing additional downstream SNP testing (e.g. a SNP Pack or Big Y test) will help differentiate between the three scenarios. Here is what you might expect:
Scenario 1b (NPE) - the two men sit on the same downstream branch that is associated with the surname of one of them. The age of the common SNP might be somewhere in the last 1000-2000 years.
Scenario 2 (pre-surname match) - the two men sit on the same branch upstream (i.e. representative of a major subclade of the haplogroup, say L226). The age of the common SNP might be somewhere in the last 2000-8000 years.
Scenario 3 (Convergence) - the two men sit on completely different (i.e. very distantly related) branches of the haplotree and the common SNP is (say) >8000 years old.

If a recent NPE is suspected, autosomal DNA testing can help establish if the two men are closely related (i.e. within the past 5 generations or so).
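As a rough illustration, the age bands given for the three scenarios above can be written as a small Python lookup. The cut-offs (2000 and 8000 years) come straight from those bands, which are themselves only approximate, and the function name is made up. Treat the result as one clue among several, not a verdict.

```python
def likely_scenario(shared_snp_age_years):
    """Suggest a scenario from the estimated age of the shared SNP.

    The thresholds follow the rough age bands in the text and are
    indicative only.
    """
    if shared_snp_age_years <= 2000:
        return "Scenario 1b (NPE)"
    if shared_snp_age_years <= 8000:
        return "Scenario 2 (pre-surname match)"
    return "Scenario 3 (Convergence)"


# Illustrative ages only.
for age in (1200, 4500, 10000):
    print(age, "years:", likely_scenario(age))
```

For Scenario 1b the shared branch would also be expected to be one associated with the surname of one of the two men, which this age-only sketch does not check.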

Using these techniques will help distinguish between the three possible scenarios but in many cases there is unlikely to be a single definitive test that will give you the answer. The best you might be able to hope for is that taking all the evidence together, the balance of probabilities points toward a particular scenario as being the most likely.

Example: Plotting terminal SNP results of a man's matches shows that they all fall below SNP L226
(i.e. no evidence of Convergence before SNP FGC5628)

*I use FTDNA's haplotree but you can use others too - ISOGG, the Big Tree, or YFull's tree