Short Version Haplogroup markers

zanipolo

For conversations, I recommend that we use short labels based on terminal Y SNP names, but within the R haplogroup we shouldn't use the single letter "R"; we should use R1a, R1b or R2 as appropriate. These haplogroups are only distantly related, and both R1a and R1b are quite large.

Some may ask why three characters instead of four, etc. I think the answer is that R1a and R1b are de facto standards across both the academic and hobbyist communities. Most people know what they are. I am against using long phylogenetic labels in conversations.

As a case in point, this new paper was just published last week:

"A calibrated human Y-chromosomal phylogeny based on resequencing" by Wei, et al.
http://genome.cshlp.org/content/early/2012/10/04/gr.143198.112

If you open the free PDF and scroll down to the last page, you will see that R1a, R1b and R2 are clearly broken out, and the levels below them are just referred to as son, father, etc. No standard is shown below the R1a and R1b depictions.

Try this exercise. Imagine you are a novice and you want to do research. Hopefully, you will do some internet searches. Try googling R1b. Now try googling R. Which is more helpful?

R1a, R1b, R2 are commonly accepted terms that have meaning. R1a and R1b are too big to clump together.


Now, let me switch gears. I am NOT for discontinuing long phylogenetically intelligent haplogroup labels altogether. Apparently FTDNA will do this. However, I hope they, or at least ISOGG, will still maintain a phylogenetically intelligent haplogroup labeling system in addition to short terminal SNP labeling.

Y DNA trees require some intelligent recordkeeping, or more specifically data design. Hopefully, we are beyond always having to rely on a bitmap-only picture of the Y DNA tree. Every branch on the tree needs a tag that tells you what limb it grew out of. Every branch also needs data tags for its child branches or twigs. This is phylogenetic intelligence. The long haplogroup labels provide this and are therefore very useful in data analysis, statistical research and the like.
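To make the "tag for the parent limb, tags for the child twigs" idea concrete, here is a minimal sketch in Python. The `Branch` class and the `add_branch` and `lineage` helpers are hypothetical names made up for illustration, and the tiny tree is deliberately simplified (L21 and U106 are hung directly under "R1b", skipping every intermediate level).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Branch:
    """One branch of the tree, tagged with its parent limb and its child twigs."""
    name: str                                           # short SNP-style name, e.g. "L21"
    parent: Optional[str] = None                        # the limb this branch grew out of
    children: List[str] = field(default_factory=list)   # twigs growing out of this branch


def add_branch(tree: Dict[str, Branch], name: str, parent: Optional[str] = None) -> None:
    """Register a branch and record it as a child of its parent."""
    tree[name] = Branch(name=name, parent=parent)
    if parent is not None:
        tree[parent].children.append(name)


def lineage(tree: Dict[str, Branch], name: str) -> List[str]:
    """Follow the parent tags back to the root and return the path root-first."""
    path: List[str] = []
    current: Optional[str] = name
    while current is not None:
        path.append(current)
        current = tree[current].parent
    return list(reversed(path))


# Simplified illustration only: intermediate levels are left out on purpose.
tree: Dict[str, Branch] = {}
add_branch(tree, "R1b")
add_branch(tree, "L21", parent="R1b")
add_branch(tree, "DF13", parent="L21")
add_branch(tree, "U106", parent="R1b")

print(lineage(tree, "DF13"))    # ['R1b', 'L21', 'DF13']
print(tree["R1b"].children)     # ['L21', 'U106']
```

With the parent and child tags stored as data, a charting program can redraw the tree whenever a branch is inserted, instead of someone editing a picture.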

For instance, you can select "R1b1a2" with a wildcard character and get everyone who is "R1b1a2" or deeper, regardless of the level of SNP testing they've actually gone through. Ysearch allows you to do this today, or at least it did the last time I tried.
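As a rough sketch of that kind of wildcard selection (this is not how Ysearch works internally, which I don't know), the Python below filters a made-up set of records. The kit IDs, the sample labels and the `select_clade` function are all hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def select_clade(records: Dict[str, str], pattern: str) -> List[str]:
    """Return the kit IDs whose long haplogroup label matches a wildcard pattern."""
    return sorted(kit for kit, label in records.items() if fnmatchcase(label, pattern))


# Hypothetical kits with long-form labels at different depths of SNP testing.
records = {
    "kit001": "R1b1a2",         # tested only down to this level
    "kit002": "R1b1a2a1a1b4",   # tested much deeper on the same limb
    "kit003": "R1a1a",          # different limb altogether
    "kit004": "R1b1a",          # upstream of R1b1a2, so not selected
    "kit005": "I2a2a",
}

print(select_clade(records, "R1b1a2*"))   # ['kit001', 'kit002']
```

This only works because the long label carries its own ancestry as a prefix. A short label such as R-L21 gives the wildcard nothing to match on.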

What about R-L176.1 and R-L176.2? They would sort right next to each other in a report. This is a bit misleading, particularly to a novice. R-L176.1 is in R1a and R-L176.2 is in R1b, so these guys aren't closely related at all and are probably 20k years apart.
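Here is a small Python sketch of that sorting problem. The `subgroup_of` lookup table and the `with_subgroup` helper are hypothetical, written only to show the effect. The R1a/R1b assignments are the ones mentioned in this thread.

```python
from typing import Dict, List

# Which three-character subgroup each SNP belongs to (as stated in this thread).
subgroup_of: Dict[str, str] = {
    "L176.1": "R1a",
    "L176.2": "R1b",
    "L21": "R1b",
    "U106": "R1b",
    "DF27": "R1b",
}


def with_subgroup(short_label: str) -> str:
    """Turn 'R-L21' into 'R1b-L21' using the lookup table above."""
    snp = short_label.split("-", 1)[1]
    return f"{subgroup_of[snp]}-{snp}"


short_labels: List[str] = ["R-U106", "R-L176.2", "R-L21", "R-L176.1", "R-DF27"]

print(sorted(short_labels))
# ['R-DF27', 'R-L176.1', 'R-L176.2', 'R-L21', 'R-U106']

print(sorted(with_subgroup(label) for label in short_labels))
# ['R1a-L176.1', 'R1b-DF27', 'R1b-L176.2', 'R1b-L21', 'R1b-U106']
```

In the first report the two L176 labels sit side by side as if they were siblings. In the second, the R1a man sorts away from all of the R1b men.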

I'm going to start following a conversational nomenclature the U106 project admin recommends. For myself, I'm going to list that I'm R1b-L21 in postings. Actually, although not many will know what it is, I'm going to list myself as R1b-L21>L705.2, since L705.2 is my true terminal SNP and most people will get a pretty good idea of where I fit from the L21.

I don't have an answer for this next problem, except to use it to demonstrate why long phylogenetically intelligent labels are still important.
The problem is that all of these terminal haplogroup labels (R1b-M529, R1b-S145, R1b-L21, R1b-Z245, R1b-L459, R1b-rs11799226...) are the same. Did you know that? Z245 and L459 are phylogenetically equivalent, and the rest are just different names for the same SNP. I wonder what FTDNA will do for a person who is Z245+ but didn't test for L21 specifically. There is no need for them to test for L21, but I guess they will be assigned a different haplogroup.
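One way a database could cope with this is to map every name to a single branch key before comparing anyone. The Python below is only a sketch of that idea: picking "L21" as the key is my own arbitrary choice, and `branch_key` and `same_branch` are hypothetical helpers, not anything FTDNA or ISOGG has published.

```python
from typing import Dict, Set

# Other names for the same SNP as L21, per the list above.
SYNONYMS: Dict[str, Set[str]] = {"L21": {"M529", "S145", "rs11799226"}}

# Different SNPs that currently mark the same branch (phylogenetically equivalent).
EQUIVALENTS: Dict[str, Set[str]] = {"L21": {"Z245", "L459"}}

# Flatten both tables into one alias -> branch key lookup.
BRANCH_KEY: Dict[str, str] = {}
for table in (SYNONYMS, EQUIVALENTS):
    for key, aliases in table.items():
        for alias in aliases:
            BRANCH_KEY[alias] = key


def branch_key(label: str) -> str:
    """Reduce a label such as 'R1b-Z245' to 'R1b-L21'."""
    prefix, snp = label.split("-", 1)
    return f"{prefix}-{BRANCH_KEY.get(snp, snp)}"


def same_branch(label_a: str, label_b: str) -> bool:
    """True when two short labels point at the same branch of the tree."""
    return branch_key(label_a) == branch_key(label_b)


print(branch_key("R1b-Z245"))               # R1b-L21
print(same_branch("R1b-S145", "R1b-L459"))  # True
print(same_branch("R1b-L21", "R1b-U106"))   # False
```

The two tables are kept apart on purpose. A synonym never changes, but an "equivalent" SNP can turn out to sit above or below L21 once someone tests positive for one and negative for the other.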

Here is the lesson learned. Just like pots are not people, SNPs are not clades. SNPs are just tags or signposts on branches of the Y DNA tree of paternal lineages. We don't know if an SNP is part way up a large branch, or at the base, or what. This will be dynamic, always changing as more is discovered. In a sense, the long phylogenetic labels are more appropriate. They describe the branch, regardless of the number of SNPs on it. STRs can effectively define a branch as well. A case in point would be 490=10 among R1b-SRY2627 (M167) people.

The net is that we need both: long phylogenetically intelligent labels for analysis, as well as short acronyms for ordinary conversation. Even if the formatting is different, the long phylogenetic names are still important so a display/charting program can draw the tree.

This kind of hierarchical record is exactly what GEDCOM is for. Perhaps we should store the SNPs in a GEDCOM file to drive the tree charts. Since we need precise, hierarchical analysis, we can use the GEDCOM coordinate definitions rather than the multi-letter/digit approach. This wasn't my idea, but I think it is worth considering.
 

I entirely agree with you, even if I am in a dilemma with my haplogroup.

I don't know what I will get from FTDNA, although my 2 projects (ALPGEN and T) will use different formats, one using T1b-L446 and the other T1a2b, which is the ISOGG form.

I think FTDNA will use the Thomas Krahn one:

http://ytree.ftdna.com/index.php?name=Draft&parent=root
 
I2 is a tricky case because standards become different immediately following "I2." The article recommends to just use "I-M223" for I2a2a (ISOGG terminology), but I disagree with that... should we similarly expect people to know what "Z138," "L38," "L233," and "L416" mean? We at least need to say "I1" or "I2" before jumping to an SNP, and I'd suggest that we even go one or two steps farther down the tree then.

The question for I2 is then: Which terminology? I say ISOGG for sure. It's unbelievable how long it's taking FTDNA to incorporate L460 into their tree, and questionable at this point if they ever will. That shouldn't stop us from being scientific and accepting overwhelming evidence that unites I2-P37 and I2-L35 into a single I2a. But we should be clear about it anyway, using nicknames and SNPs.

So, taking the article's I2-Z2062 as an example, I recommend "I2a2a-Roots Z2062+". It's a bit long, but gives the initial tree, the unambiguous nickname, and the terminal SNP. So even if we find a rare branch that unites with I2a beneath L460 and I2a2a becomes I2a1b1a, moving to "I2a1b1a-Roots Z2062+" shouldn't confuse anybody... it's still obviously a branch of the Roots group.

Some other examples:
L147: "I2a1b-Din L147+"
L233: "I2a1c-Western L233+"
Z79: "I2a2a-Cont1 Z79+"
L413: "I2b-Adr L413+"
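A quick Python sketch of how these hybrid labels could be built and compared. The `hybrid_label` and `stable_part` functions are hypothetical names, and the format simply follows the examples above.

```python
def hybrid_label(long_form: str, nickname: str, terminal_snp: str) -> str:
    """Build a label like 'I2a2a-Roots Z2062+' from its three parts."""
    return f"{long_form}-{nickname} {terminal_snp}+"


def stable_part(label: str) -> str:
    """Return the nickname and SNP, which do not change when the tree is renumbered."""
    return label.split("-", 1)[1]


before = hybrid_label("I2a2a", "Roots", "Z2062")
after = hybrid_label("I2a1b1a", "Roots", "Z2062")   # same branch after a hypothetical tree revision

print(before)                                     # I2a2a-Roots Z2062+
print(after)                                      # I2a1b1a-Roots Z2062+
print(stable_part(before) == stable_part(after))  # True
```

The long-form prefix can change with every tree revision, while the nickname and terminal SNP keep identifying the same group.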

Aren't those better than trying to do 100% long form or 100% short form? Who knows what "I-L147," "I-L233," and "I2a2a3a2a1a" mean?
 

OK, but what you want won't get taken up by FTDNA. Plus, with Geno 2.0 being run with FTDNA's help, I can only see I-L413 becoming the norm.

I also heard that if you skip an SNP on the path, you will not be assigned to the SNP that tested positive, but to the original SNP. As an example, looking at what I think they will use:

If you are L you will default to L-M61, but if you test for M317 and are positive, then because you skipped some SNP on the way to M317, you will still be only L-M61 and not L-M317.
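If that rule is real (it is only something I heard, as said above), it would behave like the Python sketch below. The `reported_terminal` function is hypothetical, and `HYPOTHETICAL_STEP` is a placeholder for whatever SNP actually sits between M61 and M317, not a real SNP name.

```python
from typing import List, Set


def reported_terminal(path: List[str], positives: Set[str]) -> str:
    """Walk down from the root and stop at the first SNP without a positive result.

    Assumes path[0] (the default SNP for the haplogroup) is always reported.
    """
    reported = path[0]
    for snp in path[1:]:
        if snp not in positives:
            break            # an untested or negative step blocks everything below it
        reported = snp
    return reported


# Root-first path; the middle entry is a placeholder, not a real SNP name.
path_to_m317 = ["M61", "HYPOTHETICAL_STEP", "M317"]

print("L-" + reported_terminal(path_to_m317, {"M61", "M317"}))
# L-M61

print("L-" + reported_terminal(path_to_m317, {"M61", "HYPOTHETICAL_STEP", "M317"}))
# L-M317
```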


BTW, does the order of the SNPs indicate which is the main SNP, as per L? Does P326 have more influence than L811?
 
I completely agree and this is the style I have been using on Eupedia for several years already. Ironically you are using R-L21 and not R1b-L21 in your forum profile.

I also regard I1, I2, J1 and J2 as haplogroups of their own, so I add the terminal SNP after these denominations. It makes it easier for everyone to understand.
 
Agree... now try it on my personal haplotype, as ISOGG changed the T haplogroup again 3 days ago.
 
Oops. I confess.

1) I'm forgetful and forgot to update this profile.

2) I mindlessly fell in line with the single letter "R" short labeling according to FTDNA's adoption of a new YCC nomenclature.

I've fixed my profile. I'm using the new flexible short format that the guys in the R1b-U106 project started using. Give credit to Charles Moore for this. I've now updated my spreadsheets for R1b subclades to do this. Essentially, we just start with the three-character general term "R1b". After that, you go with the stub (portion of the branch) most relevant to the audience.

Technically, I'm R1b-L705.2, but other than the "R1b" part that probably means nothing to most of you, so I will always list myself as R1b-L21 or, if the audience is not FTDNA knowledgeable, I'll use R1b-L21(S145). Anyway, the stub of the branch I'm exposing is L21 because that seems to have enough momentum and discussion behind it that it is memorable and meaningful... and pretty much everyone in my group does this. The R1b-U106(S21) guys are all starting with U106 and the U152 guys with R1b-U152(S28). BTW, don't forget about R1b-DF27; it's big too, even if it is only a recent discovery.

In discussions within L21 audiences I will explain that my stub of the branch fits in as R1b-L21>DF13>L513(DF1)>L705.2. That's really the precise situation.
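For anyone keeping this in a spreadsheet or script, the three levels of detail can be generated from one stored path. This Python sketch uses a hypothetical `short_label` function and leaves out the alternate names in parentheses to keep it simple.

```python
from typing import List


def short_label(path: List[str], stub: str, style: str = "stub") -> str:
    """Format a root-first path as a conversational label.

    path[0] is the general term (e.g. "R1b") and path[-1] is the terminal SNP.
    style is "stub", "terminal" (stub plus terminal SNP) or "full".
    """
    if stub not in path[1:]:
        raise ValueError(f"{stub} is not an SNP on this path")
    start = path.index(stub, 1)
    if style == "stub":
        snps = [stub]
    elif style == "terminal":
        snps = [stub] if path[-1] == stub else [stub, path[-1]]
    elif style == "full":
        snps = path[start:]
    else:
        raise ValueError(f"unknown style: {style}")
    return f"{path[0]}-{'>'.join(snps)}"


my_path = ["R1b", "L21", "DF13", "L513", "L705.2"]

print(short_label(my_path, "L21"))              # R1b-L21
print(short_label(my_path, "L21", "terminal"))  # R1b-L21>L705.2
print(short_label(my_path, "L21", "full"))      # R1b-L21>DF13>L513>L705.2
```

The stub is picked per audience, so the same person could be R1b-L21 in a general forum and R1b-L513>L705.2 inside an L21 project.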

Back to item #2, I used to think that the Y Chromosome Consortium (YCC) was important. I still think it could be, but their record just hasn't supported that. Here is what I mean.

The YCC just hasn't kept up. Another blogger counted 60 changes in the R1b part of the ISOGG Y DNA tree over the last year. We haven't seen an update from YCC for well over a year.

I'm not sure there is much consensus in the scientific community behind YCC. I don't think ScottishDNA, Oxford, etc., etc. are in it, but it is not really clear how broad YCC's support is.

The reason that it is not clear is that YCC has done a very poor job of documenting who they are and how (and how often) they function, and of making this public. FTDNA will tell you this at http://www.familytreedna.com/understanding-haplogroups.aspx:
"The Y Chromosome Consortium (YCC) developed a naming system for the Y-DNA haplogroups designed to easily accommodate expansion as new groups are discovered. The YCC has defined 20 major haplogroups, called A through T, which represent the major divisions of human diversity based on SNPs on the Y-chromosome.
All Family Tree DNA explanations and terminology, including our haplogroup database, use the standard system developed by the YCC and defined in the YCC paper. The Y Chromosome Consortium scientific paper, which describes the Haplogroup naming system, can be found at the link below:
YCC Nomenclature System"
The link to the YCC is broken. Broken links are a bad sign for business or institutional health.
If you try to google them, it is hard to get any information. That's not good either. Someone, not the YCC, was kind enough to write a Wiki article on the YCC. That's about the only place I can find a link that works to something that might be a YCC home page. http://ycc.biosci.arizona.edu/ Go ahead and look at that and see what you can find. It could be viewed as humorous. This is at the U of Arizona, which makes sense because that's where Dr. Mike Hammer is, and he is FTDNA's Chief Scientist as well as a member of the YCC, at least in the past.

I absolutely agree that other haplogroups probably face this circumstance. I'm just not well qualified to speak to them.
 
After 2 years of this new short form, I think the best solution for clarity for genetic forum "amateurs" is as below. For example:

R1b-U106 ..........correct version and better understood by amateurs

R-U106 .........confuses many into thinking it is a basal marker
 
