New topic 17 Oct 2012.
L1029 was a new SNP last March. L1029 defines a branch of M458 and was added to the ISOGG tree this year. The other branch is L260 (update in the next topic), which was discovered in 2010. Most M458+ L260- samples are coming out L1029+. I have been calling M458+ L260- samples N type (with very few exceptions; see the next topic). It is now clear that L1029 is a major branch, capturing more than 90% of N type (more than 90% of M458+ L260-).
In the Polish Project, most of the N type L1029- results are samples with Poland given as the ancestral country. This spring, Mayka started classifying these as the “Np” cluster.
In this topic I present preliminary evidence that Np corresponds to a Y-DNA clade concentrated in Poland. I also explain why all Polish N type samples (tested or predicted M458 and not L260) would benefit from the L1029 test: Np cannot be predicted precisely, and there is a small fraction of L1029- outlier samples that do not fit Np.
So far (10 Oct data) there are 20 L1029- results (including a few samples that are not M458+) and 42 L1029+ results. N type requires 67 or more of the standard markers for confident assignment. Among samples with those 67 markers there are 114 N type, of which 12 are L1029- and 41 are L1029+. Of the 61 remaining N type samples (at 67 markers in the Polish Project) not tested for L1029, I estimate only about 5 might come out L1029-, because testing has been concentrated on STR predictions, discussed below in this topic.
One M458+ L260- L1029- sample is not counted as N type; it is discussed in the next topic as Ry type. This seems to be a very small outlier clade with an old node in M458.
Two of the 12 N type L1029- samples differ significantly from the other 10, so I am predicting these two as outliers, with M458 nodes older than the main Np hypothetical clade.
Np Cluster Definition: I constructed an STR definition for the remaining 10 samples with similar STR values and L1029- result. The definition uses 37 of the 67 markers. The cutoff is 2 (samples fewer than 2 steps from the definition are considered matches). I uploaded this definition to Ysearch, code CHFXB. My analysis file is L1029Study.xls.
On this basis, 3 of the untested N type samples fit the definition and are predicted L1029- members of the hypothetical Np clade. Two more are marginal, so perhaps there are 14 Np samples among the 114 N type. N type is 8.8% of the Polish Project, so that means 14 / 114 * 8.8% = 1.1% Np samples in the Polish Project. The statistical uncertainty is wide, so my estimated 80% confidence range is 0.5% to 2%. Insofar as the Polish Project is representative of Historical Poland, it seems the Np hypothetical clade has roughly 1% frequency in the region of Historical Poland. Of the 10 confirmed Np samples, 8 provide “Poland” as origin, one “Russian Federation” and one “Lithuania”. Of the 3 predicted Np samples, two give “Poland” and one “Belarus”. There is no need to subtract the samples without “Poland” because the Polish Project as a whole has a similar frequency of samples whose origin is not “Poland”; such samples come from men with evidence of male ancestry from Historical Poland.
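The frequency arithmetic can be checked with a short Python sketch. The counts (10 confirmed, 3 predicted, 2 marginal, 114 N type, 8.8%) are from the paragraph above; the scenario labels and the function name are only for illustration. The spread across scenarios is not the 80% confidence range quoted above; it only shows how much the point estimate moves depending on how the marginal samples are counted.

```python
N_TYPE_SAMPLES = 114     # N type samples at 67 markers in the Polish Project
N_TYPE_SHARE_PCT = 8.8   # N type as a percentage of the Polish Project


def np_share_pct(np_count):
    """Np as a percentage of the whole Polish Project."""
    return np_count / N_TYPE_SAMPLES * N_TYPE_SHARE_PCT


scenarios = {
    "confirmed only": 10,
    "confirmed + predicted": 13,
    "estimate used in the text": 14,
    "confirmed + predicted + both marginal": 15,
}
for label, count in scenarios.items():
    print(f"{label}: {count} / {N_TYPE_SAMPLES} * {N_TYPE_SHARE_PCT}% "
          f"= {np_share_pct(count):.2f}%")
```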
DYS460 = 10 is a very strong signature marker for Np. All 13 of the confirmed and predicted Np samples have this value. Those two outlier samples also have this value. Among those 41 L1029+ samples, only 6 have this 10 value; 3 have 12 and the 32 others all have the N type modal 11 value. The statistics of this paragraph are misleading because DYS460=10 was used to encourage L1029 testing in the Polish Project. I would expect a few Np to show up in the future with 460 value other than 10 (mutated from the Np ancestral value), and I would expect in the long run a lower fraction (less than 6 / 32) L1029+ to have the 10 value (independent mutations). Among the 49 N type samples not confidently assigned to sub-categories, only 5 have the 10 value, and 1 of these is a marginal Np sample mentioned above.
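For readers who want the DYS460 counts side by side, here is a small Python tally. The counts are the ones reported in the paragraph above; the group labels and the function name are only for illustration. For the 49 unassigned N type samples the text gives only the number with value 10, so the other 44 are lumped together. As noted above, these shares are biased because DYS460=10 was used to select samples for L1029 testing.

```python
# DYS460 value -> number of samples, per group, as reported in the text.
dys460_counts = {
    "Np (confirmed + predicted)": {10: 13},
    "L1029+": {10: 6, 11: 32, 12: 3},
    "N type, not assigned": {10: 5, "other": 44},  # only the count for 10 is given
}


def share_with_value(counts, value):
    """Fraction of a group's samples that carry the given DYS460 value."""
    total = sum(counts.values())
    return counts.get(value, 0) / total


for group, counts in dys460_counts.items():
    n = sum(counts.values())
    print(f"{group}: {share_with_value(counts, 10):.1%} have DYS460 = 10 (n = {n})")
```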
CDYa = 33 is another good signature. These two markers alone with cutoff 1 (that means both markers match) capture 9 of the 13 Np samples (Np defined as the 13 captured by 37 markers at cutoff 2). These two markers also capture 2 marginal samples (at the step 2 cutoff of Np at 37), plus only one other N type, plus a few D type (D are not members of the M458 clade, but DYS460=10 is modal in D). CDY is a fast mutator, so it is unusual for it to serve as a signature marker. I ran into this on one other occasion, where I postulated that a mutation disabled CDYb; see my discussion at http://www.gwozdz.org/L540.html#CDYb. Actually, another reasonable explanation is that this CDYa=33 signature is just luck: with only 10 samples we should not be too surprised that one of the rapid mutators looks like a signature, by the luck of random mutations. Yet a third explanation: Np might really be 2 or more clades where the ancestors (MRCAs) of each clade had the CDYa=33 value by luck, but those ancestors differed at other markers; this explanation is discussed more below.
There are no more good Np signature markers. Np modal values differ from N modal values at only 4 of the 67 markers. There are only two Np samples at 111 markers, and they do not seem to differ from N at those additional 44 markers. On this basis, I am not confident that my definition is very precise, because it takes as little as 2 mutations in the male line history for a sample to be incorrectly predicted, using any STR definition.
There is another reason for my uncertainty about my 37 marker Np definition: I worked harder than usual to construct this definition, so there is selection bias. Markers that just happen to have no mutations in those 10 samples are all in the definition. Any marker got dropped if it produced 2 or more mutations in any sample of those 10. Surely as more samples show up I’ll need to modify my definition. Those 37 markers are only a “good bet” definition for Np prediction today.
I published my SBP method of quantifying confidence in clade predictions based on Y-DNA STRs. Lower SBP means higher confidence. I reserve the word “type” for clusters with SBP < 20%. I consider SBP meaningless for SBP > 50%. Np comes out with SBP = 64%. This does not necessarily mean that Np is invalid as a clade prediction. My SBP method gives larger values for SBP with few samples, so valid clades improve with more data (SBP becomes smaller). A clade with modal STR values close to the father clade (N is the father of Np) necessarily comes out with large SBP. Concentration in Poland is evidence of validity for Np. That 460=10 value is also evidence of validity. In my estimation, Np has about 80% confidence of validity, all evidence considered, but only 50% confidence of being a unique clade. Np might be primarily one clade with interference from other independent small clades with similar STR values. Or, Np might be 2 or more clades, about the same size, all concentrated in Poland, but distantly related. Clarification: two clades with very close nodes to the father branch might be considered a single clade; here I mean that Np might be 2 clades with nodes that are not close in the tree, perhaps with other small clade nodes between them that do not fit Np STRs (by the luck of random mutations in the ancestor). More discussion below on this idea.
In the R1a Project, my 37 marker definition captures 11 samples with SBP = 95% (data at 67 markers, download 14 Oct). Eight of the 11 have L1029- result and the others are not tested yet. Seven of the 11 are of “Poland” origin. Two L1029- are N type that do not match Np. There are 38 L1029+ that do not match Np. Summary: L1029- are rarer in the R1a Project (compared to the Polish Project) and the L1029- predominantly match Np. SBP is worse (higher) because of interference at the cutoff by more R1a samples from outside Poland. This paragraph is not conclusive, however, because the administrators of both projects work together; many of the samples come from men who joined both projects. Both projects worked hard on getting L1029 results this year, using 460=10 fit as a guide for emphasis.
As an independent test, I checked (11 Oct) the “RussiaDNA” Project (another FTDNA project). Of 260 R1a total, only 12 have been tested for L1029, and only 2 of these 12 came out L1029-: one Poland and one Russian Federation. This is preliminary evidence that Np is rare in the Russian Federation, although N is common in all Slavic countries.
More projects checked (14-15 Oct):
Russian_impire: 4 L1029 tests, one negative, not Poland
LituaniaPropria: 4 L1029 tests, two negative, both “Lithuania” origin; one L1029- is also in the Polish Project, and both are also in the R1a Project. In addition, both L1029+ are also in the Polish Project, and one is in the R1a Project, so these are not independent data
Scottishdna: no L1029 tests
Finland: 1 L1029 positive
BritishIsles: 1 L1029 positive
Other projects are not concentrating on L1029 tests. I hesitate to encourage them, because M458+ L1029- seem to be mostly from Poland.
I have an R1a database at 67 markers with 1816 samples from 15 FTDNA projects. I collected this 20 June, when there were fewer L1029 results. My 37 marker definition captures 13 samples, but 12 of these are in the Polish Project, and the other is in the R1a Project. No additional samples fit Np. There are 10 more marginal samples at the step 2 cutoff, of which only 2 are in the Polish Project and only one is from Poland. This is my strongest evidence that the Np cluster is concentrated in Poland.
Ysearch: 9 samples are captured by my Np definition CHFXB. Only 2 are from Poland. Only 2 of the 13 Polish Project Np joined Ysearch (one Poland and one Lithuania). SBP is poor for Np at Ysearch because there are 6 samples at the step 2 cutoff, none from Poland. In addition, 2 “Central European” modals fall at step 2 (37 markers used), emphasizing how hard it is to separate Np. A simple explanation for these Ysearch results is that there are 1 or more other clades concentrated outside Poland, which might be L1029- or L1029+.
At the top of this topic, I reported “more than 90% of N type” (M458+ L260-) are L1029+. Since L1029- are concentrated in Poland, it may actually be more than 95% worldwide. However, there is a reasonable possibility of one or more small clades showing up L1029- from outside Poland when more samples are tested.
Age of Np: It is too soon to estimate the age (TMRCA) of L1029, and age based on STR variation is uncertain because of known caveats. However, L1029 is probably not much younger than N type because L1029 includes almost all of N type. N type is surely older than 2,000 years. Indeed, variation of L1029 STRs is looking similar to N type variation. The L1029- node is necessarily the same or older than the L1029 node, so Np has an old node. However, the age of the node is almost always older than the age of the clade (TMRCA). Np seems very young, as evidenced by the unique 460=10 value discussed above. On the other hand, other markers have significant variation within Np; that may mean Np is not so young; or, that may mean Np is composed of 2 or more clades, each of which is young.
Speculation: Np reminds me of P type (L260 update, next topic). In my 2009 publication, and at this web page, I have speculated that L260 may have a very old node, but the P type ancestor (MRCA) may have lived more recently, perhaps not long before formation of the tribes that led to the Polish nation. It seems to me that M458 is quite old, but not many M458 individuals survived over the millennia, and a few of the M458 survivors were lucky enough to found clades during the population expansion of the last 3 millennia. Perhaps the Np ancestor, with L1029- and 460=10, also lived long ago and left few survivors; most of those few formed what are today very small clades, and one was (or perhaps 2 or more, all with 460=10, were) lucky enough to found the medium sized cluster today apparent as Np. I find it interesting to consider the men who lived 1,000 to 2,000 years ago in the region that is now Poland (and / or maybe in another region from which there was a migration to Poland). Due to the statistics of Y-DNA inheritance, most men do not form clades that last long, and very few men form large clades. Human behavior may perhaps broaden the statistical spread of clade size, allowing rare men to produce relatively larger clades. I speculate that among those proto-Polish men who founded clades that survive today, most were R1a, and many of those were M458, and one or a few of those were Np, and one was P.
My error... the Belarus I was referring to was for L260 and not L1029. L1029 seems to be basically a Masovian marker.
Masovians... are they the creation of Dark Ages Galidians?
Masovian area... Pliny states that the Guthones lived there. Are the Guthones the Goths?