Proposed quick and dirty method for TMRCA range, given shared parent clade



FuriousGeorge
09-03-18, 08:46
The quick and dirty method I had in mind no longer feels sound. Unfortunately, I can't delete the thread. The idea was that you could take the TMRCA for a branch and the TMRCA for two people who you know are somewhere under it, add those two numbers, and divide by 2 to get the average, and there you have it.

For instance, my calculated TMRCA with a guy who has a known SNP upstream of mine is 1600 ya. The TMRCA for that branch on YFull is currently 3700 ya.

(1600 + 3700) / 2 = 2650 ya.

The thing is, I don't think that works all that well. In real life, I share 13 SNPs with some guy, all downstream of a clade with a TMRCA of 3700 ya. We don't have .bam files, so we don't get a TMRCA. YFull first wants to know how many markers downstream of the parent clade we don't match on. But I think you could get an idea of the TMRCA just by knowing we share 13 SNPs downstream, because the age is currently assumed to be ~144 years per SNP, times the number of mutations, plus 60 years for the lifetime of the tester.

That would be ~144 * 13 + 60 = 1932, or about 2000.

That gets me a TMRCA of 3700 - 2000 = 1700 ya
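
Here's that arithmetic as a minimal Python sketch, in case anyone wants to plug in their own numbers. The function name and parameter names are made up for illustration. The 144 years per SNP and the 60-year tester offset are just the figures quoted above, not values pulled from any tool. Note it gives 1768 without the rounding to 2000.

```python
def snp_tmrca_estimate(branch_tmrca, shared_snps, years_per_snp=144, tester_offset=60):
    """Rough TMRCA (years ago) for two testers sharing SNPs below a dated clade.

    Assumptions (taken from the post, not from any library):
      - each shared SNP accounts for ~years_per_snp years
      - tester_offset years are added for the lifetime of the tester
    """
    # Years covered by the SNPs the two testers share below the parent clade
    shared_span = years_per_snp * shared_snps + tester_offset
    # Whatever is left of the branch age is the estimated TMRCA
    return branch_tmrca - shared_span


print(snp_tmrca_estimate(3700, 13))  # 3700 - (144 * 13 + 60) = 1768
```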

Based on 111 STR markers, the TMRCA comes out to around 1600 ya. If you add a little padding for back-mutations and convergence, the two estimates don't seem too far apart.

Another option could be to just upload a BAM file for both kits to YFull, but not everyone can do that. For instance, I can only get a VCF, even though I bought the Big Y, and the only ETA I got was 'spring'.

OTOH, here's a complicated solution that might work.

INPUT:
-TMRCA to SNP on ytree
-STRs 1
-STRs 2
-The length of a generation

OUTPUT:
-TMRCA range based on 1000 (or whatever) simulations


The algorithm would take the regular linear calculation for TMRCA, which doesn't take back-mutation into account, and set that as the lower threshold. Then it would run simulations to see how likely the genetic distance (GD) you actually got is at that number of generations. Since you don't really know the number of generations, it would add another generation and repeat, and so forth, until adding another generation puts you over the TMRCA for the branch. It would output a range (5% - 95%, or so) based on the likelihood of getting that much GD over time.

I think that would work. It's predicated on the idea that as GD increases and fewer identical markers remain, the odds of a back-mutation increase, so the growth in GD over time will tail off at a predictable rate.
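
A rough Python sketch of the simulation idea described above, standard library only. Everything specific in it is an assumption on my part rather than an established method: a single flat mutation rate for every marker (mu=0.003 per marker per generation is a placeholder, real rates vary by marker), a simple stepwise model where each mutation moves a marker by +1 or -1, GD counted as the sum of absolute differences, and every generation count between the lower threshold and the branch age treated as equally likely beforehand. The function names are hypothetical.

```python
import random


def simulate_gd_path(n_markers, max_gens, mu, rng):
    """GD between two simulated lineages after each generation 1..max_gens."""
    diff = [0] * n_markers  # repeat-count difference at each marker
    gd = 0                  # running sum of |diff| over all markers
    path = []
    for _ in range(max_gens):
        for m in range(n_markers):
            for _lineage in range(2):  # a mutation can hit either lineage
                if rng.random() < mu:
                    step = rng.choice((-1, 1))  # stepwise model (assumed)
                    gd += abs(diff[m] + step) - abs(diff[m])
                    diff[m] += step
        path.append(gd)
    return path


def tmrca_range(branch_tmrca, observed_gd, n_markers, gen_years=30,
                mu=0.003, n_sims=1000, seed=None):
    """Return a (5%, 95%) TMRCA range in years, or None if no simulation matched."""
    rng = random.Random(seed)
    max_gens = int(branch_tmrca // gen_years)  # cap at the branch TMRCA
    if max_gens < 1:
        raise ValueError("branch_tmrca must cover at least one generation")
    # Linear estimate ignoring back-mutation: GD = 2 * mu * n_markers * gens.
    # Used as the lower threshold, as described in the post.
    linear = round(observed_gd / (2 * mu * n_markers))
    min_gens = min(max(1, linear), max_gens)

    # hits[g] = number of simulations whose GD at generation g equals the observed GD
    hits = [0] * (max_gens + 1)
    for _ in range(n_sims):
        path = simulate_gd_path(n_markers, max_gens, mu, rng)
        for g in range(min_gens, max_gens + 1):
            if path[g - 1] == observed_gd:
                hits[g] += 1

    total = sum(hits)
    if total == 0:
        return None  # observed GD never reproduced; try more simulations

    # Walk the cumulative share of hits to find the 5% and 95% points
    lo = hi = None
    cum = 0
    for g in range(min_gens, max_gens + 1):
        cum += hits[g]
        if lo is None and cum >= 0.05 * total:
            lo = g
        if cum >= 0.95 * total:
            hi = g
            break
    return lo * gen_years, hi * gen_years
```

Something like tmrca_range(3700, 10, 37, n_sims=200, seed=1) would run it for a made-up GD of 10 at 37 markers under a 3700 ya branch. Each simulated pair is run once out to the branch age and its GD is checked at every generation along the way, which is much cheaper than restarting the simulation for each generation count. It is plain Python, so 111 markers with 1000 simulations takes a while.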

I'm sure something like this must exist, but perhaps it isn't in the public domain.

FuriousGeorge
12-03-18, 09:56
BUMP because I rewrote everything, after having deleted the content while failing to delete the thread.