Simulation of a large DNA Profile Database, full alleles , >8 per cent allele frequency co-ancestry

Simulation of a large DNA Profile Database, full alleles, >8 per cent allele frequency co-ancestry


Result - for people with the same sort of ancestry as myself then in that subset of the general population there would probably be between 50 and 80 10 loci matches within 2 million such people.



' Generating 10 loci x2 profiles with
' AF >= 8.7%
' directing pairs and first divider
Dim ph(20)
Dim pb(20)
' initialising Random Number Generator - RNG
count9 = 0
count8 = 0
countf = 0

Randomize
a = 214013
c = 2531011
x0 = Timer
z = 2 ^ 24
'  1 file 'dec29-g' for original, un-directed pairs, source data.
' This file is necessary to check on the performance of the RNG
' when a matched pair is found then it is highly unlikely that
' both sequences as generated, before pair directing, would
' be the same - more likely a manifest of repeat within the RNG
' (reason for adopting the 214013 / 2531011 RNG )
' In final match checking use 'Word' find function on part of the sequences, including pair reversals,
' with luck would include a  'homozygotic' pair eg (3,3) say ,so no reversal
' on that pair

Open "dec29-g" For Output As #1
' outputs directed and divided by first digit
Open "dec29-0" For Output As #10
Open "dec29-1" For Output As #11
Open "dec29-2" For Output As #12
Open "dec29-3" For Output As #13
Open "dec29-4" For Output As #14
Open "dec29-5" For Output As #15
Open "dec29-6" For Output As #16
Open "dec29-7" For Output As #17
Open "dec29-8" For Output As #18
Open "dec29-9" For Output As #19
' change for different total size eg 199999 for 200,000
For x = 0 To 4999
flag = 0
For j = 0 To 1
' vWA ,first locus
' RNG random number generator
  temp = x0 * a + c
  temp = temp / z
  x1 = (temp - Fix(temp)) * z
  x0 = x1
  phj = x1 / z

ph(j) = phj
If ph(j) < 0.001 Then ph(j) = 11
If ph(j) < 0.106 Then ph(j) = 1
If ph(j) < 0.186 Then ph(j) = 2
If ph(j) < 0.402 Then ph(j) = 3
If ph(j) < 0.672 Then ph(j) = 4
If ph(j) < 0.891 Then ph(j) = 5
If ph(j) < 0.984 Then ph(j) = 6
If ph(j) < 0.998 Then ph(j) = 7
If ph(j) < 1 Then ph(j) = 8

If ph(j) > 10 Then ph(j) = 0

If ph(j) = "0" Then flag = 1
If ph(j) = "2" Then flag = 1
If ph(j) = "7" Then flag = 1
If ph(j) = "8" Then flag = 1


Next j

For j = 2 To 3
' THO1
' RNG
  temp = x0 * a + c
  temp = temp / z
  x1 = (temp - Fix(temp)) * z
  x0 = x1
  phj = x1 / z
ph(j) = phj

If ph(j) < 0.002 Then ph(j) = 11
If ph(j) < 0.243 Then ph(j) = 1
If ph(j) < 0.437 Then ph(j) = 2
If ph(j) < 0.545 Then ph(j) = 3
If ph(j) < 0.546 Then ph(j) = 4
If ph(j) < 0.686 Then ph(j) = 5
If ph(j) < 0.99 Then ph(j) = 6
If ph(j) < 1 Then ph(j) = 7

If ph(j) > 10 Then ph(j) = 0
If ph(j) = "0" Then flag = 1
If ph(j) = "4" Then flag = 1
If ph(j) = "7" Then flag = 1


Next j

For j = 4 To 5
' D8
' RNG
  temp = x0 * a + c
  temp = temp / z
  x1 = (temp - Fix(temp)) * z
  x0 = x1
  phj = x1 / z
ph(j) = phj

If ph(j) < 0.018 Then ph(j) = 11
If ph(j) < 0.031 Then ph(j) = 1
If ph(j) < 0.125 Then ph(j) = 2
If ph(j) < 0.191 Then ph(j) = 3
If ph(j) < 0.334 Then ph(j) = 4
If ph(j) < 0.667 Then ph(j) = 5
If ph(j) < 0.876 Then ph(j) = 6
If ph(j) < 0.964 Then ph(j) = 7
If ph(j) < 0.995 Then ph(j) = 8
If ph(j) < 1 Then ph(j) = 9
If ph(j) > 10 Then ph(j) = 0

If ph(j) = "0" Then flag = 1
If ph(j) = "1" Then flag = 1
If ph(j) = "3" Then flag = 1
If ph(j) = "8" Then flag = 1
If ph(j) = "9" Then flag = 1


Next j

For j = 6 To 7
' FGA
' RNG
  temp = x0 * a + c
  temp = temp / z
  x1 = (temp - Fix(temp)) * z
  x0 = x1
  phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.025 Then ph(j) = 11
If ph(j) < 0.081 Then ph(j) = 1
If ph(j) < 0.224 Then ph(j) = 2
If ph(j) < 0.226 And ph(j) >= 0.224 Then pb(j) = "A"
If ph(j) < 0.413 Then ph(j) = 3
If ph(j) < 0.415 And ph(j) >= 0.413 Then pb(j) = "B"
If ph(j) < 0.58 Then ph(j) = 4
If ph(j) < 0.591 And ph(j) >= 0.58 Then pb(j) = "C"
If ph(j) < 0.73 Then ph(j) = 5
If ph(j) < 0.734 And ph(j) >= 0.73 Then pb(j) = "D"
If ph(j) < 0.88 Then ph(j) = 6
If ph(j) < 0.882 And ph(j) >= 0.88 Then pb(j) = "E"
If ph(j) < 0.957 Then ph(j) = 7
If ph(j) < 0.992 Then ph(j) = 8
If ph(j) < 0.999 Then ph(j) = 9
If ph(j) < 1 And ph(j) >= 0.999 Then pb(j) = "F"

If ph(j) > 10 Then ph(j) = 0
If pb(j) <> "Z" Then ph(j) = pb(j)

If ph(j) = "0" Then flag = 1
If ph(j) = "1" Then flag = 1
If ph(j) = "7" Then flag = 1
If ph(j) = "8" Then flag = 1
If ph(j) = "9" Then flag = 1



If pb(j) = "A" Then flag = 1
If pb(j) = "B" Then flag = 1
If pb(j) = "C" Then flag = 1
If pb(j) = "D" Then flag = 1
If pb(j) = "E" Then flag = 1
If pb(j) = "F" Then flag = 1




Next j

For j = 8 To 9
' D21
' RNG
  temp = x0 * a + c
  temp = temp / z
  x1 = (temp - Fix(temp)) * z
  x0 = x1
  phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.001 Then pb(j) = "A"
If ph(j) < 0.002 And ph(j) >= 0.001 Then pb(j) = "B"
If ph(j) < 0.033 Then ph(j) = 11
If ph(j) < 0.193 Then ph(j) = 1
If ph(j) < 0.419 Then ph(j) = 2
If ph(j) < 0.677 Then ph(j) = 3
If ph(j) < 0.704 Then ph(j) = 4
If ph(j) < 0.773 Then ph(j) = 5
If ph(j) < 0.866 Then ph(j) = 6
If ph(j) < 0.884 Then ph(j) = 7
If ph(j) < 0.974 Then ph(j) = 8
If ph(j) < 0.975 And ph(j) >= 0.974 Then pb(j) = "C"
If ph(j) < 0.997 Then ph(j) = 9
If ph(j) < 1 And ph(j) >= 0.997 Then pb(j) = "D"

If ph(j) > 10 Then ph(j) = 0
If pb(j) <> "Z" Then ph(j) = pb(j)
If ph(j) = "0" Then flag = 1
If ph(j) = "4" Then flag = 1
If ph(j) = "5" Then flag = 1
If ph(j) = "7" Then flag = 1
If ph(j) = "9" Then flag = 1

If pb(j) = "A" Then flag = 1
If pb(j) = "B" Then flag = 1
If pb(j) = "C" Then flag = 1
If pb(j) = "D" Then flag = 1



Next j

For j = 10 To 11
' D18
' RNG
  temp = x0 * a + c
  temp = temp / z
  x1 = (temp - Fix(temp)) * z
  x0 = x1
  phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.001 Then pb(j) = "A"
If ph(j) < 0.009 And ph(j) >= 0.001 Then pb(j) = "B"
If ph(j) < 0.021 Then ph(j) = 11
If ph(j) < 0.16 Then ph(j) = 1
If ph(j) < 0.285 Then ph(j) = 2
If ph(j) < 0.449 Then ph(j) = 3
If ph(j) < 0.594 Then ph(j) = 4
If ph(j) < 0.731 Then ph(j) = 5
If ph(j) < 0.846 Then ph(j) = 6
If ph(j) < 0.926 Then ph(j) = 7
If ph(j) < 0.967 Then ph(j) = 8
If ph(j) < 0.982 Then ph(j) = 9
If ph(j) < 0.992 And ph(j) >= 0.982 Then pb(j) = "C"
If ph(j) < 0.997 And ph(j) >= 0.992 Then pb(j) = "D"
If ph(j) < 0.998 And ph(j) >= 0.997 Then pb(j) = "E"
If ph(j) < 1 And ph(j) >= 0.998 Then pb(j) = "F"
' allele 20 (C) reduced from .017 to .015 as allele
' frequencies summed to 1.002
If ph(j) > 10 Then ph(j) = 0
If pb(j) <> "Z" Then ph(j) = pb(j)
If ph(j) = "0" Then flag = 1
If ph(j) = "7" Then flag = 1
If ph(j) = "8" Then flag = 1
If ph(j) = "9" Then flag = 1


If pb(j) = "A" Then flag = 1
If pb(j) = "B" Then flag = 1
If pb(j) = "C" Then flag = 1
If pb(j) = "D" Then flag = 1
If pb(j) = "E" Then flag = 1
If pb(j) = "F" Then flag = 1


Next j





For j = 12 To 13
' D2S1338
' RNG
  temp = x0 * a + c
  temp = temp / z
  x1 = (temp - Fix(temp)) * z
  x0 = x1
  phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.037 Then ph(j) = 11
If ph(j) < 0.222 Then ph(j) = 1
If ph(j) < 0.309 Then ph(j) = 2
If ph(j) < 0.419 Then ph(j) = 3
If ph(j) < 0.557 Then ph(j) = 4
If ph(j) < 0.589 Then ph(j) = 5
If ph(j) < 0.613 Then ph(j) = 6
If ph(j) < 0.725 Then ph(j) = 7
If ph(j) < 0.867 Then ph(j) = 8
If ph(j) < 0.978 Then ph(j) = 9
If ph(j) < 0.997 And ph(j) >= 0.978 Then pb(j) = "A"
If ph(j) < 1 And ph(j) >= 0.997 Then pb(j) = "B"

If ph(j) > 10 Then ph(j) = 0
If pb(j) <> "Z" Then ph(j) = pb(j)
If ph(j) = "0" Then flag = 1
If ph(j) = "5" Then flag = 1
If ph(j) = "6" Then flag = 1



If pb(j) = "A" Then flag = 1
If pb(j) = "B" Then flag = 1

Next j


For j = 14 To 15
' D16
' RNG
  temp = x0 * a + c
  temp = temp / z
  x1 = (temp - Fix(temp)) * z
  x0 = x1
  phj = x1 / z
ph(j) = phj

If ph(j) < 0.019 Then ph(j) = 11
If ph(j) < 0.148 Then ph(j) = 1
If ph(j) < 0.202 Then ph(j) = 2
If ph(j) < 0.491 Then ph(j) = 3
If ph(j) < 0.779 Then ph(j) = 4
If ph(j) < 0.965 Then ph(j) = 5
If ph(j) < 0.994 Then ph(j) = 6
If ph(j) < 1 Then ph(j) = 7

If ph(j) > 10 Then ph(j) = 0
If ph(j) = "0" Then flag = 1
If ph(j) = "2" Then flag = 1
If ph(j) = "6" Then flag = 1
If ph(j) = "7" Then flag = 1


Next j

For j = 16 To 17
' D19
' RNG
  temp = x0 * a + c
  temp = temp / z
  x1 = (temp - Fix(temp)) * z
  x0 = x1
  phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.087 Then ph(j) = 11
If ph(j) < 0.309 Then ph(j) = 1
If ph(j) < 0.322 Then ph(j) = 2
If ph(j) < 0.704 Then ph(j) = 3
If ph(j) < 0.719 Then ph(j) = 4
If ph(j) < 0.896 Then ph(j) = 5
If ph(j) < 0.934 Then ph(j) = 6
If ph(j) < 0.975 Then ph(j) = 7
If ph(j) < 0.992 Then ph(j) = 8
If ph(j) < 0.997 Then ph(j) = 9
If ph(j) < 0.999 And ph(j) >= 0.997 Then pb(j) = "A"
If ph(j) < 1 And ph(j) >= 0.999 Then pb(j) = "B"
If ph(j) > 10 Then ph(j) = 0

If pb(j) <> "Z" Then ph(j) = pb(j)
If ph(j) = "2" Then flag = 1
If ph(j) = "4" Then flag = 1
If ph(j) = "6" Then flag = 1
If ph(j) = "7" Then flag = 1
If ph(j) = "8" Then flag = 1
If ph(j) = "9" Then flag = 1



If pb(j) = "A" Then flag = 1
If pb(j) = "B" Then flag = 1


Next j


For j = 18 To 19
' D3
' RNG
  temp = x0 * a + c
  temp = temp / z
  x1 = (temp - Fix(temp)) * z
  x0 = x1
  phj = x1 / z
ph(j) = phj

If ph(j) < 0.001 Then ph(j) = 11
If ph(j) < 0.007 Then ph(j) = 1
If ph(j) < 0.139 Then ph(j) = 2
If ph(j) < 0.404 Then ph(j) = 3
If ph(j) < 0.651 Then ph(j) = 4
If ph(j) < 0.846 Then ph(j) = 5
If ph(j) < 0.987 Then ph(j) = 6
If ph(j) < 1 Then ph(j) = 7

If ph(j) > 10 Then ph(j) = 0
If ph(j) = "0" Then flag = 1
If ph(j) = "1" Then flag = 1
If ph(j) = "7" Then flag = 1

Next j

If flag = 1 Then countf = countf + 1

If flag = 0 Then
' output the original generated file

Write #1, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)



' Because in real DNA profiles without further info ,no one
' knows which allele in each pair came from the mother or father
' by convention they are written smaller ,larger (or equal).
' The following directs each pair


For j = 0 To 18 Step 2
If ph(j + 1) < ph(j) Then
jjj = ph(j)
ph(j) = ph(j + 1)
ph(j + 1) = jjj
End If
Next j

' put extra conditional statements here to reduce
' the number of files or just delete some of the following
'
' dividing on first column, file by file
If ph(0) = 0 Then
Write #10, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count0 = count0 + 1

End If
If ph(0) = 1 Then
Write #11, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count1 = count1 + 1

End If
If ph(0) = 2 Then
Write #12, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count2 = count2 + 1

End If
If ph(0) = 3 Then
Write #13, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count3 = count3 + 1

End If
If ph(0) = 4 Then
Write #14, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count4 = count4 + 1

End If
If ph(0) = 5 Then
Write #15, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count5 = count5 + 1

End If
If ph(0) = 6 Then
Write #16, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count6 = count6 + 1

End If
If ph(0) = 7 Then
Write #17, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count7 = count7 + 1

End If
If ph(0) = 8 Then
Write #18, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count8 = count8 + 1

End If
If ph(0) = 9 Then
Write #19, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count9 = count9 + 1
End If
End If
Next x
Close #10
Close #11
Close #12
Close #13
Close #14
Close #15
Close #16
Close #17
Close #18
Close #19

Close #1

' count file for data to fix for - next loops in sucessive dividings
' count for >= 8.7% alleles is x minus countf
' countf is the count of rejected profiles with at least one 
' allele less than 8.7% af

Open "dec29-c" For Output As #20
Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9, "countflag", countf
Close #20

Results
My own genetic background is parents and grandparents from 2 separated English counties. Then a generation further back one from another county and one from Ireland. Further back , then a Huguenot input but too far back to influence my DNA profile. So for these purposes '4 fourths English' presumably reflected in my own DNA profile which contains alleles of 8.7% or more frequency of occurance, ie nothing rarer than 8.7%. Unless anyone can show me otherwise I will take that as farely representative, I cannot believe my background is that exceptional for England. I have adjusted my simulation to only generate profiles with allele frequencies more than 8% This increases the chance of false matches within THAT subset from the general UK Caucasian of 1.2 in 2 million to about 1 in 270,000. 381,843 such >8% profiles produced 2 ' 10 loci ' matches "33566645121479143336" "44164545232517133546" Which translate to standard form vWA,THO1,D8,FGA,D2,D18,D2,D16,D19,D3 (16,16)(9,9.3)(14,14)(22,23)(28,29)(12,15)(23,25)(9,12)(14,14)(15,18) and (17,17)(6,9.3)(12,13)(22,23)(29,30)(13,16)(17,23)(9,11)(14,15)(16,18) and as usual the lowest frequency of any allele found in the above 2 matches is increased, this time to 11.1% Matches started as 33566654124179413363 and 33656645121479143336 44615454325271315364 and 44165445232571313546 so RNG good Other matches for first (not any ) 12 digit 19,089 14 837 16 104 18 8 19 true excluding '20s' 5 20 2 So 2 matches in 381,843 scales to about 1 in 220,000 to 270,000 (as 2 represents >2 and <3 ) or going the other way assuming square law doesn't breakdown then about 54 to 82 in 2 million or 1,400 to 2,000 10 loci matches in 10 million. The corresponding result for 19 out of 20 matches and that is permutate any 19 from 20 and for a population of 4 million is about (4*60) * 120 <> 28,000 ( 60 between 54 and 82, scaled for 4 million and 120 from dnas5.htm results) or 1 in 140 If the NDNAD was still using the pre-1999 6 loci structure then for 2 million profiles of this >= 8% subset then that would have meant about 523,000 matching profiles within the database. For the NDNAD as it stood with 840,000 profiles in Spring 2000 mainly 6 loci profiles then number of matches must be greater than 5,000 and less than 90,000 . 5,000 scaled from dnas5.htm simulation (no co-ancestry ) and 90,000 scaled from the dnas6.htm simulation (>= 8% allele frequency only). On top of this would be people using aliases and double entries but that figure is unquantifiable by simulation. For the brother matching situation then the general UK Caucasian figure of 1 in 30,000 probably drops to about 1 in 600 for '4-fourths English'.

Email Paul Nutteing by removing 4 of the 5 dots
or email Paul Nutteing ,remove all but one dot
Or a message on usenet group uk.legal has got to me recenently a couple of times.
A lot of the contents of this file plus other material 'peer reviewed' on the main forensic science usergroup

Background
A simulation of a large DNA profile database
A simulation of DNA profile 'families'
A simulation of DNA profile families with consanguinity
A simulation of DNA profile 'families' for 6 generations
dnas.htm revisited with all alleles represented
dnas.htm revisited for >8 percent allele frequency subset (similar ancestry )
Simulation of Taiwanese Tao and Rukai populations to explore the effect of within and without ancestral clusters
Basques autochthonous DNA profiles simulation, 9 loci
Australian Capital Caucasian 9 loci simulation
Australian Capital Caucasian 9 loci simulation, >= 5% allele frequency
CODIS, 13 Loci Caucasian Simulation
Automating the macros
Exploring other DNA profile match scenarios
Suspect familial matching
Return to co-ancestry factor in the NDNAD simulations
144 random matches in 65,000 -- ONLY?


Powered by counter.bloke.com