The Rule of Probabilities: A Practical Approach for Applying Bayes' Rule to the Analysis of DNA Evidence
Summary (5 min read)
INTRODUCTION
- With the recent Supreme Court decision allowing the collection of DNA samples from any person arrested and detained for a serious offense, it seems inevitable that the justice system will collect and use large DNA databases.
- There is concern that as database size increases, so too will the rate of false positives, and thus innocent people will be convicted when their DNA matches evidence left at a crime scene.
- The authors will show how both the prior probability and the relevant database size can be estimated under alternative assumptions, assumptions that are appropriately open to literal and figurative cross-examination, so as to assure the robustness of the bottom-line conclusion: the defendant was or was not the true source of the crime scene evidence.
I. THE ISLAND OF EDEN
- Imagine that a singular crime has been committed on the otherwise idyllic island of Eden.
- All of the other people in the population have been ruled out by the lack of a DNA match. Prior to the test, both Mr. Baker and Mr. Fisher were equally likely to have been the criminal.
- Had the population been above 51,294, a test with a one-in-a-million chance of a false positive would lead to more than a five percent chance that at least one person would match even when everyone is innocent.
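The 51,294 threshold can be checked directly: it is the smallest population size N for which 1 − (1 − r)^N exceeds 5% when r is one in a million. A minimal sketch (assuming independent tests):

```python
import math

def p_any_match(n: int, r: float) -> float:
    # Chance that at least one of n innocent people matches by coincidence,
    # given a per-person random match probability r (independence assumed)
    return 1.0 - (1.0 - r) ** n

r = 1e-6  # one-in-a-million random match probability
# Smallest population at which the chance of a coincidental match tops 5%:
n_star = math.ceil(math.log(0.95) / math.log(1.0 - r))  # -> 51294
```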
- What matters is the probability that the guilty party is in the database.
- The defendant might have been convicted even without the confirmation of DNA evidence.
II. BAYES FOR AN ERA OF BIG DATA
- Court cases introducing DNA evidence have traditionally focused on three different numbers: 1. The random match probability: the probability that a randomly selected person will be a DNA match.
- Indeed, if the expected number of innocent matches in the database were two, the authors would not say that the chance of a database match is 200% and thereby violate a fundamental tenet of probability that all probabilities must be at or below one.
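The distinction between an expected number of matches and the probability of at least one match can be made concrete. If innocent matches are rare and independent, the standard Poisson approximation (not a formula from the article itself) converts an expected count into a probability that never exceeds one:

```python
import math

def p_at_least_one(expected_matches: float) -> float:
    # Poisson approximation: with an expected count lam of rare, independent
    # matches, P(at least one match) = 1 - exp(-lam), which never exceeds 1
    return 1.0 - math.exp(-expected_matches)

p_at_least_one(2.0)  # ~0.86, not 200%, even though two matches are expected
```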
- This Article describes how, in practical terms, to convert the inputs into the number the trier of fact should care about.
- It will be of enormous help to introduce some notation: S will stand for the result that the defendant is the source of DNA found at the crime scene.
- The authors start with the aggregate probability that someone in the database is the source.
- This probability will tell us a great deal about the posterior source probability with regard to every individual in the database.
- Because the authors are assuming no false negatives, the posterior source probability of all unmatched individuals is zero.
- For large databases, this makes little difference, since it changes the size of the database by only one.
- If there is a single unalibied match, then the posterior database source probability will be entirely focused on that matching individual (and the remaining unmatching individuals in the database will have a zero source probability).
- The rate of false negatives varies depending on the width of the DNA band, or "match window," used to match the suspect to the source.
- While this does not rule out the possibility of a false negative, for that to have happened, the authors would have to have experienced both a false positive and a false negative at the same time.
- The intuition for this formula follows from a Venn diagram:
P(S | M) / P(¬S | M) = [P(S) / P(¬S)] × [P(M | S) / P(M | ¬S)]
- The odds of observing M unalibied matches is the ratio of the probability this would happen when the source is in the database versus the probability this would happen when the source is not in the database (and the M unalibied matches occur by chance).
- The binomial distribution provides the probability for each possible number of heads, from 0 to N.
- If it turns out that these two numbers perfectly coincide, then the authors do not update the prior probabilities.
M : rD(1 − a) (5)
- This likelihood ratio can be restated simply as the ratio of the actual number of unalibied matches relative to the expected number of unalibied, nonsource matches: M : E[M] (6)
- This likelihood ratio indicates how strong the new information is in terms of changing the prior opinion.
- If the likelihood ratio in equation (6) is 10:1, then it is ten times more likely that the M matches observed are the result of the true match being in the database than all being there by luck.
- If their initial view was that it was twice as likely that the database did not contain the true match (prior odds are 1:2), then Bayes' rule tells us (via equation (4)) that putting these together means the new odds are 5:1 in favor of the database containing the true match.
- Bayes' rule says that to derive the updated, posterior odds of the source being in the dataset, all the authors need to do is simply multiply the prior odds by the likelihood ratio of equation (6).
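The odds-form update in the worked example (prior odds 1:2, likelihood ratio 10:1) is a one-line multiplication:

```python
from fractions import Fraction

def posterior_odds(prior_odds: Fraction, likelihood_ratio: Fraction) -> Fraction:
    # Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio
    return prior_odds * likelihood_ratio

# Prior odds 1:2 that the database contains the true source, and a
# likelihood ratio of 10:1 from the observed matches (equation (6)):
odds = posterior_odds(Fraction(1, 2), Fraction(10, 1))  # -> Fraction(5, 1), i.e. 5:1
prob = odds / (1 + odds)  # converting odds to a probability: 5/6
```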
III. COMPARATIVE STATICS
- This Part explores how the source probability of equation (8) changes as the authors change the four underlying variables (a, r, D, and p) while holding M constant.
- The authors also speculate on how these variables are likely to change over time.
- Increasing the random match probability, r, while holding everything else equal decreases their confidence that the matching individual was the source of the forensic DNA.
A. Trawling a Larger Database
- The question that often arises with a database trawl is how to adjust for the size of the database.
- To answer this question, the authors first assume that the two databases are each composed of individuals who, from a Bayesian perspective, have identical prior likelihoods of being the source of the forensic DNA.
- In other words, the larger database has the same average quality as the smaller one in terms of finding matches.
- As it turns out, there are two forces that almost exactly cancel each other out.
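The near-cancellation can be illustrated numerically. Assuming the posterior source probability takes the form p / (p + (1 − p)·r·D·(1 − a)), consistent with the article's equation (8), and assuming for illustration that the prior p grows in proportion to D, the two forces roughly offset when p is small:

```python
def source_probability(p, r, D, a):
    # Posterior source probability given one unalibied match:
    # p / (p + (1 - p) * r * D * (1 - a))   (the article's equation (8))
    return p / (p + (1 - p) * r * D * (1 - a))

r, a = 1e-8, 0.5  # illustrative values, not from the article
probs = []
for D in (1_000_000, 2_000_000, 4_000_000):
    p = 1e-8 * D  # assumption: the prior scales with database size
    probs.append(source_probability(p, r, D, a))
# The three posteriors are nearly identical: the weaker evidence from the
# larger trawl and the larger chance the source is in the database offset.
```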
IV. APPLICATION TO PEOPLE V. COLLINS
- The authors' analysis of DNA evidence can be usefully compared to the use of eyewitness evidence in the famous People v. Collins case.
- Thus, the relevant population should be the number of couples in greater Los Angeles, where the crime was committed.
- If the couple in court is guilty, then the chance some other innocent couple will match is 1 − (1 − r)^T, where T is the number of couples not yet examined by the police.
- If the police had searched the entire population of possible couples and found that the defendants were the only match, then the authors would know that the couple is guilty.
- The fact that they were dead broke just prior to the robbery and yet had unexplained spending right after the robbery should factor into the equation.
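The Collins calculation above can be sketched as a weighted average of the two cases; the prior of guilt and the number of unexamined couples below are hypothetical placeholders:

```python
def p_another_match(r: float, T: int, p_guilty: float) -> float:
    # If the defendants are guilty, another couple matches only by chance:
    # 1 - (1 - r)**T.  If they are innocent, the true couple is among the
    # T others, so another match exists with certainty (probability 1).
    p_by_chance = 1.0 - (1.0 - r) ** T
    return p_guilty * p_by_chance + (1.0 - p_guilty) * 1.0

# r = 1/12,000,000 as in Collins; T and the 50% prior are hypothetical:
p_another_match(1 / 12_000_000, 1_000_000, 0.5)  # ~0.54
```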
V. APPLICATION TO PEOPLE V. PUCKETT
- On February 21, 2008, John Puckett was found guilty of first-degree murder for the 1972 death of Diana Sylvester.
- Smith describes the ambiguous evidentiary record: [Lead homicide investigator].
- The jury also heard of Puckett's three prior rape and assault convictions.
- But that is not his burden (and it would be difficult for most people who had lived in San Francisco to explain in 2008 where they were on a particular night in 1972).
p : 1 − p
- With the new data, the updated (or posterior Bayesian) odds become: 125 × p : 1 × (1 − p), or 125p : (1 − p) (12). Associated with these odds is a probability that Puckett is the guilty party.
- If the authors imagine that proof beyond reasonable doubt requires establishing a source probability at or above 99%, then they can work backward to derive a minimum prior that would produce that posterior probability: 125p / (1 + 124p) ≥ 0.99.
- If the authors believe there is a 44% or higher chance that the guilty party is in the 338,711-felon database, they can conclude that Puckett has a 99% or higher chance of being the person whose DNA was left at the crime scene.
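Working backward from a target posterior is simple algebra: with likelihood ratio L and target probability t, the minimum prior is t / (L − (L − 1)t). A quick check of the 44% figure:

```python
def min_prior_for_posterior(lr: float, target: float) -> float:
    # Posterior probability = lr*p / (lr*p + 1 - p); solving
    # lr*p / (lr*p + 1 - p) >= target for p gives:
    return target / (lr - (lr - 1) * target)

min_prior_for_posterior(125, 0.99)  # ~0.442, i.e. the 44% prior in the text
```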
B. A Model for Calculating Priors
- Certainly a random person on the street would not have a 44% chance of being the guilty party.
- One approach to estimating the prior would be to compare the size of the database to the size of the population without alibis.
- The authors assume that criminals behave in the following manner.
- Again, fraction f are caught and (1 − f) are not.
- Those caught are entered into the database, and those who have escaped conviction twice are still not in the database.
- After each crime, a criminal retires with probability d and continues with probability 1 − d.
- Imagine that f, the chance of getting caught, is 50%.
- In addition, if a random criminal retires with a 39% chance, this says that the average criminal would commit 2.6 crimes before "retiring." This seems like a small number.
- This modeling approach to estimating the prior probability of database guilt-more precisely, the prior probability that someone in the database is the source of the crime scene DNA-has to their knowledge never before been used.
- That fraction and the database size are determined by the probability of being caught.
- Thus, if the database contains felons from all of California rather than just from San Francisco, then moving to Los Angeles is not enough to retire.
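One reading of this retirement model treats each crime as a geometric trial. The sketch below assumes a criminal is caught with probability f on each crime and otherwise retires with probability d; the 50% capture rate is the authors' illustrative figure, and the recursion is this reading's assumption rather than the article's stated formula:

```python
def expected_crimes(d: float) -> float:
    # Geometric model: a criminal retires with probability d after each
    # crime, so the expected career length is 1/d crimes
    return 1.0 / d

def p_ever_caught(f: float, d: float) -> float:
    # Chance of ever being caught (and entered into the database) before
    # retiring: caught with probability f on each crime, and otherwise
    # committing another crime with probability (1 - f) * (1 - d)
    return f / (1.0 - (1.0 - f) * (1.0 - d))

expected_crimes(0.39)     # ~2.6 crimes on average, as in the text
p_ever_caught(0.5, 0.39)  # ~0.72 under these illustrative parameters
```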
VI. EMPIRICIZING THE ALIBI AND PRIOR PROBABILITIES
- The application to People v. Puckett motivates a broader discussion of how to empirically assess the underlying parameters that influence the estimation of the posterior source probability.
- At the other extreme, the authors would assume that Baker comprises 30% of the 40% other category, and thus make no change to the 60% priors.
2. Empiricizing the prior probability
- Of course, there will be and should be reasonable disagreement about what constitutes a similar crime.
- Similarity would have to be with regard to a host of factors-including not just the crime type but also the modus operandi and the characteristics of the defendant.
- A sensible way forward would be to derive alternative priors based on alternative assumptions of what constitutes similar crimes as well as on plausible structural models, and then see if the defendant's source probability is sufficiently high even after combining the likelihood ratio with the most conservative (i.e., lowest) probability estimate within this range.
B. Small Database Trawls
- The analysis above is done under the stylized assumption that everyone in the database has the same prior probability of having been at the crime scene.
- Take the case where a woman, who had a documented history of being a victim of spousal battery, is found murdered.
- The authors again suggest that the prior can be inferred from adjusting the match rate in similar confirmation cases.
- It might be tempting to infer that thirty percent of the time the husband was the source of the forensic DNA.
VII. ADMISSIBILITY
- The authors' goal in this Part is to suggest specific ways an expert might present his or her opinion under existing law and when existing evidentiary rules should change to accommodate a more coherent factfinding process.
- To be admitted, the proposed probability evidence must also be consistent with Rules 104(a), 702, 703, and 403, or their state equivalents.
- However, changes in courts' approaches to similarity with respect to DNA evidence provide some reason for predicting that adjusted match probabilities are increasingly likely to be admissible.
- Courts that consider the relevance of statistical evidence have different philosophies, are influenced by a host of situation-specific variables, and occasionally make rulings that might go the other way.
- Prior data are just as important as the data that allow the authors to update their beliefs.
- What ultimately matters is where the authors end up, and that they arrived at that destination via a path that employs sound logic and reasoning.
CONCLUSION
- In the 2012 presidential election, Nate Silver caused a stir by correctly predicting the winner of all fifty states and the District of Columbia in the general election. Beyond accuracy, Silver's larger impact has been in changing the central polling metric and improving the way that metric is calculated.
- In case 2, each company only has an 80% chance of being found liable for its accidents (as the eyewitness may read the license plate incorrectly), and thus each company only has 80% of the full incentive.
- After Silver, the same evidence can be described as a 97.5% chance that Obama will win the election.
- And, like Silver, the authors advocate that the method of estimating this probability be explicitly Bayesian.
Frequently Asked Questions (6)
Q2. What is the probability of a rapist being overrepresented in the database?
It might, for example, be possible that African Americans, because of disparities in policing, are overrepresented in the database.
Q3. How do you calculate the probability that there would be another couple that matches?
To calculate the correct probability that there would be another couple that matches, it would be necessary to average the two numbers, weighting the 100% result by the chance the couple is innocent and the 1 − (1 − r)^T probability by the chance the couple is guilty.
Q4. How can one reject the conclusion that the true source probability is anything below 19.7%?
With a 30% hit rate based on 100 trawls, the authors show that with 99% confidence one can reject the conclusion that the true source probability is anything below 19.7%.
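The 19.7% figure is consistent with a standard exact one-sided lower confidence bound for a binomial proportion (a Clopper-Pearson-style construction, sketched here by bisection rather than taken from the article):

```python
from math import comb

def binom_sf(k: int, n: int, p: float) -> float:
    # P(X >= k) for X ~ Binomial(n, p)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def lower_confidence_bound(k: int, n: int, alpha: float) -> float:
    # One-sided (1 - alpha) lower bound for a binomial proportion: the
    # smallest p at which observing k or more successes is not too
    # surprising, found by bisection (binom_sf is increasing in p)
    lo, hi = 0.0, k / n
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_sf(k, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return hi

lower_confidence_bound(30, 100, 0.01)  # ~0.197, matching the 19.7% figure
```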
Q5. Why have courts been reluctant to introduce evidence about the prior probability of an individual defendant's being?
Courts have been particularly reluctant to introduce evidence about the prior probability of an individual defendant's being the source of forensic DNA because the very process of constructing a prior probability seems inconsistent with the notion of a fair trial.
Q6. What is the probability of a trawl from a larger database?
In this case, the chance that the source match is in the database is unchanged at p, but the expected number of innocent matches has gone up by r. Now the result of a trawl from a larger database is less convincing: Source Probability = P(S | M = 1) = p / (p + (1 − p)r(D + 1)(1 − a)). An increase in D to D + 1, holding p constant, increases the denominator and reduces the posterior source probability.