Laboratory 18: DNA Forensics

Objectives: After completing this laboratory exercise, you should be able to:
• Identify the flanking sequences and the number of repeat units of a particular STR, and record this STR using the shorthand method
• Identify all of the possible STR allele combinations that a child could inherit at a particular locus, given the STR allele combinations of the parents
• Model the DNA profiling process in order to solve different criminal cases, and relate the results obtained to overall criminal investigation

INTRODUCTION

Most individuals of the same species, whether African elephants, portobello mushrooms, white oak trees, or humans, have nearly identical DNA. But the DNA sequence at certain locations, or loci, throughout the genome varies among individuals. These variations can be used to distinguish one individual from another of the same species. The process of analyzing these DNA variations for the purpose of identification is known as DNA profiling, or genetic fingerprinting.

DNA profiling techniques have been used for a variety of reasons, including forensic science (matching a suspect’s or victim’s DNA with samples found at the scene of a crime or catastrophe), paternity testing, historical investigations, missing-person investigations, identifying victims of accidents and disasters, and cataloging convicted offenders in a database. In this activity, you will learn about DNA profiling and how to apply it to solve different cases. Think what you learn in the classroom has no practical use? Think again.

PART 1: DNA PROFILING BASICS

While most of the genome is identical among individuals of the same species, differences do exist. DNA profiling takes advantage of these differences. Variations occur throughout the genome, and in particular, in regions of noncoding DNA, which is DNA that is not transcribed and translated into a protein. Variations in noncoding regions are less likely to affect an individual’s phenotype, and therefore changes in these regions are less likely to be eliminated by natural selection.

DNA profiling uses a category of DNA variations called short tandem repeats (STRs). STRs are composed of units of bases, typically two to five bases long, which repeat multiple times. The repeat units are found at different locations, or loci, throughout the genome.

Every STR has multiple alleles, or variants, each defined by the number of repeat units present or by the length of the sequence. Each STR is surrounded by nonvariable segments of DNA known as flanking regions.

For example, the STR allele in Figure 1 could be designated as “6” because the repeat unit (GATA) repeats six times, or as 70 bp (where bp stands for base pairs) because it is 70 bp in length, including the flanking regions. A different allele of this same STR would have a different number of GATA repeat units but the same flanking regions.

Flanking regions are important because knowing their sequences enables geneticists to isolate the STR using polymerase chain reaction, or PCR, amplification.

Figure 1. Each one of the rectangles above represents a repeat unit. In this example, the STR is comprised of six repeats of the four-base unit GATA. On either side of the STR is a flanking region of DNA.

If you were to write out the STR sequence in Figure 1, it would be GATAGATAGATAGATAGATAGATA. For STRs with many repeat units, writing out the sequence can get very unwieldy, so geneticists use shorthand. The repeat unit is placed in brackets with a subscript indicating the number of times it repeats. The shorthand for the STR in the example above would be [GATA]₆.
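
The shorthand rule can also be checked with a few lines of code. The Python sketch below is only an illustration: the function name str_shorthand and the two flanking sequences are made up for this example (Figure 1 does not give the actual flanking bases), and it assumes the repeat unit is already known and occurs in one uninterrupted run.

```python
import re

def str_shorthand(sequence, repeat_unit):
    """Find the longest uninterrupted run of repeat_unit in sequence.

    Returns (left_flank, repeat_count, right_flank, shorthand).
    """
    pattern = re.compile(f"(?:{re.escape(repeat_unit)})+")
    runs = list(pattern.finditer(sequence))
    if not runs:
        raise ValueError("repeat unit not found in sequence")
    longest = max(runs, key=lambda m: m.end() - m.start())
    count = (longest.end() - longest.start()) // len(repeat_unit)
    left_flank = sequence[:longest.start()]
    right_flank = sequence[longest.end():]
    return left_flank, count, right_flank, f"[{repeat_unit}]{count}"

# Hypothetical flanking sequences around the six GATA repeats of Figure 1.
example = "TTCCGGA" + "GATA" * 6 + "CCTTGAC"
print(str_shorthand(example, "GATA"))
# ('TTCCGGA', 6, 'CCTTGAC', '[GATA]6')
```

Because a subscript cannot be typed in plain code, the function writes the repeat count directly after the closing bracket.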

  1. Identify the flanking sequences and the number of repeat units [GAAT] in the following STR, known as TPOX, on human chromosome 2:
    CCACACAGGTAATGAATGAATGAATGAATGAATGCCTAAGTGCC
    a. partial flanking sequences: ________ and _________
    b. number of repeat units: _
  2. Write out the STR shown above using genetic shorthand. _____

STRs are inherited just like any gene or segment of DNA. Every individual has two alleles per STR, one inherited from each parent. However, many different alleles are often present within a population. If the inherited alleles for a given STR in an individual are identical (i.e., contain the same number of repeat units), the individual is homozygous for that STR. If the individual has inherited two different alleles for a given STR, then he or she is heterozygous for that STR. Figure 2 shows a simple model of STR inheritance.

Figure 2. Model of inheritance for a single STR. The numbers refer to the allele number. Here, the mother is homozygous for an allele with 10 repeat units. The father is heterozygous at this STR locus, with one allele having 12 repeats and the other having 14. Their daughter is heterozygous, having inherited allele 10 from her mother and allele 12 from her father.
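
The inheritance rule (one allele from each parent) can be sketched in Python as a way to list every genotype a child could have. The function names and the parental genotypes below are hypothetical and are not taken from Figure 2; the sketch also assumes no mutation changes the repeat number between generations.

```python
from itertools import product

def possible_child_genotypes(mother, father):
    """List every unordered allele pair a child could inherit.

    mother and father are tuples holding each parent's two allele numbers.
    """
    pairs = {tuple(sorted(pair)) for pair in product(mother, father)}
    return sorted(pairs)

def zygosity(genotype):
    """Label a two-allele genotype as homozygous or heterozygous."""
    return "homozygous" if genotype[0] == genotype[1] else "heterozygous"

# Hypothetical parents: mother is 8/11, father is 11/13.
for genotype in possible_child_genotypes((8, 11), (11, 13)):
    print(genotype, zygosity(genotype))
# (8, 11) heterozygous
# (8, 13) heterozygous
# (11, 11) homozygous
# (11, 13) heterozygous
```

A child can be homozygous only when both parents carry at least one copy of the same allele, as with allele 11 in this made-up example.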

  3. Is there any way that the daughter in Figure 2 could have been homozygous for the STR shown? Explain your answer.
  4. Using the diagram on the right, identify all of the possible STR allele combinations that the son could have inherited at this particular locus.

PART 2: HOW DNA PROFILING IS DONE

So how do scientists analyze DNA to determine an individual’s STR alleles?

First, DNA is extracted and isolated from the other materials in an individual’s cells. Once the DNA is extracted, the target STRs are amplified by PCR. Different STRs are tagged with different fluorescent markers so that they can easily be detected and differentiated. For efficiency, many STRs are analyzed at once during genetic profiling.

You may be familiar with gel electrophoresis as a technique for separating DNA fragments by size. Researchers analyzing many fragments from many samples use a variation of the technique called capillary electrophoresis, which is faster and more easily automated.

Figure 3 shows a model of capillary electrophoresis. As DNA samples go through the system, the results appear as an electropherogram, or a graph showing the quantity of light at specific wavelengths detected over time. DNA fragments appear as different-colored peaks that can easily be compared across samples and to the DNA ladder.

Figure 3. A simplified schematic of how an electropherogram for a single STR is generated. In this example, the individual has one allele with five repeats and another with 11 repeats. During PCR, the DNA fragments are amplified and a fluorescent tag is applied. After PCR, the DNA samples are run through ultrathin capillary tubes filled with a polymer. An electric field is then applied, causing the negatively charged DNA fragments to move toward a positively charged electrode, with larger DNA fragments moving more slowly than smaller fragments. As the sample is drawn through the capillary tube, a laser excites the tags and the emitted fluorescence is measured by an electronic camera (the “detector”). Fragments of the same length pass the detector at the same time. Running fragments of known DNA sizes (known as a DNA ladder or DNA standards) through the system allows scientists to construct a graph that relates fragment size (measured in base pairs) to detection time. Computer software uses this graph and the data from the detector to determine the sizes of alleles in the DNA samples, in this example 55 bp (allele 5) and 79 bp (allele 11). The alleles are shown in the electropherogram as labeled peaks.
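
The fragment sizes in Figure 3 follow from simple arithmetic: total fragment length = length of the flanking regions + (number of repeats × length of the repeat unit). The Python sketch below applies that relationship to the two alleles in Figure 3. The function names are hypothetical, and the sketch assumes that both alleles share the same flanking regions and that every repeat unit has the same length.

```python
def fragment_length(repeat_count, unit_length, flank_length):
    """Total fragment length in bp: flanking regions plus all repeat units."""
    return flank_length + repeat_count * unit_length

def infer_unit_and_flank(allele_a, size_a, allele_b, size_b):
    """Work backward from two alleles of the same STR.

    allele_a and allele_b are repeat counts; size_a and size_b are fragment
    sizes in bp. Returns (repeat unit length, total flanking length).
    """
    if allele_a == allele_b:
        raise ValueError("need two alleles with different repeat counts")
    unit_length, remainder = divmod(size_b - size_a, allele_b - allele_a)
    if remainder != 0:
        raise ValueError("sizes do not fit a whole-number repeat unit")
    flank_length = size_a - allele_a * unit_length
    return unit_length, flank_length

# Figure 3: allele 5 is 55 bp and allele 11 is 79 bp.
print(infer_unit_and_flank(5, 55, 11, 79))   # (4, 35)
print(fragment_length(11, 4, 35))            # 79
```

Under these assumptions, the six extra repeats account for the 24 bp difference between the two peaks, which implies a four-base repeat unit and 35 bp of flanking sequence in total.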

Figure 4. An electropherogram from an individual, showing the alleles at four STR loci: D8S1179, D21S11, D7S820, and CSF1PO. Each locus shows the alleles below it as peaks in the electropherogram. The number of repeats for each allele is indicated.

  1. Examine the electropherogram in Figure 4. Describe how you determine whether an individual is heterozygous or homozygous for a particular STR.
  2. List the STR locus or loci at which this individual is homozygous. ___________
  3. Which locus has the longest DNA fragments? _________ How do you know?

Different individuals have unique sets, or profiles, of STR alleles. For many forensic analyses, researchers use a core set of 13 STR loci established in 1996 by the FBI Laboratory. They are CSF1PO, FGA, TH01, TPOX, VWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, and D21S11. In addition, a genetic marker called AMEL is used to determine an individual’s biological sex and is analyzed along with the 13 STR loci as part of that individual’s genetic fingerprint.

When all 13 loci are analyzed, the chance that two people who aren’t identical twins share exactly the same alleles at every one of these loci is extremely low. Some estimates put it in the neighborhood of 1 in 10,000,000,000,000, or about one in 10 trillion.

PART 3: BUILD A DNA PROFILE AND SOLVE A CRIME

Today’s technology has made it easier to quickly and accurately generate DNA profiles. In this part of the activity, you will model the process yourself to solve a crime. Good luck, detective!

Crime Report: A thief has stolen a priceless collection of jewels from the Museum of Precious Jewels. Forensic technicians obtained skin cells from a forehead print left on the glass enclosure of the jewel exhibit. DNA has been isolated and PCR amplified for some of the standard STR loci. A partial genetic profile generated from the collected DNA is shown in Figure 5.

Figure 5. The DNA profile of the forehead print from the scene of the crime. Each colored line shows the alleles for one of four core STR loci from CODIS, the FBI’s Combined DNA Index System (D5S818, CSF1PO, D7S820, D8S1179).

A suspect was identified in the case. Her DNA was collected, and data for the four STR loci that were included in the analysis of the forehead print were obtained. The suspect’s profile data is shown on the next page.

Your task: Compare the suspect’s data below to the electropherogram in Figure 5. Analyze each STR locus by locating the repeats for each locus and counting the number of repeats, and indicate whether the suspect is homozygous or heterozygous at each locus. Record your answers on the next page.

SUSPECT’S DNA PROFILE DATA
D5S818
allele 1: 5'–CAATCATAGCCACAAGATAGATAGATAGATAGATAGATAGATACCAAAGAG–3'
allele 2: 5'–CAATCATAGCCACAAGATAGATAGATAGATAGATAGATAGATAGATACCAAAGAG–3'
CSF1PO
allele 1: 5'–GGCCATCTTCAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATGCTAGTCC–3'
allele 2: 5'–GGCCATCTTCAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATGCTAGTCC–3'
D7S820
allele 1: 5'–CCTCATTGACGATAGATAGATAGATAGATAGATACATAGTCAG–3'
allele 2: 5'–CCTCATTGACGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATACATAGTCAG–3'
D8S1179
allele 1: 5'–GTTCATTTTCATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTACGAATGTACA–3'
allele 2: 5'–GTTCATTTTCATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTACGAATGTACA–3'

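Locus | Repeat unit | # of repeats, allele 1 | # of repeats, allele 2 | Homozygous or heterozygous | Total bp
STR D5S818 on Chromosome 5 | AGAT | ____ | ____ | ____ | ____
STR CSF1PO on Chromosome 5 | TAGA | ____ | ____ | ____ | ____
STR D7S820 on Chromosome 7 | GATA | ____ | ____ | ____ | ____
STR D8S1179 on Chromosome 8 | TCTA | ____ | ____ | ____ | ____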

Now, use the data to fill in the suspect’s electropherogram and answer the analysis questions.

Analysis Questions:

  1. Compare the suspect’s DNA profile that you drew to the crime-scene profile in Figure 5. Do they match?
  2. Make a claim about this suspect’s guilt or innocence based on this evidence. How confident are you that your claim is correct?

PART 4: EXTENSION ACTIVITY

Allele frequencies tell you how common a given allele is in a population. An allele frequency of 0.03 means that 3% of all copies of that STR locus in the population carry that particular allele. The frequency in the U.S. population of each STR allele from the suspect’s profile has been provided in the table below.

STR allele from suspect’s DNA profile | Frequency in the U.S. population
D5S818 allele 1 | 0.0106
D5S818 allele 2 | 0.0198
CSF1PO allele 1 | 0.0656
CSF1PO allele 2 | 0.0656
D7S820 allele 1 | 0.0005
D7S820 allele 2 | 0.1361
D8S1179 allele 1 | 0.0787
D8S1179 allele 2 | 0.0787

Table 1: Frequencies for suspect alleles. (Data source: Promega allele frequencies, https://www.promega.com/products/pm/genetic-identity/population-statistics/allele-frequencies/)

To calculate the probability of a given genotype, we can use the formulas below:
Heterozygous genotype frequency = 2pq, where p is the frequency of the first allele and q is the frequency of the second allele
Homozygous genotype frequency = p², where p is the frequency of the allele
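
The two formulas can be sketched in Python as follows. The function name, the allele numbers, and the frequencies in this example are all hypothetical (they are not the values in Table 1), so treat it as an illustration of the arithmetic rather than as part of the case.

```python
def genotype_frequency(allele_1, allele_2, allele_freqs):
    """Expected genotype frequency: p squared if homozygous, 2pq if heterozygous."""
    p = allele_freqs[allele_1]
    q = allele_freqs[allele_2]
    if allele_1 == allele_2:
        return p * p
    return 2 * p * q

# Hypothetical allele frequencies at one STR locus (not from Table 1).
example_freqs = {9: 0.10, 12: 0.03}

print(f"{genotype_frequency(9, 12, example_freqs):.4f}")   # heterozygous 9/12: 0.0060
print(f"{genotype_frequency(12, 12, example_freqs):.4f}")  # homozygous 12/12: 0.0009
```

The factor of 2 in the heterozygous case reflects the two ways to inherit the pair: the first allele from the mother and the second from the father, or the reverse.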

  3. Using the U.S. frequency data, calculate the probability of having the given genotype at each locus. Show your work.
    a. Probability of genotype for D5S818: _____________
    b. Probability of genotype for CSF1PO: _____________
    c. Probability of genotype for D7S820: _____________
    d. Probability of genotype for D8S1179: _____________

Remember that the STR loci above are inherited independently of one another. To determine the probability of a particular combination of genotypes across several loci, we use the product rule for combining probabilities: the probabilities of having each STR genotype are multiplied by one another to give the probability of having the whole genetic fingerprint.
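
As a rough illustration of the product rule, the Python sketch below multiplies the genotype probabilities of three loci. The locus labels and probabilities are invented for this example, and the calculation is valid only under the independence assumption stated above.

```python
import math

def profile_probability(genotype_probs):
    """Product rule: multiply the genotype probabilities of independent loci."""
    return math.prod(genotype_probs.values())

# Hypothetical genotype probabilities for three independent loci.
example_profile = {"locus A": 0.006, "locus B": 0.0009, "locus C": 0.05}

probability = profile_probability(example_profile)
print(f"{probability:.2e}")                  # 2.70e-07
print(f"about 1 in {1 / probability:,.0f}")
```

With these made-up numbers the combined probability is roughly 1 in 3.7 million. Each additional locus multiplies the probability by another small number, which is why profiles built from many loci are so rarely shared by chance.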

  4. Calculate the probability of someone else having a DNA profile identical to that of the suspect. Show your work.

The probability calculated in the preceding question is the chance that a randomly chosen, unrelated person would have the same DNA profile as the one found at the crime scene.

  5. Based on your calculations, explain to the members of the jury why they should feel confident that the suspect was at the scene of the crime.

PART 5: CASE STUDY: THE INNOCENCE PROJECT

INTRODUCTION
Note: In this case, DNA profiling is used to reexamine evidence and challenge the guilt of a convicted individual. Although this activity discusses a real case explored by the Innocence Project, the DNA profiles provided do not represent any specific individuals.

DNA is a powerful tool used to exonerate or incriminate people accused of committing crimes. The technology to determine an individual’s DNA profile has not always been available, however. The Innocence Project is an organization that uses genetic fingerprinting, among other techniques, to review evidence in cases that occurred before DNA analysis was possible or commonplace. Since it was founded in 1992, the Innocence Project has been involved in overturning the convictions of more than 340 people. In addition, the new evidence they provide has led to the capture of nearly 150 people who actually committed some of the crimes. The Innocence Project provides all of its services free of charge. More information about the Innocence Project can be found at http://www.innocenceproject.org.

One case examined by the Innocence Project was that of Malcolm Bryant. Bryant was accused of using a knife to kill a 16-year-old girl walking in Baltimore, MD, in 1998. A friend who was walking with the victim and was also dragged away by the attacker later identified Bryant as the killer. This eyewitness account led to Bryant’s conviction for murder on August 5, 1999, and Bryant was sentenced to life in prison. In 2009, a petition was granted to test the victim’s fingernail clippings for DNA in case she had struggled with her attacker and some of his skin got under her nails. The testing revealed a DNA profile that belonged to the victim, as well as a DNA profile of an unrelated male. A partial genetic profile of the male is shown in Figure 1.

Figure 1. Electropherogram showing the profile of DNA recovered from under the victim’s fingernails, which may match the killer’s DNA profile.

A forensic technician hired by the Innocence Project obtained a sample of Malcolm Bryant’s DNA and determined his genetic fingerprint. Information from the analysis is in Table 1.

  1. Complete Table 1 using mathematical reasoning. Remember that the total fragment length is equal to the length of the STR plus the length of the flanking sequences. Further, the flanking sequences are always identical for each allele of a given STR.

Table 1. Data from Malcolm Bryant’s Partial DNA Profile.

STR allele Repeat structure Length of flanking sequences (bp) Total fragment length (bp)
TPOX allele 1 [GAAT]9 118
TPOX allele 2 138
D18S51 allele 1 [AGAA]8 266
D18S51 allele 2 [AGAA]15
D7S820 allele 1 233 257
D7S820 allele 2 [GATA]8 233
CSF1PO allele 1 [TAGA]6 267
CSF1PO allele 2 [TAGA]6 267

  2. Draw Bryant’s partial genetic fingerprint using the data from Table 1.
  3. Does the evidence suggest that Bryant is guilty or innocent of the crime? Explain your answer.
  4. Suggest one or two reasons why DNA fingerprinting is more reliable than an eyewitness account.
  5. Identify one or two reasons why DNA fingerprinting isn’t always possible.

After this initial discovery, authorities conducted additional DNA testing. In 2015, a full genetic fingerprint was obtained from the victim’s T-shirt near the site of the knife wound. The full genetic profile did not match Bryant’s genetic profile. In May of 2016, Malcolm Bryant was exonerated of the crime he had been convicted of almost 17 years earlier. To learn more about the Bryant case, visit Bryant’s page at The National Registry of Exonerations website: https://www.law.umich.edu/special/exoneration/pages/casedetail.aspx?caseid=4883

Adapted from http://www.hhmi.org/biointeractive/dna-profiling-activity