Genotyping techniques to address diversity in tumors.
Summary
I. INTRODUCTION
- Cancer development and tumor formation involve acquired genomic aberrations, such as sequence mutations and copy number changes.
- Thus, the amplitude of signal associated with a copy number alteration is dependent on the fraction of cells harboring the alteration.
- Importantly, with aCGH data it is not straightforward to discriminate between contamination by normal genomes and varying magnitude of underlying net copy number changes, although there have been efforts aimed at resolving this issue (Tolliver et al., 2010).
- The authors finally discuss how these data can be used and interpreted with the aim of deducing intermixture of nonaberrant cells within tumor biopsies, as well as subclonal events and intra-tumor heterogeneity.
A. Platforms and Probe Design
- There are two SNP array platforms predominantly in use, provided by Affymetrix and Illumina, respectively.
- Here, the authors confine themselves to describing the basic principles of the platforms and highlighting some of the differences between them.
- Since the first SNP array platforms were presented (Wang et al., 1998), array density has increased by several orders of magnitude and the current platforms comprise millions of probes in a single assay.
- Illumina utilizes their BeadChip technology that permits probes to be immobilized on silica beads rather than directly onto the array surface.
- After target hybridization, alleles are differentiated by a subsequent enzymatic single-base extension of the probe using the hybridized target as template.
B. Principles of Data Extraction and Normalization
- Raw data acquisition and processing vary depending on array platform.
- Arrays are hybridized and labeled according to chemistry-dependent experimental procedures followed by imaging and data extraction.
- Preprocessing and normalization of probe data are performed to obtain pairs of allele-specific measurements for each SNP locus, and various methods have been described for this purpose (LaFramboise, 2009).
- For calling genotype and calculating allele ratio, observed normalized intensities are related to expected values derived from collections of reference data.
- Transformation of intensities to relative copy number estimates is essentially also performed by relating values to a collection of normal reference samples or to a matched control.
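The transformation to relative copy number described above can be illustrated with a minimal sketch. It assumes that normalized total intensities (A + B) are already available for one SNP locus and that the expected value is taken as the median over normal reference samples. The function name `log_r_ratio` and the example intensities are hypothetical, and real platforms apply further corrections that are not modeled here.

```python
import math
from statistics import median


def log_r_ratio(sample_intensity, reference_intensities):
    """Return log2(sample / median of references) for one SNP locus."""
    expected = median(reference_intensities)
    if sample_intensity <= 0 or expected <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(sample_intensity / expected)


# Hypothetical normalized total intensities (A + B) at one locus.
normal_references = [0.95, 1.00, 1.05, 1.02, 0.98]
print(log_r_ratio(1.00, normal_references))  # 0.0 for an unchanged locus
print(log_r_ratio(0.50, normal_references))  # -1.0 when the signal is halved
```

Using a matched control instead of a reference collection would amount to passing a single-element list as `reference_intensities`.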
C. The B Allele Frequency and Relative Copy Number
- The B allele frequency (BAF), first presented using Illumina data (Peiffer et al., 2006), is calculated for each SNP individually by transformation of allele intensities and represents the proportion of DNA content for allele B as compared to the total DNA content of A and B alleles together.
- The proposed transformation involves linear interpolation of allele frequencies from reference data derived from normal samples.
- These probes can be used for the analysis of CNVs but many are also added to provide increased power and resolution when analyzing acquired copy number aberrations in tumors.
- Data from Affymetrix can be converted into BAF and LRR by appropriate normalization and transformation (Wang et al., 2007; Sun et al., 2009).
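The linear interpolation of allele frequencies from reference data can be sketched as follows. This is an illustration under stated assumptions, not the published procedure: the allelic angle `theta` is scaled to the range 0 to 1, and the per-SNP cluster centers `t_aa`, `t_ab`, and `t_bb` are assumed to have been estimated from normal samples. All names and example values are hypothetical.

```python
import math


def theta(a_intensity, b_intensity):
    """Allelic angle scaled to [0, 1]: 0 is pure A signal, 1 is pure B signal."""
    return (2.0 / math.pi) * math.atan2(b_intensity, a_intensity)


def baf_from_theta(t, t_aa, t_ab, t_bb):
    """Piecewise linear interpolation between reference cluster centers."""
    if t <= t_aa:
        return 0.0
    if t >= t_bb:
        return 1.0
    if t < t_ab:
        # Between the AA and AB clusters: map onto BAF 0 to 0.5.
        return 0.5 * (t - t_aa) / (t_ab - t_aa)
    # Between the AB and BB clusters: map onto BAF 0.5 to 1.
    return 0.5 + 0.5 * (t - t_ab) / (t_bb - t_ab)


# Hypothetical cluster centers for one SNP, taken from normal samples.
centers = (0.05, 0.50, 0.95)
print(baf_from_theta(theta(1.0, 1.0), *centers))  # close to 0.5
print(baf_from_theta(0.275, *centers))            # close to 0.25
```

Interpolating against per-SNP reference clusters, rather than using the raw angle, compensates for probes whose heterozygous cluster does not sit exactly midway between the homozygous clusters.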
D. Expected BAF and LRR for a Normal Genome
- In a diploid genome, there are only three possible allele combinations for a given locus: homozygosity for the A allele (AA), heterozygosity (AB) or homozygosity for the B allele (BB).
- Three seemingly horizontal bands representing AA, AB, and BB genotypes are apparent, closely clustered around the theoretical BAF values of 0, 0.5, and 1, respectively (Fig. 2B).
- Such chimeric patterns may be observed in clinical samples, for example, when analyzing recurring leukemias after the patient has undergone bone marrow transplantation (Paulsson et al., 2011).
- In section II.C the authors described how SNP arrays estimate copy numbers for each SNP locus.
- By definition, a normal diploid genome has two copies of each autosome.
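The banding pattern follows directly from counting B alleles: with n copies of a locus, the B allele can be present 0 to n times, giving n + 1 theoretical BAF values. A minimal sketch, with the hypothetical helper name `expected_baf_bands`:

```python
def expected_baf_bands(copies):
    """Theoretical BAF values for a locus present in `copies` copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return [b_count / copies for b_count in range(copies + 1)]


print(expected_baf_bands(1))  # [0.0, 1.0]       haploid: two bands
print(expected_baf_bands(2))  # [0.0, 0.5, 1.0]  diploid: three bands
print(len(expected_baf_bands(3)))  # 4           triploid: four bands
```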
III. WHOLE GENOME GENOTYPING OF TUMOR SAMPLES
- Since the introduction of SNP arrays, a large number of studies have proved these platforms to be important means of analysis of acquired genomic changes.
- Since SNP arrays can detect chromosomal imbalances at both the copy number level, measured as deviation of LRR, and at the genotype level, measured as deviations of BAF, the combined use of these two measurements can be used for interpretation of underlying genomic imbalances.
- The authors will here discuss the basic concept of how copy number and allelic ratios are affected by common genetic alterations such as deletions, copy number gains, and copy number neutral events.
A. Changes in BAF and LRR upon Acquired Genomic Alterations
- As described above, there are three possible genotypes for a given SNP locus in the normal diploid genome, either heterozygous (AB) or homozygous (AA or BB).
- Thus, BAF values for all germline heterozygous SNPs are shifted from BAF=0.5 to either BAF=0 or BAF=1, depending on which chromosomal homologue has been lost.
- For more complex alterations involving higher allele copy numbers, multiple paired genotype combinations are possible within the gained region, again depending on which homologues are present and in what proportions.
- It must be stressed that the definition of copy number neutral alterations is intimately linked to the ploidy state of the tumor.
- Due to its narrow definition (homozygosity caused by two copies from the same parent) and close association with constitutional genetics, the authors will refrain from using the term UPD when discussing copy number neutral allelic imbalance events.
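The effect of an alteration on a germline heterozygous SNP can be worked through numerically. The sketch below assumes a pure tumor population, a diploid baseline, and an idealized LRR equal to log2(copies / 2). Measured LRR values are typically compressed relative to this ideal. The function name `expected_baf_lrr` and the genotype-string encoding are hypothetical conveniences.

```python
import math


def expected_baf_lrr(genotype, baseline_copies=2):
    """Theoretical (BAF, LRR) for an allele combination such as 'ABB'."""
    if not genotype or set(genotype) - {"A", "B"}:
        raise ValueError("genotype must be a non-empty string of A and B")
    total = len(genotype)
    baf = genotype.count("B") / total
    lrr = math.log2(total / baseline_copies)
    return baf, lrr


# Normal, hemizygous deletion, copy number neutral LOH, one copy gain,
# and a balanced two copy gain.
for genotype in ["AB", "B", "BB", "ABB", "AABB"]:
    baf, lrr = expected_baf_lrr(genotype)
    print(f"{genotype:5s} BAF={baf:.2f} LRR={lrr:+.2f}")
```

Note that `B` and `BB` share the same BAF and differ only in LRR, which is why both measurements are needed to separate a deletion from a copy number neutral event.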
B. The Mirrored B Allele Frequency (mBAF)
- In the examples above the authors demonstrated how different types of acquired chromosomal alterations influence the BAFs of constitutionally heterozygous SNP loci.
- A consecutive series of SNP alleles (a haplotype series) on a chromosome homologue is in practice random with respect to its sequence of As and Bs.
- If the authors consider a region affected by a specific genetic alteration they also note that BAF values for the SNPs within this region are symmetrically positioned around the 0.5 axis.
- In Fig. 3B the authors demonstrate this inherent symmetry for the regions of copy number and/or allelic imbalance presented in Fig. 3A.
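Because BAF values are symmetric around 0.5, mirroring amounts to reflecting every value below 0.5 onto the upper half of the axis. The sketch below also drops values close to 0 or 1 as presumed non-informative homozygous SNPs. The cutoff of 0.97 is an assumed, hypothetical threshold for use without a matched normal, and it would also discard informative SNPs in a region of complete LOH in a pure tumor.

```python
def mirrored_baf(baf_values, homozygous_cutoff=0.97):
    """Reflect BAF around 0.5 and drop presumed homozygous SNPs."""
    mirrored = []
    for baf in baf_values:
        mbaf = baf if baf >= 0.5 else 1.0 - baf
        if mbaf < homozygous_cutoff:
            mirrored.append(mbaf)
    return mirrored


# Two SNPs from a one copy gain (ABB and AAB), one balanced SNP,
# and two homozygous SNPs that are filtered out.
print(mirrored_baf([0.33, 0.67, 0.50, 0.01, 0.99]))
```

After mirroring, the two bands of an imbalanced region collapse into a single band, which is easier to segment and to summarize with one representative value.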
C. Delineating Regions of Genomic Imbalance
- A number of computational methods have been described for the automated identification of altered regions in tumor genomes analyzed by SNP arrays.
- Even in case of a matched normal genotype, individual SNPs are generally not sufficient for determining the genotype at a given locus due to possible technical noise.
- The high resolution of SNP arrays permits inference of allelic imbalance from a continuous stretch of LOH without the need of a matched normal genotype.
- Fig. 4 displays typical BAF and mBAF patterns obtained from a SNP array analysis of a tumor and illustrates how data can be segmented in order to reduce data dimensionality.
- It then becomes intuitive that most acquired alterations will introduce a shift in BAF and/or LRR, and that changing from one underlying state to another will involve breakpoints in the data delineating genomic alterations (Fig. 4).
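Once breakpoints are known, reducing data dimensionality means replacing the SNP-level values in each segment with one representative value. The sketch below does not detect breakpoints. It assumes they have already been supplied by some segmentation method, and that the mBAF and LRR lists cover the same ordered set of informative SNPs. The function name `summarize_segments` and the example data are hypothetical.

```python
from statistics import median


def summarize_segments(mbaf, lrr, breakpoints):
    """Median mBAF and LRR for each segment between breakpoints.

    A breakpoint is the index of the first SNP of a new segment.
    """
    if len(mbaf) != len(lrr):
        raise ValueError("mbaf and lrr must have the same length")
    if any(b < 0 or b > len(mbaf) for b in breakpoints):
        raise ValueError("breakpoint outside the profile")
    edges = sorted({0, len(mbaf), *breakpoints})
    segments = []
    for start, end in zip(edges[:-1], edges[1:]):
        segments.append({
            "start": start,
            "end": end,
            "mbaf": median(mbaf[start:end]),
            "lrr": median(lrr[start:end]),
        })
    return segments


# Hypothetical profile: three balanced SNPs followed by three SNPs
# in a region of allelic imbalance with reduced copy number.
mbaf = [0.50, 0.52, 0.48, 0.80, 0.82, 0.84]
lrr = [0.00, 0.01, -0.01, -0.40, -0.42, -0.38]
for segment in summarize_segments(mbaf, lrr, breakpoints=[3]):
    print(segment)
```

The median is used here because it is less sensitive than the mean to the occasional outlying SNP within a segment.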
D. BAF vs LRR Plots
- The authors have shown that SNP array data provide both genotype and copy number estimates for each SNP that is queried, and that these can be visually represented using mBAF and LRR profile plots.
- To interpret a specific genetic alteration, both mBAF and LRR must be taken into account, and their relationship can be examined by plotting LRR versus mBAF (Fig. 5).
- When plotting segmented LRR versus mBAF (or BAF) from a tumor with a diploid chromosomal number, a characteristic pattern will emerge where genomic regions with identical allele combinations will appear close to each other within the mBAF/LRR space (Fig. 5).
- Segments of one copy gain (BBA) will appear together as a cluster of values with elevated LRR and mBAF, approaching their theoretical values of mBAF=0.67 and LRR=0.58.
- All unaltered segments (AB) will form a dense cluster at mBAF=0.5 and LRR=0.
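The clustering of segments around theoretical positions suggests a simple way to label a segment: compare its mBAF and LRR with the expected center of each candidate allele combination and pick the closest. This is only a sketch. It assumes a pure, diploid tumor, idealized LRR values, a hypothetical list of candidate states, and an unweighted Euclidean distance even though mBAF and LRR are on different scales.

```python
import math

CANDIDATE_STATES = ["AB", "B", "BB", "ABB", "AABB", "ABBB"]


def state_center(genotype, baseline_copies=2):
    """Theoretical (mBAF, LRR) for an allele combination."""
    total = len(genotype)
    b_count = genotype.count("B")
    mbaf = max(b_count, total - b_count) / total
    lrr = math.log2(total / baseline_copies)
    return mbaf, lrr


def nearest_state(mbaf, lrr, states=CANDIDATE_STATES):
    """Return the candidate state whose center is closest to the segment."""
    def distance(genotype):
        center_mbaf, center_lrr = state_center(genotype)
        return math.hypot(mbaf - center_mbaf, lrr - center_lrr)
    return min(states, key=distance)


print(nearest_state(0.50, 0.02))   # AB
print(nearest_state(0.66, 0.55))   # ABB
print(nearest_state(0.98, -0.95))  # B
```

In practice, measured values rarely reach the theoretical centers, so a label assigned this way is a starting hypothesis rather than a call.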
IV. WGG ANALYSES OF COMPLEX AND HETEROGENEOUS CELL POPULATIONS
- The authors have so far discussed relatively simple examples of alterations affecting one homogenous population of tumor cells.
- In practice however, WGG analyses are often performed on heterogeneous tumor samples that contain more than one distinct population of cells.
- Thus, the proportion of nonaberrant cells will vary from sample to sample.
- Regardless of the cause and nature of included nonaberrant cells, the presence of normal diploid cells within a tumor sample can cause problems in downstream analyses and subsequent interpretations of the data.
- Moreover, cancers may to varying degrees be composed of multiple clones harboring divergent aberrations that are acquired subsequent to the tumor-initiating event.
B. BAF and LRR in an Admixture of Tumor and Normal Cells
- The above theoretical examples have focused on situations when there is only one clone present within the sample, i.e., all analyzed cells have identical genotypes.
- Several studies have successfully demonstrated this using tumor biopsies by comparing BAF derived estimates with cellularity scores from histological examination (Nancarrow et al., 2007; Assie et al., 2008; Sun et al., 2009).
- The principles of estimating the fraction of normal cells can be illustrated using a simple example (Fig. 8D).
- The combination of normal contamination and increased clonal heterogeneity can rapidly increase the complexity of the data and thereby reduce the possibility to resolve underlying genotype status.
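The principle of estimating the normal cell fraction can be made concrete with a copy-weighted average. The sketch below is derived from first principles and may differ in form from the equations in the original paper, which are not reproduced in this summary. It assumes a single tumor clone mixed with normal diploid cells that are heterozygous (AB) at the SNP, and all function names are hypothetical.

```python
def mixture_mbaf(tumor_fraction, tumor_b_copies, tumor_total_copies):
    """mBAF at a germline heterozygous SNP in a tumor/normal mixture."""
    normal_fraction = 1.0 - tumor_fraction
    b_copies = tumor_fraction * tumor_b_copies + normal_fraction * 1
    all_copies = tumor_fraction * tumor_total_copies + normal_fraction * 2
    baf = b_copies / all_copies
    return max(baf, 1.0 - baf)


def tumor_fraction_from_deletion(mbaf):
    """Invert mBAF = 1 / (2 - p) for a clonal hemizygous deletion."""
    return 2.0 - 1.0 / mbaf


def tumor_fraction_from_neutral_loh(mbaf):
    """Invert mBAF = (1 + p) / 2 for clonal copy number neutral LOH."""
    return 2.0 * mbaf - 1.0


# 80% tumor cells with a hemizygous deletion, 20% normal cells.
observed = mixture_mbaf(0.8, tumor_b_copies=1, tumor_total_copies=1)
print(round(observed, 3))                                # 0.833
print(round(tumor_fraction_from_deletion(observed), 3))  # 0.8
```

The inversion is only valid when the type of alteration is known and present in all tumor cells, which is why regions of clear deletion or copy number neutral LOH are the natural choice for this estimate.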
C. Tumor Subclonality
- The presence of genetic variation between different subclones within a tumor mass is a well-known phenomenon.
- Subclonal genetic alterations may readily be identified at the individual cell level by conventional cytogenetics or fluorescence in situ hybridization.
- Current molecular analyses of bulk samples will however only give an average estimate of all imbalances.
- If the authors further expand their example of a sample of 80% tumor cells and 20% normal diploid cells (Figs. 5 and 8D) and hypothesize that 50% of the tumor cells carry some additional alterations, they can simply calculate expected mBAF for these using Eq. (2).
- Subsequent validation is necessary to definitively resolve the underlying states.
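The subclonal example can be worked through by extending the copy-weighted average to any number of cell populations. This sketch is derived from first principles and is not a reproduction of Eq. (2). It assumes idealized LRR values and a diploid baseline, and the function name `mixture_mbaf_lrr` and the genotype-string encoding are hypothetical.

```python
import math


def mixture_mbaf_lrr(populations, baseline_copies=2):
    """Expected (mBAF, LRR) for a list of (cell fraction, genotype) pairs."""
    if abs(sum(fraction for fraction, _ in populations) - 1.0) > 1e-9:
        raise ValueError("cell fractions must sum to 1")
    b_copies = sum(f * genotype.count("B") for f, genotype in populations)
    all_copies = sum(f * len(genotype) for f, genotype in populations)
    baf = b_copies / all_copies
    return max(baf, 1.0 - baf), math.log2(all_copies / baseline_copies)


# 20% normal cells, and 80% tumor cells of which half carry an
# additional hemizygous deletion at this locus.
populations = [(0.2, "AB"), (0.4, "AB"), (0.4, "B")]
mbaf, lrr = mixture_mbaf_lrr(populations)
print(round(mbaf, 3), round(lrr, 3))  # 0.625 -0.322
```

Under these assumptions the same mBAF of 0.625 would also be expected from a clonal deletion in a sample with only 40% tumor cells, since 1 / (2 - 0.4) = 0.625, which illustrates why the array data alone cannot always resolve the underlying state.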
D. Tracing Clonal Relationships Using SNP Arrays
- Depiction of copy number gain and loss frequencies across large tumor cohorts highlights recurrent alterations and can be used to classify tumors into groups with related karyotypes (Russnes et al., 2010).
- To be able to discern and model the underlying chronology of events, repeated samples from the same individual have to be studied.
- The authors will here present some hypothetical examples of how SNP array data can be used to analyze multiple tumors from the same patient in order to investigate clonal expansion, chronology of events, and divergence in clonal evolution.
- The latter scenario will suggest that the two alterations were in fact confined to separate subclones in the primary.
- The authors exemplify this for a deletion in which BAF is used to infer the complete haplotype sequences of the parental alleles (Fig.
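Comparing samples from the same patient can be reduced, in its simplest form, to set operations on the alterations detected in each sample. The sketch below is a hypothetical illustration only: events are represented as made-up (region, type) labels, and matching events by label ignores the breakpoint and haplotype comparisons that a real analysis would need.

```python
def compare_samples(primary_events, later_events):
    """Split alterations into shared and sample-specific sets."""
    return {
        "shared": primary_events & later_events,
        "primary_only": primary_events - later_events,
        "later_only": later_events - primary_events,
    }


# Hypothetical alterations called in a primary tumor and two later samples.
primary = {("9p", "deletion"), ("17q", "gain")}
later_1 = {("9p", "deletion"), ("3p", "deletion")}
later_2 = {("17q", "gain")}

print(compare_samples(primary, later_1))
print(compare_samples(primary, later_2))
```

In this made-up example each later sample carries only one of the two primary alterations, a pattern that would be compatible with the two alterations residing in separate subclones of the primary.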
V. CONCLUDING REMARKS
- Throughout recent years, molecular techniques to study cancer have progressed in terms of resolution and sensitivity, but also with respect to accessibility due to decreased cost.
- Undoubtedly, technologies will continue to evolve and much of what is considered at the forefront today will be superseded tomorrow.
- The authors have aimed to present some basic concepts pertaining to the analysis of tumor-heterogeneity using genotyping techniques.
- In the AAAB/ABBB two copy gain segment, the AAAB genotypes (BAF=0.2) will be transformed to the mirrored genotype (BBBA) with mBAF=0.8.
- Non-informative homozygous SNPs are excluded from this plot.
References
51,099 citations
"Genotyping techniques to address di..." refers background in this paper
...The interplay between cells within the tumor microenvironment has been highlighted as important hallmarks of cancer and its composition has been shown to represent an intrinsic property of tumors (Hanahan and Weinberg, 2011)....
[...]
10,287 citations
"Genotyping techniques to address di..." refers background in this paper
...With the advent of array-technology (Schena et al., 1995), the analysis of cancer genomes advanced rapidly with greatly increased resolution and sensitivity....
[...]
3,413 citations
"Genotyping techniques to address di..." refers background in this paper
...Conventional CGH, first described by Kallioniemi and coworkers (Kallioniemi et al., 1992), uses differentially fluorescently labeled DNA from tumor sample and reference DNA to reveal regions of loss and gain by competitive hybridization to immobilized normal metaphase chromosomes....
[...]
2,937 citations
"Genotyping techniques to address di..." refers background in this paper
...However, it is worth to mention that constitutional CNVs are quite common (Iafrate et al., 2004; Sebat et al., 2004)....
[...]
Frequently Asked Questions (14)
Q2. Why do the authors refrain from using the term UPD when discussing copy number neutral imbalance events?
Due to its narrow definition – homozygosity caused by two copies from the same parent – and close association with constitutional genetics, the authors will refrain from using the term UPD when discussing copy number neutral allelic imbalance events.
Q3. What are the main uses of LOH analysis?
LOH analyses have, on the other hand, been widely used in cancer research to detect regions of allelic imbalances indicating regions of genomic deletion or copy number neutral LOH, and have been used to identify tumor suppressor genes inactivated by mutation followed by loss of the wild-type allele.
Q4. What makes SNP arrays ideal for the identification of copy number neutral imbalances?
The combination of genotype and copy number measurements makes SNP arrays ideal for the identification of copy number neutral imbalances.
Q5. How many bands are possible for a normal genome?
The BAF profile of a homozygous genome, e.g., a haploid genome, will consequently present only 2 bands, restricted to theoretical BAF values 0 and 1, whereas a triploid genome will show four bands.
Q6. What technology is used to immobilize probes?
Illumina utilizes their BeadChip technology that permits probes to be immobilized on silica beads rather than directly onto the array surface.
Q7. What can be done to reduce the complexity of data?
Although values from individual SNPs can be plotted, various segmentation approaches can effectively reduce the complexity of data, i.e., defining regions of genomic balance or imbalance and treating these as individual events assigned representative mBAF and LRR values.
Q8. How can the authors calculate BAF values for heterogeneous samples?
Equation (1) can with some minor modifications be used to calculate BAF values for any given locus in case of heterogeneous samples.
Q9. What are the main advantages of SNP arrays?
SNP array platforms have also successfully been applied to address problems regarding intermixture of nonaberrant cell populations.
Q10. What is the significance of the interplay between cells within the tumor microenvironment?
The interplay between cells within the tumor microenvironment has been highlighted as important hallmarks of cancer and its composition has been shown to represent an intrinsic property of tumors (Hanahan and Weinberg, 2011).
Q11. What is the purpose of the transformation of allele intensities to relative copy number estimates?
Transformation of intensities to relative copy number estimates is essentially also performed by relating values to a collection of normal reference samples (HapMap) or to a matched control.
Q12. How can one circumvent the limited availability of multiple samples from individual patients?
The limited availability of multiple samples from individual patients can be circumvented by macro or micro dissection (Navin et al., 2010) or cell sorting procedures followed by expansion in animal models (Navin et al., 2011), effectively performing multiple samplings of the same tumor.
Q13. What is the expected BAF and LRR for a normal genome?
Examples of expected BAF and LRR values for a normal genome and how these values are affected by acquired genetic aberrations are further discussed below. In a diploid genome, there are only three possible allele combinations for a given locus: homozygosity for the A allele (AA), heterozygosity (AB) or homozygosity for the B allele (BB).
Q14. How many bands are seen when analyzing a normal diploid genome?
The authors previously described that, when considering a larger series of SNPs, a BAF plot will appear as banded and that three bands are seen when analyzing a normal diploid genome.