ORIGINAL ARTICLE
An Exome Sequencing Study to Assess the Role of Rare Genetic
Variation in Pulmonary Fibrosis
Slavé Petrovski^1,2*, Jamie L. Todd^3,4*, Michael T. Durheim^3,4, Quanli Wang^1, Jason W. Chien^5, Fran L. Kelly^3, Courtney Frankel^3, Caroline M. Mebane^1, Zhong Ren^1, Joshua Bridgers^1, Thomas J. Urban^6, Colin D. Malone^1, Ashley Finlen Copeland^3, Christie Brinkley^3, Andrew S. Allen^7, Thomas O’Riordan^5, John G. McHutchison^5, Scott M. Palmer^3,4‡, and David B. Goldstein^1‡
^1 Institute for Genomic Medicine, Columbia University Medical Center, New York, New York;
^2 Department of Medicine, Austin Health and Royal Melbourne Hospital, The University of Melbourne, Melbourne, Victoria, Australia;
^3 Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, Duke University Medical Center, Durham, North Carolina;
^4 Duke Clinical Research Institute, Durham, North Carolina;
^5 Gilead Sciences, Foster City, California;
^6 Division of Pharmacotherapy and Experimental Therapeutics, Center for Pharmacogenomics and Individualized Therapy, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina; and
^7 Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina
ORCID IDs: 0000-0002-1527-961X (S.P.); 0000-0002-4186-5698 (M.T.D.); 0000-0003-4695-2923 (J.W.C.); 0000-0002-7232-2143 (A.S.A.).
Abstract
Rationale: Idiopathic pulmonary fibrosis (IPF) is an increasingly
recognized, often fatal lung disease of unknown etiology.
Objectives: The aim of this study was to use whole-exome
sequencing to improve understanding of the genetic architecture of
pulmonary fibrosis.
Methods: We performed a case–control exome-wide collapsing
analysis including 262 unrelated individuals with pulmonary fibrosis
clinically classified as IPF according to American Thoracic
Society/European Respiratory Society/Japanese Respiratory
Society/Latin American Thoracic Association guidelines (81.3%),
usual interstitial pneumonia secondary to autoimmune conditions
(11.5%), or fibrosing nonspecific interstitial pneumonia (7.2%). The
majority (87%) of case subjects reported no family history of
pulmonary fibrosis.
Measurements and Main Results: We searched 18,668
protein-coding genes for an excess of rare deleterious genetic
variation using whole-exome sequence data from 262 case subjects
with pulmonary fibrosis and 4,141 control subjects drawn from
among a set of individuals of European ancestry. Comparing
genetic variation across 18,668 protein-coding genes, we found a
study-wide significant (P < 4.5 × 10^-7) case enrichment of
qualifying variants in TERT, RTEL1, and PARN. A model qualifying
ultrarare, deleterious, nonsynonymous variants implicated TERT
and RTEL1, and a model specifically qualifying loss-of-function
variants implicated RTEL1 and PARN. A subanalysis of 186 case
subjects with sporadic IPF confirmed TERT, RTEL1, and PARN as
study-wide significant contributors to sporadic IPF. Collectively,
11.3% of case subjects with sporadic IPF carried a qualifying variant
in one of these three genes compared with the 0.3% carrier rate
observed among control subjects (odds ratio, 47.7; 95% confidence
interval, 21.5–111.6; P = 5.5 × 10^-22).
Conclusions: We identified TERT, RTEL1, and PARN, three
telomere-related genes previously implicated in familial pulmonary
fibrosis, as significant contributors to sporadic IPF. These results
support the idea that telomere dysfunction is involved in IPF
pathogenesis.
Keywords: genetics; exome sequencing; pulmonary fibrosis;
interstitial lung disease; collapsing analysis
(Received in original form October 18, 2016; accepted in final form January 18, 2017)
*These authors contributed equally to this work.
‡These authors contributed equally to this work.
Supported by internal funding from the Duke University School of Medicine and Department of Medicine, National Institutes of Health (NIH)/NHLBI K24
grant 1K24 HL091140-01A1 (S.M.P.), R. D. Wright Career Development fellowship 1126877 (S.P.), and a grant from Gilead Sciences (D.B.G.). J.W.C., T.O’R.,
and J.G.M. are employees of Gilead Sciences, Inc. The Duke and NIH funds supported sample collection and clinical phenotyping, and Gilead funding
supported exome analysis.
This article has an online supplement, which is accessible from this issue’s table of contents at www.atsjournals.org
Am J Respir Crit Care Med Vol 196, Iss 1, pp 82–93, Jul 1, 2017
Copyright © 2017 by the American Thoracic Society
Originally Published in Press as DOI: 10.1164/rccm.201610-2088OC on January 18, 2017
Internet address: www.atsjournals.org
82 American Journal of Respiratory and Critical Care Medicine Volume 196 Number 1 | July 1 2017
Pulmonary fibrosis describes a group of
diseases generally characterized by
progressive inflammation and/or fibrosis of
the lung parenchyma. Pulmonary fibrosis
may be further categorized as having known or
idiopathic causes on the basis of clinical
history, radiographic appearance, and
laboratory and/or histological evaluation.
Idiopathic pulmonary fibrosis (IPF) is an
increasingly diagnosed and often fatal lung
disease associated with an estimated survival
of 3–5 years (1, 2). Novel therapeutic agents
for IPF have recently been approved and
have been shown to slow the course of
disease progression. Lung transplant,
however, remains the only treatment
definitively shown to improve survival in
select patients with end-stage pulmonary
fibrosis, including IPF. In this context,
enhancing understanding of the basic
pathogenesis of pulmonary fibrosis is
critical.
Approximately 10% of pulmonary
fibrosis cases are considered familial in
origin (3). Genome-wide association studies
have revealed associations of common
variants in MUC5B, SPPL2C, and TOLLIP
with both familial pulmonary fibrosis (FPF)
and sporadic IPF (4, 5). Of these, the most
promising has been the MUC5B promoter
allele (rs35705950G>T) (4); however, these
associations generally explain a small
proportion of the heritability and have not
always resolved to the underlying causal
gene variants. By giving researchers the
ability to focus specifically on the protein-
coding regions of the genome, whole-
exome sequencing allows identification of
alleles with direct functional consequences
on protein products. Indeed, most known
human disease–causing variants reside
within the exome (6, 7).
Early studies of surfactant proteins A2
(8) and C (9) associated these two proteins
with FPF. Yet, the best characterized
examples in pulmonary fibrosis have been
genes related to telomerase; specifically,
variants in TERT and TERC have been
found to associate with telomere shortening
and increased susceptibility to FPF (10, 11).
More recently, exome sequencing of 78
European individuals with FPF identified
an excess of deleterious variants in two
novel genes important to telomere
maintenance: PARN and RTEL1 (12).
Exome analysis provides a high-
throughput, cost-effective strategy to
discover disease-causing mutations, but to
date it has not been extended to the study of
sporadic pulmonary fibrosis. The objectives
of this study therefore were to (1) gain
insight into the genetic architecture of
pulmonary fibrosis by applying whole-
exome sequencing to identify genes
carrying an excess of rare deleterious
variants in precisely phenotyped
individuals with pulmonary fibrosis, (2)
focus specifically on the genetics of a subset
of patients with sporadic IPF, and (3)
examine the relationship between rare
variants associated with pulmonary fibrosis
risk as identified through whole-exome
sequencing and the previously reported
common risk allele in the MUC5B
promoter region.
Methods
Study Cohort
The case population comprised 262
unrelated individuals of European ancestry
who underwent lung transplant at Duke
University Medical Center for pulmonary
fibrosis. All patients were clinically well
characterized prior to transplant through
physician interview for a complete medical
history, including for family history of
pulmonary fibrosis, chest computed
tomography, serological evaluation, and
pulmonary function testing. At the time of
transplant, explanted native lung tissue was
examined by an experienced lung
pathologist for histological characterization
of pulmonary fibrosis. For this study, a
specific pulmonary fibrosis phenotype was
adjudicated to each case on the basis of
review of the comprehensive medical,
radiographic, and histological data by a
pulmonologist with expertise in fibrosing
lung disease. As illustrated in Table 1, the
final case cohort included patients
confirmed to have IPF according to the
American Thoracic Society/European
Respiratory Society/Japanese Respiratory
Society/Latin American Thoracic
Association (ATS/ERS/JRS/ALAT)
Author Contributions: D.B.G. and S.M.P. conceived of and designed the study; J.L.T., M.T.D., F.L.K., C.F., A.F.C., C.B., and S.M.P. acquired and
processed the clinical samples; M.T.D. performed the clinical phenotyping with support from J.L.T. and S.M.P.; S.P., Q.W., Z.R., and J.B. performed the
bioinformatic processing; C.M.M. and C.D.M. performed the TaqMan genotyping; S.P. analyzed the data with support from Q.W., A.S.A., and D.B.G.; S.P.,
J.L.T., A.S.A., S.M.P., and D.B.G. interpreted the data; S.P., J.L.T., M.T.D., S.M.P., and D.B.G. drafted the manuscript; and all authors critically revised the
manuscript for important intellectual content.
Correspondence and requests for reprints should be addressed to Slavé Petrovski, Ph.D., Institute for Genomic Medicine, Columbia University Medical Center, 701
West 168th Street, 14th Floor, New York, NY 10032. E-mail: slavep@unimelb.edu.au
At a Glance Commentary
Scientific Knowledge on the Subject: Idiopathic pulmonary
fibrosis (IPF) is an increasingly
diagnosed and often fatal lung disease
for which no curative treatments exist.
Prior genetic studies of pulmonary
fibrosis have predominantly been
focused on familial forms of the disease
and have associated several genes and
common risk alleles with disease
development. The degree to which these
familial genetic associations extend to
sporadic IPF remains uncertain.
What This Study Adds to the Field: We performed whole-exome
sequencing to identify rare variants of
protein-coding genes in a cohort of
patients with predominantly sporadic
IPF. Our results demonstrate a case
enrichment for ultrarare deleterious
qualifying variants in three study-wide
significant genes: TERT, RTEL1, and
PARN. Collectively, these variants,
which included dominant loss-of-
function alleles in RTEL1 and PARN
and ultrarare missense variants
predicted to be damaging in TERT and
RTEL1, contributed to more than 10%
of the sporadic IPF cases. This provides
the first evidence that telomere-related
genes previously implicated in familial
pulmonary fibrosis also make a major
contribution to the genetic architecture
of sporadic IPF and supports a body of
literature implicating telomere
dysfunction as a contributor to disease
development in this population.
Petrovski, Todd, Durheim, et al.: Pulmonary Fibrosis: An Exome Sequencing Study 83
guidelines (1) (213 [81.3%] of 262), patients
with usual interstitial pneumonia secondary
to autoimmune conditions (30 [11.5%]
of 262), and patients with fibrosing
nonspecific interstitial pneumonia (19
[7.2%] of 262). The majority of the cohort
(229 [87%] of 262) reported no family
history of pulmonary fibrosis. All case
subjects consented to DNA collection
and participation in institutional review
board–approved genetic studies according
to Duke Institutional Review Board
Protocols 00009091 and 00056268. The
control population comprised 4,141
unrelated individuals of European ancestry
selected for control purposes through
unrelated studies not focused on pulmonary
disorders, severe pediatric disorders, or other
clinical phenotypes where pulmonary
fibrosis is a recognized comorbidity (see
Table E1 in the online supplement).
Exome Sequencing and Bioinformatic
Processing
Exome sequencing of blood-extracted DNA
was performed at the Institute for Genomic
Medicine at Columbia University using the
SureSelect Human All Exon (65 MB; Agilent
Technologies, Santa Clara, CA) or the
NimbleGen SeqCap EZ version 2.0 or 3.0
exome enrichment kit (Roche NimbleGen,
Madison, WI) on HiSeq 2000 or 2500
sequencers (Illumina, San Diego, CA)
according to standard protocols. Whole-exome
sequence data from the 262 case
subjects with pulmonary fibrosis and 4,141 control
subjects were processed using the same
bioinformatic pipeline (see Methods section
in online supplement).
On average, at least 10-fold sequencing
read coverage was achieved for 96.9% and
95.7% of the 33.27 megabase pairs (Mbp) of
the Consensus Coding Sequence (CCDS;
release 14) for case and control subjects,
respectively. To alleviate confounding
attributable to differential coverage, for all of
the 33.27-Mbp positions in the CCDS
sequence, we determined both the
percentage of case subjects and the
percentage of control subjects who had at
least 10-fold coverage at the site (see
Methods section in online supplement). An
individual CCDS site was excluded from
analysis if the absolute difference in
percentages of case subjects compared with
control subjects who achieved at least
10-fold coverage at the site was greater than
6.0% (Figure E2). This site-based pruning
resulted in 7.8% of the CCDS sites being
excluded. All collapsing tests were then
performed on the pruned 30.67 Mbp of
CCDS sites (i.e., 92.2% of the CCDS) where
case and control subjects had a similar
opportunity to call variants. For the
remaining 30.67 Mbp, on average, case and
control subjects had at least 10-fold
coverage for 98.1% and 97.9% of CCDS
sites, respectively. To further confirm no
preferential inflation of background
variation, we assessed the exome-wide
tally of rare autosomal synonymous
(i.e., presumed neutral) variants per
individual and did not find a significant
difference between the case and control
groups (P = 0.68) (Figure E3, Table E3).
Autosomal read depth (i.e., sequencing
coverage) was also consistent between case
and control subjects, with a case average of
96.35 ± 25.78 reads and a control average
of 97.88 ± 24.22 reads (P = 0.35 by two-
sample t test) (Table E3).
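The site-based pruning rule described above can be sketched in a few lines. This is a minimal illustration only, not the authors' pipeline: the function name, the list-based inputs, and the toy coverage percentages are all hypothetical, and only the 6.0% cutoff comes from the text.

```python
def prune_sites(case_cov_pct, control_cov_pct, max_abs_diff=6.0):
    """Return the indices of sites kept after coverage harmonization.

    case_cov_pct[i] and control_cov_pct[i] are the percentages of case and
    control subjects with at least 10-fold coverage at site i. A site is
    excluded when the absolute difference exceeds max_abs_diff.
    """
    if len(case_cov_pct) != len(control_cov_pct):
        raise ValueError("coverage lists must have the same length")
    return [
        i
        for i, (case_pct, control_pct) in enumerate(zip(case_cov_pct, control_cov_pct))
        if abs(case_pct - control_pct) <= max_abs_diff
    ]


# Toy data (hypothetical): four sites with differing coverage balance.
toy_case = [99.0, 80.0, 95.5, 60.0]
toy_control = [98.5, 90.0, 89.5, 61.0]
kept_sites = prune_sites(toy_case, toy_control)
```

Note that a poorly covered site (the last toy site) is retained as long as cases and controls are covered similarly; the rule targets differential coverage, not low coverage as such.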
Statistical Analysis
To search for genes conferring pulmonary
fibrosis risk, we implemented a genetic
collapsing test (13, 14). After site-based
pruning, we focused our analyses on CCDS
protein-coding sites with minimal
variability in coverage between the case and
control populations. As initially introduced
in our earlier work (13), we use the term
qualifying variant to refer to the subset of
genetic variations within the sequence data
that meets specific population allele
frequency and predicted variant effect
criteria. We defined seven different
qualifying variant models (Table E4). Our
primary model was focused on searching
for “ultrarare” nonsynonymous variants to
capture the category of genetic variation
expected to be most enriched for variants of
high effect. To identify ultrarare variants,
we used internal (test cohort) and external
(Exome Variant Server and Exome
Aggregation Consortium release 0.3 [15])
sequence data to find variants with a minor
allele frequency (MAF) of less than 0.05%
among our combined case and control test
populations and absent (MAF of 0%)
among the two external reference control
cohorts. For the primary model, qualifying
variants were restricted to indels and single-
nucleotide variants annotated as having
either a loss-of-function (LoF) effect, an in-
frame indel, or a “probably damaging”
missense prediction by Polymorphism
Phenotyping version 2 (PolyPhen,
HumDiv; http://genetics.bwh.harvard.edu/
pph2/) (16). These analyses relied on the
predicted effects of the LoF and missense
annotated variants whose functions have
not been individually confirmed in the
laboratory. We subsequently performed
analyses of CCDS genes using six
alternative qualifying variant models as
defined in Table E4, including an autosomal
recessive model and a synonymous variant
negative control model.
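To make the primary qualifying-variant definition concrete, the following sketch encodes its stated criteria as a predicate. The `Variant` record, the effect labels, and the example gene name are hypothetical; the thresholds (internal MAF below 0.05%, absent from external reference cohorts) follow the description above. The actual annotation and filtering were performed with the authors' own tooling, which is not reproduced here.

```python
from dataclasses import dataclass

# Effect classes admitted by the primary model (labels are illustrative).
PRIMARY_MODEL_EFFECTS = {
    "loss_of_function",
    "inframe_indel",
    "missense_probably_damaging",  # PolyPhen-2 HumDiv "probably damaging"
}


@dataclass(frozen=True)
class Variant:
    gene: str
    effect: str
    internal_maf: float  # MAF in the combined case/control test cohort (fraction)
    external_maf: float  # highest MAF across the external reference cohorts


def is_qualifying_primary(variant, max_internal_maf=0.0005):
    """True if the variant meets the ultrarare, predicted-deleterious criteria."""
    return (
        variant.effect in PRIMARY_MODEL_EFFECTS
        and variant.internal_maf < max_internal_maf  # below 0.05%
        and variant.external_maf == 0.0              # absent externally
    )


# Hypothetical examples.
ultrarare_damaging = Variant("GENE_A", "missense_probably_damaging", 0.0002, 0.0)
seen_externally = Variant("GENE_A", "loss_of_function", 0.0002, 0.0001)
synonymous = Variant("GENE_A", "synonymous", 0.0002, 0.0)
```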
For each of the seven models, we
tested the list of 18,668 CCDS genes. For
each gene, an indicator variable (1/0 states)
was assigned to each individual on the basis
of presence of at least one qualifying
variant in the gene (state 1) or no
qualifying variants in that gene (state 0).
A two-tailed Fisher’s exact test (FET) was
then performed for each gene to compare
Table 1. Characteristics of the Pulmonary Fibrosis Cohort
Characteristic                                               Pulmonary Fibrosis Cohort
Number of individuals                                        262
Sex, n (%)
  Male                                                       204 (78%)
  Female                                                     58 (22%)
Transplant age, yr, mean ± SD                                63.2 ± 8.2
Clinical pulmonary fibrosis phenotype, n (%)
  IPF                                                        213 (81.3%)
  CTD UIP                                                    30 (11.5%)
  Fibrosing NSIP                                             19 (7.2%)
Self-reported family history of pulmonary fibrosis, n (%)
  No                                                         229 (87.4%)
  Yes                                                        33 (12.6%)
Definition of abbreviations: CTD UIP = usual interstitial pneumonia associated with connective tissue
disease; IPF = idiopathic pulmonary fibrosis; NSIP = nonspecific interstitial pneumonia.
the rate of case subjects carrying a
qualifying variant compared with the rate
of control subjects. After Bonferroni
correction for the number of genes
tested across the six nonsynonymous
models, the study-wide multiplicity-
adjusted significance threshold was
calculated as α = 0.05/(6 × 18,668) =
4.46 × 10^-7 (Table E4). We did not
correct for the synonymous (negative
control) model.
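The collapsing test and the significance threshold described above can be illustrated with a short, standard-library sketch. It is not the authors' implementation (they report using their in-house ATAV package and R); the function names are hypothetical, and the two-sided P value follows the common convention of summing all tables that are no more probable than the observed one. The example counts (7 case carriers, 0 control carriers) are inferred from the percentages reported for PARN under the loss-of-function model and should be read as an assumption.

```python
import math


def count_carriers(qualifying_variant_counts):
    """Collapse per-subject variant counts in one gene to a carrier tally (1/0 indicator)."""
    return sum(1 for n in qualifying_variant_counts if n > 0)


def _log_comb(n, k):
    # log of the binomial coefficient, computed in log space to avoid overflow
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(case_carriers, case_total, control_carriers, control_total):
    """Two-sided Fisher's exact test P value for a 2x2 carrier table."""
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    log_denominator = _log_comb(total, carriers)

    def table_probability(k):
        # Hypergeometric probability that k of the carriers are case subjects.
        return math.exp(
            _log_comb(case_total, k)
            + _log_comb(control_total, carriers - k)
            - log_denominator
        )

    k_min = max(0, carriers - control_total)
    k_max = min(carriers, case_total)
    observed = table_probability(case_carriers)
    tail = sum(
        p
        for p in (table_probability(k) for k in range(k_min, k_max + 1))
        if p <= observed * (1 + 1e-7)  # tolerance for floating-point ties
    )
    return min(1.0, tail)


# Bonferroni threshold across six nonsynonymous models and 18,668 genes.
alpha = 0.05 / (6 * 18668)

# Assumed counts: 7 of 262 case carriers (2.7%) versus 0 of 4,141 control carriers.
p_example = fisher_exact_two_sided(7, 262, 0, 4141)
is_study_wide_significant = p_example < alpha
```

Under these assumed counts the sketch gives a P value on the order of 10^-9, well below the threshold, which is in line with the loss-of-function result reported for PARN.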
Because of the discordance in sex-
sampling rates between the case (78% male)
and control (48% male) cohorts, for genes
on the X chromosome, we randomly
sampled 565 control female subjects from
among the original female control group
and ran a separate X chromosome
assessment using matched male/female
ratios. Thus, despite following the same
qualifying criteria as the autosomes,
the X chromosome tests are reported
separately.
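The down-sampling step can be sketched as follows. The exact numbers of male and female control subjects are not stated in the text, so the control counts used here (1,987 males, 2,154 females, consistent with roughly 48% of 4,141 being male) are illustrative assumptions, as are the function names and the fixed random seed. The case counts (204 male, 58 female) come from Table 1.

```python
import random


def females_needed(case_males, case_females, control_males):
    """Number of control females that reproduces the case male/female ratio."""
    return round(control_males * case_females / case_males)


def downsample_female_controls(female_control_ids, n_keep, seed=0):
    """Randomly select n_keep female control subjects without replacement."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return rng.sample(list(female_control_ids), n_keep)


# Assumed control composition (not given in the text).
assumed_control_males = 1987
assumed_control_female_ids = range(2154)

target_females = females_needed(204, 58, assumed_control_males)
sampled_females = downsample_female_controls(assumed_control_female_ids, target_females)
```

With these assumed counts the target works out to 565 female controls, matching the number reported above.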
To investigate the genetics of sporadic
IPF, we subsequently examined only the 186
(71.0%) case subjects confirmed to have IPF
on the basis of ATS/ERS/JRS/ALAT
guidelines (1) and without a family history
of pulmonary fibrosis. We repeated the
primary and LoF collapsing analyses to
compare just these 186 case subjects with
sporadic IPF with the 4,141 control
subjects.
Collapsing analyses were performed
using an in-house package, Analysis Tool
for Annotated Variants (https://redmine.
igm.cumc.columbia.edu/projects/atav).
Additional binomial analyses, logistic
regression analyses, and FETs were
completed using the ‘stats’ package in
R version 3.2.2 (R Foundation for Statistical
Computing, Vienna, Austria).
MUC5B Risk Allele Genotyping
Genotyping of rs35705950 within the
promoter region of the MUC5B gene
(chr11:g.1241221G>T, NCBI Build 37) was
performed with the Applied Biosystems
TaqMan SNP Genotyping Assay on a
7900HT Fast Real-Time PCR System
(Thermo Fisher Scientific, Foster City, CA).
The context sequence is CCTTCCTTTATC
TTCTGTTTTCAGC[G/T]CCTTCAACTG
TGAAGAGGTGAACTC. Amplification
was performed according to the TaqMan
Universal PCR protocol in a 5-μl
reaction volume using 2× TaqMan
Universal PCR Master Mix (Life
Technologies, Carlsbad, CA). Of the
262 case subjects with pulmonary fibrosis,
258 were successfully genotyped at this
locus. We also genotyped 342 European
control subjects to generate in-house
control frequency estimates for the
rs35705950 variant.
Results
Insights into the Genetic Architecture
of Pulmonary Fibrosis Using Exome
Sequencing
In our primary analysis of the pulmonary
fibrosis cohort (n = 262), we identified two
genes that achieved study-wide significance
(Figure 1, Tables E5 and E6). Five percent
of case subjects with pulmonary fibrosis
had a qualifying variant in TERT, a
well-known FPF gene, compared with 0.1%
of control subjects (odds ratio [OR], 35.9;
95% confidence interval [CI], 12.6–116.2;
P = 1.7 × 10^-12 by two-tailed FET). The
second gene to achieve study-wide
[Figure 1 image: two quantile–quantile panels (A, B) plotting observed −log10(p) against expected −log10(p); genomic inflation factors λ = 0.980 and λ = 1.037; labeled genes include TERT, RTEL1, PARN, MYSM1, TM4SF5, OR51I1, and TPI1.]
Figure 1. Quantile–quantile plot of pulmonary fibrosis collapsing analyses. Results are shown for the analysis of 262 pulmonary fibrosis case and 4,141
control subjects. (A) A total of 15,393 genes had at least one case or control carrier. Qualifying variants have a minor allele frequency less than 0.05% in the
test cohort and are absent among external reference cohorts. Variants are annotated as loss-of-function, in-frame indel, or missense predicted to be
“probably damaging” by Polymorphism Phenotyping version 2 (HumDiv). Two genes, TERT and RTEL1, achieved study-wide significance (adjusted α =
0.05/(6 × 18,668) = 4.46 × 10^-7). (B) A total of 10,710 genes had at least one loss-of-function case or control carrier. Qualifying variants are variants with
a population minor allele frequency less than or equal to 0.1% and are annotated as loss-of-function single-nucleotide variants or indels. PARN and RTEL1
achieved study-wide significance (P < 4.46 × 10^-7).
significance was the recently implicated
FPF gene RTEL1 (12, 17, 18), with 2.3% of
case subjects with pulmonary fibrosis
carrying a qualifying variant, as compared
with 0% of control subjects (OR, >96.8;
95% CI, 18.9 to >4,335; P = 4.2 × 10^-8 by
two-tailed FET). The third ranked gene,
just under the prespecified level of
statistical significance, was another recently
implicated FPF gene, PARN (12). We
found PARN qualifying variants in 2.7% of
case subjects compared with 0.1% of
control subjects (OR, 22.7; 95% CI,
6.1–91.0; P = 1.5 × 10^-6 by two-tailed FET)
(Figure 1). These results reflect the 95.7%
(TERT), 84.6% (RTEL1), and 98.9%
(PARN) of the protein-coding sequence of
these genes that had reliable sequence
coverage in both the case and control
samples (Table E5). No individual case
subject carried a qualifying variant in more
than one of these three genes, and a higher-
resolution cryptic relatedness screen
confirmed that no two case subjects
carrying one of four recurring TERT,
RTEL1, or PARN qualifying variants
(Table 2) shared more than 1% of their
exome-wide rare protein-coding variants
(Table E10).
The results derived from the LoF model
suggested haploinsufficiency as a leading
disease mechanism for both RTEL1 and
PARN (Tables E5 and E6), consistent with
earlier literature (12). Among our case
subjects, 2.7% had an LoF qualifying
variant in PARN, as compared with no
control carriers, enabling PARN to achieve
study-wide significance under the LoF
model (OR, >113.3; 95% CI, 23.2 to
>4,593; P = 2.4 × 10^-9 by two-tailed FET).
RTEL1 was also significantly enriched for
LoF alleles in the case subjects (2.3% vs.
0.02%; OR, 96.7; 95% CI, 11.7–4,333; P =
2.8 × 10^-7 by two-tailed FET). Although
TERT LoF alleles have been reported
to segregate with disease in familial
pulmonary fibrosis pedigrees (10, 11), our
present sample of 262 case subjects with
pulmonary fibrosis is not significantly
enriched for putative LoF TERT alleles (1 of
262 case subjects vs. 0 of 4,141 control
subjects; uncorrected P = 0.06) (Table E5).
No additional genes achieved study-wide
significance across the six nonsynonymous
models.
The cumulative findings for the three
study-wide significant genes (TERT, RTEL1,
and PARN) indicate that 11.8% of case
subjects (31 of 262) and 0.3% of control
subjects (12 of 4,141) carry a qualifying
variant (Figure 2, Table 2). This suggests
that approximately 11.5% of our case
subjects with pulmonary fibrosis could be
partially explained by the identified
qualifying variants (OR, 46.1; 95% CI,
22.6–99.5; P = 1.5 × 10^-29 by two-tailed
FET). Indeed, given the 0.3% rate of
qualifying variation among control subjects,
we expected to see 0.76 (95% CI, 0.41–1.37)
carriers among 262 case subjects rather
than the observed 31 carriers. Comparison
of the 31 TERT, RTEL1, or PARN
qualifying variant carriers with the
remaining 231 noncarriers did not reveal
distinguishing clinical features. No
significant difference was found for the
average age at transplant among the 31
carriers (62.2 ± 7.8 yr) compared with 231
noncarriers (63.3 ± 8.3 yr) (P = 0.5 by two-
sample t test). No significant difference was
found when we assessed the proportion of
case subjects where the clinical pulmonary
fibrosis diagnosis was strictly IPF and not
usual interstitial pneumonia associated with
connective tissue disease or fibrosing
nonspecific interstitial pneumonia (29 of
31 carriers and 184 of 231 noncarriers;
P = 0.08 by FET), and no significant
difference was observed for sex, with the
male proportion among carriers being
74.2% (23 of 31) compared with that
among noncarriers at 78.4% (181 of 231)
(P = 0.6 by FET). A negative control
analysis was also performed, which
confirmed no enrichment of synonymous
genetic variation across the three study-
wide significant genes (1.1% of case subjects
vs. 0.9% of control subjects; P = 0.74 by
two-tailed FET).
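The arithmetic behind the combined three-gene carrier rates and the expected number of case carriers can be checked with a brief sketch. The carrier counts (31 of 262 case subjects, 12 of 4,141 control subjects) are taken from the text; the function name and return structure are hypothetical. The odds ratio computed here is the simple cross-product estimate, which differs slightly from the reported 46.1; one possible reason is that R's `fisher.test` reports a conditional maximum likelihood estimate rather than the cross-product ratio.

```python
import math


def carrier_summary(case_carriers, case_total, control_carriers, control_total):
    """Carrier rates, expected case carriers at the control rate, and cross-product OR."""
    case_rate = case_carriers / case_total
    control_rate = control_carriers / control_total
    expected_case_carriers = control_rate * case_total
    case_noncarriers = case_total - case_carriers
    control_noncarriers = control_total - control_carriers
    if case_noncarriers == 0 or control_carriers == 0:
        odds_ratio = math.inf  # cross-product estimate is unbounded
    else:
        odds_ratio = (case_carriers * control_noncarriers) / (
            case_noncarriers * control_carriers
        )
    return {
        "case_rate": case_rate,
        "control_rate": control_rate,
        "expected_case_carriers": expected_case_carriers,
        "odds_ratio": odds_ratio,
    }


summary = carrier_summary(31, 262, 12, 4141)
excess_rate = summary["case_rate"] - summary["control_rate"]
```

The difference between the case and control carrier rates corresponds to the roughly 11.5% of case subjects described above as potentially explained by these variants.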
The 262 case subjects accounted for
5.95% of the overall test cohort. Of the
autosomal synonymous qualifying variants
(neutral model), 5.85% were found to belong
to case subjects, which indicated a close
match between the proportion of
individuals in the test set who were case
subjects and the proportion of synonymous
(neutral) qualifying variants in the test set
that were found in case subjects (P = 0.77 by
binomial exact test). After establishing the
lack of case enrichment for synonymous
variation, we binned the collection of
variants found among the three study-wide
significant genes into various frequency
and effect bins to identify categories
that significantly departed from the
synonymous variation background rate.
Among these three genes, the two classes
that stood out were LoF annotated variants
(P = 8 × 10^-17) and ultrarare missense
variants predicted to be “probably
damaging” by PolyPhen-2 (P = 5 × 10^-15).
There was little additional signal
contributed by increasing the MAF to
0.1% or relaxing the in silico PolyPhen-2
criteria (Figure 3A, Table E7).
We also used a multivariate logistic
regression model to specifically assess the
relative contribution that variant effects and
allele frequency bins have on pulmonary
fibrosis risk among these three genes
(see Methods section in online supplement;
Figure 3B). This approach ensured that
each qualifying category is relative to the
same baseline category and is naturally
adjusted for variation in the other
categories (see Methods section in online
supplement). For missense variants, in
comparison with the risk contribution
from PolyPhen-2 “probably damaging”
variants that are ultrarare in the
population (OR, 30.7; 95% CI, 14.1–70.5;
P = 3.0 × 10^-17), neither the “probably
damaging” missense variants that are more
common nor those predicted to be
nondamaging by PolyPhen-2 contributed
substantial additional disease risk
(Figure 3).
Subanalysis of Familial Pulmonary
Fibrosis and Sporadic Idiopathic
Pulmonary Fibrosis
Our cohort of 262 case subjects with
pulmonary fibrosis included 33 subjects
with FPF. We found that 8 (24.2%) of our
33 case subjects with FPF have a qualifying
variant in one of these three pulmonary
fibrosis genes, as compared with 0.3% of our
control cohort. The contribution of these
three genes to the genetics of this FPF group
is striking (24.2% vs. 0.3%; OR, 108.6;
95% CI, 35.4–323.3; P = 7.2 × 10^-13 by
two-tailed FET).
To assess the genetic signal in a strictly
homogeneous sporadic IPF cohort, we
restricted our case cohort to the 186
individuals with no reported family history
of pulmonary fibrosis and who were
clinically confirmed to have IPF on the basis
of ATS/ERS/JRS/ALAT guidelines (1).
Comparison of these 186 IPF case subjects
with the 4,141 control subjects showed that
the three genes remained study-wide
significant in sporadic IPF (Figure E4,
Table E8). TERT achieved a P value of
1.7 × 10^-9 on the basis of 4.8% of case
subjects with sporadic IPF carrying a