Author

Beifang Niu

Bio: Beifang Niu is an academic researcher from the Chinese Academy of Sciences. The author has contributed to research in the topics of Medicine and Biology, has an h-index of 21, and has co-authored 48 publications receiving 17,240 citations. Previous affiliations of Beifang Niu include Washington University in St. Louis and the University of California, San Diego.


Papers
Journal ArticleDOI


TL;DR: A new CD-HIT program, accelerated with a novel parallelization strategy and other techniques, is developed to allow efficient clustering of large sequencing datasets, reducing sequence redundancy and improving the performance of other sequence analyses.
Abstract: Summary: CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by next-generation sequencing technologies, we have developed a new CD-HIT program accelerated with a novel parallelization strategy and some other techniques to allow efficient clustering of such datasets. Our tests demonstrated very good speedup derived from the parallelization for up to ~24 cores and a quasi-linear speedup for up to ~8 cores. The enhanced CD-HIT is capable of handling very large datasets in much shorter time than previous versions. Availability: http://cd-hit.org. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
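The core idea behind CD-HIT's clustering is greedy and incremental: sequences are sorted longest first, each sequence is compared against existing cluster representatives, and it either joins the first cluster it matches above the identity threshold or becomes a new representative. The sketch below illustrates only that idea; it is not CD-HIT's implementation (which uses short-word filtering, banded alignment and the parallelization described above), and the function names and crude identity measure are illustrative.

```python
# Minimal sketch of greedy incremental clustering in the spirit of CD-HIT.
# A naive position-matching identity stands in for CD-HIT's k-mer filter
# and banded alignment; all names here are illustrative, not CD-HIT's API.

def identity(a: str, b: str) -> float:
    """Crude identity: fraction of matching positions over the shorter sequence."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / min(len(a), len(b))

def greedy_cluster(seqs: list[str], threshold: float = 0.9) -> list[list[str]]:
    """Longest-first greedy clustering: each sequence joins the first cluster
    whose representative it matches at >= threshold, else starts a new cluster."""
    clusters: list[list[str]] = []           # each cluster keeps its representative first
    for seq in sorted(seqs, key=len, reverse=True):
        for cluster in clusters:
            if identity(cluster[0], seq) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])            # seq becomes a new representative
    return clusters

if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "ACGTACGTAC"]
    for i, c in enumerate(greedy_cluster(reads, threshold=0.9)):
        print(f"cluster {i}: representative={c[0]} members={len(c)}")
```

Keeping only one representative per cluster is what bounds the number of pairwise comparisons; the real program additionally prunes most comparisons with a short-word filter before any alignment is attempted.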

5,959 citations

Journal ArticleDOI
Adam J. Bass, Vesteinn Thorsson, Ilya Shmulevich, Sheila Reynolds +254 more · Institutions (32)
11 Sep 2014-Nature
TL;DR: A comprehensive molecular evaluation of 295 primary gastric adenocarcinomas as part of The Cancer Genome Atlas (TCGA) project is described and a molecular classification dividing gastric cancer into four subtypes is proposed.
Abstract: Gastric cancer was the world’s third leading cause of cancer mortality in 2012, responsible for 723,000 deaths [1]. The vast majority of gastric cancers are adenocarcinomas, which can be further subdivided into intestinal and diffuse types according to the Lauren classification [2]. An alternative system, proposed by the World Health Organization, divides gastric cancer into papillary, tubular, mucinous (colloid) and poorly cohesive carcinomas [3]. These classification systems have little clinical utility, making the development of robust classifiers that can guide patient therapy an urgent priority. The majority of gastric cancers are associated with infectious agents, including the bacterium Helicobacter pylori [4] and Epstein–Barr virus (EBV). The distribution of histological subtypes of gastric cancer and the frequencies of H. pylori and EBV associated gastric cancer vary across the globe [5]. A small minority of gastric cancer cases are associated with germline mutation in E-cadherin (CDH1) [6] or mismatch repair genes [7] (Lynch syndrome), whereas sporadic mismatch repair-deficient gastric cancers have epigenetic silencing of MLH1 in the context of a CpG island methylator phenotype (CIMP) [8]. Molecular profiling of gastric cancer has been performed using gene expression or DNA sequencing [9–12], but has not led to a clear biologic classification scheme. The goals of this study by The Cancer Genome Atlas (TCGA) were to develop a robust molecular classification of gastric cancer and to identify dysregulated pathways and candidate drivers of distinct classes of gastric cancer.

4,583 citations

Journal ArticleDOI
17 Oct 2013-Nature
TL;DR: Data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types are presented as part of the TCGA Pan-Cancer effort, and clinical association analysis identifies genes having a significant effect on survival.
Abstract: The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of mutation frequencies, types and contexts across tumour types, and establish their links to tissues of origin, environmental/carcinogen influences, and DNA repair defects. Using the integrated data sets, we identified 127 significantly mutated genes from well-known (for example, mitogen-activated protein kinase, phosphatidylinositol-3-OH kinase, Wnt/β-catenin and receptor tyrosine kinase signalling pathways, and cell cycle control) and emerging (for example, histone, histone modification, splicing, metabolism and proteolysis) cellular processes in cancer. The average number of mutations in these significantly mutated genes varies across tumour types; most tumours have two to six, indicating that the number of driver mutations required during oncogenesis is relatively small. Mutations in transcription factors/regulators show tissue specificity, whereas histone modifiers are often mutated across several cancer types. Clinical association analysis identifies genes having a significant effect on survival, and investigations of mutations with respect to clonal/subclonal architecture delineate their temporal orders during tumorigenesis. Taken together, these results lay the groundwork for developing new diagnostics and individualizing cancer treatment.
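Identifying "significantly mutated genes" boils down to asking whether a gene's observed mutation count exceeds what a background mutation rate would predict given the gene's length and the cohort size. The sketch below illustrates that reasoning with a simple Poisson test; it is not the consortium's actual methodology (which models covariates, mutation contexts and per-sample rates), and all gene names, lengths, counts and rates are hypothetical toy values. It assumes scipy is available.

```python
# Toy significance test for mutated genes: compare each gene's observed
# mutation count against the expectation under a uniform per-base background
# rate. Illustrative only; real SMG tools use far richer background models.
from scipy.stats import poisson

def smg_pvalues(counts, gene_length_bp, n_samples, background_rate):
    """One-sided p-value per gene: P(X >= observed) for X ~ Poisson(expected)."""
    pvals = {}
    for gene, observed in counts.items():
        expected = gene_length_bp[gene] * n_samples * background_rate
        pvals[gene] = poisson.sf(observed - 1, expected)   # survival function at observed-1
    return pvals

# Hypothetical inputs: mutation counts and coding lengths for three genes
# in a 300-sample cohort with a 1e-6 per-base background mutation rate.
counts = {"TP53": 120, "GENE_A": 8, "GENE_B": 12}
lengths = {"TP53": 1200, "GENE_A": 3000, "GENE_B": 2500}
print(smg_pvalues(counts, lengths, n_samples=300, background_rate=1e-6))
```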

3,658 citations

Journal ArticleDOI
TL;DR: A new web server, CD-HIT Suite, is developed for clustering a user-uploaded sequence dataset or comparing it to another dataset at different identity levels; users can interactively explore the clusters within web browsers.
Abstract: Summary: CD-HIT is a widely used program for clustering and comparing large biological sequence datasets. In order to further assist the CD-HIT users, we significantly improved this program with more functions and better accuracy, scalability and flexibility. Most importantly, we developed a new web server, CD-HIT Suite, for clustering a user-uploaded sequence dataset or comparing it to another dataset at different identity levels. Users can now interactively explore the clusters within web browsers. We also provide downloadable clusters for several public databases (NCBI NR, Swissprot and PDB) at different identity levels. Availability: Free access at http://cd-hit.org. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
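For users who prefer to run the clustering locally rather than through the web server, clusters at several identity levels can be approximated by running the cd-hit command hierarchically, with each pass clustering the previous pass's representatives at a lower threshold. The sketch below assumes the cd-hit binary is installed and on PATH and uses only its standard -i/-o/-c/-n options; the file names, thresholds and word sizes are illustrative and should be checked against the documentation for your installed version.

```python
# Sketch: hierarchical clustering of a protein FASTA file at decreasing
# identity thresholds by invoking the cd-hit binary. Assumes cd-hit is on
# PATH; paths and the threshold/word-size pairs below are illustrative.
import subprocess

LEVELS = [(0.9, 5), (0.7, 5), (0.5, 3)]   # (identity threshold, -n word size)

def cluster_hierarchically(fasta_in: str) -> None:
    current = fasta_in
    for identity_cutoff, word_size in LEVELS:
        out = f"clusters_{int(identity_cutoff * 100)}.fasta"
        subprocess.run(
            ["cd-hit", "-i", current, "-o", out,
             "-c", str(identity_cutoff), "-n", str(word_size)],
            check=True,
        )
        current = out                      # next pass clusters these representatives

cluster_hierarchically("sequences.fasta")
```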

2,084 citations

Journal ArticleDOI
Peter J. Campbell, Gad Getz, Jan O. Korbel, Joshua M. Stuart +1329 more · Institutions (238)
06 Feb 2020-Nature
TL;DR: The flagship paper of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium describes the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types, the structures for international data sharing and standardized analyses, and the main scientific findings from across the consortium's studies.
Abstract: Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale [1–3]. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4–5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter [4]; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation [5,6]; analyses timings and patterns of tumour evolution [7]; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity [8,9]; and evaluates a range of more-specialized features of cancer genomes [8,10–18].

1,600 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
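The fourth category above, per-user customization, is the easiest to make concrete: a filter is learned directly from messages the user has already kept or rejected. The following sketch uses a bag-of-words naive Bayes classifier via scikit-learn; the messages, labels and model choice are hypothetical and not drawn from the cited article.

```python
# Tiny spam filter learned from a user's own kept/rejected examples.
# Hypothetical data; assumes scikit-learn is installed.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Messages the user previously kept (0) or rejected (1).
messages = [
    "meeting moved to 3pm, see updated agenda",
    "your invoice for March is attached",
    "WIN a FREE cruise, click now!!!",
    "cheap meds, limited time offer",
]
labels = [0, 0, 1, 1]

# Bag-of-words features feeding a multinomial naive Bayes classifier.
filter_model = make_pipeline(CountVectorizer(), MultinomialNB())
filter_model.fit(messages, labels)

print(filter_model.predict(["free offer, click here"]))        # expected: [1] (reject)
print(filter_model.predict(["agenda for tomorrow's meeting"]))  # expected: [0] (keep)
```

As the user labels more mail, the model is simply refit, which is the "constantly modifying and tuning a set of learned rules" behaviour described above.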

13,246 citations

Journal ArticleDOI
22 Apr 2013-PLOS ONE
TL;DR: The phyloseq project is a new open-source R software package dedicated to the object-oriented representation and analysis of microbiome census data; it supports importing data from a variety of common formats as well as many analysis techniques, and users can interactively explore the results.
Abstract: Background: The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high throughput microbiome census data. Results: Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics; all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research. Conclusions: The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.

11,272 citations

Journal ArticleDOI
TL;DR: This study showed that mismatch-repair status predicted clinical benefit of immune checkpoint blockade with pembrolizumab, and high somatic mutation loads were associated with prolonged progression-free survival.
Abstract: Background: Somatic mutations have the potential to encode “non-self” immunogenic antigens. We hypothesized that tumors with a large number of somatic mutations due to mismatch-repair defects may be susceptible to immune checkpoint blockade. Methods: We conducted a phase 2 study to evaluate the clinical activity of pembrolizumab, an anti–programmed death 1 immune checkpoint inhibitor, in 41 patients with progressive metastatic carcinoma with or without mismatch-repair deficiency. Pembrolizumab was administered intravenously at a dose of 10 mg per kilogram of body weight every 14 days in patients with mismatch repair–deficient colorectal cancers, patients with mismatch repair–proficient colorectal cancers, and patients with mismatch repair–deficient cancers that were not colorectal. The coprimary end points were the immune-related objective response rate and the 20-week immune-related progression-free survival rate. Results: The immune-related objective response rate and immune-related progression-free survival ...

6,835 citations

Journal ArticleDOI
TL;DR: Salmon is the first transcriptome-wide quantifier to correct for fragment GC-content bias, which substantially improves the accuracy of abundance estimates and the sensitivity of subsequent differential expression analysis.
Abstract: We introduce Salmon, a lightweight method for quantifying transcript abundance from RNA-seq reads. Salmon combines a new dual-phase parallel inference algorithm and feature-rich bias models with an ultra-fast read mapping procedure. It is the first transcriptome-wide quantifier to correct for fragment GC-content bias, which, as we demonstrate here, substantially improves the accuracy of abundance estimates and the sensitivity of subsequent differential expression analysis.
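Quantifiers in this family face the same core problem: a read often maps to several transcripts, so abundances and read assignments must be estimated jointly, typically with an expectation-maximization style procedure. The sketch below shows that general idea on toy data; it is not Salmon's actual inference algorithm (which combines online and offline phases with rich bias models and effective-length corrections), and all inputs are illustrative.

```python
# Toy EM for transcript abundance: each read lists the transcripts it is
# compatible with; we alternate between fractionally assigning reads (E-step)
# and re-estimating transcript proportions (M-step). Illustrative only; it
# ignores effective lengths, GC bias and mapping qualities.

def em_abundance(read_compat: list[list[int]], n_transcripts: int,
                 n_iter: int = 50) -> list[float]:
    theta = [1.0 / n_transcripts] * n_transcripts    # start from uniform proportions
    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        for compat in read_compat:                    # E-step: split each read by theta
            total = sum(theta[t] for t in compat)
            for t in compat:
                counts[t] += theta[t] / total
        n_reads = len(read_compat)
        theta = [c / n_reads for c in counts]         # M-step: renormalize to proportions
    return theta

# Three transcripts; the first three reads map only to transcript 0,
# the remaining reads are ambiguous between two transcripts.
reads = [[0], [0], [0], [0, 1], [0, 1], [1, 2], [2]]
print([round(x, 3) for x in em_abundance(reads, n_transcripts=3)])
```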

6,095 citations
