Author

Hélène Touzet

Bio: Hélène Touzet is an academic researcher from the University of Lille. The author has contributed to research in the topics of edit distance and RNA. The author has an h-index of 19, has co-authored 59 publications, and has received 2,399 citations. Previous affiliations of Hélène Touzet include the Laboratoire d'Informatique Fondamentale de Lille and the Centre national de la recherche scientifique.


Papers
Journal ArticleDOI
TL;DR: SortMeRNA, new software designed to rapidly filter rRNA fragments from metatranscriptomic data, is presented. It is capable of handling large sets of reads and sorting out all fragments matching the rRNA database with high sensitivity and low running time.
Abstract: MOTIVATION: The application of Next-Generation Sequencing (NGS) technologies to RNAs directly extracted from a community of organisms yields a mixture of fragments characterizing both coding and non-coding types of RNAs. The task of distinguishing among these, and of further categorizing the families of messenger RNAs and ribosomal RNAs, is an important step in examining the gene expression patterns of an interactive environment and the phylogenetic classification of the constituting species. RESULTS: We present SortMeRNA, new software designed to rapidly filter ribosomal RNA fragments from metatranscriptomic data. It is capable of handling large sets of reads and sorting out all fragments matching the rRNA database with high sensitivity and low running time. AVAILABILITY: http://bioinfo.lifl.fr/RNA/sortmerna CONTACT: evguenia.kopylova@lifl.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
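The filtering idea in the abstract can be illustrated with a deliberately simplified sketch. This is not SortMeRNA's actual algorithm (which relies on an indexed reference database and approximate seed matching); the function names, the k-mer length, and the hit threshold below are invented illustration values.

```python
# Toy reference-based read filter: a read is flagged as rRNA-like when it
# shares at least `min_hits` distinct k-mers with the reference index.
# (Illustration only -- NOT the SortMeRNA algorithm.)

def kmers(seq, k):
    """Return the set of overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(references, k=8):
    """Union of k-mers over all reference rRNA sequences."""
    index = set()
    for ref in references:
        index |= kmers(ref, k)
    return index

def filter_reads(reads, index, k=8, min_hits=3):
    """Split reads into (rrna_like, other) by shared k-mer count."""
    rrna, other = [], []
    for read in reads:
        hits = sum(1 for km in kmers(read, k) if km in index)
        (rrna if hits >= min_hits else other).append(read)
    return rrna, other
```

For example, a read that is a substring of a reference shares many k-mers with the index and is routed to the rRNA bin, while an unrelated read shares none.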

1,868 citations

Journal ArticleDOI
TL;DR: A novel algorithm is described that achieves better performance in terms of computational time and precision than existing tools, and is capable of calculating the exact P-value without any error, even for matrices with non-integer coefficient values.
Abstract: Background: Position Weight Matrices (PWMs) are probabilistic representations of signals in sequences. They are widely used to model approximate patterns in DNA or in protein sequences. Using PWMs requires, as a prerequisite, knowing the statistical significance of a word according to its score. This is done by defining the P-value of a score, which is the probability that the background model achieves a score larger than or equal to the observed value. This gives rise to the following problem: given a P-value, find the corresponding score threshold. Existing methods rely on dynamic programming or probability generating functions. For many examples of PWMs, they fail to give accurate results in a reasonable amount of time. Results: The contribution of this paper is twofold. First, we study the theoretical complexity of the problem, and we prove that it is NP-hard. Then, we describe a novel algorithm that solves the P-value problem efficiently. The main idea is to use a series of discretized score distributions that improves the final result step by step until some convergence criterion is met. Moreover, the algorithm is capable of calculating the exact P-value without any error, even for matrices with non-integer coefficient values. The same approach is also used to devise an accurate algorithm for the reverse problem: finding the P-value for a given score. Both methods are implemented in software called TFM-PVALUE, which is freely available. Conclusion: We have tested TFM-PVALUE on a large set of PWMs representing transcription factor binding sites. Experimental results show that it achieves better performance in terms of computational time and precision than existing tools.
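The score-distribution computation that the abstract builds on can be sketched as follows. This is the classical dynamic-programming baseline under an i.i.d. background model, not the TFM-PVALUE algorithm itself, which refines it with a series of progressively finer discretized distributions; the example matrix and background below are invented.

```python
# Exact score distribution of a PWM under an i.i.d. background model,
# computed column by column by dynamic programming (convolution).
# Illustration of the classical baseline, not TFM-PVALUE.

from collections import defaultdict

def score_distribution(pwm, background):
    """pwm: list of {letter: score} columns; background: {letter: prob}.
    Returns {total_score: probability} for a random background word."""
    dist = {0: 1.0}
    for column in pwm:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for letter, p in background.items():
                nxt[score + column[letter]] += prob * p
        dist = dict(nxt)
    return dist

def p_value(pwm, background, threshold):
    """P(score >= threshold), i.e. the P-value of the threshold."""
    dist = score_distribution(pwm, background)
    return sum(p for s, p in dist.items() if s >= threshold)
```

For a two-column matrix scoring 1 for A and 0 otherwise, over a uniform background, `p_value(pwm, bg, 2)` is 1/16 and `p_value(pwm, bg, 1)` is 7/16. The running time grows with the number of distinct reachable scores, which is exactly why non-integer coefficients make the naive approach expensive and motivate the discretization scheme of the paper.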

129 citations

Journal ArticleDOI
TL;DR: The six function classes defined turn out to be exactly the deterministic (resp. non-deterministic) polynomial time, linear exponential time and linear doubly exponential time computable functions; it is further demonstrated that functions with exponential interpretation termination proofs are super-elementary.
Abstract: We study the effect of polynomial interpretation termination proofs of deterministic (resp. non-deterministic) algorithms defined by confluent (resp. non-confluent) rewrite systems over data structures which include strings, lists and trees, and we classify them according to the interpretations of the constructors. This leads to the definition of six function classes which turn out to be exactly the deterministic (resp. non-deterministic) polynomial time, linear exponential time and linear doubly exponential time computable functions when the class is based on confluent (resp. non-confluent) rewrite systems. We also obtain a characterisation of the linear space computable functions. Finally, we demonstrate that functions with exponential interpretation termination proofs are super-elementary.
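As a standard textbook illustration of a polynomial interpretation termination proof (not an example taken from the paper), consider the rewrite system for addition over unary numerals:

```latex
\begin{align*}
&\text{Rules:} && add(0, y) \to y, \qquad add(s(x), y) \to s(add(x, y))\\
&\text{Interpretation:} && [0] = 1, \qquad [s](x) = x + 1, \qquad [add](x, y) = 2x + y\\
&\text{Strict decrease:} && [add(0, y)] = y + 2 > y = [y],\\
& && [add(s(x), y)] = 2x + y + 2 > 2x + y + 1 = [s(add(x, y))].
\end{align*}
```

Every rule application strictly decreases the interpreted value over the natural numbers, so the system terminates. The shape of the constructor interpretations (here the additive $[s](x) = x + 1$) is exactly the kind of constraint the paper uses to delineate its function classes.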

92 citations

Journal ArticleDOI
TL;DR: This work presents a tool for the prediction of conserved secondary structure elements of a family of homologous non-coding RNAs that successfully applies to datasets with low primary structure similarity and does not require any prior multiple sequence alignment.
Abstract: We present a tool for the prediction of conserved secondary structure elements of a family of homologous non-coding RNAs. Our method does not require any prior multiple sequence alignment. Thus, it successfully applies to datasets with low primary structure similarity. The functionality is demonstrated using three example datasets: sequences of RNase P RNAs, ciliate telomerases and enterovirus messenger RNAs. CARNAC has a web server that can be accessed at the URL http://bioinfo.lifl.fr/carnac.

78 citations

Journal ArticleDOI
TL;DR: It is shown that CARNAC provides a good partial prediction for a wide range of sequences and, in the presence of a whole family of sequences, can be used to detect whether the sequences actually share the same structure.
Abstract: Motivation: CARNAC is a new method for pairwise folding of RNA sequences. The program takes into account local similarity, stem energy, and covariations to produce the common folding. It can handle all RNA types, and has also been adapted to align a new homologous sequence along a reference structured sequence. Results: Using different data sets, we show that CARNAC provides a good partial prediction for a wide range of sequences (16S ssu rRNA, RNase P RNA, viruses) with only two sequences. In the presence of a whole family of sequences, we also show that CARNAC can be used to detect whether the sequences actually share the same structure. Availability: CARNAC is available at the URL http://www.lifl.fr/~perrique/rna/

67 citations


Cited by
01 Jun 2012
TL;DR: SPAdes, a new assembler for both single-cell and standard (multicell) assembly, is presented; it improves on the recently released E+V-SC assembler and on the popular assemblers Velvet and SoapDeNovo (for multicell data).
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.

10,124 citations

Journal ArticleDOI
05 May 2011-Nature
TL;DR: This study presents a general framework for deciphering cis-regulatory connections and their roles in disease, and maps nine chromatin marks across nine cell types to systematically characterize regulatory elements, their cell-type specificities and their functional interactions.
Abstract: Chromatin profiling has emerged as a powerful means of genome annotation and detection of regulatory activity. The approach is especially well suited to the characterization of non-coding portions of the genome, which critically contribute to cellular phenotypes yet remain largely uncharted. Here we map nine chromatin marks across nine cell types to systematically characterize regulatory elements, their cell-type specificities and their functional interactions. Focusing on cell-type-specific patterns of promoters and enhancers, we define multicell activity profiles for chromatin state, gene expression, regulatory motif enrichment and regulator expression. We use correlations between these profiles to link enhancers to putative target genes, and predict the cell-type-specific activators and repressors that modulate them. The resulting annotations and regulatory predictions have implications for the interpretation of genome-wide association studies. Top-scoring disease single nucleotide polymorphisms are frequently positioned within enhancer elements specifically active in relevant cell types, and in some cases affect a motif instance for a predicted regulator, thus suggesting a mechanism for the association. Our study presents a general framework for deciphering cis-regulatory connections and their roles in disease.
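The profile-correlation step described above can be reduced to a toy computation: an enhancer is linked to a putative target gene when their activity profiles across cell types correlate strongly. The function names, profiles and the 0.9 cutoff below are invented for illustration; the study's actual linking combines richer multicell activity profiles for chromatin state, expression, motif enrichment and regulator expression.

```python
# Toy correlation-based enhancer-gene linking across cell types.
# (Illustration of the idea only -- not the study's pipeline.)

from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length activity profiles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def link_enhancers(enhancers, genes, cutoff=0.9):
    """enhancers/genes: {name: activity profile across cell types}.
    Returns (enhancer, gene, r) triples with r >= cutoff."""
    return [(e, g, pearson(pe, pg))
            for e, pe in enhancers.items()
            for g, pg in genes.items()
            if pearson(pe, pg) >= cutoff]
```

An enhancer active in the same cell types as a gene (profiles rising and falling together) correlates near +1 and is linked; anti-correlated or unrelated pairs fall below the cutoff.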

2,646 citations

Journal ArticleDOI
TL;DR: The results illustrate the importance of parameter tuning for optimizing classifier performance, and recommendations are made regarding parameter choices for these classifiers under a range of standard operating conditions.
Abstract: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated “novel” marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (https://github.com/caporaso-lab/tax-credit-data). Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.
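To make the naive Bayes idea concrete, here is a toy k-mer taxonomy classifier in the same spirit as (but far simpler than) the scikit-learn classifier wrapped by q2-feature-classifier; the taxa, training sequences, k-mer length and smoothing parameter are all invented.

```python
# Toy multinomial naive Bayes over k-mer counts, with Laplace smoothing.
# (Illustration only -- not the q2-feature-classifier implementation.)

from collections import Counter
from math import log

def kmer_counts(seq, k=4):
    """Multiset of overlapping k-mers of a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def train(examples, k=4):
    """examples: {taxon: [sequences]} -> per-taxon pooled k-mer counts."""
    return {t: sum((kmer_counts(s, k) for s in seqs), Counter())
            for t, seqs in examples.items()}

def classify(seq, model, k=4, alpha=1.0):
    """Return the taxon maximizing the smoothed log-likelihood of seq."""
    def loglik(counts):
        total = sum(counts.values())
        vocab = 4 ** k          # number of possible DNA k-mers
        return sum(n * log((counts[km] + alpha) / (total + alpha * vocab))
                   for km, n in kmer_counts(seq, k).items())
    return max(model, key=lambda t: loglik(model[t]))
```

A query sequence is assigned to the taxon whose pooled k-mer profile best explains its own k-mer counts, which is the essence of the marker-gene naive Bayes approach the paper benchmarks.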

2,475 citations

Journal ArticleDOI
TL;DR: RegulomeDB, a novel approach and database that guides interpretation of regulatory variants in the human genome, is presented; it combines high-throughput experimental data sets from ENCODE and other sources with computational predictions and manual annotations to identify putative regulatory potential and functional variants.
Abstract: As the sequencing of healthy and disease genomes becomes more commonplace, detailed annotation provides interpretation for individual variation responsible for normal and disease phenotypes. Current approaches focus on direct changes in protein coding genes, particularly nonsynonymous mutations that directly affect the gene product. However, most individual variation occurs outside of genes and, indeed, most markers generated from genome-wide association studies (GWAS) identify variants outside of coding segments. Identification of potential regulatory changes that perturb these sites will lead to a better localization of truly functional variants and interpretation of their effects. We have developed a novel approach and database, RegulomeDB, which guides interpretation of regulatory variants in the human genome. RegulomeDB includes high-throughput, experimental data sets from ENCODE and other sources, as well as computational predictions and manual annotations to identify putative regulatory potential and identify functional variants. These data sources are combined into a powerful tool that scores variants to help separate functional variants from a large pool and provides a small set of putative sites with testable hypotheses as to their function. We demonstrate the applicability of this tool to the annotation of noncoding variants from 69 full sequenced genomes as well as that of a personal genome, where thousands of functionally associated variants were identified. Moreover, we demonstrate a GWAS where the database is able to quickly identify the known associated functional variant and provide a hypothesis as to its function. Overall, we expect this approach and resource to be valuable for the annotation of human genome sequences.

2,355 citations

01 Jan 2011
TL;DR: The sheer volume and scope of this flood of data pose a significant challenge to the development of efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data.
Abstract: Rapid improvements in sequencing and array-based platforms are resulting in a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and noncoding RNAs, single nucleotide polymorphism (SNP) and copy number profiling, and functional assays. Analysis of these large, diverse data sets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of data pose a significant challenge to the development of such tools.

2,187 citations