Author

Sophie Brouillet

Bio: Sophie Brouillet is an academic researcher from Pierre-and-Marie-Curie University. The author has contributed to research on the topics Synthetic lethality and Protein structure database. The author has an h-index of 6 and has co-authored 11 publications receiving 193 citations. Previous affiliations of Sophie Brouillet include Centre national de la recherche scientifique.

Papers
Journal ArticleDOI
01 Oct 2005-Proteins
TL;DR: YAKUSA is a program designed for rapid scanning of a structural database with a query protein structure that searches for the longest common substructures called SHSPs existing between a query structure and every structure in the structural database.
Abstract: YAKUSA is a program designed for rapid scanning of a structural database with a query protein structure. It searches for the longest common substructures called SHSPs (structural high-scoring pairs) existing between a query structure and every structure in the structural database. It makes use of protein backbone internal coordinates (α angles) in order to describe protein structures as sequences of symbols. The structural similarities are established in 5 steps, the first 3 being analogous to those used in BLAST: (1) building up a deterministic finite automaton describing all patterns identical or similar to those in the query structure; (2) searching for all these patterns in every structure in the database; (3) extending the patterns to longer matching substructures (i.e., SHSPs); (4) selecting compatible SHSPs for each query–database structure pair; and (5) ranking the query–database structure pairs using 3 scores based on SHSP similarity, on SHSP probabilities, and on spatial compatibility of SHSPs. Structural fragment probabilities are estimated according to a mixture transition distribution model, which is an approximation of a high-order Markov chain model. With regard to sensitivity and selectivity of the structural matches, YAKUSA compares well to the best related programs, although it is by far faster: A typical database scan takes about 40 s CPU time on a desktop personal computer. It has also been implemented on a Web server for real-time searches. Proteins 2005. © 2005 Wiley-Liss, Inc.

81 citations
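
The seed-and-extend idea behind steps (1) to (3) of the YAKUSA abstract can be illustrated with a minimal sketch. It assumes the α angles are already computed and given in degrees. The names `encode`, `seed_hits` and `extend`, the 30° bin width and the seed length `k` are all hypothetical choices made for illustration. It matches seeds exactly through a dictionary lookup rather than through the deterministic finite automaton of similar patterns described in the abstract, and it omits the scoring and probability steps (4) and (5).

```python
from collections import defaultdict

def encode(angles_deg, bin_width=30):
    """Turn angles in degrees into integer symbols by binning [-180, 180)."""
    n_bins = 360 // bin_width
    return [int(((a + 180.0) % 360.0) // bin_width) % n_bins for a in angles_deg]

def seed_hits(query, target, k=3):
    """Return (i, j) pairs such that query[i:i+k] == target[j:j+k]."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[tuple(query[i:i + k])].append(i)
    hits = []
    for j in range(len(target) - k + 1):
        for i in index.get(tuple(target[j:j + k]), []):
            hits.append((i, j))
    return hits

def extend(query, target, i, j, k):
    """Grow an exact seed in both directions while the symbols keep matching.

    Returns (query_start, target_start, length) of the extended match.
    """
    start_i, start_j = i, j
    while start_i > 0 and start_j > 0 and query[start_i - 1] == target[start_j - 1]:
        start_i -= 1
        start_j -= 1
    end_i, end_j = i + k, j + k
    while end_i < len(query) and end_j < len(target) and query[end_i] == target[end_j]:
        end_i += 1
        end_j += 1
    return start_i, start_j, end_i - start_i

# Made-up angle values, for illustration only.
query = encode([50, -60, 120, 10, -150])       # [7, 4, 10, 6, 1]
target = encode([170, 55, -55, 125, 15, 90])   # [11, 7, 4, 10, 6, 9]
hits = seed_hits(query, target, k=3)           # [(0, 1), (1, 2)]
match = extend(query, target, 0, 1, 3)         # (0, 1, 4)
```

Both seeds extend to the same four-symbol match, so a fuller version would deduplicate the extended matches before ranking them.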

Posted ContentDOI
13 Nov 2015-bioRxiv
TL;DR: MAGELLAN is a web-based graphical software to explore small fitness/energy landscapes through dynamic visualization and quantitative measures that can be used to explore input custom landscapes, previously published experimental landscapes or randomly generated model landscapes.
Abstract: In a fitness landscape, fitness values are associated to all genotypes corresponding to several, potentially all, combinations of a set of mutations. In the last decade, many small experimental fitness landscapes have been partially or completely resolved, and more will likely follow. MAGELLAN is a web-based graphical software to explore small fitness/energy landscapes through dynamic visualization and quantitative measures. It can be used to explore input custom landscapes, previously published experimental landscapes or randomly generated model landscapes.

24 citations
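
To make the notion of a small fitness landscape concrete, here is a minimal sketch that stores a landscape as a mapping from genotypes (tuples of 0/1, one entry per mutation) to fitness values and lists its local peaks. Counting local peaks is used here only as one familiar quantitative measure; the abstract does not say which measures MAGELLAN computes. The names `neighbors` and `local_peaks` and the fitness values are hypothetical.

```python
def neighbors(genotype):
    """Yield every genotype that differs from `genotype` at exactly one locus."""
    for pos in range(len(genotype)):
        yield genotype[:pos] + (1 - genotype[pos],) + genotype[pos + 1:]

def local_peaks(landscape):
    """Return genotypes that are fitter than all of their measured neighbors.

    Neighbors missing from the mapping are skipped, so a partially
    resolved landscape can be passed in as well.
    """
    return sorted(
        g for g, fitness in landscape.items()
        if all(fitness > landscape[n] for n in neighbors(g) if n in landscape)
    )

# Two mutations, four genotypes; the values are invented for illustration.
smooth = {(0, 0): 1.0, (1, 0): 1.2, (0, 1): 0.8, (1, 1): 1.5}
rugged = {(0, 0): 1.0, (1, 0): 0.8, (0, 1): 0.9, (1, 1): 1.3}

print(local_peaks(smooth))  # [(1, 1)]
print(local_peaks(rugged))  # [(0, 0), (1, 1)]
```

In the second landscape each single mutation is deleterious on its own but the double mutant is fittest, which is why two peaks appear.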

Journal ArticleDOI
TL;DR: An algorithm to automatically and efficiently genotype microsatellites from a collection of reads sorted by individual, which can be used to genotype any microsatellite locus from any organism and has been tested on 454 pyrosequencing data of several loci from fruit flies and red deer.
Abstract: Microsatellites are widely used in population genetics to uncover recent evolutionary events. They are typically genotyped using a capillary sequencer, whose capacity is usually limited to 9, or at most 12, loci per run, and whose output is tedious to analyse by hand. With the rise of next-generation sequencing (NGS), a much larger number of loci and individuals are available from sequencing: for example, on a single run of a GS Junior, 28 loci from 96 individuals are sequenced with 30X coverage. We have developed an algorithm to automatically and efficiently genotype microsatellites from a collection of reads sorted by individual (e.g. specific PCR amplifications of a locus or a collection of reads that encompass a locus of interest). As the sequencing and the PCR amplification introduce artefactual insertions or deletions, the set of reads from a single microsatellite allele shows several length variants. The algorithm infers, without alignment, the true unknown allele(s) of each individual from the observed distributions of microsatellite lengths of all individuals. MicNeSs, a Python implementation of the algorithm, can be used to genotype any microsatellite locus from any organism and has been tested on 454 pyrosequencing data of several loci from fruit flies (a model species) and red deer (a nonmodel species). Without any parallelization, it automatically genotypes 22 loci from 441 individuals in 11 hours on a standard computer. The comparison of MicNeSs inferences to the standard method shows an excellent agreement, with some differences illustrating the pros and cons of both methods.

22 citations

Journal ArticleDOI
14 Dec 2012-PLOS ONE
TL;DR: This work proposes a novel approach for the isolation and sequencing of a universal, useful and popular marker across distant, non-model metazoans: the complete mitochondrial genome, which generates large mitogenome datasets.
Abstract: Background: Researchers sorely need markers and approaches for biodiversity exploration (both specimen linked and metagenomics) using the full potential of next generation sequencing technologies (NGST). Currently, most studies rely on expensive multiple tagging, PCR primer universality and/or the use of few markers, sometimes with insufficient variability. Methodology/Principal Findings: We propose a novel approach for the isolation and sequencing of a universal, useful and popular marker across distant, non-model metazoans: the complete mitochondrial genome. It relies on the properties of metazoan mitogenomes for enrichment, on careful choice of the organisms to multiplex, as well as on the wide collection of accumulated mitochondrial reference datasets for post-sequencing sorting and identification instead of individual tagging. Multiple divergent organisms can be sequenced simultaneously, and their complete mitogenome obtained at a very low cost. We provide in silico testing of dataset assembly for a selected set of example datasets. Conclusions/Significance: This approach generates large mitogenome datasets. These sequences are useful for phylogenetics, molecular identification and molecular ecology studies, and are compatible with all existing projects or available datasets based on mitochondrial sequences, such as the Barcode of Life project. Our method can yield sequences both from identified samples and metagenomic samples. The use of the same datasets for both kinds of studies makes for a powerful approach, especially since the datasets have a high variability even at species level, and would be a useful complement to the less variable 18S rDNA currently prevailing in metagenomic studies.

18 citations


Cited by
Journal ArticleDOI
TL;DR: CLUSTAL X is a new windows interface for the widely-used progressive multiple sequence alignment program CLUSTAL W, providing an integrated system for performing multiple sequence and profile alignments and analysing the results.
Abstract: CLUSTAL X is a new windows interface for the widely-used progressive multiple sequence alignment program CLUSTAL W. The new system is easy to use, providing an integrated system for performing multiple sequence and profile alignments and analysing the results. CLUSTAL X displays the sequence alignment in a window on the screen. A versatile sequence colouring scheme allows the user to highlight conserved features in the alignment. Pull-down menus provide all the options required for traditional multiple sequence and profile alignment. New features include: the ability to cut-and-paste sequences to change the order of the alignment, selection of a subset of the sequences to be realigned, and selection of a sub-range of the alignment to be realigned and inserted back into the original alignment. Alignment quality analysis can be performed and low-scoring segments or exceptional residues can be highlighted. Quality analysis and realignment of selected residue ranges provide the user with a powerful tool to improve and refine difficult alignments and to trap errors in input sequences. CLUSTAL X has been compiled on SUN Solaris, IRIX5.3 on Silicon Graphics, Digital UNIX on DECstations, Microsoft Windows (32 bit) for PCs, Linux ELF for x86 PCs, and Macintosh PowerMac.

38,522 citations

Journal ArticleDOI
TL;DR: A computational method that facilitates the analysis and objective prediction of mitochondrially imported proteins has been developed; applied to unknown yeast open reading frames, it predicts how many might be mitochondrial proteins and reveals that many of them are clustered.
Abstract: Most of the proteins that are used in mitochondria are imported through the double membrane of the organelle. The information that guides the protein to mitochondria is contained in its sequence and structure, although no direct evidence can be obtained. In this article, discriminant analysis has been performed with 47 parameters and a large set of mitochondrial proteins extracted from the SwissProt database. A computational method that facilitates the analysis and objective prediction of mitochondrially imported proteins has been developed. If only the amino acid sequence is considered, 75-97% of the mitochondrial proteins studied have been predicted to be imported into mitochondria. Moreover, the existence of mitochondrial-targeting sequences is predicted in 76-94% of the analyzed mitochondrial precursor proteins. As a practical application, the number of unknown yeast open reading frames that might be mitochondrial proteins has been predicted, which revealed that many of them are clustered.

1,668 citations

Journal ArticleDOI
TL;DR: An algorithm is described for the systematic characterization of the physico-chemical properties seen at each position in a multiple protein sequence alignment that simplifies the analysis of multiple sequence data by condensing the mass of information present, and thus allows the rapid identification of substitutions of structural and functional importance.
Abstract: An algorithm is described for the systematic characterization of the physico-chemical properties seen at each position in a multiple protein sequence alignment. The new algorithm allows questions important in the design of mutagenesis experiments to be quickly answered since positions in the alignment that show unusual or interesting residue substitution patterns may be rapidly identified. The strategy is based on a flexible set-based description of amino acid properties, which is used to define the conservation between any group of amino acids. Sequences in the alignment are gathered into subgroups on the basis of sequence similarity, functional, evolutionary or other criteria. All pairs of subgroups are then compared to highlight positions that confer the unique features of each subgroup. The algorithm is encoded in the computer program AMAS (Analysis of Multiply Aligned Sequences) which provides a textual summary of the analysis and an annotated (boxed, shaded and/or coloured) multiple sequence alignment. The algorithm is illustrated by application to an alignment of 67 SH2 domains where patterns of conserved hydrophobic residues that constitute the protein core are highlighted. The analysis of charge conservation across annexin domains identifies the locations at which conserved charges change sign. The algorithm simplifies the analysis of multiple sequence data by condensing the mass of information present, and thus allows the rapid identification of substitutions of structural and functional importance.

649 citations
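
The set-based description of amino acid properties in the AMAS abstract can be sketched as follows. This is a deliberately simplified illustration, not the AMAS implementation: the `PROPERTIES` table covers only five properties, the function names are hypothetical, gap characters are not handled, and a position is flagged only when each subgroup conserves at least one property and the conserved sets differ.

```python
# Simplified, illustrative property table (one-letter amino acid codes).
PROPERTIES = {
    "positive": set("KRH"),
    "negative": set("DE"),
    "charged": set("DEKRH"),
    "aromatic": set("FWYH"),
    "aliphatic": set("ILV"),
}

def shared_properties(column):
    """Return the properties held by every residue in an alignment column."""
    residues = set(column)
    return sorted(name for name, members in PROPERTIES.items() if residues <= members)

def contrasting_positions(group_a, group_b):
    """List positions where two subgroups conserve different property sets.

    Each group is a list of equal-length aligned sequences.
    """
    positions = []
    columns = zip(zip(*group_a), zip(*group_b))
    for pos, (col_a, col_b) in enumerate(columns):
        props_a = shared_properties(col_a)
        props_b = shared_properties(col_b)
        if props_a and props_b and props_a != props_b:
            positions.append((pos, props_a, props_b))
    return positions

# Toy subgroups, invented for illustration.
group_a = ["KIF", "RLF"]
group_b = ["DIF", "ELY"]
print(contrasting_positions(group_a, group_b))
# [(0, ['charged', 'positive'], ['charged', 'negative'])]
```

Position 0 is reported because both subgroups conserve a charge but with opposite sign, the kind of pattern the abstract mentions for annexin domains.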

Journal ArticleDOI
TL;DR: This work has shown that structure comparison methods that allow for flexibility and plasticity generate the most biologically meaningful alignments.

390 citations

Journal ArticleDOI
TL;DR: The extensive tests on HIV/SIV subtyping showed that the virus classifications produced by the method are in good agreement with the best taxonomic knowledge, even in non-coding LTR (Long Terminal Repeat) regions that are not tractable by regular alignment methods due to frequent duplications/insertions/deletions.
Abstract: Background: In general, the construction of trees is based on sequence alignments. This procedure, however, leads to loss of information when parts of sequence alignments (for instance ambiguous regions) are deleted before tree building. To overcome this difficulty, one of us previously introduced a new and rapid algorithm that calculates dissimilarity matrices between sequences without preliminary alignment.

264 citations