Showing papers by "Jens Allmer" published in 2012


Journal ArticleDOI
TL;DR: This work focuses on ab initio prediction methods throughout; homology-based miRNA detection methods are therefore not discussed.
Abstract: MicroRNAs are small RNA sequences of 18-24 nucleotides in length, which serve as templates to drive post-transcriptional gene silencing. The canonical microRNA pathway starts with transcription from DNA and is followed by processing via the Microprocessor complex, yielding a hairpin structure, which is then exported into the cytosol, where it is processed by Dicer and then incorporated into the RNA-induced silencing complex. All of these biogenesis steps add to the overall specificity of miRNA production and effect. Unfortunately, their modes of action are just beginning to be elucidated, and therefore computational prediction algorithms cannot model the process but are usually forced to employ machine learning approaches. This work focuses on ab initio prediction methods throughout; homology-based miRNA detection methods are therefore not discussed. Current ab initio prediction algorithms, their ties to data mining, and their prediction accuracy are detailed.

37 citations


Journal ArticleDOI
TL;DR: Currently available free software that either allows analysis of PTMs or is easily adaptable for this purpose is briefly reviewed, and selected studies are used to highlight the current ability to quantitate PTMs.
Abstract: Mass spectrometry (MS)-based proteomics, by itself, is a vast and complex area encompassing various mass spectrometers, different spectra, and search result representations. When the aim is quantitation performed in different scanning modes at different MS levels, matters become additionally complex. Quantitation of post-translational modifications (PTMs) represents the greatest challenge among these endeavors. Many different approaches to quantitation have been described, and some of these can be directly applied to the quantitation of PTMs. The amount of data produced via MS, however, makes manual data interpretation impractical. Therefore, specialized software tools are used to meet this challenge. Any software currently able to quantitate differentially labeled samples may theoretically be adapted to quantitate differential PTM expression among samples as well. Due to the heterogeneity of mass spectrometry-based proteomics, this review will focus on quantitation of PTMs using liquid chromatography followed by one or more stages of mass spectrometry. Currently available free software, which either allows analysis of PTMs or is easily adaptable for this purpose, is briefly reviewed in this paper. Selected studies, especially those related to phosphoproteomics, shall be used to highlight the current ability to quantitate PTMs.

19 citations


Journal ArticleDOI
28 Oct 2012
TL;DR: The field of mass spectrometry-based proteomics is introduced, the expectations of a well-designed benchmark dataset are defined, and the current situation is compared to this ideal.
Abstract: Proteomics is a quickly developing field. New and better mass spectrometers, the platform of choice in proteomics, are being introduced frequently. New algorithms for the analysis of mass spectrometric data and assignment of amino acid sequence to tandem mass spectra are also presented on a frequent basis. Unfortunately, the best application area for these algorithms cannot be established at the moment. Furthermore, even the accuracy of the algorithms and their relative performance cannot be established. This is due to the lack of proper benchmark data. This letter first introduces the field of mass spectrometry-based proteomics and then defines the expectations of a well-designed benchmark dataset. Thereafter, the current situation is compared to this ideal. A call for the creation of a proper benchmark dataset is then made, and it is explained how measurements should be performed. Finally, the benefits for the research community are highlighted.

12 citations


Proceedings ArticleDOI
19 Apr 2012
TL;DR: This work designed a standard file format for the representation of de novo sequencing results and developed an application programming interface since it identified the lack of proper APIs as another obstacle, introducing a needlessly high learning curve for developers.
Abstract: Proteomics is the study of the proteins that can be derived from a genome. For the identification and sequencing of proteins, mass spectrometry has become the tool of choice. Within mass spectrometry-based proteomics, proteins can be identified or sequenced by either database search or de novo sequencing. Both methods have certain advantages and drawbacks but in the long run we envision de novo sequencing to become the predominant tool. Currently, de novo sequencing results are stored in arbitrary file formats, depending on the developers of the algorithms. We identified this as a large and unnecessary obstacle while integrating results from multiple de novo sequencing algorithms. Therefore, we designed a standard file format for the representation of de novo sequencing results. We further developed an application programming interface since we identified the lack of proper APIs as another obstacle, introducing a needlessly high learning curve for developers.

2 citations


Proceedings ArticleDOI
19 Apr 2012
TL;DR: A workaround for SeqClean not treating vectors as circular structures has been presented before; this work simplifies the process, provides an implementation, and analyzes the amount of contamination found in the EST sequences available on NCBI.
Abstract: DNA is often sequenced after being cloned into a vector since this provides the possibility for using standard primers and removes the need to develop custom primers. In this way a certain amount of vector is sequenced along with the sequence of interest. Unfortunately, occasionally these contaminating vector sequences find their way into public databases as part of submitted sequences. It has been pointed out that SeqClean, a program used to remove vector contamination from sequences, does not take into account that vectors are circular structures. A workaround has been presented before, but we were able to simplify the process and, additionally, we provide an implementation. We further applied our method to a test set of EST sequences and also analyzed the amount of contamination found in the EST sequences available on NCBI.
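The circularity issue mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the authors' method or the earlier workaround operates, so the following is only an assumed, generic technique and not SeqClean's behavior or the paper's implementation: a linearized vector is extended with its own prefix so that k-mers spanning the arbitrary linearization point are also indexed. The function names, the toy sequences, and the k-mer size are all hypothetical.

```python
def linear_kmers(vector: str, k: int) -> set[str]:
    """k-mers of the vector treated as a linear string (origin-spanning ones are missed)."""
    return {vector[i:i + k] for i in range(len(vector) - k + 1)}


def circular_kmers(vector: str, k: int) -> set[str]:
    """k-mers of the vector treated as a circle, including those spanning the origin."""
    if k > len(vector):
        raise ValueError("k must not exceed the vector length")
    # Appending the first k-1 bases lets windows wrap around the linearization point.
    extended = vector + vector[:k - 1]
    return {extended[i:i + k] for i in range(len(vector))}


def contaminated_positions(read: str, kmers: set[str], k: int) -> list[bool]:
    """Mark every read position covered by at least one exact vector k-mer hit."""
    covered = [False] * len(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in kmers:
            for j in range(i, i + k):
                covered[j] = True
    return covered


if __name__ == "__main__":
    # Toy data (hypothetical): the read starts with vector sequence that
    # crosses the point where the circular vector was linearized.
    vector = "ACGTTGCAAGGCTTAC"
    read = vector[-6:] + vector[:6] + "TTTTTTTTTT"
    k = 8
    print(sum(contaminated_positions(read, linear_kmers(vector, k), k)))
    print(sum(contaminated_positions(read, circular_kmers(vector, k), k)))
```

Exact k-mer matching is used here only to keep the sketch short; real contamination screening relies on alignment that tolerates sequencing errors.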

1 citation