Home
/
Authors
/
Merja Oja

Author

Merja Oja

VTT Technical Research Centre of Finland

Other affiliations: Helsinki University of Technology, SK Group, University of Helsinki ...read more

Bio: Merja Oja is an academic researcher from VTT Technical Research Centre of Finland. The author has contributed to research in topics: Trichoderma reesei & Saccharomyces cerevisiae. The author has an hindex of 15, co-authored 30 publications receiving 1296 citations. Previous affiliations of Merja Oja include Helsinki University of Technology & SK Group.

Papers

PDF

Open Access

More filters

Bibliography of Self-Organizing Map SOM) Papers: 1998-2001 Addendum

[...]

Merja Oja, Samuel Kaski, Teuvo Kohonen

01 Jan 2003

TL;DR: This work has provided a keyword index to help finding articles of interest, and additionally a modern automatically constructed variant of a thematic index: a WEBSOM interface to the whole article collection of years 1981-2000.

...read moreread less

Abstract: The Self-Organizing Map (SOM) algorithm has attracted a great deal of interest among researches and practitioners in a wide variety of fields. The SOM has been analyzed extensively, a number of variants have been developed and, perhaps most notably, it has been applied extensively within fields ranging from engineering sciences to medicine, biology, and economics. We have collected a comprehensive list of 5384 scientific papers that use the algorithms, have benefited from them, or contain analyses of them. The list is intended to serve as a source for literature surveys. The present addendum contains 2092 new articles, mainly from the years 1998-2002. We have provided a keyword index to help finding articles of interest, and additionally a modern automatically constructed variant of a thematic index: a WEBSOM interface to the whole article collection of years 1981-2000. The SOM of SOMs is available at http://websom.hut.fi/websom/somref/search.cgi for browsing and searching the collection.

...read moreread less

402 citations

Journal Article•DOI•

Re-annotation of the CAZy genes of Trichoderma reesei and transcription in the presence of lignocellulosic substrates.

[...]

Mari Häkkinen¹, Mikko Arvas¹, Merja Oja¹, Nina Aro¹, Merja Penttilä¹, Markku Saloheimo¹, Tiina Pakula¹ - Show less +3 more•Institutions (1)

VTT Technical Research Centre of Finland¹

04 Oct 2012-Microbial Cell Factories

TL;DR: Comparison of the expression profiles of the CAZyme genes under the different conditions identified co-regulated groups of genes, suggesting common regulatory mechanisms for the gene groups.

...read moreread less

Abstract: Trichoderma reesei is a soft rot Ascomycota fungus utilised for industrial production of secreted enzymes, especially lignocellulose degrading enzymes. About 30 carbohydrate active enzymes (CAZymes) of T. reesei have been biochemically characterised. Genome sequencing has revealed a large number of novel candidates for CAZymes, thus increasing the potential for identification of enzymes with novel activities and properties. Plenty of data exists on the carbon source dependent regulation of the characterised hydrolytic genes. However, information on the expression of the novel CAZyme genes, especially on complex biomass material, is very limited. In this study, the CAZyme gene content of the T. reesei genome was updated and the annotations of the genes refined using both computational and manual approaches. Phylogenetic analysis was done to assist the annotation and to identify functionally diversified CAZymes. The analyses identified 201 glycoside hydrolase genes, 22 carbohydrate esterase genes and five polysaccharide lyase genes. Updated or novel functional predictions were assigned to 44 genes, and the phylogenetic analysis indicated further functional diversification within enzyme families or groups of enzymes. GH3 β-glucosidases, GH27 α-galactosidases and GH18 chitinases were especially functionally diverse. The expression of the lignocellulose degrading enzyme system of T. reesei was studied by cultivating the fungus in the presence of different inducing substrates and by subjecting the cultures to transcriptional profiling. The substrates included both defined and complex lignocellulose related materials, such as pretreated bagasse, wheat straw, spruce, xylan, Avicel cellulose and sophorose. The analysis revealed co-regulated groups of CAZyme genes, such as genes induced in all the conditions studied and also genes induced preferentially by a certain set of substrates. In this study, the CAZyme content of the T. reesei genome was updated, the discrepancies between the different genome versions and published literature were removed and the annotation of many of the genes was refined. Expression analysis of the genes gave information on the enzyme activities potentially induced by the presence of the different substrates. Comparison of the expression profiles of the CAZyme genes under the different conditions identified co-regulated groups of genes, suggesting common regulatory mechanisms for the gene groups.

...read moreread less

160 citations

Journal Article•DOI•

Trustworthiness and metrics in visualizing similarity of gene expression.

[...]

Samuel Kaski¹, Janne Nikkilä¹, Merja Oja¹, Jarkko Venna¹, Petri Törönen², Eero Castrén³, Eero Castrén² - Show less +3 more•Institutions (3)

Helsinki University of Technology¹, University of Eastern Finland², University of Helsinki³

13 Oct 2003-BMC Bioinformatics

TL;DR: The conjecture from the methodological results is that the self-organizing map can be recommended to complement the usual hierarchical clustering for visualizing and exploring gene expression data.

...read moreread less

Abstract: Background: Conventionally, the first step in analyzing the large and high-dimensional data sets measured by microarrays is visual exploration. Dendrograms of hierarchical clustering, selforganizing maps (SOMs), and multidimensional scaling have been used to visualize similarity relationships of data samples. We address two central properties of the methods: (i) Are the visualizations trustworthy, i.e., if two samples are visualized to be similar, are they really similar? (ii) The metric. The measure of similarity determines the result; we propose using a new learning metrics principle to derive a metric from interrelationships among data sets. Results: The trustworthiness of hierarchical clustering, multidimensional scaling, and the selforganizing map were compared in visualizing similarity relationships among gene expression profiles. The self-organizing map was the best except that hierarchical clustering was the most trustworthy for the most similar profiles. Trustworthiness can be further increased by treating separately those genes for which the visualization is least trustworthy. We then proceed to improve the metric. The distance measure between the expression profiles is adjusted to measure differences relevant to functional classes of the genes. The genes for which the new metric is the most different from the usual correlation metric are listed and visualized with one of the visualization methods, the self-organizing map, computed in the new metric. Conclusions: The conjecture from the methodological results is that the self-organizing map can be recommended to complement the usual hierarchical clustering for visualizing and exploring gene expression data. Discarding the least trustworthy samples and improving the metric still improves it.

...read moreread less

151 citations

Journal Article•DOI•

Comparative Genome-Scale Reconstruction of Gapless Metabolic Networks for Present and Ancestral Species

[...]

Esa Pitkänen¹, Paula Jouhten², Jian Hou¹, Muhammad Fahad Syed², Peter Blomberg², Jana Kludas³, Merja Oja², Liisa Holm¹, Merja Penttilä², Juho Rousu³, Mikko Arvas² - Show less +7 more•Institutions (3)

University of Helsinki¹, VTT Technical Research Centre of Finland², Aalto University³

06 Feb 2014-PLOS Computational Biology

TL;DR: This work introduces a novel computational approach, CoReCo, for comparative metabolic reconstruction and provides genome-scale metabolic network models for 49 important fungal species, demonstrating the functionality and usability of the reconstructed fungal models with computational steady-state biomass production experiment.

...read moreread less

Abstract: We introduce a novel computational approach, CoReCo, for comparative metabolic reconstruction and provide genome-scale metabolic network models for 49 important fungal species. Leveraging on the exponential growth in sequenced genome availability, our method reconstructs genome-scale gapless metabolic networks simultaneously for a large number of species by integrating sequence data in a probabilistic framework. High reconstruction accuracy is demonstrated by comparisons to the well-curated Saccharomyces cerevisiae consensus model and large-scale knock-out experiments. Our comparative approach is particularly useful in scenarios where the quality of available sequence data is lacking, and when reconstructing evolutionary distant species. Moreover, the reconstructed networks are fully carbon mapped, allowing their use in 13C flux analysis. We demonstrate the functionality and usability of the reconstructed fungal models with computational steady-state biomass production experiment, as these fungi include some of the most important production organisms in industrial biotechnology. In contrast to many existing reconstruction techniques, only minimal manual effort is required before the reconstructed models are usable in flux balance experiments. CoReCo is available at http://esaskar.github.io/CoReCo/.

...read moreread less

113 citations

Journal Article•DOI•

Array comparative genomic hybridization analysis of Trichoderma reesei strains with enhanced cellulase production properties

[...]

Marika Vitikainen¹, Mikko Arvas¹, Tiina Pakula¹, Merja Oja¹, Merja Penttilä¹, Markku Saloheimo¹ - Show less +2 more•Institutions (1)

VTT Technical Research Centre of Finland¹

19 Jul 2010-BMC Genomics

TL;DR: This study characterized genomic alterations in high-producing mutants of T. reesei by high-resolution array comparative genomic hybridization to obtain genome-wide information which could be utilized for better understanding of the mechanisms underlying efficient cellulase production, and would enable targeted genetic engineering for improved production of proteins in general.

...read moreread less

Abstract: Background: Trichoderma reesei is the main industrial producer of cellulases and hemicellulases that are used to depolymerize biomass in a variety of biotechnical applications. Many of the production strains currently in use have been generated by classical mutagenesis. In this study we characterized genomic alterations in high-producing mutants of T. reesei by high-resolution array comparative genomic hybridization (aCGH). Our aim was to obtain genome-wide information which could be utilized for better understanding of the mechanisms underlying efficient cellulase production, and would enable targeted genetic engineering for improved production of proteins in general. Results: We carried out an aCGH analysis of four high-producing strains (QM9123, QM9414, NG14 and Rut-C30) using the natural isolate QM6a as a reference. In QM9123 and QM9414 we detected a total of 44 previously undocumented mutation sites including deletions, chromosomal translocation breakpoints and single nucleotide mutations. In NG14 and Rut-C30 we detected 126 mutations of which 17 were new mutations not documented previously. Among these new mutations are the first chromosomal translocation breakpoints identified in NG14 and Rut-C30. We studied the effects of two deletions identified in Rut-C30 (a deletion of 85 kb in the scaffold 15 and a deletion in a gene encoding a transcription factor) on cellulase production by constructing knock-out strains in the QM6a background. Neither the 85 kb deletion nor the deletion of the transcription factor affected cellulase production. Conclusions: aCGH analysis identified dozens of mutations in each strain analyzed. The resolution was at the level of single nucleotide mutation. High-density aCGH is a powerful tool for genome-wide analysis of organisms with small genomes e.g. fungi, especially in studies where a large set of interesting strains is analyzed.

...read moreread less

95 citations

1
2
3
4
…
5
6

Collapse

Cited by

PDF

Open Access

More filters

The Self-Organizing Map

[...]

Teuvo Kohonen¹•Institutions (1)

Helsinki University of Technology¹

01 Jan 1990

TL;DR: An overview of the self-organizing map algorithm, on which the papers in this issue are based, is presented in this article, where the authors present an overview of their work.

...read moreread less

Abstract: An overview of the self-organizing map algorithm, on which the papers in this issue are based, is presented in this article.

...read moreread less

2,933 citations

Automated Eukaryotic Gene Structure Annotation Using EVidenceModeler and the Program to Assemble Spliced Alignments

[...]

Brian J. Haas, Steven L. Salzberg, Wei Zhu, Mihaela Pertea, Jonathan E. Allen, Joshua Orvis, Owen White, C R Buell, Jennifer R. Wortman - Show less +5 more

10 Dec 2007

TL;DR: The experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.

...read moreread less

Abstract: EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.

...read moreread less

1,528 citations

Journal Article•DOI•

Opportunities and obstacles for deep learning in biology and medicine.

[...]

Travers Ching¹, Daniel Himmelstein², Brett K. Beaulieu-Jones², Alexandr A. Kalinin³, Brian T. Do⁴, Gregory P. Way², Enrico Ferrero⁵, Paul-Michael Agapow⁶, Michael Zietz², Michael M. Hoffman⁷, Michael M. Hoffman⁸, Wei Xie⁹, Gail L. Rosen¹⁰, Benjamin J. Lengerich¹¹, Johnny Israeli¹², Jack Lanchantin¹³, Stephen Woloszynek¹⁰, Anne E. Carpenter¹⁴, Avanti Shrikumar¹², Jinbo Xu¹⁵, Evan M. Cofer¹⁶, Evan M. Cofer¹⁷, Christopher A. Lavender¹⁸, Srinivas C. Turaga¹⁹, Amr Alexandari¹², Zhiyong Lu¹⁸, David J. Harris²⁰, Dave DeCaprio, Yanjun Qi¹³, Anshul Kundaje¹², Yifan Peng¹⁸, Laura K. Wiley²¹, Marwin H. S. Segler²², Simina M. Boca²³, S. Joshua Swamidass²⁴, Austin Huang²⁵, Anthony Gitter²⁶, Anthony Gitter²⁷, Casey S. Greene² - Show less +35 more•Institutions (27)

University of Hawaii at Manoa¹, University of Pennsylvania², University of Michigan³, Harvard University⁴, GlaxoSmithKline⁵, Imperial College London⁶, University of Toronto⁷, Princess Margaret Cancer Centre⁸, Vanderbilt University⁹, Drexel University¹⁰, Carnegie Mellon University¹¹, Stanford University¹², University of Virginia¹³, Broad Institute¹⁴, Toyota Technological Institute at Chicago¹⁵, Princeton University¹⁶, Trinity University¹⁷, National Institutes of Health¹⁸, Howard Hughes Medical Institute¹⁹, University of Florida²⁰, University of Colorado Denver²¹, University of Münster²², Georgetown University Medical Center²³, Washington University in St. Louis²⁴, Brown University²⁵, University of Wisconsin-Madison²⁶, Morgridge Institute for Research²⁷

01 Apr 2018-Journal of the Royal Society Interface

TL;DR: It is found that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art.

...read moreread less

Abstract: Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems-patient classification, fundamental biological processes and treatment of patients-and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.

...read moreread less

1,491 citations

Journal Article•DOI•

Essentials of the self-organizing map

[...]

Teuvo Kohonen¹•Institutions (1)

Aalto University¹

01 Jan 2013-Neural Networks

TL;DR: The self-organizing map (SOM) is an automatic data-analysis method widely applied to clustering problems and data exploration in industry, finance, natural sciences, and linguistics and can be found in the management of massive textual databases and in bioinformatics.

...read moreread less

1,079 citations

Computational Analysis of Microarray Data

[...]

Partha S. Vasisht

01 Jan 2003

898 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse