Author

Andrey Zorin

Bio: Andrey Zorin is an academic researcher at the European Bioinformatics Institute. The author has contributed to research on functional genomics and gene expression profiling, has an h-index of 4, and has co-authored 4 publications receiving 425 citations. Previous affiliations of Andrey Zorin include Harvard University.
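
For readers unfamiliar with the metric, the h-index is the largest number h such that h of an author's papers each have at least h citations. The sketch below is illustrative only: the function name `h_index` is hypothetical, and the input is simply the per-paper citation counts displayed in the Papers list on this page, not data retrieved from the indexing service.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Assumption: citation counts as displayed for the four papers listed on this page.
listed_counts = [240, 166, 72, 23]
print(h_index(listed_counts))
```

With only four papers, each cited at least four times, the h-index cannot exceed the number of papers, which is why it equals the publication count here.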

Papers

Journal Article

Gene Expression Atlas at the European Bioinformatics Institute

Misha Kapushesky¹, Ibrahim Emam, Ele Holloway, Pavel Kurnosov, Andrey Zorin, James Malone, Gabriella Rustici, Eleanor Williams, Helen Parkinson, Alvis Brazma

European Bioinformatics Institute¹

01 Jan 2010-Nucleic Acids Research

TL;DR: The Gene Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions.

Abstract: The Gene Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions. The content of this database derives from curation, re-annotation and statistical analysis of selected data from the ArrayExpress Archive of Functional Genomics Data. A simple interface allows the user to query for differential gene expression either (i) by gene names or attributes such as Gene Ontology terms, or (ii) by biological conditions, e.g. diseases, organism parts or cell types. The gene queries return the conditions where expression has been reported, while condition queries return which genes are reported to be expressed in these conditions. A combination of both query types is possible. The query results are ranked using various statistical measures and by how many independent studies in the database show the particular gene-condition association. Currently, the database contains information about more than 200 000 genes from nine species and almost 4500 biological conditions studied in over 30 000 assays from over 1000 independent studies.

240 citations

Journal Article

Gene Expression Atlas update—a value-added database of microarray and sequencing-based functional genomics experiments

Misha Kapushesky¹, Tomasz Adamusiak¹, Tony Burdett¹, Aedín C. Culhane¹, Anna Farne¹, Alexey Filippov¹, Ele Holloway¹, Andrey Klebanov¹, Nataliya Kryvych¹, Natalja Kurbatova¹, Pavel Kurnosov¹, James Malone¹, Olga Melnichuk¹, Robert Petryszak¹, Nikolay Pultsin¹, Gabriella Rustici¹, Andrew Tikhonov¹, Ravensara S. Travillian¹, Eleanor Williams¹, Andrey Zorin¹, Helen Parkinson¹, Alvis Brazma¹

Harvard University¹

01 Jan 2012-Nucleic Acids Research

TL;DR: The Gene Expression Atlas is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions.

Abstract: Gene Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions. The content of this database derives from curation, re-annotation and statistical analysis of selected data from the ArrayExpress Archive and the European Nucleotide Archive. A simple interface allows the user to query for differential gene expression either by gene names or attributes or by biological conditions, e.g. diseases, organism parts or cell types. Since our previous report we have made 20 monthly releases and, as of Release 11.08 (August 2011), the database supports 19 species and contains expression data measured for 19,014 biological conditions in 136,551 assays from 5598 independent studies.

166 citations

Journal Article

Quantifying the impact of public omics data

Yasset Perez-Riverol¹, Andrey Zorin¹, Gaurhari Dass¹, Manh Tu Vu¹, Pan Xu², Mihai Glont¹, Juan Antonio Vizcaíno¹, Andrew F. Jarnuczak¹, Robert Petryszak¹, Peipei Ping³, Henning Hermjakob¹,²

European Bioinformatics Institute¹, Protein Sciences², University of California, Los Angeles³

05 Aug 2019-Nature Communications

TL;DR: A set of metrics to quantify the attention and impact of biomedical datasets is developed and integrated into the framework of the Omics Discovery Index (OmicsDI).

Abstract: The amount of omics data in the public domain is increasing every year. Modern science has become a data-intensive discipline. Innovative solutions for data management, data sharing, and for discovering novel datasets are therefore increasingly required. In 2016, we released the first version of the Omics Discovery Index (OmicsDI) as a light-weight system to aggregate datasets across multiple public omics data resources. OmicsDI aggregates genomics, transcriptomics, proteomics, metabolomics and multiomics datasets, as well as computational models of biological processes. Here, we propose a set of novel metrics to quantify the attention and impact of biomedical datasets. A complete framework (now integrated into OmicsDI) has been implemented in order to provide and evaluate those metrics. Finally, we propose a set of recommendations for authors, journals and data resources to promote an optimal quantification of the impact of datasets. Increasing amounts of public omics data are important and valuable resources for the research community. Here, the authors develop a set of metrics to quantify the attention and impact of biomedical datasets and integrate them into the framework of Omics Discovery Index (OmicsDI).

72 citations

Posted Content

Quantifying the impact of public omics data

Yasset Perez-Riverol¹, Andrey Zorin¹, Gaurhari Dass¹, Mihai Glont¹, Juan Antonio Vizcaíno¹, Andrew F. Jarnuczak¹, Robert Petryszak¹, Peipei Ping², Henning Hermjakob¹

European Bioinformatics Institute¹, University of California, Los Angeles²

14 Mar 2018-bioRxiv

TL;DR: The FAIR principles have been developed to promote good scientific practises for scientific data and data resources and put a specific emphasis on enhancing the ability of both individuals and software to discover and re-use digital objects in an automated fashion throughout their entire life cycle.

Abstract: The amount of omics data in the public domain is increasing every year. Public availability of datasets is growing in all disciplines, because it is considered to be a good scientific practice (e.g. to enable reproducibility), and/or it is mandated by funding agencies and scientific journals. Science is now a data-intensive discipline and therefore, new and innovative ways for data management, data sharing, and for discovering novel datasets are increasingly required. In 2016, we released the first version of the Omics Discovery Index (www.omicsdi.org) as a light-weight system to aggregate datasets across multiple public omics data resources. OmicsDI integrates genomics, transcriptomics, proteomics, metabolomics, and multi-omics datasets, as well as computational models of biological processes. Here, we propose a set of novel metrics to quantify the impact of biomedical datasets. A complete framework (now integrated into OmicsDI) has been implemented in order to provide and evaluate those metrics. Finally, we propose a set of recommendations for authors, journals, and data resources to promote an optimal quantification of the impact of datasets.

23 citations

Cited by

Journal Article

The Reactome Pathway Knowledgebase.

Antonio Fabregat¹, Konstantinos Sidiropoulos¹, Phani V. Garapati¹, Marc Gillespie²,³, Kerstin Hausmann¹, Robin Haw², Bijay Jassal², S Jupe¹, Florian Korninger¹, Sheldon J. McKay², Lisa Matthews⁴, Bruce May², Marija Milacic², Karen Rothfels², Veronica Shamovsky⁴, Marissa Webber², Joel Weiser², Mark Williams¹, Guanming Wu², Lincoln Stein²,⁵,⁶, Henning Hermjakob¹,⁷, Peter D'Eustachio⁴

European Bioinformatics Institute¹, Ontario Institute for Cancer Research², St. John's University³, New York University⁴, University of Toronto⁵, Cold Spring Harbor Laboratory⁶, Protein Sciences⁷

01 Jan 2014-Nucleic Acids Research

TL;DR: The Reactome Knowledgebase provides molecular details of signal transduction, transport, DNA replication, metabolism and other cellular processes as an ordered network of molecular transformations—an extended version of a classic metabolic map, in a single consistent data model.

Abstract: The Reactome Knowledgebase (www.reactome.org) provides molecular details of signal transduction, transport, DNA replication, metabolism and other cellular processes as an ordered network of molecular transformations, an extended version of a classic metabolic map, in a single consistent data model. Reactome functions both as an archive of biological processes and as a tool for discovering unexpected functional relationships in data such as gene expression pattern surveys or somatic mutation catalogues from tumour cells. Over the last two years we redeveloped major components of the Reactome web interface to improve usability, responsiveness and data visualization. A new pathway diagram viewer provides a faster, clearer interface and smooth zooming from the entire reaction network to the details of individual reactions. Tool performance for analysis of user datasets has been substantially improved, now generating detailed results for genome-wide expression datasets within seconds. The analysis module can now be accessed through a RESTful interface, facilitating its inclusion in third party applications. A new overview module allows the visualization of analysis results on a genome-wide Reactome pathway hierarchy using a single screen page. The search interface now provides auto-completion as well as a faceted search to narrow result lists efficiently.

5,065 citations

Journal Article

Analysis Tool Web Services from the EMBL-EBI

Hamish McWilliam¹, Weizhong Li¹, Mahmut Uludag¹, Silvano Squizzato¹, Youngmi Park¹, Nicola Buso¹, Andrew Peter Cowley¹, Rodrigo Lopez¹

European Bioinformatics Institute¹

01 Jul 2013-Nucleic Acids Research

TL;DR: Since 2004 the European Bioinformatics Institute (EMBL-EBI) has provided access to a wide range of databases and analysis tools via Web Services interfaces, which allow their integration into other tools, applications, web sites, pipeline processes and analytical workflows.

Abstract: Since 2004 the European Bioinformatics Institute (EMBL-EBI) has provided access to a wide range of databases and analysis tools via Web Services interfaces. This comprises services to search across the databases available from the EMBL-EBI and to explore the network of cross-references present in the data (e.g. EB-eye), services to retrieve entry data in various data formats and to access the data in specific fields (e.g. dbfetch), and analysis tool services, for example, sequence similarity search (e.g. FASTA and NCBI BLAST), multiple sequence alignment (e.g. Clustal Omega and MUSCLE), pairwise sequence alignment and protein functional analysis (e.g. InterProScan and Phobius). The REST/SOAP Web Services (http://www.ebi.ac.uk/Tools/webservices/) interfaces to these databases and tools allow their integration into other tools, applications, web sites, pipeline processes and analytical workflows. To get users started using the Web Services, sample clients are provided covering a range of programming languages and popular Web Service tool kits, and a brief guide to Web Services technologies, including a set of tutorials, is available for those wishing to learn more and develop their own clients. Users of the Web Services are informed of improvements and updates via a range of methods.

1,562 citations

Journal Article

The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences.

Yasset Perez-Riverol¹, Jingwen Bai¹, Chakradhar Bandla¹, David García-Seisdedos¹, Suresh Hewapathirana¹, Selvakumar Kamatchinathan¹, Deepti J. Kundu¹, Ananth Prakash¹, Anika Frericks-Zipper², Martin Eisenacher², Mathias Walzer¹, Shengbo Wang¹, Alvis Brazma¹, Juan Antonio Vizcaíno¹

European Bioinformatics Institute¹, Ruhr University Bochum²

01 Nov 2021-Nucleic Acids Research

TL;DR: The PRIDE database is the world's largest data repository of mass spectrometry-based proteomics data and one of the founding members of the global ProteomeXchange (PX) consortium.

Abstract: The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. The number of submitted datasets to PRIDE Archive (the archival component of PRIDE) has reached on average around 500 datasets per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. As a key point, the file format MAGE-TAB for proteomics has been developed to enable the improvement of sample metadata annotation. Additionally, the resource PRIDE Peptidome provides access to aggregated peptide/protein evidences across PRIDE Archive. Furthermore, we will describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl and Expression Atlas.

1,491 citations

Journal Article

The Human Urine Metabolome

Souhaila Bouatra¹, Farid Aziat¹, Rupasri Mandal¹, An Chi Guo¹, Michael Wilson¹, Craig Knox¹, Trent C. Bjorndahl¹, Ramanarayan Krishnamurthy¹, Fozia Saleem¹, Philip B. Liu¹, Zerihun T. Dame¹, Jenna Poelzer¹, Jessica Huynh¹, Faizath S. Yallou¹, Nick Psychogios², Edison Dong¹, Ralf Bogumil³, Cornelia Roehring³, David S. Wishart¹,⁴

University of Alberta¹, Harvard University², Biocrates Life Sciences AG³, National Institute for Nanotechnology⁴

04 Sep 2013-PLOS ONE

TL;DR: A comprehensive, quantitative, metabolome-wide characterization of human urine is undertaken, identifying several previously unknown urine metabolites and substantially enhancing the level of metabolome coverage.

Abstract: Urine has long been a “favored” biofluid among metabolomics researchers. It is sterile, easy-to-obtain in large volumes, largely free from interfering proteins or lipids and chemically complex. However, this chemical complexity has also made urine a particularly difficult substrate to fully understand. As a biological waste material, urine typically contains metabolic breakdown products from a wide range of foods, drinks, drugs, environmental contaminants, endogenous waste metabolites and bacterial by-products. Many of these compounds are poorly characterized and poorly understood. In an effort to improve our understanding of this biofluid we have undertaken a comprehensive, quantitative, metabolome-wide characterization of human urine. This involved both computer-aided literature mining and comprehensive, quantitative experimental assessment/validation. The experimental portion employed NMR spectroscopy, gas chromatography mass spectrometry (GC-MS), direct flow injection mass spectrometry (DFI/LC-MS/MS), inductively coupled plasma mass spectrometry (ICP-MS) and high performance liquid chromatography (HPLC) experiments performed on multiple human urine samples. This multi-platform metabolomic analysis allowed us to identify 445 and quantify 378 unique urine metabolites or metabolite species. The different analytical platforms were able to identify (quantify) a total of: 209 (209) by NMR, 179 (85) by GC-MS, 127 (127) by DFI/LC-MS/MS, 40 (40) by ICP-MS and 10 (10) by HPLC. Our use of multiple metabolomics platforms and technologies allowed us to identify several previously unknown urine metabolites and to substantially enhance the level of metabolome coverage. It also allowed us to critically assess the relative strengths and weaknesses of different platforms or technologies. The literature review led to the identification and annotation of another 2206 urinary compounds and was used to help guide the subsequent experimental studies. 
An online database containing the complete set of 2651 confirmed human urine metabolite species, their structures (3079 in total), concentrations, related literature references and links to their known disease associations is freely available at http://www.urinemetabolome.ca.

1,118 citations

Journal Article

DREME: motif discovery in transcription factor ChIP-seq data

Timothy L. Bailey¹

University of Queensland¹

15 Jun 2011-Bioinformatics

TL;DR: DREME is much faster than many commonly used algorithms, scales linearly in dataset size, finds multiple, non-redundant motifs and reports a reliable measure of statistical significance for each motif found.

Abstract: Motivation: Transcription factor (TF) ChIP-seq datasets have particular characteristics that provide unique challenges and opportunities for motif discovery. Most existing motif discovery algorithms do not scale well to such large datasets, or fail to report many motifs associated with cofactors of the ChIP-ed TF. Results: We present DREME, a motif discovery algorithm specifically designed to find the short, core DNA-binding motifs of eukaryotic TFs, and optimized to analyze very large ChIP-seq datasets in minutes. Using DREME, we discover the binding motifs of the ChIP-ed TF and many cofactors in mouse ES cell (mESC), mouse erythrocyte and human cell line ChIP-seq datasets. For example, in mESC ChIP-seq data for the TF Esrrb, we discover the binding motifs for eight cofactor TFs important in the maintenance of pluripotency. Several other commonly used algorithms find at most two cofactor motifs in this same dataset. DREME can also perform discriminative motif discovery, and we use this feature to provide evidence that Sox2 and Oct4 do not bind in mES cells as an obligate heterodimer. DREME is much faster than many commonly used algorithms, scales linearly in dataset size, finds multiple, non-redundant motifs and reports a reliable measure of statistical significance for each motif found. DREME is available as part of the MEME Suite of motif-based sequence analysis tools (http://meme.nbcr.net).

963 citations
