
Showing papers by "Volkan Atalay" published in 2022


Journal Article
TL;DR: This study presents SLPred, an ensemble-based multi-view and multi-label protein subcellular localization prediction tool, and tests its performance against six state-of-the-art methods.
Abstract: Summary: Accurate prediction of the subcellular locations of proteins is a critical topic in protein science. In this study, we present SLPred, an ensemble-based multi-view and multi-label protein subcellular localization prediction tool. For a query protein sequence, SLPred provides predictions for nine main subcellular locations using independent machine learning models trained for each location. We used UniProtKB/Swiss-Prot human protein entries and their curated subcellular location (SL) annotations as our source data. We connected all disjoint terms in the UniProt SL hierarchy based on the corresponding term relationships in the cellular component category of Gene Ontology, and constructed a training dataset that is both reliable and large-scale using the re-organized hierarchy. We tested SLPred on multiple benchmarking datasets, including our in-house sets, and compared its performance against six state-of-the-art methods. Results indicated that SLPred outperforms other tools in the majority of cases. Availability: SLPred is available both as an open-access, user-friendly web server (https://slpred.kansil.org) and as a stand-alone tool (https://github.com/kansil/SLPred). All datasets used in this study are also available at https://slpred.kansil.org. Supplementary information: Supplementary data are available at Bioinformatics online.

1 citation
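
The SLPred abstract above describes independent models per location whose outputs together form a multi-label prediction. A minimal sketch of that general pattern follows; it is not SLPred's implementation. The location names, the toy residue-fraction scorers, the averaging of views, and the 0.5 threshold are all assumptions made for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List

# A "view" scorer maps a protein sequence to a score in [0, 1].
# Real tools would use trained classifiers; these are hypothetical stand-ins.
Scorer = Callable[[str], float]


def fraction_of(residues: str) -> Scorer:
    """Toy scorer: fraction of the sequence made up of the given residues."""
    return lambda seq: sum(seq.count(r) for r in residues) / max(len(seq), 1)


def predict_locations(
    sequence: str,
    models: Dict[str, List[Scorer]],
    threshold: float = 0.5,
) -> List[str]:
    """Return every location whose averaged view score reaches the threshold.

    Each location is decided independently, so a sequence can receive
    zero, one, or several labels (multi-label prediction).
    """
    predicted = []
    for location, view_scorers in models.items():
        score = mean(scorer(sequence) for scorer in view_scorers)
        if score >= threshold:
            predicted.append(location)
    return predicted


# Hypothetical models: two locations, each with two feature "views".
toy_models = {
    "nucleus": [fraction_of("KR"), fraction_of("K")],
    "membrane": [fraction_of("LIV"), fraction_of("L")],
}

print(predict_locations("KKKKRRLL", toy_models))
print(predict_locations("KKLL", toy_models))
```

Because each location has its own decision, adding or retraining one location's model does not affect the others, which is one common reason for choosing this layout.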


Journal Article
TL;DR: Zubaroglu et al. describe EmCStream, a novel method that applies UMAP to evolving (nonstationary) data streams, detects and adapts to concept drift, and clusters embedded data instances using a distance- or partitioning-based clustering algorithm.
Abstract: The number of connected devices is steadily increasing, and this trend is expected to continue in the near future. Connected devices continuously generate data streams, which are often high-dimensional and contain concept drift. Clustering is one of the most suitable methods for real-time data stream processing, since clustering can be applied with less prior information about the data. Data embedding also makes visualization of high-dimensional data possible and may simplify the clustering process. There exist several data stream clustering algorithms in the literature; however, no data stream embedding method exists. Uniform Manifold Approximation and Projection (UMAP) is a data embedding algorithm that is suitable for stationary (stable) data streams, though it cannot adapt to concept drift. In this study, we describe EmCStream, a novel method that applies UMAP to evolving (nonstationary) data streams, detects and adapts to concept drift, and clusters embedded data instances using a distance- or partitioning-based clustering algorithm. We have evaluated EmCStream against the state-of-the-art stream clustering algorithms using both synthetic and real data streams containing concept drift. EmCStream outperforms DenStream and CluStream, in terms of clustering quality, on both synthetic and real evolving data streams. Datasets and code of this study are available online at https://gitlab.com/alaettinzubaroglu/emcstream.
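
The abstract above does not say how drift is detected, so the following is only an illustrative sketch of the general "detect drift, then refit" loop on a windowed stream. It swaps in a simple centroid-shift check in place of EmCStream's actual detector and omits the UMAP embedding and clustering steps entirely. The function names, window size, and threshold are hypothetical.

```python
from statistics import mean
from typing import List, Sequence

Point = Sequence[float]


def centroid(window: Sequence[Point]) -> List[float]:
    """Per-dimension mean of the points in one window."""
    dims = len(window[0])
    return [mean(p[d] for p in window) for d in range(dims)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def drift_points(
    stream: Sequence[Point], window_size: int, threshold: float
) -> List[int]:
    """Return the start index of each window flagged as drifted.

    A window is flagged when its centroid lies more than `threshold`
    away from the reference centroid. The reference is then reset to
    the flagged window, which is where a real system would refit its
    embedding and clustering model.
    """
    reference = None
    flagged = []
    for start in range(0, len(stream) - window_size + 1, window_size):
        current = centroid(stream[start:start + window_size])
        if reference is None:
            reference = current  # first window defines the initial concept
        elif euclidean(reference, current) > threshold:
            flagged.append(start)
            reference = current  # adapt: the new concept becomes the reference
    return flagged


# Toy stream: the data distribution jumps from (0, 0) to (5, 5) at index 4.
toy_stream = [(0, 0)] * 4 + [(5, 5)] * 4
print(drift_points(toy_stream, window_size=2, threshold=1.0))
```

A centroid check only catches shifts in the mean; drift that changes cluster shape or density without moving the mean would need a different test.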