Author

Sule Gunduz Oguducu

Other affiliations: Istanbul University
Bio: Sule Gunduz Oguducu is an academic researcher from Istanbul Technical University. The author has contributed to research in topics: Recommender system & Collaborative filtering. The author has an h-index of 11 and has co-authored 57 publications receiving 409 citations. Previous affiliations of Sule Gunduz Oguducu include Istanbul University.


Papers
Proceedings ArticleDOI
23 Oct 2009
TL;DR: A new method for document similarity based on cosine similarity between concept vectors obtained from a taxonomy of words capturing IS-A relations, which results in faster computation.
Abstract: In this paper, we present a new method for calculating semantic similarities between documents. This method is based on cosine similarity calculation between concept vectors of documents obtained from a taxonomy of words that captures IS-A relations. The calculation of semantic similarities between documents is a very time consuming task, since it is necessary first to calculate semantic similarities between each pair of words that appear in different documents. In this paper, we present a new method to calculate semantic similarities between documents which results in faster computational time. Both a taxonomy-based semantic similarity and cosine similarity are employed. First, the concept vectors of documents are obtained by extending the terms in the document vectors with their corresponding IS-A concepts. Cosine similarity is then calculated between those concept vectors of documents. Thus, the overall similarity between documents is a combination of cosine similarity and semantic similarity. The proposed semantic similarity is tested on the document clustering problem. The experimental results show that our method achieves good performance.
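To make the idea concrete, here is a minimal sketch (not the authors' implementation) of cosine similarity between concept vectors: each document's terms are extended with ancestor concepts from a toy IS-A taxonomy, which is entirely hypothetical and stands in for the word taxonomy used in the paper.

```python
import math
from collections import Counter

# Toy IS-A taxonomy (hypothetical): each term maps to its ancestor concepts.
IS_A = {
    "dog": ["canine", "mammal", "animal"],
    "cat": ["feline", "mammal", "animal"],
    "car": ["vehicle", "artifact"],
}

def concept_vector(terms):
    """Extend a document's terms with their IS-A ancestor concepts."""
    counts = Counter(terms)
    for term in terms:
        counts.update(IS_A.get(term, []))
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

doc_a = concept_vector(["dog", "dog", "cat"])
doc_b = concept_vector(["cat", "car"])
print(cosine(doc_a, doc_b))  # similarity boosted by shared "mammal"/"animal" concepts
```

Because the concept extension is done once per document, only one cosine computation per document pair is needed, which is where the speed-up over pairwise word-to-word semantic comparisons comes from.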

39 citations

Journal ArticleDOI
TL;DR: This paper proposes a novel framework that examines various structural features of the network and detects the most prominent subset of community features to predict the future direction of community evolution; it requires extracting only a minimal number of community features to determine effectively whether a community will remain stable or undergo events such as shrinking, merging or splitting.

37 citations

Journal ArticleDOI
TL;DR: Experiments demonstrate that the proposed multivariate method using NARX outperforms the previous temporal methods using univariate time series in different test cases.
Abstract: In this article, we propose a novel multivariate method for link prediction in evolving heterogeneous networks using a Nonlinear Autoregressive Neural Network with External Inputs (NARX). The proposed method combines (1) correlations between different link types; (2) the effects of different topological local and global similarity measures in different time periods; (3) nonlinear temporal evolution information; (4) the effects of the creation, preservation or removal of the links between the node pairs in consecutive time periods. We evaluate the performance of link prediction in terms of different AUC measures. Experiments on real networks demonstrate that the proposed multivariate method using NARX outperforms the previous temporal methods using univariate time series in different test cases.
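The following sketch illustrates the general NARX idea on synthetic data: the next value of a link-score series is regressed on its own lagged values plus lagged exogenous topological features. The feature choice, lag count and the use of scikit-learn's MLPRegressor as a stand-in for a NARX network are all assumptions made for illustration, not the paper's setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic per-node-pair time series: target link score y_t plus two
# exogenous topological features (e.g., common neighbours, Adamic-Adar).
T = 200
x = rng.random((T, 2))
y = 0.6 * x[:, 0] + 0.3 * x[:, 1] + 0.1 * rng.random(T)

LAGS = 3  # number of past time steps fed to the network

# Build NARX-style training rows: [y_{t-1..t-L}, x_{t-1..t-L}] -> y_t
X_rows, y_rows = [], []
for t in range(LAGS, T):
    past_y = y[t - LAGS:t]
    past_x = x[t - LAGS:t].ravel()
    X_rows.append(np.concatenate([past_y, past_x]))
    y_rows.append(y[t])

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(np.array(X_rows[:-20]), np.array(y_rows[:-20]))
print("held-out R^2:", model.score(np.array(X_rows[-20:]), np.array(y_rows[-20:])))
```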

36 citations

Journal ArticleDOI
TL;DR: A novel method, called Multivariate Time Series Link Prediction, for evolving heterogeneous networks that incorporates temporal evolution of the network; correlations between link evolution and multi-typed relationships; local and global similarity measures; and node connectivity information.
Abstract: Link prediction is considered as one of the key tasks in various data mining applications for recommendation systems, bioinformatics, security and worldwide web. The majority of previous works in l...

34 citations

Proceedings ArticleDOI
25 Aug 2015
TL;DR: This paper proposes a new approach for predicting events by estimating feature values related to the communities in a given network; event prediction using the forecasted feature values substantially matches actual events.
Abstract: Communities in real life are usually dynamic and community structures evolve over time. Detecting community evolution provides insight into the underlying behavior of the network. A growing body of work is devoted to studying the dynamics of communities in evolving social networks. Most of these works provide an event-based framework to characterize and track community evolution. Some of these studies take a step further and provide a predictive model of the events by exploiting community features. However, the proposed models require extracting communities and computing the community features for the time point to be predicted. In this paper, we propose a new approach for predicting events by estimating feature values related to the communities in a given network. An event-based framework is used to characterize community behavior patterns. Then, a time series ARIMA model is used to predict how particular community features will change in the following time period. Distinct time windows are examined when constructing and analyzing the time series. Our proposed approach efficiently tracks similar communities and identifies events over time. Furthermore, community feature values are forecasted with an acceptable error rate. Event prediction using the forecasted feature values substantially matches actual events.
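As a rough illustration of the forecasting step, the sketch below fits a statsmodels ARIMA model to a synthetic community-feature series (here, community size) and forecasts the next snapshot; the ARIMA order and the feature itself are placeholders rather than the authors' settings.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic time series of one community feature (e.g., community size)
# observed over consecutive snapshots of the network.
rng = np.random.default_rng(1)
sizes = 50 + np.cumsum(rng.normal(0.5, 2.0, size=30))

# Fit an ARIMA model on the historical feature values ...
model = ARIMA(sizes, order=(1, 1, 1)).fit()

# ... and forecast the feature for the next snapshot; the forecasted
# values would then feed an event classifier (grow/shrink/merge/split).
next_value = model.forecast(steps=1)
print("forecasted community size:", float(next_value[0]))
```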

33 citations


Cited by
Journal ArticleDOI
TL;DR: This two-part paper has surveyed different multiobjective evolutionary algorithms for clustering, association rule mining, and several other data mining tasks, and provided a general discussion on the scopes for future research in this domain.
Abstract: The aim of any data mining technique is to build an efficient predictive or descriptive model of a large amount of data. Applications of evolutionary algorithms have been found to be particularly useful for automatic processing of large quantities of raw noisy data for optimal parameter setting and to discover significant and meaningful information. Many real-life data mining problems involve multiple conflicting measures of performance, or objectives, which need to be optimized simultaneously. Under this context, multiobjective evolutionary algorithms are gradually finding more and more applications in the domain of data mining since the beginning of the last decade. In this two-part paper, we have made a comprehensive survey on the recent developments of multiobjective evolutionary algorithms for data mining problems. In this paper, Part I, some basic concepts related to multiobjective optimization and data mining are provided. Subsequently, various multiobjective evolutionary approaches for two major data mining tasks, namely feature selection and classification, are surveyed. In Part II of this paper, we have surveyed different multiobjective evolutionary algorithms for clustering, association rule mining, and several other data mining tasks, and provided a general discussion on the scopes for future research in this domain.
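The notion of simultaneously optimizing conflicting objectives that underlies this survey can be illustrated with a tiny Pareto-dominance check, the building block of most multiobjective evolutionary algorithms; the two objectives below (feature-subset size versus classification error) are purely illustrative and not taken from the paper.

```python
def dominates(a, b):
    """True if solution a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Return the non-dominated solutions from a population."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# Each solution: (number of selected features, classification error) - both minimized.
population = [(3, 0.12), (5, 0.10), (3, 0.15), (8, 0.09), (5, 0.14)]
print(pareto_front(population))  # [(3, 0.12), (5, 0.10), (8, 0.09)]
```

A multiobjective evolutionary algorithm such as NSGA-II repeatedly applies this kind of dominance ranking to evolve a whole front of trade-off solutions instead of a single optimum.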

406 citations

01 Jan 2012
TL;DR: This paper identifies the difference between the types of condition issues on objects that cause structural problems and those that are purely cosmetic, which can help when determining whether an item requires conservation or deciding simple factors like the best way to store or exhibit it.
Abstract: As preservers of history and the objects that others leave behind, it is easy to get distracted by a desire for those objects to be “perfect.” We worry about every little tear, stain or blemish, and we want nothing more than to return objects to their "original state." However, so many times those rough spots are part of the item’s history. It is important to understand the difference between the types of condition issues on objects that cause structural issues versus those that are purely cosmetic. Knowing and identifying these differences can help when determining whether or not an item requires conservation or deciding simple factors like the best way to store or exhibit an item.

347 citations

Journal Article
TL;DR: In this article, a bipartite graph based data clustering method is proposed, where terms and documents are simultaneously grouped into semantically meaningful co-clusters.
Abstract: Bipartite Graph Partitioning and Data Clustering (Zha, He, Ding, Simon and Gu). Many data types arising from data mining applications can be modeled as bipartite graphs; examples include terms and documents in a text corpus, customers and purchased items in market basket analysis, and reviewers and movies in a movie recommender system. In this paper, we propose a new data clustering method based on partitioning the underlying bipartite graph. The partition is constructed by minimizing a normalized sum of edge weights between unmatched pairs of vertices of the bipartite graph. We show that an approximate solution to the minimization problem can be obtained by computing a partial singular value decomposition (SVD) of the associated edge weight matrix of the bipartite graph. We point out the connection of our clustering algorithm to correspondence analysis used in multivariate analysis. We also briefly discuss the issue of assigning data objects to multiple clusters. In the experimental results, we apply our clustering algorithm to the problem of document clustering to illustrate its effectiveness and efficiency.
Cluster analysis is an important tool for exploratory data mining applications arising from many diverse disciplines. Informally, cluster analysis seeks to partition a given data set into compact clusters so that data objects within a cluster are more similar to each other than to those in distinct clusters. Many traditional clustering algorithms are based on the assumption that the given dataset consists of covariate information (or attributes) for each individual data object, and cluster analysis can be cast as a problem of grouping a set of n-dimensional vectors, each representing a data object in the dataset. A familiar example is document clustering using the vector space model: each document is represented by an n-dimensional vector, and each coordinate of the vector corresponds to a term in a vocabulary of size n. This formulation leads to the so-called term-document matrix A = (a_ij) for the representation of the collection of documents, where a_ij is the term frequency, i.e., the number of times term i occurs in document j. In this vector space model, terms and documents are treated asymmetrically, with terms considered as the covariates or attributes of documents. It is also possible to treat both terms and documents as first-class citizens in a symmetric fashion and consider a_ij as the frequency of co-occurrence of term i and document j, as is done, for example, in probabilistic latent semantic indexing. In this paper, we follow this basic principle and propose a new approach that models terms and documents as vertices in a bipartite graph, with edges indicating the co-occurrence of terms and documents; edge weights can optionally indicate the frequency of this co-occurrence.
Cluster analysis for document collections in this context is based on a very intuitive notion: documents are grouped by topics. On one hand, documents on a topic tend to heavily use the same subset of terms, which form a term cluster; on the other hand, a topic is usually characterized by a subset of terms, and documents heavily using those terms tend to be about that particular topic. It is this interplay of terms and documents which gives rise to what we call bi-clustering, by which terms and documents are simultaneously grouped into semantically meaningful co-clusters. Our clustering algorithm computes an approximate global optimal solution, while probabilistic latent semantic indexing relies on the EM algorithm and therefore might be prone to local minima even with the help of some annealing process.
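A compact sketch of the spectral step described above, under simplifying assumptions: take a toy term-document co-occurrence matrix, compute a partial SVD, and jointly cluster the term and document embeddings. The normalization and partition-refinement details of the original algorithm are omitted.

```python
import numpy as np
from scipy.sparse.linalg import svds
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy term-document co-occurrence matrix A (rows: terms, columns: documents).
A = rng.poisson(0.3, size=(40, 12)).astype(float)

k = 3  # desired number of co-clusters
# Partial SVD of the edge-weight matrix of the bipartite graph.
U, s, Vt = svds(A, k=k)

# Embed terms and documents jointly in the singular-vector space and
# cluster them together, grouping terms and documents simultaneously.
embedding = np.vstack([U, Vt.T])
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embedding)

term_labels, doc_labels = labels[:A.shape[0]], labels[A.shape[0]:]
print("document cluster assignments:", doc_labels)
```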

295 citations

Journal ArticleDOI
TL;DR: In this article, the authors present the distinctive features and challenges of dynamic community discovery and propose a classification of published approaches, which can be used to identify the set of approaches that best fit their needs.
Abstract: Several research studies have shown that complex networks modeling real-world phenomena are characterized by striking properties: (i) they are organized according to community structure, and (ii) their structure evolves with time. Many researchers have worked on methods that can efficiently unveil substructures in complex networks, giving birth to the field of community discovery. A novel and fascinating problem started capturing researchers' interest recently: the identification of evolving communities. Dynamic networks can be used to model the evolution of a system: nodes and edges are mutable, and their presence, or absence, deeply impacts the community structure they compose. This survey aims to present the distinctive features and challenges of dynamic community discovery and propose a classification of published approaches. As a “user manual,” this work organizes state-of-the-art methodologies into a taxonomy, based on their rationale and their specific instantiation. Given a definition of network dynamics, desired community characteristics, and analytical needs, this survey will support researchers in identifying the set of approaches that best fit their needs. The proposed classification could also help researchers choose in which direction to orient their future research.

270 citations

Journal ArticleDOI
31 Dec 2020
TL;DR: The authors discuss various works by different researchers on linear regression and polynomial regression and compare their performance using the best approach to optimize prediction and precision; almost all of the articles analyzed in this review are focused on datasets, since in order to determine a model's efficiency, it must be correlated with the actual values obtained for the explanatory variables.
Abstract: Perhaps the most common and comprehensive statistical and machine learning algorithm is linear regression. Linear regression is used to find a linear relationship between a target and one or more predictors. Linear regression has two types: simple linear regression and multiple linear regression (MLR). This paper discusses various works by different researchers on linear regression and polynomial regression and compares their performance using the best approach to optimize prediction and precision. Almost all of the articles analyzed in this review are focused on datasets; in order to determine a model's efficiency, it must be correlated with the actual values obtained for the explanatory variables.
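For illustration, the short sketch below compares a simple linear fit with a degree-2 polynomial fit on synthetic, mildly nonlinear data using scikit-learn; the dataset and the polynomial degree are invented for the example and are not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic data with a mildly nonlinear relationship.
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(0, 0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear = LinearRegression().fit(X_train, y_train)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X_train, y_train)

# R^2 on held-out data: the polynomial model should capture the curvature better.
print("linear R^2:    ", round(linear.score(X_test, y_test), 3))
print("polynomial R^2:", round(poly.score(X_test, y_test), 3))
```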

158 citations