
Showing papers by "Hiroyuki Kitagawa published in 2019"


Journal ArticleDOI
TL;DR: This research proposes a novel automated scoring method named “MC-SleepNet”, which combines two types of deep neural networks, and evaluates its performance using a large-scale dataset that contains 4,200 biological signal records of mice.
Abstract: Automated sleep stage scoring for mice is in high demand for sleep research, since manual scoring requires considerable human expertise and effort. The existing automated scoring methods do not provide the scoring accuracy required for practical use. In addition, the performance of such methods has generally been evaluated using rather small-scale datasets, and their robustness against individual differences and noise has not been adequately verified. This research proposes a novel automated scoring method named "MC-SleepNet", which combines two types of deep neural networks. We then evaluate its performance using a large-scale dataset that contains 4,200 biological signal records of mice. The experimental results show that MC-SleepNet can automatically score sleep stages with an accuracy of 96.6% and a kappa statistic of 0.94. In addition, we confirm that the scoring accuracy does not significantly decrease even if the target biological signals are noisy. These results suggest that MC-SleepNet is very robust against individual differences and noise. To the best of our knowledge, evaluations using such a large-scale dataset (containing 4,200 records) and such high scoring accuracy (96.6%) have not been reported in previous related studies.
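The abstract does not specify which two network types MC-SleepNet combines. As a rough, hypothetical illustration only, the sketch below pairs a convolutional feature extractor over each EEG/EMG scoring epoch with a recurrent network over the epoch sequence, a common combination in sleep-stage scoring; all layer choices, names, and sizes are assumptions, not the published architecture.

```python
# Hypothetical sketch of a two-network scorer: a CNN that extracts per-epoch
# features from raw EEG/EMG and an LSTM that models the epoch sequence.
# This is NOT the published MC-SleepNet architecture; sizes are arbitrary.
import torch
import torch.nn as nn

class EpochCNN(nn.Module):
    def __init__(self, in_channels=2, feat_dim=64):      # EEG + EMG channels
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=7, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, x):                  # x: (batch, channels, samples_per_epoch)
        return self.proj(self.net(x).squeeze(-1))

class SequenceScorer(nn.Module):
    """Scores each epoch as Wake / NREM / REM using context from neighbouring epochs."""
    def __init__(self, feat_dim=64, n_stages=3):
        super().__init__()
        self.cnn = EpochCNN(feat_dim=feat_dim)
        self.lstm = nn.LSTM(feat_dim, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(128, n_stages)

    def forward(self, epochs):             # epochs: (batch, seq_len, channels, samples)
        b, t, c, s = epochs.shape
        feats = self.cnn(epochs.reshape(b * t, c, s)).reshape(b, t, -1)
        ctx, _ = self.lstm(feats)
        return self.head(ctx)              # (batch, seq_len, n_stages)

print(SequenceScorer()(torch.randn(4, 20, 2, 1000)).shape)   # torch.Size([4, 20, 3])
```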

33 citations


Proceedings ArticleDOI
01 Aug 2019
TL;DR: gScarf dynamically prunes unnecessary nodes and edges, ensuring that it captures fine-grained clusters within a short running time, and outperforms existing methods in terms of running time while finding clusters with high accuracy.
Abstract: Modularity clustering is an essential tool to understand complicated graphs. However, existing methods are not applicable to massive graphs due to two serious weaknesses. (1) It is difficult to fully reproduce ground-truth clusters due to the resolution limit problem. (2) They are computationally expensive because all nodes and edges must be computed iteratively. This paper proposes gScarf, which outputs fine-grained clusters within a short running time. To overcome the aforementioned weaknesses, gScarf dynamically prunes unnecessary nodes and edges, ensuring that it captures fine-grained clusters. Experiments show that gScarf outperforms existing methods in terms of running time while finding clusters with high accuracy.
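gScarf's actual pruning rules are the paper's contribution and are not reproduced here. The sketch below only conveys the flavour of modularity-based clustering with pruning: node pairs whose merge gain has been found non-positive are never re-examined. The gain formula is deliberately simplified (it counts only the single edge between the pair).

```python
# Illustrative sketch only, not the published gScarf algorithm: greedy
# modularity-based merging that permanently prunes node pairs whose merge gain
# is non-positive. Pruning is safe here because community degrees only grow,
# so a pair's gain can only shrink over time.
import networkx as nx

def greedy_merge_with_pruning(G):
    m = G.number_of_edges()
    community = {v: v for v in G}                  # each node starts in its own cluster
    comm_degree = dict(G.degree())                 # total degree per community label
    pruned = set()                                 # node pairs never worth revisiting

    def gain(cu, cv):
        # Simplified modularity gain of merging two communities (single-edge view).
        return 1.0 / m - comm_degree[cu] * comm_degree[cv] / (2.0 * m * m)

    merged = True
    while merged:
        merged = False
        for u, v in G.edges():
            key = (min(u, v), max(u, v))
            cu, cv = community[u], community[v]
            if key in pruned or cu == cv:
                continue
            if gain(cu, cv) > 0:
                comm_degree[cv] += comm_degree.pop(cu)
                for x, c in community.items():     # relabel the merged community
                    if c == cu:
                        community[x] = cv
                merged = True
            else:
                pruned.add(key)                    # pruning: skip this pair from now on
    return community

labels = greedy_merge_with_pruning(nx.karate_club_graph())
print(len(set(labels.values())), "clusters")
```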

8 citations


Proceedings ArticleDOI
01 Dec 2019
TL;DR: This paper proposes a new approach to recommend method names by applying graph embedding techniques to the call graph and confirms that the proposed technique can suggest more appropriate name candidates in difficult situations than the state-of-the-art approach.
Abstract: Software developers must provide meaningful but short names to identifiers because they strongly affect the comprehensibility of source code. On the other hand, identifier naming can be a difficult and time-consuming task, even for experienced developers. To support identifier naming, several techniques to recommend candidate names have been proposed. These techniques, however, still face challenges regarding the quality of the suggested candidates and are limited in the situations to which they apply. This paper proposes a new approach to recommend method names by applying graph embedding techniques to the call graph. An experiment confirms that the proposed technique can suggest more appropriate name candidates in difficult situations than the state-of-the-art approach.
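The concrete embedding model and recommendation procedure are described in the paper, not the abstract. The following sketch illustrates the general idea only, using a crude stand-in for a graph embedding (a low-rank factorisation of the call-graph adjacency matrix) and a nearest-neighbour lookup; the example method names are made up.

```python
# Hypothetical pipeline illustrating the idea (not the authors' exact model):
# embed methods from the call graph and suggest the names of the nearest neighbours.
import networkx as nx
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def recommend_names(call_edges, target, k=3):
    """call_edges: list of (caller, callee) method names; target: method to (re)name."""
    G = nx.DiGraph(call_edges)
    nodes = list(G.nodes())
    A = nx.to_numpy_array(G, nodelist=nodes)
    # Crude stand-in for a graph embedding: low-rank factorisation of caller/callee links.
    emb = TruncatedSVD(n_components=min(8, len(nodes) - 1)).fit_transform(A + A.T)
    idx = {n: i for i, n in enumerate(nodes)}
    sims = cosine_similarity(emb[idx[target]].reshape(1, -1), emb)[0]
    ranked = sorted((n for n in nodes if n != target),
                    key=lambda n: sims[idx[n]], reverse=True)
    return ranked[:k]         # names of structurally similar methods as candidates

edges = [("loadConfig", "readFile"), ("saveConfig", "writeFile"),
         ("m1", "readFile"), ("m1", "parseJson"), ("loadConfig", "parseJson")]
print(recommend_names(edges, "m1"))   # top candidate is a similar method, e.g. loadConfig
```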

8 citations


Proceedings ArticleDOI
08 Apr 2019
TL;DR: A novel grid-based index is employed to manage both queries and dynamic spatial keyword objects and a buffer named partial cell list is developed to reduce the computation cost in the top-k reevaluation.
Abstract: As the popularity of SNS and the number of GPS-equipped mobile devices increase, a large number of web users frequently change their location (spatial attribute) and keywords of interest (keyword attribute) in real time. An example is a user who watches news, videos, and blogs while moving. Many location-based web applications can benefit from continuously searching for these dynamic spatial keyword objects. In this paper, we define a novel query problem to continuously search for dynamic spatial keyword objects. To the best of our knowledge, this is the first work to consider dynamic spatial keyword objects. We employ a novel grid-based index to manage both queries and dynamic spatial keyword objects. With the proposed index, we develop a buffer named the partial cell list to reduce the computation cost of top-k reevaluation. The experiments confirm the superiority of our proposed methods.
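The paper's index and partial-cell-list buffer are not detailed in the abstract. The sketch below shows only the basic grid-index idea it builds on: dynamic objects are re-hashed into grid cells as their location or keywords change, and a query inspects just the cells around its own location; the cell width, ring size, and scoring are placeholder choices.

```python
# Minimal sketch of a grid index for dynamic spatial keyword objects (not the
# paper's exact index or partial-cell-list buffer).
from collections import defaultdict
import heapq, math

CELL = 0.01   # cell width in degrees (hypothetical)

def cell_of(lat, lon):
    return (int(lat / CELL), int(lon / CELL))

class GridIndex:
    def __init__(self):
        self.cells = defaultdict(dict)             # cell -> {obj_id: (lat, lon, keywords)}
        self.where = {}                            # obj_id -> current cell

    def update(self, obj_id, lat, lon, keywords):
        # A dynamic object re-inserts itself whenever its location or keywords change.
        old = self.where.get(obj_id)
        if old is not None:
            self.cells[old].pop(obj_id, None)
        c = cell_of(lat, lon)
        self.cells[c][obj_id] = (lat, lon, set(keywords))
        self.where[obj_id] = c

    def topk(self, lat, lon, keywords, k=3, ring=1):
        cx, cy = cell_of(lat, lon)
        cand = []
        for dx in range(-ring, ring + 1):          # only nearby cells are examined
            for dy in range(-ring, ring + 1):
                for oid, (olat, olon, kw) in self.cells.get((cx + dx, cy + dy), {}).items():
                    if kw & set(keywords):         # keyword match required
                        heapq.heappush(cand, (math.hypot(lat - olat, lon - olon), oid))
        return [oid for _, oid in heapq.nsmallest(k, cand)]

idx = GridIndex()
idx.update("o1", 35.681, 139.767, {"news", "traffic"})
idx.update("o2", 35.682, 139.768, {"video"})
print(idx.topk(35.681, 139.766, {"news"}))         # ['o1']
```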

7 citations


Posted Content
TL;DR: In this article, a new approach to recommending method names by applying graph embedding techniques to the method call graph has been proposed, which can suggest more appropriate method name candidates in difficult situations than the state-of-the-art approach.
Abstract: Comprehensibility of source code is strongly affected by identifier names; therefore, software developers need to give good (e.g., meaningful but short) names to identifiers. On the other hand, giving a good name is sometimes a difficult and time-consuming task even for experienced developers. To support naming identifiers, several techniques for recommending identifier name candidates have been proposed. These techniques, however, still face challenges regarding the quality of the suggested candidates and are limited in the situations to which they apply. This paper proposes a new approach to recommending method names by applying graph embedding techniques to the method call graph. The evaluation experiment confirms that the proposed technique can suggest more appropriate method name candidates in difficult situations than the state-of-the-art approach.

6 citations


Proceedings ArticleDOI
02 Dec 2019
TL;DR: This paper defines a new attribute-driven community search problem class called the Flexible Attributed Truss Community (F-ATC) and presents a novel heuristic algorithm to solve the F-ATC problem, which detects more accurate communities from attributed graphs than traditional algorithms.
Abstract: How can the most appropriate community be found given an attributed graph and a user-specified query node? The community search algorithm is currently an essential graph data management tool to find a community suited to a user-specified query node. Although community search algorithms are useful in various web-based applications and services, they have trouble handling attributed graphs due to the strict topological constraints of traditional algorithms. In this paper, we propose an accurate community search algorithm for attributed graphs. To overcome current limitations, we define a new attribute-driven community search problem class called the Flexible Attributed Truss Community (F-ATC). The advantage of the F-ATC problem is that it relaxes topological constraints, allowing diverse communities to be explored. Consequently, the community search accuracy is enhanced compared to traditional community search algorithms. Additionally, we present a novel heuristic algorithm to solve the F-ATC problem. This effective algorithm detects more accurate communities from attributed graphs than the traditional algorithms. Finally, extensive experiments are conducted using real-world attributed graphs to demonstrate that our approach achieves a higher accuracy than the state-of-the-art method.
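The F-ATC definition and the authors' heuristic are given in the paper; the sketch below is a generic illustration of the relaxation idea only: grow a community around the query node by preferring neighbours that share attributes and retain some triangle support, rather than enforcing a strict truss constraint. All thresholds are arbitrary.

```python
# Generic illustration of attribute-aware community expansion (not the authors'
# F-ATC heuristic): greedily add neighbours that share attributes with the query
# node and close triangles with the current community.
import networkx as nx

def attribute_community(G, attrs, query, size=10, min_support=1):
    """attrs: node -> set of attribute keywords."""
    community = {query}
    frontier = set(G[query])
    while frontier and len(community) < size:
        def score(v):
            shared = len(attrs.get(v, set()) & attrs.get(query, set()))
            nbrs = set(G[v]) & community
            # number of triangles v would close with the current community
            support = sum(1 for u in nbrs for w in nbrs if u < w and G.has_edge(u, w))
            return (shared, support)
        best = max(frontier, key=score)
        shared, support = score(best)
        if shared == 0 and support < min_support:
            break                                  # nothing sufficiently related remains
        community.add(best)
        frontier |= set(G[best]) - community
        frontier.discard(best)
    return community

G = nx.karate_club_graph()
attrs = {v: {"club"} for v in G}                   # toy attributes for demonstration
print(attribute_community(G, attrs, query=0, size=6))
```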

6 citations


Journal ArticleDOI
TL;DR: This work proposes a smart event-driven stream processing scheme, which makes use of smart windows to buffer stream tuples during the absence of an event, together with a multi-query optimization for the proposed scheme to cover cases where multiple continuous queries are registered.
Abstract: With the increase in stream data, demands for stream processing have become diverse and complicated. To meet these demands, several stream processing engines (SPEs) have been developed which execute continuous queries (CQs) to process continuous data streams. Event-driven stream processing, one of the important requirements, continuously receives incoming stream data but generates query results only when specified events occur. In the basic query execution scheme, even when no event is raised, input stream tuples are continuously processed by query operators, though they do not generate any query result. This leads to increased system load and wasted system resources. To address this problem, we propose a smart event-driven stream processing scheme, which makes use of smart windows to buffer the stream tuples during the absence of an event. When an event is raised, the buffered tuples are flushed and processed by the downstream operators. If the buffered tuples in the smart window expire due to the window size before the occurrence of an event, they are deleted directly from the smart window. Since CQs, once registered, are executed for several weeks, months, or even years, SPEs usually execute several CQs in parallel and merge their query plans whenever possible to save processing cost. Due to the presence of smart windows, existing multi-query optimization techniques cannot work for smart event-driven stream processing. Hence, this work proposes a multi-query optimization for the proposed smart scheme to cover the cases where multiple continuous queries are registered. Extensive experiments are performed on real and synthetic data streams to show the effectiveness of the proposed smart scheme and its multi-query optimization.
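A minimal sketch of the buffering behaviour described above, not the SPE implementation: tuples are buffered while no event is active, expire once they fall outside the window, and are flushed to the downstream operator when an event fires. The time-based expiry and the callback interface are assumptions for illustration.

```python
# Minimal sketch of a "smart window": buffer tuples while no event is active,
# drop tuples that expire before an event occurs, and flush the live buffer to
# the downstream operator when an event is raised.
from collections import deque

class SmartWindow:
    def __init__(self, window_size, downstream):
        self.window_size = window_size            # time-based window length
        self.buffer = deque()                     # (timestamp, tuple)
        self.downstream = downstream              # callable consuming tuples

    def _expire(self, now):
        while self.buffer and self.buffer[0][0] <= now - self.window_size:
            self.buffer.popleft()                 # expired before any event: discard

    def on_tuple(self, ts, tup, event_active=False):
        self._expire(ts)
        if event_active:
            self.flush(ts)
            self.downstream(tup)                  # process directly while the event is active
        else:
            self.buffer.append((ts, tup))         # just buffer, no downstream work

    def flush(self, now):
        self._expire(now)
        while self.buffer:
            _, tup = self.buffer.popleft()
            self.downstream(tup)

w = SmartWindow(60, print)
w.on_tuple(1, {"temp": 20})                       # buffered, no output
w.on_tuple(65, {"temp": 21}, event_active=True)   # old tuple expired; new one processed
```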

6 citations


Proceedings Article
01 Jan 2019
TL;DR: This paper proposes a pipelined process of RPQs by dividing a query into multiple stages thereby taking advantage of pipeline parallelism, and achieves up to 23.6x faster for the small dataset and up to 4.61x faster than the comparative method running on CPU.
Abstract: This paper proposes a scheme for accelerating regular path queries (RPQs) over directed edge-labeled graphs using an FPGA. Graphs are quite useful for representing various types of relationships among different entities and have been used in diverse fields, such as social network analysis, linked open data (LOD), and bioscience. RPQs are queries that retrieve pairs of vertices reachable through a path whose labels conform to a user-specified regular expression. Despite their importance and usefulness, RPQs have not received much attention. In this paper, we attempt to accelerate such queries using an FPGA (field-programmable gate array). Specifically, we propose pipelined processing of RPQs that divides a query into multiple stages, thereby taking advantage of pipeline parallelism. Experimental evaluations show that the proposed accelerator is up to 23.6x faster on the small dataset and up to 4.61x faster on the large dataset than the comparative method running on a CPU.
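The paper's contribution is the FPGA pipeline itself; as a software analogue only, the sketch below evaluates an RPQ by a BFS over the product of the edge-labelled graph and a small hand-built automaton (here for the made-up expression a b* c), with automaton states loosely playing the role of pipeline stages.

```python
# Software analogue of RPQ evaluation (the paper's contribution is an FPGA
# pipeline, not this code): BFS over the product of an edge-labelled graph and
# an automaton for the regular expression "a b* c".
from collections import deque

# edge-labelled graph: node -> list of (label, neighbour)
graph = {0: [("a", 1)], 1: [("b", 1), ("c", 2)], 2: []}

# NFA for "a b* c": state -> {label: next_state}; state 2 is accepting
nfa = {0: {"a": 1}, 1: {"b": 1, "c": 2}, 2: {}}
accepting = {2}

def rpq(graph, nfa, accepting):
    results = set()
    for start in graph:                            # pairs reachable from every start node
        seen = {(start, 0)}
        queue = deque([(start, 0)])
        while queue:
            v, q = queue.popleft()
            if q in accepting:
                results.add((start, v))
            for label, w in graph[v]:
                nq = nfa[q].get(label)
                if nq is not None and (w, nq) not in seen:
                    seen.add((w, nq))
                    queue.append((w, nq))
    return results

print(rpq(graph, nfa, accepting))                  # {(0, 2)}
```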

5 citations


Proceedings ArticleDOI
17 Oct 2019
TL;DR: A new noise-reduction model called NR-GAN is proposed; it relies on adversarial training, which does not require noise records as training samples, and can reduce noise in mice EEG signals.
Abstract: To support basic sleep research, several automated sleep stage scoring methods for mice have been proposed. Although these methods can score mice sleep stages accurately based on their electroencephalogram (EEG) and electromyogram (EMG) signals, they are fragile against noise, especially in EEG signals. The simplest solution is to reduce or eliminate noise before scoring. However, a method for reducing noise in biological signals does not exist. Because EEG signals contain many types of noise, predicting all of them is difficult, which inhibits the use of hand-engineered methods such as frequency filters. Additionally, noise reduction methods with deep learning models are not applicable as they require records of noise, and the noise considered here cannot be measured separately from biological signals. In this study, we address this problem using adversarial training, which is a method for deep learning models that does not require noise records as training samples. We propose a new noise-reduction model called "NR-GAN." Its training process requires a set of noisy signals and a set of clear signals. Since these sets can be measured independently, NR-GAN can reduce noise in mice EEG signals.
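The exact NR-GAN architecture and losses are in the paper; the sketch below only illustrates the unpaired adversarial setup described in the abstract: a generator maps noisy EEG segments to cleaned ones, and a discriminator is trained on real clean segments versus generator outputs. Network shapes, segment length, and hyperparameters are placeholders, and a practical setup would add a fidelity term for the generator.

```python
# Illustrative adversarial training loop only (not the published NR-GAN): the
# generator maps noisy EEG segments to "cleaned" segments; the discriminator
# learns to tell real clean segments from generated ones. The noisy and clean
# sets are unpaired, as in the abstract; all sizes are placeholders.
import torch
import torch.nn as nn

SEG = 512  # samples per EEG segment (hypothetical)

G = nn.Sequential(nn.Conv1d(1, 16, 9, padding=4), nn.ReLU(),
                  nn.Conv1d(16, 1, 9, padding=4))                  # noisy -> cleaned
D = nn.Sequential(nn.Conv1d(1, 16, 9, stride=4), nn.ReLU(),
                  nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(noisy, clean):               # tensors of shape (batch, 1, SEG), unpaired
    # Discriminator: real clean segments vs. denoised (fake) segments.
    fake = G(noisy).detach()
    d_loss = bce(D(clean), torch.ones(clean.size(0), 1)) + \
             bce(D(fake), torch.zeros(fake.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: make the denoised output indistinguishable from clean signals.
    g_loss = bce(D(G(noisy)), torch.ones(noisy.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

print(train_step(torch.randn(8, 1, SEG), torch.randn(8, 1, SEG)))
```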

5 citations


Journal ArticleDOI
TL;DR: A novel query plan called the MX-structure is proposed to consolidate candidate networks (CNs) as much as possible and to suppress the explosive blowup of nodes in query plans by consolidating all common edges among CNs.

4 citations


Proceedings ArticleDOI
01 Dec 2019
TL;DR: A novel class name recommendation approach quantitatively represents the nature or behavior of classes by leveraging embedding technology for heterogeneous graphs, making it possible to recommend class names even where a previous approach cannot work.
Abstract: In software development, the quality of identifier names is important because it greatly affects program comprehension for developers. However, naming identifiers so that they appropriately represent the nature or behavior of program elements such as classes and methods is a difficult task requiring rich development experience and software domain knowledge. Although several studies have proposed techniques for recommending identifier names, few target class names, and those that do have limited applicability. This paper proposes a novel class name recommendation approach that is widely applicable in software development. The key idea is to quantitatively represent the nature or behavior of classes by leveraging embedding technology for heterogeneous graphs. This makes it possible to recommend class names even where a previous approach cannot work. Experimental results suggest that the proposed approach can produce more accurate class name recommendations regardless of whether the classes are used. In addition, a further experiment reveals a situation where the proposed approach is particularly effective.

Posted Content
TL;DR: gScarf dynamically prunes unnecessary nodes and edges, ensuring that it captures fine-grained clusters, and outperforms existing methods in terms of running time while finding clusters with high accuracy.
Abstract: Modularity clustering is an essential tool to understand complicated graphs. However, existing methods are not applicable to massive graphs due to two serious weaknesses. (1) It is difficult to fully reproduce ground-truth clusters due to the resolution limit problem. (2) They are computationally expensive because all nodes and edges must be computed iteratively. This paper proposes gScarf, which outputs fine-grained clusters within a short running time. To overcome the aforementioned weaknesses, gScarf dynamically prunes unnecessary nodes and edges, ensuring that it captures fine-grained clusters. Experiments show that gScarf outperforms existing methods in terms of running time while finding clusters with high accuracy.

Proceedings ArticleDOI
01 Feb 2019
TL;DR: A real-time analytical system based on StreamingCube that not only can effectively process and analyze city data in real time, but can also provide analyzed results that are simple and easy for average citizens to understand.
Abstract: Analyzing city data in real time can uncover many important facts about the city, which is very useful to support smart and healthy living. Real-time data from sensors and SNSs can reflect the current condition of the city, i.e., current weather, traffic, infrastructure, security, etc., which is quite useful for crime prevention, infrastructure maintenance, and supporting smart and healthy living. In this paper, we develop a real-time analytical system based on StreamingCube. This system processes city data from sensors in real time by applying OLAP analytical tools. The analyzed results are visualized as tables, charts, graphs, and interactive maps. The application reveals that this system not only can effectively process and analyze city data in real time, but can also provide analyzed results that are simple and easy for average citizens to understand.

Journal ArticleDOI
TL;DR: Results obtained from experiments using real-world datasets show that CARNMF can detect communities and attribute-value clusters more accurately than existing comparable methods, and the clustering results indicate that CARNMF can successfully detect informative communities with meaningful semantic descriptions through correlations between communities and attribute clusters.

Book ChapterDOI
26 Aug 2019
TL;DR: This paper proposes two optimization methods to reduce the computational cost of the row pattern matching process in Spark and shows the design and implementation of the proposed methods for Spark SQL.
Abstract: Due to advances in information and communications technology and sensor technology, a large quantity of sequence data (time series data, log data, etc.) is generated and processed every day. Row pattern matching for sequence data stored in relational databases was standardized as SQL/RPR in 2016. Today, in addition to relational databases, there are many frameworks for processing large amounts of data in parallel and distributed computing environments, including MapReduce and Spark. Hive and Spark SQL enable us to code data analysis processes in SQL-like query languages, so row pattern matching is also beneficial in Hive and Spark SQL. However, the computational cost of the row pattern matching process is high, and the process needs to be made efficient. In this paper, we propose two optimization methods to reduce the computational cost of the row pattern matching process. We focus on Spark and show the design and implementation of the proposed methods for Spark SQL. Experiments verify that our optimization methods contribute to reducing the processing time of Spark SQL queries that include row pattern matching.
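The paper's optimizations are specific to Spark SQL and are not shown here. The plain-Python sketch below only illustrates what row pattern matching in the spirit of SQL/RPR computes: each row is classified into a pattern variable, and contiguous row ranges matching a pattern (here a made-up DOWN+ UP+ "V" shape over prices) are reported.

```python
# Plain-Python illustration of what row pattern matching computes (not the
# paper's Spark SQL optimizations): classify each row into a pattern variable
# and report contiguous row ranges matching the pattern DOWN+ UP+ (a "V" shape).
import re

def classify(prev, cur):
    if prev is None:
        return "S"                        # start row, matches nothing
    return "D" if cur < prev else "U"     # DOWN if the price fell, UP otherwise

def match_v_shapes(prices):
    symbols = "".join(classify(prices[i - 1] if i else None, p)
                      for i, p in enumerate(prices))
    # The SQL/RPR-style pattern (DOWN+ UP+) expressed over the per-row symbol string.
    return [(m.start(), m.end() - 1) for m in re.finditer(r"D+U+", symbols)]

prices = [10, 8, 7, 9, 12, 11, 9, 13]
print(match_v_shapes(prices))             # [(1, 4), (5, 7)]
```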

Proceedings ArticleDOI
01 Feb 2019
TL;DR: A protocol is proposed that allows the consistency of the transaction amount and the balance to be checked without disclosing their values; it exploits homomorphic encryption to encrypt the transaction amounts and balances.
Abstract: A blockchain is a technology that allows transactions to be processed and committed data to be shared among participants without a central server. To implement applications among restricted parties with permission, such as asset sales and trades, a special type of blockchain system called a private blockchain is used. Additionally, the demand for privacy preservation, where sensitive information such as trade amounts and balances is not disclosed, is increasing. However, privacy preservation in a typical blockchain setting is difficult because it prevents parties that are not involved in the transaction from checking the correctness of the transaction being processed, which may lead to unintended or invalid transactions. To address this issue, herein we propose a protocol that allows the consistency of the transaction amount and the balance to be checked without disclosing their values. Specifically, we exploit homomorphic encryption to encrypt the transaction amount and balance. Thus, the correctness of a transaction can be publicly verified by the parties using a zero-knowledge proof without a trusted third party.
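The paper's protocol combines homomorphic encryption with zero-knowledge proofs; only the additively homomorphic building block is sketched below, using a textbook Paillier scheme with deliberately tiny, insecure parameters: an encrypted balance can be updated with an encrypted amount without decrypting either value, so only the key holder learns the plaintexts. The zero-knowledge consistency proof itself is not sketched.

```python
# Toy textbook Paillier (insecure key size!) showing only the additively
# homomorphic building block mentioned in the abstract: an encrypted balance is
# updated with an encrypted amount without revealing either value.
import math, random

p, q = 61, 53                      # toy primes; real keys are thousands of bits
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)               # valid because the generator g is chosen as n + 1

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

def add(c1, c2):                   # homomorphic addition of the plaintexts
    return (c1 * c2) % n2

enc_balance = encrypt(1000)
enc_deposit = encrypt(250)
enc_new = add(enc_balance, enc_deposit)    # updated without decrypting anything
assert decrypt(enc_new) == 1250
print(decrypt(enc_new))
```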

Book ChapterDOI
01 Jan 2019
TL;DR: This chapter discusses multidimensional analysis (also known as on-line analytical processing or OLAP) of big data by focusing particularly on data streams, characterized by huge volume and high velocity.
Abstract: Data warehousing and multidimensional analysis go side by side. Data warehouses provide clean and partially normalized data for fast, consistent, and interactive multidimensional analysis. With the advancement in data generation and collection technologies, businesses and organizations are now generating big data (defined by the 3Vs: volume, variety, and velocity). Since big data differs from traditional data, it requires a different set of tools and techniques for processing and analysis. This chapter discusses multidimensional analysis (also known as on-line analytical processing, or OLAP) of big data by focusing particularly on data streams, characterized by huge volume and high velocity. OLAP requires maintaining a number of materialized views corresponding to user queries for interactive analysis. Specifically, this chapter discusses the issues in maintaining materialized views for data streams, the use of a special window for the maintenance of materialized views, and the issues of coupling a stream processing engine (SPE) with an OLAP engine.
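As a tiny illustration of the maintenance problem discussed above, and not of the chapter's specific technique, the sketch below keeps one materialized aggregate (a SUM per group) over a sliding window of a stream, refreshing it incrementally as tuples arrive and expire.

```python
# Tiny illustration of incrementally maintaining a materialized aggregate
# (SUM per group) over a sliding window of a stream.
from collections import defaultdict, deque

class WindowedView:
    def __init__(self, window_size):
        self.window_size = window_size
        self.tuples = deque()                      # (timestamp, group, value)
        self.view = defaultdict(float)             # group -> running SUM

    def insert(self, ts, group, value):
        self.tuples.append((ts, group, value))
        self.view[group] += value                  # incremental refresh on arrival
        self._expire(ts)

    def _expire(self, now):
        while self.tuples and self.tuples[0][0] <= now - self.window_size:
            _, group, value = self.tuples.popleft()
            self.view[group] -= value              # incremental refresh on expiry

v = WindowedView(window_size=60)
v.insert(0, "district-A", 3.0)
v.insert(30, "district-B", 5.0)
v.insert(70, "district-A", 2.0)                    # the tuple at t=0 expires here
print(dict(v.view))                                # {'district-A': 2.0, 'district-B': 5.0}
```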

Book ChapterDOI
26 Aug 2019
TL;DR: A pattern hierarchy model for Sequence OLAP is formalized, and an efficient algorithm for multiple row pattern matching using an SP-NFA (Shared Prefix Nondeterministic Finite Automaton) is proposed.
Abstract: Sequence OLAP is a variant of OLAP for sequence data analysis, such as the analysis of RFID logs and person trip data. It extracts occurrences of given patterns (e.g., the state transition pattern S1 → S2 or the movement pattern A → B) from sequence data and executes multi-dimensional aggregation using OLAP operations (such as drill-down and roll-up) and pattern OLAP operations (such as pattern-drill-down and pattern-roll-up). The pattern OLAP operations are specific to Sequence OLAP and involve a hierarchy of multiple patterns. When sequence data is stored in relational databases as sequences of rows, row pattern matching finds all subsequences of rows that match a given pattern. To perform Sequence OLAP, especially pattern OLAP operations, on relational databases, it is necessary to execute row pattern matching for such a hierarchy of multiple patterns and identify parent-child relationships among pattern occurrences. Generally, row pattern matching requires a sequential scan of a large table and is an expensive operation. If row pattern matching is executed individually for each pattern, it is very time consuming. Therefore, it is strongly desirable to execute multiple row pattern matching for a given hierarchy of patterns efficiently. This paper formalizes a pattern hierarchy model for Sequence OLAP and proposes an efficient algorithm for multiple row pattern matching using an SP-NFA (Shared Prefix Nondeterministic Finite Automaton). In experiments, we implement our algorithm in PostgreSQL and evaluate the effectiveness of the proposal.
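The SP-NFA construction and its integration with the pattern hierarchy are the paper's contribution; the sketch below only illustrates the prefix-sharing idea with made-up patterns: patterns that share prefixes are merged into one trie-shaped automaton, so a single scan over the row symbols matches all of them at once.

```python
# Illustration of the prefix-sharing idea only (not the paper's SP-NFA):
# several patterns sharing prefixes are merged into one trie-shaped automaton,
# and one scan over the row symbols reports matches for every pattern.
def build_shared_prefix_automaton(patterns):
    """patterns: {name: list of symbols}, e.g. {"P1": ["A","B"], "P2": ["A","B","C"]}."""
    root = {"next": {}, "accepts": []}
    for name, symbols in patterns.items():
        node = root
        for s in symbols:
            node = node["next"].setdefault(s, {"next": {}, "accepts": []})
        node["accepts"].append(name)               # shared prefixes reuse the same states
    return root

def match_all(root, rows):
    """rows: list of symbols; returns {pattern_name: [(start, end), ...]}."""
    hits = {}
    active = []                                    # (start_index, state) partial matches
    for i, sym in enumerate(rows):
        active.append((i, root))
        next_active = []
        for start, state in active:
            nxt = state["next"].get(sym)
            if nxt is None:
                continue
            for name in nxt["accepts"]:
                hits.setdefault(name, []).append((start, i))
            next_active.append((start, nxt))
        active = next_active
    return hits

auto = build_shared_prefix_automaton({"P1": ["A", "B"], "P2": ["A", "B", "C"]})
print(match_all(auto, ["A", "B", "C", "A", "B"]))
# {'P1': [(0, 1), (3, 4)], 'P2': [(0, 2)]}
```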

Proceedings ArticleDOI
02 Dec 2019
TL;DR: A novel RankClus algorithm that reduces the running time for large bi-type information networks by dynamically updating ranking results, employing dynamic graph processing techniques in the ranking procedures included in RankClus.
Abstract: Given a bi-type information network, which is an extended model of the well-known bipartite graph, how can clusters be found efficiently? Graph clustering is now a fundamental tool for understanding graph-structured data. The RankClus framework accurately performs clustering for bi-type information networks using ranking-based graph clustering techniques. It integrates a graph ranking algorithm such as PageRank or HITS into the graph clustering procedure to improve the clustering quality. However, this integration incurs a high computational cost on large bi-type information networks, since RankClus repeatedly runs the ranking algorithm for all nodes and edges until the clustering procedure converges. To overcome this runtime limitation, herein we present a novel RankClus algorithm that reduces the running time for large bi-type information networks. Our proposed method incorporates dynamic graph processing techniques into the ranking procedures of RankClus. By dynamically updating ranking results, our proposal reduces the number of nodes and edges computed during repeated ranking procedures. We experimentally verify using real-world datasets that our proposed method successfully reduces the running time while maintaining the clustering quality of RankClus.
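The paper's incremental ranking update is not reproduced here; the sketch below only conveys the general dynamic-update idea with a deliberately simple ranking (weighted degree) over a bi-type network: when edges arrive, only the touched entries of the materialized ranking are adjusted instead of recomputing all scores from scratch.

```python
# Illustration of the dynamic-update idea only (not the paper's algorithm):
# keep the ranking materialized and adjust just the entries touched by new edges.
from collections import defaultdict

class IncrementalDegreeRank:
    """A deliberately simple ranking (weighted degree) maintained incrementally."""
    def __init__(self):
        self.score = defaultdict(float)

    def add_edge(self, u, v, w=1.0):
        # Only the two endpoints are touched; all other nodes keep their scores.
        self.score[u] += w
        self.score[v] += w

    def top(self, k=3):
        return sorted(self.score.items(), key=lambda kv: kv[1], reverse=True)[:k]

rank = IncrementalDegreeRank()
for edge in [("conf:KDD", "author:a1"), ("conf:KDD", "author:a2"),
             ("conf:VLDB", "author:a1")]:          # a toy bi-type network: conferences/authors
    rank.add_edge(*edge)
print(rank.top())
```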