
Showing papers by "Yongrui Qin published in 2018"


Journal ArticleDOI
TL;DR: This article proposes a collaborative filtering method based on Tensor Factorization, a generalization of the Matrix Factorization approach, to model multi-dimensional contextual information, and further improves recommendation accuracy by using the internal relations among users and among locations to regularize the latent factors.
Abstract: Point-of-Interest (POI) recommendation is a new type of recommendation task that has come along with the prevalence of location-based social networks and services in recent years. Compared with traditional recommendation tasks, POI recommendation focuses more on making personalized and context-aware recommendations to improve user experience. Traditionally, the most commonly used contextual information includes geographical and social context. However, the increasing availability of check-in data makes it possible to design more effective location recommendation applications by modeling and integrating comprehensive types of contextual information, especially temporal information. In this article, we propose a collaborative filtering method based on Tensor Factorization, a generalization of the Matrix Factorization approach, to model the multi-dimensional contextual information. Tensor Factorization naturally extends Matrix Factorization by increasing the dimensionality of the model, of which the three-dimensional case is the most widely used. Our method exploits a high-order tensor to fuse heterogeneous contextual information about users’ check-ins instead of the traditional two-dimensional user-location matrix. The factorization of this tensor leads to a more compact model of the data that is naturally suitable for integrating contextual information to make POI recommendations. Based on this model, we further improve the recommendation accuracy by utilizing the internal relations within users and locations to regularize the latent factors. Experimental results on a large real-world dataset demonstrate the effectiveness of our approach.
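
To make the modelling step concrete, here is a minimal, hedged sketch of a CP-style factorization of a sparse user x location x time check-in tensor trained by stochastic gradient descent. It is not the paper's exact model (in particular it uses plain L2 regularization rather than the relation-based regularizers described in the abstract), and the dimensions, rank and hyper-parameters are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact model): factorizing a sparse
# user x location x time check-in tensor with a CP-style decomposition,
# trained by stochastic gradient descent.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_locs, n_times, R = 100, 200, 24, 8   # illustrative sizes, rank R
U = 0.1 * rng.standard_normal((n_users, R))
L = 0.1 * rng.standard_normal((n_locs, R))
T = 0.1 * rng.standard_normal((n_times, R))

# Observed check-ins as (user, location, time-slot, value) tuples.
checkins = [(rng.integers(n_users), rng.integers(n_locs),
             rng.integers(n_times), 1.0) for _ in range(5000)]

lr, reg = 0.05, 0.01
for epoch in range(20):
    for u, l, t, y in checkins:
        pred = np.sum(U[u] * L[l] * T[t])       # CP reconstruction of one cell
        err = pred - y
        gu = err * L[l] * T[t] + reg * U[u]     # gradients with L2 regularization
        gl = err * U[u] * T[t] + reg * L[l]
        gt = err * U[u] * L[l] + reg * T[t]
        U[u] -= lr * gu
        L[l] -= lr * gl
        T[t] -= lr * gt

# Recommendation score of user 3 for location 7 in time slot 18
print(np.sum(U[3] * L[7] * T[18]))
```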

54 citations


Journal ArticleDOI
TL;DR: A system that recognizes handwritten words written with separated (broken) letters of the Persian alphabet is presented; each letter is recognized with an interconnected fuzzy neural network, offering high precision and a simple way to extend the dataset instance codes.
Abstract: This paper presents a system that can recognize handwritten words written with separated (broken) letters of the Persian alphabet. The proposed system can be used for most activities related to the gathering of public information. Statistical features of the separated/broken letters are employed in the system, and each letter is recognized using an interconnected fuzzy neural network. The advantages of this method include high precision, owing to the strength of the neural network algorithm, and the possibility of extending dataset instance codes in a simple manner. Finally, the proposed method is evaluated experimentally.
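
As an illustration of the kind of statistical features such a system can extract, here is a small sketch. The paper itself uses an interconnected fuzzy neural network; this sketch only shows zoning-density features on a binarized letter image fed to a simple nearest-centroid classifier, and the image size, zone grid and toy training data are assumptions.

```python
# Illustrative sketch only: statistical (zoning) features of a binary letter
# image, classified here with a nearest-centroid rule rather than the paper's
# interconnected fuzzy neural network.
import numpy as np

def zoning_features(img, grid=(4, 4)):
    """Fraction of ink pixels in each zone of a binary image."""
    h, w = img.shape
    gh, gw = grid
    feats = []
    for i in range(gh):
        for j in range(gw):
            zone = img[i * h // gh:(i + 1) * h // gh,
                       j * w // gw:(j + 1) * w // gw]
            feats.append(zone.mean())
    return np.array(feats)

# Toy usage: classify by distance to per-class mean feature vectors.
rng = np.random.default_rng(1)
train = {c: [rng.integers(0, 2, (32, 32)) for _ in range(5)] for c in "ابج"}
centroids = {c: np.mean([zoning_features(x) for x in xs], axis=0)
             for c, xs in train.items()}
query = rng.integers(0, 2, (32, 32))
q = zoning_features(query)
print(min(centroids, key=lambda c: np.linalg.norm(q - centroids[c])))
```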

19 citations


Journal ArticleDOI
TL;DR: This work focuses on modeling the features of a SPARQL query as a vector representation and proposes a two-step prediction process that effectively predicts SPARQL query performance and outperforms state-of-the-art approaches.
Abstract: One of the challenges of managing an RDF database is predicting the performance of SPARQL queries before they are executed. Performance characteristics, such as execution time and memory usage, can help data consumers identify unexpected long-running queries before they start and estimate the system workload for query scheduling. Extensive work addresses this performance prediction problem for traditional SQL queries, but those approaches are not directly applicable to SPARQL queries. In this paper, we adopt machine learning techniques to predict the performance of SPARQL queries. Our work focuses on modeling the features of a SPARQL query as a vector representation. Our feature modeling method does not depend on knowledge of the underlying systems or the structure of the underlying data, but only on the nature of SPARQL queries. We then use these features to train prediction models. We propose a two-step prediction process and consider performance in both the cold and warm stages. Evaluations are performed on real-world SPARQL queries whose execution times range from milliseconds to hours. The results demonstrate that the proposed approach can effectively predict SPARQL query performance and outperforms state-of-the-art approaches.
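
The general idea of turning a query into a feature vector for a learned predictor can be sketched as follows. This is not the paper's exact feature set or model choice: the keyword list, the triple-pattern counting heuristic, the regressor and the toy training data are all assumptions.

```python
# Hedged sketch: count SPARQL algebra keywords in the query text to build a
# feature vector, then fit a regressor on (features, observed latency) pairs.
import re
import numpy as np
from sklearn.ensemble import RandomForestRegressor

KEYWORDS = ["SELECT", "OPTIONAL", "FILTER", "UNION", "GROUP BY",
            "ORDER BY", "DISTINCT", "LIMIT", "REGEX"]

def featurize(query: str) -> np.ndarray:
    q = query.upper()
    counts = [len(re.findall(re.escape(k), q)) for k in KEYWORDS]
    n_triples = q.count(" . ") + 1          # crude triple-pattern count
    return np.array(counts + [n_triples], dtype=float)

# Toy training set: (query text, execution time in seconds).
history = [
    ("SELECT * WHERE { ?s ?p ?o . }", 0.05),
    ("SELECT ?s WHERE { ?s ?p ?o . ?o ?q ?v . } ORDER BY ?s", 0.40),
    ("SELECT DISTINCT ?s WHERE { ?s ?p ?o . FILTER regex(?o, 'x') }", 1.20),
]
X = np.vstack([featurize(q) for q, _ in history])
y = np.array([t for _, t in history])

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
new_q = "SELECT ?s WHERE { ?s ?p ?o . } LIMIT 10"
print(model.predict(featurize(new_q).reshape(1, -1))[0])
```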

13 citations


Book ChapterDOI
15 Nov 2018
TL;DR: A new multi-stage discovery algorithm is used to choose boundary and non-boundary samples, and the conducted experiments show promising performance of the proposed method.
Abstract: Linear discriminant analysis (LDA) is a widely used feature extraction technique: it performs classification using the discriminant information obtained in the mapped space. When the class distributions are not normal, LDA runs into problems and its criterion yields poor classification performance. One proposed remedy is to use other measures, such as the Chernoff distance. Using the Chernoff measure, LDA has been extended to heteroscedastic settings, where, in addition to the information in the class means, it also uses the information in the class covariance matrices. By defining scatter matrices based on boundary and non-boundary samples and using these matrices in the Chernoff criterion, the overlap between classes in the mapped space decreases and, as a result, the classification accuracy increases; using boundary and non-boundary samples in the scatter matrices thus improves the results. In this article, we use a new multi-stage discovery algorithm to choose boundary and non-boundary samples, and the results of the conducted experiments show promising performance of the proposed method.
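
For reference, the Chernoff distance between two Gaussian classes, which is the kind of criterion heteroscedastic LDA extensions maximize because it uses class covariances as well as class means, can be sketched as below. This is a hedged illustration under one common convention (alpha = 0.5 gives the Bhattacharyya distance); it is not the paper's boundary/non-boundary scatter construction, and the means, covariances and alpha are toy values.

```python
# Chernoff distance between N(m1, S1) and N(m2, S2) under one common
# convention; not the paper's exact criterion.
import numpy as np

def chernoff_distance(m1, S1, m2, S2, alpha=0.5):
    """alpha=0.5 reduces this to the Bhattacharyya distance."""
    Sa = alpha * S1 + (1 - alpha) * S2
    d = m1 - m2
    quad = alpha * (1 - alpha) / 2.0 * d @ np.linalg.solve(Sa, d)
    _, logdet_Sa = np.linalg.slogdet(Sa)
    _, logdet_S1 = np.linalg.slogdet(S1)
    _, logdet_S2 = np.linalg.slogdet(S2)
    logterm = 0.5 * (logdet_Sa - alpha * logdet_S1 - (1 - alpha) * logdet_S2)
    return quad + logterm

m1, m2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
S1 = np.array([[1.0, 0.2], [0.2, 1.0]])
S2 = np.array([[2.0, -0.3], [-0.3, 0.5]])
print(chernoff_distance(m1, S1, m2, S2))
```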

9 citations


Proceedings ArticleDOI
01 Jun 2018
TL;DR: This work presents a novel MPTCP algorithm that increases end-user throughput by learning the best policy for different situations with Q-learning, and reveals a strong effect of switching between interfaces and changing the congestion control mechanism on throughput and delay.
Abstract: Mobile devices are able to leverage diverse heterogeneous network paths through Multi-Path Transmission Control Protocol (MPTCP); nevertheless, boosting MPTCP throughput in wireless networks is a hard problem. Not only must the best path(s) be selected, but the optimal congestion control mechanism must also be chosen. We investigate the impact of different paths and congestion control mechanisms under different signal quality states. Consequently, we present a novel MPTCP algorithm that increases end-user throughput by learning the best policy for different situations with Q-learning. The results reveal a substantial effect of switching between the different interfaces and changing the congestion control mechanism on throughput and delay. By and large, the proposed framework achieves 10% more throughput than baseline MPTCP.
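
A minimal sketch of the general idea, not the paper's algorithm, follows: tabular Q-learning where the state is a discretized signal-quality level and each action is a (path, congestion-control) pair. The state space, reward definition and toy throughput simulator are assumptions for illustration.

```python
# Hedged sketch: epsilon-greedy tabular Q-learning over (path, CC) actions.
import random

STATES = ["wifi_good", "wifi_poor", "lte_good", "both_good"]
ACTIONS = [("wifi", "cubic"), ("wifi", "bbr"), ("lte", "cubic"),
           ("lte", "bbr"), ("both", "cubic"), ("both", "bbr")]

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.9, 0.1

def simulate_throughput(state, action):
    """Toy stand-in for measured throughput (Mbit/s)."""
    base = {"wifi_good": 40, "wifi_poor": 5, "lte_good": 25, "both_good": 55}[state]
    bonus = 10 if action[0] == "both" and state == "both_good" else 0
    return base + bonus + random.uniform(-3, 3)

state = random.choice(STATES)
for step in range(5000):
    if random.random() < eps:                       # epsilon-greedy exploration
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    reward = simulate_throughput(state, action)
    next_state = random.choice(STATES)              # signal quality changes
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

print(max(ACTIONS, key=lambda a: Q[("both_good", a)]))  # learned best action
```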

6 citations


Journal ArticleDOI
TL;DR: A feature modelling method transforms SPARQL queries into vector representations that are fed into machine-learning algorithms, and a time-aware smoothing-based method, Modified Simple Exponential Smoothing (MSES), is developed for cache replacement.
Abstract: Knowledge Bases (KBs) are widely used as one of the fundamental components in Semantic Web applications, as they provide facts and relationships that can be automatically understood by machines. Curated knowledge bases usually use the Resource Description Framework (RDF) as the data representation model. To query the RDF-represented knowledge in curated KBs, Web interfaces are built via SPARQL Endpoints. Currently, querying SPARQL Endpoints suffers from problems such as network instability and latency, which affect query efficiency. To address these issues, we propose a client-side caching framework, the SPARQL Endpoint Caching Framework (SECF), aimed at accelerating the overall querying speed over SPARQL Endpoints. SECF identifies potential issued queries by leveraging querying patterns learned from clients’ historical queries and prefetches/caches these queries. In particular, we develop a distance function based on graph edit distance to measure the similarity of SPARQL queries. We propose a feature modelling method to transform SPARQL queries into vector representations that are fed into machine-learning algorithms. A time-aware smoothing-based method, Modified Simple Exponential Smoothing (MSES), is developed for cache replacement. Extensive experiments performed on real-world queries showcase the effectiveness of our approach, which outperforms the state-of-the-art work in terms of the overall querying speed.
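
The cache-replacement idea can be sketched with plain simple exponential smoothing of each cached query's recent hit rate, evicting the query with the lowest smoothed score. The paper's MSES is a modified, time-aware variant; the alpha value, cache size and toy observations below are illustrative assumptions only.

```python
# Hedged sketch: simple exponential smoothing as an eviction score,
# standing in for the paper's Modified Simple Exponential Smoothing (MSES).
ALPHA = 0.3
CACHE_CAPACITY = 2

smoothed = {}   # query text -> smoothed hit score
cache = {}      # query text -> cached result

def observe(query, hits_in_window):
    prev = smoothed.get(query, 0.0)
    smoothed[query] = ALPHA * hits_in_window + (1 - ALPHA) * prev

def put(query, result):
    if query not in cache and len(cache) >= CACHE_CAPACITY:
        victim = min(cache, key=lambda q: smoothed.get(q, 0.0))
        del cache[victim]                      # evict lowest smoothed score
    cache[query] = result

# Toy usage
observe("Q1", 5); observe("Q2", 1); observe("Q3", 4)
put("Q1", "r1"); put("Q2", "r2"); put("Q3", "r3")
print(sorted(cache))   # Q2 has the lowest score and gets evicted -> ['Q1', 'Q3']
```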

5 citations


Journal ArticleDOI
TL;DR: The results show that the proposed algorithm can effectively place XML data on air and significantly improve the overall access efficiency.
Abstract: Wireless data broadcast is an efficient way of delivering data of common interest to a large population of mobile devices within a proximate area, such as smart cities, battlefields, etc. In this work, we focus on the data placement problem of periodic XML data broadcast in mobile and wireless environments. This is an important issue, particularly as XML becomes prevalent in today’s ubiquitous and mobile computing devices and applications. Taking advantage of the structured characteristics of XML data, effective broadcast programs can be generated based only on the XML data on the server. An XML data broadcast system is developed, and a theoretical analysis of XML data placement on a wireless channel is also presented, which forms the basis of the novel data placement algorithm in this work. The proposed algorithm is validated through a set of experiments. The results show that the proposed algorithm can effectively place XML data on air and significantly improve the overall access efficiency.
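
For background on what a broadcast program looks like, here is a sketch of a classic online heuristic for periodic broadcast that approximates the square-root rule (more popular, shorter items are broadcast more often). This is not the paper's XML-specific placement algorithm, and the item names, access probabilities and lengths are made-up example values.

```python
# Hedged sketch: at each slot, send the item with the largest value of
# (time_since_last_broadcast)^2 * access_prob / length.
items = {                       # name: (access probability, length in buckets)
    "catalog_root": (0.50, 1),
    "hot_subtree":  (0.35, 2),
    "cold_subtree": (0.15, 4),
}
last_sent = {name: -1.0 for name in items}

schedule, t = [], 0
while t < 20:                   # build roughly 20 buckets of broadcast program
    def priority(name):
        p, l = items[name]
        return (t - last_sent[name]) ** 2 * p / l
    chosen = max(items, key=priority)
    p, l = items[chosen]
    schedule.append(chosen)
    last_sent[chosen] = t
    t += l                      # the item occupies l consecutive buckets
print(schedule)
```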

4 citations



Proceedings ArticleDOI
01 Nov 2018
TL;DR: An efficient approach extracts feature vectors for the drugs in a drug-pair through a network model and uses them to compute the similarity ratio between the drugs; the use of a network model will drive research efforts into more efficient data-mining algorithms for information retrieval, similarity search and machine learning.
Abstract: With more patients taking multiple medications and the increasing digital availability of diagnostic data such as treatment notes and x-ray images, the importance of decision support systems that help dentists in their treatment planning cannot be overemphasised. Based on the hypothesis that a higher similarity ratio between the drugs in a drug-pair indicates that the combination has a higher chance of an adverse interaction, this paper describes an efficient approach to extracting feature vectors for the drugs in a drug-pair and computing the similarity ratio between them. The feature vectors are obtained through a network model in which the information about the drugs is represented as nodes and the relationships between them as edges. Experimental evaluation of our model yielded a superior F score of 74%. The use of a network model will drive research efforts into more efficient data-mining algorithms for information retrieval, similarity search and machine learning. Since it is important to avoid drug allergies when prescribing drugs, our work, when integrated into the clinical workflow, will reduce prescription errors and thereby improve health outcomes for patients.
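
A hedged sketch of the general idea, not the paper's model: represent drugs and their attributes as a graph, take each drug's row of the adjacency structure as its feature vector, and score a drug pair by cosine similarity. The drug names, edges and the 0.5 decision threshold are made-up examples.

```python
# Hedged sketch: graph-derived feature vectors and cosine similarity
# as a stand-in for the paper's network model.
import numpy as np

edges = [                       # (drug, related entity) relationships
    ("drugA", "enzymeX"), ("drugA", "targetY"), ("drugA", "classZ"),
    ("drugB", "enzymeX"), ("drugB", "classZ"),
    ("drugC", "targetQ"),
]
nodes = sorted({n for e in edges for n in e})
index = {n: i for i, n in enumerate(nodes)}

adj = np.zeros((len(nodes), len(nodes)))
for a, b in edges:
    adj[index[a], index[b]] = adj[index[b], index[a]] = 1.0

def similarity(d1, d2):
    v1, v2 = adj[index[d1]], adj[index[d2]]
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

pair = ("drugA", "drugB")
score = similarity(*pair)
print(pair, round(score, 3),
      "possible interaction" if score > 0.5 else "low risk")
```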

3 citations



Book ChapterDOI
16 Nov 2018
TL;DR: A novel technique improves the performance of the existing GA-based outlier detection method by using a bit-freezing approach, which achieves faster convergence and an early stop of the GA and leads to a more accurate approximation of the fitness function.
Abstract: In this paper, we study the problem of subspace outlier detection in high-dimensional data spaces and propose a new genetic algorithm-based technique to identify outliers embedded in subspaces. The existing technique, which mainly uses a genetic algorithm (GA) to carry out the subspace search, is generally slow due to its expensive fitness evaluation and long solution encoding scheme. In this paper, we propose a novel technique that improves the performance of the existing GA-based outlier detection method using a bit-freezing approach to achieve faster convergence. By freezing converged bits in the solution encoding strings, this approach enables fast crossover and mutation operations and allows an early stop of the GA, which leads to a more accurate approximation of the fitness function. This research work can contribute to the development of a more efficient search method for detecting subspace outliers. The experimental results demonstrate the improved efficiency of our technique compared with the existing method.
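
The bit-freezing idea can be sketched as follows: bits whose value has (nearly) converged across the population are frozen so mutation no longer flips them, and the GA stops early once everything is frozen. This is a hedged illustration, not the paper's exact procedure; the convergence threshold, mutation rate and placeholder fitness function are toy assumptions.

```python
# Hedged sketch of bit freezing in a simple GA loop.
import random

N_BITS, POP_SIZE, FREEZE_THRESHOLD, MUT_RATE = 16, 30, 0.9, 0.05

def fitness(bits):                         # placeholder for the expensive
    return sum(bits) + random.random()     # subspace-outlier fitness evaluation

pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]
frozen = {}                                # bit position -> frozen value

for gen in range(50):
    # Detect converged bits and freeze them.
    for j in range(N_BITS):
        if j in frozen:
            continue
        ones = sum(ind[j] for ind in pop) / POP_SIZE
        if ones >= FREEZE_THRESHOLD or ones <= 1 - FREEZE_THRESHOLD:
            frozen[j] = 1 if ones >= 0.5 else 0
    if len(frozen) == N_BITS:              # early stop: everything converged
        break
    # Selection (keep the better half) and mutation that skips frozen bits.
    pop.sort(key=fitness, reverse=True)
    pop = pop[:POP_SIZE // 2]
    children = []
    for ind in pop:
        child = list(ind)
        for j in range(N_BITS):
            if j in frozen:
                child[j] = frozen[j]
            elif random.random() < MUT_RATE:
                child[j] = 1 - child[j]
        children.append(child)
    pop += children

print(len(frozen), "bits frozen after", gen + 1, "generations")
```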

Proceedings ArticleDOI
01 Nov 2018
TL;DR: This paper provides a method to compensate for missing data in a dynamic target tracking system by using the learning capability of a BP neural network; its prediction accuracy is much higher than that of traditional time-update prediction.
Abstract: During the target tracking process, some observation data may be missing due to equipment problems or operation errors, which can affect the filtering of the target state and the accuracy of position determination. Therefore, the missing data needs to be compensated for effectively. This paper provides a method to compensate for the missing data in a dynamic target tracking system by using the learning characteristics of a BP neural network. The neural network is trained on the complete data of the dynamic system, and the missing data is then predicted by the trained network. The simulation results for both a linear system and a nonlinear system show that the method is indeed effective: compared with traditional time-update prediction, its prediction accuracy is much higher.
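
A hedged sketch of the general idea: train a back-propagation network (here scikit-learn's MLPRegressor) on complete observation windows of a toy trajectory, then predict a missing observation from the preceding window. The window length, network size and sine-wave "trajectory" are assumptions, not the paper's setup.

```python
# Hedged sketch: fill a missing observation with a BP-trained network.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = np.arange(0, 20, 0.1)
obs = np.sin(t) + 0.02 * rng.standard_normal(t.size)   # toy observation track

WINDOW = 5
X = np.array([obs[i:i + WINDOW] for i in range(len(obs) - WINDOW)])
y = obs[WINDOW:]

net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000, random_state=0)
net.fit(X, y)                                           # train on complete data

missing_idx = 150                                       # pretend this sample was lost
window = obs[missing_idx - WINDOW:missing_idx].reshape(1, -1)
predicted = net.predict(window)[0]
print(f"true {obs[missing_idx]:.3f}  predicted {predicted:.3f}")
```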

Book ChapterDOI
03 Sep 2018
TL;DR: A semi-automatic method to construct an MDT that requires only a small amount of manual input is introduced, in combination with an unsupervised method for ranking multi-domain concepts based on semantic relationships learned from unlabeled data.
Abstract: In recent years, large volumes of short text data have become easy to collect from platforms such as microblogs and product review sites. Very often the obtained short text data spans several domains, which poses many challenges for effective multi-domain text processing because it is difficult to distinguish among the multiple domains in the text data. The concept of a multiple domain taxonomy (MDT) has shown promising performance in processing multi-domain text data. However, an MDT has to be constructed manually, which requires much expert knowledge about the relevant domains and is time consuming. To address these issues, in this paper we introduce a semi-automatic method to construct an MDT that requires only a small amount of manual input, in combination with an unsupervised method for ranking multi-domain concepts based on semantic relationships learned from unlabeled data. We show that the MDT iteratively constructed by our semi-automatic method achieves higher accuracy than existing methods in domain classification, improving accuracy by up to 11%.
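
One way such concept ranking from unlabeled data could look is sketched below: score each candidate concept by how often it co-occurs with a few manually supplied seed terms for a domain. This is a hedged illustration only, not the paper's method, and the seed terms, candidate list and toy corpus are assumptions.

```python
# Hedged sketch: co-occurrence-based ranking of candidate domain concepts.
corpus = [
    "battery life of this phone is great",
    "the phone camera and battery are decent",
    "hotel room was clean and the staff friendly",
    "friendly staff and a clean room at this hotel",
]
seeds = {"electronics": {"phone", "battery"}, "travel": {"hotel", "room"}}
candidates = ["camera", "staff", "battery", "clean"]

def cooccurrence_score(term, domain):
    score = 0
    for doc in corpus:
        words = set(doc.split())
        if term in words:
            # count seed terms appearing alongside the candidate
            score += len((words & seeds[domain]) - {term})
    return score

for domain in seeds:
    ranked = sorted(candidates, key=lambda c: -cooccurrence_score(c, domain))
    print(domain, ranked)
```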