
Showing papers by "Wang-Chien Lee published in 2014"


Journal ArticleDOI
TL;DR: A new framework, community-based influence maximization (CIM), is developed to tackle the influence maximization problem with an emphasis on the time efficiency issue; experiments show that CIM significantly outperforms the state-of-the-art algorithms in terms of efficiency and scalability, with almost no compromise of effectiveness.
Abstract: Given a social graph, the problem of influence maximization is to determine a set of nodes that maximizes the spread of influences. While some recent research has studied the problem of influence maximization, these works are generally too time consuming for practical use in a large-scale social network. In this article, we develop a new framework, community-based influence maximization (CIM), to tackle the influence maximization problem with an emphasis on the time efficiency issue. Our proposed framework, CIM, comprises three phases: (i) community detection, (ii) candidate generation, and (iii) seed selection. Specifically, phase (i) discovers the community structure of the network; phase (ii) uses the information of communities to narrow down the possible seed candidates; and phase (iii) finalizes the seed nodes from the candidate set. By exploiting the properties of the community structures, we are able to avoid overlapping information and thus efficiently select seeds that maximize the information spread. The experimental results on both synthetic and real datasets show that the proposed CIM algorithm significantly outperforms the state-of-the-art algorithms in terms of efficiency and scalability, with almost no compromise of effectiveness.
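The abstract's three-phase pipeline can be illustrated concretely. Below is a minimal Python sketch of the CIM flow; the one-pass label propagation, the one-candidate-per-community rule, and the degree-based seed ranking are all simplified stand-ins, not the paper's actual algorithms.

```python
# Hedged sketch of the three CIM phases on an adjacency-set graph.
from collections import defaultdict

def detect_communities(graph):
    # Placeholder for phase (i): one pass of label propagation.
    labels = {v: v for v in graph}
    for v in graph:
        counts = defaultdict(int)
        for u in graph[v]:
            counts[labels[u]] += 1
        if counts:
            labels[v] = max(counts, key=counts.get)
    communities = defaultdict(set)
    for v, lab in labels.items():
        communities[lab].add(v)
    return list(communities.values())

def cim(graph, k):
    communities = detect_communities(graph)          # phase (i)
    candidates = [max(c, key=lambda v: len(graph[v]))  # phase (ii): one
                  for c in communities]                # candidate per community
    # Phase (iii): degree as a crude proxy for marginal influence spread.
    return sorted(candidates, key=lambda v: len(graph[v]), reverse=True)[:k]

graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
print(cim(graph, 2))
```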

136 citations


Journal ArticleDOI
TL;DR: This article proposes a novel mining-based location prediction approach called Geographic-Temporal-Semantic-based Location Prediction (GTS-LP), which takes into account a user's geographic-triggered intentions, temporal-triggered intentions, and semantic-triggered intentions to estimate the probability of the user visiting a location.
Abstract: In recent years, research on location predictions by mining trajectories of users has attracted a lot of attention. Existing studies on this topic mostly treat such predictions as just a type of location recommendation, that is, they predict the next location of a user using location recommenders. However, a user usually visits somewhere for reasons other than interestingness. In this article, we propose a novel mining-based location prediction approach called Geographic-Temporal-Semantic-based Location Prediction (GTS-LP), which takes into account a user's geographic-triggered intentions, temporal-triggered intentions, and semantic-triggered intentions, to estimate the probability of the user visiting a location. The core idea underlying our proposal is the discovery of trajectory patterns of users, namely GTS patterns, to capture frequent movements triggered by the three kinds of intentions. To achieve this goal, we define a new trajectory pattern to capture the key properties of the behaviors that are motivated by the three kinds of intentions from trajectories of users. In our GTS-LP approach, we propose a series of novel matching strategies to calculate the similarity between the current movement of a user and discovered GTS patterns based on various moving intentions. Based on this similarity, we make an online prediction as to the location the user intends to visit. To the best of our knowledge, this is the first work on location prediction based on trajectory pattern mining that explores the geographic, temporal, and semantic properties simultaneously. By means of a comprehensive evaluation using various real trajectory datasets, we show that our proposed GTS-LP approach delivers excellent performance and significantly outperforms existing state-of-the-art location prediction methods.
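As a rough illustration of the matching idea, the sketch below scores a candidate GTS pattern against a user's current movement by blending geographic, temporal, and semantic similarity. The three similarity functions and the weights are illustrative assumptions; the paper's actual matching strategies are more elaborate.

```python
# Hedged sketch: blend geo/temporal/semantic similarity to pick the
# best-matching mined pattern and predict its next location.
import math

def geo_sim(p, q, scale=1.0):
    # Closer points -> similarity nearer 1 (exponential distance decay).
    return math.exp(-math.dist(p, q) / scale)

def temp_sim(h1, h2):
    # Circular distance between hours of day, normalized to [0, 1].
    d = min(abs(h1 - h2), 24 - abs(h1 - h2))
    return 1.0 - d / 12.0

def sem_sim(cat1, cat2):
    return 1.0 if cat1 == cat2 else 0.0

def pattern_score(cur, pat, w=(0.4, 0.3, 0.3)):
    return (w[0] * geo_sim(cur["loc"], pat["loc"])
            + w[1] * temp_sim(cur["hour"], pat["hour"])
            + w[2] * sem_sim(cur["category"], pat["category"]))

current = {"loc": (0.0, 0.0), "hour": 19, "category": "restaurant"}
patterns = [
    {"loc": (0.1, 0.2), "hour": 20, "category": "restaurant", "next": "cinema"},
    {"loc": (3.0, 4.0), "hour": 9, "category": "office", "next": "gym"},
]
best = max(patterns, key=lambda p: pattern_score(current, p))
print(best["next"])  # predicted next location
```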

128 citations


Proceedings ArticleDOI
14 Dec 2014
TL;DR: A unified framework is proposed, called PGT, that considers personal, global, and temporal factors to measure the strength of the relationship between two given mobile users and significantly outperforms the state-of-the-art methods.
Abstract: Rich location data of mobile users collected from smart phones and location-based social networking services enable us to measure the mobility relationship strength based on their interactions in the physical world. A commonly used measure for such a relationship is the frequency of meeting events (i.e., co-locating at the same time). That is, the more frequently two persons meet, the stronger their mobility relationship is. However, we argue that not all the meeting events are equally important in measuring the mobility relationship and propose to consider personal and global factors to differentiate meeting events. The personal factor models the probability for an individual user to visit a certain location, whereas the global factor models the popularity of a location based on the behavior of the general public. In addition, we introduce the temporal factor to further consider the time gaps between meeting events. Accordingly, we propose a unified framework, called PGT, that considers personal, global, and temporal factors to measure the strength of the relationship between two given mobile users. Extensive experiments on real datasets validate our ideas and show that our method significantly outperforms the state-of-the-art methods.
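The weighting idea can be sketched as follows: a meeting event contributes more to relationship strength when the place is rarely visited by either user (personal factor) and unpopular overall (global factor), and meetings that follow shortly after a previous one are discounted (temporal factor). The exact formulas below are illustrative assumptions, not the PGT model itself.

```python
# Hedged sketch of PGT-style weighted meeting events.
import math

def meeting_weight(p_u, p_v, popularity):
    personal = -math.log(p_u * p_v)    # rarer visits -> larger weight
    global_f = -math.log(popularity)   # less popular place -> larger weight
    return personal + global_f

def relationship_strength(meetings, half_life=24.0):
    # meetings: time-sorted list of (time_in_hours, p_u, p_v, popularity).
    strength, last_t = 0.0, None
    for t, p_u, p_v, pop in meetings:
        w = meeting_weight(p_u, p_v, pop)
        if last_t is not None:
            # Temporal factor: discount meetings close in time.
            w *= 1.0 - math.exp(-(t - last_t) / half_life)
        strength += w
        last_t = t
    return strength

meetings = [(0, 0.01, 0.02, 0.001), (2, 0.01, 0.02, 0.001), (100, 0.3, 0.4, 0.5)]
print(relationship_strength(meetings))
```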

75 citations


Proceedings ArticleDOI
03 Nov 2014
TL;DR: Experimental results show that the proposed algorithms can produce good quality summaries and scale well with increasing data sizes, and this is the first work to study distributed graph summarization methods.
Abstract: Graph has been a ubiquitous and essential data representation to model real world objects and their relationships. Today, large amounts of graph data have been generated by various applications. Graph summarization techniques are crucial in uncovering useful insights about the patterns hidden in the underlying data. However, all existing works in graph summarization are single-process solutions, and as a result cannot scale to large graphs. In this paper, we introduce three distributed graph summarization algorithms to address this problem. Experimental results show that the proposed algorithms can produce good quality summaries and scale well with increasing data sizes. To the best of our knowledge, this is the first work to study distributed graph summarization methods.

41 citations


Proceedings ArticleDOI
11 Aug 2014
TL;DR: This paper makes the B+-tree PCM-friendly by reducing write accesses and proposes three schemes that efficiently improve performance, reduce memory energy consumption, and extend the lifetime of PCM memory.
Abstract: Phase change memory (PCM) is a promising technology for building future large-scale and low-power main memory systems. Main memory databases (MMDBs) can benefit from the high density of PCM. However, its long write latency, high write energy, and limited lifetime bring challenges to database algorithm design for PCM-based memory systems. In this paper, we focus on making the B+-tree PCM-friendly by reducing the write accesses to PCM. We propose three different schemes. Experimental results show that they can efficiently improve the performance, reduce the memory energy consumption, and extend the lifetime of PCM memory.
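The abstract does not detail the three schemes, but a well-known write-reduction idea in this line of work is to keep leaf entries unsorted, so an insert appends a single slot instead of rewriting sorted ones. The sketch below illustrates that idea only, under the assumption that each appended slot costs one PCM write; it is not a reconstruction of the paper's schemes.

```python
# Hedged sketch: unsorted B+-tree leaf trading cheap PCM reads
# (linear scan) for fewer PCM writes (no slot shifting on insert).
class UnsortedLeaf:
    def __init__(self, capacity=8):
        self.slots = []          # (key, value) pairs in arrival order
        self.capacity = capacity
        self.pcm_writes = 0

    def insert(self, key, value):
        if len(self.slots) >= self.capacity:
            raise OverflowError("leaf full; split not sketched")
        self.slots.append((key, value))  # one slot write, no shifting
        self.pcm_writes += 1

    def search(self, key):
        # Reads are fast on PCM; a linear scan replaces binary search.
        for k, v in self.slots:
            if k == key:
                return v
        return None

leaf = UnsortedLeaf()
for k in [42, 7, 19]:
    leaf.insert(k, str(k))
print(leaf.search(7), leaf.pcm_writes)  # '7' 3 (sorted slots would cost more writes)
```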

38 citations


Journal ArticleDOI
TL;DR: A prefetching-based approach is developed that enables clients to compute new LASQ results locally during movement, without frequently contacting the server for query re-evaluation; a basic Merkle Skyline R-tree method and a novel Partial S4-tree method are also proposed to authenticate one-shot LASQs.
Abstract: With the ever-increasing use of smartphones and tablet devices, location-based services (LBSs) have experienced explosive growth in the past few years. To scale up services, there has been a rising trend of outsourcing data management to Cloud service providers, which provide query services to clients on behalf of data owners. However, in this data-outsourcing model, the service provider can be untrustworthy or compromised, thereby returning incorrect or incomplete query results to clients, intentionally or not. Therefore, empowering clients to authenticate query results is imperative for outsourced databases. In this paper, we study the authentication problem for location-based arbitrary-subspace skyline queries (LASQs), which represent an important class of LBS applications. We propose a basic Merkle Skyline R-tree method and a novel Partial S4-tree method to authenticate one-shot LASQs. For the authentication of continuous LASQs, we develop a prefetching-based approach that enables clients to compute new LASQ results locally during movement, without frequently contacting the server for query re-evaluation. Experimental results demonstrate the efficiency of our proposed methods and algorithms under various system settings.
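At the core of such authentication is Merkle-style verification: the server returns results together with sibling hashes (a verification object), and the client recomputes the root digest and checks it against the owner-signed one. The sketch below shows only this generic verification step; the skyline-specific Merkle Skyline R-tree and Partial S4-tree layouts are omitted.

```python
# Hedged sketch of Merkle-path verification on the client side.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify(leaf_data: bytes, proof, signed_root: bytes) -> bool:
    # proof: list of (sibling_hash, sibling_is_left) pairs, leaf to root.
    digest = h(leaf_data)
    for sibling, is_left in proof:
        digest = h(sibling + digest) if is_left else h(digest + sibling)
    return digest == signed_root

# Tiny two-leaf tree: root = h(h(a) + h(b)).
a, b = b"object-a", b"object-b"
root = h(h(a) + h(b))
print(verify(a, [(h(b), False)], root))            # True: result authentic
print(verify(b"tampered", [(h(b), False)], root))  # False: detected
```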

36 citations


Journal ArticleDOI
TL;DR: This paper presents a novel key design based on an R+-tree (KR+-index) for retrieving skewed spatial data efficiently and shows that the KR+-index outperforms the state-of-the-art methods.

35 citations


Book ChapterDOI
13 May 2014
TL;DR: A novel algorithm, namely, Correlation Pattern Miner (CoPMiner), is developed to capture the usage patterns and correlations among appliances probabilistically and can reduce the search space effectively and efficiently.
Abstract: With the advent of sensor technology, the usage data of appliances in a house can be logged and collected easily today. However, it is a challenge for the residents to visualize how these appliances are used. Thus, mining algorithms are much needed to discover appliance usage patterns. Most previous studies on usage pattern discovery mainly focus on analyzing the patterns of a single appliance rather than mining the usage correlation among appliances. In this paper, a novel algorithm, namely, Correlation Pattern Miner (CoPMiner), is developed to capture the usage patterns and correlations among appliances probabilistically. With several new optimization techniques, CoPMiner can reduce the search space effectively and efficiently. Furthermore, the proposed algorithm is applied on a real-world dataset to show the practicability of correlation pattern mining.

34 citations


Book ChapterDOI
13 May 2014
TL;DR: This work proposes a trajectory recommendation framework and develops three recommendation methods, namely Activity-Based Recommendation (ABR), GPS-Based Recommendation (GBR), and Hybrid Recommendation; the hybrid solution turns out to deliver the best performance.
Abstract: The wide use of GPS sensors in smart phones encourages people to record their personal trajectories and share them with others on the Internet. A recommendation service is needed to help people process the large quantity of trajectories and select potentially interesting ones. GPS trace data is a new format of information, and few works focus on building user preference profiles from it. In this work we propose a trajectory recommendation framework and develop three recommendation methods, namely, Activity-Based Recommendation (ABR), GPS-Based Recommendation (GBR), and Hybrid Recommendation. The ABR recommends trajectories purely relying on activity tags. For GBR, we propose a generative model to construct user profiles based on GPS traces. The Hybrid Recommendation combines the ABR and GBR. We finally conducted extensive experiments to evaluate the proposed solutions, and it turned out that the hybrid solution delivers the best performance.
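A minimal sketch of the hybrid step: blend an activity-tag score (ABR) with a GPS-profile score (GBR) per candidate trajectory. The Jaccard tag overlap, the assumed [0, 1] GBR probability, and the mixing weight alpha are illustrative assumptions.

```python
# Hedged sketch of ABR/GBR blending for trajectory ranking.
def abr_score(user_tags, traj_tags):
    # Jaccard overlap between the user's preferred activity tags
    # and the trajectory's tags.
    inter = len(user_tags & traj_tags)
    union = len(user_tags | traj_tags) or 1
    return inter / union

def hybrid_score(user_tags, traj_tags, gbr, alpha=0.5):
    # gbr: probability of the trajectory under the user's GPS-trace
    # profile (e.g., from a generative model), assumed in [0, 1].
    return alpha * abr_score(user_tags, traj_tags) + (1 - alpha) * gbr

user = {"hiking", "photography"}
candidates = [({"hiking", "camping"}, 0.7), ({"shopping"}, 0.9)]
ranked = sorted(candidates,
                key=lambda c: hybrid_score(user, c[0], c[1]),
                reverse=True)
print(ranked[0])
```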

20 citations


Book ChapterDOI
13 May 2014
TL;DR: This paper categorizes twitter accounts into two types, namely Personal Communication Account (PCA) and Public Dissemination Account (PDA), and develops probabilistic models based on these features to identify PDAs.
Abstract: There are millions of accounts in Twitter. In this paper, we categorize Twitter accounts into two types, namely Personal Communication Accounts (PCAs) and Public Dissemination Accounts (PDAs). PCAs are accounts operated by individuals and are used to express that individual's thoughts and feelings. PDAs, on the other hand, refer to accounts owned by non-individuals such as companies, governments, etc. Generally, tweets from a PDA (i) disseminate a specific type of information (e.g., job openings, shopping deals, car accidents) rather than sharing an individual's personal life; and (ii) may be produced by non-human entities (e.g., bots). We aim to develop techniques for identifying PDAs so as to (i) help social scientists reduce "noise" in their study of human behaviors, and (ii) index them for potential recommendation to users looking for specific types of information. Through analysis, we find these two types of accounts follow different temporal, spatial, and textual patterns. Accordingly, we develop probabilistic models based on these features to identify PDAs. We also conduct a series of experiments to evaluate those algorithms for cleaning the Twitter data stream.
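A naive-Bayes-style combination of such features might look like the sketch below, where an account is flagged as a PDA when the accumulated log-odds turn positive. The features, likelihoods, and prior are illustrative assumptions rather than the paper's fitted models.

```python
# Hedged sketch: log-odds classification of PDA vs. PCA from
# temporal, textual, and spatial indicator features.
import math

def log_odds_pda(features, likelihoods, prior_pda=0.2):
    # likelihoods[f] = (P(f | PDA), P(f | PCA)) for each observed feature.
    score = math.log(prior_pda / (1 - prior_pda))
    for f in features:
        p_pda, p_pca = likelihoods[f]
        score += math.log(p_pda / p_pca)
    return score

likelihoods = {
    "posts_around_the_clock": (0.7, 0.1),   # temporal: bots tweet 24/7
    "single_topic_vocabulary": (0.6, 0.2),  # textual: narrow topic feed
    "geo_spread_many_cities": (0.5, 0.1),   # spatial: city-wide deal feeds
}
account = ["posts_around_the_clock", "single_topic_vocabulary"]
print("PDA" if log_odds_pda(account, likelihoods) > 0 else "PCA")
```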

15 citations


Posted Content
TL;DR: A new family of geo-social group queries with minimum acquaintance constraint (GSGQs) is proposed, which are more appealing than existing geo-social group queries in terms of producing a cohesive group that guarantees a worst-case acquaintance level.
Abstract: The prosperity of location-based social networking services enables geo-social group queries for group-based activity planning and marketing. This paper proposes a new family of geo-social group queries with minimum acquaintance constraint (GSGQs), which are more appealing than existing geo-social group queries in terms of producing a cohesive group that guarantees the worst-case acquaintance level. GSGQs, also specified with various spatial constraints, are more complex than conventional spatial queries; particularly, those with a strict kNN spatial constraint are proved to be NP-hard. For efficient processing of general GSGQ queries on large location-based social networks, we devise two social-aware index structures, namely SaR-tree and SaR*-tree. The latter features a novel clustering technique that considers both spatial and social factors. Based on SaR-tree and SaR*-tree, efficient algorithms are developed to process various GSGQs. Extensive experiments on real-world Gowalla and Dianping datasets show that our proposed methods substantially outperform the baseline algorithms based on R-tree.
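The minimum acquaintance constraint can be sketched directly: every member of the returned group must know at least c other members, which amounts to taking a c-core of the candidate set by repeatedly deleting under-connected members. The sketch below shows only this constraint check; the SaR-tree/SaR*-tree index machinery is omitted.

```python
# Hedged sketch: enforce the minimum acquaintance constraint by
# iteratively removing members who know fewer than c others.
def acquaintance_core(candidates, friends, c):
    group = set(candidates)
    changed = True
    while changed:
        changed = False
        for v in list(group):
            if len(friends[v] & group) < c:
                group.remove(v)   # v knows too few members; drop it
                changed = True
    return group

friends = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}, 5: set()}
print(acquaintance_core({1, 2, 3, 4, 5}, friends, c=2))  # {1, 2, 3}
```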

Proceedings ArticleDOI
03 Nov 2014
TL;DR: Four DoK models are proposed and integrated with three SRI methods under the proposed Expert Ranking (ER) framework to rank the candidate expert collaborators based on their likelihood of collaborating in response to a query formulated by the social network of a query initiator and certain required skills to a project/task.
Abstract: We consider the expert recommendation problem for open collaborative projects in large-scale Open Source Software (OSS) communities. In a large-scale online community, recommending expert collaborators to a project coordinator or lead developer faces two prominent challenges: (i) the "cold shoulder" problem, which is the lack of interest from the experts to collaborate and share their skills, and (ii) the "cold start" problem, which is an issue with community members who have a scarce data history. In this paper, we consider the Degree of Knowledge (DoK), which captures the knowledge-of-skills factor, and the Social Relative Importance (SRI), which captures the social-distance factor, to tackle the aforementioned challenges. We propose four DoK models and integrate them with three SRI methods under our proposed Expert Ranking (ER) framework to rank candidate expert collaborators based on their likelihood of collaborating in response to a query formulated from the social network of a query initiator and certain skills required for a project/task. We evaluate our proposal using a dataset collected from Github.com, one of the fastest-growing large-scale online OSS communities. In addition, we test the models under different data scarcity levels. The experiments show promising results of recommending expert collaborators who tend to make real collaborations on projects.
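A hedged sketch of the ranking step: each candidate's score combines a Degree-of-Knowledge match against the required skills with a Social Relative Importance factor that decays with social distance from the query initiator. Both component functions below are illustrative assumptions, not the paper's four DoK models or three SRI methods.

```python
# Hedged sketch of DoK x SRI expert scoring.
def dok(candidate_skills, required_skills):
    # Average proficiency (0..1) over the required skills.
    if not required_skills:
        return 0.0
    return sum(candidate_skills.get(s, 0.0)
               for s in required_skills) / len(required_skills)

def sri(social_distance, decay=0.5):
    # Closer in the initiator's network -> higher importance.
    return decay ** social_distance

def expert_score(candidate_skills, required_skills, social_distance):
    return dok(candidate_skills, required_skills) * sri(social_distance)

required = {"python", "databases"}
candidates = {
    "alice": ({"python": 0.9, "databases": 0.8}, 1),  # direct collaborator
    "bob":   ({"python": 1.0, "databases": 1.0}, 3),  # distant expert
}
ranking = sorted(candidates,
                 key=lambda n: expert_score(candidates[n][0], required,
                                            candidates[n][1]),
                 reverse=True)
print(ranking)  # ['alice', 'bob'] -- the nearby, capable expert ranks first
```

Note how the social factor lets a nearby, adequately skilled collaborator outrank a distant top expert, which is one plausible way to mitigate the "cold shoulder" problem described above.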

Proceedings ArticleDOI
01 Jan 2014
TL;DR: This paper analyzes the social interactions of users and investigates the development of their social ties using the data trail of 'how social ties grow' left in mobile and social networking services, and develops a Social-aware Hidden Markov Model (SaHMM) that explicitly takes into account the factor of common friends in measuring social tie development.
Abstract: Understanding social tie development among users is crucial for user engagement in social networking services. In this paper, we analyze the social interactions, both online and offline, of users and investigate the development of their social ties using the data trail of 'how social ties grow' left in mobile and social networking services. To the best of our knowledge, this is the first research attempt at studying social tie development by considering both online and offline interactions in a heterogeneous yet realistic relationship. In this study, we aim to answer three key questions: (i) is there a correlation between online and offline interactions? (ii) how is the social tie developed via heterogeneous interaction channels? and (iii) would the development of a social tie between two users be affected by their common friends? To achieve our goal, we develop a Social-aware Hidden Markov Model (SaHMM) that explicitly takes into account the factor of common friends in measuring social tie development. Our experiments show that, compared with results obtained using an HMM and other heuristic methods, the social tie development captured by our SaHMM is significantly more consistent with the lifetime profiles of users.

Proceedings ArticleDOI
03 Nov 2014
TL;DR: It is argued that patent citations can either be technological citations that indicate knowledge transfer or be legal citations that delimit the legal scope of citing patents, and a probabilistic citation network based algorithm and a prediction model for patent valuation are proposed.
Abstract: Effective patent valuation is important for patent holders. Forward patent citations, widely used in assessing patent value, have been considered as reflecting knowledge flows, just like paper citations. However, patent citations also carry legal implication, which is important for patent valuation. We argue that patent citations can either be technological citations that indicate knowledge transfer or be legal citations that delimit the legal scope of citing patents. In this paper, we first develop citation-network based methods to infer patent quality measures at either the legal or technological dimension. Then we propose a probabilistic mixture approach to incorporate both the legal and technological dimensions in patent citations, and an iterative learning process that integrates a temporal decay function on legal citations, a probabilistic citation network based algorithm and a prediction model for patent valuation. We learn all the parameters together and use them for patent valuation. We demonstrate the effectiveness of our approach by using patent maintenance status as an indicator of patent value and discuss the insights we learned from this study.
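The mixture view can be sketched as follows: each forward citation contributes a technological component and a legal component, with the legal component decayed by citation age. The mixture weight and the exponential decay form below are illustrative assumptions; the paper learns its parameters jointly through an iterative process.

```python
# Hedged sketch: decay-weighted mixture of legal and technological
# citation contributions to a patent's value score.
import math

def citation_value(age_years, tech_score, legal_score, lam=0.6, rate=0.3):
    # Legal citations delimit scope most strongly while young.
    legal = math.exp(-rate * age_years) * legal_score
    return lam * tech_score + (1 - lam) * legal

citations = [(1.0, 0.8, 0.2),   # (age, technological, legal) per citation
             (5.0, 0.3, 0.9)]
patent_value = sum(citation_value(*c) for c in citations)
print(patent_value)
```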

Journal ArticleDOI
TL;DR: An API query language that allows mobile mashup applications to readily specify and obtain desired information by instructing a proxy to filter unnecessary information returned from Web API servers is designed and an image multi-get module is devised, which results in mobile mashups applications with smaller transfer sizes.
Abstract: Recently, the proliferation of smartphones and the extensive coverage of wireless networks have enabled numerous mobile users to access Web resources with smartphones. Mobile mashup applications are very attractive to smartphone users due to specialized services and user-friendly GUIs. However, to offer new services through the integration of Web resources via Web API invocations, mobile mashup applications suffer from high energy consumption and long response time. In this paper, we propose a proxy system and two techniques to reduce the size of data transfer, thereby enabling mobile mashup applications to achieve energy-efficient and cost-effective Web API invocations. Specifically, we design an API query language that allows mobile mashup applications to readily specify and obtain desired information by instructing a proxy to filter unnecessary information returned from Web API servers. We also devise an image multi-get module, which results in mobile mashup applications with smaller transfer sizes by combining multiple images and adjusting the quality, scale, or resolution of the images. With the proposed proxy and techniques, a mobile mashup application can rapidly retrieve Web resources via Web API invocations with lower energy consumption due to a smaller number of HTTP requests and responses as well as smaller response bodies. Experimental results show that the proposed proxy system and techniques significantly reduce transfer size, response time, and energy consumption of mobile mashup applications.
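The idea behind the API query language can be sketched as a projection the proxy applies before forwarding a response: fetch the full Web API payload, keep only the requested fields. The dot-separated path syntax below is an illustrative assumption, not the paper's actual language.

```python
# Hedged sketch: proxy-side projection of a JSON API response onto
# the fields the mobile mashup application actually requested.
def project(response, field_paths):
    out = {}
    for path in field_paths:
        node, keys = response, path.split(".")
        try:
            for k in keys:
                node = node[k]
        except (KeyError, TypeError):
            continue                     # silently skip missing fields
        cur = out
        for k in keys[:-1]:
            cur = cur.setdefault(k, {})
        cur[keys[-1]] = node             # copy only the requested leaf
    return out

api_response = {"user": {"name": "lee", "avatar_url": "http://...", "bio": "..."},
                "stats": {"followers": 10, "following": 3}}
print(project(api_response, ["user.name", "stats.followers"]))
# {'user': {'name': 'lee'}, 'stats': {'followers': 10}} -- smaller transfer
```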

Proceedings ArticleDOI
12 Mar 2014
TL;DR: An extensive analysis on the developers of Open Source Software (OSS) projects is conducted, finding that a significant ratio of developers share the same affiliation and location in a team for a project that is being developed by remote collaborators.
Abstract: We conduct an extensive analysis of the developers of Open Source Software (OSS) projects. Our goal is to discover trends that govern the developers' behavior in contributing to OSS projects. To achieve our goal, we define and analyze a set of developer and OSS project features. Moreover, we study the behavior of developers in selecting OSS projects to participate in by analyzing the project features that dictate the developers' selection. In addition, we study the difference, in developing social ties, between developers who seek a job and those who do not. We also analyze the developers' affiliation (e.g., corporation, university, institute) and location (e.g., city) statistics. It is found that a significant ratio of developers share the same affiliation and location in a team for a project that is being developed by remote collaborators. We use a dataset collected from Github.com, one of the fastest-growing, large-scale online OSS communities. This study provides a foundation for future work on recommender systems targeting the OSS community.

Book ChapterDOI
13 May 2014
TL;DR: Experimental results show that the models created based on the proposed approach significantly enhance those using the baseline features or patent backward citations, and also exploit trends in temporal patterns of relevant prior patents, which are highly related to patent values.
Abstract: It is a challenging task for firms to assess the importance of a patent and identify valuable patents as early as possible. Counting the number of citations received is a widely used method to assess the value of a patent. However, recently granted patents have few citations received, which makes the use of citation counts infeasible. In this paper, we propose a novel idea to evaluate the value of new or recently granted patents using recommended relevant prior patents. Our approach is to exploit trends in temporal patterns of relevant prior patents, which are highly related to patent values. We evaluate the proposed approach using two patent value evaluation tasks with a large-scale collection of US patents. Experimental results show that the models created based on our idea significantly enhance those using the baseline features or patent backward citations.

Book ChapterDOI
21 Apr 2014
TL;DR: A new index structure and query processing algorithms are proposed for distance-based top-k queries; the index, called SKY R-tree, builds on the strengths of the R-tree and the skyline algorithm to efficiently prune the search space by exploring both spatial proximity and non-spatial attributes.
Abstract: Searches for objects associated with location information and non-spatial attributes have increased significantly over the years. To address this need, a top-k query may be issued by taking into account both the location information and non-spatial attributes. This paper focuses on a distance-based top-k query which retrieves the best objects based on the distance from candidate objects to a query point as well as other non-spatial attributes. In this paper, we propose a new index structure and query processing algorithms for distance-based top-k queries. This new index, called the SKY R-tree, builds on the strengths of the R-tree and the skyline algorithm to efficiently prune the search space by exploring both spatial proximity and non-spatial attributes. Moreover, we propose a variant of the SKY R-tree, called the S2KY R-tree, which incorporates a similarity measure of non-spatial attributes. We demonstrate, through extensive experimentation, that our proposals perform very well in terms of I/O costs and CPU time.
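The query the index accelerates can be sketched without the index itself: score each object by its distance to the query point plus a weighted non-spatial cost, and keep the k smallest. The linear scan below stands in for the SKY R-tree's pruning, and the scoring function is an illustrative assumption.

```python
# Hedged sketch of the distance-based top-k ranking semantics.
import heapq
import math

def score(obj, q, w=0.5):
    # obj: ((x, y), nonspatial_cost); lower combined score is better.
    # Assumes the two cost components are on comparable scales.
    return math.dist(obj[0], q) + w * obj[1]

def topk(objects, q, k):
    return heapq.nsmallest(k, objects, key=lambda o: score(o, q))

objects = [((1, 1), 0.2), ((0.5, 0.5), 0.9), ((5, 5), 0.1)]
print(topk(objects, (0, 0), k=2))
```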

Proceedings ArticleDOI
10 Mar 2014
TL;DR: This paper proposes a recommendation system for missing citations for newly granted patents; based on the patent citation network of a newly granted query patent, it ranks candidate patents via a RankSVM model learned by using path-based relevancy scores as features.
Abstract: The U.S. recently adopted a post-grant opposition procedure to encourage third parties to challenge the validity of newly granted patents by providing relevant prior patents that are missed during patent examination (i.e., missing citations). In this paper, we propose a recommendation system for missing citations for newly granted patents. The recommendation system, based on the patent citation network of a newly granted query patent, focuses on paths that start with the references of the query patent in the network. Our approach is to identify the relevancy of a candidate patent to the query patent by its citation relationship (paths) that are distinguished based on the direction, topology and semantics of the paths in the network. We consider six different types of paths between a candidate patent and a query patent based on their citation relationship and define a relevancy score for each path type. Accordingly, we rank candidate patents via a RankSVM model learned by using those relevancy scores as features. The experimental results show our approach significantly improves the average precision and recall performance compared to two baseline methods, i.e., Katz distance and text similarity.

Book ChapterDOI
16 Jun 2014
TL;DR: A cache-based algorithm is proposed that clusters entities from similar pairs based on the disjoint-set algorithm and is designed for the MapReduce framework; it achieves greater efficiency than previous algorithms on entity resolution and clustering.
Abstract: Entity resolution has been widely used in data mining applications to find similar records. However, the increasing scale and complexity of data have restricted the performance of entity resolution. In this paper, we propose a novel entity resolution framework that clusters large-scale data with a distributed entity resolution method. We model the clustering problem as finding similar connected subgraphs of records. First, our approach finds pairs of records whose similarities are above a given threshold based on the appjoin algorithm, which extends the ppjoin algorithm and is executed on the MapReduce framework. Then, we propose a cache-based algorithm that clusters entities from similar pairs based on the disjoint-set algorithm and is also designed for the MapReduce framework. Experimental results on a real dataset show that our algorithms achieve greater efficiency than previous algorithms on entity resolution and clustering.
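The clustering step maps naturally onto a disjoint-set (union-find) structure: every above-threshold pair unions two records, and the resulting components are the entity clusters. The sketch below shows this step on a single machine; the ppjoin-style similarity join, the caching, and the MapReduce partitioning are omitted.

```python
# Hedged sketch: union-find clustering of records from similar pairs.
def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

def cluster(records, similar_pairs):
    parent = {r: r for r in records}
    for a, b in similar_pairs:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb             # union the two clusters
    groups = {}
    for r in records:
        groups.setdefault(find(parent, r), []).append(r)
    return list(groups.values())

records = ["r1", "r2", "r3", "r4"]
pairs = [("r1", "r2"), ("r2", "r3")]    # above-threshold similarity
print(cluster(records, pairs))          # [['r1', 'r2', 'r3'], ['r4']]
```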

Journal ArticleDOI
TL;DR: The site-based and area-based approaches are developed for efficiently processing range and k-nearest-neighbor queries on distributed BSDs; an optimal division is proved and a practical heuristic is derived to partition a query and select the best processing site for each partition, achieving even better efficiency.
Abstract: This paper studies the problem of querying Bounded Spatial Datasets (BSDs). A BSD contains objects with known locations, and unknown regions, each of which bounds an unknown number of objects, within a coverage area. We consider applications where each BSD is hosted on a site connected to a communication network and the BSDs overlap in their coverage areas. The challenge is to query the distributed BSDs to retrieve all objects and to minimize the unknown regions which may contain objects satisfying the query, while minimizing the data transmission volume and number of interactions between the query client and the sites. We develop the site-based approach and the area-based approach for efficiently processing range and kNN queries on distributed BSDs. Accordingly, optimal site selection and the corresponding site querying methods are important problems studied in this paper. In the area-based approach, we prove an optimal division and derive a practical heuristic to partition a query and select the best processing site for each partition, hence achieving even better efficiency than the site-based approach. Simulation results based on three real spatial datasets show that our proposed approaches significantly outperform the baseline in terms of data transmission volume and the number of interactions.

Proceedings ArticleDOI
10 Mar 2014
TL;DR: This paper proposes to identify patent technological trends, which carry information about technology evolution and trajectories among patents, to enable more effective and precise patent evaluation, and demonstrates that the identified technological trends are able to capture patent value precisely.
Abstract: Patents are very important intangible assets that protect firm technologies and maintain market competitiveness. Thus, patent evaluation is critical for firm business strategy and innovation management. Currently, patent evaluation mostly relies on some meta information of patents, such as the number of forward/backward citations and the number of claims. In this paper, we propose to identify patent technological trends, which carry information about technology evolution and trajectories among patents, to enable more effective and precise patent evaluation. We explore features to capture both the value of trends and the quality of patents within a trend, and perform patent evaluation to validate the extracted trends and features using patents in the United States Patent and Trademark Office (USPTO) dataset. Experimental results demonstrate that the identified technological trends are able to capture patent value precisely. With the proposed trend-related features extracted from our identified trends, we can improve patent evaluation performance significantly over the baseline using conventional features.

Proceedings ArticleDOI
10 Mar 2014
TL;DR: A framework for business location planning that takes into account both factors of geographical proximity and social influence is proposed and a suite of algorithms based on Targeted Region-oriented strategy is designed to enhance the processing efficiency.
Abstract: Business location planning, critical to the success of many businesses, can be addressed by a reverse nearest neighbors (RNN) query using geographical proximity to the customers as the main metric to find a store location that is the closest to many customers. Nevertheless, we argue that other marketing factors such as social influence could be considered in the process of business location planning. In this paper, we propose a framework for business location planning that takes into account both factors of geographical proximity and social influence. An essential task in this framework is to compute the "influence spread" of RNNs for candidate locations. However, excessive computational overhead and long latency hinder its feasibility for our framework. Thus, we trade storage overhead for processing speed by precomputing and storing the social influences between pairs of customers, and design a suite of algorithms based on a Targeted Region-oriented strategy. Various ordering and pruning techniques have been incorporated in these algorithms to enhance the processing efficiency of our framework. Experiments also show that the proposed algorithms efficiently support the task of location planning under various parameter settings.
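The framework's core quantity can be sketched as follows: a candidate location's score is the summed (precomputed) social influence of its reverse nearest neighbors, i.e., the customers for whom the candidate would be closer than every existing store. The brute-force RNN scan and the influence values below are illustrative; the paper's Targeted Region-oriented algorithms exist precisely to avoid this exhaustive computation.

```python
# Hedged sketch: score a candidate store location by the social
# influence of its reverse nearest neighbors.
import math

def rnn_customers(candidate, stores, customers):
    out = []
    for cid, loc in customers.items():
        d_cand = math.dist(loc, candidate)
        # Customer is an RNN if the candidate beats every existing store.
        if all(d_cand < math.dist(loc, s) for s in stores):
            out.append(cid)
    return out

def location_score(candidate, stores, customers, influence):
    # influence[cid]: precomputed influence spread of that customer.
    return sum(influence[c] for c in rnn_customers(candidate, stores, customers))

customers = {"c1": (0, 0), "c2": (4, 4), "c3": (5, 5)}
stores = [(6, 6)]
influence = {"c1": 3.0, "c2": 1.0, "c3": 0.5}
print(location_score((1, 1), stores, customers, influence))  # 3.0 (only c1)
```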