Author

Chuan Lei

Other affiliations: Worcester Polytechnic Institute

Bio: Chuan Lei is an academic researcher from IBM. The author has contributed to research on topics including Complex event processing and Graph (abstract data type). The author has an h-index of 11 and has co-authored 33 publications receiving 259 citations. Previous affiliations of Chuan Lei include Worcester Polytechnic Institute.

Papers

Journal Article•DOI•

ATHENA++: natural language querying for complex nested SQL queries

Jaydeep Sen¹, Chuan Lei¹, Abdul Quamar¹, Fatma Ozcan¹, Vasilis Efthymiou¹, Ayushi Dalmia¹, Greg Stager¹, Ashish Mittal¹, Diptikalyan Saha¹, Karthik Sankaranarayanan¹•Institutions (1)

IBM¹

01 Jul 2020

TL;DR: This paper presents ATHENA++, an end-to-end system that can answer complex queries in natural language by translating them into nested SQL queries, and combines linguistic patterns from NL queries with deep domain reasoning using ontologies to enable nested query detection and generation.

Abstract: Natural Language Interfaces to Databases (NLIDB) systems eliminate the requirement for an end user to use complex query languages like SQL, by translating the input natural language (NL) queries to SQL automatically. Although a significant volume of research has focused on this space, most state-of-the-art systems can at best handle simple select-project-join queries. There has been little to no research on extending the capabilities of NLIDB systems to handle complex business intelligence (BI) queries that often involve nesting as well as aggregation. In this paper, we present Athena++, an end-to-end system that can answer such complex queries in natural language by translating them into nested SQL queries. In particular, Athena++ combines linguistic patterns from NL queries with deep domain reasoning using ontologies to enable nested query detection and generation. We also introduce a new benchmark data set (FIBEN), which consists of 300 NL queries, corresponding to 237 distinct complex SQL queries on a database with 152 tables, conforming to an ontology derived from standard financial ontologies (FIBO and FRO). We conducted extensive experiments comparing Athena++ with two state-of-the-art NLIDB systems, using both FIBEN and the prominent Spider benchmark. Athena++ consistently outperforms both systems across all benchmark data sets with a wide variety of complex queries, achieving 88.33% accuracy on FIBEN benchmark, and 78.89% accuracy on Spider benchmark, beating the best reported accuracy results on the dev set by 8%.

63 citations

Proceedings Article•DOI•

Scalable Pattern Sharing on Event Streams

Medhabi Ray¹, Chuan Lei¹, Elke A. Rundensteiner¹•Institutions (1)

Worcester Polytechnic Institute¹

14 Jun 2016

TL;DR: The SPASS optimizer identifies opportunities for effective shared processing among CEP queries by leveraging time-based event correlations among queries, and finds a shared pattern plan in polynomial time that covers all sequence patterns while still guaranteeing an optimality bound.

Abstract: Complex Event Processing (CEP) has emerged as a technology of choice for high-performance event analytics in time-critical decision-making applications. Yet it is becoming increasingly difficult to support high-performance event processing due to the rising number and complexity of event pattern queries and the increasingly high velocity of event streams. In this work we design the SPASS framework that successfully tackles these demanding CEP workloads. Our SPASS optimizer identifies opportunities for effective shared processing among CEP queries by leveraging time-based event correlations among queries. The problem of pattern sharing is shown to be NP-hard by reducing the Minimum Substring Cover problem to our CEP pattern sharing problem. The SPASS optimizer is designed to find a shared pattern plan in polynomial time, covering all sequence patterns while still guaranteeing an optimality bound. To execute this shared pattern plan, the SPASS runtime employs stream transactions that assure concurrent shared maintenance and re-use of sub-patterns across queries. Our experimental study confirms that the SPASS framework achieves over 16-fold performance improvement for a wide range of experiments compared to the state-of-the-art solution.

52 citations

Proceedings Article•DOI•

State of the Art and Open Challenges in Natural Language Interfaces to Data

Fatma Özcan¹, Abdul Quamar¹, Jaydeep Sen¹, Chuan Lei¹, Vasilis Efthymiou¹•Institutions (1)

IBM¹

11 Jun 2020

TL;DR: This tutorial will review natural language interface solutions in terms of their interpretation approach, as well as the complexity of the queries they can generate, and discuss open research challenges.

Abstract: Recent advances in natural language understanding and processing have resulted in renewed interest in natural-language-based interfaces to data, which provide an easy mechanism for non-technical users to access and query the data. While early systems only allowed simple selection queries over a single table, some recent work supports complex BI queries, with many joins and aggregation, and even nested queries. There are various approaches in the literature for interpreting a user's natural language query. Rule-based systems try to identify the entities in the query, and understand the intended relationships between those entities. Recent years have seen the emergence and popularity of neural-network-based approaches, which try to interpret the query holistically by learning the patterns. In this tutorial, we will review these natural language interface solutions in terms of their interpretation approach, as well as the complexity of the queries they can generate. We will also discuss open research challenges.

32 citations

Journal Article•DOI•

Multi-route query processing and optimization

Rimma V. Nehme¹, Karen Works², Chuan Lei², Elke A. Rundensteiner², Elisa Bertino³•Institutions (3)

Microsoft¹, Worcester Polytechnic Institute², Purdue University³

01 May 2013-Journal of Computer and System Sciences

TL;DR: This paper presents "Query Mesh" (or QM), a practical alternative to state-of-the-art data stream processing approaches, and proposes several cost-based query optimization heuristics designed to effectively find nearly optimal QMs.

27 citations

Proceedings Article•DOI•

Medical Entity Disambiguation Using Graph Neural Networks

Alina Vretinaris¹, Chuan Lei¹, Vasilis Efthymiou¹, Xiao Qin¹, Fatma Ozcan²•Institutions (2)

IBM¹, Google²

09 Jun 2021

TL;DR: This paper introduces ED-GNN, based on three representative graph neural networks (GraphSAGE, R-GCN, and MAGNN), for medical entity disambiguation.

Abstract: Medical knowledge bases (KBs), distilled from biomedical literature and regulatory actions, are expected to provide high-quality information to facilitate clinical decision making. Entity disambiguation (also referred to as entity linking) is considered as an essential task in unlocking the wealth of such medical KBs. However, existing medical entity disambiguation methods are not adequate due to word discrepancies between the entities in the KB and the text snippets in the source documents. Recently, graph neural networks (GNNs) have proven to be very effective and provide state-of-the-art results for many real-world applications with graph-structured data. In this paper, we introduce ED-GNN based on three representative GNNs (GraphSAGE, R-GCN, and MAGNN) for medical entity disambiguation. We develop two optimization techniques to fine-tune and improve ED-GNN. First, we introduce a novel strategy to represent entities that are mentioned in text snippets as a query graph. Second, we design an effective negative sampling strategy that identifies hard negative samples to improve the model's disambiguation capability. Compared to the best performing state-of-the-art solutions, our ED-GNN offers an average improvement of 7.3% in terms of F1 score on five real-world datasets.

25 citations

Cited by

Journal Article•DOI•

Machine learning

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal Article•

ACM Transactions on Database Systems

Dan Suciu, Gerhard Weikum

01 Jan 2005-ACM Transactions on Database Systems

373 citations

Posted Content•

The Web as a Knowledge-base for Answering Complex Questions

Alon Talmor¹, Jonathan Berant²•Institutions (2)

Allen Institute for Artificial Intelligence¹, Tel Aviv University²

18 Mar 2018-arXiv: Computation and Language

TL;DR: This paper proposes to decompose complex questions into a sequence of simple questions and compute the final answer from the sequence of answers, and empirically demonstrates that question decomposition improves performance from 20.8 precision@1 to 27.5 precision@1 on this new dataset.

Abstract: Answering complex questions is a time-consuming activity for humans that requires reasoning and integration of information. Recent work on reading comprehension made headway in answering simple questions, but tackling complex questions is still an ongoing research challenge. Conversely, semantic parsers have been successful at handling compositionality, but only when the information resides in a target knowledge-base. In this paper, we present a novel framework for answering broad and complex questions, assuming answering simple questions is possible using a search engine and a reading comprehension model. We propose to decompose complex questions into a sequence of simple questions, and compute the final answer from the sequence of answers. To illustrate the viability of our approach, we create a new dataset of complex questions, ComplexWebQuestions, and present a model that decomposes questions and interacts with the web to compute an answer. We empirically demonstrate that question decomposition improves performance from 20.8 precision@1 to 27.5 precision@1 on this new dataset.

256 citations

Journal Article•DOI•

Community Detection in Multi-Layer Graphs: A Survey

Jungeun Kim¹, Jae-Gil Lee¹•Institutions (1)

KAIST¹

03 Dec 2015

TL;DR: This survey provides readers with a comprehensive understanding of community detection in multi-layer graphs and compares the state-of-the-art algorithms with respect to their underlying properties.

Abstract: Community detection, also known as graph clustering, has been extensively studied in the literature. The goal of community detection is to partition vertices in a complex graph into densely connected components, so-called communities. In recent applications, however, an entity is associated with multiple aspects of relationships, which brings new challenges in community detection. The multiple aspects of interactions can be modeled as a multi-layer graph comprised of multiple interdependent graphs, where each graph represents an aspect of the interactions. Great efforts have therefore been made to tackle the problem of community detection in multi-layer graphs. In this survey, we provide readers with a comprehensive understanding of community detection in multi-layer graphs and compare the state-of-the-art algorithms with respect to their underlying properties.

162 citations

Journal Article•DOI•

A Big Data-as-a-Service Framework: State-of-the-Art and Perspectives

Xiaokang Wang¹, Laurence T. Yang¹, Huazhong Liu¹, M. Jamal Deen²•Institutions (2)

Huazhong University of Science and Technology¹, McMaster University²

01 Sep 2018-IEEE Transactions on Big Data

TL;DR: A tensor-based multiple clustering on bicycle renting and returning data is illustrated, which can provide several suggestions for rebalancing of the bicycle-sharing system and some challenges about the proposed framework are discussed.

Abstract: Due to the rapid advances of information technologies, Big Data, recognized with 4Vs characteristics (volume, variety, veracity, and velocity), bring significant benefits as well as many challenges. A major benefit of Big Data is to provide timely information and proactive services for humans. The primary purpose of this paper is to review the current state-of-the-art of Big Data from the aspects of organization and representation, cleaning and reduction, integration and processing, security and privacy, analytics and applications, then present a novel framework to provide high-quality so-called Big Data-as-a-Service. The framework consists of three planes, namely sensing plane, cloud plane and application plane, to systemically address all challenges of the above aspects. Also, to clearly demonstrate the working process of the proposed framework, a tensor-based multiple clustering on bicycle renting and returning data is illustrated, which can provide several suggestions for rebalancing of the bicycle-sharing system. Finally, some challenges about the proposed framework are discussed.

121 citations