Author
Laila Abdelhafeez
Bio: Laila Abdelhafeez is an academic researcher at the University of California, Riverside. The author has contributed to research in topics including computer science and spatial analysis, has an h-index of 1, and has co-authored 3 publications receiving 18 citations.
Papers
••
01 Jan 2020
TL;DR: This paper reviews core components that enable large-scale querying and indexing for microblogs data, and discusses system-level issues and ongoing efforts to support microblogs through the rising wave of big data systems.
Abstract: Microblogs data is the micro-length user-generated data that is posted on the web, e.g., tweets, online reviews, comments on news and social media. It has gained considerable attention in recent years due to its widespread popularity, rich content, and value in several societal applications. Nowadays, microblogs applications span a wide spectrum of interests including targeted advertising, market reports, news delivery, political campaigns, rescue services, and public health. Consequently, major research efforts have been spent to manage, analyze, and visualize microblogs to support different applications. This paper gives a comprehensive review of major research and system work in microblogs data management. The paper reviews core components that enable large-scale querying and indexing for microblogs data. A dedicated part focuses on system-level issues and the ongoing effort to support microblogs through the rising wave of big data systems. In addition, we review the major research topics that exploit these core data management components to provide innovative and effective analysis and visualization for microblogs, such as event detection, recommendations, automatic geotagging, and user queries. Throughout the different parts, we highlight the challenges, innovations, and future opportunities in microblogs data research.
23 citations
••
03 Nov 2020
TL;DR: This paper proposes a highly-parallelized query processing framework to efficiently compute the spatial group-by query, which has shown significant superiority over all existing techniques.
Abstract: This paper studies a spatial group-by query over complex polygons. Groups are selected from a set of non-overlapping complex polygons, typically in the order of thousands, while the input is a large-scale dataset that contains hundreds of millions or even billions of spatial points. Given a set of spatial points and a set of polygons, the spatial group-by query returns the number of points that lie within boundaries of each polygon. This problem is challenging because real polygons (like counties, cities, postal codes, voting regions, etc.) are described by very complex boundaries. We propose a highly-parallelized query processing framework to efficiently compute the spatial group-by query. Our experimental evaluation with real data and queries has shown significant superiority over all existing techniques.
3 citations
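The spatial group-by query above can be illustrated with a minimal sequential sketch: count how many points fall inside each polygon using a standard ray-casting point-in-polygon test. The paper's contribution is a highly-parallelized framework for billions of points; this toy version, with hypothetical region names and data, only shows the query's semantics.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, implicitly closed."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def spatial_group_by(points, polygons):
    """Return {polygon_name: number of points inside its boundary}."""
    counts = {name: 0 for name in polygons}
    for x, y in points:
        for name, ring in polygons.items():
            if point_in_polygon(x, y, ring):
                counts[name] += 1
                break  # polygons are non-overlapping, per the problem setting
    return counts

# Toy example: two unit squares side by side (hypothetical "regions").
polygons = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
points = [(0.5, 0.5), (0.2, 0.8), (1.5, 0.5)]
print(spatial_group_by(points, polygons))  # → {'A': 2, 'B': 1}
```

Real county or postal-code boundaries have thousands of vertices, which is why the naive per-point test above becomes the bottleneck the paper's parallel framework addresses.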
••
01 Jun 2022
TL;DR: This research focuses on scaling spatial queries in the context of big data systems, to be able to apply complex algorithms on large-scale spatial datasets in a timely manner.
Abstract: The amount of data in the world is increasing exponentially, a large portion of this data comes from the interactions over mobile devices and the ubiquitous IoT applications. Improving our ability to extract information and insights from these large and complex datasets is crucial to a variety of applications. Our research focuses on scaling spatial queries in the context of big data systems, to be able to apply complex algorithms on large-scale spatial datasets in a timely manner. In particular, this paper studies two spatial queries: (a) spatial group-by polygon query which groups input data points by a given complex polygon set (e.g. world countries), and (b) polygonization query which polygonizes an input set of line strings (e.g. USA road network).
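The polygonization query mentioned above can be sketched in miniature: assemble an input set of line strings (here reduced to individual segments) into closed rings by chaining matching endpoints. The paper targets massive road networks with a scalable algorithm; this greedy toy version only illustrates what the operation computes.

```python
def polygonize(segments):
    """segments: list of ((x1, y1), (x2, y2)) pairs. Returns a list of closed
    rings (vertex lists) built by greedily chaining matching endpoints."""
    remaining = list(segments)
    rings = []
    while remaining:
        a, b = remaining.pop()
        ring = [a, b]
        while ring[-1] != ring[0]:
            for i, (p, q) in enumerate(remaining):
                if p == ring[-1]:
                    ring.append(q)
                    remaining.pop(i)
                    break
                if q == ring[-1]:
                    ring.append(p)
                    remaining.pop(i)
                    break
            else:
                break  # open chain: not part of any polygon in this sketch
        if ring[-1] == ring[0]:
            rings.append(ring[:-1])  # drop the repeated closing vertex
    return rings

# Toy example: the four edges of a unit square polygonize into one ring.
square_edges = [((0, 0), (1, 0)), ((1, 0), (1, 1)),
                ((1, 1), (0, 1)), ((0, 1), (0, 0))]
rings = polygonize(square_edges)
print(len(rings), len(rings[0]))  # → 1 4
```

A production polygonizer (e.g., the one studied over the USA road network) must also node intersecting lines and handle dangling edges, which this sketch ignores.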
••
TL;DR: In this article, the spatial group-by query over complex polygons is studied, and a highly-parallelized query processing framework is proposed to efficiently compute the spatial group-by query on highly skewed spatial data.
Abstract: This paper studies the spatial group-by query over complex polygons. Given a set of spatial points and a set of polygons, the spatial group-by query returns the number of points that lie within the boundaries of each polygon. Groups are selected from a set of non-overlapping complex polygons, typically in the order of thousands, while the input is a large-scale dataset that contains hundreds of millions or even billions of spatial points. This problem is challenging because real polygons (like counties, cities, postal codes, voting regions, etc.) are described by very complex boundaries. We propose a highly-parallelized query processing framework to efficiently compute the spatial group-by query on highly skewed spatial data. We also propose an effective query optimizer that adaptively assigns the appropriate processing scheme based on the query polygons. Our experimental evaluation with real data and queries has shown significant superiority over all existing techniques.
••
20 Apr 2020
TL;DR: DLEEL is a research system that supports scalable spatial queries with multiple predicates on user-generated data streams, such as social media streams, and is the first to address personalized queries on streaming spatial-social data through novel low-overhead indexing that scales for large amounts of data and users.
Abstract: This paper demonstrates DLEEL, a research system that supports scalable spatial queries with multiple predicates on user-generated data streams, such as social media streams. Supported queries include spatial-social queries and spatial-keyword queries, which are popular in different applications but have never been addressed in the challenging environment of streaming data, where data arrives at excessively high rates. DLEEL distinguishes itself with three novel contributions: (1) Indexing spatial-social data for personalized real-time search: DLEEL is the first to address personalized queries on streaming spatial-social data through novel low-overhead indexing that scales for large amounts of data and users. The novel indexing has a hybrid storage architecture that trades off indexing overhead, memory consumption, and query latency. (2) Indexing spatial-keyword data for real-time search: DLEEL is the first to enrich existing spatial-keyword indexes with novel streaming data components. The new components reveal performance losses and gains from a system perspective, trading off the system overhead with the flexibility to support a variety of queries. (3) Scalable query processing: DLEEL exploits the indexes' content to smartly prune the search space on multiple dimensions and support efficient query latency for its different queries on an excessive number of data records. DLEEL is demonstrated using a stream of 5 billion real tweets collected from Twitter APIs and real query locations obtained from a popular web search engine. DLEEL has shown superior performance, serving incoming queries with an average latency of a few milliseconds while digesting hundreds of thousands of data records every second.
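The kind of search-space pruning DLEEL's indexes enable can be sketched with a uniform grid in which each cell keeps a small inverted index from keyword to records, so a spatial-keyword query touches only the cells overlapping its region. All names and data below are hypothetical; DLEEL's actual streaming, memory-aware index structures go far beyond this toy, and a real system would also refine cell-level hits with an exact point check.

```python
from collections import defaultdict

CELL = 10.0  # grid cell size (arbitrary units for the sketch)

# (cell_x, cell_y) -> keyword -> list of matching records
index = defaultdict(lambda: defaultdict(list))

def insert(x, y, keywords, record):
    cell = (int(x // CELL), int(y // CELL))
    for kw in keywords:
        index[cell][kw].append(record)

def query(xmin, ymin, xmax, ymax, keyword):
    """Return records matching `keyword` whose cell overlaps the query box;
    all other cells are pruned without being touched."""
    hits = []
    for cx in range(int(xmin // CELL), int(xmax // CELL) + 1):
        for cy in range(int(ymin // CELL), int(ymax // CELL) + 1):
            hits.extend(index[(cx, cy)].get(keyword, []))
    return hits

insert(5, 5, ["coffee"], "tweet-1")
insert(55, 5, ["coffee"], "tweet-2")  # far away: lands in a different cell
print(query(0, 0, 20, 20, "coffee"))  # → ['tweet-1']
```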
Cited by
•
TL;DR: This paper is the first complete description of the resulting open source AsterixDB system, covering the system's data model, its query language, and its software architecture.
Abstract: AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well-suited to applications like web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store.
Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This paper is the first complete description of the resulting open source AsterixDB system. Covered herein are the system's data model, its query language, and its software architecture. Also included are a summary of the current status of the project and a first glimpse into how AsterixDB performs when compared to alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, for things that both technologies can do. Also included is a brief description of some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements.
168 citations
••
TL;DR: This study demonstrates the potential of multitask models on this type of problem and improves the state-of-the-art results in the fine-grained sentiment classification problem.
Abstract: Traditional sentiment analysis approaches tackle problems like ternary (3-category) and fine-grained (5-category) classification by learning the tasks separately. We argue that such classification tasks are correlated and we propose a multitask approach based on a recurrent neural network that benefits by jointly learning them. Our study demonstrates the potential of multitask models on this type of problem and improves the state-of-the-art results in the fine-grained sentiment classification problem.
53 citations
••
20 Apr 2020
TL;DR: An efficient divide-and-conquer algorithm is proposed to derive bounds of spatial similarity and textual similarity between two semantic trajectories, which enable us prune dissimilar trajectory pairs without the need of computing the exact value of spatio-textual similarity.
Abstract: Matching similar pairs of trajectories, called trajectory similarity join, is a fundamental functionality in spatial data management. We consider the problem of semantic trajectory similarity join (STS-Join). Each semantic trajectory is a sequence of Points-of-interest (POIs) with both location and text information. Thus, given two sets of semantic trajectories and a threshold θ, the STS-Join returns all pairs of semantic trajectories from the two sets with spatio-textual similarity no less than θ. This join targets applications such as term-based trajectory near-duplicate detection, geo-text data cleaning, personalized ridesharing recommendation, keyword-aware route planning, and travel itinerary recommendation. With these applications in mind, we provide a purposeful definition of spatio-textual similarity. To enable efficient STS-Join processing on large sets of semantic trajectories, we develop trajectory pair filtering techniques and consider the parallel processing capabilities of modern processors. Specifically, we present a two-phase parallel search algorithm. We first group semantic trajectories based on their text information. The algorithm's per-group searches are independent of each other and thus can be performed in parallel. For each group, the trajectories are further partitioned based on the spatial domain. We generate spatial and textual summaries for each trajectory batch, based on which we develop batch filtering and trajectory-batch filtering techniques to prune unqualified trajectory pairs in a batch mode. Additionally, we propose an efficient divide-and-conquer algorithm to derive bounds of spatial similarity and textual similarity between two semantic trajectories, which enable us to prune dissimilar trajectory pairs without computing the exact value of spatio-textual similarity.
Experimental study with large semantic trajectory data confirms that our algorithm for processing the semantic trajectory join outperforms our well-designed baseline by a factor of 8–12.
32 citations
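The shape of the STS-Join can be illustrated with a naive nested-loop baseline under an assumed similarity definition: textual similarity as Jaccard over the POI term sets, and spatial similarity decaying with the distance between trajectory centroids. The paper's purposeful similarity definition and its filtering/parallel search are more refined; everything here (weights, toy trajectories) is a hypothetical stand-in.

```python
import math

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def centroid(traj):
    xs = [p[0] for p, _ in traj]
    ys = [p[1] for p, _ in traj]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def similarity(t1, t2, alpha=0.5):
    """Weighted spatio-textual similarity of two semantic trajectories,
    each a list of ((x, y), {terms}) POIs. alpha balances the two parts."""
    terms1 = set().union(*(t for _, t in t1))
    terms2 = set().union(*(t for _, t in t2))
    spatial = 1.0 / (1.0 + math.dist(centroid(t1), centroid(t2)))
    return alpha * spatial + (1 - alpha) * jaccard(terms1, terms2)

def sts_join(set1, set2, theta):
    """Return all index pairs (i, j) with similarity >= theta; O(|A||B|)."""
    return [(i, j) for i, t1 in enumerate(set1)
                   for j, t2 in enumerate(set2)
                   if similarity(t1, t2) >= theta]

a = [[((0, 0), {"cafe"}), ((1, 0), {"museum"})]]
b = [[((0, 1), {"cafe", "museum"})],
     [((50, 50), {"airport"})]]
print(sts_join(a, b, 0.5))  # → [(0, 0)]
```

The quadratic loop above is exactly what the paper's grouping, batch summaries, and similarity bounds are designed to avoid evaluating in full.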
••
TL;DR: The Web of Science core collection was taken as the data source, and traditional statistical methods and CiteSpace software were used to carry out a scientometric analysis of SMBD, which showed the research status, hotspots, and trends in this field.
Abstract: Social Media Big Data (SMBD) is widely used to serve the economic and social development of human beings. However, as a young field of research and practice, the academic understanding of SMBD is still limited and needs to be supplemented. This paper took the Web of Science (WoS) core collection as the data source and used traditional statistical methods and CiteSpace software to carry out a scientometric analysis of SMBD, showing the research status, hotspots, and trends in this field. The results showed that: (1) More and more attention has been paid to SMBD research in academia, and the number of published papers has increased in recent years, mainly in subjects such as Computer Science, Engineering, and Telecommunications. The results were published primarily in IEEE Access, Sustainability, and Future Generation Computer Systems: The International Journal of eScience, among others. (2) In terms of contributions, China, the United States, the United Kingdom, and other countries (regions) have published the most papers in SMBD, and high-yield institutions also mainly come from these countries (regions). There are already some excellent teams in the field, such as the Wanggen Wan team at Shanghai University and the Haoran Xie team at the City University of Hong Kong. (3) We studied the hotspots of SMBD in recent years and summarized the frontier of SMBD based on keywords and co-citation literature, including the deep mining and construction of social media technology, reflections and concerns about the rapid development of social media, and the role of SMBD in solving human social development problems. These studies can provide value and references for SMBD researchers seeking to understand the research status, hotspots, and trends in this field.
29 citations
••
09 Mar 2020
TL;DR: This work proposes solutions that are capable of supporting real-life location-based publish/subscribe applications that process large numbers of SST and RST subscriptions over a realistic stream of spatio-temporal documents.
Abstract: Massive amounts of data that contain spatial, textual, and temporal information are being generated at a rapid pace. With streams of such data, which includes check-ins and geo-tagged tweets, available, users may be interested in being kept up-to-date on which terms are popular in the streams in a particular region of space. To enable this functionality, we aim at efficiently processing two types of general top-k term subscriptions over streams of spatio-temporal documents: region-based top-k spatial-temporal term (RST) subscriptions and similarity-based top-k spatio-temporal term (SST) subscriptions. RST subscriptions continuously maintain the top-k most popular trending terms within a user-defined region. SST subscriptions free users from defining a region and maintain top-k locally popular terms based on a ranking function that combines term frequency, term recency, and term proximity. To solve the problem, we propose solutions that are capable of supporting real-life location-based publish/subscribe applications that process large numbers of SST and RST subscriptions over a realistic stream of spatio-temporal documents. The performance of our proposed solutions is studied in extensive experiments using two spatio-temporal datasets.
29 citations
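The RST (region-based top-k spatio-temporal term) subscription described above can be sketched as a per-subscription term counter over documents that fall in the user-defined region. This minimal version omits the recency and proximity weighting used by SST subscriptions, and all class names and data are hypothetical illustrations rather than the paper's actual structures.

```python
from collections import Counter

class RSTSubscription:
    """Maintain the top-k most popular terms within a rectangular region."""

    def __init__(self, region, k):
        self.xmin, self.ymin, self.xmax, self.ymax = region
        self.k = k
        self.counts = Counter()

    def feed(self, x, y, terms):
        """Digest one spatio-temporal document from the stream."""
        if self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax:
            self.counts.update(terms)

    def top_k(self):
        return [term for term, _ in self.counts.most_common(self.k)]

sub = RSTSubscription(region=(0, 0, 10, 10), k=2)
sub.feed(1, 1, ["rain", "storm"])
sub.feed(2, 3, ["rain"])
sub.feed(50, 50, ["sunny"])  # outside the region: ignored
print(sub.top_k())           # → ['rain', 'storm']
```

Evaluating every subscription against every arriving document, as this sketch does, is exactly what the paper's solutions avoid in order to scale to large numbers of subscriptions over a realistic document stream.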