
Showing papers by "Ramez Elmasri" published in 2019


Proceedings ArticleDOI
05 Jun 2019
TL;DR: This paper built Singular Race Models, a novel approach of segmenting the dataset based on race, to train and test single race-based models to increase prediction accuracy and reduce racially inspired bias by considering only one race at a time.
Abstract: As machine learning based predictive systems pervade many aspects of our lives, inherent bias and unfairness surface from time to time in the form of mispredictions in various domains. Recidivism, the tendency of offenders to reoffend after release from prison on parole, is one such domain where one race-based sub-population has been found to be treated more harshly than others. Current practices have focused on eliminating race information from datasets to reduce predictive bias. In contrast, we built Singular Race Models, a novel approach that segments the dataset by race and trains and tests single-race models, to increase prediction accuracy and reduce racially inspired bias by considering only one race at a time. We created Singular Race Models for four different crime categories and compared these with base models created using all crimes and all races. This modeling choice helped us increase accuracy and analyze race-related discrimination. A three-layered artificial neural network did the heavy lifting of recidivism prediction. Using several suitable metrics, we demonstrate the increase in predictive accuracy of these Singular Race Models in various crime categories and analyze the causes and the secondary effect on bias.

9 citations
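
The segmentation idea in the abstract above (one model per subgroup, compared against a base model trained on all records) can be sketched in a few lines. This is only an illustration under stated assumptions: the paper uses a three-layered neural network, whereas the `MajorityClassifier` below is a deliberately trivial stand-in, and the names `segment_by`, `train_segment_models`, and the toy record fields `group` and `y` are hypothetical.

```python
from collections import Counter, defaultdict


def segment_by(records, key):
    """Partition records into one list per distinct value of `key`."""
    segments = defaultdict(list)
    for record in records:
        segments[record[key]].append(record)
    return dict(segments)


class MajorityClassifier:
    """Stand-in model (NOT the paper's network): predicts the most common label."""

    def fit(self, records, label):
        counts = Counter(record[label] for record in records)
        self.prediction = counts.most_common(1)[0][0]
        return self

    def predict(self, record):
        return self.prediction


def accuracy(model, records, label):
    """Fraction of records whose label the model predicts correctly."""
    return sum(model.predict(r) == r[label] for r in records) / len(records)


def train_segment_models(records, key, label):
    """Fit one model per segment, keyed by the segment's attribute value."""
    return {
        value: MajorityClassifier().fit(subset, label)
        for value, subset in segment_by(records, key).items()
    }


# Toy, made-up records; evaluated on the training data only for brevity.
records = (
    [{"group": "A", "y": 1}] * 3 + [{"group": "A", "y": 0}]
    + [{"group": "B", "y": 0}] * 2 + [{"group": "B", "y": 1}]
)
base_model = MajorityClassifier().fit(records, "y")
segment_models = train_segment_models(records, "group", "y")
segments = segment_by(records, "group")
```

Comparing `accuracy(segment_models["B"], segments["B"], "y")` with `accuracy(base_model, segments["B"], "y")` shows the kind of per-segment comparison the abstract describes; a real evaluation would use held-out data and a proper model.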


Proceedings ArticleDOI
05 Nov 2019
TL;DR: A novel method is proposed for generating road maps from GPS trajectories; the resulting maps are accurate with good coverage, use a minimal number of vertices and edges, and include several details of the road network.
Abstract: Road maps are important in our personal lives and are widely used in many different applications. Therefore, an up-to-date road map is essential. The huge amount of GPS data collected from moving objects provides an opportunity to generate an up-to-date road map. In this paper, we propose a novel method to generate road maps using GPS trajectories; the resulting map is accurate with good coverage, has a minimal number of vertices and edges, and includes several details of the road network. Our algorithm starts by identifying the locations of intersections using a line simplification algorithm with spatial constraints and a grid-based method. Then, it creates graph connectivity information to connect intersections and build road segments. In addition, our algorithm extracts road features such as turn restrictions, average speed, road length, road type, and the number of cars traveling in a specific portion of the road. To demonstrate the accuracy of our proposed algorithm, we conduct experiments using two real data sets and compare our results with two baseline methods. The comparisons indicate that our algorithm achieves a higher F-score in terms of accuracy and generates a detailed road map that is not overly complex.

4 citations
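
The first step in the abstract above (line simplification followed by a grid-based method to locate intersections) can be illustrated roughly as follows. The abstract does not name its simplification algorithm or its spatial constraints, so this sketch swaps in the standard Douglas-Peucker algorithm on planar coordinates and a simple voting grid; `candidate_cells`, `cell_size`, and `min_trajectories` are hypothetical names chosen for illustration.

```python
import math
from collections import defaultdict


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Keep only vertices that deviate more than `tolerance` from the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex


def candidate_cells(trajectories, tolerance, cell_size, min_trajectories):
    """Grid cells where several trajectories keep a turning vertex."""
    votes = defaultdict(set)
    for tid, trajectory in enumerate(trajectories):
        # Interior vertices that survive simplification mark turns.
        for x, y in douglas_peucker(trajectory, tolerance)[1:-1]:
            cell = (math.floor(x / cell_size), math.floor(y / cell_size))
            votes[cell].add(tid)
    return sorted(c for c, ids in votes.items() if len(ids) >= min_trajectories)


# Two made-up trajectories that both turn near the point (10, 0).
t1 = [(0, 0), (5, 0.1), (10, 0), (10, 5), (10, 10)]
t2 = [(20, 0.5), (15, 0.4), (10.5, 0.5), (10.4, -5), (10.5, -10)]
cells = candidate_cells([t1, t2], tolerance=1.0, cell_size=2.0, min_trajectories=2)
```

Cells returned by `candidate_cells` are only intersection candidates; real GPS data would first need projecting from latitude and longitude to a planar system, and the later steps (connectivity, turn restrictions, speeds) are not covered here.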


Book ChapterDOI
26 Jun 2019
TL;DR: GloVe and the combination of word2vec and GloVe were marginally better in terms of F-score, identifying more unique words, and identifying words not seen in the training data.
Abstract: The identification of Multi-Word Expressions (MWEs) is central to resolving ambiguity of phrases. Recent works show that deep learning methods outperform statistical and lexical approaches. The deep learning approaches mostly use word2vec embeddings; our paper compares the use of word2vec, GloVe, and a combination of the two word embeddings in identifying MWEs. GloVe and the combination of word2vec and GloVe were marginally better in terms of F-score, identifying more unique words, and identifying words not seen in the training data. GloVe was marginally better at identifying Verbal Multi-Word Expressions (VMWEs), which tend to be the hardest group of MWEs because they can be gappy: words that are part of the MWE are interleaved with words that are not. The major purpose of the paper is to compare the use of different word embeddings in identifying MWEs, not to suggest improvements to the state of the art. Future work using different dimensions of word embedding vectors and fastText is suggested.

3 citations
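
The abstract above does not say how the two embeddings were combined. One common option, shown here purely as an assumption, is to concatenate the two vectors for each word and zero-fill whichever embedding lacks the word. The function name `combine_embeddings` and the tiny vectors are made up for illustration and are not real word2vec or GloVe values.

```python
def combine_embeddings(first, second, dim_first, dim_second):
    """Concatenate two word-to-vector maps, zero-filling missing words."""
    combined = {}
    for word in first.keys() | second.keys():
        a = first.get(word, [0.0] * dim_first)
        b = second.get(word, [0.0] * dim_second)
        combined[word] = list(a) + list(b)
    return combined


# Toy vectors standing in for two pretrained embedding tables.
table_one = {"kick": [1.0, 2.0]}
table_two = {"kick": [3.0], "bucket": [4.0]}
merged = combine_embeddings(table_one, table_two, dim_first=2, dim_second=1)
```

Every vector in `merged` has length `dim_first + dim_second`, so a downstream tagger sees a fixed input size regardless of which table covered the word.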


Proceedings ArticleDOI
05 Jun 2019
TL;DR: This paper introduces a framework that demonstrates how two datasets for the same area can be matched to each other despite some data discrepancies, and discusses several types of applications that could use this framework.
Abstract: A road network map is a dataset used in many different applications. Many smart cities have more than one road network map, from different sources (government authorities, private enterprises, or volunteers). However, road maps that represent the same area are likely to mismatch for several reasons: one of the datasets may be out of date, the datasets may use different names for the same road, and so on. As a result, matching the roads in such datasets with each other is challenging. This paper introduces a framework that demonstrates how two datasets for the same area can be matched to each other despite such data discrepancies. In addition, it gives an overview of each component of the framework and focuses mainly on the similarity measurements: local divergence measurements and a global divergence measurement. Local divergence measurements compare two roads from different datasets to decide whether they have a similar shape and the same length. The global divergence measurement, on the other hand, ensures that the two roads are the same road in the real world, not different roads that happen to lie beside each other with similar length and shape. This paper discusses several types of applications that could use this framework, not only to match different road maps and unify the information for smart-city uses but also for data enrichment and keeping maps up to date.

1 citation
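
The distinction the abstract above draws between local and global divergence can be made concrete with a small sketch. The formulas below are assumptions, not the paper's definitions: the local measure compares length and position-independent shape, the global measure compares actual positions, and both assume planar coordinates and polylines digitized in the same direction. The names `local_divergence` and `global_divergence` are hypothetical.

```python
import math


def length(line):
    """Total length of a polyline given as a list of (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))


def resample(line, n):
    """Return n points evenly spaced by arc length along the polyline."""
    total = length(line)
    out, walked, seg = [], 0.0, 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(line) - 2 and walked + math.dist(line[seg], line[seg + 1]) < target:
            walked += math.dist(line[seg], line[seg + 1])
            seg += 1
        a, b = line[seg], line[seg + 1]
        d = math.dist(a, b)
        f = 0.0 if d == 0 else min(1.0, (target - walked) / d)
        out.append((a[0] + f * (b[0] - a[0]), a[1] + f * (b[1] - a[1])))
    return out


def local_divergence(a, b, n=20):
    """(relative length gap, mean shape gap after aligning start points)."""
    la, lb = length(a), length(b)
    length_gap = abs(la - lb) / max(la, lb)
    pa, pb = resample(a, n), resample(b, n)
    shape_gap = sum(
        math.dist((p[0] - pa[0][0], p[1] - pa[0][1]),
                  (q[0] - pb[0][0], q[1] - pb[0][1]))
        for p, q in zip(pa, pb)
    ) / n
    return length_gap, shape_gap


def global_divergence(a, b, n=20):
    """Mean distance between corresponding points in real coordinates."""
    pa, pb = resample(a, n), resample(b, n)
    return sum(math.dist(p, q) for p, q in zip(pa, pb)) / n


# Made-up roads: b is a slightly offset copy of a, c is a parallel road.
road_a = [(0, 0), (10, 0)]
road_b = [(0, 0.5), (10, 0.5)]
road_c = [(0, 50), (10, 50)]
```

Here `road_a` and `road_c` have zero local divergence (same length and shape) but a large global divergence, which is the case the abstract's global measurement is meant to reject, while `road_a` and `road_b` are close on both measures.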