Showing papers by "Ugo Buy" published in 2013

Open Access

Proceedings Article

Language independent gender classification on Twitter

Jalal S. Alowibdi¹, Ugo Buy¹, Philip S. Yu¹

25 Aug 2013

TL;DR: The approach is independent of the user's language, efficient, and scalable, while attaining a good level of accuracy; the authors validate it by examining different classifiers over a large dataset of Twitter profiles.

Abstract: Online Social Networks (OSNs) generate a huge volume of user-originated texts. Gender classification can serve multiple purposes. For example, commercial organizations can use gender classification for advertising. Law enforcement may use gender classification as part of legal investigations. Others may use gender information for social reasons. Here we explore language independent gender classification. Our approach predicts gender using five color-based features extracted from Twitter profiles (e.g., the background color in a user's profile page). Most other methods for gender prediction are typically language dependent. Those methods use high-dimensional spaces consisting of unique words extracted from such text fields as postings, user names, and profile descriptions. Our approach is independent of the user's language, efficient, and scalable, while attaining a good level of accuracy. We prove the validity of our approach by examining different classifiers over a large dataset of Twitter profiles.

89 citations
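
The abstract above describes predicting gender from five color-based profile features but does not say how the colors are encoded. As a minimal sketch, assuming each color arrives as a 6-digit hex string and is split into red, green, and blue components, the features could be turned into a numeric vector as follows. The field names in `COLOR_FIELDS` are hypothetical placeholders (the abstract names only the background color), and this encoding is an illustration, not the authors' method.

```python
from typing import Dict, List

# Hypothetical field names: only the background color is named in the
# abstract; the other four are placeholders chosen for illustration.
COLOR_FIELDS = [
    "background_color",
    "text_color",
    "link_color",
    "sidebar_fill_color",
    "sidebar_border_color",
]


def hex_to_rgb(hex_color: str) -> List[int]:
    """Convert a 6-digit hex color ('C0DEED' or '#C0DEED') to [r, g, b]."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_color!r}")
    return [int(value[i:i + 2], 16) for i in (0, 2, 4)]


def color_features(profile: Dict[str, str]) -> List[int]:
    """Flatten the five profile colors into a 15-dimensional vector."""
    features: List[int] = []
    for field in COLOR_FIELDS:
        features.extend(hex_to_rgb(profile[field]))
    return features


# Example profile with made-up color values.
example_profile = {
    "background_color": "#C0DEED",
    "text_color": "#333333",
    "link_color": "#0084B4",
    "sidebar_fill_color": "#DDEEF6",
    "sidebar_border_color": "#FFFFFF",
}
example_vector = color_features(example_profile)
```

A vector like `example_vector` contains no text at all, which is what makes a color-based representation independent of the user's language; it could be passed to any standard classifier.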

Proceedings Article

Empirical Evaluation of Profile Characteristics for Gender Classification on Twitter

Jalal S. Alowibdi¹, Ugo Buy¹, Philip S. Yu¹

University of Illinois at Chicago¹

04 Dec 2013

TL;DR: This work explores profile characteristics for gender classification on Twitter and provides a novel technique to reduce the number of features of text-based profile characteristics from the order of millions to a few thousands and, in some cases, to only 40 features.

Abstract: Online Social Networks (OSNs) provide reliable communication among users from different countries. The volume of texts generated by OSNs is huge and highly informative. Gender classification can serve commercial organizations for advertising, law enforcement for legal investigation, and others for social reasons. Here we explore profile characteristics for gender classification on Twitter. Unlike existing approaches to gender classification that depend heavily on posted text such as tweets, here we study the relative strengths of different characteristics extracted from Twitter profiles (e.g., first name and background color in a user's profile page). Our goal is to evaluate profile characteristics with respect to their predictive accuracy and computational complexity. In addition, we provide a novel technique to reduce the number of features of text-based profile characteristics from the order of millions to a few thousands and, in some cases, to only 40 features. We prove the validity of our approach by examining different classifiers over a large dataset of Twitter profiles.

68 citations

Proceedings Article

Preventing database deadlocks in applications

Mark Grechanik¹, B M Mainul Hossain¹, Ugo Buy¹, Haisheng Wang¹

University of Illinois at Chicago¹

18 Aug 2013

TL;DR: A novel approach is created that combines run-time monitoring, which automatically prevents database deadlocks, with static analysis, which detects hold-and-wait cycles that specify how resources are held in contention during executions of SQL statements.

Abstract: Many organizations deploy applications that use databases by sending Structured Query Language (SQL) statements to them and obtaining data that result from the execution of these statements. Since applications often share the same databases concurrently, database deadlocks routinely occur in these databases resulting in major performance degradation in these applications. Database engines do not prevent database deadlocks for the same reason that the schedulers of operating system kernels do not preempt processes in a way to avoid race conditions and deadlocks - it is not feasible to find an optimal context switching schedule quickly for multiple processes (and SQL statements), and the overhead of doing it is prohibitive. We created a novel approach that combines run-time monitoring, which automatically prevents database deadlocks, with static analysis, which detects hold-and-wait cycles that specify how resources (e.g., database tables) are held in contention during executions of SQL statements. We rigorously evaluated our approach. For a realistic case of over 1,200 SQL statements, our algorithm detects all hold-and-wait cycles in less than two seconds. We built a toolset and experimented with three applications. Our tool prevented all existing database deadlocks in these applications and increased their throughputs by up to three orders of magnitude.

25 citations
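
The abstract above refers to detecting hold-and-wait cycles over resources such as database tables, without giving the algorithm. The sketch below is a generic illustration of the idea, not the authors' static analysis: it assumes we already know which tables each transaction holds and which it waits for, builds a wait-for graph, and searches it for a cycle with a depth-first traversal. The transaction and table names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def build_wait_for_graph(
    holds: Dict[str, Set[str]], waits: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Add an edge A -> B when A waits for a resource that B holds."""
    graph: Dict[str, Set[str]] = {t: set() for t in holds}
    for waiter, wanted in waits.items():
        graph.setdefault(waiter, set())
        for holder, held in holds.items():
            if holder != waiter and wanted & held:
                graph[waiter].add(holder)
    return graph


def find_cycle(graph: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return the transactions on one cycle, or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in sorted(graph[node]):
            if color[nxt] == GRAY:  # back edge closes a cycle
                return path[path.index(nxt):]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# T1 holds 'orders' and waits for 'customers'; T2 does the opposite.
holds = {"T1": {"orders"}, "T2": {"customers"}}
waits = {"T1": {"customers"}, "T2": {"orders"}}
cycle = find_cycle(build_wait_for_graph(holds, waits))
```

In this example `cycle` lists both transactions, since each waits on a table the other holds. A run-time monitor could use such a result to delay one of the conflicting statements, which is the general shape of prevention the abstract describes.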

Proceedings Article

Testing Database-Centric Applications for Causes of Database Deadlocks

Mark Grechanik¹, B M Mainul Hossain¹, Ugo Buy¹

University of Illinois at Chicago¹

18 Mar 2013

TL;DR: A novel approach for Systematic TEsting in Presence of DAtabase Deadlocks (STEPDAD) is created that enables testers to instantiate database deadlocks in applications with a high level of automation and frequency.

Abstract: Many organizations deploy applications that use databases by sending Structured Query Language (SQL) statements to them and obtaining data that result from executions of these statements. Since applications often share the same databases concurrently, database deadlocks routinely occur in these databases. Testing applications to determine how they cause database deadlocks is important as part of ensuring correctness, reliability, and performance of these applications. Unfortunately, it is very difficult to reproduce database deadlocks, since it involves different factors such as the precise interleavings in executing SQL statements. We created a novel approach for Systematic TEsting in Presence of DAtabase Deadlocks (STEPDAD) that enables testers to instantiate database deadlocks in applications with a high level of automation and frequency. We implemented STEPDAD and experimented with three applications. On average, STEPDAD detected a number of database deadlocks exceeding the deadlocks obtained with the baseline approach by more than an order of magnitude. In some cases, STEPDAD reproduced a database deadlock after running an application only twice, while no database deadlocks could be obtained after ten runs using the baseline approach.

17 citations

Proceedings Article

REDACT: preventing database deadlocks from application-based transactions

B M Mainul Hossain¹, Mark Grechanik¹, Ugo Buy¹, Haisheng Wang²

University of Illinois at Chicago¹, Oracle Corporation²

18 Aug 2013

TL;DR: A database deadlock prevention system that visualizes the algorithm for detecting hold-and-wait cycles, which specify how resources are locked and waited on during executions of SQL statements, and uses information about those cycles to prevent database deadlocks automatically.

Abstract: In this demonstration, we will present a database deadlock prevention system that visualizes our algorithm for detecting hold-and-wait cycles, which specify how resources (e.g., database tables) are locked and waited on during executions of SQL statements, and that uses information about those cycles to prevent database deadlocks automatically.

2 citations