scispace - formally typeset
Search or ask a question

Showing papers by "Ugo Buy published in 2013"


Proceedings ArticleDOI
25 Aug 2013
TL;DR: The approach is independent of the user's language, efficient, and scalable, while attaining a good level of accuracy, and proves the validity of the approach by examining different classifiers over a large dataset of Twitter profiles.
Abstract: Online Social Networks (OSNs) generate a huge volume of user-originated texts. Gender classification can serve multiple purposes. For example, commercial organizations can use gender classification for advertising. Law enforcement may use gender classification as part of legal investigations. Others may use gender information for social reasons. Here we explore language independent gender classification. Our approach predicts gender using five color-based features extracted from Twitter profiles (e.g., the background color in a user's profile page). Most other methods for gender prediction are typically language dependent. Those methods use high-dimensional spaces consisting of unique words extracted from such text fields as postings, user names, and profile descriptions. Our approach is independent of the user's language, efficient, and scalable, while attaining a good level of accuracy. We prove the validity of our approach by examining different classifiers over a large dataset of Twitter profiles.

89 citations


Proceedings ArticleDOI
04 Dec 2013
TL;DR: This work explores profile characteristics for gender classification on Twitter and provides a novel technique to reduce the number of features of text-based profile characteristics from the order of millions to a few thousands and, in some cases, to only 40 features.
Abstract: Online Social Networks (OSNs) provide reliable communication among users from different countries. The volume of texts generated by OSNs is huge and highly informative. Gender classification can serve commercial organizations for advertising, law enforcement for legal investigation, and others for social reasons. Here we explore profile characteristics for gender classification on Twitter. Unlike existing approaches to gender classification that depend heavily on posted text such as tweets, here we study the relative strengths of different characteristics extracted from Twitter profiles (e.g., first name and background color in a user's profile page). Our goal is to evaluate profile characteristics with respect to their predictive accuracy and computational complexity. In addition, we provide a novel technique to reduce the number of features of text-based profile characteristics from the order of millions to a few thousands and, in some cases, to only 40 features. We prove the validity of our approach by examining different classifiers over a large dataset of Twitter profiles.

68 citations


Proceedings ArticleDOI
18 Aug 2013
TL;DR: A novel approach is created that combines run-time monitoring, which automatically prevents database deadlocks, with static analysis, which detects hold-and-wait cycles that specify how resources are held in contention during executions of SQL statements.
Abstract: Many organizations deploy applications that use databases by sending Structured Query Language (SQL) statements to them and obtaining data that result from the execution of these statements. Since applications often share the same databases concurrently, database deadlocks routinely occur in these databases resulting in major performance degradation in these applications. Database engines do not prevent database deadlocks for the same reason that the schedulers of operating system kernels do not preempt processes in a way to avoid race conditions and deadlocks - it is not feasible to find an optimal context switching schedule quickly for multiple processes (and SQL statements), and the overhead of doing it is prohibitive. We created a novel approach that combines run-time monitoring, which automatically prevents database deadlocks, with static analysis, which detects hold-and-wait cycles that specify how resources (e.g., database tables) are held in contention during executions of SQL statements. We rigorously evaluated our approach. For a realistic case of over 1,200 SQL statements, our algorithm detects all hold-and-wait cycles in less than two seconds. We built a toolset and experimented with three applications. Our tool prevented all existing database deadlocks in these applications and increased their throughputs by up to three orders of magnitude.

25 citations


Proceedings ArticleDOI
18 Mar 2013
TL;DR: A novel approach for Systematic TEsting in Presence of DAtabase Deadlocks (STEPDAD) is created that enables testers to instantiate database deadlocks in applications with a high level of automation and frequency.
Abstract: Many organizations deploy applications that use databases by sending Structured Query Language (SQL) statements to them and obtaining data that result from executions of these statements. Since applications often share the same databases concurrently, database deadlocks routinely occur in these databases. Testing applications to determine how they cause database deadlocks is important as part of ensuring correctness, reliability, and performance of these applications. Unfortunately, it is very difficult to reproduce database deadlocks, since it involves different factors such as the precise interleavings in executing SQL statements. We created a novel approach for Systematic TEsting in Presence of DAtabase Deadlocks (STEPDAD) that enables testers to instantiate database deadlocks in applications with a high level of automation and frequency. We implemented STEPDAD and experimented with three applications. On average, STEPDAD detected a number of database deadlocks exceeding the deadlocks obtained with the baseline approach by more than an order of magnitude. In some cases, STEPDAD reproduced a database deadlock after running an application only twice, while no database deadlocks could be obtained after ten runs using the baseline approach.

17 citations


Proceedings ArticleDOI
18 Aug 2013
TL;DR: A database deadlocks prevention system that visualizes the algorithm for detecting hold-and-wait cycles that specify how resources are locked and waited on to be locked during executions of SQL statements and utilizes those cycles information to prevent database deadlock automatically.
Abstract: In this demonstration, we will present a database deadlocks prevention system that visualizes our algorithm for detecting hold-and-wait cycles that specify how resources (e.g., database tables) are locked and waited on to be locked during executions of SQL statements and utilizes those cycles information to prevent database deadlocks automatically.

2 citations