Author

Sameep Mehta

Bio: Sameep Mehta is an academic researcher from IBM. The author has contributed to research in topics: Service (business) & Resource (project management). The author has an h-index of 22 and has co-authored 160 publications receiving 2,093 citations. Previous affiliations of Sameep Mehta include Lady Hardinge Medical College & All India Institute of Medical Sciences.


Papers
Proceedings ArticleDOI
08 Jan 2022
TL;DR: This tutorial highlights the importance of data quality and its associated challenges, and the need to analyse data quality in terms of its value for machine learning applications.
Abstract: The saying Garbage In, Garbage Out resonates perfectly within the machine learning and artificial intelligence community. While there has been considerable ongoing effort to improve the quality of models, there is relatively less focus on systematically analysing the quality of data with respect to its efficacy for machine learning. Assessing data quality across intelligently designed metrics, and developing corresponding transformation operations to address the quality gaps, reduces the effort a data scientist spends iteratively debugging the ML pipeline to improve model performance. In this tutorial, we emphasize the importance of data quality and its associated challenges, and highlight the value of analysing data quality in terms of its usefulness for machine learning applications. We survey important data-centric approaches for improving data quality and the ML pipeline, focusing on the intuition behind them, highlighting their strengths and similarities, and illustrating their applicability to real-world problems. As part of the hands-on session, we first provide an overview of available data quality analysis tools such as Pandas Profiling, Amazon Deequ, and IBM's Data Quality for AI. We then showcase in detail how end users can assess the quality of their structured (tabular) data using one of these tools.
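The kind of table-level quality checks the tutorial's tools automate can be sketched directly in pandas. The snippet below is a minimal illustration of a few common metrics (missing-value ratios, duplicate rows, constant columns); the function name, metrics, and sample data are illustrative assumptions, not taken from the tutorial or any of the named tools.

```python
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Compute a few simple column- and table-level quality metrics."""
    return {
        "n_rows": len(df),
        # rows identical to an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # per-column fraction of missing values
        "missing_ratio": {c: float(df[c].isna().mean()) for c in df.columns},
        # columns carrying no information (a single value throughout)
        "constant_columns": [c for c in df.columns
                             if df[c].nunique(dropna=False) <= 1],
    }

df = pd.DataFrame({"age": [25, None, 25, 40],
                   "city": ["NY", "NY", "NY", "NY"]})
report = quality_report(df)
print(report)
```

A data scientist would typically run a report like this before training, then pair each flagged gap with a transformation (imputation, deduplication, column dropping) as the tutorial describes.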

2 citations

Book ChapterDOI
Sameep Mehta, L. V. Subramaniam
16 Dec 2013
TL;DR: This tutorial presents an overview of common tasks in analyzing text messages on social media (mostly microblogging sites), along with new ideas for handling research problems such as event detection from social media, summarization, location inference, and fusing external data sources with social data.
Abstract: In this tutorial we present an overview of some of the common tasks in analyzing text messages on Social Media (mostly on microblogging sites). We review the state of the art as well as present new ideas on handling common research problems like Event Detection from Social Media, Summarization, Location Inference and fusing external data sources with social data. The tutorial would assume basic knowledge of Data Mining, Text Analytics and NLP Methods.

2 citations

Patent
24 Aug 2017
TL;DR: A computer-implemented method that classifies temporally evolving data entities into categories based on one or more parameters, partitions them based on that classification and their update frequency, implements checkpoints at a distinct temporal interval for each partition, and creates a snapshot of the entities at a selected past point in time in response to a historical-state query.
Abstract: Methods, systems, and computer program products for historical state snapshot construction over temporally evolving data are provided herein. A computer-implemented method includes classifying each of multiple temporally evolving data entities into one of multiple categories based on one or more parameters; partitioning the multiple temporally evolving data entities into multiple partitions based at least on (i) said classifying and (ii) the update frequency of each of the multiple temporally evolving data entities; implementing multiple checkpoints at a distinct temporal interval for each of the multiple partitions; and creating a snapshot of the multiple temporally evolving data entities at a selected past point of time (i) based on said implementing and (ii) in response to a query pertaining to a historical state of one or more of the multiple temporally evolving data entities.
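The claimed steps can be sketched as follows. The update log, threshold, and function names are hypothetical illustrations of the patent's method, and a full implementation would restore state from the nearest checkpoint at or before the query time rather than replaying each entity's whole update history.

```python
from collections import defaultdict

# Hypothetical update log: entity -> list of (timestamp, value), sorted by time.
log = {
    "e1": [(1, "a"), (5, "b"), (9, "c")],  # frequently updated entity
    "e2": [(2, "x")],                       # rarely updated entity
}

def partition_by_update_frequency(log, threshold=2):
    """Split entities into 'hot' and 'cold' partitions by update count,
    so each partition can be checkpointed at its own temporal interval."""
    parts = defaultdict(list)
    for entity, updates in log.items():
        parts["hot" if len(updates) >= threshold else "cold"].append(entity)
    return dict(parts)

def snapshot(log, t):
    """Reconstruct each entity's state at past time t
    (here: latest update at or before t; a real system would start
    from the nearest checkpoint instead of scanning the full log)."""
    state = {}
    for entity, updates in log.items():
        past = [value for ts, value in updates if ts <= t]
        state[entity] = past[-1] if past else None
    return state

print(partition_by_update_frequency(log))  # {'hot': ['e1'], 'cold': ['e2']}
print(snapshot(log, 6))                    # {'e1': 'b', 'e2': 'x'}
```

Checkpointing hot partitions more often keeps historical queries cheap for frequently updated entities while avoiding wasted checkpoints on cold ones.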

2 citations

Posted Content
TL;DR: The authors propose a neural network architecture for fairly transferring multiple style attributes in a given text, producing obfuscated or rewritten text that incorporates a desired degree of multiple soft styles such as female-quality, politeness, or formalness.
Abstract: To preserve anonymity and obfuscate their identity on online platforms, users may morph their text and portray themselves as a different gender or demographic. Similarly, a chatbot may need to customize its communication style to improve engagement with its audience. This manner of changing the style of written text has gained significant attention in recent years. Yet past research works largely cater to the transfer of a single style attribute. The disadvantage of focusing on a single style alone is that this often results in target text where other existing style attributes behave unpredictably or are unfairly dominated by the new style. To counteract this behavior, a style transfer mechanism is needed that can transfer or control multiple styles simultaneously and fairly. Through such an approach, one could obtain obfuscated or rewritten text incorporating a desired degree of multiple soft styles such as female-quality, politeness, or formalness. In this work, we demonstrate that the transfer of multiple styles cannot be achieved by sequentially performing multiple single-style transfers, because each single style-transfer step often reverses or dominates the style incorporated by a previous transfer step. We then propose a neural network architecture for fairly transferring multiple style attributes in a given text. We test our architecture on the Yelp data set to demonstrate superior performance compared to existing single-style transfers performed in sequence.

2 citations

Proceedings ArticleDOI
01 Oct 2022
TL;DR: In this paper, the authors propose a design for a Serverless Scientific Workflow Orchestrator that overcomes these challenges using techniques like function fusion, pilot invocations and data fabrics.
Abstract: Serverless computing and FaaS have gained popularity due to their ease of design, deployment, scaling and billing on clouds. However, when used to compose and orchestrate scientific workflows, they pose limitations due to cold starts, message indirection, vendor lock-in and lack of provenance support. Here, we propose a design for a Serverless Scientific Workflow Orchestrator that overcomes these challenges using techniques like function fusion, pilot invocations and data fabrics.
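Function fusion, one of the techniques named above, can be illustrated with a toy example: multiple workflow stages are combined into a single FaaS entry point so that intermediate results pass in memory rather than through a message queue, cutting out per-stage cold starts and message indirection. The stage names and handler signature below are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical workflow stages; in an unfused deployment each would be
# its own serverless function connected by a queue or storage bucket.
def fetch(event):
    """Stand-in for an I/O stage (e.g. downloading an input object)."""
    return {"data": event["url"].upper()}

def transform(payload):
    """Stand-in for a compute stage operating on the fetched data."""
    return {"result": payload["data"][::-1]}

def fused_handler(event, context=None):
    """Fused FaaS entry point: runs both stages in one invocation,
    passing intermediate state in memory instead of between functions."""
    return transform(fetch(event))

print(fused_handler({"url": "s3://bucket/item"}))
```

The trade-off is that fused stages must share one runtime and resource allocation, which is why an orchestrator would choose which stages to fuse rather than fusing everything.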

2 citations


Cited by
Journal ArticleDOI
09 Mar 2018 - Science
TL;DR: A large-scale analysis of tweets reveals that false rumors spread further and faster than the truth, and false news was more novel than true news, which suggests that people were more likely to share novel information.
Abstract: We investigated the differential diffusion of all of the verified true and false news stories distributed on Twitter from 2006 to 2017. The data comprise ~126,000 stories tweeted by ~3 million people more than 4.5 million times. We classified news as true or false using information from six independent fact-checking organizations that exhibited 95 to 98% agreement on the classifications. Falsehood diffused significantly farther, faster, deeper, and more broadly than the truth in all categories of information, and the effects were more pronounced for false political news than for false news about terrorism, natural disasters, science, urban legends, or financial information. We found that false news was more novel than true news, which suggests that people were more likely to share novel information. Whereas false stories inspired fear, disgust, and surprise in replies, true stories inspired anticipation, sadness, joy, and trust. Contrary to conventional wisdom, robots accelerated the spread of true and false news at the same rate, implying that false news spreads more than the truth because humans, not robots, are more likely to spread it.

4,241 citations

01 Jan 2012

3,692 citations

21 Jan 2018
TL;DR: In an evaluation of commercial API-based classifiers of gender from facial images, including IBM Watson Visual Recognition, the highest error rates involve images of dark-skinned women, while the most accurate results are for light-skinned men.
Abstract: The paper “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification” by Joy Buolamwini and Timnit Gebru, that will be presented at the Conference on Fairness, Accountability, and Transparency (FAT*) in February 2018, evaluates three commercial API-based classifiers of gender from facial images, including IBM Watson Visual Recognition. The study finds these services to have recognition capabilities that are not balanced over genders and skin tones [1]. In particular, the authors show that the highest error involves images of dark-skinned women, while the most accurate result is for light-skinned men.

2,528 citations

Posted Content
TL;DR: This survey investigated different real-world applications that have shown biases in various ways, and created a taxonomy for fairness definitions that machine learning researchers have defined to avoid the existing bias in AI systems.
Abstract: With the widespread use of AI systems and applications in our everyday lives, it is important to take fairness issues into consideration while designing and engineering these types of systems. Such systems can be used in many sensitive environments to make important and life-changing decisions; thus, it is crucial to ensure that the decisions do not reflect discriminatory behavior toward certain groups or populations. We have recently seen work in machine learning, natural language processing, and deep learning that addresses such challenges in different subdomains. With the commercialization of these systems, researchers are becoming aware of the biases that these applications can contain and have attempted to address them. In this survey we investigated different real-world applications that have shown biases in various ways, and we listed different sources of biases that can affect AI applications. We then created a taxonomy for fairness definitions that machine learning researchers have defined in order to avoid the existing bias in AI systems. In addition to that, we examined different domains and subdomains in AI showing what researchers have observed with regard to unfair outcomes in the state-of-the-art methods and how they have tried to address them. There are still many future directions and solutions that can be taken to mitigate the problem of bias in AI systems. We are hoping that this survey will motivate researchers to tackle these issues in the near future by observing existing work in their respective fields.

1,571 citations