scispace - formally typeset
Author

R. Geetha

Bio: R. Geetha is an academic researcher from Sri Sivasubramaniya Nadar College of Engineering. The author has contributed to research in topics: Privacy law & Regret. The author has an h-index of 1 and has co-authored 2 publications receiving 2 citations.
Topics: Privacy law, Regret, Social media, Identity theft

Papers
Journal ArticleDOI
TL;DR: The Tweet-Scan-Post system scans tweets contextually for sensitive messages and formulates a sensitivity scaling called TSP’s Tweet Sensitivity Scale based on Senti-Cyber features composed of Sensitive Privacy Keywords, Cyber-keywords, Non-Sensitive Privacy Keywords and Non-Cyber-keywords to detect the degree of disclosed sensitive information.
Abstract: Twitter is an extensively used micro-blogging site for publishing users’ views on recent happenings. The wide reachability of messages over a large audience poses a threat, as the degree of personally identifiable information disclosed might lead to user regrets. The Tweet-Scan-Post system scans tweets contextually for sensitive messages. The tweet repository was generated using cyber-keywords for personal, professional and health tweets. The Rules of Sensitivity and Contextuality were defined based on standards established by various national regulatory bodies. The naive sensitivity regression function uses a Bag-of-Words model built from short text messages. The imbalanced classes in the dataset, with 25% sensitive and 75% insensitive tweets, result in misclassification. The system opted for stacked classification to combat the problem of imbalanced classes. The system initially applied various state-of-the-art algorithms and predicted 26% of the tweets to be sensitive; the proposed stacked classification approach increased the overall proportion of sensitive tweets to 35%. The system contributes a vocabulary set of 201 Sensitive Privacy Keywords built using a boosting approach for three tweet categories. Finally, the system formulates a sensitivity scaling called TSP’s Tweet Sensitivity Scale based on Senti-Cyber features composed of Sensitive Privacy Keywords, Cyber-keywords, Non-Sensitive Privacy Keywords and Non-Cyber-keywords to detect the degree of disclosed sensitive information.
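The keyword-driven Bag-of-Words scoring the abstract describes can be sketched roughly as follows; the keyword lists and the token-fraction scoring rule are illustrative assumptions, not the paper’s actual 201-keyword vocabulary or its regression function.

```python
from collections import Counter

# Hypothetical keyword lists for illustration; the paper's real vocabulary
# of 201 Sensitive Privacy Keywords is not reproduced here.
SENSITIVE_KEYWORDS = {"ssn", "diagnosed", "salary", "address", "passport"}
CYBER_KEYWORDS = {"account", "password", "login", "email"}

def sensitivity_score(tweet: str) -> float:
    """Score a tweet by the fraction of its tokens that match a sensitive
    or cyber keyword — a crude stand-in for a sensitivity scale."""
    tokens = tweet.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(c for t, c in counts.items()
               if t in SENSITIVE_KEYWORDS or t in CYBER_KEYWORDS)
    return hits / len(tokens)
```

In this sketch a tweet like "my passport and ssn leaked" scores 0.4 (2 keyword hits out of 5 tokens), illustrating how keyword density could map onto a graded sensitivity scale.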

8 citations

Journal ArticleDOI
TL;DR: In this article, a Tweet-Scan-Post (TSP) framework is proposed to identify the presence of sensitive private data (SPD) in user's posts under personal, professional, and health domains.
Abstract: Social media technologies are open to users who intend to create communities and publish their opinions on recent incidents. Participants in online social networking sites often remain unaware of how critical disclosing personal data to a public audience can be. Users’ private data are at high risk, leading to many adverse effects such as cyberbullying, identity theft, and job loss. This research work defines user entities or data such as phone numbers, email addresses, family details, and health-related information as a user’s sensitive private data (SPD) on a social media platform. The proposed system, Tweet-Scan-Post (TSP), focuses on identifying the presence of SPD in users’ posts under the personal, professional, and health domains. The TSP framework is built on the standards and privacy regulations established by social networking sites and organizations such as NIST, DHS, and the GDPR. The TSP approach addresses the prevailing challenges in determining the presence of sensitive PII and in preserving user privacy within the bounds of confidentiality and trustworthiness. A novel layered classification approach with various state-of-the-art machine learning models is used by the TSP framework to classify tweets as sensitive or insensitive. The findings of the TSP system include 201 Sensitive Privacy Keywords obtained using a boosting strategy and a sensitivity scaling that measures the degree of sensitivity associated with a tweet. The experimental results revealed that personal tweets were highly related to mothers and children, professional tweets to apologies, and health tweets to concern over a father’s health condition.
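A layered classification of the kind the abstract describes could be sketched as below: a first stage flags a tweet as sensitive or insensitive, and a second stage assigns the domain. The domain keyword sets and the set-intersection rule are hypothetical stand-ins for the trained machine-learning models.

```python
# Hypothetical domain keyword sets, loosely echoing the themes the paper
# reports (mother/children, apology, father's health); illustrative only.
DOMAINS = {
    "personal": {"mother", "children", "family", "birthday"},
    "professional": {"apology", "resign", "salary", "fired"},
    "health": {"father", "surgery", "diagnosed", "hospital"},
}

def classify(tweet: str):
    """Layer 1: flag the tweet as sensitive if any domain keyword appears.
    Layer 2: assign the domain with the most keyword matches."""
    tokens = set(tweet.lower().split())
    matches = {d: len(tokens & kws) for d, kws in DOMAINS.items()}
    if max(matches.values()) == 0:
        return ("insensitive", None)
    return ("sensitive", max(matches, key=matches.get))
```

For example, "my father had surgery today" would be flagged sensitive and routed to the health domain, while "nice weather today" passes through as insensitive.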

5 citations


Cited by
Journal ArticleDOI
TL;DR: In this article, the authors propose an approach based on machine learning and sentence embedding techniques with the primary goal of providing privacy awareness to users and, as a consequence, full control over their data during online activities.

22 citations

Proceedings ArticleDOI
26 Jul 2022
TL;DR: An agent called Aegis is proposed to calculate the potential risk incurred by multi-party members in order to push privacy-preserving nudges to the sharer and is inspired by the consequentialist approach in normative ethical problem-solving techniques.
Abstract: The proliferation of social media set the foundation for the culture of over-disclosure where many people document every single event, incident, trip, etc. for everyone to see. Raising the individual's awareness of the privacy issues that they are subjecting themselves to can be challenging. This becomes more complex when the post being shared includes data "owned" by others. The existing approaches aiming to assist users in multi-party disclosure situations need to be revised to go beyond preferences to the "good" of the collective. This paper proposes an agent called Aegis to calculate the potential risk incurred by multi-party members in order to push privacy-preserving nudges to the sharer. Aegis is inspired by the consequentialist approach in normative ethical problem-solving techniques. The main contribution is the introduction of a social media-specific risk equation based on data valuation and the propagation of the post from intended to unintended audience. The proof-of-concept reports on how Aegis performs based on real-world data from the SNAP dataset and synthetically generated networks.
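A propagation-based risk score of the kind Aegis’s abstract outlines might look like the sketch below. The formula, the parameter names, and the geometric reshare model are illustrative assumptions, not the paper’s actual risk equation.

```python
def disclosure_risk(data_value: float, audience: int,
                    reshare_prob: float, hops: int) -> float:
    """Toy risk score: the value of the disclosed data, weighted by the
    expected unintended reach as the post propagates beyond the intended
    audience over successive reshare hops (geometric decay)."""
    unintended_reach = audience * sum(reshare_prob ** h
                                      for h in range(1, hops + 1))
    return data_value * unintended_reach
```

Under these assumptions, a post with data value 1.0 shared to 100 followers with a 50% reshare probability over 2 hops scores 75.0; the intuition is simply that risk grows with both the valuation of the data and the expected spread from intended to unintended audiences.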

2 citations

Journal ArticleDOI
TL;DR: This work proposes and evaluates a set of approaches for automatically detecting second- and third-party disclosures on Twitter of sensitive private information, a subset of which constitutes doxing, and compares nine different approaches for automated detection based on string-matching and one-hot encoded heuristics, as well as word and contextualized string embedding representations of tweets.
Abstract: Doxing refers to the practice of disclosing sensitive personal information about a person without their consent. This form of cyberbullying is an unpleasant and sometimes dangerous phenomenon for online social networks. Although prior work exists on automated identification of other types of cyberbullying, a need exists for methods capable of detecting doxing on Twitter specifically. We propose and evaluate a set of approaches for automatically detecting second- and third-party disclosures on Twitter of sensitive private information, a subset of which constitutes doxing. We summarize our findings of common intentions behind doxing episodes and compare nine different approaches for automated detection based on string-matching and one-hot encoded heuristics, as well as word and contextualized string embedding representations of tweets. We identify an approach providing 96.86% accuracy and 97.37% recall using contextualized string embeddings and conclude by discussing the practicality of our proposed methods.
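One of the string-matching heuristics the paper compares could be sketched along these lines; the regex patterns and the rule of pairing a phone-number match with a third-party referent are hypothetical illustrations, not the authors’ actual detectors.

```python
import re

# A tweet is flagged only when a phone-number-like pattern co-occurs with a
# third-party referent, approximating second-/third-party disclosure.
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")
THIRD_PARTY = re.compile(r"\b(his|her|their)\b", re.IGNORECASE)

def flags_disclosure(tweet: str) -> bool:
    """Flag a tweet that discloses a phone number about someone else."""
    return bool(PHONE.search(tweet)) and bool(THIRD_PARTY.search(tweet))
```

A first-party disclosure ("call me at 555-123-4567") is deliberately not flagged, since doxing concerns information disclosed about others; the embedding-based approaches in the paper replace such brittle patterns with learned representations.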

2 citations

Book ChapterDOI
14 Nov 2022
TL;DR: In this paper, an adversarial process between a discriminator and a generator is proposed: the discriminator learns to distinguish false samples created by the generator from genuine ones, while the generator learns to map samples from a given (simple) prior distribution to synthetic data that appears realistic.
Abstract: The Generative Adversarial Network (GAN) has made incredible progress in creating realistic synthetic data. The authors propose a structure for creating convincing text through adversarial training. It enables the generation of new sentences that maintain the semantics and syntax of genuine phrases while being potentially distinct from any of the examples used to evaluate the model. The authors propose an adversarial process between a discriminator and a generator: the discriminator’s goal is to distinguish false samples created by the generator from those which are genuine, while the generator is trained to map samples from a given (simple) prior distribution to synthetic data that appears realistic. In this paper, the authors present various classifiers, test them against various performance metrics, and develop a suitable model to test the authenticity of tweets.
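The adversarial process the abstract describes can be illustrated with a deliberately tiny, pure-Python toy: a one-parameter generator shifts Gaussian noise, a logistic discriminator tries to tell its output from real data, and the two are updated in alternation. Everything here (the 1-D data, the learning rates, the non-saturating generator update) is an illustrative assumption, not the chapter’s text-generation model.

```python
import math
import random

random.seed(0)
REAL_MEAN = 3.0  # the "real" data distribution is N(3, 1)

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, v))))

theta = 0.0      # generator parameter: G(z) = z + theta
w, b = 0.0, 0.0  # discriminator: D(x) = sigmoid(w*x + b)
lr, batch = 0.05, 64

for _ in range(500):
    real = [random.gauss(REAL_MEAN, 1.0) for _ in range(batch)]
    fake = [random.gauss(0.0, 1.0) + theta for _ in range(batch)]

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0
    # (gradient ascent on the logistic log-likelihood).
    gw = (sum((1 - sigmoid(w * x + b)) * x for x in real)
          - sum(sigmoid(w * x + b) * x for x in fake))
    gb = (sum(1 - sigmoid(w * x + b) for x in real)
          - sum(sigmoid(w * x + b) for x in fake))
    w += lr * gw / batch
    b += lr * gb / batch

    # Generator step (non-saturating loss): move fakes toward the region
    # the discriminator currently labels "real".
    gt = sum((1 - sigmoid(w * x + b)) * w for x in fake)
    theta += lr * gt / batch
```

After training, theta drifts toward REAL_MEAN: the generator has learned to imitate the real distribution purely from the discriminator’s feedback, which is the same adversarial mechanism the chapter applies to text.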