Journal Article

Clustering of Micro-Messages Using Similarity Upper Approximation

TL;DR: Microblogging platforms such as Twitter, Tumblr and Plurk have become a preferred channel for communication, and this article addresses the clustering of their micro-messages using a similarity upper approximation.
Abstract: Microblogging platforms like Twitter, Tumblr and Plurk have radically changed our lives. The presence of millions of people has made these platforms a preferred channel for communication. A large a...
Citations
Journal Article
TL;DR: The current study proposes to use the upper approximation concept of rough sets to develop a solution for privacy-preserving social network graph publishing that preserves the privacy of the graph structure while maintaining the utility or value that can be generated from it.
Abstract: With the advent of online social networks and advances in technology, people connect and interact on social networks. To better understand the behavior of users on a social network, we need to mine their interactions and demographic data. Companies with little or no mining expertise need to share this data with companies that have such expertise. The major challenge in sharing social network data is maintaining individual privacy while retaining the implicit knowledge embedded in the network. Thus, the data needs to be anonymized before it is shared with a third party. The current study proposes to use the upper approximation concept of rough sets to develop a solution for privacy-preserving social network graph publishing. The proposed algorithm is capable of preserving the privacy of the graph structure while simultaneously maintaining the utility or value that can be generated from it. The proposed algorithm is validated by showing its effectiveness on several graph mining tasks such as clustering, classification, and PageRank computation. The experiments were conducted on four standard datasets, and the results suggest that the proposed algorithm maintains both the privacy of individuals and the accuracy of the graph mining tasks.

17 citations
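
The upper approximation concept from rough set theory, mentioned in the abstract above, can be illustrated with a minimal sketch. It is not the algorithm of either paper listed here. It only shows the general definition: the upper approximation of a target set collects every item that is similar to at least one member of the target. The Jaccard similarity, the 0.5 threshold, the sample token sets, and the function names are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def upper_approximation(target, universe, threshold=0.5):
    """Indices of items similar to at least one item in the target set.

    `target` is a set of indices into `universe`; similarity is an
    assumed Jaccard threshold, standing in for any tolerance relation.
    """
    return {
        i
        for i, item in enumerate(universe)
        if any(jaccard(item, universe[j]) >= threshold for j in target)
    }


# Hypothetical micro-messages, already tokenized into sets of words.
messages = [
    {"rough", "set", "clustering"},
    {"rough", "set", "privacy"},
    {"graph", "privacy", "publishing"},
    {"football", "score"},
]

# Upper approximation of the set containing only message 0.
approx = upper_approximation({0}, messages)
```

Because every item is similar to itself, the result always contains the target set; lowering the threshold can only enlarge it.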


Cites methods from "Clustering of Micro-Messages Using ..."

  • ...Rough set based approaches have also been used in recommendation systems of websites (Mishra et al., 2015) and have been used for clustering of micro-messages in microblogging platforms like Twitter (Gupta et al., 2017)....

  • ...Another avenue for further research includes the application of the proposed algorithm on very large datasets like Twitter data, Facebook data, and so on....

References
Journal Article
TL;DR: In this paper, a procedure for forming hierarchical groups of mutually exclusive subsets, each of which has members that are maximally similar with respect to specified characteristics, is suggested for use in large-scale (n > 100) studies when a precise optimal solution for a specified number of groups is not practical.
Abstract: A procedure for forming hierarchical groups of mutually exclusive subsets, each of which has members that are maximally similar with respect to specified characteristics, is suggested for use in large-scale (n > 100) studies when a precise optimal solution for a specified number of groups is not practical. Given n sets, this procedure permits their reduction to n − 1 mutually exclusive sets by considering the union of all possible n(n − 1)/2 pairs and selecting a union having a maximal value for the functional relation, or objective function, that reflects the criterion chosen by the investigator. By repeating this process until only one group remains, the complete hierarchical structure and a quantitative estimate of the loss associated with each stage in the grouping can be obtained. A general flowchart helpful in computer programming and a numerical example are included.

17,405 citations
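
The pairwise-merge procedure described in the abstract above can be sketched roughly as follows. The abstract leaves the objective function to the investigator, so this sketch assumes one common choice: merge the pair whose union adds the least within-group squared error. The one-dimensional data and the names `sse` and `hierarchical_grouping` are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def sse(group):
    """Within-group sum of squared deviations from the group mean."""
    mean = sum(group) / len(group)
    return sum((x - mean) ** 2 for x in group)


def merge_loss(a, b):
    """Increase in squared error caused by merging groups a and b."""
    return sse(a + b) - sse(a) - sse(b)


def hierarchical_grouping(values):
    """Merge groups pairwise until one remains.

    Returns a list of (merged_group, loss) tuples, one per stage,
    which records the full hierarchy and the loss at each step.
    """
    groups = [[v] for v in values]
    history = []
    while len(groups) > 1:
        # Examine all n(n-1)/2 candidate unions and keep the best one.
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda ij: merge_loss(groups[ij[0]], groups[ij[1]]),
        )
        merged = groups[i] + groups[j]
        history.append((merged, merge_loss(groups[i], groups[j])))
        groups = [g for k, g in enumerate(groups) if k not in (i, j)]
        groups.append(merged)
    return history


values = [1.0, 2.0, 10.0, 11.0, 30.0]
history = hierarchical_grouping(values)
```

Each stage scans every pair, so this naive version is only practical for small inputs; it is meant to show the structure of the procedure, not an efficient implementation.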

Journal Article
TL;DR: An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL, and performs slightly better than a much more elaborate system with which it has been compared.
Abstract: The automatic removal of suffixes from words in English is of particular interest in the field of information retrieval. An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL. Although simple, it performs slightly better than a much more elaborate system with which it has been compared. It effectively works by treating complex suffixes as compounds made up of simple suffixes, and removing the simple suffixes in a number of steps. In each step the removal of the suffix is made to depend upon the form of the remaining stem, which usually involves a measure of its syllable length.

7,572 citations
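
The stepwise idea in the abstract above (remove simple suffixes in several passes, with each removal conditioned on the form of the remaining stem) can be sketched as below. This is a toy illustration, not the published algorithm: the rule table `STEPS`, the simplified `measure` function, and the thresholds are assumptions chosen to keep the example short.

```python
import re

VOWELS = "aeiou"


def measure(stem):
    """Count vowel-run followed by consonant-run sequences in the stem.

    This serves as a rough proxy for syllable length.
    """
    pattern = "".join("v" if ch in VOWELS else "c" for ch in stem)
    return len(re.findall(r"v+c+", pattern))


# Each step is a list of (suffix, replacement, min_measure) rules.
# A rule applies only if measure(stem) > min_measure; -1 means "always".
STEPS = [
    [("sses", "ss", -1), ("ies", "i", -1), ("ss", "ss", -1), ("s", "", -1)],
    [("ational", "ate", 0), ("fulness", "ful", 0), ("iveness", "ive", 0)],
    [("ful", "", 0), ("ness", "", 0), ("ate", "", 1), ("ive", "", 1)],
]


def strip_suffixes(word):
    """Apply each step in order, using at most one rule per step."""
    for rules in STEPS:
        for suffix, replacement, min_measure in rules:
            if word.endswith(suffix):
                stem = word[: len(word) - len(suffix)]
                if measure(stem) > min_measure:
                    word = stem + replacement
                break  # only the first matching suffix in a step is tried
    return word


examples = {w: strip_suffixes(w) for w in ["hopefulness", "relational", "caresses", "cats"]}
```

Here "hopefulness" loses "ness" and then "ful" in successive steps, while "relational" stops at "relate" because the stem "rel" is too short to satisfy the condition for removing "ate".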

Journal Article
16 Feb 2007-Science
TL;DR: A method called “affinity propagation,” which takes as input measures of similarity between pairs of data points, is presented; it found clusters with much lower error than other methods and did so in less than one-hundredth the amount of time.
Abstract: Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such "exemplars" can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this works well only if that initial choice is close to a good solution. We devised a method called "affinity propagation," which takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. We used affinity propagation to cluster images of faces, detect genes in microarray data, identify representative sentences in this manuscript, and identify cities that are efficiently accessed by airline travel. Affinity propagation found clusters with much lower error than other methods, and it did so in less than one-hundredth the amount of time.

6,429 citations

Journal Article
TL;DR: A broad research agenda for understanding the relationships among social media, business, and society is outlined and it is hoped that the flexible framework outlined will help guide future research and develop a cumulative research tradition in this area.
Abstract: Social media are fundamentally changing the way we communicate, collaborate, consume, and create. They represent one of the most transformative impacts of information technology on business, both within and outside firm boundaries. This special issue was designed to stimulate innovative investigations of the relationship between social media and business transformation. In this paper we outline a broad research agenda for understanding the relationships among social media, business, and society. We place the papers comprising the special issue within this research framework and identify areas where further research is needed. We hope that the flexible framework we outline will help guide future research and develop a cumulative research tradition in this area.

778 citations

Proceedings Article
05 Jul 2011
TL;DR: This paper explores approaches for analyzing the stream of Twitter messages to distinguish between messages about real-world events and non-event messages, and relies on a rich family of aggregate statistics of topically similar message clusters.
Abstract: User-contributed messages on social media sites such as Twitter have emerged as powerful, real-time means of information sharing on the Web. These short messages tend to reflect a variety of events in real time, making Twitter particularly well suited as a source of real-time event content. In this paper, we explore approaches for analyzing the stream of Twitter messages to distinguish between messages about real-world events and non-event messages. Our approach relies on a rich family of aggregate statistics of topically similar message clusters. Large-scale experiments over millions of Twitter messages show the effectiveness of our approach for surfacing real-world event content on Twitter.

761 citations