Bio: Rahul Pramanik is an academic researcher from Indian Institutes of Technology. The author has contributed to research in topics: Devanagari & Handwriting recognition. The author has an h-index of 5 and has co-authored 14 publications receiving 168 citations. Previous affiliations of Rahul Pramanik include Synergy University & Indian Institute of Technology Dhanbad.
TL;DR: This paper proposes a novel shape decomposition-based segmentation technique to decompose the compound characters into prominent shape components, which reduces the classification complexity by lowering the number of classes to recognize, and at the same time improves the recognition accuracy.
Abstract: Proper recognition of complex-shaped handwritten compound characters is still a big challenge for Bangla OCR systems. In this paper, we propose a novel shape decomposition-based segmentation technique to decompose the compound characters into prominent shape components. This shape decomposition reduces the classification complexity by lowering the number of classes to recognize, and at the same time improves the recognition accuracy. The decomposition is done at the segmentation area where the two basic shapes are joined to form a compound character. We use a chain code histogram feature set with a multi-layer perceptron (MLP) based classifier with backpropagation learning for classification. On experimentation, the proposed method is observed to provide good recognition accuracy compared with other existing methods.
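As an illustration of the chain code histogram feature mentioned in the abstract above, the following is a minimal sketch, not the paper's implementation. It assumes the character boundary is already available as a closed, ordered list of 8-connected (x, y) pixels, and it uses the standard Freeman 8-direction coding; the function names and the whole-contour (rather than block-wise) histogram are assumptions made for the example.

```python
# Freeman 8-direction codes, keyed by the (dx, dy) step between two
# neighbouring contour pixels. Assumed convention: x grows to the right,
# y grows downward, code 0 points east, codes advance in 45-degree steps.
DIRECTIONS = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def chain_code(contour):
    """Return the Freeman chain code of a closed, ordered contour."""
    if not contour:
        raise ValueError("contour is empty")
    codes = []
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]  # wrap around to close the contour
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"{contour[i]} and {contour[(i + 1) % n]} are not 8-neighbours")
        codes.append(DIRECTIONS[step])
    return codes


def chain_code_histogram(contour):
    """Return the normalized 8-bin histogram of chain code directions."""
    codes = chain_code(contour)
    hist = [0] * 8
    for code in codes:
        hist[code] += 1
    return [count / len(codes) for count in hist]


# Hypothetical example: the boundary of a 3x3 square, traced in order.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
print(chain_code(square))
print(chain_code_histogram(square))
```

The resulting 8-value vector is the kind of fixed-length feature that could be fed to an MLP classifier; how the paper partitions the image or normalizes the histogram is not stated in the abstract.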
TL;DR: The article surveys recent developments on social spam detection and mitigation, its theoretical models and applications along with their qualitative comparison, and presents the state of the art and attempts to identify the challenges to be addressed, as the nature and content of spam are bound to get more complicated.
Abstract: Social networking and instant multimedia communication is integral to online existence. Spamming is a new menace in messaging, blogs, video sites, internet telephony etc. The article surveys recent developments on social spam detection and mitigation. A qualitative comparison of different models and their performances is presented. A roadmap on how newer anti-spam techniques can be devised in future is provided. Spam in recent years has pervaded all forms of digital communication. The increase in user base for social platforms like Facebook, Twitter, YouTube, etc., has opened new avenues for spammers. The liberty to contribute content freely has encouraged the spammers to exploit the social platforms for their benefits. E-mail and web search engines, being the early victims of spam, have attracted serious attention from information scientists for quite some time. A substantial amount of research has been directed to combat spam on these two platforms. Social networks, being quite different in nature from the earlier two, have different kinds of spam, and spam-fighting techniques from these domains seldom work. Moreover, due to the continuous and rapid evolution of social media, spam itself evolves very fast, posing a great challenge to the community. Despite the area being relatively new, there have been a number of attempts in the area of social spam in the recent past and many more are certain to come in the near future. This paper surveys the recent developments in the area of social spam detection and mitigation, its theoretical models and applications along with their qualitative comparison. We present the state of the art and attempt to identify the challenges to be addressed, as the nature and content of spam are bound to get more complicated.
18 Dec 2018
TL;DR: This study uses readily available pre-trained Convolutional Neural Network architectures on four different Indic scripts, viz.
Abstract: Filling up forms at post offices, railway counters, and for application of jobs has become a routine for modern people, especially in a developing country like India. Research on automation for the recognition of such handwritten forms has become mandatory. This applies more for a multilingual country like India. In the present work, we use readily available pre-trained Convolutional Neural Network (CNN) architectures on four different Indic scripts, viz. Bangla, Devanagari, Oriya, and Telugu to achieve a satisfactory recognition rate for handwritten Indic numerals. Furthermore, we have mixed Bangla and Oriya numerals and applied transfer learning for recognition. The main objective of this study is to realize how good a CNN model trained on an entire different dataset (of natural images) works for small and unrelated datasets. As a part of practical application, we have applied the proposed approach to recognize Bangla handwritten pin codes after their extraction from postal letters.
TL;DR: The authors propose a system that first detects and corrects skew present in Bangla and Devanagari handwritten words, estimates the headline, and further segments the words into meaningful pseudo-characters to provide the final result.
Abstract: Offline recognition of handwritten text in Indian regional scripts is a major area of research as nearly 910 million people use such scripts in India. Most of the reported research works on Indian script-based optical character recognition (OCR) system have focused on a single script only. Research for developing methodologies that are capable of handling more than one Indian script is yet to be focused. As such, this has motivated us to study and experiment on creating a recognition system that can handle two most popular Indian scripts, namely Bangla and Devanagari. The authors propose a system that first detects and corrects skew present in Bangla and Devanagari handwritten words, estimates the headline, and further segments the words into meaningful pseudo-characters. This is followed by extraction of three different statistical features and combination of these features with off-the-shelf classifiers to study and identify the exemplary combination. Moreover, they employ state-of-the-art convolutional neural network-based transfer learning architectures and delineate a comparison with the extracted hand-crafted features. Finally, they amalgamate the identified pseudo-characters to provide the final result. On experimentation, the proposed segmentation methodology is discerned to provide good accuracy when compared with existing methods.
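The headline estimation step described above can be illustrated with a short sketch. This is not the authors' method, which the abstract does not detail; it assumes a common heuristic in which the headline (matra or shirorekha) of a deskewed Bangla or Devanagari word is taken to be the row with the most ink pixels in a binarized image. The image representation (a list of rows of 0/1 values) and the function names are hypothetical.

```python
def horizontal_projection(image):
    """Count ink pixels (value 1) in every row of a binary image."""
    return [sum(row) for row in image]


def estimate_headline_row(image):
    """Return the index of the row with the most ink pixels.

    Ties are resolved in favour of the topmost row, since the headline
    sits in the upper part of the word.
    """
    profile = horizontal_projection(image)
    if not profile or max(profile) == 0:
        raise ValueError("image contains no ink pixels")
    return profile.index(max(profile))


# Hypothetical 5x6 binary word image: row 1 is a full-width stroke.
word = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(horizontal_projection(word))
print(estimate_headline_row(word))
```

In practice the search is often restricted to the upper portion of the word and applied only after skew correction, because a slanted headline spreads its ink over several rows and flattens the peak in the projection profile.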
TL;DR: This work extracts different statistical feature sets from word images and uses five different off-the-shelf classifiers to delineate their performance on holistic Bangla words and uses a seven-layer FCN architecture to provide a threefold study on this particular domain.
Abstract: Decomposition of a word into a set of appropriate pseudo-characters is a challenging task in case of a cursive script like Bangla. Segmentation-free approach bypasses the decomposition problem entirely and treats the handwritten word as an individual entity. From the literature, we found that the accuracy of handwritten Bangla cursive word recognition using segmentation-free approach is relatively low (accuracy hovers between 80% and 90%). In the current work, we aim to provide a threefold study on this particular domain. Firstly, we extract different statistical feature sets from word images and use five different off-the-shelf classifiers to delineate their performance. Then, we employ five different CNN-TL architectures, namely AlexNet, VGG-16, VGG-19, ResNet50, and GoogleNet, to understand how they perform on holistic Bangla words. Finally, we use a seven-layer FCN architecture and provide a comparison of results with all the aforementioned experimentations. We achieved an accuracy of 98.86% with ResNet50, which is nearly 19% improvement when compared with other recent state-of-the-art methodologies.
TL;DR: This paper addresses current topics about document image understanding from a technical point of view as a survey and proposes methods/approaches for recognition of various kinds of documents.
Abstract: The subject about document image understanding is to extract and classify individual data meaningfully from paper-based documents. Until today, many methods/approaches have been proposed with regard to recognition of various kinds of documents, various technical problems for extensions of OCR, and requirements for practical usages. Of course, though the technical research issues in the early stage are looked upon as complementary attacks for the traditional OCR which is dependent on character recognition techniques, the application ranges or related issues are widely investigated or should be established progressively. This paper addresses current topics about document image understanding from a technical point of view as a survey. key words: document model, top-down, bottom-up, layout structure, logical structure, document types, layout recognition
TL;DR: This paper aims to provide a comprehensive overview of the challenges that ML techniques face in protecting cyberspace against attacks, by presenting a literature on ML techniques for cyber security including intrusion detection, spam detection, and malware detection on computer networks and mobile networks in the last decade.
Abstract: Pervasive growth and usage of the Internet and mobile applications have expanded cyberspace. The cyberspace has become more vulnerable to automated and prolonged cyberattacks. Cyber security techniques provide enhancements in security measures to detect and react against cyberattacks. The previously used security systems are no longer sufficient because cybercriminals are smart enough to evade conventional security systems. Conventional security systems lack efficiency in detecting previously unseen and polymorphic security attacks. Machine learning (ML) techniques are playing a vital role in numerous applications of cyber security. However, despite the ongoing success, there are significant challenges in ensuring the trustworthiness of ML systems. There are incentivized malicious adversaries present in the cyberspace that are willing to game and exploit such ML vulnerabilities. This paper aims to provide a comprehensive overview of the challenges that ML techniques face in protecting cyberspace against attacks, by presenting a literature on ML techniques for cyber security including intrusion detection, spam detection, and malware detection on computer networks and mobile networks in the last decade. It also provides brief descriptions of each ML method, frequently used security datasets, essential ML tools, and evaluation metrics to evaluate a classification model. It finally discusses the challenges of using ML techniques in cyber security. This paper provides the latest extensive bibliography and the current trends of ML in cyber security.
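The evaluation metrics referred to above are not listed in the abstract. As a hedged illustration, the sketch below computes three metrics that are standard for binary detectors such as spam or intrusion classifiers: precision, recall, and F1. The function names and the toy labels are assumptions made for the example, not data from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1); a metric is 0.0 when undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels: 1 = attack/spam, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred))
```

Precision and recall are usually reported together in security settings because the classes are imbalanced: a detector that labels everything benign can reach high accuracy while missing every attack.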
TL;DR: This study is original in presenting an important source of research that explains the problems of online social networks and the studies performed in this area, and it serves as a reference work for researchers interested in analyzing online social network data and social network problems.
Abstract: The use of online social networks has made significant progress in recent years as the use of the Internet has become widespread worldwide as the technological infrastructure and the use of technological products evolve. It has become more suitable to reach online social networking sites such as Facebook, Twitter, Instagram and LinkedIn via the internet and web 3.0 technologies. Thus, people have shared their views on many different topics and their emotions with other users more widely on these platforms. This means that a huge amount of data is created on platforms where millions of people connect with each other through social networks. Nevertheless, the development of computational paradigms at high speed and complexity with technological possibilities allows analysis of valuable data by means of social network analysis methods. Our goal for this paper is to present a review of novel and popular online social network analysis problems with related applications and a reference work for researchers interested in analyzing online social network data and social network problems. Unlike other individual studies we have gathered 21 online social network problems and defined them with related studies. Thus, this study is original by presenting an important source of research by explaining the problems of online social network and the studies performed in this area.
TL;DR: This research identified that success factors of any review spam detection method have interdependencies and for the successful implementation of the spam review detection model and to achieve better accuracy, these factors are required to be considered in accordance with each other.
Abstract: Online reviews about the purchase of products or services provided have become the main source of users’ opinions. In order to gain profit or fame, usually spam reviews are written to promote or demote a few target products or services. This practice is known as review spamming. In the past few years, a variety of methods have been suggested in order to solve the issue of spam reviews. In this study, the researchers carry out a comprehensive review of existing studies on spam review detection using the Systematic Literature Review (SLR) approach. Overall, 76 existing studies are reviewed and analyzed. The researchers evaluated the studies based on how features are extracted from review datasets and different methods and techniques that are employed to solve the review spam detection problem. Moreover, this study analyzes different metrics that are used for the evaluation of the review spam detection methods. This literature review identified two major feature extraction techniques and two different approaches to review spam detection. In addition, this study has identified different performance metrics that are commonly used to evaluate the accuracy of the review spam detection models. Lastly, this work presents an overall discussion about different feature extraction approaches from review datasets, the proposed taxonomy of spam review detection approaches, evaluation measures, and publicly available review datasets. Research gaps and future directions in the domain of spam review detection are also presented. This research identified that success factors of any review spam detection method have interdependencies. The feature’s extraction depends upon the review dataset, and the accuracy of review spam detection methods is dependent upon the selection of the feature engineering approach. 
Therefore, for the successful implementation of the spam review detection model and to achieve better accuracy, these factors are required to be considered in accordance with each other. To the best of the researchers’ knowledge, this is the first comprehensive review of existing studies in the domain of spam review detection using SLR process.
TL;DR: A brief introduction to social spam, the spamming process, and social spam taxonomy is provided, and several dimensionality reduction techniques used for feature selection/extraction, features used, various machine learning and deep learning techniques used for social spam and spammer detection, and their merits and demerits are explored.
Abstract: Online Social Networks are perpetually evolving and used in plenteous applications such as content sharing, chatting, making friends/followers, customer engagements, commercials, product reviews/promotions, online games, and news, etc. The substantial issues related to the colossal flood of social spam in social media are polarizing sentiments, impacting users’ online interaction time, degrading available information quality, network bandwidth, computing power, and speed. Simultaneously, groups of coordinated automated accounts/bots often use social networking sites to spread spam, rumors, bogus reviews, and fake news for targeted users or mass communication. The latest developments in the form of artificial intelligence-enabled Deepfakes have exacerbated these issues at large. Consequently, it becomes extremely relevant to review recent work concerning social spam and spammer detection to counter this issue and its effect. This paper provides a brief introduction to social spam, the spamming process, and social spam taxonomy. The comprehensive review entails several dimensionality reduction techniques used for feature selection/extraction, features used, various machine learning and deep learning techniques used for social spam and spammer detection, and their merits and demerits. Artificial intelligence and deep learning empowered Deepfake (text, image, and video) spam, and their countermeasures are also explored. Furthermore, meticulous discussions, existing challenges, and emerging issues such as robustness of detection systems, scalability, real-time datasets, evade strategies used by spammers, coordinated inauthentic behavior, and adversarial attacks on machine learning-based spam detectors, etc., have been discussed with possible directions for future research.