Book Chapter, DOI: 10.1007/978-981-10-3874-7_61

A Classification Model to Analyze the Spread and Emerging Trends of the Zika Virus in Twitter

01 Jan 2017, pp. 643-650
Abstract: Zika virus disease caused a major epidemic in 2015–16 and continues to be a global health issue. The recent trend of sharing critical information on social networks such as Twitter motivated us to propose a classification model that classifies tweets related to Zika and thus enables us to extract helpful insights into the community. In this paper, we explain the process of data collection from Twitter, the preprocessing of the data, and the building of a model to fit the data; we compare the accuracy of support vector machines and the Naive Bayes algorithm for text classification and state the reason for the superiority of support vector machines over Naive Bayes. Useful analytical tools such as word clouds are also presented in this research work to provide a more sophisticated method to retrieve community support from social networks such as Twitter.

Topics: Naive Bayes classifier (55%), Zika virus (51%)

Book Chapter, DOI: 10.1108/978-1-83909-099-820201010
30 Sep 2020
Abstract: The healthcare sector today produces enormous data lakes at an exponentially growing rate. Information from diverse sources such as electronic health records, clinical data, streaming data from sensors, biomedical images, biomedical signals, and lab data is both voluminous and complex in its varying formats, which has strained the abilities of conventional database systems in terms of scalability, storage of unstructured data, concurrency, and cost. Big data solutions step into the picture by harnessing these huge, diverse, and multipart data sets to extract more meaningful and informed patterns. The integration of multimodal data, which requires extracting the relationships among unstructured data types, is a hot topic these days. Big data helps in deriving insights from these vast volumes of data and addresses the issues of volume, velocity, and variety that are inherent in healthcare data. This work presents a survey of the literature on big data solutions in the healthcare sector, the potential changes, the challenges, and the available platforms and methodologies for implementing big data analytics in the healthcare sector. The work divides big healthcare data (BHD) applications into five broad categories, followed by an extensive review of each, and also offers some practical real-life applications of BHD solutions.

Topics: Big data (60%), Unstructured data (58%), Analytics (56%)

2 Citations

Journal Article, DOI: 10.1007/S13278-020-00707-X
Ariel Rodriguez, Koji Okamura
Abstract: In this research, we aim to expand the utility of keyword filtering on text-based data in the domain of cyber threat intelligence. Existing research-based cyber threat intelligence systems and production systems often utilize keyword filtering as a method to obtain training data for a classification model or as a classifier in itself. This method is known to have concerns with false-positives that affect data quality and thus can produce downstream issues for security analysts that utilize these types of systems. We propose a method to classify open-source intelligence data into a cybersecurity-related information stream and subsequently increase the quality of that stream using an unsupervised clustering method. Our method expands on keyword filtering techniques by introducing a word2vec generated associated words list which assists in the classification of ambiguous posts to reduce false-positives while still retrieving large scope data. We then use k-means clustering on positively classified entries to identify and remove clusters that are not relevant to threats. We further explore this method by investigating the effects of using segmentation based on data characteristics to achieve better classification. Together these methods are able to create a higher quality cyber threat-related data stream that can be applied to existing text-based threat intelligence systems that use keyword filtering methods.

Topics: Data quality (56%), Cluster analysis (54%), Data stream (51%)

1 Citation

Book Chapter, DOI: 10.1007/978-981-10-8476-8_11
B. K. Tripathy
01 Jan 2018
Abstract: Technology plays a major role in all spheres of life, and higher education and health care are no exceptions. The use of big data in higher education and health care is relatively new. The dynamics of higher education are passing through a phase of rapid change. The amount of data available in this field, together with proper analytics, can yield benefits and highlight future techniques for handling the complex situations arising from pressure exerted by accrediting agencies, governments and other stakeholders. Higher education is becoming more and more complex, with several institutes entering the market with increasingly diversified approaches. This forces all institutes of higher education to revise their approaches frequently to cope with this pressure. The educational institutes have to ensure that the quality of learning programmes is on par with that of their counterparts at the national and global level. That the vast data sources generated in this connection are more often than not unavailable for analysis is a major concern. The analysis of these volumes of data plays a major role in understanding and ensuring that institutions are aware of the changes occurring everywhere and that they are taking care of their social responsibilities. Due to the digitization of medical records over the past ten to fifteen years, in an attempt to make them available for research and development, healthcare stakeholders have collected a huge amount of data which, besides being voluminous, is complex, diverse and temporal. An analysis of these data could collectively help the healthcare industry to identify problems related to variability in healthcare quality and escalating healthcare expenditure. In this chapter we make a critical analysis of these aspects of higher education and healthcare with respect to big data analysis and make some recommendations in this direction.

Topics: Higher education (55%), Health care (54%), Big data (52%)

Open access, Book
Nello Cristianini, John Shawe-Taylor
01 Jan 2000
Abstract: From the publisher: This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory. SVMs deliver state-of-the-art performance in real-world applications such as text categorisation, hand-written character recognition, image classification, biosequences analysis, etc., and are now established as one of the standard tools for machine learning and data mining. Students will find the book both stimulating and accessible, while practitioners will be guided smoothly through the material required for a good grasp of the theory and its applications. The concepts are introduced gradually in accessible and self-contained stages, while the presentation is rigorous and thorough. Pointers to relevant literature and web sites containing software ensure that it forms an ideal starting point for further study. Equally, the book and its associated web site will guide practitioners to updated literature, new applications, and on-line software.

13,269 Citations

Open access, Book Chapter, DOI: 10.1007/BFB0026683
Thorsten Joachims
21 Apr 1998
Abstract: This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best performing methods and behave robustly over a variety of different learning tasks. Furthermore they are fully automatic, eliminating the need for manual parameter tuning.

Topics: Support vector machine (55%), Categorization (50%)

8,287 Citations

Open access, Journal Article, DOI: 10.1145/505282.505283
Abstract: The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.

Topics: Multi-task learning (61%), Categorization (58%), Classifier (UML) (56%)

7,232 Citations

Proceedings Article, DOI: 10.1145/1772690.1772777
26 Apr 2010
Abstract: Twitter, a popular microblogging service, has received much attention recently. An important characteristic of Twitter is its real-time nature. For example, when an earthquake occurs, people make many Twitter posts (tweets) related to the earthquake, which enables detection of earthquake occurrence promptly, simply by observing the tweets. As described in this paper, we investigate the real-time interaction of events such as earthquakes in Twitter and propose an algorithm to monitor tweets and to detect a target event. To detect a target event, we devise a classifier of tweets based on features such as the keywords in a tweet, the number of words, and their context. Subsequently, we produce a probabilistic spatiotemporal model for the target event that can find the center and the trajectory of the event location. We consider each Twitter user as a sensor and apply Kalman filtering and particle filtering, which are widely used for location estimation in ubiquitous/pervasive computing. The particle filter works better than other comparable methods for estimating the centers of earthquakes and the trajectories of typhoons. As an application, we construct an earthquake reporting system in Japan. Because of the numerous earthquakes and the large number of Twitter users throughout the country, we can detect an earthquake with high probability (96% of earthquakes of Japan Meteorological Agency (JMA) seismic intensity scale 3 or more are detected) merely by monitoring tweets. Our system detects earthquakes promptly and sends e-mails to registered users. Notification is delivered much faster than the announcements that are broadcast by the JMA.

3,811 Citations

Open access, Proceedings Article
01 May 2010
Abstract: Microblogging today has become a very popular communication tool among Internet users. Millions of users share opinions on different aspects of life every day. Therefore microblogging web-sites are rich sources of data for opinion mining and sentiment analysis. Because microblogging has appeared relatively recently, there are a few research works that were devoted to this topic. In our paper, we focus on using Twitter, the most popular microblogging platform, for the task of sentiment analysis. We show how to automatically collect a corpus for sentiment analysis and opinion mining purposes. We perform linguistic analysis of the collected corpus and explain discovered phenomena. Using the corpus, we build a sentiment classifier that is able to determine positive, negative and neutral sentiments for a document. Experimental evaluations show that our proposed techniques are efficient and perform better than previously proposed methods. In our research, we worked with English; however, the proposed technique can be used with any other language.

Topics: Sentiment analysis (68%), Microblogging (56%)

2,440 Citations
