Text Mining Methods and Techniques

doi:10.5120/14937-3507

Journal Article•DOI•

Sonali Vijay Gaikwad, Archana Chaugule, Pramod B. Patil

16 Jan 2014-International Journal of Computer Applications (Foundation of Computer Science (FCS))-Vol. 85, Iss: 17, pp 42-45

TL;DR: This survey paper discusses successful text mining techniques and methods for improving the effectiveness of information retrieval, along with the types of situations in which each technology may be useful to users.

Abstract: In recent years the volume of digital data has grown rapidly, and knowledge discovery and data mining have attracted great attention because of the emerging need to turn such data into useful information and knowledge. The information and knowledge extracted from large amounts of data benefit many applications, such as market analysis and business management. In many applications the database stores information in text form, so text mining is one of the most recent areas of research. Extracting the information a user requires is a challenging issue. Text mining is an important step of the knowledge discovery process. It extracts hidden information from unstructured to semi-structured data, automatically extracting information from different written resources in order to discover new, previously unknown information. This survey paper tries to cover the text mining techniques and methods that address these challenges. We discuss successful techniques and methods for improving the effectiveness of information retrieval in text mining, as well as the types of situations in which each technology may be useful to users.

Citations

Journal Article•DOI•

A survey of text mining in social media: Facebook and Twitter perspectives

Said A. Salloum, Mostafa Al-Emran, Azza Abdel Monem, Khaled Shaalan

01 Jan 2017-Advances in Science, Technology and Engineering Systems Journal

TL;DR: This survey analyzes text mining studies related to Facebook and Twitter, the two dominant social media platforms in the world, and describes how social media studies have used text analytics and text mining techniques to identify the key themes in the data.

Abstract: Text mining has become one of the trendy fields that has been incorporated in several research fields such as computational linguistics, Information Retrieval (IR) and data mining. Natural Language Processing (NLP) techniques were used to extract knowledge from the textual text that is written by human beings. Text mining reads an unstructured form of data to provide meaningful information patterns in the shortest time period. Social networking sites are a great source of communication, as most people in today’s world use these sites in their daily lives to keep connected to each other. It has become a common practice not to write sentences with correct grammar and spelling. This practice may lead to different kinds of ambiguities, such as lexical, syntactic, and semantic ones, and due to this type of unclear data, it is hard to find out the actual data order. Accordingly, we are conducting an investigation with the aim of looking for different text mining methods to get various textual orders on social media websites. This survey aims to describe how studies in social media have used text analytics and text mining techniques for the purpose of identifying the key themes in the data. This survey focused on analyzing the text mining studies related to Facebook and Twitter, the two dominant social media in the world. Results of this survey can serve as the baselines for future text mining research.

158 citations

Cites background from "Text Mining Methods and Techniques"

...A management information system is capable of incorporating the resulting information, and as a result, significant knowledge is produced for the user of that information system [57]....

Book Chapter•DOI•

Using Text Mining Techniques for Extracting Information from Research Articles

Said A. Salloum¹, Mostafa Al-Emran²,³, Azza Abdel Monem⁴, Khaled Shaalan¹

British University in Dubai¹, Universiti Malaysia Pahang², AL Buraimi University College³, Ain Shams University⁴

01 Jan 2018

TL;DR: A comprehensive overview about text mining and its current research status is demonstrated and experimental results indicated that Springer database represents the main source for research articles in the field of mobile education for the medical domain.

Abstract: Nowadays, research in text mining has become one of the widespread fields in analyzing natural language documents. The present study demonstrates a comprehensive overview about text mining and its current research status. As indicated in the literature, there is a limitation in addressing Information Extraction from research articles using Data Mining techniques. The synergy between them helps to discover different interesting text patterns in the retrieved articles. In our study, we collected, and textually analyzed through various text mining techniques, three hundred refereed journal articles in the field of mobile learning from six scientific databases, namely: Springer, Wiley, Science Direct, SAGE, IEEE, and Cambridge. The selection of the collected articles was based on the criteria that all these articles should incorporate mobile learning as the main component in the higher educational context. Experimental results indicated that Springer database represents the main source for research articles in the field of mobile education for the medical domain. Moreover, results where the similarity among topics could not be detected were due to either their interrelations or ambiguity in their meaning. Furthermore, findings showed that there was a booming increase in the number of published articles during the years 2015 through 2016. In addition, other implications and future perspectives are presented in the study.

125 citations

Journal Article•DOI•

Investigating Key Attributes in Experience and Satisfaction of Hotel Customer Using Online Review Data

Hyun Jeong Ban, Ha-Yeon Choi, Eun-Kyong (Cindy) Choi, Sanghyeop Lee, Hak-Seon Kim

21 Nov 2019-Sustainability

TL;DR: In this paper, the authors investigated what are the key attributes and the structural relationship of those key attributes in hotel reviews and applied semantic network analysis, factor analysis and regression analysis to understand the experience and satisfaction of the hotel customer.

Abstract: With the development of social media, customers are sharing their experiences, and it is rapidly spreading as a form of online review. That is why the online review has become a significant information source affecting customers’ purchase intention and behavior. Therefore, it is important to understand the customer’s experience shown in the online review in order to maintain sustainable customer satisfaction and loyalty. The purpose of this study is to investigate what are the key attributes and the structural relationship of those key attributes. To accomplish this purpose, a total of 6596 hotel reviews were collected from Google (google.com). A frequency analysis using text mining was performed to figure out the most frequently mentioned attributes. In addition, semantic network analysis, factor analysis, and regression analysis were applied to understand the experience and satisfaction of the hotel customer. As a result, the top 99 keywords were divided into four groups such as “Intangible Service”, “Physical Environment”, “Purpose”, and “Location”. The factor analysis reduced the dimension of the original 64 keywords to 22 keywords, and grouped them into five factors, which are “Access”, “F&B (Food and Beverage)”, “Purpose”, “Tangibles”, and “Empathy”. Based on these results, theoretical and practical implications for sustainable hotel marketing strategies are suggested.
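
The frequency-analysis step mentioned in this abstract can be illustrated with a minimal sketch. The sample reviews, the stop-word list, and the function name `top_keywords` are assumptions made purely for illustration; they are not data or tooling from the study.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "to", "of", "in", "very", "but"}


def top_keywords(reviews, n):
    """Return the n most frequent non-stop-word tokens across all reviews."""
    counts = Counter()
    for review in reviews:
        # Lowercase the text and keep alphabetic runs as tokens.
        tokens = re.findall(r"[a-z]+", review.lower())
        counts.update(tok for tok in tokens if tok not in STOP_WORDS)
    return counts.most_common(n)


# Made-up example reviews, not data from the paper.
sample_reviews = [
    "The staff was friendly and the location is great",
    "Great location but the room was small",
    "Friendly staff and a clean room",
]
print(top_keywords(sample_reviews, 4))
```

Keywords ranked this way could then feed the later grouping and factor-analysis steps the abstract describes, which this sketch does not attempt.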

50 citations

Journal Article•DOI•

A critical review of text-based research in construction: Data source, analysis method, and implications

Seungwon Baek¹, Wooyong Jung², Seung Heon Han¹

Yonsei University¹, Korea Electric Power Corporation²

01 Dec 2021-Automation in Construction

TL;DR: A comprehensive review of text analytics finds that the ontology- and rule-based approach has been dominant, while recent research has attempted to apply state-of-the-art machine learning methods.

30 citations

Journal Article•DOI•

LSA & LDA topic modeling classification: comparison study on e-books

Shaymaa H. Mohammed¹, Salam Al-augby¹

University of Kufa¹

01 Jul 2020-Indonesian Journal of Electrical Engineering and Computer Science

TL;DR: This paper presents a comparison study on classifying scientific unstructured text documents (e-books) based on their full text, applying the most popular topic modeling approaches (LDA, LSA) to cluster the words into a set of topics that serve as important keywords for classification.

Abstract: With the rapid growth of information technology, the amount of unstructured text data in digital libraries has rapidly increased, and analyzing, organizing, and automatically classifying the text in e-research repositories so as to benefit from it has become a big challenge. Manual categorization of text documents requires substantial financial and human resources. Topic modeling is therefore used to classify documents. This paper presents a comparison study on classifying scientific unstructured text documents (e-books) based on their full text, applying the most popular topic modeling approaches (LDA, LSA) to cluster the words into a set of topics that serve as important keywords for classification. Our dataset consists of 300 books containing about 23 million words of full text. In the topic models used (LSA, LDA), each word in the corpus vocabulary is connected with one or more topics with a probability, as estimated by the model. Many LDA and LSA models were built, yielding different coherence values, and the one with the highest coherence value was picked. The results showed that LDA performs better than LSA: the best result obtained with LDA was a coherence value of 0.592179 when the number of topics was 20, while the LSA coherence value was 0.5773026 when the number of topics was 10.

30 citations

Additional excerpts

...The text exploration process includes many functions such as (cleaning up unstructured data to be available for text analytics, text classification, text clustering, keyword extraction, document summarization, and entity relationship model in [22]....

References

Journal Article•DOI•

Term Weighting Approaches in Automatic Text Retrieval

Gerard Salton¹, Chris Buckley¹

Cornell University¹

01 Aug 1988-Information Processing and Management

TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.

Abstract: The experimental evidence accumulated over the past 20 years indicates that text indexing systems based on the assignment of appropriately weighted single terms produce retrieval results that are superior to those obtainable with other more elaborate text representations. These results depend crucially on the choice of effective term weighting systems. This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.
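
As a rough illustration of single-term weighting, the sketch below computes one common scheme: term frequency multiplied by inverse document frequency, followed by cosine normalization. It is only one of many possible variants. The whitespace tokenizer, the function name `tfidf_vectors`, and the sample documents are assumptions for illustration and do not reproduce the paper's experimental setup.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, cosine-normalized."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Raw term frequency times log inverse document frequency.
        weights = {
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Cosine normalization so that long documents are not favored.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {term: w / norm for term, w in weights.items()}
        vectors.append(weights)
    return vectors


# Made-up documents for illustration only.
vectors = tfidf_vectors(["text mining text", "data mining"])
print(vectors[0])
```

In this toy collection the term "mining" occurs in every document, so its inverse document frequency is log(1) = 0 and it contributes nothing to either vector, which is the intended effect of the weighting.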

9,460 citations

"Text Mining Methods and Techniques" refers methods in this paper

...Term based methods suffer from the problems of polysemy and synonymy[1]....

Proceedings Article•

Text Classification using String Kernels

Huma Lodhi¹, John Shawe-Taylor¹, Nello Cristianini¹, Chris Watkins¹

Royal Holloway, University of London¹

01 Jan 2000

TL;DR: In this article, an inner product in the feature space consisting of all subsequences of length k was introduced for comparing two text documents, where a subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously.

Abstract: We introduce a novel kernel for comparing two text documents. The kernel is an inner product in the feature space consisting of all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences which are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how despite this fact the inner product can be efficiently evaluated by a dynamic programming technique. A preliminary experimental comparison of the performance of the kernel compared with a standard word feature space kernel [6] is made showing encouraging results.

1,464 citations

Journal Article•DOI•

Text classification using string kernels

Huma Lodhi¹, Craig Saunders¹, John Shawe-Taylor¹, Nello Cristianini¹, Chris Watkins¹

Royal Holloway, University of London¹

01 Mar 2002-Journal of Machine Learning Research

TL;DR: A novel kernel is introduced for comparing two text documents consisting of an inner product in the feature space consisting of all subsequences of length k, which can be efficiently evaluated by a dynamic programming technique.

Abstract: We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences that are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how despite this fact the inner product can be efficiently evaluated by a dynamic programming technique. Experimental comparisons of the performance of the kernel compared with a standard word feature space kernel (Joachims, 1998) show positive results on modestly sized datasets. The case of contiguous subsequences is also considered for comparison with the subsequences kernel with different decay factors. For larger documents and datasets the paper introduces an approximation technique that is shown to deliver good approximations efficiently for large datasets.
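
The kernel described in this abstract can be sketched in two ways: a direct enumeration that follows the definition (feasible only for short strings) and a memoized recursion in the spirit of the dynamic programming evaluation the abstract mentions. Both are minimal, unnormalized sketches under stated assumptions: the function names, the parameter name `decay`, and the test strings are illustrative choices, and the code is not the authors' implementation or their approximation technique.

```python
from functools import lru_cache
from itertools import combinations


def ssk_naive(s, t, k, decay):
    """Enumerate all length-k subsequences of s and t (exponential cost)."""
    total = 0.0
    for i in combinations(range(len(s)), k):
        u = [s[p] for p in i]
        span_s = i[-1] - i[0] + 1  # full length of the occurrence in s
        for j in combinations(range(len(t)), k):
            if u == [t[q] for q in j]:
                span_t = j[-1] - j[0] + 1  # full length of the occurrence in t
                total += decay ** (span_s + span_t)
    return total


def ssk_dp(s, t, k, decay):
    """Same value via a memoized recursion over prefixes s[:m] and t[:n]."""

    @lru_cache(maxsize=None)
    def k_prime(i, m, n):
        # Auxiliary kernel: weights length-i matches by distance to the prefix ends.
        if i == 0:
            return 1.0
        if min(m, n) < i:
            return 0.0
        x = s[m - 1]
        total = decay * k_prime(i, m - 1, n)
        for j in range(n):
            if t[j] == x:
                total += k_prime(i - 1, m - 1, j) * decay ** (n - j + 1)
        return total

    @lru_cache(maxsize=None)
    def k_full(m, n):
        if min(m, n) < k:
            return 0.0
        x = s[m - 1]
        total = k_full(m - 1, n)
        for j in range(n):
            if t[j] == x:
                total += k_prime(k - 1, m - 1, j) * decay ** 2
        return total

    return k_full(len(s), len(t))


# "cat" and "car" share only the subsequence "ca" (length 2 in both strings),
# so the kernel value for k = 2 is decay ** 4.
print(ssk_naive("cat", "car", 2, 0.5), ssk_dp("cat", "car", 2, 0.5))
```

A smaller `decay` penalizes gappy occurrences more strongly, which matches the abstract's point that occurrences close to contiguous are emphasized. In practice the value is usually normalized by the square root of the product of the two self-kernels, a step omitted here for brevity.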

1,281 citations

"Text Mining Methods and Techniques" refers methods in this paper

...The typical text categorization process consists of pre-processing, indexing, dimensionally reduction, and classification[3][4]....

Journal Article•DOI•

Automatic text categorization and its application to text retrieval

Wai Lam¹, Miguel E. Ruiz, Padmini Srinivasan

The Chinese University of Hong Kong¹

01 Nov 1999-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work develops an automatic text categorization approach and investigates its application to text retrieval, showing that the approach is effective and that retrieval using automatic categorization achieves the same quality as retrieval using manual categorization.

Abstract: We develop an automatic text categorization approach and investigate its application to text retrieval. The categorization approach is derived from a combination of a learning paradigm known as instance-based learning and an advanced document retrieval technique known as retrieval feedback. We demonstrate the effectiveness of our categorization approach using two real-world document collections from the MEDLINE database. Next, we investigate the application of automatic categorization to text retrieval. Our experiments clearly indicate that automatic categorization improves the retrieval performance compared with no categorization. We also demonstrate that the retrieval performance using automatic categorization achieves the same retrieval quality as the performance using manual categorization. Furthermore, detailed analysis of the retrieval performance on each individual test query is provided.

177 citations

"Text Mining Methods and Techniques" refers methods in this paper

...The typical text categorization process consists of pre-processing, indexing, dimensionally reduction, and classification[3][4]....

Proceedings Article•DOI•

Deploying Approaches for Pattern Refinement in Text Mining

Sheng-Tang Wu¹, Yuefeng Li¹, Yue Xu¹

Queensland University of Technology¹

18 Dec 2006

TL;DR: The performance of the pattern deploying algorithms for text mining is investigated on the Reuters dataset RCV1, and the results show that effectiveness is improved by using the proposed pattern refinement approaches.

Abstract: Text mining is the technique that helps users find useful information from a large amount of digital text documents on the Web or databases. Instead of the keyword-based approach which is typically used in this field, the pattern-based model containing frequent sequential patterns is employed to perform the same concept of tasks. However, how to effectively use these discovered patterns is still a big challenge. In this study, we propose two approaches based on the use of pattern deploying strategies. The performance of the pattern deploying algorithms for text mining is investigated on the Reuters dataset RCV1 and the results show that the effectiveness is improved by using our proposed pattern refinement approaches.

131 citations