scispace - formally typeset
Topic

Raw data

About: Raw data is a research topic. Over its lifetime, 4,442 publications have been published within this topic, receiving 94,779 citations. The topic is also known as: primary data.


Papers
Journal Article
TL;DR: Although the general inductive approach is not as strong as some other analytic strategies for theory or model development, it does provide a simple, straightforward approach for deriving findings in the context of focused evaluation questions.
Abstract: A general inductive approach for analysis of qualitative evaluation data is described. The purposes for using an inductive approach are to (a) condense raw textual data into a brief, summary format; (b) establish clear links between the evaluation or research objectives and the summary findings derived from the raw data; and (c) develop a framework of the underlying structure of experiences or processes that are evident in the raw data. The general inductive approach provides an easily used and systematic set of procedures for analyzing qualitative data that can produce reliable and valid findings. Although the general inductive approach is not as strong as some other analytic strategies for theory or model development, it does provide a simple, straightforward approach for deriving findings in the context of focused evaluation questions. Many evaluators are likely to find using a general inductive approach less complicated than using other approaches to qualitative data analysis.

8,199 citations
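The condensing step in purpose (a) above can be sketched in a few lines of Python. This is a minimal illustration under assumptions not taken from the paper: the researcher has already read the raw text and attached a category code to each segment, and the hypothetical helper `summarize_codes` only tallies segments per category and keeps one example quote. The interpretive work of choosing and refining categories stays with the researcher.

```python
from collections import defaultdict


def summarize_codes(coded_segments):
    """Condense (code, text) pairs into {code: (segment_count, first_example)}.

    Hypothetical helper: the codes are assumed to be assigned by the
    researcher while reading the raw text; this only does the bookkeeping.
    """
    groups = defaultdict(list)
    for code, text in coded_segments:
        groups[code].append(text)
    # Order categories by how many segments support them, most frequent first.
    ranked = sorted(groups.items(), key=lambda kv: -len(kv[1]))
    return {code: (len(texts), texts[0]) for code, texts in ranked}


# Hypothetical interview excerpts and codes, for illustration only.
segments = [
    ("workload", "I never have time to finish the reports."),
    ("support", "My supervisor checks in every week."),
    ("workload", "The paperwork doubled this year."),
]
summary = summarize_codes(segments)
for code, (n, example) in summary.items():
    print(f"{code}: {n} segment(s), e.g. {example!r}")
```

Keeping one verbatim quote per category mirrors the common practice of pairing each summary category with a supporting excerpt from the raw data.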

Journal Article
08 Jan 2000-BMJ
TL;DR: Qualitative research produces large amounts of textual data in the form of transcripts and observational fieldnotes, and the systematic and rigorous preparation and analysis of these data is time consuming and labour intensive.
Abstract: This is the second in a series of three articles. Contrary to popular perception, qualitative research can produce vast amounts of data. These may include verbatim notes or transcribed recordings of interviews or focus groups, jotted notes and more detailed “fieldnotes” of observational research, a diary or chronological account, and the researcher's reflective notes made during the research. These data are not necessarily small scale: transcribing a typical single interview takes several hours and can generate 20–40 pages of single-spaced text. Transcripts and notes are the raw data of the research. They provide a descriptive record of the research, but they cannot provide explanations. The researcher has to make sense of the data by sifting and interpreting them.

Summary points:
- Qualitative research produces large amounts of textual data in the form of transcripts and observational fieldnotes.
- The systematic and rigorous preparation and analysis of these data is time consuming and labour intensive.
- Data analysis often takes place alongside data collection to allow questions to be refined and new avenues of inquiry to develop.
- Textual data are typically explored inductively using content analysis to generate categories and explanations; software packages can help with analysis but should not be viewed as short cuts to rigorous and systematic analysis.
- High-quality analysis of qualitative data depends on the skill, vision, and integrity of the researcher; it should not be left to the novice.

In much qualitative research the analytical process begins during data collection, as the data already gathered are analysed and shape the ongoing data collection. This sequential analysis [1] or interim analysis [2] has the advantage of allowing the researcher to go back and refine questions, develop hypotheses, and pursue emerging avenues of inquiry in further depth. Crucially, it also enables the researcher to look for deviant or negative cases; that is, …

7,637 citations

Journal Article
TL;DR: A critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario is provided.
Abstract: With the continuous expansion of data availability in many large-scale, complex, and networked systems, such as surveillance, security, Internet, and finance, it becomes critical to advance the fundamental understanding of knowledge discovery and analysis from raw data to support decision-making processes. Although existing knowledge discovery and data engineering techniques have shown great success in many real-world applications, the problem of learning from imbalanced data (the imbalanced learning problem) is a relatively new challenge that has attracted growing attention from both academia and industry. The imbalanced learning problem is concerned with the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews. Due to the inherent complex characteristics of imbalanced data sets, learning from such data requires new understandings, principles, algorithms, and tools to transform vast amounts of raw data efficiently into information and knowledge representation. In this paper, we provide a comprehensive review of the development of research in learning from imbalanced data. Our focus is to provide a critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario. Furthermore, in order to stimulate future research in this field, we also highlight the major opportunities and challenges, as well as potentially important research directions for learning from imbalanced data.

6,320 citations
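To make the assessment-metric issue concrete, the sketch below compares plain accuracy with recall and the geometric mean (G-mean) of sensitivity and specificity on a hypothetical 95:5 class split. The function names and data are illustrative assumptions, not taken from the review; they show how a classifier that always predicts the majority class can score well on accuracy while failing the minority class entirely.

```python
import math


def confusion(y_true, y_pred, positive=1):
    """Return (tp, fn, tn, fp) counts, treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp, fn, tn, fp


def imbalance_report(y_true, y_pred):
    tp, fn, tn, fp = confusion(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity on the minority class
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true-negative rate on the majority class
    g_mean = math.sqrt(recall * specificity)           # drops to 0 if either class is ignored
    return {"accuracy": accuracy, "recall": recall, "g_mean": g_mean}


# Hypothetical labels: 95 majority-class (0) and 5 minority-class (1) examples.
y_true = [0] * 95 + [1] * 5

always_majority = [0] * 100  # never predicts the minority class
print(imbalance_report(y_true, always_majority))
# {'accuracy': 0.95, 'recall': 0.0, 'g_mean': 0.0}

# Hypothetical alternative: finds 4 of the 5 minority cases at the cost of 5 false alarms.
better = [0] * 90 + [1] * 5 + [1] * 4 + [0]
print(imbalance_report(y_true, better))
```

The second classifier has slightly lower accuracy (0.94) but a much higher G-mean, which is the kind of ranking reversal that motivates skew-aware metrics.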

Book
31 Jul 1998
TL;DR: Feature Selection for Knowledge Discovery and Data Mining offers an overview of the methods developed since the 1970s and provides a general framework in order to examine these methods and categorize them and suggests guidelines for how to use different methods under various circumstances.
Abstract: From the Publisher: With advanced computer technologies and their omnipresent usage, data accumulates at a speed that outstrips human capacity to process it. To meet this growing challenge, the research community of knowledge discovery from databases emerged. The key issue studied by this community is, in layman's terms, to make advantageous use of large stores of data. In order to make raw data useful, it is necessary to represent, process, and extract knowledge for various applications. Feature Selection for Knowledge Discovery and Data Mining offers an overview of the methods developed since the 1970s and provides a general framework in order to examine these methods and categorize them. This book employs simple examples to show the essence of representative feature selection methods and compares them using data sets with combinations of intrinsic properties according to the objective of feature selection. In addition, the book suggests guidelines for how to use different methods under various circumstances and points out new challenges in this exciting area of research. Feature Selection for Knowledge Discovery and Data Mining is intended to be used by researchers in machine learning, data mining, knowledge discovery, and databases as a toolbox of relevant tools that help in solving large real-world problems. This book is also intended to serve as a reference book or secondary text for courses on machine learning, data mining, and databases.

1,867 citations
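Feature selection methods are commonly grouped into filter, wrapper, and embedded approaches. As one illustration of a filter method (a generic example, not a method attributed to the book), the sketch below scores each feature independently by the absolute Pearson correlation with the target and keeps the top k. The helper names and toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information about the target
    return cov / (sx * sy)


def select_top_k(columns, target, k):
    """Filter-style selection: score each feature on its own, keep the k best."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 'a' tracks the target, 'b' is noise, 'c' is constant.
columns = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 1.0, 2.0, 1.0, 2.0],
    "c": [7.0, 7.0, 7.0, 7.0, 7.0],
}
target = [1.1, 1.9, 3.2, 3.9, 5.0]
selected, scores = select_top_k(columns, target, k=1)
print(selected)  # ['a']
```

Because a filter scores features without training a model, it is cheap on large raw data sets, but it can miss features that are only useful in combination; wrapper methods trade extra computation for that interaction awareness.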

Journal Article
TL;DR: This study explores how Deep Learning can be utilized for addressing some important problems in Big Data Analytics, including extracting complex patterns from massive volumes of data, semantic indexing, data tagging, fast information retrieval, and simplifying discriminative tasks.
Abstract: Big Data Analytics and Deep Learning are two high-focus areas of data science. Big Data has become important as many organizations, both public and private, have been collecting massive amounts of domain-specific information, which can contain useful information about problems such as national intelligence, cyber security, fraud detection, marketing, and medical informatics. Companies such as Google and Microsoft are analyzing large volumes of data for business analysis and decisions, impacting existing and future technology. Deep Learning algorithms extract high-level, complex abstractions as data representations through a hierarchical learning process. Complex abstractions are learnt at a given level based on relatively simpler abstractions formulated in the preceding level in the hierarchy. A key benefit of Deep Learning is the analysis and learning of massive amounts of unsupervised data, making it a valuable tool for Big Data Analytics where raw data is largely unlabeled and uncategorized. In the present study, we explore how Deep Learning can be utilized for addressing some important problems in Big Data Analytics, including extracting complex patterns from massive volumes of data, semantic indexing, data tagging, fast information retrieval, and simplifying discriminative tasks. We also investigate some aspects of Deep Learning research that need further exploration to incorporate specific challenges introduced by Big Data Analytics, including streaming data, high-dimensional data, scalability of models, and distributed computing. We conclude by presenting insights into relevant future works by posing some questions, including defining data sampling criteria, domain adaptation modeling, defining criteria for obtaining useful data abstractions, improving semantic indexing, semi-supervised learning, and active learning.

1,827 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations (88% related)
Software: 130.5K papers, 2M citations (87% related)
Deep learning: 79.8K papers, 2.1M citations (82% related)
The Internet: 213.2K papers, 3.8M citations (82% related)
Cloud computing: 156.4K papers, 1.9M citations (82% related)
Performance Metrics
Number of papers in the topic in previous years
Year    Papers
2023    516
2022    1,147
2021    297
2020    331
2019    350
2018    284