Topic

Data proliferation

About: Data proliferation is a research topic. Over its lifetime, 72 publications have been published within this topic, receiving 996 citations.


Papers
Proceedings Article
Barna Saha, Divesh Srivastava
19 May 2014
TL;DR: This tutorial presents recent results that are relevant to big data quality management, focusing on the two major dimensions of (i) discovering quality issues from the data itself, and (ii) trading off accuracy vs efficiency, and identifies a range of open problems for the community.
Abstract: In our Big Data era, data is being generated, collected and analyzed at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. Recent studies have shown that poor quality data is prevalent in large databases and on the Web. Since poor quality data can have serious consequences on the results of data analyses, the importance of veracity, the fourth ‘V’ of big data is increasingly being recognized. In this tutorial, we highlight the substantial challenges that the first three ‘V’s, volume, velocity and variety, bring to dealing with veracity in big data. Due to the sheer volume and velocity of data, one needs to understand and (possibly) repair erroneous data in a scalable and timely manner. With the variety of data, often from a diversity of sources, data quality rules cannot be specified a priori; one needs to let the “data to speak for itself” in order to discover the semantics of the data. This tutorial presents recent results that are relevant to big data quality management, focusing on the two major dimensions of (i) discovering quality issues from the data itself, and (ii) trading-off accuracy vs efficiency, and identifies a range of open problems for the community.

203 citations

Journal Article
01 Sep 2015
TL;DR: This study of more than 100 currently existing data journals describes the approaches they promote for data set description, availability, citation, quality, and open access and identifies ways to expand and strengthen the data journals approach as a means to promote data set access and exploitation.
Abstract: Data occupy a key role in our information society. However, although the amount of published data continues to grow and terms such as data deluge and big data today characterize numerous research initiatives, much work is still needed in the direction of publishing data in order to make them effectively discoverable, available, and reusable by others. Several barriers hinder data publishing, from lack of attribution and rewards, vague citation practices, and quality issues to a rather general lack of a data-sharing culture. Lately, data journals have overcome some of these barriers. In this study of more than 100 currently existing data journals, we describe the approaches they promote for data set description, availability, citation, quality, and open access. We close by identifying ways to expand and strengthen the data journals approach as a means to promote data set access and exploitation.

117 citations

Journal Article
TL;DR: Ethical issues pertaining to secondary data analysis, which refers to the use of existing research data to find an answer to a question that was different from the original work, have become more pressing with the advent of new technologies.
Abstract: Background: Research does not always involve collection of data from the participants. There is a huge amount of data that is being collected through routine management information systems and other surveys or research activities. The existing data can be analyzed to generate new hypotheses or answer critical research questions. This saves a lot of time, money and other resources. Also, data from large sample surveys may be of higher quality and representative of the population. It avoids repetition of research and wastage of resources through detailed exploration of existing research data, and also ensures that sensitive topics or hard-to-reach populations are not over-researched (1). However, there are certain ethical issues pertaining to secondary data analysis which should be taken care of before handling such data. Secondary data analysis: Secondary analysis refers to the use of existing research data to find an answer to a question that was different from the original work (2). Secondary data can be large-scale surveys or data collected as part of personal research. Although there is general agreement about sharing the results of large-scale surveys, little agreement exists about the second. While the fundamental ethical issues related to secondary use of research data remain the same, they have become more pressing with the advent of new technologies. Data sharing, compiling and storage have become much faster and easier. At the same time, there are fresh concerns about data confidentiality and security. Issues in secondary data analysis: Concerns about secondary use of data mostly revolve around potential harm to individual subjects and the issue of return for consent. Secondary data vary in terms of the amount of identifying information in it.
If the data has no identifying information, or is appropriately coded so that the researcher does not have access to the codes, then it does not require a full review by the ethical board. The board just needs to confirm that the data is actually anonymous. However, if the data contains identifying information on participants or information that could be linked to identify participants, a complete review of the proposal will then be made by the board. The researcher will then have to explain why it is unavoidable to have identifying information to answer the research question and must also indicate how participants' privacy and the confidentiality of the data will be protected. If the above concerns are satisfactorily addressed, the researcher can then request a waiver of consent. If the data is freely available on the Internet, in books or in another public forum, permission for further use and analysis is implied. However, the ownership of the original data must be acknowledged. If the research is part of another research project and the data is not freely available, except to the original research team, explicit, written permission for the use of the data must be obtained from the research team and included in the application for ethical clearance. However, there are certain other issues pertaining to the data that is procured for secondary analysis. The data obtained should be adequate and relevant but not excessive. In secondary data analysis, the original data was not collected to answer the present research question. Thus the data should be evaluated against certain criteria such as the methodology of data collection, accuracy, period of data collection, purpose for which it was collected and the content of the data. …

89 citations

Journal Article
TL;DR: An overview of data mining systems and some of their applications is given.
Abstract: In the Information Technology era, information plays a vital role in every sphere of human life. It is very important to gather data from different data sources, store and maintain the data, generate information, generate knowledge and disseminate data, information and knowledge to every stakeholder. Due to the vast use of computers and electronic devices and tremendous growth in computing power and storage capacity, there is explosive growth in data collection. Storing the data in a data warehouse enables the entire enterprise to access a reliable current database. Analyzing this vast amount of data and drawing fruitful conclusions and inferences requires special tools called data mining tools. This paper gives an overview of data mining systems and some of their applications.

78 citations

Journal Article
01 Jun 2019
TL;DR: The reviewed areas of big data suggest that good management and manipulation of the large data sets using the techniques and tools of big data can deliver actionable insights that create business values.
Abstract: Big data and business analytics are trends that are positively impacting the business world. Past research shows that data generated in the modern world is huge and growing exponentially. These include structured and unstructured data that flood organizations daily. Unstructured data constitute the majority of the world’s digital data and these include text files, web and social media posts, emails, images, audio, movies, etc. The unstructured data cannot be managed in the traditional relational database management system (RDBMS). Therefore, data proliferation requires a rethinking of techniques for capturing, storing, and processing the data. This is the role big data has come to play. This paper, therefore, is aimed at increasing the attention of organizations and researchers to various applications and benefits of big data technology. The paper reviews and discusses the recent trends, opportunities, and pitfalls of big data and how it has enabled organizations to create successful business strategies and remain competitive, based on available literature. Furthermore, the review presents the various applications of big data and business analytics, data sources generated in these applications and their key characteristics. Finally, the review not only outlines the challenges for successful implementation of big data projects but also highlights the current open research directions of big data analytics that require further consideration. The reviewed areas of big data suggest that good management and manipulation of the large data sets using the techniques and tools of big data can deliver actionable insights that create business values.

72 citations


Network Information
Related Topics (5)
Query optimization
17.6K papers, 474.4K citations
75% related
Query language
17.2K papers, 496.2K citations
75% related
Relational database
21.7K papers, 479K citations
73% related
Semantic Web
26.9K papers, 534.2K citations
71% related
Ontology (information science)
57K papers, 869.1K citations
70% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2021    3
2020    1
2019    2
2018    3
2017    12
2016    8