A survey on scholarly data
Citations
Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases
A survey towards an integration of big data analytics to big insights for value-creation
Academic social networks: Modeling, analysis, mining and applications
Big data adoption: State of the art and research challenges
Exploring the Online Doctor-Patient Interaction on Patient Satisfaction Based on Text Mining and Empirical Analysis
References
An index to quantify an individual's scientific research output
Co-citation in the scientific literature: A new measure of the relationship between two documents
Data-intensive applications, challenges, techniques and technologies: A survey on Big Data
Recommender Systems Handbook
Big data analytics in healthcare: promise and potential
Frequently Asked Questions (15)
Q2. What future work do the authors mention in the paper "Cloud-based big data management and analytics for scholarly resources: current trends, challenges and scope for future research"?
This survey examines current trends and existing challenges in the different subsystems of the big scholarly data platform, with a specific focus on directions for future research. Suggested future work includes the development of solutions and APIs. Much of it concerns the creation of expressive languages that let users define their problem to the system, while keeping in mind that the system's operational efficiency must continue to improve as data volumes grow. Although CiteSeerX is one of the most popular scholarly platforms, its services are rather limited in functionality; they can be extended to cover scholarly applications such as research management, and optimized to provide added functionality such as algorithm linking, time-evolution of research, and recommendations.
Q3. What are the main categories of analytics for big scholarly data?
Analytics for big scholarly data can be divided into four categories: research management, collaborator discovery, expert finder systems, and recommender systems.
Q4. What can be used to form a comprehensive author profile?
Many other types of information like venues where the author has published or presented work and detailed author information derived from the professional author webpage can be used to form a comprehensive author profile, which can be useful for advanced scholarly analytics like collaborator discovery and expert finding [86][87][88].
Q5. What is the importance of data linking and matching?
Citation linking and matching are important steps in the process because metadata fields that are incomplete or were extracted incorrectly can be corrected and completed using the data provided by the linkage.
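As a rough illustration of how linkage can complete metadata, the sketch below links an extracted record to a reference record by normalized title and fills in fields that are missing. The record layout, the field names, and the helper names (title_key, complete_from_link) are hypothetical and do not come from the paper. The sketch only fills gaps; correcting wrongly extracted values would additionally need a rule for deciding which source to trust.

```python
import re


def title_key(title):
    """Normalize a title for matching: lowercase, collapse punctuation and spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def complete_from_link(extracted, reference_records):
    """Fill empty fields of `extracted` from a reference record with the same title key.

    Assumption: records are plain dicts and an exact normalized-title match
    is good enough to treat two records as the same publication.
    """
    index = {title_key(r["title"]): r for r in reference_records}
    match = index.get(title_key(extracted.get("title", "")))
    merged = dict(extracted)
    if match is None:
        return merged  # no link found, nothing to complete
    for field, value in match.items():
        if not merged.get(field):  # only fill missing or empty fields
            merged[field] = value
    return merged


# Hypothetical records, for illustration only.
EXTRACTED = {"title": "A hypothetical paper title.", "year": None, "authors": []}
REFERENCE = [
    {
        "title": "A Hypothetical Paper Title",
        "year": 2020,
        "authors": ["A. Author"],
        "venue": "Hypothetical Conf",
    }
]

if __name__ == "__main__":
    print(complete_from_link(EXTRACTED, REFERENCE))
```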
Q6. What is the effective and efficient framework for big data analytics?
To implement the techniques mentioned above, MapReduce and Hadoop [20] have been identified as the most effective and efficient frameworks.
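To make the MapReduce programming model concrete, here is a minimal in-memory sketch that counts citations per paper. It does not use Hadoop or any Hadoop API; it only imitates the map, shuffle, and reduce phases in plain Python. The input data and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical input: (citing_paper_id, cited_paper_id) pairs.
CITATIONS = [
    ("p1", "p3"),
    ("p2", "p3"),
    ("p2", "p4"),
    ("p5", "p3"),
]


def map_phase(records):
    """Map: emit (cited_paper, 1) for every citation edge."""
    for _citing, cited in records:
        yield cited, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups):
    """Reduce: sum the values of each key to get a citation count."""
    return {key: sum(values) for key, values in groups.items()}


def citation_counts(records):
    """Run the three phases in sequence on one machine."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(citation_counts(CITATIONS))
```

In a real cluster the map and reduce functions run in parallel on partitions of the data, and the framework performs the shuffle; the logic of each phase stays the same.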
Q7. What are the challenges of storing and processing data?
Storing and processing unstructured data, and doing so in a way that makes it simpler to aggregate and correlate data from different sources, also require research attention.
Q8. What is the role of pseudocode and algorithm sections in computer science research documents?
Computer science research documents contain specific sections, such as pseudocode and algorithms, which play an instrumental role in mapping research growth and evolution.
Q9. What is the proposed approach to extracting concepts in books?
The proposed approach captures the global coherence and local relatedness of a book by extracting the concepts in each chapter and constructing a concept hierarchy.
Q10. What is the concept of data linking?
Considering that data will be collected from heterogeneous sources and may exist in different formats, the concept of data linking can be used.
Q11. What is the main idea behind the structure extraction in books?
Gao et al. [110] reviewed structure extraction in books and proposed that the extraction of the table of contents (ToC) and metadata from books can be seen as a matching problem on a bipartite graph.
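The bipartite-matching view can be sketched as follows: ToC entries form one side of the graph, headings found in the book body form the other, and an edge connects a pair whose text is similar. This is not the method of Gao et al.; it is a generic maximum bipartite matching using augmenting paths, with a string-similarity threshold chosen arbitrarily for illustration. All names and sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """Assumed edge rule: connect two strings whose similarity ratio is high."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def match_toc(toc_entries, body_headings):
    """Maximum bipartite matching between ToC entries and body headings."""
    # edges[i] lists the heading indices that ToC entry i may be matched to.
    edges = [
        [j for j, heading in enumerate(body_headings) if similar(entry, heading)]
        for entry in toc_entries
    ]
    owner = {}  # heading index -> index of the ToC entry currently matched to it

    def try_assign(i, seen):
        """Try to match ToC entry i, re-routing earlier matches if needed."""
        for j in edges[i]:
            if j in seen:
                continue
            seen.add(j)
            if j not in owner or try_assign(owner[j], seen):
                owner[j] = i
                return True
        return False

    for i in range(len(toc_entries)):
        try_assign(i, set())
    return {toc_entries[i]: body_headings[j] for j, i in owner.items()}


# Hypothetical sample data.
TOC = ["Introduction", "Related Work", "Conclusion"]
HEADINGS = ["1 Introduction", "2 Related work", "References", "5 Conclusions"]

if __name__ == "__main__":
    print(match_toc(TOC, HEADINGS))
```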
Q12. What is the main idea behind the use of algorithm co-citation network?
Tuarob et al. [54] also proposed using an algorithm co-citation network to detect similarity at the algorithmic level, which can be further extended to implement algorithm recommendation engines.
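The general co-citation idea behind such a network can be sketched in a few lines: two algorithms are linked whenever the same document cites both, and the link weight is the number of such documents. The code below is an illustration of that idea only, not the implementation of Tuarob et al.; the document-to-algorithm mapping and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the set of algorithms each document cites.
DOC_ALGORITHMS = {
    "doc1": {"PageRank", "HITS"},
    "doc2": {"PageRank", "HITS", "k-means"},
    "doc3": {"k-means", "DBSCAN"},
}


def cocitation_counts(doc_algorithms):
    """Count, for each pair of algorithms, how many documents cite both."""
    counts = Counter()
    for algorithms in doc_algorithms.values():
        # Sort so that each unordered pair always has the same key.
        for a, b in combinations(sorted(algorithms), 2):
            counts[(a, b)] += 1
    return counts


def most_cocited_with(algorithm, counts, top_n=3):
    """Rank the algorithms most often co-cited with `algorithm`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == algorithm:
            scores[b] += n
        elif b == algorithm:
            scores[a] += n
    return scores.most_common(top_n)


if __name__ == "__main__":
    network = cocitation_counts(DOC_ALGORITHMS)
    print(most_cocited_with("PageRank", network))
```

A simple recommendation engine could return the top-ranked neighbours of an algorithm the user is already reading about.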
Q13. What are the challenges in data management?
Challenges in data management can be further divided into five sub-categories: (i) big data characteristics, (ii) data acquisition and integration, (iii) information extraction, (iv) data preprocessing, and (v) data processing and resource management.
Q14. What is the main purpose of the hybrid algorithm?
Tuarob et al. [97] proposed a hybrid algorithm that can identify section boundaries, detect section headers and recognize the hierarchy of sections with good accuracy.
Q15. What is the impact of author disambiguation on the DBLP dataset?
Kim et al. [59] disambiguated the DBLP dataset using these three methods and compared their impact, concluding that author disambiguation can have a substantial influence on data quality and quality of service and analytics performed using the data.