A comparison of research data management platforms: architecture, flexible metadata and interoperability
Summary
1 Introduction
- The number of published scholarly papers is steadily increasing, and there is a growing awareness of the importance, diversity and complexity of data generated in research contexts [25].
- Implementation costs, architecture, interoperability, content dissemination capabilities, implemented search features and community acceptance are also taken into consideration.
- This evaluation considers aspects relevant to the authors’ ongoing work, focused on finding solutions to research data management, and takes into consideration their past experience in this field [33].
- Moreover, the authors look at staging platforms, which are especially tailored to capture metadata records as they are produced, offering researchers an integrated environment for their management along with the data.
- As datasets become organised and described, their value and their potential for reuse will prompt further preservation actions.
3 Scope of the analysis
- The stakeholders in the data management workflow can greatly influence whether research data is reused.
- The selection of platforms in the analysis acknowledges their role, as well as the importance of the adoption of community standards to help with data description and management in the long run.
- On the other hand, such solutions are usually harder to install and maintain for institutions in the so-called long tail of science: institutions that create large numbers of small datasets but do not possess the financial resources and preservation expertise needed to support a complete preservation workflow [18].
- The Fedora framework is used by some institutions, and is also under active development, with the recent release of Fedora 4.
- The former includes aspects such as how they are deployed into a production environment, the locations where they keep their data, whether their source code is available, and other aspects that are related to the compliance with preservation best practices.
4 Platform comparison
- Based on the selection of the evaluation scope, this section addresses the comparison of the platforms according to key features that can help in the selection of a platform for data management.
- A dynamic approach to data management can make tasks easier for researchers and motivate them to use the data management platform as part of their daily research activities, while they are working on the data.
- This platform is flexible, available under an open-source license, and compatible with several metadata representations, while still providing a complete API.
- While the evaluated platforms have different description requirements upon deposit, most of them lack support for domain-specific metadata schemas.
- This search feature makes it easier for researchers to find the datasets that are from relevant domains and belong to specific collections or similar dataset categories (the concept varies between platforms as they have different organizational structures).
5 Data staging platforms
- Most of the analyzed solutions target data repositories, i.e. the end of the research workflow.
- These requirements have been identified by several research and data management institutions, who have implemented integrated solutions for researchers to manage data not only when it is created, but also throughout the entire research workflow.
- It provides researchers with 20GB of storage for free, and is integrated with other modules for dataset sharing and staging, including some computational processing on the stored data.
- Dendro is a single solution targeted at improving the overall availability and quality of research data.
- Curators can expand the platform’s data model by loading ontologies that specify domain-specific or generic metadata descriptors that can then be used by researchers in their projects.
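The idea of curators extending a data model by loading ontologies can be illustrated with a minimal Python sketch. This is not Dendro's actual API: the `Descriptor` and `DescriptorRegistry` classes, the `load_ontology` and `describe` methods, and the `http://example.org/hydro#` namespace are all hypothetical names introduced for illustration. Only the Dublin Core terms namespace is a real vocabulary.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Descriptor:
    """One metadata descriptor contributed by an ontology (hypothetical model)."""
    namespace: str
    name: str
    label: str

    @property
    def uri(self) -> str:
        return self.namespace + self.name


@dataclass
class DescriptorRegistry:
    """Holds every descriptor that curators have made available to researchers."""
    descriptors: dict = field(default_factory=dict)

    def load_ontology(self, namespace: str, terms: dict) -> None:
        # A real platform would parse an ontology file; here an ontology is
        # simplified to a mapping of term name -> human-readable label.
        for name, label in terms.items():
            descriptor = Descriptor(namespace, name, label)
            self.descriptors[descriptor.uri] = descriptor

    def describe(self, record: dict, uri: str, value) -> dict:
        # Researchers may only use descriptors that a curator has loaded.
        if uri not in self.descriptors:
            raise KeyError(f"unknown descriptor: {uri}")
        record[uri] = value
        return record


registry = DescriptorRegistry()
# Generic descriptors (Dublin Core terms namespace).
registry.load_ontology("http://purl.org/dc/terms/",
                       {"title": "Title", "creator": "Creator"})
# Domain-specific descriptors (made-up hydrology vocabulary).
registry.load_ontology("http://example.org/hydro#",
                       {"samplingDepth": "Sampling depth (m)"})

record = {}
registry.describe(record, "http://purl.org/dc/terms/title", "River sediment samples")
registry.describe(record, "http://example.org/hydro#samplingDepth", 2.5)
```

Keeping the registry separate from the records means a curator can add a new vocabulary without changing how existing records are stored, which is the kind of flexibility the bullet above describes.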
6 Conclusion
- The evaluation showed that it can be hard to select a platform without first performing a careful study of the requirements of all stakeholders.
- Its features and extensive API also make it possible to use this repository to manage research data, using its key-value dictionary to store any domain-level descriptors.
- A very important factor to consider is also the control over where the data is stored.
- The authors consider that these solutions should be compared to other collaborative solutions such as Dendro, a research data management solution currently under development.
- This should, of course, be done while taking into consideration available metadata standards that can contribute to overall better conditions for long-term preservation [36].
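The use of a key-value dictionary for domain-level descriptors, mentioned in the conclusion, might look roughly like the following Python sketch. The field names (`title`, `description`, `extras`), the `build_record` helper, the `hydro:` prefix, and the sample values are assumptions made for illustration; they are not taken from any specific platform's API.

```python
import json


def build_record(title: str, description: str, domain_descriptors: dict) -> dict:
    """Combine fixed core fields with free-form key-value descriptors."""
    # Values are stored as strings, since a generic key-value store
    # typically has no knowledge of domain-specific types.
    extras = [
        {"key": key, "value": str(value)}
        for key, value in sorted(domain_descriptors.items())
    ]
    return {"title": title, "description": description, "extras": extras}


payload = build_record(
    "River sediment samples",
    "Illustrative dataset description",
    {"hydro:samplingDepth": 2.5, "hydro:instrument": "gravity corer"},
)

# The serialized form is what a client might send to a deposit endpoint.
serialized = json.dumps(payload, indent=2)
```

The trade-off of this approach is that the repository does not validate the descriptors against a schema, so agreeing on standard key names remains the responsibility of curators and researchers.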
Citations
Cites background from "A comparison of research data manag..."
...Although several studies have promoted the benefits of open data, the latest research demonstrates a low willingness to share data across those platforms [8,9]....
...Considering the infrastructures and needs of European universities and research institutions, research has primarily examined researchers’ technical requirements and expectations towards technology [8]....
References
"A comparison of research data manag..." refers background in this paper
...The growth in the number of research publications, combined with a strong drive towards open-access policies [8, 10], continues to foster the development of open-source platforms for managing bibliographic records....
"A comparison of research data manag..." refers background in this paper
...Institutions are also motivated to have their data recognized and preserved according to the requirements of funding institutions [16, 25]....
"A comparison of research data manag..." refers background in this paper
...Several comparative studies between existing solutions were already carried out in order to evaluate different aspects of each implementation, confirming that this is an issue with increasing importance [3, 6, 15]....