The Task Force, led by Professor Pam Thomas (Pro-Vice-Chancellor for Research at the University of Warwick), was established by Jo Johnson, the Minister of State for Universities, Science, Research and Innovation, and charged with providing independent advice in the form of a ‘roadmap’ for the UK’s national open research data infrastructure. This first report of the Task Force, published today, Friday 30th June 2017, identifies the key challenges that any roadmap for open research data infrastructure will need to address. These include:
- Finding the data - The mechanisms available to identify and find research data are “sub-optimal”. They work reasonably effectively in only a few subject areas, and only for experts in those fields. There are competing metadata standards, and there is no equivalent, for research data, to the ‘full-text’ searching typically used by search engines.
- Using and reusing the data - “Differing data structures and formats often demand disproportionate efforts before a range of datasets can be used and analysed effectively”. The lack of interoperability poses particular obstacles to those operating in interdisciplinary fields, or wanting to use data from outside their own specialist area. It is important to work with users to ensure that data is indeed re-usable. Data may need to be re-structured and re-purposed in a variety of ways to make it truly reusable. It may well be an inefficient use of researchers’ time and expertise to undertake this re-purposing themselves, but it remains unclear who should do so instead.
- Software issues - There has been some progress in software development; however, the report also states that “software skills and understanding remain variable across the research community, and career paths for software specialists in the research community still need attention”.
- Data quality - “It remains often the case that researchers do not trust other researchers’ data. The provision of detailed documentation on provenance and on analytical procedures are critically important; but requirements for quality assurance can be multi-layered, difficult and time-consuming, and responsibilities for ensuring that data does indeed conform to basic quality standards are frequently not clearly defined. It is frequently unclear to both creators and users of research data what has been or will be done, and by whom.”
- Automation - Too many aspects of data management and curation involve manual intervention, and this “constitutes a significant barrier against wider adoption of good practice”. The development of widespread tools to facilitate automated workflows would be of great help in reducing such barriers.
- Selection, storage, and preservation - Outside a few relatively well-known fields, there is often no common understanding of what data (along with associated software and documentation) should be stored, where, when and how, at different stages of the research process and thereafter.
- Security - Protecting against criminal activity of various kinds (hacking, fraud, DDoS attacks and so on) will increase in importance as the volumes of research data continue to grow, along with the sources from which it is drawn, and as the research data infrastructure grows in complexity. This may give rise to new kinds of tension between the desire for openness and the need for data security.
The Task Force’s Chair, Professor Pam Thomas, said:
“The Task Force will now proceed to develop a full roadmap that will aim to tackle the range of issues we have identified in our first report, and how that might be resourced, which will be published in 2018. However, we will also provide some more immediate help and guidance through the publication throughout this summer of a number of case studies focusing on best practice in the management of open research data across disciplines and institutions.”
The Task Force also identifies a number of areas and opportunities that could lead to positive action to help resolve the issues it has identified, and which are likely to feature in the final roadmap. These include:
- Good practice - There are examples of good practice in a number of fields, including genomics, astronomy and crystallography.
- Journals taking a lead - Concerns about reproducibility and replicability may drive the editorial boards of scholarly journals to play a significant role in adopting and implementing appropriate data policies.
- Skills - There is growing recognition of the need to identify and rectify significant gaps in skills among researchers in many disciplines. There is a need for a clearer understanding of what it is reasonable to expect a competent researcher to be able to do unaided. There is also a consensus that there are insufficient numbers of support staff with the appropriate skills to help researchers manage their data effectively, and that career paths for such staff are not clear and need to be addressed.
- Incentives - There is widespread agreement on the need for carrots rather more than sticks, and data citation is often seen as a potential key incentive for researchers. But there would need to be clear payback for researchers from good data management and open data.