Overview
The cost of not making your data FAIR has been estimated to be €10.2B per year within the EU alone, but how do you make your data FAIR? What knowledge, skills and aptitudes do you need or have to learn? There is a wide variety of Good Data Management and FAIR training materials and courses available, but how can you find them, or learn about the skills and expertise you need or will learn from them? We need terminology to define and link the competencies necessary to make and keep data FAIR, to enable the annotation and aggregation of FAIR training materials and the creation of FAIR curricula.
A community of FAIR researchers, training providers and ontologists assembled to create the terms4FAIRskills initiative (https://terms4fairskills.github.io/) to address this challenge. Thanks to an award from the EOSC Co-creation committee, we have delivered a proof-of-concept terminology to describe the skills and competencies necessary to make and keep data FAIR. This terminology supports cross-domain and cross-repository searching for training materials by the skills and competencies they require and confer.
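As a rough illustration of what searching "by the skills and competencies they require and confer" could look like, the sketch below filters annotated training materials by a wanted skill and the skills a learner already has. It is a minimal sketch and not the project's implementation: the `TrainingMaterial` class, the `find_materials` function, the repository names and the `EX:` term identifiers are all hypothetical placeholders, not real terms4FAIRskills terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingMaterial:
    """A training resource annotated with skill terms (placeholder IDs)."""
    title: str
    repository: str
    requires: frozenset = frozenset()  # skills a learner needs beforehand
    confers: frozenset = frozenset()   # skills a learner gains


def find_materials(materials, wanted, already_have=frozenset()):
    """Return materials that confer `wanted` and whose prerequisites
    are all contained in `already_have`."""
    have = frozenset(already_have)
    return [
        m for m in materials
        if wanted in m.confers and m.requires <= have
    ]


# Hypothetical annotations; the term IDs below are invented placeholders.
CATALOGUE = [
    TrainingMaterial("Intro to metadata", "repo-a",
                     confers=frozenset({"EX:0001"})),
    TrainingMaterial("Persistent identifiers in practice", "repo-b",
                     requires=frozenset({"EX:0001"}),
                     confers=frozenset({"EX:0002"})),
    TrainingMaterial("Advanced identifiers", "repo-a",
                     requires=frozenset({"EX:0001", "EX:0003"}),
                     confers=frozenset({"EX:0002"})),
]

if __name__ == "__main__":
    # A learner who already has EX:0001 and wants to gain EX:0002.
    for m in find_materials(CATALOGUE, "EX:0002", {"EX:0001"}):
        print(m.repository, m.title)
```

Because the annotations use shared term identifiers rather than free-text keywords, the same query works across materials held in different repositories; here only the "repo-b" course matches, since the "repo-a" advanced course also requires a skill the learner lacks.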
Objectives & Challenges
This terminology can be applied to a variety of use cases, including but not limited to the following:
- creation and assessment of stewardship curricula;
- annotation, discovery and evaluation of FAIR-enabling materials (e.g. training) and resources;
- formalisation of job descriptions and CVs with recognised, structured competencies.
The completed terminology is of use to trainers teaching FAIR data skills, researchers who wish to identify skill gaps in their teams, and managers who need to recruit individuals with specific skills and competencies, and to develop FAIR competency-based reward structures and training plans.
On a strategic level, these aims align with the overall ambitions of the EOSC. Digital Skills for FAIR and Open Science (2021) identifies ten actors (including their roles and related skills) in the EOSC ecosystem for whom skills and training are relevant. The mature terminology is an asset for these audiences in the use cases listed above, as EOSC implements its vision of a web of FAIR data and related services for science, making research data interoperable and machine-actionable following the FAIR guiding principles.
Main Findings
As of the latest release (https://github.com/terms4fairskills/FAIRterminology/), the terms4FAIRskills terminology has 552 terms, split across five concepts. Of these, 261 are imported from the CASRAI RDM glossary. The remainder were created via workshops prior to or as part of the EOSC Co-creation award.
Through a series of virtual workshops, we iteratively refined the model, the terms, their definitions and their relationships. Thanks to contributions from our two use cases, the RDA/CODATA Summer Schools and the ELIXIR Training Portal TeSS (https://tess.elixir-europe.org/), we have been able to build the terminology from just over 250 terms with no relationships between them to the current 552 terms with 3 unidirectional and 8 bidirectional relationships.
We will continue to maintain and develop terms4FAIRskills. We are now exploring using terms4FAIRskills in the ELIXIR Training Portal and the EOSC-Life FAIRassist.org tool (https://fairassist.org). The latest release, plus our issue tracker, can always be found on our GitHub repository (https://github.com/terms4fairskills/FAIRterminology) alongside the latest news, use cases and community calls on our website (https://terms4fairskills.github.io/).
Main Recommendations
Thanks to the EOSC Co-creation fund, we have been able to take a number of important steps forward. Through increased and repeated interaction with the community, we have been able to grow and refine the terminology itself, which now has a greater number of terms, definitions and relationships. More importantly, the terminology model has been refined to better serve the two use cases and to provide richer linked annotation. terms4FAIRskills has also served as a pilot use case for the Semaphora annotation tool, which has benefited from use in the hack sessions and has in turn provided a suitable environment for increased and more accurate annotation.
Based on our experiences, we offer the following recommendations:
Plurality: Development teams should include ontologists and annotators from varied disciplines and sectors appropriate to the use cases, to ensure the terminology does not rely upon discipline- or sector-specific assumptions.
Communication is key:
- Teams should allocate sufficient effort for engagement and communication. Particularly in short-term, agile work, it is critical that community engagement is focused, relevant, quick and responsive, that engagement is tracked, and that communications are made on schedule;
- Terminology-building projects, such as this one, should include the community at all stages of development, both in the development and performance of annotations and in defining the competency questions the terminology is designed to answer. This will reduce the opportunity for scope creep and will ensure both the annotations and the terminology are tightly defined and appropriate for the use cases;
- Communication should be as clear and inclusive as possible. With any community project, it is important to ensure the aims are clear and agreed upon among all parties and that a governance system, however basic, is in place. In addition, frequent communication with end-users is fundamental.
Openness: It is important that the principles of Open Science are followed in any community project, but particularly in a project where a terminology is being built by one stakeholder for use by others.
Best Practice: Work should, where possible, follow accepted community standards, such as the FAIR semantics and OBO Foundry guidelines, to ensure the best possible practice.