In the last few months, large parts of the planned GeoERA Project Vocabularies have been completed. The vocabulary (approx. 3000 concepts in total) is divided as follows:
- Most of the concepts describe fault systems and their systematics across borders in the countries of all HIKE, HotLime and GeoConnect3d participants.
- The fields of ornamental stones, geothermal energy and groundwater are dealt with in other vocabularies.
This vocabulary helps clarify cross-border terminology, for instance scientific concepts, terms, or names used in every GeoERA project.
This system of “GeoERA project vocabularies” is also part of EGDI (European Geological Data Infrastructure) and is used where no standardized code lists (e.g. INSPIRE or GeoSciML) are applicable. Examples of such scientific concepts are names of geologic formations, descriptive texts in geological maps, geological cross sections, named fault systems, groundwater bodies, named mineral deposits, and regional or historical names of time periods.
In principle, all vocabularies are published online according to “Linked Data” standards and can be reused and expanded in follow-up projects. Because the same controlled vocabularies are used for data annotation, geodata sets also become linguistically harmonized via semantic relations.
Additionally, a newly compiled “GeoERA Keyword Thesaurus” (approx. 2500 concepts in 10 languages) allows the Search System to look for matches between the free text typed by the user and the metadata keywords, taking into account the relationships (parent-child, etc.) established in the thesaurus. The Search System output can therefore be ranked by its similarity to the typed text. Furthermore, because the thesaurus is multilingual, searches can be carried out in several languages.
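The ranking idea can be illustrated with a minimal sketch. It assumes a tiny in-memory thesaurus and made-up relation weights; the names `THESAURUS`, `WEIGHTS`, `score_dataset` and `rank`, as well as the sample concepts, are hypothetical and are not taken from the actual Search System.

```python
# Hypothetical miniature thesaurus: each concept lists the concepts it is linked to.
THESAURUS = {
    "fault": {
        "broader": ["geological structure"],
        "narrower": ["thrust fault"],
        "related": ["fold"],
    },
}

# Assumed weights: an exact keyword match counts most, a related term least.
WEIGHTS = {"exact": 1.0, "narrower": 0.8, "broader": 0.6, "related": 0.4}


def score_dataset(search_term, keywords):
    """Return the best similarity score between a search term and a dataset's keywords."""
    term = search_term.lower()
    keywords = {k.lower() for k in keywords}
    best = WEIGHTS["exact"] if term in keywords else 0.0
    entry = THESAURUS.get(term, {})
    for relation in ("narrower", "broader", "related"):
        for concept in entry.get(relation, []):
            if concept in keywords:
                best = max(best, WEIGHTS[relation])
    return best


def rank(search_term, datasets):
    """Order dataset names by descending score and drop datasets that do not match."""
    scored = [(score_dataset(search_term, kws), name) for name, kws in datasets.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ordered if score > 0]
```

With these assumptions, a dataset tagged with the typed word itself would be listed before one tagged only with a narrower or related concept, and datasets without any thesaurus link would be left out.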
In the Search System, the search string typed by the user is processed to improve and enrich the search, allowing users to get the desired datasets, even if:
- The language of the dataset metadata differs from the language the user searches in.
- The typed word is not in the metadata, but a similar term is present.
The thesaurus is a key element in enriching the search, as it provides narrower, broader and related terms, as well as translations, for each word in the search string.
Documentation is available at https://github.com/GeoEra-GIP/WP4-Semantics/tree/master/Keyword%20Thesaurus
GIP-P is currently working on the technical implementation of the “Project Vocabularies” functionality, for example in websites or query applications, including in connection with online maps.
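As a rough illustration of this enrichment step, the sketch below expands a search string with the labels of linked concepts in every available language. The `Concept` class, the concept identifiers and the sample labels are assumptions made for this example only; the real thesaurus structure is described in the project documentation.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    labels: dict  # language code -> label (hypothetical sample data)
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)


# Hypothetical three-concept thesaurus used only for this sketch.
CONCEPTS = {
    "c1": Concept({"en": "groundwater", "de": "Grundwasser", "es": "agua subterránea"},
                  broader=["c2"], narrower=["c3"]),
    "c2": Concept({"en": "water", "de": "Wasser"}),
    "c3": Concept({"en": "aquifer", "de": "Grundwasserleiter"}),
}


def find_concept(word):
    """Return the id of the concept that has `word` as a label in any language."""
    wanted = word.casefold()
    for concept_id, concept in CONCEPTS.items():
        if any(label.casefold() == wanted for label in concept.labels.values()):
            return concept_id
    return None


def expand(search_string):
    """Return the typed words plus all labels of the matching and linked concepts."""
    terms = set()
    # Simplification: the search string is split on whitespace, so only
    # single-word labels can be recognised in the user's input.
    for word in search_string.split():
        terms.add(word.casefold())
        concept_id = find_concept(word)
        if concept_id is None:
            continue
        concept = CONCEPTS[concept_id]
        linked = [concept_id, *concept.broader, *concept.narrower, *concept.related]
        for other_id in linked:
            terms.update(label.casefold() for label in CONCEPTS[other_id].labels.values())
    return terms
```

Under these assumptions, a search for "Grundwasser" would also match metadata that only contains "groundwater" or "aquifer", which covers both cases listed above: a different metadata language and a similar rather than identical term.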
The next step, in May 2021, will be to turn the test environment (see above) into a production system at https://data.geoscience.earth/ncl/geoera and in the EGDI portal at http://www.europe-geology.eu/. This is the final part of our project: the presentation of our results, functionally integrated into the EGDI portal and published in machine-readable form via a SPARQL endpoint (Web API).
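To give an idea of what machine-readable access could look like, here is a minimal sketch of a client following the standard SPARQL 1.1 Protocol (query sent as a `query` URL parameter, results requested as SPARQL JSON). The endpoint address is a placeholder, since the actual URL is not stated here, and the query assumes that the vocabularies are modelled with SKOS, which is also an assumption.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint: the real address is not given in this text.
ENDPOINT = "https://example.org/sparql"

# Assumes SKOS modelling (skos:Concept, skos:prefLabel); adjust to the real data model.
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a GET request with the query URL-encoded in the `query` parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload):
    """Extract (concept URI, label) pairs from a SPARQL JSON results document."""
    data = json.loads(payload)
    return [(b["concept"]["value"], b["label"]["value"])
            for b in data["results"]["bindings"]]


def fetch_labels(endpoint=ENDPOINT, query=QUERY):
    """Send the query and return the parsed pairs (requires a reachable endpoint)."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling `fetch_labels()` with the real endpoint address in place of the placeholder would then return up to ten English concept labels, provided the published data matches the assumed model.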