SoilWise HE progress report

ISRIC Fruitfull, May 6th 2025

Fenny van Egmond, Thaisa van der Woude, Paul van Genuchten, Jiarong Li

2025-05-06

SoilWise Overview (2023-2027)

  • (yr1) Preparations and initial prototype development
  • (yr2) Preparation for the 2nd software development iteration
  • (yr3) 2nd iteration
  • (yr4) 3rd iteration and wrap-up

Overall findings

  • Difficult to identify the aspects to which the project can contribute
  • Many technology partners in the project, few soil researchers
  • Should we focus on improved technology, capacity building or showing where the gaps are?

Data publication strategy

  • Capacity building on existing REA guidelines
  • Research existing conventions and develop new ones on top of the REA guidelines to support the soil data community

Existing guidelines

  • Publish datasets and articles on endorsed repositories (is it persistent and harvested by OpenAIRE?)
  • Annotate the funding mechanism (Horizon Europe)
  • Publish supplemental materials (datasets) as individual assets
  • Use terms from common vocabularies (GEMET, AGROVOC, ISO 11074) as subject in metadata
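
The guidelines above can be read as a checklist on a metadata record. The sketch below shows one possible way to express that checklist; the `Record` and `Subject` classes, their field names and the DOI prefix test are assumptions made for illustration, not part of the REA guidelines or of any SoilWise component.

```python
from dataclasses import dataclass, field

# Vocabularies named in the guidelines above.
KNOWN_VOCABULARIES = {"GEMET", "AGROVOC", "ISO 11074"}


@dataclass
class Subject:
    label: str        # keyword as shown to the user
    vocabulary: str   # vocabulary the keyword was taken from


@dataclass
class Record:
    title: str
    identifier: str = ""   # persistent identifier, assumed here to be a DOI URL
    funder: str = ""       # funding mechanism, e.g. "Horizon Europe"
    subjects: list = field(default_factory=list)


def check_record(record):
    """Return a list of guideline issues found in a record (empty if none)."""
    issues = []
    if not record.identifier.startswith("https://doi.org/"):
        issues.append("no persistent identifier (DOI)")
    if not record.funder:
        issues.append("funding mechanism not annotated")
    if not any(s.vocabulary in KNOWN_VOCABULARIES for s in record.subjects):
        issues.append("no subject term from a common vocabulary")
    return issues
```

A record with a DOI, a funder and at least one AGROVOC or GEMET keyword passes without issues; an empty record reports all three.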

Existing and new conventions

  • Describe a data model (columns of a dataset) using common vocabularies (ISO 28258, GLOSOLAN, GloSIS-LD, INSPIRE)
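
As an illustration of describing dataset columns with vocabulary terms, the sketch below maps column names to a vocabulary, a term label and a unit, and reports which columns of a dataset still lack a description. The column names, term labels and units are placeholders chosen for this example; they are not verified entries of the vocabularies named above.

```python
# Hypothetical column descriptions: labels are placeholders,
# not verified terms from the referenced vocabularies.
COLUMN_DESCRIPTIONS = {
    "ph_h2o": {"vocabulary": "GloSIS-LD", "term": "pH in water", "unit": "pH"},
    "clay": {"vocabulary": "ISO 28258", "term": "clay fraction", "unit": "%"},
    "depth_top": {"vocabulary": "INSPIRE", "term": "upper depth", "unit": "cm"},
}


def split_columns(header, descriptions=COLUMN_DESCRIPTIONS):
    """Split a dataset header into described and undescribed columns."""
    described = [name for name in header if name in descriptions]
    undescribed = [name for name in header if name not in descriptions]
    return described, undescribed


# Example header of a dataset to be published.
header = ["ph_h2o", "clay", "sample_note"]
described, undescribed = split_columns(header)
print("described:", described)
print("needs a description:", undescribed)
```

Keeping the mapping separate from the data means the dataset itself stays unchanged, which limits the extra effort asked of the data provider.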

Catalogue vs Repository

  • A catalogue typically ingests metadata from various sources on a specific domain
  • A repository archives resources with relevant metadata, typically identified by a DOI
  • The grant agreement describes SoilWise as a repository; we consider it a catalogue

Catalogue development

  • Starting point was the catalogue development for LSC, S4A, EJP
  • Extended with harvesting workflows; metadata is persisted in a PostgreSQL database
  • A Solr backend is used for performance reasons
  • A link checker runs through all links in the catalogue on a weekly basis
  • Metadata is enriched using QA processes, partially based on NLP/LLM
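
A weekly link check like the one mentioned above could look roughly like the following. This is a minimal sketch using only the Python standard library; the function names and the rule "status below 400 counts as reachable" are assumptions, and the actual SoilWise link checker may work differently.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10.0):
    """Return the HTTP status of a HEAD request, or None if unreachable."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout or malformed URL.
        return None


def check_links(urls, fetch=head_status):
    """Map each URL to True (reachable) or False (broken)."""
    results = {}
    for url in urls:
        status = fetch(url)
        results[url] = status is not None and status < 400
    return results
```

Passing the fetch function as a parameter allows the check to be exercised without network access, for example `check_links(urls, fetch=lambda url: 200)`. Some servers reject HEAD requests, so a production checker would likely fall back to GET.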

Soil data harmonisation

  • The Soil Health Knowledge Graph aims to provide a standardised vocabulary for soil health related projects
  • Data harmonisation efforts aim to find new approaches for harmonising soil data, while keeping the additional effort for soil scientists to a minimum

What’s in it for ISRIC?

  • The resource outcomes publication strategy will be a good input to an updated ISRIC strategy
  • The data harmonisation efforts and guidance will be an interesting source for future WoSIS and NSIS developments
  • A lot of practical experience in generative AI is being collected, which will be of interest to future ISRIC projects
  • The novel catalogue components can be used in subsequent projects (AUSO, LSC)