Discount quality for responsible data science: Human-in-the-Loop for quality data


Technological advances have greatly increased the capacity to analyze data for scientific research and to reuse it in Open Science initiatives. The effort to build data spaces, or data ecosystems, that support publishing and reusing data to feed data science pipelines has inspired several initiatives in Europe and worldwide. However, evaluating the quality of data and results can be highly resource-intensive, in terms of both computational power and human effort. In this scenario, the project is committed to responsible data science with a human-in-the-loop (HITL) approach, aiming to keep the whole process sustainable on both fronts.

The project's original contributions focus on data preparation, which is known to take up to 80% of the time required for data analysis. The aim is to balance the need for a data quality level that makes the data “fit for use” in a given context against the effort that such high-quality data preparation demands for a given analysis goal (“discount” quality). A theoretical basis and instruments will be developed to provide a minimally viable approach that can be adapted to the context of use.


Two primary goals will be targeted to achieve sustainability:

1) reducing the computational effort required for data analysis;

2) implementing Human-in-the-Loop (HITL) in a sustainable manner, ensuring that human contributions are impactful while requiring as little time and scope as possible.