
Synthetic Data Evaluation Metrics

ELIXIR-led scoping review on evaluation metrics for synthetic data across genomics, transcriptomics, proteomics, phenomics, imaging and electronic health records.

Tags: Synthetic Data, AI, Evaluation, Life Sciences

Overview

This work was conducted within the ELIXIR Machine Learning Focus Group. It addresses one of the major open challenges in synthetic data research: how to systematically evaluate the quality, reliability and utility of synthetic datasets in the life sciences.

Following PRISMA guidelines, we systematically reviewed more than 8,000 records from several scientific databases and curated 188 publications to identify the evaluation metrics currently used across biomedical domains.

The review explored synthetic data evaluation practices in:

  • genomics
  • transcriptomics
  • proteomics
  • phenomics
  • medical imaging
  • electronic health records (EHRs)

My contribution focused on literature curation, metric extraction, analysis of evaluation methodologies and manuscript preparation.

The paper was published in NAR Genomics and Bioinformatics.

Funding

This work received funding from the SYNTHIA project and was supported by ELIXIR, the European research infrastructure for life-science data.