Synthetic Data Evaluation Metrics
ELIXIR-led scoping review of evaluation metrics for synthetic data across genomics, transcriptomics, proteomics, phenomics, imaging and electronic health records.
Overview
This work was conducted within the ELIXIR Machine Learning Focus Group. It addresses one of the major open challenges in synthetic data research: how to systematically evaluate the quality, reliability and utility of synthetic datasets in the life sciences.
Following PRISMA guidelines, we systematically reviewed more than 8,000 records from several scientific databases and curated 188 publications to identify the evaluation metrics currently used across biomedical domains.
The review explored synthetic data evaluation practices in:
- genomics
- transcriptomics
- proteomics
- phenomics
- medical imaging
- electronic health records (EHRs)
My contribution focused on literature curation, metric extraction, analysis of evaluation methodologies and manuscript preparation.
Related Publication
The accompanying paper was published in NAR Genomics and Bioinformatics.
Funding
This work received funding from the SYNTHIA project and was supported by ELIXIR, the European research infrastructure for life-science data.