← Back to projects

Synth4Bench

Framework for generating synthetic genomics datasets for benchmarking tumor-only somatic variant callers.

Synthetic Data Variant Calling Benchmarking
Synth4Bench

Overview

Synth4Bench is a framework for generating synthetic genomics datasets designed to support the systematic benchmarking of tumor-only somatic variant calling algorithms.

The project addresses a key challenge in cancer genomics: evaluating variant callers when high-quality ground truth data are limited or unavailable. By generating synthetic datasets with known variants, Synth4Bench enables controlled experiments where sequencing depth, read length, allele frequency and variant characteristics can be adjusted and evaluated.

My work focused on the design and implementation of the synthetic data generation workflow, the benchmarking strategy and the downstream analysis used to compare variant calling tools against known ground truth.

Code and Data

The code is available on GitHub and all data on Zenodo

  • Synth4Bench: Generating Synthetic Data for Benchmarking Tumor-Only Somatic Variant Calling Algorithms
    Manuscript in preparation / under review but preprint is available in bioRxiv

Funding

This work is related to the SYNTHIA project and was developed as part of my PhD research.