• A machine learning based method to detect genomic imbalances exploiting X chromosome exome reads
  • Dimartino, Paola <1992>

Subject

  • MED/03 Genetica medica

Description

  • Whole Exome Sequencing (WES) is rapidly becoming the first-tier test in clinics, both thanks to its declining costs and the development of new platforms that help clinicians in the analysis and interpretation of SNV and InDels. However, we still know very little on how CNV detection could increase WES diagnostic yield. A plethora of exome CNV callers have been published over the years, all showing good performances towards specific CNV classes and sizes, suggesting that the combination of multiple tools is needed to obtain an overall good detection performance. Here we present TrainX, a ML-based method for calling heterozygous CNVs in WES data using EXCAVATOR2 Normalized Read Counts. We select males and females’ non pseudo-autosomal chromosome X alignments to construct our dataset and train our model, make predictions on autosomes target regions and use HMM to call CNVs. We compared TrainX against a set of CNV tools differing for the detection method (GATK4 gCNV, ExomeDepth, DECoN, CNVkit and EXCAVATOR2) and found that our algorithm outperformed them in terms of stability, as we identified both deletions and duplications with good scores (0.87 and 0.82 F1-scores respectively) and for sizes reaching the minimum resolution of 2 target regions. We also evaluated the method robustness using a set of WES and SNP array data (n=251), part of the Italian cohort of Epi25 collaborative, and were able to retrieve all clinical CNVs previously identified by the SNP array. TrainX showed good accuracy in detecting heterozygous CNVs of different sizes, making it a promising tool to use in a diagnostic setting.

Date

  • 2022-06-16

Type

  • Doctoral Thesis
  • PeerReviewed

Format

  • application/pdf

Identifier

urn:nbn:it:unibo-28671

Dimartino, Paola (2022) A machine learning based method to detect genomic imbalances exploiting X chromosome exome reads, [Dissertation thesis], Alma Mater Studiorum UniversitĂ  di Bologna. Dottorato di ricerca in Data science and computation , 33 Ciclo. DOI 10.48676/unibo/amsdottorato/10374.

Relations