• Stochastic Modeling and Correlation Analysis of Omics Data
  • Budimir, Iva <1992>


  • FIS/07 Applied physics (cultural heritage, environment, biology and medicine)


  • We studied the properties of three different types of omics data: protein domains in bacteria, gene length in metazoan genomes and methylation in humans. Gene elongation and protein domain diversification are among the most important mechanisms in the evolution of functional complexity. For this reason, investigating the dynamic processes that led to their current configuration can highlight important aspects of genome and proteome evolution and, consequently, of the evolution of living organisms. The potential of methylation to regulate gene expression is usually attributed to groups of closely spaced CpG sites. We performed a correlation analysis to investigate the collaborative structure of all CpGs on chromosome 21. The long-tailed distributions of gene length and protein domain occurrences were successfully described by a stochastic evolutionary model and fitted with the Poisson Log-Normal distribution. This approach included both demographic and environmental stochasticity as well as Gompertzian density regulation. The parameters of the fitted distributions were compared at the evolutionary scale. This allowed us to define a novel protein-domain-based phylogenetic method for bacteria which performed well at the intraspecies level. In the context of gene length distribution, we derived a new generalized population dynamics model for diverse subcommunities which allowed us to jointly model both coding and non-coding genomic sequences. A possible application of this approach is a method for differentiating between protein-coding genes and pseudogenes based on their length. General properties of the methylation correlation structure were first analyzed for a large data set of healthy controls and then compared with those of the Down syndrome (DS) data set. The CpGs demonstrated strong group behaviour even across large genomic distances. The differences detected in DS were surprisingly small, possibly because the small DS sample size reduced the power of the statistical analysis.
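
For readers unfamiliar with the Poisson Log-Normal distribution mentioned in the abstract, the following is a minimal sketch of its probability mass function, assuming the standard definition in which a count K is Poisson with rate lam and log(lam) is normally distributed with mean mu and standard deviation sigma. The function name `pln_pmf`, the grid parameters and the trapezoidal integration are illustrative assumptions; this is not the fitting procedure used in the thesis.

```python
import math


def pln_pmf(k, mu, sigma, n_grid=2001, z_max=8.0):
    """Approximate P(K = k) for a Poisson Log-Normal mixture.

    Assumed model: K | lam ~ Poisson(lam), log(lam) ~ Normal(mu, sigma^2).
    The integral over the latent normal variable is approximated with the
    trapezoidal rule on an evenly spaced grid of standard-normal z values.
    """
    step = 2.0 * z_max / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        z = -z_max + i * step
        log_lam = mu + sigma * z
        # Log of the Poisson pmf: k*log(lam) - lam - log(k!)
        log_pois = k * log_lam - math.exp(log_lam) - math.lgamma(k + 1)
        # Log of the standard normal density at z
        log_phi = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
        # Trapezoidal rule: end points get half weight
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * math.exp(log_pois + log_phi)
    return total * step


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for k in range(5):
        print(k, pln_pmf(k, mu=1.0, sigma=0.5))
```

Under this definition the mean of the distribution is exp(mu + sigma^2 / 2), which gives a quick sanity check on any numerical implementation. The long right tail comes from the log-normal mixing of the Poisson rate.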


  • 2021-05-14


  • Doctoral Thesis
  • PeerReviewed


  • application/pdf



Budimir, Iva (2021) Stochastic Modeling and Correlation Analysis of Omics Data, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. Dottorato di ricerca in Fisica, 33 Ciclo. DOI 10.48676/unibo/amsdottorato/9792.