• High-dimensional and one-class classification
  • Fortunato, Francesca <1990>

Subject

  • SECS-S/01 Statistica

Description

  • When dealing with high-dimensional data, and in particular when the number of attributes p is large compared with the sample size n, several classification methods cannot be applied. Fisher's linear discriminant rule and the quadratic discriminant rule are infeasible, because the covariance matrices involved are singular (when p > n the sample covariance matrix has rank at most n − 1) and therefore cannot be inverted. A recent approach to overcome this problem is based on Random Projections (RPs), which have emerged as a powerful method for dimensionality reduction. In 2017, Cannings and Samworth embedded RPs in an ensemble framework to extend classification methods originally designed for low-dimensional data to the high-dimensional setting. Although the RP ensemble classifier improves classification accuracy, it may still contain redundant information. Moreover, unlike other ensemble classifiers (e.g., Random Forest), it provides no insight into the actual importance of the input features for classification. To account for these aspects, the first part of this thesis develops the RP ensemble classifier in two new directions. First, by combining the original idea of using the Multiplicative Binomial distribution as a reference model to describe and predict ensemble accuracy with an important result on this distribution, we introduce a stepwise post-pruning strategy, called the Ensemble Selection Algorithm. Second, we propose a criterion, called Variable Importance in Projection, that uses the feature coefficients in the best discriminant projections to measure the importance of each variable for classification. In the second part, we tackle the new challenges posed by high-dimensional data in an emerging classification context: one-class classification. This is a special classification task in which only one class (the target class) is fully known, while information on the other classes is completely missing. In particular, we address this task by using Gini's transvariation probability as a measure of typicality, with the aim of identifying the best boundary around the target class.
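The random-projection ensemble idea attributed above to Cannings and Samworth can be sketched as follows. This is a minimal NumPy sketch under simplifying assumptions, not the thesis code and not the original authors' implementation: the base classifier is two-class LDA in the projected space, each of the B1 ensemble members keeps the best of B2 Haar-random projections according to its training (resubstitution) error, and the vote threshold alpha is simply fixed at 0.5. The helper names (haar_projection, lda_fit, lda_predict, rp_ensemble_fit, rp_ensemble_predict) and the synthetic data are hypothetical. The sketch covers only the base ensemble, not the pruning or variable-importance criteria proposed in the thesis, and it also checks the p > n obstacle: the p x p sample covariance has rank at most n − 1.

```python
import numpy as np


def haar_projection(rng, d, p):
    """Hypothetical helper: d x p matrix with orthonormal rows, Haar-distributed."""
    g = rng.standard_normal((p, d))
    q, r = np.linalg.qr(g)             # q is p x d with orthonormal columns
    q = q * np.sign(np.diag(r))        # sign correction makes q exactly Haar-distributed
    return q.T


def lda_fit(Z, y):
    """Two-class Fisher LDA with pooled covariance; feasible because d < n."""
    m0, m1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
    resid = np.vstack([Z[y == 0] - m0, Z[y == 1] - m1])
    S = resid.T @ resid / (len(y) - 2)  # pooled within-class covariance (d x d)
    w = np.linalg.solve(S, m1 - m0)
    b = -w @ (m0 + m1) / 2 + np.log(np.mean(y == 1) / np.mean(y == 0))
    return w, b


def lda_predict(w, b, Z):
    """Class 1 when the linear discriminant score is positive."""
    return (Z @ w + b > 0).astype(int)


def rp_ensemble_fit(X, y, d=5, B1=100, B2=25, seed=0):
    """For each of B1 members, keep the best of B2 random projections (assumed rule)."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(B1):
        best = None
        for _ in range(B2):
            A = haar_projection(rng, d, X.shape[1])
            Z = X @ A.T                  # project the n x p data to n x d
            w, b = lda_fit(Z, y)
            err = np.mean(lda_predict(w, b, Z) != y)  # resubstitution error (assumption)
            if best is None or err < best[0]:
                best = (err, A, w, b)
        members.append(best[1:])         # store (A, w, b) of the selected projection
    return members


def rp_ensemble_predict(members, X, alpha=0.5):
    """Assign class 1 when the fraction of member votes for class 1 is >= alpha."""
    votes = np.mean([lda_predict(w, b, X @ A.T) for A, w, b in members], axis=0)
    return (votes >= alpha).astype(int), votes


# Synthetic example with p > n: the classes differ in the first 20 coordinates only.
rng = np.random.default_rng(1)
n, p = 40, 100
delta = np.r_[np.full(20, 2.0), np.zeros(p - 20)]
y = np.repeat([0, 1], n // 2)
X = rng.standard_normal((n, p)) + np.outer(y, delta)

# The p x p sample covariance has rank at most n - 1 < p, so it cannot be inverted.
print("rank of sample covariance:", np.linalg.matrix_rank(np.cov(X, rowvar=False)), "p =", p)

members = rp_ensemble_fit(X, y)
y_test = np.repeat([0, 1], 100)
X_test = rng.standard_normal((200, p)) + np.outer(y_test, delta)
y_pred, votes = rp_ensemble_predict(members, X_test)
print("test accuracy:", np.mean(y_pred == y_test))
```

Because each member works in only d = 5 dimensions, the pooled covariance inside lda_fit is invertible even though the full 100 x 100 covariance is not; the vote fraction returned alongside the labels is the quantity that a threshold other than 0.5 would act on.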

Date

  • 2018-05-08

Type

  • Doctoral Thesis
  • PeerReviewed

Format

  • application/pdf

Identifier

urn:nbn:it:unibo-23382

Fortunato, Francesca (2018) High-dimensional and one-class classification, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. Dottorato di ricerca in Scienze statistiche (PhD programme in Statistical Sciences), 30th cycle. DOI 10.6092/unibo/amsdottorato/8412.

Relations