• Big Code Applications and Approaches
  • Del Bonifro, Francesca <1992>

Subject

  • INF/01 Informatica

Description

  • The availability of a huge amount of source code from code archives and open-source projects opens up the possibility to merge machine learning, programming languages, and software engineering research fields. This area is often referred to as Big Code where programming languages are treated instead of natural languages while different features and patterns of code can be exploited to perform many useful tasks and build supportive tools. Among all the possible applications which can be developed within the area of Big Code, the work presented in this research thesis mainly focuses on two particular tasks: the Programming Language Identification (PLI) and the Software Defect Prediction (SDP) for source codes. Programming language identification is commonly needed in program comprehension and it is usually performed directly by developers. However, when it comes at big scales, such as in widely used archives (GitHub, Software Heritage), automation of this task is desirable. To accomplish this aim, the problem is analyzed from different points of view (text and image-based learning approaches) and different models are created paying particular attention to their scalability. Software defect prediction is a fundamental step in software development for improving quality and assuring the reliability of software products. In the past, defects were searched by manual inspection or using automatic static and dynamic analyzers. Now, the automation of this task can be tackled using learning approaches that can speed up and improve related procedures. Here, two models have been built and analyzed to detect some of the commonest bugs and errors at different code granularity levels (file and method levels). Exploited data and models’ architectures are analyzed and described in detail. Quantitative and qualitative results are reported for both PLI and SDP tasks while differences and similarities concerning other related works are discussed.

Date

  • 2022-06-23

Type

  • Doctoral Thesis
  • PeerReviewed

Format

  • application/pdf

Identifier

urn:nbn:it:unibo-28544

Del Bonifro, Francesca (2022) Big Code Applications and Approaches, [Dissertation thesis], Alma Mater Studiorum UniversitĂ  di Bologna. Dottorato di ricerca in Computer science and engineering , 34 Ciclo. DOI 10.48676/unibo/amsdottorato/10255.

Relations