• Pattern-based segmentation of digital documents: model and implementation
  • di Iorio, Angelo <1977>


  • INF/01 Informatica


  • This thesis proposes a new document model, according to which any document can be segmented into independent components and transformed into a pattern-based projection that uses only a very small set of objects and composition rules. The point is that such a normalized document expresses the same fundamental information as the original one, in a simple, clear and unambiguous way. The central part of my work consists of discussing that model and investigating how a digital document can be segmented and how a segmented version can be used to implement advanced conversion tools. I present seven patterns that are versatile enough to capture the most relevant document structures, and whose minimality and rigour make that implementation possible. The abstract model is then instantiated into an actual markup language, called IML. IML is a general and extensible language, which basically adopts an XHTML syntax and is able to capture, a posteriori, only the content of a digital document. It is compared with other languages and proposals in order to clarify its role and objectives. Finally, I present some systems built upon these ideas. These applications are evaluated in terms of advantages for users, workflow improvements and impact on the overall quality of the output. In particular, they cover heterogeneous content management processes: from web editing to collaboration (IsaWiki and WikiFactory), from e-learning (IsaLearning) to professional printing (IsaPress).


  • 2007-04-16


  • Doctoral Thesis
  • PeerReviewed


  • application/pdf



di Iorio, Angelo (2007) Pattern-based segmentation of digital documents: model and implementation, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. Dottorato di ricerca in Informatica, 19 Ciclo. DOI 10.6092/unibo/amsdottorato/370.