
Techniques for large-scale automatic detection of web site defacements.

Medvet, Eric

Subject

defacement

anomaly detection

genetic programming

intrusion detection

monitoring

large-scale

INGEGNERIA DELL'INFORMAZIONE

ING-INF/05 Sistemi di elaborazione delle informazioni

Description

2006/2007

Web site defacement, the process of introducing unauthorized modifications to a web site, is a very common form of attack. This thesis describes the design and experimental evaluation of a framework that may constitute the basis for a defacement detection service capable of monitoring thousands of remote web sites systematically and automatically. With this framework, an organization may join the service by simply providing the URL of the resource to be monitored along with the contact point of an administrator. The monitored organization may thus take advantage of the service with just a few mouse clicks, without installing any software locally or changing its own daily operational processes. The main proposed approach is based on anomaly detection and allows monitoring the integrity of many remote web resources automatically while remaining fully decoupled from them and, in particular, without requiring any prior knowledge about those resources. During a preliminary learning phase, a profile of the monitored resource is built automatically. Then, while monitoring, the remote resource is retrieved periodically and an alert is generated whenever something "unusual" shows up. The thesis discusses the effectiveness of the approach in terms of detection accuracy, i.e., missed detections and false alarms. The thesis also considers the problem of misclassified readings in the learning set. The effectiveness of the anomaly detection approach, and hence of the proposed framework, rests on the assumption that the profile is computed from a learning set that is not corrupted by attacks; this assumption is often taken for granted. The influence of learning set corruption on the effectiveness of the framework is assessed, and a procedure aimed at discovering when a given unknown learning set is corrupted by positive readings is proposed and evaluated experimentally.
An approach to automatic defacement detection based on Genetic Programming (GP), an automatic method for creating computer programs by means of artificial evolution, is proposed and evaluated experimentally. Moreover, a set of techniques that have been used in the literature for designing several host-based or network-based Intrusion Detection Systems is considered and evaluated experimentally, in comparison with the proposed approach. Finally, the thesis presents the findings of a large-scale study on reaction time to web site defacement. Several statistics indicate the number of incidents of this sort, but a crucial piece of information is still lacking: the typical duration of a defacement. A two-month monitoring activity has been performed over more than 62000 defacements in order to determine whether and when a reaction to the defacement is taken. It is shown that this time tends to be unacceptably long (on the order of several days) and follows a long-tailed distribution.

Web site defacement, which consists of introducing unauthorized modifications to a web site, is a very common form of attack. This thesis describes the design, implementation, and experimental evaluation of a system that may constitute the basis for a service capable of monitoring thousands of remote web sites systematically and automatically. With this system, an organization can use the service simply by providing the URL of the resource to be monitored and a contact point for the administrator. The monitored organization can thus benefit from the service with a few mouse clicks, without having to install any software locally and without having to change its daily activities. The main proposed approach is based on anomaly detection and makes it possible to monitor the integrity of many remote web resources automatically while remaining completely detached from them and, in particular, without requiring any prior knowledge of them. During a preliminary learning phase, a profile of the resource is generated automatically. Subsequently, during monitoring, the resource is checked periodically and an alarm is generated when something "unusual" appears. The thesis examines the effectiveness of the approach in terms of detection accuracy, that is, undetected attacks and false alarms generated. The thesis also considers the problem of misclassified readings in the learning set. The effectiveness of the anomaly detection approach, and hence of the proposed system, rests on the hypothesis that the profile is generated from a learning set that is not corrupted by the presence of attacks; this hypothesis is often taken to be true. The influence of corrupted readings on the effectiveness of the proposed system is quantified, and a procedure for detecting when an unknown learning set is corrupted by the presence of positive readings is proposed and evaluated experimentally.
An approach to automatic defacement detection based on Genetic Programming (GP), an automatic method for creating programs by means of artificial evolution, is proposed and evaluated experimentally. In addition, a set of techniques that have been used to design Intrusion Detection Systems, both host-based and network-based, is evaluated experimentally against the proposed approach. Finally, the thesis presents the results of a large-scale study on reaction time to defacements. Several statistics indicate the number of attacks of this type, but a very important piece of information is missing: the typical duration of a defacement. More than 62000 defaced pages were monitored for about two months to find out whether and when a countermeasure is taken following a defacement. The study shows that reaction times are unacceptably long (on the order of many days) and follow a long-tailed distribution.

XX Ciclo

1979

Date

2008-04-23T08:33:12Z

2008-04-23T08:33:12Z

2008-03-18

Type

Doctoral Thesis

Format

application/pdf

application/pdf

application/pdf

Identifier

http://hdl.handle.net/10077/2579

urn:nbn:it:units-5784