B124 | wais-seminar-20180521.mp4
Expressiveness Benchmarking for System-level Provenance
Over the past decade, a number of research prototypes have been developed that record provenance or other forms of rich audit logs at the operating system level. The last few years have seen increasing use of such systems for security and audit, notably in DARPA's $60m investment in the Transparent Computing program. Yet the foundations for trust in such systems remain unclear; the correct behaviour of a provenance recording system has not yet been clearly specified or proved correct. Attempts to improve security by auditing provenance records may therefore fail due to missing or inaccurate provenance, or through misunderstanding the intentions of the system designers, particularly when integrating provenance records from different systems. Even worse, provenance recording systems are not even straightforward to test, because the expected behaviour is nondeterministic: running the same program at different times or on different machines is guaranteed to yield different provenance graphs, and running programs with nontrivial concurrency behaviour typically also yields multiple possible provenance graphs with different structure.

We believe that such systems can, and should, be formally specified and verified, in order to remove complex provenance recording systems from the trusted computing base. However, formally verifying such a system seems to require first having an accepted formal model of the operating system kernel itself, which is a nontrivial undertaking. In the short term, we propose provenance expressiveness benchmarking, an approach to understanding the current behaviour of a provenance recording system. The key idea, which is simple in principle, is to generate provenance records for individual system calls or short sequences of calls, and for each one extract a provenance graph fragment that shows how the call was recorded in the provenance graph.
The challenge is how to automate this process, given that provenance recording tools work in different ways, use different output formats, and generate different (but similar) graphs containing both target activity and background noise. I will present work on this problem so far, focusing on how to automate the NP-complete approximate subgraph isomorphism problems that must be solved to extract benchmark results automatically.
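The extraction step can be sketched in miniature. The sketch below is illustrative only: the node names, edge labels, and graph representation are invented for the example and do not correspond to the output format of any real recording tool. It diffs a "foreground" run (target syscall plus noise) against a "background" run (noise only), then brute-forces a type-preserving embedding of an expected pattern into the resulting fragment. Real tools emit noise that varies between runs, so an exact set difference is a simplification, and the talk's problem is the *approximate* variant of subgraph isomorphism; this exact version just shows the shape of the computation.

```python
from itertools import permutations

def graph_diff(foreground, background):
    """Edges present while the target syscall ran but absent from the
    background recording: a first cut at the call's graph fragment.
    Graphs here are sets of (src, label, dst) edge triples."""
    return foreground - background

def find_embedding(pattern_edges, pattern_types, frag_edges, frag_types):
    """Brute-force search for an injective, type-preserving mapping of
    pattern nodes onto fragment nodes that sends every pattern edge to a
    fragment edge (exact subgraph isomorphism). Exponential in the
    pattern size, which is fine for single-syscall benchmark patterns."""
    pat_nodes = sorted({n for s, _, d in pattern_edges for n in (s, d)})
    frag_nodes = sorted({n for s, _, d in frag_edges for n in (s, d)})
    for image in permutations(frag_nodes, len(pat_nodes)):
        m = dict(zip(pat_nodes, image))
        if all(pattern_types[p] == frag_types[m[p]] for p in pat_nodes) and \
           all((m[s], lab, m[d]) in frag_edges for s, lab, d in pattern_edges):
            return m
    return None

# Toy benchmark for a read() call: the expected pattern is one
# "read" edge from a process node to a file node.
background = {("p1", "forked", "p0")}
foreground = {("p1", "forked", "p0"), ("p1", "read", "f1")}
fragment = graph_diff(foreground, background)

pattern = {("P", "read", "F")}
mapping = find_embedding(pattern,
                         {"P": "process", "F": "file"},
                         fragment,
                         {"p1": "process", "f1": "file"})
# mapping now names which concrete nodes played the roles P and F.
```

A production version would replace the exact diff with a tolerance for extra noise edges and the brute-force search with a backtracking matcher (VF2-style), but the interface stays the same: pattern in, node mapping (or failure) out.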
Added By: | Ms Amber Bu |
---|---|
Date Added: | 21 May 2018 14:41 |
Creators Name: | James Cheney |
Tags: | WAIS Research Seminar, WAIS seminar, Data Science, provenance |
Viewing permissions: | University |
Link: | http://edshare.soton.ac.uk/id/eprint/19346 |