Abstract: | Both human analysts and, in particular, automated tool suites are capable of deriving sensitive information and conclusions from
collections of data items that individually cannot be considered critical or sensitive. This activity of analysing and correlating
material that is not immediately related is, in fact, highly desirable in many application areas and cannot be controlled
precisely in advance. Whether a program or an analyst is performing searches and correlations beyond the scope
of their authorisation or current mission can frequently be determined only ex post, based on a heuristic analysis of the documents
accessed.
In this paper we describe a mechanism for the instrumentation of operating systems to obtain information on the documents
and resources accessed by arbitrary processes. Such a mechanism could be an important component of the infrastructure of an
operational risk management system, generating an audit trail for compliance and forensic investigation, and acting as a sensor
generating data for analysis. Addressing the latter application, the paper also outlines an approach for extracting textual
information and metadata from accessed documents, regardless of the application program and workflow mechanisms used, without
unduly impeding either workflows or operator performance.
This information can then be subjected to a heuristic analysis based on natural language processing to extract the semantic
context of each document or segment. Clustering this content and extracting the conceptual patterns in the material that a user has
accessed then allows abnormal behaviour to be identified. This result can be refined further to determine heuristically whether the
authorised remit of the user has been breached and whether an investigation is warranted. We argue that this approach can reduce
the risk of misbehaviour while at the same time increasing productivity: individual users gain a greater degree of freedom to act
in the interest of their mission objectives, while automated mechanisms for analysing user behaviour provide the necessary
oversight. |
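
The analysis step outlined in the abstract (profiling the content a user normally accesses, then flagging documents that fall outside that profile) can be illustrated with a minimal sketch. This is not the paper's mechanism: a plain bag-of-words vector stands in for the natural language processing stage, a single summed profile stands in for clustering, and the function names and the `threshold` value are hypothetical choices made only for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased word counts; a crude stand-in for semantic extraction."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profile(baseline_texts):
    """Sum the term vectors of documents considered within the user's remit."""
    profile = Counter()
    for text in baseline_texts:
        profile.update(term_vector(text))
    return profile


def flag_outliers(profile, accessed_texts, threshold=0.1):
    """Return accessed texts whose similarity to the profile is below threshold.

    The threshold is an assumed, untuned value; a real system would have to
    calibrate it and would pass flagged items to a human for review.
    """
    return [t for t in accessed_texts if cosine(profile, term_vector(t)) < threshold]
```

A flagged document in this sketch is only a candidate for further heuristic refinement or investigation, in line with the ex post review the abstract describes; it is not by itself evidence that a remit has been breached.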