Next Generation Set of Evidence Gathering Tools and Techniques

by Somayeh Kargaran (SCCH) on 20 February 2025

A key objective of the EMERALD project is to establish a unified view of the cloud service under certification by extracting and enriching knowledge from the different layers of the service and providing suitable evidence for security metrics. We are therefore developing a next-generation set of evidence gathering tools and techniques based on a certification knowledge graph. The approach is unified and tool-supported, and it continuously extracts knowledge from various sources, e.g., infrastructure, application, machine learning (ML) models, and policy documents. A graph-based model, the CertGraph ontology, serves as a common structure that all evidence extractors fill.

The main achievements so far are first versions of prototypes for the following EMERALD evidence extractors:

The Clouditor-Discovery component is designed to extract runtime information from cloud services. It focuses on generating evidence for security-related configurations, such as encryption in use, encryption at rest, and restricted ports. The prototype has been enhanced to cover a wider variety of discovered resources and to use the newly developed owl2proto tool, which automatically generates the necessary ontology objects.
Two source code evidence extractors, Codyze and eknows-e3, support the assessment of the security and compliance of a cloud application's source code. Both produce the same evidence format and complement each other, which shows that the EMERALD solution is not tied to a specific tool. In addition, a new component, Codyze-Provenance, adds runtime evidence extraction capabilities to Codyze by creating a verifiable trail of evidence from source code to running cloud services and applications.
AI-SEC supports evidence extraction for ML models and aims at identifying critical security-related features. The prototype provides a comprehensive toolkit for evaluating and improving the security of ML models by focusing on adversarial robustness testing, privacy vulnerability assessment, data poisoning attacks, and model interpretability.
The evidence extractor AMOE extracts information relevant to security metrics from different policy documents. It is built on several natural language processing (NLP) libraries and pre-trained artificial intelligence (AI) models, which identify text segments related to security-related features. These features are defined in the respective EMERALD metrics, which in turn are based on specific controls and security requirements of various security schemes.
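
As a rough illustration of how several extractors can fill one common structure, the following minimal Python sketch merges evidence from two sources into a single graph node. All names here (`Evidence`, `KnowledgeGraph`, the extractor labels, and the property keys) are hypothetical and chosen for illustration only; they are not the CertGraph ontology or the API of any EMERALD component.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Evidence:
    """Hypothetical evidence record; not the actual CertGraph schema."""
    extractor: str                 # tool that produced the evidence
    resource_id: str               # graph node the evidence refers to
    properties: Dict[str, object]  # facts observed about that resource


@dataclass
class KnowledgeGraph:
    """Toy graph: one property dictionary per resource, plus provenance."""
    nodes: Dict[str, Dict[str, object]] = field(default_factory=dict)
    sources: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, ev: Evidence) -> None:
        # Merge the new properties into the node for this resource
        # and remember which extractor contributed them.
        self.nodes.setdefault(ev.resource_id, {}).update(ev.properties)
        self.sources.setdefault(ev.resource_id, []).append(ev.extractor)


graph = KnowledgeGraph()
# Assumed example: a runtime extractor and a source code extractor
# both report on the same (made-up) storage resource.
graph.add(Evidence("runtime-extractor", "storage-1", {"encryption_at_rest": True}))
graph.add(Evidence("source-code-extractor", "storage-1", {"tls_enforced": False}))
```

Because both records share one format and one resource identifier, the graph ends up with a single node for `storage-1` that combines the facts from both extractors, which is the kind of unified view a metric assessment could then query.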
