Feature: Map Extracted Files to Artifact Definitions in image_export.py #4949
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4949      +/-   ##
==========================================
+ Coverage   85.05%   85.11%   +0.06%
==========================================
  Files         431      432       +1
  Lines       38648    38822     +174
==========================================
+ Hits        32873    33045     +172
- Misses       5775     5777       +2
    """
    artifact_path_segments = self._GetNonEmptyPathSegments(
        artifact_path, artifact_path_seperator)
    sanitized_path_segments = path_helper.PathHelper.SanitizePathSegments(
Why sanitize the path here? Doesn't that cause incorrect matches, given that the sanitization is lossy?
I am doing so because _CreateSanitizedDestination is called inside _ExtractDataStream in image_export.py while building the target_directory and target_filename of the output, and _CreateSanitizedDestination calls SanitizePathSegments to sanitize the path under the hood. Sanitizing here as well catches the cases where the sanitized output path would not match the path extracted from the trie.
  @classmethod
  def SanitizePathSegments(cls, path_segments):
I would recommend we move this to the cli submodule and only use it for CLI and log output.
I will wait for your reply to the comment above about comparing the path of the extracted artifact to the sanitized version of the path extracted from the trie. If we remove that sanitization, I will move this method back to the cli submodule. Moving it there now would introduce a cyclic import, because the cli module imports artifact_trie, which needs to access this method.
Feature: Map Extracted Files to Artifact Definitions in image_export.py
Description:
This PR adds an optional feature to Plaso's image_export.py tool to generate a JSON file mapping extracted files to the artifact definitions that led to their extraction. This mapping provides valuable context about the extracted files.
Functionality:
The new --enable_artifacts_map flag activates this feature. When enabled, the tool creates an artifacts_map.json file in the output directory. This file contains a dictionary where:
- Keys: Artifact definition names (e.g., JupyterConfigFile, SshdConfigFile, WindowsEnvironmentVariableComSpec).
- Values: Lists of extracted file paths (relative to the output directory) that matched the corresponding artifact definition.
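As a rough illustration of the dictionary shape described above, the following sketch groups (artifact name, relative path) pairs into a mapping and serializes it as JSON. The artifact names and paths are the ones mentioned in this description; the `matches` list, the grouping code, and the JSON formatting options are assumptions made for illustration and are not taken from the PR's implementation.

```python
import collections
import json

# Hypothetical (artifact name, extracted relative path) pairs; in the tool
# these would come from matching extracted files against artifact definitions.
matches = [
    ('SshdConfigFile', 'etc/ssh/sshd_config'),
    ('JupyterConfigFile',
     'home/dummyuser/.jupyter/jupyter_notebook_config.py'),
]

# Group paths by artifact definition name: name -> list of relative paths.
artifacts_map = collections.defaultdict(list)
for artifact_name, relative_path in matches:
  artifacts_map[artifact_name].append(relative_path)

# Serialize the mapping; the indentation and key ordering are assumptions.
serialized = json.dumps(artifacts_map, indent=2, sort_keys=True)
print(serialized)
```

The exact layout the tool writes to artifacts_map.json may differ; only the keys-to-lists structure is stated in the description.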
This command would produce an artifacts_map.json file similar to:
This output indicates that the files etc/ssh/sshd_config and home/dummyuser/.jupyter/jupyter_notebook_config.py were extracted because they matched the SshdConfigFile and JupyterConfigFile artifact definitions, respectively.
Registry Artifacts:
For artifacts that rely on Windows Registry keys or values (e.g., WindowsEnvironmentVariableComSpec), the tool automatically extracts the relevant registry hive files (e.g., SYSTEM, SOFTWARE, NTUSER.DAT). The artifacts_map.json will map these hive files to both:
- The artifact that directly triggered the hive's extraction (e.g., WindowsSystemRegistryFiles).
- Any artifacts that rely on data within those hives (e.g., WindowsEnvironmentVariableComSpec).
Example with Registry Artifacts:
If you run image_export.py with --artifact_filters WindowsEnvironmentVariableComSpec, the artifacts_map.json might contain:
This shows that the SYSTEM, SOFTWARE, and other hive files were extracted because of both WindowsSystemRegistryFiles and WindowsEnvironmentVariableComSpec. The mapped paths are relative to the output path provided with the --write argument.
Technical Details:
The core of this feature is the ArtifactsTrie class, which stores artifact definition paths in a Trie (prefix tree) data structure.
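To make the idea concrete, here is a minimal sketch of a trie keyed by path segments, with a glob-aware lookup that uses fnmatch.fnmatch per segment and treats a `**` segment as zero or more directory levels. This is not the PR's ArtifactsTrie: the class names `SketchTrieNode` and `SketchArtifactsTrie`, the method names, the `/` separator default, and the example path patterns (including `HypotheticalLogArtifact`) are all assumptions chosen for illustration.

```python
import fnmatch


class SketchTrieNode:
  """One node per path segment (hypothetical helper, not Plaso's class)."""

  def __init__(self):
    self.children = {}            # segment pattern -> SketchTrieNode
    self.artifact_names = set()   # artifacts whose path ends at this node


class SketchArtifactsTrie:
  """Illustrative prefix tree of artifact definition paths."""

  def __init__(self, separator='/'):
    self.root = SketchTrieNode()
    self.separator = separator

  def _segments(self, path):
    # Drop empty segments produced by leading or repeated separators.
    return [segment for segment in path.split(self.separator) if segment]

  def add_path(self, artifact_name, path):
    """Stores one artifact definition path, one segment per trie level."""
    node = self.root
    for segment in self._segments(path):
      node = node.children.setdefault(segment, SketchTrieNode())
    node.artifact_names.add(artifact_name)

  def get_matching_artifacts(self, path):
    """Returns the sorted names of artifacts whose pattern matches path."""
    matches = set()
    self._search(self.root, self._segments(path), 0, matches)
    return sorted(matches)

  def _search(self, node, segments, index, matches):
    if index == len(segments):
      matches.update(node.artifact_names)
    for pattern, child in node.children.items():
      if pattern == '**':
        # "**" may consume zero or more of the remaining segments.
        for next_index in range(index, len(segments) + 1):
          self._search(child, segments, next_index, matches)
      elif index < len(segments) and fnmatch.fnmatch(
          segments[index], pattern):
        self._search(child, segments, index + 1, matches)


trie = SketchArtifactsTrie()
# The path patterns below are illustrative assumptions.
trie.add_path('SshdConfigFile', '/etc/ssh/sshd_config')
trie.add_path(
    'JupyterConfigFile', '/home/*/.jupyter/jupyter_notebook_config.py')
trie.add_path('HypotheticalLogArtifact', '/var/log/**/*.log')

print(trie.get_matching_artifacts('/etc/ssh/sshd_config'))
print(trie.get_matching_artifacts('/var/log/nested/dir/app.log'))
```

Storing one segment per level lets shared prefixes such as `/etc` be walked once for all artifact definitions, and every recursive call descends to a child node, so the search terminates even with `**` patterns.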
Artifacts Trie Structure
Example Trie:
Matching Logic
Paths are normalized to use os.sep as the separator.
The GetMatchingArtifacts method traverses the Trie based on input path segments, using fnmatch.fnmatch for glob matching.
** is handled recursively to match zero or more directory levels.
Source Type Handling
When the input to the tool is:
- A directory: a dfvfs.FileSystem object of type OS is created, with a dfvfs.FileSystemSearcher using the input directory as the mount point. The tool extracts files matching the FindSpec's criteria within this directory.
- A single file: ExtractPathSpecs yields the input file path directly without searching, as it's assumed that a user-provided file path should be extracted.
Added a safeguard check that exits and prints a message if the input is a single file; this tool can handle images, block devices, and directory hierarchies from the evidence system.