Display analysis information #2134
base: master
This commit introduces two new metadata fields:
- apicall_count: total count of all API calls made in the sample
- import_count: total count of Import symbols in the sample
Note, when using rutils.warn(), flake8 raises an error. So using rutils.bold() for now.
looks good, pending the tests to succeed
I think this requires regenerating the files in
Should be good to go once mandiant/capa-testfiles#239 is merged.
@@ -96,7 +98,7 @@ def find_basic_block_capabilities(

 def find_code_capabilities(
     ruleset: RuleSet, extractor: StaticFeatureExtractor, fh: FunctionHandle
-) -> Tuple[MatchResults, MatchResults, MatchResults, int]:
+) -> Tuple[MatchResults, MatchResults, MatchResults, FeatureSet]:
changing the signature of a function is a breaking change, so this should wait until the next major release.
feature_counts.file = feature_count

# cumulatively count the total number of Import features
for feature, _ in file_features.items():
use .keys() here to indicate that you won't use the value
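A minimal sketch of what this suggestion could look like, assuming file_features maps feature objects to sets of addresses. The Import and String classes and the count_imports helper are hypothetical stand-ins for illustration, not capa's actual types:

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Import:
    """Hypothetical stand-in for an import feature."""

    name: str


@dataclass(frozen=True)
class String:
    """Hypothetical stand-in for any non-import feature."""

    value: str


def count_imports(file_features: Dict[object, Set[int]]) -> int:
    # Iterating over .keys() makes it explicit that the address sets
    # (the dictionary values) are not needed for the count.
    return sum(1 for feature in file_features.keys() if isinstance(feature, Import))


file_features = {
    Import("kernel32.CreateFileA"): {0x401000},
    Import("kernel32.WriteFile"): {0x401008},
    String("hello"): {0x402000},
}
import_count = count_imports(file_features)
```

Iterating the dictionary directly would behave the same way; spelling out .keys() only documents the intent.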
@@ -19,6 +19,9 @@

 tabulate.PRESERVE_WHITESPACE = True

+MIN_LIBFUNCS_RATIO = 0.4
+MIN_API_CALLS = 10
where did these numbers come from? and how should i interpret them?
MIN_LIBFUNCS_RATIO: When library functions make up less than 40% of the functions in a sample, we inform users that capa might pick up false positive matches from other functions that would otherwise have been classified as library functions. I don't have any statistical data to back this up other than this hex-rays blogpost.
MIN_API_CALLS: When the sample has very few API calls, it is a strong indication that it might be packed/encrypted, as regular programs tend to make a lot more than 10 calls (though we have to run a benchmark across multiple samples to decide what's a good number here). For example, this packed capa-testfile emits 0 API features; luckily we detect that it is packed with UPX. If that weren't the case, this banner could serve as an indication that the sample might be packed.
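The two checks described above could be sketched roughly as follows. The constant values come from the diff; the function name, its parameters, the way the ratio is computed, and the warning wording are all assumptions made for illustration, since the PR's actual rendering code is not shown here:

```python
from typing import List

# Threshold values quoted from the diff under review.
MIN_LIBFUNCS_RATIO = 0.4
MIN_API_CALLS = 10


def analysis_warnings(library_function_count: int, total_function_count: int, apicall_count: int) -> List[str]:
    """Return warning banners for a sample (hypothetical helper)."""
    warnings = []

    # Assumption: the ratio is library functions over all functions,
    # treated as 0.0 when no functions were found at all.
    ratio = library_function_count / total_function_count if total_function_count else 0.0
    if ratio < MIN_LIBFUNCS_RATIO:
        warnings.append(
            "few library functions identified: some matches may come from unrecognized library code"
        )

    # Very few API calls can hint at a packed or encrypted sample.
    if apicall_count < MIN_API_CALLS:
        warnings.append("few API calls found: the sample might be packed or encrypted")

    return warnings


few_everything = analysis_warnings(library_function_count=1, total_function_count=10, apicall_count=0)
healthy = analysis_warnings(library_function_count=5, total_function_count=10, apicall_count=50)
```

Both thresholds are heuristics without benchmark data behind them, as noted in the comment above, so a sketch like this would likely need tuning.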
good explanations!
would you include the key parts here as a comment?
also i'm interested to see how frequently this message is shown to users. I don't think our dogs will identify 40% of functions in most binaries, so i'm a little concerned this message will be shown too often.
have you had a chance to collect these stats against a large number of samples?
I think it's still helpful information since we know there's most likely more library code than we've identified.
Stepping back here for a moment, let's consider if we want to implement this differently:
That way we can handle the various limitations/warnings consistently. The core extraction logic still resides in capa but we don't have to extend the metadata. Related: should we provide functionality to more easily leverage this in other tools? Right now other tools need to reimplement the logic we have in
@mr-tz this would require many fewer breaking changes, which i like
Closes #857.
This commit introduces two new metadata fields to result_document. Would this be considered a breaking change?
This would require regenerating the rdoc test files. See mandiant/capa-testfiles#239.