dynamic: add "span of calls" scope #2532

williballenthin · 2024-12-09T13:36:29Z

This PR implements the dynamic "span of calls" scope introduced here: mandiant/capa-rules#951

In summary, we want a way to match across calls (in dynamic mode) without resorting to the entire thread, which may be very long (thousands of events). So, we add a new scope, "span of calls", that represents the sliding 20-tuples of calls across each thread. Rules can match any logic against the features within each of these 20-tuples.

For example, consider the initial behavior of thread 3064 in our test CAPE file 0000a657:

This is a long thread with many calls, so yesterday it was tough to write a rule for any behavior that spans multiple calls without introducing false positives. Consider matching on the dynamic resolution and invocation of AddVectoredExceptionHandler. Now we can write a rule like:

So, within a region of 20 calls, match all this logic.

Here's what the output looks like:

The implementation is pretty easy: maintain a deque of the trailing 20 call events, merging and matching those features.
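To make the deque idea concrete, here is a minimal sketch. It is not the code in this PR: `iter_spans`, `span_matches`, and the use of plain string sets as features are hypothetical stand-ins for capa's real feature and rule types.

```python
from collections import deque
from typing import Iterable, Iterator

SPAN_SIZE = 20  # window length; 20 is the value discussed in this PR


def iter_spans(call_features: Iterable[set[str]], size: int = SPAN_SIZE) -> Iterator[set[str]]:
    """After each call, yield the merged features of the trailing `size` calls."""
    window: deque[set[str]] = deque(maxlen=size)
    for features in call_features:
        # With maxlen set, appending evicts the oldest call automatically.
        window.append(features)
        yield set().union(*window)


def span_matches(required: set[str], call_features: Iterable[set[str]], size: int = SPAN_SIZE) -> bool:
    """Hypothetical 'and' rule: every required feature occurs within one span."""
    return any(required <= merged for merged in iter_spans(call_features, size))
```

Because the window is merged into a set, the order of calls inside a span does not affect the result; only their proximity does.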

I picked 20 fairly randomly. I think we can tweak this number as necessary. Smaller and it's harder to match logic. Larger and the performance might decrease a bit, and then there's more FP possibility. But I don't think this is too risky.

I think this will affect runtime a bit, since we're matching features twice for each call event (once for the precise call event, once for the sliding window).

There are probably some edge cases to work out around overlapping windows. Consider a rule that matches a single call event within a sequence: that call event is contained by 20 sequences (some covering the events before, some covering the events after). So, we may have to do a little more work (TODO) to not emit those matches more than once. I'm not precisely sure of the behavior at this moment. I'll write a test for it.
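A small sketch of the overlap, under stated assumptions: windows are identified by call index ranges, and a match is reduced to the set of call indexes that satisfied it. Both helpers (`windows_containing`, `dedupe_matches`) are hypothetical and only illustrate why one call can surface in many windows and one possible way to report it once.

```python
def windows_containing(call_index: int, n_calls: int, size: int) -> list[range]:
    """Trailing windows (as index ranges) that include the given call."""
    spans = []
    # A trailing window is identified by the call it ends at.
    for end in range(call_index, min(call_index + size, n_calls)):
        start = max(0, end - size + 1)
        spans.append(range(start, end + 1))
    return spans


def dedupe_matches(matches: list[tuple[int, frozenset[int]]]) -> list[tuple[int, frozenset[int]]]:
    """Keep only the first match for each distinct set of matched calls.

    Each match is (index of the call the window ends at, matched call indexes).
    """
    seen: set[frozenset[int]] = set()
    unique = []
    for end, call_ids in matches:
        if call_ids not in seen:
            seen.add(call_ids)
            unique.append((end, call_ids))
    return unique
```

With a window of 20, a call in the middle of a long thread appears in 20 windows, so a rule satisfied by that call alone would be reported 20 times unless something like the second helper filters the repeats.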

Checklist

changelog update needed
documentation needed

CHANGELOG updated or no update needed, thanks! 😄

williballenthin · 2024-12-09T13:43:50Z

we also may want to update the vverbose render to only show each call event once, leaving the match details to a separate section, maybe like:

sequence: process1, pid, tid, calls{1, 2}
  and:
    api: CreateFile @ call{1}
    api: CloseFile @ call{2}
  referenced call events:
    call{1}: CreateFile
    call{2}: CloseFile

williballenthin · 2024-12-09T13:47:44Z

@jorik-utwente FYI

capa/capabilities/dynamic.py

williballenthin · 2024-12-09T13:52:26Z

I realize I dropped this PR without much warning 😇 I went from "I wonder how this would work" to "huh, it seems to work OK" pretty quickly.

mr-tz

awesome, this looks very promising already!!

major things to discuss include the naming and potentially handling of loops

capa/capabilities/dynamic.py

mr-tz · 2024-12-09T15:27:17Z

CHANGELOG.md

@@ -4,6 +4,8 @@

 ### New Features

+- add dynamic sequence scope for matching nearby calls within a thread #2532 @williballenthin


naming alternatives to sequence (matching occurs in any order): span, ngram, group/cluster

+1 cluster

"window", "slice", "range"

math: multiset (or bag, or mset) - https://en.wikipedia.org/wiki/Multiset

multiple instances of same object

order doesn't matter

optionally prefix with "call", e.g., callbag, callcluster?

To summarize: I don't think we should use the term "sequence" because it implies that the order of the events matters. capa doesn't match with any care for the order of API calls, so we don't want users to think they can rely on that.

Some reasonable alternatives: span, group, cluster, window, range.

Other terms, which work, but are more technical/jargon: ngram, multiset, bag.

As mentioned by @mr-tz, we can (should?) use a prefix, like "call span" or "call range".

I think I most prefer "range" and "span".

The candidates "call range" or "call span" make it seem like the range/span are characteristics of a particular call, rather than a collection of calls. Therefore, maybe we should use "range of calls" or "span of calls" within the rule text and documentation.

So I'd propose: "range of calls"

(in the future, if we supported configurable sequence sizes, we could make the name like: "range of 20 calls" which is fairly nice.)

I'm going to update the PR with the proposed new name here, but I would very much like feedback @mike-hunhoff @mr-tz @fariss @yelhamer and anyone else.

"range of calls" is a good name for this new scope. It makes the intention clear and, as mentioned, can be easily expanded in the future, e.g. "range of 20 calls".

so i lost this thread (i had a link below that stopped working and thought GH deleted it) and in the interim made a guess at what i had just concluded and renamed things "span of calls". does that work? or do you think it's worthwhile to swap over to "range"?

No worries, I meant "span" but it came out "range" because I had just finished reading your comments above and it was on my mind 😅

The definition of "span" works great for this scope:

the full extent of something from end to end; the amount of space that something covers.
"a warehouse with a clear span of 28 feet"

So no changes needed from my perspective

capa/capabilities/dynamic.py

williballenthin · 2024-12-09T15:51:09Z

potentially handling of loops

Good point. I think we'd want to see how this works in practice against a large number of samples and the rules we can translate to use this construct. In particular, loops (like you say) such as you'd see in ransomware.

mike-hunhoff

Great work, I'm excited about where this is going for an initial implementation. I echo a few of @mr-tz 's comments/concerns. Additionally, the value 5 comes close to being too small for some of our existing rules, e.g. https://github.com/mandiant/capa-rules/blob/e033410c8910f8b46718a5eefd9f0c7768be1b99/communication/c2/shell/create-reverse-shell.yml#L19-L23 so we'll need to do some additional work to find the sweet spot.

capa/capabilities/dynamic.py

mr-tz

I spent a few moments focusing on the core extension here and added some places for additional documentation.

capa/capabilities/dynamic.py

tests/test_dynamic_sequence_scope.py

williballenthin · 2024-12-12T15:34:52Z

computing the features for the sequence, which involves merging features from many calls, seems to take quite a bit of time:

i'll have to think on whether there's a creative way to optimize this

profile information

before: sequence length: 20

before: sequence length: 0

(convenient this works!)

optimized, sequence length 1 and 20:

conclusion:

So, there's a bit of overhead to use this new algorithm, but it's independent of SEQUENCE_LENGTH, which is desirable.
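One way to get a per-call cost that does not depend on the window length is to keep running feature counts and only touch the features of the call entering and the call leaving the window. The sketch below illustrates that idea; it is an assumption about the general technique, not a copy of the optimized code in this PR, and `SlidingFeatureWindow` is a hypothetical name.

```python
from collections import Counter, deque


class SlidingFeatureWindow:
    """Tracks which features are present in the trailing `size` calls."""

    def __init__(self, size: int):
        if size < 1:
            raise ValueError("size must be at least 1")
        self.size = size
        self.calls: deque[frozenset[str]] = deque()
        self.counts: Counter[str] = Counter()

    def push(self, features: frozenset[str]) -> None:
        # Evict the oldest call first, undoing only its own features.
        if len(self.calls) == self.size:
            for feature in self.calls.popleft():
                self.counts[feature] -= 1
                if self.counts[feature] == 0:
                    del self.counts[feature]
        # Then account for the features of the new call.
        self.calls.append(features)
        for feature in features:
            self.counts[feature] += 1

    def features(self) -> set[str]:
        """Features present in at least one call of the current window."""
        return set(self.counts)
```

Each `push` costs time proportional to the number of features in two calls, whereas re-merging the whole window costs time proportional to the features in all of its calls.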

capa/capabilities/dynamic.py

mr-tz · 2024-12-16T09:38:22Z

TODO?!

test sequence scope with submatch (call scope)
test sequence scope with submatch (sequence scope)
test sequence scope with submatch (thread or other scope - error?)

github-actions

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed


mike-hunhoff · 2025-01-17T20:37:40Z

capa/render/verbose.py

+def render_span_of_calls(layout: rd.DynamicLayout, addrs: list[frz.Address]) -> str:
+    calls: list[capa.features.address.DynamicCallAddress] = [addr.to_capa() for addr in addrs]  # type: ignore
+    for call in calls:
+        assert isinstance(call, capa.features.address.DynamicCallAddress)
+
+    pname = _get_process_name(layout, frz.Address.from_capa(calls[0].thread.process))
+    call_ids = [str(call.id) for call in calls]
+    return f"{pname}{{pid:{call.thread.process.pid},tid:{call.thread.tid},calls:{{{','.join(call_ids)}}}}}"


I'm seeing incorrect results for the call list, e.g. in the following output there is only one call displayed but four call ids are listed:

$ python -m capa.main tests/data/dynamic/vmray/2f8a79b12a7a989ac7e5f6ec65050036588a92e65aeb6841e08dc228ff0e21b4_min_archive.zip -vv
[...]
capture screenshot
namespace  collection/screenshot
author     [email protected], @_re_fox, [email protected]
scope      span of calls
att&ck     Collection::Screen Capture [T1113]
mbc        Collection::Screen Capture::WinAPI [E1113.m01]
span of calls @ mulvpilibfy.exe (C:\Users\8qy2SK\Desktop\mulvpilibfy.exe){pid:7104,tid:7108,calls:{36462,36465,37084,37146}}
  or:
    call:
      and:
        api: BitBlt @ mulvpilibfy.exe (C:\Users\8qy2SK\Desktop\mulvpilibfy.exe){pid:7104,tid:7108,call:37146}
          BitBlt(
            hdc: 0x2a010781,
            x: 0,
            y: 0,
            cx: 1440,
            cy: 900,
            hdcSrc: 0x4d010784,
            x1: 0,
            y1: 0,
            rop: 0xcc0020,
          ) -> ret_val: 1
[...]

And I've encountered other instances where multiple call matches are displayed but only the call id of the last match displayed is listed.

good catch! there's definitely some weirdness happening.

I think that when we collect all the potentially relevant call IDs, we're not validating that they come from branches that evaluated to True. Oops.

Here's my work:

I'll fix this up early next week. Not anticipating this to be a major problem.

e.g., here: https://github.com/mandiant/capa/pull/2532/files#diff-603cfd484a8c3bc11c9b7251492139889b9f2d4c29e1b5a8054b6eac373737a6R339-R340

we should first ensure the node evaluated to true before collecting from the children.
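A minimal sketch of that check, using a hypothetical `Node` type rather than capa's actual result document classes: a node contributes call ids only if it evaluated to true, and its children are visited only in that case.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one node of a rule match result tree."""

    success: bool
    call_ids: set[int] = field(default_factory=set)
    children: list["Node"] = field(default_factory=list)


def collect_call_ids(node: Node) -> set[int]:
    """Collect call ids only from branches that evaluated to True."""
    if not node.success:
        # A failed branch may still hold partial child matches; skip them all.
        return set()
    ids = set(node.call_ids)
    for child in node.children:
        ids |= collect_call_ids(child)
    return ids
```

Under this sketch, an `or` with one failed `and` branch (some of whose children matched) and one successful branch would report only the call ids of the successful branch, which is the behavior the output above is missing.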

nice example - false negative for the more specific branch GetDC/BitBlt/CreateCompatibleDC.

maybe we need to:

add DISPLAY* to CreateDC

add Gdip routines (GdipCreateBitmapFromScan0, GdipGetImageGraphicsContext, GdipGetDC)

williballenthin added the enhancement (new feature or request), breaking-change (introduces a breaking change that should be released in a major version), and dynamic (related to dynamic analysis flavor) labels Dec 9, 2024


williballenthin requested review from mr-tz, mike-hunhoff and yelhamer December 9, 2024 13:39


williballenthin marked this pull request as draft December 9, 2024 14:34

mr-tz reviewed Dec 9, 2024


mike-hunhoff reviewed Dec 9, 2024


williballenthin force-pushed the feat/dynamic-sequence-scope branch from d6106ea to 6d05d3c Compare December 10, 2024 12:55

mr-tz reviewed Dec 11, 2024


williballenthin force-pushed the feat/dynamic-sequence-scope branch 4 times, most recently from ea9daed to b10d591 Compare December 12, 2024 15:14

mr-tz reviewed Dec 13, 2024


mr-tz mentioned this pull request Dec 16, 2024

tmp: update to newscope (placeholder) mandiant/capa-rules#972

Closed

williballenthin force-pushed the feat/dynamic-sequence-scope branch 2 times, most recently from 4683882 to 69f4728 Compare December 16, 2024 15:51

williballenthin force-pushed the feat/dynamic-sequence-scope branch 2 times, most recently from 6887ba8 to 7d409ae Compare January 17, 2025 11:19

github-actions bot previously requested changes Jan 17, 2025


williballenthin added 10 commits January 17, 2025 11:20

rd: debugging helper formatting

0002c0b

result: make copy of locations

65d7166

to ensure its not modified by reference after we expect it to be

capabilities: use dataclasses to represent complicated return types

2510d9b

foo

dynamic: add sequence scope

9bae2d8

addresses discussion in mandiant/capa-rules#951 pep8 sequence: add test showing multiple sequences overlapping a single event

sequence: only match first overlapping sequence

88d8ea8

also, for repeating behavior, match only the first instance.

sequence scope: optimize matching

071b698

sequence: documentation and tests

6002a54

sequence: add more tests

sequence: refactor into SequenceMatcher

08ca1d8

contains the call ids for all the calls within the sequence, so we know where to look for related matched. sequence: refactor SequenceMatcher sequence: don't use sequence addresses sequence: remove sequence address

sequence: better collect sequence-related addresses from Range statements

deb33f2

sequence: don't update feature locations in place

6039076

pep8

williballenthin force-pushed the feat/dynamic-sequence-scope branch from 7d409ae to 6039076 Compare January 17, 2025 11:59

changelog: add sequence scope

261b384

williballenthin force-pushed the feat/dynamic-sequence-scope branch from 0923bab to 06472c1 Compare January 17, 2025 12:46

williballenthin changed the title ~~dynamic: add sequence scope~~ dynamic: add "span of calls" scope Jan 17, 2025

williballenthin marked this pull request as ready for review January 17, 2025 12:48

williballenthin requested review from a team, mike-hunhoff and mr-tz January 17, 2025 12:48

williballenthin force-pushed the feat/dynamic-sequence-scope branch 3 times, most recently from 32bba98 to 139092a Compare January 17, 2025 12:56

VascoSch92 reviewed Jan 17, 2025


rename "sequence" scope to "span of calls" scope

7b3bf0d

pep8 fix ref update submodules update testfiles submodule duplicate variable

williballenthin force-pushed the feat/dynamic-sequence-scope branch from 139092a to 7b3bf0d Compare January 17, 2025 15:19

mike-hunhoff reviewed Jan 17, 2025


williballenthin mentioned this pull request Jan 17, 2025

false negative for screenshot mandiant/capa-rules#981

Open
