automod polish (#456)
Bunch of changes here, getting automod to roughly "v0". Intent is to get
this PR merged, then going forward do proper code review of all future
changes.

New here:

- "distinct value" counters (uses redis HyperLogLog; in-memory is
possibly-huge `map[string]bool`)
- persist "flags" in redis
- slack webhook notifications for "new mod actions"
- fixes to existing rules, and disable some trivial examples from
"default" ruleset
- new trivial/example rules, such as GTUBE spam string, counters
- new rule: interaction churn (follow/unfollow)
- new rule: new account reply promo
- helper command to re-process most-recent N posts from an account
- various helper routines in the rules package
bnewbold authored Dec 4, 2023
2 parents a857fdc + 7adf0f8 commit a18c4f4
Showing 36 changed files with 1,485 additions and 145 deletions.
69 changes: 64 additions & 5 deletions automod/README.md
@@ -1,18 +1,77 @@
`indigo/automod`
================

This package (`github.com/bluesky-social/indigo/automod`) contains a "rules engine" to augment human moderators in the atproto network. Batches of rules are processed for novel "events" such as a new post or update of an account handle. Counters and other statistics are collected, which can drive subsequent rule invocations. The outcome of rules can be moderation events like "report account for human review" or "label post". A lot of what this package does is collect and maintain caches of relevant metadata about accounts and pieces of content, so that rules have efficient access to this information.

A primary design goal is to have a flexible framework to allow new rules to be written and deployed rapidly in response to new patterns of spam and abuse.

Some example rules are included in the `automod/rules` package, but the expectation is that some real-world rules will be kept secret.

Code for subscribing to a firehose is not included here; see `../cmd/hepa` for a service daemon built on this package.

API reference documentation can be found on [pkg.go.dev](https://pkg.go.dev/github.com/bluesky-social/indigo/automod).

## Architecture

The runtime (`automod.Engine`) manages network requests, caching, and configuration. Outside calling code makes concurrent calls to the `Process*Event` methods that the runtime provides. The runtime constructs event structs (eg, `automod.RecordEvent`), hydrates relevant context metadata from (cached) external services, and then executes a configured set of rules on the event. Rules may request additional context, do arbitrary local compute, and mutate the event with any moderation "actions". After all rules have run, the runtime will inspect the event, update counter state, and push any new moderation actions to external services.

The runtime keeps state in several "stores", each of which has an interface and both in-memory and Redis implementations. It is expected that Redis is used in virtually all deployments. The store types are:

- `automod.CacheStore`: generic data caching with expiration (TTL) and explicit purging. Used to cache account-level metadata, including identity lookups and (if available) private account metadata
- `automod.CountStore`: keyed integer counters with time bucketing (eg, "hour", "day", "total"). Also includes probabilistic "distinct value" counters (eg, Redis HyperLogLog counters, with roughly 2% precision)
- `automod.SetStore`: configurable static string sets. May eventually be runtime configurable
- `automod.FlagStore`: mechanism to keep track of automod-generated "flags" (like labels or hashtags) on accounts or records. Mostly used to detect *new* flags. May eventually be moved into the moderation service itself, similar to labels


## Rule API

Here is a simple example rule, which handles newly created post records:

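```golang
import (
	"strings"

	appbsky "github.com/bluesky-social/indigo/api/bsky"
	"github.com/bluesky-social/indigo/automod"
)

// GTUBE: a well-known, harmless test string used to exercise spam filters
var gtubeString = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"

func GtubePostRule(evt *automod.RecordEvent, post *appbsky.FeedPost) error {
	if strings.Contains(post.Text, gtubeString) {
		evt.AddRecordLabel("spam")
	}
	return nil
}
```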

Every new post record will be inspected to see if it contains a static test string. If it does, the label `spam` will be applied to the record itself.

The `evt` parameter provides access to relevant pre-fetched metadata; methods to fetch additional metadata from the network; a `slog` logging interface; and methods to store output decisions. The runtime will catch and recover from unexpected panics, and will log returned errors, but rules are generally expected to run robustly and efficiently, and not have complex control flow needs.

Some of the more commonly used features of `evt` (`automod.RecordEvent`):

- `evt.Logger`: a `log/slog` logging interface
- `evt.Account.Identity`: atproto identity for the author account, including DID, handle, and PDS endpoint
- `evt.Account.Private`: when non-null (aka, when the runtime has administrator access) will contain things like `.IndexedAt` (account first seen) and `.Email` (the current registered account email)
- `evt.Account.Profile`: a cached subset of the account's `app.bsky.actor.profile` record (if non-null)
- `evt.GetCount(<namespace>, <value>, <time-period>)` and `evt.Increment(<namespace>, <value>)`: to access and update simple counters (by hour, day, or total). Incrementing counters is lazy and happens in batch after all rules have executed: this means that multiple calls are de-duplicated, and that `GetCount` will not reflect any prior `Increment` calls in the same rule (or between rules).
- `evt.GetCountDistinct(<namespace>, <bucket>, <time-period>)` and `evt.IncrementDistinct(<namespace>, <bucket>, <value>)`: similar to simple counters, but counts "unique distinct values"
- `evt.InSet(<set-name>, <value>)`: checks if a string is in a named set
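To show how a rule might combine these counter methods, here is a hedged sketch of a "reply burst" check. The `counterEvent` interface, its `AddFlag` method, the `fakeEvent` type, and the threshold of 20 are all hypothetical; the real `automod.RecordEvent` methods take the arguments listed above, but their return types may differ (eg, they may return errors). The rule reads the distinct count before incrementing, which matches the lazy semantics: this event's own increments would not be visible to it anyway.

```golang
package main

// counterEvent is a hypothetical, simplified subset of the evt methods listed
// above (real signatures may differ), plus a hypothetical AddFlag action.
type counterEvent interface {
	Increment(name, val string)
	GetCountDistinct(name, bucket, period string) int
	IncrementDistinct(name, bucket, val string)
	AddFlag(flag string)
}

// replyBurstRule flags an author who has replied to many distinct accounts
// within the current hour. The threshold of 20 is illustrative only.
func replyBurstRule(evt counterEvent, authorDID, parentAuthorDID string) {
	// Read before incrementing: with lazy, batched increments, this event's
	// own increments are not visible to any rule until all rules have run.
	if evt.GetCountDistinct("reply-to", authorDID, "hour") >= 20 {
		evt.AddFlag("reply-burst")
	}
	evt.Increment("reply", authorDID)
	evt.IncrementDistinct("reply-to", authorDID, parentAuthorDID)
}

// fakeEvent applies updates immediately and ignores the period argument;
// it exists only so the rule can be exercised without the runtime.
type fakeEvent struct {
	counts   map[string]int
	distinct map[string]map[string]bool
	flags    []string
}

func newFakeEvent() *fakeEvent {
	return &fakeEvent{counts: map[string]int{}, distinct: map[string]map[string]bool{}}
}

func (f *fakeEvent) Increment(name, val string) { f.counts[name+"/"+val]++ }

func (f *fakeEvent) GetCountDistinct(name, bucket, period string) int {
	return len(f.distinct[name+"/"+bucket])
}

func (f *fakeEvent) IncrementDistinct(name, bucket, val string) {
	k := name + "/" + bucket
	if f.distinct[k] == nil {
		f.distinct[k] = map[string]bool{}
	}
	f.distinct[k][val] = true
}

func (f *fakeEvent) AddFlag(flag string) { f.flags = append(f.flags, flag) }
```

Declaring the rule against a small interface rather than a concrete event type is one way to make rules unit-testable without network access; automod itself may take a different approach.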


## Developing New Rules

The current tl;dr process to deploy a new rule:

- copy a similar existing rule from `automod/rules`
- add the new rule to a `RuleSet`, so it will be invoked
- test against content that triggers the new rule
- deploy

You'll usually want to start with both a known pattern you are looking for, and some example real-world content which you want to trigger on.

The `automod/rules` package contains a set of example rules and some shared helper functions, and demonstrates some patterns for how to use counters, sets, filters, and account metadata to compose a rule pattern.

The `hepa` command provides `process-record` and `process-recent` sub-commands which will pull an existing individual record (by AT-URI) or all recent bsky posts for an account (by handle or DID), which can be helpful for testing.

When deploying a new rule, it is recommended to start with a minimal action, like setting a flag or just logging. Any "action" (including new flag creation) can result in a Slack notification. You can gain confidence in the rule by running against the full firehose with these limited actions, tweaking the rule until it seems to have acceptable precision (eg, few false positives), and then escalate the actions to reporting (adds to the human review queue), or action-and-report (label or takedown, and concurrently report for humans to review the action).
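Since only *new* flags should trigger a notification, the core check is a set difference between the flags already stored for an account or record and the flags a rule proposes. The following is a minimal sketch of that idea; `newFlags` and the example flag names are hypothetical and not part of the `automod.FlagStore` API.

```golang
package main

import "sort"

// newFlags returns the proposed flags that are not already present,
// de-duplicated and sorted. Only these would warrant a notification;
// re-applying an existing flag is a no-op.
func newFlags(existing, proposed []string) []string {
	have := make(map[string]bool, len(existing))
	for _, f := range existing {
		have[f] = true
	}
	var out []string
	for _, f := range proposed {
		if !have[f] {
			have[f] = true // also de-duplicates repeats within proposed
			out = append(out, f)
		}
	}
	sort.Strings(out)
	return out
}
```

A rule that fires on every post by the same account would then notify once, when the flag is first created, rather than on every subsequent event.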


## Prior Art

* The [SQRL language](https://sqrl-lang.github.io/sqrl/) and runtime was originally developed by an industry vendor named Smyte, then acquired by Twitter, with some core Javascript components released open source in 2023. The SQRL documentation is extensive and describes many of the design trade-offs and features specific to rules engines. Bluesky considered adopting SQRL but decided to start with a simpler runtime with rules in a known language (golang).

33 changes: 24 additions & 9 deletions automod/account_meta.go
@@ -26,13 +26,16 @@ type AccountPrivate struct {

// information about a repo/account/identity, always pre-populated and relevant to many rules
type AccountMeta struct {
Identity *identity.Identity
Profile ProfileSummary
Private *AccountPrivate
AccountLabels []string
AccountNegatedLabels []string
AccountFlags []string
FollowersCount int64
FollowsCount int64
PostsCount int64
Takendown bool
}

func (e *Engine) GetAccountMeta(ctx context.Context, ident *identity.Identity) (*AccountMeta, error) {
@@ -71,8 +74,18 @@ func (e *Engine) GetAccountMeta(ctx context.Context, ident *identity.Identity) (
}

var labels []string
var negLabels []string
for _, lbl := range pv.Labels {
if lbl.Neg != nil && *lbl.Neg {
negLabels = append(negLabels, lbl.Val)
} else {
labels = append(labels, lbl.Val)
}
}

flags, err := e.Flags.Get(ctx, ident.DID.String())
if err != nil {
return nil, err
}

am := AccountMeta{
@@ -82,7 +95,9 @@ func (e *Engine) GetAccountMeta(ctx context.Context, ident *identity.Identity) (
Description: pv.Description,
DisplayName: pv.DisplayName,
},
AccountLabels: dedupeStrings(labels),
AccountNegatedLabels: dedupeStrings(negLabels),
AccountFlags: flags,
}
if pv.PostsCount != nil {
am.PostsCount = *pv.PostsCount
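The label loop in the `GetAccountMeta` hunk above partitions label values into positive and negated lists, then de-duplicates each. A self-contained sketch of the same idea follows; the `label` struct and `splitLabels` function are hypothetical stand-ins (the real code reads `Val` and `Neg` from the profile view's labels and uses a `dedupeStrings` helper that is not shown here).

```golang
package main

// label mirrors the two fields the loop above reads from each label: a value
// and an optional negation marker (nil means "not negated").
type label struct {
	Val string
	Neg *bool
}

// splitLabels separates positive and negated label values, preserving
// first-seen order and dropping duplicates within each list.
func splitLabels(labels []label) (pos, neg []string) {
	seenPos := make(map[string]bool)
	seenNeg := make(map[string]bool)
	for _, l := range labels {
		if l.Neg != nil && *l.Neg {
			if !seenNeg[l.Val] {
				seenNeg[l.Val] = true
				neg = append(neg, l.Val)
			}
		} else if !seenPos[l.Val] {
			seenPos[l.Val] = true
			pos = append(pos, l.Val)
		}
	}
	return pos, neg
}
```

Like the diff, this only partitions values; it does not decide whether a negation cancels an earlier positive label with the same value.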
6 changes: 6 additions & 0 deletions automod/cachestore.go
@@ -10,6 +10,7 @@ import (
type CacheStore interface {
Get(ctx context.Context, name, key string) (string, error)
Set(ctx context.Context, name, key string, val string) error
Purge(ctx context.Context, name, key string) error
}

type MemCacheStore struct {
@@ -34,3 +35,8 @@ func (s MemCacheStore) Set(ctx context.Context, name, key string, val string) er
s.Data.Add(name+"/"+key, val)
return nil
}

func (s MemCacheStore) Purge(ctx context.Context, name, key string) error {
s.Data.Remove(name + "/" + key)
return nil
}
29 changes: 27 additions & 2 deletions automod/countstore.go
@@ -17,16 +17,20 @@ type CountStore interface {
GetCount(ctx context.Context, name, val, period string) (int, error)
Increment(ctx context.Context, name, val string) error
// TODO: batch increment method
GetCountDistinct(ctx context.Context, name, bucket, period string) (int, error)
IncrementDistinct(ctx context.Context, name, bucket, val string) error
}

// TODO: this implementation isn't race-safe (yet)!
type MemCountStore struct {
Counts map[string]int
DistinctCounts map[string]map[string]bool
}

func NewMemCountStore() MemCountStore {
return MemCountStore{
Counts: make(map[string]int),
DistinctCounts: make(map[string]map[string]bool),
}
}

@@ -66,3 +70,24 @@ func (s MemCountStore) Increment(ctx context.Context, name, val string) error {
}
return nil
}

func (s MemCountStore) GetCountDistinct(ctx context.Context, name, bucket, period string) (int, error) {
v, ok := s.DistinctCounts[PeriodBucket(name, bucket, period)]
if !ok {
return 0, nil
}
return len(v), nil
}

func (s MemCountStore) IncrementDistinct(ctx context.Context, name, bucket, val string) error {
for _, p := range []string{PeriodTotal, PeriodDay, PeriodHour} {
k := PeriodBucket(name, bucket, p)
m, ok := s.DistinctCounts[k]
if !ok {
m = make(map[string]bool)
}
m[val] = true
s.DistinctCounts[k] = m
}
return nil
}
40 changes: 40 additions & 0 deletions automod/countstore_test.go
@@ -0,0 +1,40 @@
package automod

import (
"context"
"testing"

"github.com/stretchr/testify/assert"
)

func TestMemCountStoreBasics(t *testing.T) {
assert := assert.New(t)
ctx := context.Background()

cs := NewMemCountStore()

c, err := cs.GetCount(ctx, "test1", "val1", PeriodTotal)
assert.NoError(err)
assert.Equal(0, c)
assert.NoError(cs.Increment(ctx, "test1", "val1"))
assert.NoError(cs.Increment(ctx, "test1", "val1"))
c, err = cs.GetCount(ctx, "test1", "val1", PeriodTotal)
assert.NoError(err)
assert.Equal(2, c)

c, err = cs.GetCountDistinct(ctx, "test2", "val2", PeriodTotal)
assert.NoError(err)
assert.Equal(0, c)
assert.NoError(cs.IncrementDistinct(ctx, "test2", "val2", "one"))
assert.NoError(cs.IncrementDistinct(ctx, "test2", "val2", "one"))
assert.NoError(cs.IncrementDistinct(ctx, "test2", "val2", "one"))
c, err = cs.GetCountDistinct(ctx, "test2", "val2", PeriodTotal)
assert.NoError(err)
assert.Equal(1, c)

assert.NoError(cs.IncrementDistinct(ctx, "test2", "val2", "two"))
assert.NoError(cs.IncrementDistinct(ctx, "test2", "val2", "three"))
c, err = cs.GetCountDistinct(ctx, "test2", "val2", PeriodTotal)
assert.NoError(err)
assert.Equal(3, c)
}