goat: repo mst command for displaying MST structure #885

Merged: devinivy merged 8 commits into main from divy/goat-repo-inspect-mst on Jan 7, 2025

@devinivy devinivy commented Dec 20, 2024

This adds a flag --mst to goat's repo inspect command. When passed, the command will output a representation of the repository's MST structure. It may also be used to display the partial MST of a repository, i.e. in the case of a record proof.

As part of this work I exposed the NodeData and TreeEntry types from the mst module, which are sufficient for working with raw MST data.
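
For readers unfamiliar with raw MST data, here is a minimal sketch of how prefix-compressed entries within one data node expand into full keys. The `entry` type and `expandKeys` function are hypothetical stand-ins written for this illustration; they are not the exported NodeData and TreeEntry types, and they assume each entry's prefix length counts the bytes shared with the previous key in the same node.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for one MST node entry (not the real
// TreeEntry type). PrefixLen is assumed to count the bytes shared with the
// previous key in the same node; KeySuffix holds the remaining bytes.
type entry struct {
	PrefixLen int
	KeySuffix string
}

// expandKeys rebuilds full keys from prefix-compressed entries, in order.
func expandKeys(entries []entry) ([]string, error) {
	keys := make([]string, 0, len(entries))
	prev := ""
	for i, e := range entries {
		if e.PrefixLen < 0 || e.PrefixLen > len(prev) {
			return nil, fmt.Errorf("entry %d: prefix length %d does not fit previous key of length %d",
				i, e.PrefixLen, len(prev))
		}
		key := prev[:e.PrefixLen] + e.KeySuffix
		keys = append(keys, key)
		prev = key
	}
	return keys, nil
}

func main() {
	// Values taken from the example output in this PR description.
	entries := []entry{
		{0, "app.bsky.feed.like/3jqa6sh7bic2a"},
		{21, "sfpt3arg22b"},
		{21, "urpqzpats2e"},
		{21, "wnnk25rrk2s"},
	}
	keys, err := expandKeys(entries)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, k := range keys {
		fmt.Println(k)
	}
}
```

The last key expands to app.bsky.feed.like/3jwnnk25rrk2s, which is the record requested in the example proof.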

The output is intended to be a faithful representation of the MST:

  • Sibling items are entries within a single data node.
  • The dots (....) represent the length of the key prefix.
  • The text after the dots is the rest of the key.
  • The hashes in parens are the last 7 characters of the CID pointer represented in base32.
  • Each CID pointer points to either (a) a subtree (i.e. another data node) when it appears alone, or (b) a record block when it appears next to a key.
  • The ✗ appears in the output when the block behind a CID pointer is not present, which should not occur when inspecting a full repository but is generally expected when inspecting a proof.

Example:

$ curl 'https://morel.us-east.host.bsky.network/xrpc/com.atproto.sync.getRecord?did=did:plc:l3rouwludahu3ui3bt66mfvj&collection=app.bsky.feed.like&rkey=3jwnnk25rrk2s' > record-proof.car
$ goat repo inspect --mst ./record-proof.car
(lxmb3rm)
├── (cpibfny)
│   ├── (veqbjjq)
│   │   └── (s4zluvy)
│   │       ├── (6qiyhle) ✗
│   │       ├── app.bsky.feed.like/3jqa6sh7bic2a (edo6qsa) ✗
│   │       ├── (xswayqm) ✗
│   │       ├── .....................sfpt3arg22b (glnh3mi) ✗
│   │       ├── (55dhu6q) ✗
│   │       ├── .....................urpqzpats2e (xqygjyq) ✗
│   │       ├── (pblbwl4) ✗
│   │       ├── .....................wnnk25rrk2s (feoom4e)
│   │       └── (3jwkksm) ✗
│   ├── app.bsky.feed.like/3kb4ztebs4324 (hc4ruf4) ✗
│   └── (t4os5oq) ✗
├── app.bsky.feed.post/3kps37ajdyk24 (yk4ymya) ✗
└── (chuh4na) ✗

@bnewbold
Collaborator

I like it! And cool how (relatively) little code this is.

A couple high-level notes before I dig in to code and play with it myself:

  • I think this could be a separate command: goat repo mst
  • would be very open to color for this. I don't think we have a CLI color library in Go but would be open to adding a small one as a dep
  • the partial CIDs make me double-take. what if we prefixed like cid:lxmb3rm? or maybe in square brackets with ellipses [lxmb3rm…]? color might help. maybe a CLI flag/arg to control "full CID" vs "partial CID" (for those of us with big monitors?)
  • I like visually differentiating the partial "key" structure, but don't like the collision that technically a period could appear in MST text (even if that would be invalid at the "atproto repo" layer). we could print the full key (collection/rkey) and use "grey text" vs "bold white" to visually emphasize?

Additional feature brainstorming:

  • any special display for the "commit" object as a node above the "root MST" node?
  • ability to visualize entire tree, but highlight the "proof chain" to a single record (by collection/rkey path)
  • ability to parse a full repo CAR file, but display only the proof chain to a single record (distinct from highlighting within the full tree structure)
  • some kind of JSON output structure

@devinivy
Contributor Author

devinivy commented Dec 31, 2024

@bnewbold good thoughts! I think the output should make sense without color, but allow color to enhance it. How do you feel about this iteration on the output? I think this way of displaying the pointers also helps visually call out proof paths.

❯ goat repo mst ./record-proof.car                     
[…lxmb3rm]─◉
├── […cpibfny]─◉
│   ├── […veqbjjq]─◉
│   │   └── […s4zluvy]─◉
│   │       ├── […6qiyhle]─◌
│   │       ├── app.bsky.feed.like/3jqa6sh7bic2a […edo6qsa]─◌
│   │       ├── […xswayqm]─◌
│   │       ├── ∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙sfpt3arg22b […glnh3mi]─◌
│   │       ├── […55dhu6q]─◌
│   │       ├── ∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙urpqzpats2e […xqygjyq]─◌
│   │       ├── […pblbwl4]─◌
│   │       ├── ∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙wnnk25rrk2s […feoom4e]─◉
│   │       └── […3jwkksm]─◌
│   ├── app.bsky.feed.like/3kb4ztebs4324 […hc4ruf4]─◌
│   └── […t4os5oq]─◌
├── app.bsky.feed.post/3kps37ajdyk24 […yk4ymya]─◌
└── […chuh4na]─◌
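
A tree in this style can be produced with a small recursive renderer. The sketch below is a simplified illustration, not the PR's implementation: the `node` type is hypothetical and flattens data nodes, entries, and pointers into one labeled tree, with ◉ marking a block that is present and ◌ a block that is missing.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// node is a hypothetical, simplified display item: a label, whether the
// block behind its pointer is present, and any child items.
type node struct {
	Label    string
	Present  bool
	Children []*node
}

// marker returns ◉ for a present block and ◌ for a missing one.
func marker(present bool) string {
	if present {
		return "◉"
	}
	return "◌"
}

// render writes the root line followed by all descendants.
func render(w io.Writer, n *node) {
	fmt.Fprintf(w, "%s─%s\n", n.Label, marker(n.Present))
	renderChildren(w, n.Children, "")
}

// renderChildren draws each child with a branch glyph and extends the
// indent so that vertical guides continue past non-final siblings.
func renderChildren(w io.Writer, children []*node, indent string) {
	for i, c := range children {
		branch, next := "├── ", "│   "
		if i == len(children)-1 {
			branch, next = "└── ", "    "
		}
		fmt.Fprintf(w, "%s%s%s─%s\n", indent, branch, c.Label, marker(c.Present))
		renderChildren(w, c.Children, indent+next)
	}
}

// renderString is a convenience wrapper that returns the rendering as text.
func renderString(n *node) string {
	var sb strings.Builder
	render(&sb, n)
	return sb.String()
}

func main() {
	root := &node{Label: "[…lxmb3rm]", Present: true, Children: []*node{
		{Label: "[…cpibfny]", Present: true, Children: []*node{
			{Label: "app.bsky.feed.like/3kb4ztebs4324 […hc4ruf4]"},
			{Label: "[…t4os5oq]"},
		}},
		{Label: "app.bsky.feed.post/3kps37ajdyk24 […yk4ymya]"},
		{Label: "[…chuh4na]"},
	}}
	render(os.Stdout, root)
}
```

Because the marker sits at the end of every line, the chain of ◉ markers from the root down to a record is what makes a proof path stand out.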

@devinivy
Contributor Author

devinivy commented Dec 31, 2024

Here's an example with full CIDs in the output. In this case I display them on a separate line from the key, since otherwise the keys are pretty hard to see.

❯ goat repo mst ./record-proof.car                     
[bafyreihkw5nkomrmsim3ryrwm7lylpsmsy2ekvmg4ko3mw4tafelxmb3rm]─◉
├── [bafyreia3gpr2o3wm5enras7ow36hzctxgpyj4va4ah5uxmfhskycpibfny]─◉
│   ├── [bafyreib74yq5hoqitnshld55t2lyyqilt6tzuds24rakwg33x2bveqbjjq]─◉
│   │   └── [bafyreif3tijkgkxqop33bermbx7eudoyow6xvffsdujwwcnoqs6s4zluvy]─◉
│   │       ├── [bafyreihuzaarhygajwpmcpps54dezxpfs2f3puldt5dhwmmdt2a6qiyhle]─◌
│   │       ├── app.bsky.feed.like/3jqa6sh7bic2a
│   │       │   [bafyreibu6mqlgjzytz5an7qxvuhdiu3zdgfbwdjt2fj6w5xx6loedo6qsa]─◌
│   │       ├── [bafyreidd4t3ktu6kjmojeyllyailcuemrv3amjbxcbkfyvwstbaxswayqm]─◌
│   │       ├── ∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙sfpt3arg22b
│   │       │   [bafyreidnckhyuxta6nc2dkmnvwh6mnfl2re25djojfwdgs664yzglnh3mi]─◌
│   │       ├── [bafyreifrzyi3whh6eohov5njqar5rkqxk2yzd7nrm5wtembpx5k55dhu6q]─◌
│   │       ├── ∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙urpqzpats2e
│   │       │   [bafyreidbbbu7finfdayb2ocwd7amyckopeaial57yjiit4gfqwmxqygjyq]─◌
│   │       ├── [bafyreicuzwbb52h5sxrmzzw4nhp2e4dqu7s3vfqarvd6r5y37rnpblbwl4]─◌
│   │       ├── ∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙wnnk25rrk2s
│   │       │   [bafyreih4jnl2v3ejuihp5vn7ujzcrougx7dju35pfuyrripsqnjfeoom4e]─◉
│   │       └── [bafyreibdd6ewiaq4rxyxvvmqcuguti3aykjjxcleaz3a2kiqdsn3jwkksm]─◌
│   ├── app.bsky.feed.like/3kb4ztebs4324
│   │   [bafyreibj7lie3rnxic2d3umshoruqpg24ngzkoidx7wxswfuambhc4ruf4]─◌
│   └── [bafyreidprpdujgb2fvkcdhaifsj3xory5e2bpv3ufb775ixnulwt4os5oq]─◌
├── app.bsky.feed.post/3kps37ajdyk24
│   [bafyreidreozqllyctjtmylc2igwjccqnd437quvnbcf73746yswyk4ymya]─◌
└── [bafyreidwf4dy3442pdcotkdgpi3yanbyfkzcbk5p3tzad2cyquschuh4na]─◌
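
The layout choice between short and full CIDs could look something like the sketch below. The names `cidLabel` and `entryLines` are hypothetical, and the trailing ─◉ / ─◌ markers are left out to keep the example small.

```go
package main

import "fmt"

// cidLabel formats a CID pointer in square brackets: the whole CID when
// full is true, otherwise an ellipsis plus the last 7 characters.
func cidLabel(cid string, full bool) string {
	if full || len(cid) <= 7 {
		return "[" + cid + "]"
	}
	return "[…" + cid[len(cid)-7:] + "]"
}

// entryLines returns the display lines for a keyed entry. With short CIDs
// the key and pointer share one line; with full CIDs the pointer moves to
// its own line so the key stays easy to spot.
func entryLines(key, cid string, full bool) []string {
	if full {
		return []string{key, cidLabel(cid, true)}
	}
	return []string{key + " " + cidLabel(cid, false)}
}

func main() {
	key := "app.bsky.feed.post/3kps37ajdyk24"
	cid := "bafyreidreozqllyctjtmylc2igwjccqnd437quvnbcf73746yswyk4ymya"
	for _, full := range []bool{false, true} {
		for _, line := range entryLines(key, cid, full) {
			fmt.Println(line)
		}
	}
}
```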

@rafaelbsky rafaelbsky left a comment

I find it pretty cool!

If we ever go for colors, a pattern like this would be cool to keep track of the levels.

We could have an option for downloading repos "on-demand" instead of operating on previously downloaded files. I think that's generally more useful for investigations, etc.

Made a small example:

diff --git a/cmd/goat/repo.go b/cmd/goat/repo.go
index 45501445..a2a0466d 100644
--- a/cmd/goat/repo.go
+++ b/cmd/goat/repo.go
@@ -68,7 +68,7 @@ var cmdRepo = &cli.Command{
 		&cli.Command{
 			Name:      "mst",
 			Usage:     "show repo MST structure",
-			ArgsUsage: `<car-file>`,
+			ArgsUsage: `<at-identifier>`,
 			Flags:     []cli.Flag{},
 			Action:    runRepoMST,
 		},
@@ -210,17 +210,29 @@ func runRepoInspect(cctx *cli.Context) error {
 
 func runRepoMST(cctx *cli.Context) error {
 	ctx := context.Background()
-	carPath := cctx.Args().First()
-	if carPath == "" {
-		return fmt.Errorf("need to provide path to CAR file as argument")
+	username := cctx.Args().First()
+	if username == "" {
+		return fmt.Errorf("need to provide username as an argument")
 	}
-	fi, err := os.Open(carPath)
+	ident, err := resolveIdent(ctx, username)
+	if err != nil {
+		return err
+	}
+
+	// create a new API client to connect to the account's PDS
+	xrpcc := xrpc.Client{
+		Host: ident.PDSEndpoint(),
+	}
+	if xrpcc.Host == "" {
+		return fmt.Errorf("no PDS endpoint for identity")
+	}
+	repoBytes, err := comatproto.SyncGetRepo(ctx, &xrpcc, ident.DID.String(), "")
 	if err != nil {
 		return err
 	}
 
 	// read repository tree in to memory
-	r, err := repo.ReadRepoFromCar(ctx, fi)
+	r, err := repo.ReadRepoFromCar(ctx, bytes.NewReader(repoBytes))
 	if err != nil {
 		return err
 	}

You can then run goat repo mst divy.zone (or go run . repo mst divy.zone). If we did this, it could make sense for the command to live outside of the repo subcommand.

Other commands (at least list) have the same duality: they make sense both working from a downloaded file and talking directly to a PDS. If we like the approach, we could have a more general solution for these cases.

@devinivy devinivy self-assigned this Jan 3, 2025
@devinivy
Contributor Author

devinivy commented Jan 3, 2025

Tidied this up a bit.

  • --root flag allows jumping into the tree from a different root.
  • --full-cid flag displays full CIDs in output.
  • supports input from stdin (in addition to the file arg) so you can e.g. pipe a curl for a proof or repo directly to goat repo mst to visualize it.
❯ curl -s 'https://morel.us-east.host.bsky.network/xrpc/com.atproto.sync.getRecord?did=did:plc:l3rouwludahu3ui3bt66mfvj&collection=app.bsky.feed.like&rkey=3jwnnk25rrk2s' | \
    goat repo mst
[…t73n4ly]─◉
├── […3o6msve]─◉
│   ├── […veqbjjq]─◉
│   │   └── […s4zluvy]─◉
│   │       ├── […6qiyhle]─◌
│   │       ├── app.bsky.feed.like/3jqa6sh7bic2a […edo6qsa]─◌
│   │       ├── […xswayqm]─◌
│   │       ├── ∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙sfpt3arg22b […glnh3mi]─◌
│   │       ├── […55dhu6q]─◌
│   │       ├── ∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙urpqzpats2e […xqygjyq]─◌
│   │       ├── […pblbwl4]─◌
│   │       ├── ∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙wnnk25rrk2s […feoom4e]─◉
│   │       └── […3jwkksm]─◌
│   ├── app.bsky.feed.like/3kb4ztebs4324 […hc4ruf4]─◌
│   └── […4skfg44]─◌
├── app.bsky.feed.post/3kps37ajdyk24 […yk4ymya]─◌
└── […igt5xuy]─◌

@devinivy
Contributor Author

devinivy commented Jan 6, 2025

@rafaelbsky I appreciate the functionality you're going for there! The main things I'd miss with that approach are that (a) it only works on the live version of a repo, not a snapshot, and (b) it couldn't be used to inspect proofs or firehose event contents.

You may notice that goat repo mst now accepts input from stdin. So another option would be for us to design the commands to pipe together, e.g. goat repo export divy.zone | goat repo mst. In the example above I chain a curl to sync.getRecord, which returns a proof, and pipe it to goat repo mst, which feels pretty great to use. Thoughts?

@rafaelbsky

@devinivy That's nice! My intention was for it to be an option; I just left that out to keep the example simple.

But using pipes sounds like the ideal approach 👌

@devinivy devinivy changed the title goat: repo inspect --mst flag for displaying MST structure goat: repo mst command for displaying MST structure Jan 7, 2025
@devinivy devinivy merged commit 5e1b394 into main Jan 7, 2025
8 checks passed
@devinivy devinivy deleted the divy/goat-repo-inspect-mst branch January 7, 2025 14:23