Periodically clean up cached bundles directory #5976

Open: martintomazic wants to merge 3 commits into master from martin/feature/cached_bundles_clean-up

@martintomazic (Contributor) commented Dec 17, 2024

What

Why

Save disk space and ease maintenance.

How

  1. Regular and detached exploded bundles that are no longer present in the config are removed during discovery startup. This way we do not block initialization. Done in "go/runtime/bundle: Cleanup bundles on startup" #6003.
  2. Regular bundles with a version lower than the active one are removed by watching for new epochs and checking whether the bundle registry holds bundles lower than the active version for the current epoch.
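
The selection in step 2 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Version` struct and `StaleVersions` helper are hypothetical stand-ins, and only the idea of a `Less` comparison against the active version comes from the discussion.

```go
package main

import "fmt"

// Version is a simplified, hypothetical stand-in for a runtime version.
type Version struct {
	Major, Minor, Patch uint16
}

// Less reports whether v is strictly lower than other.
func (v Version) Less(other Version) bool {
	if v.Major != other.Major {
		return v.Major < other.Major
	}
	if v.Minor != other.Minor {
		return v.Minor < other.Minor
	}
	return v.Patch < other.Patch
}

// StaleVersions returns the cached versions that are strictly lower than
// the active one. The active version itself and newer ones are kept.
func StaleVersions(cached []Version, active Version) []Version {
	var stale []Version
	for _, v := range cached {
		if v.Less(active) {
			stale = append(stale, v)
		}
	}
	return stale
}

func main() {
	cached := []Version{{0, 0, 0}, {0, 1, 0}, {0, 2, 0}}
	// With 0.1.0 active, only 0.0.0 qualifies for removal.
	fmt.Println(StaleVersions(cached, Version{0, 1, 0}))
}
```

Versions higher than the active one are deliberately kept, since they may belong to an upgrade that has been scheduled but is not yet active.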

How to test

e2e

.buildkite/scripts/test_e2e.sh --scenario e2e.runtime.runtime-upgrade

netlify bot commented Dec 17, 2024

Deploy Preview for oasisprotocol-oasis-core canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | f8a7391 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/oasisprotocol-oasis-core/deploys/67936095aa909e0008f9d442 |

@martintomazic martintomazic force-pushed the martin/feature/cached_bundles_clean-up branch from 8682e72 to 6e5f668 Compare January 7, 2025 04:52
@martintomazic martintomazic changed the title Martin/feature/cached bundles clean up Periodically clean up cached bundles directory Jan 7, 2025
@martintomazic martintomazic force-pushed the martin/feature/cached_bundles_clean-up branch 2 times, most recently from 9ef3941 to 0869adc Compare January 10, 2025 03:27
@kostko kostko linked an issue Jan 10, 2025 that may be closed by this pull request
@martintomazic martintomazic force-pushed the martin/feature/cached_bundles_clean-up branch 3 times, most recently from c8ded6d to f3f52e3 Compare January 10, 2025 19:20
@martintomazic martintomazic marked this pull request as ready for review January 10, 2025 19:27
@ptrus (Member) left a comment

Didn't dive too deeply into the overall logic, but it looks good based on an initial look! I left a couple of minor comments on the code.

@martintomazic martintomazic force-pushed the martin/feature/cached_bundles_clean-up branch from f3f52e3 to 4ed1a32 Compare January 11, 2025 22:05
@martintomazic (Contributor Author) commented Jan 13, 2025

// and cached bundles (that are guaranteed to be exploded) to the registry.
func (d *Discovery) Init() error {
	// Consolidate all bundles in one place, which could be useful
	// if we implement P2P sharing in the future.
	if err := d.copyBundles(); err != nil {
		return err
	}

	// Add copied and cached bundles (that are guaranteed to be exploded)
	// to the registry.
	if err := d.Discover(); err != nil {
		return err
	}

I think we will either have to (1) stop copying bundles configured via the legacy path or (2) block at init time for the cleanup.

With the current design, even if you remove the bundle from the config (bundle path), it was previously copied as part of d.copyBundles. This means any subsequent reboot calls Discover, which adds the bundle to the registry even if it is no longer configured. This is done at initialization and is blocking.

Unless we do the cleanup before that (we don't, as we would block further with the cleanup?), you cannot know afterwards (GetConfiguredRuntimeIDs function) which bundle is stale and which one is not.

Update: this actually has a further implication.

I have confirmed that master currently has a bug. Concretely, whenever you run a node configured via the deprecated runtime/paths or the new runtime.Runtimes, the bundle is copied to the bundle dir. From then on, even if you remove the bundle from the configuration, Discover will still find it and add the runtime to the registry, which will then also be displayed by oasis-node control status.

Update of update:
Fixed here: #6003

@martintomazic martintomazic force-pushed the martin/feature/cached_bundles_clean-up branch from dbddfa5 to 8bfe9f6 Compare January 14, 2025 11:15
@martintomazic martintomazic force-pushed the martin/feature/cached_bundles_clean-up branch 2 times, most recently from 64a7aaa to 622d0bd Compare January 14, 2025 11:57
@martintomazic (Contributor Author) commented Jan 14, 2025

One last thing: do we want an e2e test for both my bug (nil runtimeIDs) and the one currently in master?

While the latter doesn't bother me that much, I find it a bit scary that tests pass even when #5976 (comment).

I can confirm, though, that if I delete bundles too early (when I receive a new runtime descriptor), as was the case here, the runtime is actually suspended and the test fails. This is good.

Update:
The test is now failing as it should. It was a false positive before due to the discovery bug interleaving (fixed here: 562ab5f).

@martintomazic martintomazic force-pushed the martin/feature/cached_bundles_clean-up branch from 9cc236e to 3d4b25d Compare January 14, 2025 17:34
codecov bot commented Jan 14, 2025

Codecov Report

Attention: Patch coverage is 84.69388% with 15 lines in your changes missing coverage. Please review.

Project coverage is 65.16%. Comparing base (6355ca3) to head (2310fd6).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| go/runtime/bundle/manager.go | 78.43% | 7 Missing and 4 partials ⚠️ |
| go/runtime/bundle/manifest.go | 75.00% | 1 Missing and 1 partial ⚠️ |
| go/runtime/bundle/registry.go | 92.30% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #5976       +/-   ##
===========================================
+ Coverage        0   65.16%   +65.16%     
===========================================
  Files           0      631      +631     
  Lines           0    64508    +64508     
===========================================
+ Hits            0    42036    +42036     
- Misses          0    17548    +17548     
- Partials        0     4924     +4924     

@martintomazic martintomazic force-pushed the martin/feature/cached_bundles_clean-up branch from 3d4b25d to cba543b Compare January 14, 2025 23:04
return
default:
if v.Less(active) {
r.logger.Info("Removing bundle with version lower then active",
Contributor

I would first log that we are removing all bundles with a version lower than active, and then, inside the for loop, log every bundle that is removed. This way you can see that we tried to remove bundles even when nothing needed to be done.

Contributor Author

I don't think this is desirable since this function is called every time an epoch changes. I would prefer logging only when we do an actual removal. Anyway, let's see how this changes once it is rebased on top of the manager.

Contributor

Epochs are every 1h. You can log almost whatever you want as Info. This way you can be sure that background tasks are triggered, even when they do nothing because nothing was upgraded.

@martintomazic martintomazic force-pushed the martin/feature/cached_bundles_clean-up branch from cba543b to 5197ed3 Compare January 16, 2025 12:30
@peternose (Contributor) commented

if I delete bundles too early

If you fetch the current epoch, read active version for that epoch, and delete all previous versions, the bundles should not be deleted too early.
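
A minimal sketch of that rule, under the assumption that each deployment carries the epoch from which it is valid. The `Deployment` type and both helpers are hypothetical and only illustrate why resolving the active version for the current epoch avoids deleting too early.

```go
package main

import "fmt"

// Deployment is a hypothetical, simplified runtime deployment: a version
// that becomes valid at a given epoch.
type Deployment struct {
	Version   string
	ValidFrom uint64
}

// ActiveAt returns the deployment with the latest ValidFrom that is not
// after epoch, or false if no deployment is valid yet.
func ActiveAt(deployments []Deployment, epoch uint64) (Deployment, bool) {
	var active Deployment
	found := false
	for _, d := range deployments {
		if d.ValidFrom > epoch {
			continue // Scheduled for the future, not active yet.
		}
		if !found || d.ValidFrom > active.ValidFrom {
			active, found = d, true
		}
	}
	return active, found
}

// SafeToRemove returns the deployments that became valid before the one
// active at epoch. It returns nil while no deployment is active.
func SafeToRemove(deployments []Deployment, epoch uint64) []Deployment {
	active, ok := ActiveAt(deployments, epoch)
	if !ok {
		return nil
	}
	var out []Deployment
	for _, d := range deployments {
		if d.ValidFrom < active.ValidFrom {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	ds := []Deployment{{"0.0.0", 0}, {"0.1.0", 10}}
	fmt.Println(len(SafeToRemove(ds, 5)))  // Upgrade known but not active.
	fmt.Println(len(SafeToRemove(ds, 10))) // Upgrade active, 0.0.0 is stale.
}
```

At epoch 5 the upgrade to 0.1.0 is already known but 0.0.0 is still active, so nothing is removed; deleting on receipt of the new descriptor would have removed the bundle the runtime still needs.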

@martintomazic martintomazic force-pushed the martin/feature/cached_bundles_clean-up branch from 5197ed3 to c405417 Compare January 17, 2025 15:55
@martintomazic martintomazic force-pushed the martin/feature/cached_bundles_clean-up branch from c405417 to 3b4eb79 Compare January 17, 2025 16:22
@martintomazic (Contributor Author) commented

If you fetch the current epoch, read active version for that epoch, and delete all previous versions, the bundles should not be deleted too early.

Correct. I was just confirming that my e2e test fails when the clean-up is not implemented the way you describe above. This was happening here: #5976 (comment), when I initially misunderstood the registry updates and thus deleted things too early. :)

@martintomazic (Contributor Author) commented

Freshly rebased on top of #6003. Should be ready for a second round of reviews. :)

return sc.RunTestClientAndCheckLogs(ctx, childEnv)
}

func ensureCorrectBundlesDir(logger *logging.Logger, workerName, workerDir string) error {
logger.Info("ensuring cached exploded bundle for version 0.0.0 was removed",
Contributor

Suggested change
- logger.Info("ensuring cached exploded bundle for version 0.0.0 was removed",
+ logger.Info("verifying exploded bundle directories")

You don't need to be so specific.

@martintomazic (Contributor Author) commented Jan 20, 2025

Still within a line? Some extra details don't hurt, especially when the test actually fails...

if up := r.updateActiveDescriptor(ctx); up && !activeInitialized {
close(r.activeDescriptorCh)
activeInitialized = true
}

// Trigger clean-up for bundles less than active version.
Contributor

We will need to move this code, and the code below that triggers downloads, to the manager once that is possible (i.e. when there are no more dependencies). Maybe this is already possible 🤔 In the future, however, the manager should subscribe to active and registry descriptor events (maybe only to the latter) and trigger cleanup and download when needed.

Contributor Author

Hm, so you mean that discovery is no longer started via runtime.registry but instead directly as a background service where we start the common workers?

However, in the future the manager should register to active and registry descriptor events (maybe only to the latter)

And new epochs, I assume? I think ActiveVersion() together with the registry descriptor would suffice.

I would prefer to do the refactor in a follow-up unless we find quick consensus here :)

Contributor

Agreed, I will move it in another PR; just mentioning it here.

// CleanStaleBundles removes outdated manifest hashes and deletes corresponding
// exploded bundles for runtimes in the clean-up queue.
func (m *Manager) CleanStaleBundles() {
m.logger.Info("removing regular bundles with version less than active")
Contributor

Suggested change
- m.logger.Info("removing regular bundles with version less than active")
+ m.logger.Info("cleaning bundles")

If you mention the active version, one would want to know what the active version is. A better way is to write a simple log message, so that we know the cleanup was triggered. You could also add a similar "downloading bundles" message to Download.

@martintomazic martintomazic force-pushed the martin/feature/cached_bundles_clean-up branch 3 times, most recently from da9d9aa to 83a760b Compare January 20, 2025 23:41
@martintomazic martintomazic force-pushed the martin/feature/cached_bundles_clean-up branch 2 times, most recently from 12de758 to 7474376 Compare January 22, 2025 12:45
// bundle removed from its bundles dir.
for _, worker := range sc.Net.ComputeWorkers() {
if err := sc.verifyBundleDir(ctx, worker); err != nil {
sc.Logger.Error("compute worker bundle dir clean-up error",
Contributor

This log could be omitted as you already log in verifyBundleDir, and the error will be logged anyway.

if err != nil {
return err
}
// if n := len(entries); n != 1 {
Contributor

Uncomment.

return fmt.Errorf("%s is not a dir", entry.Name())
}

// Ensure exploded cached bundle is for the latest version (0.1.0).
Contributor

This contradicts the info log message above, which should be improved.

Contributor Author

"ensuring cached exploded bundle for version 0.0.0 was removed" ?

Above it says the bundle for 0.0.0 was removed; here it says the one for 0.1.0 is left?

Happy to simplify this; I agree it may read as odd.

// Fetch registry descriptor.
rt, err := sc.Net.Controller().Registry.GetRuntime(ctx, &registry.GetRuntimeQuery{
Height: consensus.HeightLatest,
ID: sc.Net.Runtimes()[sc.upgradedRuntimeIndex].ID(),
Contributor

You could just use the key-value runtime ID.

return fmt.Errorf("failed to unmarshal dir name to hash")
}
if !want.Equal(&got) {
return fmt.Errorf("unexpected exploded bundle hash: want %v, got %v", want, got)
Contributor

This message could be improved to say that the folder name is incorrect or that the folder content failed to verify, as this is what we are testing.

@@ -19,14 +19,20 @@ import (
rtConfig "github.com/oasisprotocol/oasis-core/go/runtime/config"
)

// explodedManifest is manifest with corresponding exploded bundle dir.
type explodedManifest struct {
Contributor

I thought that this would be public, like ExplodedComponent.

regularManifests map[common.Namespace]map[version.Version]*Manifest
components map[common.Namespace]map[component.ID]map[version.Version]*ExplodedComponent
notifiers map[common.Namespace]*pubsub.Broker
explodedManifests map[hash.Hash]explodedManifest
Contributor

All manifests are exploded, so this renaming is not needed.

r.mu.RLock()
defer r.mu.RUnlock()
var manifests []*Manifest
for _, m := range r.explodedManifests {
Contributor

See the maps (and slices) libs to optimize this.
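
As an illustration of that suggestion, the loop that collects map values can be replaced with the iterator helpers from the standard `maps` and `slices` packages. This sketch assumes Go 1.23 or newer; the `Manifest` type and the string-keyed map are simplified stand-ins for the registry's real types.

```go
package main

import (
	"fmt"
	"maps"
	"slices"
)

// Manifest is a simplified, hypothetical stand-in for a bundle manifest.
type Manifest struct {
	Name string
}

// Manifests returns all manifests stored in the map. The order of the
// result is unspecified, as with any map iteration.
func Manifests(byHash map[string]*Manifest) []*Manifest {
	return slices.Collect(maps.Values(byHash))
}

func main() {
	byHash := map[string]*Manifest{
		"aa": {Name: "runtime-a"},
		"bb": {Name: "runtime-b"},
	}
	fmt.Println(len(Manifests(byHash)))
}
```

If the map values wrap the manifest in another struct, as in the snippet under review, an explicit loop or a small mapping step is still needed to extract the inner pointer.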

defer r.mu.Unlock()
explManifest, ok := r.explodedManifests[hash]
if !ok {
return "", fmt.Errorf("missing manifest with hash %s", hash.Hex())
Contributor

This should return false; a missing manifest is not an error.
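
A minimal sketch of the suggested shape, using Go's usual "comma ok" convention. The `registry` type and `remove` method here are hypothetical and much smaller than the real registry; they only show a missing hash reported as `false` rather than as an error.

```go
package main

import "fmt"

// registry is a hypothetical, stripped-down registry that maps a manifest
// hash to its exploded bundle directory.
type registry struct {
	explodedDirs map[string]string
}

// remove deletes the entry for hash and returns its directory. The boolean
// is false when the hash is unknown, which callers can treat as a no-op.
func (r *registry) remove(hash string) (string, bool) {
	dir, ok := r.explodedDirs[hash]
	if !ok {
		return "", false
	}
	delete(r.explodedDirs, hash)
	return dir, true
}

func main() {
	r := &registry{explodedDirs: map[string]string{"aa": "/bundles/aa"}}
	fmt.Println(r.remove("aa")) // Known hash: directory and true.
	fmt.Println(r.remove("aa")) // Already removed: empty string and false.
}
```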

for _, c := range explManifest.manifest.Components {
delete(r.components[runtimeID][c.ID()], c.Version)
}
return explManifest.explodedDir, nil
Contributor

If Manifests() returns []*ExplodedManifest, you don't need this.

@martintomazic martintomazic force-pushed the martin/feature/cached_bundles_clean-up branch 2 times, most recently from 2310fd6 to 7f1fb9b Compare January 23, 2025 12:00
@martintomazic martintomazic force-pushed the martin/feature/cached_bundles_clean-up branch from 7f1fb9b to 43c5a11 Compare January 23, 2025 14:12
@martintomazic martintomazic force-pushed the martin/feature/cached_bundles_clean-up branch from 43c5a11 to f8a7391 Compare January 24, 2025 09:42
Successfully merging this pull request may close these issues.

Periodically clean up cached bundles directory
3 participants