chore(metastore): Update metastores after Flushing a dataobj #15883
Conversation
Reintroduce a more efficient sorting implementation for the logs section:

- Log records across streams are accumulated into different sets of builders.
- At encode time, each stream is sorted by timestamp, and then all streams are combined and encoded into a final data set.

As the set of streams resets between encodes, column builders are pooled so they can easily be reused after an encode.
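As a rough illustration of this accumulate-then-sort-at-encode pattern (a sketch only: `Record`, `streamBuilder`, and `logsBuilder` are hypothetical stand-ins, not the actual dataobj types):

```go
// Hypothetical sketch of the accumulate-then-sort-at-encode pattern;
// Record, streamBuilder, and logsBuilder are illustrative names only.
package logs

import (
	"sort"
	"sync"
	"time"
)

type Record struct {
	StreamID  uint64
	Timestamp time.Time
	Line      string
}

type streamBuilder struct{ records []Record }

// builderPool allows builders to be reused after an encode resets the
// set of streams, as described above.
var builderPool = sync.Pool{
	New: func() any { return &streamBuilder{} },
}

type logsBuilder struct {
	streams map[uint64]*streamBuilder
}

// Append accumulates a record into the builder for its stream.
func (b *logsBuilder) Append(r Record) {
	sb, ok := b.streams[r.StreamID]
	if !ok {
		sb = builderPool.Get().(*streamBuilder)
		b.streams[r.StreamID] = sb
	}
	sb.records = append(sb.records, r)
}

// Encode sorts each stream by timestamp, combines all streams into one
// data set, and returns the per-stream builders to the pool.
func (b *logsBuilder) Encode() []Record {
	var combined []Record
	for id, sb := range b.streams {
		sort.Slice(sb.records, func(i, j int) bool {
			return sb.records[i].Timestamp.Before(sb.records[j].Timestamp)
		})
		combined = append(combined, sb.records...)
		sb.records = sb.records[:0]
		builderPool.Put(sb)
		delete(b.streams, id)
	}
	return combined
}
```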
```diff
@@ -181,6 +183,45 @@ func NewBuilder(cfg BuilderConfig, bucket objstore.Bucket, tenantID string) (*Bu
 	}, nil
 }

+// FromExisting updates this builder with content from an existing data object,
+// replicating all the state like stream IDs and logs.
+func (b *Builder) FromExisting(f io.ReadSeeker) error {
```
I'm not a massive fan of this method, and I think it could be made more efficient. I want to test it out before committing to any improvements here though.
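For context, a purely illustrative reconstruction of the read-and-replay contract the doc comment describes; `decoder`, `Stream`, `Record`, and the `Builder` fields here are hypothetical stand-ins, not the dataobj package's real types:

```go
// Purely illustrative reconstruction of FromExisting's contract.
// decoder, Stream, Record, and Builder fields are hypothetical
// stand-ins, not the dataobj package's real types.
package sketch

import (
	"fmt"
	"io"
)

type Stream struct{ ID uint64 }
type Record struct{ StreamID uint64 }

type decoder struct {
	streams []Stream
	records []Record
}

// newDecoder would parse the existing object's sections from f.
func newDecoder(f io.ReadSeeker) (*decoder, error) { return &decoder{}, nil }

type Builder struct {
	streams map[uint64]Stream
	logs    []Record
}

// FromExisting replays streams first, so stream IDs are assigned
// identically, then replays log records, leaving the builder as if it
// had produced the existing object itself.
func (b *Builder) FromExisting(f io.ReadSeeker) error {
	dec, err := newDecoder(f)
	if err != nil {
		return fmt.Errorf("opening existing object: %w", err)
	}
	for _, s := range dec.streams {
		b.streams[s.ID] = s
	}
	b.logs = append(b.logs, dec.records...)
	return nil
}
```

The efficiency concern presumably comes from this shape: every flush cycle would re-read and re-decode the full existing object before appending.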
🎉
I'm not sure about this approach long-term (data objects aren't very friendly to appends after they've already been written), but I don't want to block this moving forward.
I left a few small comments about package/API design. None of them need to be addressed right now in the prototyping phase, but they should probably be addressed before merging into main.
```go
}

b.state = builderStateFlush

func (b *Builder) Flush(ctx context.Context) (string, error) {
	_, err := b.FlushToBuffer()
```
By the way, this changes the behaviour of Flush: calling Flush immediately after a successful flush will now re-write the same object.
I've moved Reset back into this method now that I'm returning the FlushResult summary. Does that fix the issue?
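A minimal sketch of that shape, assuming a hypothetical `FlushResult` type and `objectPath` helper (the real method may differ):

```go
// Sketch: Reset happens inside Flush after a successful upload, so an
// immediate second Flush becomes a no-op rather than re-writing the
// object. FlushResult and objectPath are hypothetical names.
func (b *Builder) Flush(ctx context.Context) (FlushResult, error) {
	buf, err := b.FlushToBuffer()
	if err != nil {
		return FlushResult{}, err
	}
	path := b.objectPath()
	if err := b.bucket.Upload(ctx, path, bytes.NewReader(buf.Bytes())); err != nil {
		return FlushResult{}, err // built object is kept so the upload can be retried
	}
	res := FlushResult{Path: path}
	b.Reset() // clears buffered state; the next Flush cannot re-upload this object
	return res, nil
}
```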
```go
err = p.writeMetastores(backoff, dataobjPath)
if err != nil {
	level.Error(p.logger).Log("msg", "failed to write metastores", "err", err)
	return
}

// Reset builder after flushing & storing in metastore
p.builder.Reset()
```
It seems like a few of these changes move logic into the processor:
- Resetting the state after a flush
- Digging into the state of the builder before writing the metastore index
- Splitting out flushing to object storage and flushing to a buffer
I think we can allow the metastore builder to have access to everything it needs without changing any of the above by returning some kind of report/summary on a successful call to Flush:
```go
package dataobj

// Flush flushes all buffered data to object storage. Calling Flush can result
// in a no-op if there is no buffered data to flush.
//
// If Flush builds an object but fails to upload it to object storage, the
// built object is cached and can be retried. [Builder.Reset] can be called to
// discard any pending data and allow new data to be appended.
//
// On a successful flush, a summary describing what was flushed is included.
func (b *Builder) Flush() (Summary, error)

// A Summary summarizes the data included in a flush from a [Builder].
type Summary struct {
	ObjectPath string   // Object storage path that was flushed to.
	Streams    []Stream // Streams included in the flush.
}

// A Stream is an individual stream within a data object.
type Stream struct {
	// (Copy or subset of streams.Stream to avoid exposing internal API in an external package)
}
```
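With that API, the calling side in the processor could stay roughly as simple as the following sketch (the extra `writeMetastores` parameters are assumptions for illustration, not part of this PR):

```go
// Hypothetical caller: the processor flushes and hands the summary to the
// metastore writer instead of inspecting builder internals.
summary, err := p.builder.Flush()
if err != nil {
	level.Error(p.logger).Log("msg", "failed to flush builder", "err", err)
	return
}
if err := p.writeMetastores(backoff, summary.ObjectPath, summary.Streams); err != nil {
	level.Error(p.logger).Log("msg", "failed to write metastores", "err", err)
	return
}
```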
Ah, this is a nice idea, thank you! This logic evolved from trying to figure out what I needed from the dataobj so I never stepped back to figure out a nicer way to organise it. I'll give this a go!
```diff
@@ -184,3 +202,78 @@ func (p *partitionProcessor) processRecord(record *kgo.Record) {
 		}
 	}
 }

+func (p *partitionProcessor) writeMetastores(backoff *backoff.Backoff, dataobjPath string) error {
```
Should we have some kind of `dataobj/metastore` package that's responsible for building and operating on metastores?
yes.
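One possible shape for such a package, purely illustrative (the `Updater` name and API are assumptions, not from this PR):

```go
// Hypothetical outline of a dataobj/metastore package that owns building
// and updating metastore objects; Updater and its API are illustrative.
package metastore

import (
	"context"

	"github.com/thanos-io/objstore"
)

type Updater struct {
	bucket   objstore.Bucket
	tenantID string
}

func NewUpdater(bucket objstore.Bucket, tenantID string) *Updater {
	return &Updater{bucket: bucket, tenantID: tenantID}
}

// Update records a flushed data object in the metastore(s) covering its
// time range, keeping retry/backoff policy inside the package rather
// than in the partition processor.
func (u *Updater) Update(ctx context.Context, dataobjPath string) error {
	// Read-modify-write of the relevant metastore object(s) would go here.
	return nil
}
```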
```go
	s.Rows = 0
}

var streamPool = sync.Pool{
```
This pool added about 10% more ops in the benchmark when re-using dataobjs between metastores. In reality we won't be reusing Streams very often (once per flush), so it might not be worth keeping this pool, as the Stream objects will likely be deallocated between runs.
WDYT?
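For reference, the pattern under discussion is the standard `sync.Pool` get/put cycle around `Stream` objects, roughly as below (`getStream` and `putStream` are hypothetical helpers):

```go
// Illustrative get/put cycle for the pool discussed above. Whether the
// pool pays off depends on how often Stream objects are actually reused.
func getStream() *Stream {
	return streamPool.Get().(*Stream)
}

func putStream(s *Stream) {
	s.Rows = 0 // zero per-use state (as in the diff above) before pooling
	streamPool.Put(s)
}
```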
```diff
@@ -61,10 +78,36 @@ func New(metrics *Metrics, pageSize int) *Streams {
 	return &Streams{
 		metrics:  metrics,
 		pageSize: pageSize,
-		lookup:   make(map[uint64][]*Stream),
+		lookup:   make(map[uint64][]*Stream, 1024),
```
This yielded a 10-20% speed-up over the baseline. We're not likely to have this many Streams in a metastore, but I think it would be worth it for the logs dataobj.
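The speed-up comes from avoiding incremental map growth: the capacity hint reserves bucket space up front, e.g.:

```go
// The capacity hint pre-allocates bucket space so insertions avoid
// repeated growth and rehashing; the map still grows past 1024 entries
// if needed.
lookup := make(map[uint64][]*Stream, 1024)
```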