Scalability and limits #1395
-
Hi Alexander, thanks for dropping us a line! Unless your retention period is very short, 2 TB/day will be uncomfortable on a single Seq node. The current version does not scale out across nodes, as you noted, though we're actively working on this. If your existing infrastructure is using Kibana for metrics-heavy visualization, Seq may not be a great fit - Seq's strength is in diagnostics with structured application logs, where levelling and filtering mean many more use cases come in well under multiple-TB/day ingest rates.

RE your other questions, though - data that hasn't been indexed will still be written to disk; the 85 GB won't be held in RAM only. (Seq separately buffers recent data in RAM to speed up queries until indexing is applied.) Indexing in Seq is applied through signals only (it's not inferred by the query planner), so to speed up your most common queries you'd create signals covering them.

Seq and Kibana have some areas of overlap, but they're quite different products, so mapping one directly across onto the other might feel quite awkward; I'd still be interested to dig in and understand your use case better, if you think there might be some value in exploring it further.
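For illustration only (this sketch is not from the original reply): a signal in Seq is essentially a named, saved filter expression, and applying several signals at once intersects their filters. The property names `Application` and `Elapsed` below are assumptions, not from the thread:

```
Application = 'ingest'
@Level = 'Error' or @Level = 'Fatal'
Elapsed > 1000
```

Each line would be the filter for a separate signal - e.g. one per service, one for errors, one for slow operations - and indexing is then maintained for the events matching those signals.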
-
Thanks for responding. To be clear, we are mostly not using Kibana for metrics, only structured logging. Metrics go into Prometheus, and we only extract a couple of things from logs, which also feed into Prometheus. We are looking to replace Kibana for structured logging - that is, running ad-hoc queries against logs for diagnostic purposes. This means the variables (log fields) can be anything, and the user wants to be able to cross-cut against any dimension at query time and get great performance while doing so. Does this match the primary use case of Seq?

I'm not sure I understand Seq's indexing. Does this mean that when you create a "signal", Seq goes and retroactively indexes all the data matching that signal? Doesn't that mean there will be a delay unless you assiduously pre-index by creating signals, e.g. one per application? (We have a few dozen microservices and many cross-cutting fields like host names, shards, etc.) And how will it perform if you don't have a signal?

For example, let's say we are experiencing an application problem. A typical Kibana query a user might run would be something like:
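(A hypothetical Lucene-style query of the kind described; the field names `app`, `level`, and `host` are illustrative assumptions, not from the thread:)

```
app:"ingest" AND level:"ERROR" AND host:"web-42"
```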
Then the user would typically start filtering out noise (like irrelevant debug messages), tweaking the timeframe, etc. to get at the problem. Often, Kibana's "View surrounding documents" is super useful for getting a window across all log statements, which can then be filtered to eliminate noise. Being able to chart the data more easily is something we'd also like: Kibana is frankly awful at this, and Seq's ability to easily view any query as a chart looks really useful. The same goes for exporting slices of the log.

Edit: Sounds like Seq wouldn't actually be able to deal with the load right now. Do you have a planned release timeline for clustering?
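As a sketch of the charting workflow mentioned above (the `Application` property name is an assumption, not from the thread), a Seq SQL-style query along these lines groups events into hourly buckets, and Seq can render the grouped result as a chart:

```
select count(*)
from stream
where Application = 'ingest' and @Level = 'Error'
group by time(1h)
```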
-
I'm trying to find out how well Seq scales to large data sizes. We are currently logging roughly 2TB/day (about 85 GB per hour) to Kibana. I can't find any documentation on Seq's limitations or expected performance in such scenarios.
The indexing page says:
Does this really mean that in our case it would log 85GB to RAM before flushing?
I would also like to know more about exactly how Seq indexes the data. If I do a search such as `module = 'ingest' and elapsed > 1000`, how does the query planner break down this search? How are the fields indexed? Does Seq use columnar storage?

The documentation also seems to indicate that Seq does not have replication. So it sounds like Seq requires that all data fit on a single node, and queries are never sharded or distributed?