Skip to content

Commit

Permalink
Size your shards rewrite
Browse files Browse the repository at this point in the history
  • Loading branch information
thekofimensah committed Jan 17, 2025
1 parent 393ea61 commit b0c1bf3
Showing 1 changed file with 16 additions and 14 deletions.
30 changes: 16 additions & 14 deletions docs/reference/how-to/size-your-shards.asciidoc
Original file line number Diff line number Diff line change
@@ -1,17 +1,22 @@
[[size-your-shards]]
== Size your shards

Each index in {es} is divided into one or more shards, each of which may be
replicated across multiple nodes to protect against hardware failures. If you
are using <<data-streams>> then each data stream is backed by a sequence of
indices. There is a limit to the amount of data you can store on a single node
so you can increase the capacity of your cluster by adding nodes and increasing
the number of indices and shards to match. However, each index and shard has
some overhead and if you divide your data across too many shards then the
overhead can become overwhelming. A cluster with too many indices or shards is
said to suffer from _oversharding_. An oversharded cluster will be less
efficient at responding to searches and in extreme cases it may even become
unstable.
Proper shard sizing is crucial for maintaining the performance and stability of an {es} cluster. _Oversharding_ occurs when data is distributed across an excessive number of shards, which can degrade search performance and make the cluster unstable. Conversely, very large shards may slow down search operations and prolong recovery times after failures. To strike the right balance, the <<shard-size-recommendation,general guidelines>> are to aim for shard sizes between 10GB and 50GB, keeping the per-shard document count below 200 million. To ensure that each node is working optimally, it's important to distribute shards evenly across nodes. Having uneven shard distribution will make some nodes work harder than others leading to performance degradation and potential instability.

Despite these general guidelines, developing a tailored <<create-a-sharding-strategy, sharding strategy>> that considers your specific infrastructure, use case, and performance expectations is essential. Regular monitoring and adjustments based on real-world performance will help ensure optimal cluster health.

[discrete]
[[what-is-a-shard]]
=== What is a shard?

A shard is a basic unit of storage in {es}. Every index is divided into one or more shards to help distribute data and workload across nodes in a cluster. This division allows {es} to handle large datasets and perform operations like searches and indexing efficiently. Here’s how shards work:


* *Data Distribution:* Each shard contains a portion of the data from the index. When you add more nodes to your cluster, {es} will spread the shards across the nodes, balancing the workload between them.
* *Replication:* Shards can have replicas which are copies of the original shard. Replicas ensure data availability and improve search performance by allowing multiple nodes to handle requests.
* *Parallel Processing:* Shards enable {es} to process queries in parallel across nodes, making searches faster and more efficient.

By effectively using shards, {es} can scale horizontally and provide fault tolerance, ensuring your data is distributed and queries are processed efficiently.

[discrete]
[[create-a-sharding-strategy]]
Expand Down Expand Up @@ -208,7 +213,6 @@ index can be <<indices-delete-index,removed>>. You may then consider setting
<<indices-add-alias,Create Alias>> against the destination index for the source
index's name to point to it for continuity.

See this https://www.youtube.com/watch?v=sHyNYnwbYro[fixing shard sizes video] for an example troubleshooting walkthrough.

[discrete]
[[shard-count-recommendation]]
Expand Down Expand Up @@ -572,8 +576,6 @@ PUT _cluster/settings
}
----

See this https://www.youtube.com/watch?v=tZKbDegt4-M[fixing "max shards open" video] for an example troubleshooting walkthrough. For more information, see <<troubleshooting-shards-capacity-issues,Troubleshooting shards capacity>>.

[discrete]
[[troubleshooting-max-docs-limit]]
==== Number of documents in the shard cannot exceed [2147483519]
Expand Down

0 comments on commit b0c1bf3

Please sign in to comment.