
Scalability Sharding

Explorer edited this page Jun 5, 2018 · 1 revision

Notes on scalability and sharding, based on the YouTube video at https://www.youtube.com/watch?v=p3ytSdUQZzA

Persistence/Storage

Key questions to ask yourself:

MySQL or NoSQL? It depends on the application's requirements.

Is the data store read-optimized or write-optimized?

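At the time of the video, MongoDB was not optimized for writes: a single write lock meant only one client could be writing at a time. This was later relaxed to one writer per database at a time. Because concurrent writes are serialized, write latency can be high under a write-heavy load. (Later MongoDB versions using the WiredTiger storage engine, the default since MongoDB 3.2, lock at the document level.)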
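To make the lock-granularity point concrete, here is a toy Python sketch contrasting one global write lock with one write lock per database. It is not MongoDB's implementation; the classes `GlobalLockStore` and `PerDbLockStore` and the helper `can_write_concurrently` are hypothetical names for illustration.

```python
import threading
from collections import defaultdict


class GlobalLockStore:
    """Toy store (hypothetical): one write lock for everything."""

    def __init__(self):
        self._lock = threading.Lock()

    def lock_for(self, db_name):
        # Every database shares the same lock, so only one writer at a time.
        return self._lock


class PerDbLockStore:
    """Toy store (hypothetical): one write lock per database."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects creation of per-db locks

    def lock_for(self, db_name):
        with self._guard:
            return self._locks[db_name]


def can_write_concurrently(store, db_a, db_b):
    """Return True if a writer on db_b can start while a writer holds db_a."""
    first = store.lock_for(db_a)
    first.acquire()
    try:
        second = store.lock_for(db_b)
        acquired = second.acquire(blocking=False)  # don't wait, just test
        if acquired:
            second.release()
        return acquired
    finally:
        first.release()


if __name__ == "__main__":
    print(can_write_concurrently(GlobalLockStore(), "users", "orders"))  # False
    print(can_write_concurrently(PerDbLockStore(), "users", "orders"))   # True
    print(can_write_concurrently(PerDbLockStore(), "users", "users"))    # False
```

With the global lock, a write to `orders` has to wait for a write to `users`; with per-database locks only writers to the same database wait on each other.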

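Size of data: does the data fit in memory or not? Is this the permanent store, or is it a copy? Redis is an in-memory database, and you should budget up to 2 times the data size in RAM. Redis does persist to disk by taking a snapshot every so often: it forks a child process that writes the in-memory data set to disk. The fork shares memory with the parent copy-on-write, so every page the parent modifies while the snapshot is running gets duplicated; in the worst case this approaches 2 times the data size.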
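A back-of-the-envelope check for the 2x rule of thumb, as a Python sketch. It assumes the worst case, where nearly every page is rewritten while the snapshot child runs, so copy-on-write duplicates the whole data set; real overhead is usually lower. The function name `fits_in_memory` and its `headroom_factor` parameter are hypothetical.

```python
def fits_in_memory(dataset_gb: float, ram_gb: float, headroom_factor: float = 2.0) -> bool:
    """Return True if the data set fits in RAM with room for a snapshot fork.

    headroom_factor=2.0 models the worst case in which copy-on-write ends up
    duplicating every page of the parent process during a snapshot.
    """
    if dataset_gb < 0 or ram_gb <= 0 or headroom_factor < 1.0:
        raise ValueError("sizes must be positive and headroom_factor >= 1")
    return dataset_gb * headroom_factor <= ram_gb


print(fits_in_memory(6, 16))   # True: 12 GB worst case <= 16 GB
print(fits_in_memory(10, 16))  # False: 20 GB worst case > 16 GB
```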

Application requirements: what can you tolerate?
a) Performance, i.e. latency tolerance.
b) Durability, i.e. data-loss tolerance. When a server crashes, data that has not been written to disk may be lost.
c) Consistency, i.e. tolerance for weird behavior.
d) Availability, i.e. downtime tolerance. For example, when you need to update your application, can you take your website down while doing it? If your website needs to be up all the time, that will affect your choice of storage.

MySQL has overhead, but it gives you flexibility in how you query.

Key-based stores limit how you can query: you can only look data up by the key you used to insert it. If you later need to query by something else, you have to rebuild your whole database around the new key.
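A small Python sketch of this trade-off, using the standard-library `sqlite3` module as a stand-in for MySQL and a plain `dict` as a stand-in for a key-value store. The table, fields, and sample rows are hypothetical.

```python
import sqlite3

users = [
    ("u1", "alice@example.com", "DE"),
    ("u2", "bob@example.com", "US"),
    ("u3", "carol@example.com", "US"),
]

# Relational store: any column can be used in an ad hoc query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT, country TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", users)
us_ids_sql = [row[0] for row in conn.execute(
    "SELECT id FROM users WHERE country = ? ORDER BY id", ("US",))]

# Key-value store: lookups are only efficient by the key used on insert.
kv = {user_id: {"email": email, "country": country}
      for user_id, email, country in users}
print(kv["u2"]["email"])  # fast: lookup by the insert key

# Querying by another field means scanning every entry...
us_ids_scan = sorted(k for k, v in kv.items() if v["country"] == "US")

# ...or re-keying the data into a second copy keyed by the new field.
by_country = {}
for user_id, value in kv.items():
    by_country.setdefault(value["country"], []).append(user_id)

print(us_ids_sql, us_ids_scan, sorted(by_country["US"]))
```

The full scan gets slower as the data grows, and the re-keyed copy has to be kept in sync on every write, which is the "redo your whole database" cost in miniature.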

Safety (i.e. durability and consistency) is slow.
Consistency: transactions are needed for consistency, but they are slow.
Durability: persisting to disk increases latency. The server calls fsync before confirming to the client that its data has been saved; this guarantees that the data is on disk. Redis with snapshot-only persistence is not a durable data store: it only snapshots to disk occasionally, so writes made since the last snapshot can be lost in a crash. It should therefore only be used as a cache.
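A minimal Python sketch of a durable write next to a fast one: `durable_append` only returns (acknowledges) after `os.fsync` has returned, which is where the extra latency comes from. The function names and the temporary log path are hypothetical; a real database also batches fsyncs and handles partial writes.

```python
import os
import tempfile


def durable_append(path: str, record: str) -> bool:
    """Append a record and return True only once it has been synced to disk."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
        f.flush()             # move Python's buffer into the OS page cache
        os.fsync(f.fileno())  # ask the OS to write the page cache to the device
    return True               # only now is it safe to acknowledge the client


def fast_append(buffer: list, record: str) -> bool:
    """Acknowledge immediately; the record is lost if the process crashes."""
    buffer.append(record)
    return True


log_path = os.path.join(tempfile.mkdtemp(), "writes.log")
durable_append(log_path, "order=42 status=paid")
with open(log_path, encoding="utf-8") as f:
    print(f.read().strip())  # order=42 status=paid
```

On POSIX systems a newly created file strictly also needs its directory fsynced for the file's existence to be durable; the sketch leaves that out for brevity.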

How do you handle deletions of data?

Be aware of the laws of each country: for instance, Europe is much stricter than the US. You need to make sure data is deleted everywhere and that no copies are left sitting around.

Scaling
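One way to make "deleted everywhere" checkable is to route every deletion through a single function that knows each place a copy can live. This Python sketch uses plain dicts as hypothetical stand-ins for the primary database, its replicas, and a cache; the name `delete_everywhere` is also hypothetical, and backups and logs would need handling too.

```python
from typing import Dict, List


def delete_everywhere(key: str, primary: Dict[str, str],
                      replicas: List[Dict[str, str]],
                      caches: List[Dict[str, str]]) -> bool:
    """Delete key from every known store, then verify no copy remains."""
    stores = [primary, *replicas, *caches]
    for store in stores:
        store.pop(key, None)  # no error if this store never had a copy
    return all(key not in store for store in stores)


primary = {"user:7": "alice@example.com"}
replicas = [{"user:7": "alice@example.com"}, {"user:7": "alice@example.com"}]
cache = {"user:7": "alice@example.com", "user:8": "bob@example.com"}

print(delete_everywhere("user:7", primary, replicas, [cache]))  # True
print(cache)  # {'user:8': 'bob@example.com'}
```

The verification step looks redundant with in-memory dicts, but against real replicas with asynchronous replication it is where you would detect a copy that has not caught up yet.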

There are 3 ways to think about scaling any data store:
a) Replication: think about who is the master and who are the slaves (replicas).
b) Caching, for example with memcached. The downside is that you need to think about how stale the data can be. What should the expiration time be? How consistent does the cache need to be, given that it is a copy of the original data? There are also caches on the client, so decide what you want to cache and where you want to cache it.
c) Sharding, i.e. splitting data across many databases. How are you going to shard the data?
We could also use multiple types of data store: for example, memcached for static, non-changing data; Redis as an in-memory cache; and a transaction-based database for critical data such as credit card billing.
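A minimal sketch of one answer to "how are you going to shard the data?", assuming the simplest scheme: hash the key and take it modulo the number of shards. The shard names and the `shard_for` helper are hypothetical. It uses `hashlib` rather than Python's built-in `hash()`, because `hash()` of a string is randomized per process and would route the same key to a different shard after a restart.

```python
import hashlib
from typing import Dict, List


def shard_for(key: str, shards: List[str]) -> str:
    """Pick a shard for a key using a stable hash modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


shards = ["db-0", "db-1", "db-2"]
placement: Dict[str, str] = {f"user:{i}": shard_for(f"user:{i}", shards)
                             for i in range(1000)}

# The same key always maps to the same shard.
assert shard_for("user:42", shards) == placement["user:42"]

# Adding a shard changes the modulus, so many existing keys move.
moved = sum(1 for key, old in placement.items()
            if shard_for(key, shards + ["db-3"]) != old)
print(f"{moved} of {len(placement)} keys would move")
```

Going from 3 to 4 shards under plain modulo hashing moves roughly three quarters of the keys, which is the main weakness of this scheme; consistent hashing or a lookup directory are common alternatives when shards are added often.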