Use of Sharding in MongoDB
Sharding is the method of storing data records across several machines and it is the approach of MongoDB to meet the data growth requirements. A single machine can not be adequate to store the data or provide a reasonable read and write throughput as the size of the data increases. The problem with horizontal scaling is solved by Sharding. You add more machines to support data growth and the requirements of reading and write operations with sharding.
Why Sharding?
- All writes go to master node in replication
- Sensitive latency queries also go to the master.
- The single replica set is limited to 12 nodes.
- When the active dataset is large, memory can't be big enough
- The local disk is not big enough,
- Vertical scaling is too expensive,
Sharding in MongoDB
There are three main components –
- Shards −Shards are used to store information. They have high availability and accuracy of information. Each shard is a separate replica set within the production environment.
- Config Servers − Config servers store metadata for the cluster. This data includes a mapping of the data collection of the cluster to the shards. This metadata is used by the query router to target operations to individual shards. Sharing clusters have exactly 3 config servers in the production environment.
- Query Routers- Query routers are essentially Mongo instances, client application interface, and direct operations to the required shard. The query router processes and targets shard for the operations and then returns the clients with results. In order to break the client request load, a mutual cluster may contain more than one query router. A client sends requests to a router with one query. Generally, there are many query routers in a shared cluster.