Shard cluster setup

Sharding is a way to horizontally partition a single table across multiple database servers in a shard cluster. Enterprise Replication moves the data from the source server to the appropriate target server as specified by the sharding method. You query a sharded table as if the entire table is on the local server. You do not need to know where the data is. Queries that are performed on one shard server retrieve the relevant data from other servers in a shard cluster. Sharding reduces the index size on each shard server and distributes performance across hardware. You can add shard servers to the shard cluster as your data grows.

Prerequisites

Before you create a shard cluster, the following system must be in place:
  • You must have an Enterprise Replication domain that is composed of two or more nodes.
  • On one of the Enterprise Replication nodes, you must have a table or collection to shard that conforms to the following requirements:
    • The table must have a dedicated column or field for tracking row or document versions.
    • The table cannot include data types that are not supported in sharded queries.
    • The databases on all shard servers must have same locale type.

Shard cluster architecture

Shard servers are uniquely identified by the SHARD_ID configuration parameter that you must set on each shard server. Because shard servers have unique IDs, Enterprise Replication can efficiently communicate between shard servers:

  • Client connections are multiplexed over a common pipe and authenticated only on the local shard server.
  • Sharded queries are run in parallel on all shard servers and their high-availability secondary servers.
  • For insert, update, and delete operations, if you set the USE_SHARDING session environment option, transactions use the two-phase commit protocol to move data to appropriate shard server. Otherwise, changes are moved to appropriate shard server using the eventually consistent model after the transaction is committed. For select operations, if you set the USE_SHARDING session environment option, queries are run on all shard servers in the cluster instead of on only the local database server.
  • The consistency of the sharded table is enforced on all shard servers. Shard servers do not need to transfer table information between each other. Data definition language statements that you run on a sharded table are propagated to all shard servers.