Life of a Partition

Coherence and Partitioned services (distributed/near caches)

Coherence caches allow you to store data in a highly available manner in the data grid. Coherence provides (among others) replicated and distributed cache topologies. The replicated cache topology stores a replica of the entire data set on each cluster node running that replicated cache service. While this is good for read latency, as all data access is local, it comes with trade-offs. The biggest of these is that the full data set has to fit into each service member, so the total data set cannot scale with the number of nodes. The distributed cache topology provides a remedy for this problem.

Distributed cache services organize their data in partitions. Each key is algorithmically mapped to a partition in a way that does not depend on the runtime state of the cluster; it depends only on the key and on the number of partitions in that partitioned service (the partition-count). The algorithm itself is pluggable (you can provide your own key-partitioning strategy). Because of this, the partition a key belongs to can only change if the configured partition count or key-partitioning strategy changes, so data in a partition cannot move to another partition. The entries from all caches in the distributed cache service that share the same partition id together form a partition.
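To make the mapping concrete, here is a minimal sketch of such a stateless key-to-partition function in plain Java. This is not Coherence's actual default strategy (which works on the serialized form of the key); it only illustrates the property that matters: the result depends solely on the key and the partition count, never on cluster state.

```java
public final class PartitionMapping {

    private PartitionMapping() {
    }

    /**
     * Maps a key to a partition id in the range [0, partitionCount).
     * The mapping depends only on the key and the partition count,
     * so it is stable no matter which nodes are currently in the cluster.
     */
    public static int partitionFor(Object key, int partitionCount) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        int partitionCount = 257; // partition counts are typically prime
        // The same key always lands in the same partition:
        System.out.println(partitionFor("order:42", partitionCount));
        System.out.println(partitionFor("order:42", partitionCount));
    }
}
```

Because the mapping is a pure function of the key and the partition count, any node can compute a key's partition locally without asking the cluster.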

Each of these partitions has a single owning node, but the ownership of any partition may change over time. When that happens, the partition data is atomically transferred to another storage-enabled member of that distributed cache service, and while a partition is being transferred, operations cannot be executed on it.

Now the question is: how much of that occasional transfer is visible to our code, and can our code react to it or influence it?


Starting with Coherence 3.3, Coherence provided a way to register a PartitionListener. In that version, and in 3.4 after it, the PartitionListener was fairly lean and did not provide much information: it was only notified when all copies (the primary and all backups) of a partition had been lost because their containing nodes left the service for whatever reason (e.g. node or box death, or just being kicked out of the cluster).
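As a sketch of how such a listener is hooked up: a PartitionListener is typically registered declaratively in the cache configuration, attached to the distributed scheme of the service whose partitions it should observe. The class name below (com.example.PartitionLossListener) is a hypothetical placeholder for a class implementing com.tangosol.net.partition.PartitionListener.

```xml
<distributed-scheme>
  <scheme-name>example-distributed</scheme-name>
  <service-name>ExampleDistributedCache</service-name>
  <!-- hypothetical listener class; it must implement
       com.tangosol.net.partition.PartitionListener -->
  <partition-listener>
    <class-name>com.example.PartitionLossListener</class-name>
  </partition-listener>
  <backing-map-scheme>
    <local-scheme/>
  </backing-map-scheme>
  <autostart>true</autostart>
</distributed-scheme>
```

With this in place, the service invokes the listener's onPartitionEvent method with a PartitionEvent when one of the supported partition events occurs; in 3.3 and 3.4 that effectively meant partition loss only.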