Cache series: the bottomless pit problem

I. Background

1. What is the cache bottomless pit problem?

Around 2010, Facebook was running 3,000 Memcached nodes holding a very large amount of cached data. Engineers noticed that Memcached connection efficiency was dropping, so they added more Memcached nodes, but performance did not improve after the nodes were added. This is known as the “bottomless pit” phenomenon.

2. Causes of the bottomless pit:

A key-value database or cache system usually maps each key to an instance with a hash function, so where a key lands has nothing to do with the business. As data volume and access volume grow, a distributed deployment becomes necessary (whether client-side consistent hashing, redis-cluster, or Codis). A batch operation that fetches several keys at once (such as Redis MGET) then typically has to fetch values from different instances. On a single instance the batch involves only one network operation, whereas in a distributed deployment it involves multiple network IOs.
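To make the cause concrete, here is a minimal sketch that counts how many instances a single batch has to contact as the instance count grows. The helper names (`node_for`, `nodes_touched`) and the choice of MD5 as the hash are assumptions made for illustration, not what any particular client library does.

```python
import hashlib


def node_for(key: str, node_count: int) -> int:
    """Map a key to a node index with a stable hash (MD5 is an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def nodes_touched(keys, node_count: int) -> int:
    """Number of distinct nodes a batch must contact, i.e. its network round trips."""
    return len({node_for(key, node_count) for key in keys})


keys = [f"user{i}" for i in range(1, 101)]
for node_count in (1, 4, 16, 64):
    # The batch size stays at 100 keys; only the cluster size changes.
    print(node_count, "nodes ->", nodes_touched(keys, node_count), "round trips")
```

With one node the batch always costs one round trip; with more nodes the same batch can never cost fewer, and usually costs more.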

3. Hazards of the bottomless pit problem:

(1) A single batch operation on the client involves multiple network operations, so the latency of the batch operation keeps growing as the number of instances grows.

(2) The number of network connections on the server side grows, which also has some impact on the performance of each instance.

4. Conclusion:

To put it in plain terms: more machines do not mean more performance. The so-called “bottomless pit” means that more investment does not necessarily produce more output.

Going distributed cannot be avoided, because site traffic and data volume keep growing and a single instance cannot cope. How to fetch data in batches efficiently from a distributed cache or store is therefore a hard problem.

II. Hash distribution and sequential distribution

Distributed storage products use two important data distribution methods: hash distribution and sequential distribution. The two work differently, and batch fetching behaves differently under each, so both are described here.

1. Hash distribution:

Hash distribution is used by most key-value systems, such as Memcache, Redis-Cluster and Twemproxy. Even MySQL setups that are split across databases and tables often use a rule such as user % 100.

The main role of hash distribution is to spread keys across the machines, so one of its characteristics is that data is highly dispersed. It is usually implemented by turning hash(key) into an integer and mapping that integer onto the distributed nodes. Redis-Cluster is one example.

Drawback: key placement has no relationship with the business, and range queries are not supported.
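As an illustration of mapping hash(key) to an integer and then to a node, the sketch below computes a Redis Cluster style slot: CRC16 of the key modulo 16384 slots. It relies on Python's `binascii.crc_hqx` with an initial value of 0, which matches the CRC16 variant Redis Cluster uses. The `node_for_slot` helper is purely hypothetical: it assumes slots are split into equal contiguous ranges, whereas a real cluster assigns slots to nodes through its own configuration. Hash tags are ignored here.

```python
import binascii

SLOT_COUNT = 16384  # Redis Cluster divides the key space into 16384 hash slots


def key_slot(key: str) -> int:
    """CRC16 of the whole key, modulo the number of slots (no hash-tag handling)."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % SLOT_COUNT


def node_for_slot(slot: int, node_count: int) -> int:
    """Hypothetical layout: equal contiguous slot ranges, one range per node."""
    return slot * node_count // SLOT_COUNT


for key in ("user1", "user2", "user3"):
    slot = key_slot(key)
    print(key, "-> slot", slot, "-> node", node_for_slot(slot, 10))
```

Because the slot depends only on the bytes of the key, keys that are adjacent from a business point of view (user1, user2, ...) generally land on unrelated slots.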

2. Sequential distribution:

3. Comparison of the two distribution methods:

Hash distribution
Features: 1. Data is highly dispersed 2. Key distribution is unrelated to the business 3. Sequential access is not possible 4. Supports batch operations
Typical products: consistent-hashing Memcache, Redis Cluster, other cache products

Sequential distribution
Features: 1. Data distribution skews easily 2. Key distribution is related to the business 3. Sequential access is possible 4. Supports batch operations


III. Four MGET solutions for distributed caches and storage

1. Ideas for optimizing IO:

(1) Efficiency of the command itself: for example SQL optimization or command optimization.

(2) Number of network round trips: reduce the number of communications.

(3) Cost of access: use long-lived connections or a connection pool, NIO, and so on.

(4) Merging IO access: turn O(N) into O(1), for example with a batch interface (MGET).

2. If we only consider reducing the number of network round trips, MGET can follow one of the models below.

3. Four solutions:

(1). Serial MGET

Split the MGET operation (N keys) into N GET operations executed one after another. The cost is clearly high: operation time = N network times + N command times, and the number of network round trips is N. This solution is obviously not optimal, but it is simple enough.
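A minimal sketch of the serial approach follows. `FakeCluster` is a hypothetical in-memory stand-in, not a real client: it only counts simulated round trips so that the cost of one GET per key becomes visible.

```python
import hashlib


class FakeCluster:
    """In-memory stand-in for a cache cluster that counts simulated round trips."""

    def __init__(self, node_count: int):
        self.nodes = [dict() for _ in range(node_count)]
        self.round_trips = 0

    def node_index(self, key: str) -> int:
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def set(self, key: str, value) -> None:
        # Writes are not counted; only the read path is being measured.
        self.nodes[self.node_index(key)][key] = value

    def get(self, key: str):
        self.round_trips += 1  # one request/response pair per key
        return self.nodes[self.node_index(key)].get(key)


def serial_mget(cluster: FakeCluster, keys):
    """Solution (1): issue one GET per key, one after another."""
    return [cluster.get(key) for key in keys]


cluster = FakeCluster(node_count=4)
for i in range(10):
    cluster.set(f"user{i}", i)

values = serial_mget(cluster, [f"user{i}" for i in range(10)])
print(values, "round trips:", cluster.round_trips)
```

The round-trip counter ends up equal to the number of keys, regardless of how many nodes the cluster has.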

(2). Serial IO

For an MGET operation (N keys), the client uses the known hash function to compute the node for each key. This yields a map from each node to the keys that live on it. The client then sends one MGET per node, one node after another.

Operation time = (number of nodes) network times + N command times, and the number of network round trips equals the number of nodes. This is clearly much better than the first solution, but if there are many nodes it still has a performance problem.
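The grouping step can be sketched as follows. `FakeNode` and `node_index` are hypothetical helpers (an in-memory node and an MD5-based placement rule) used only to show the shape of the algorithm: build the node-to-keys map, then call each node once in turn.

```python
import hashlib
from collections import defaultdict


class FakeNode:
    """In-memory stand-in for one cache node; counts batch calls it receives."""

    def __init__(self):
        self.data = {}
        self.calls = 0

    def mget(self, keys):
        self.calls += 1  # one simulated network round trip per batch
        return [self.data.get(key) for key in keys]


def node_index(key: str, node_count: int) -> int:
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def serial_io_mget(nodes, keys):
    """Solution (2): group keys by node, then query the nodes one after another."""
    groups = defaultdict(list)  # node index -> keys stored on that node
    for key in keys:
        groups[node_index(key, len(nodes))].append(key)
    found = {}
    for index, node_keys in groups.items():
        found.update(zip(node_keys, nodes[index].mget(node_keys)))
    return [found[key] for key in keys]  # restore the caller's key order


nodes = [FakeNode() for _ in range(4)]
keys = [f"user{i}" for i in range(20)]
for i, key in enumerate(keys):
    nodes[node_index(key, len(nodes))].data[key] = i

values = serial_io_mget(nodes, keys)
print(values, "round trips:", sum(node.calls for node in nodes))
```

Twenty keys now cost at most four round trips, one per node that actually holds some of the requested keys.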

(3). Parallel IO

This solution changes the last step of solution (2) so that the per-node requests are sent in parallel rather than one after another. The number of network round trips is still nodes.size(), but because the requests overlap, the network time becomes O(1). The price is more complex programming.

Operation time = 1 network time (bounded by the slowest node) + N command times.
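One way to sketch the parallel variant is with a thread pool from Python's `concurrent.futures`. `SlowNode` and `node_index` are hypothetical helpers: the node sleeps for a fixed time to imitate network latency, and placement uses an MD5-based rule chosen only for illustration.

```python
import hashlib
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class SlowNode:
    """In-memory stand-in for a cache node with a fixed simulated latency."""

    def __init__(self, latency: float = 0.01):
        self.data = {}
        self.latency = latency

    def mget(self, keys):
        time.sleep(self.latency)  # pretend this is the network round trip
        return [self.data.get(key) for key in keys]


def node_index(key: str, node_count: int) -> int:
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def parallel_io_mget(nodes, keys):
    """Solution (3): group keys by node, then query all nodes concurrently."""
    groups = defaultdict(list)  # node index -> keys stored on that node
    for key in keys:
        groups[node_index(key, len(nodes))].append(key)
    found = {}
    with ThreadPoolExecutor(max_workers=len(groups) or 1) as pool:
        futures = {pool.submit(nodes[i].mget, ks): ks for i, ks in groups.items()}
        for future, node_keys in futures.items():
            # result() waits for that node, so total wait tracks the slowest one.
            found.update(zip(node_keys, future.result()))
    return [found[key] for key in keys]


nodes = [SlowNode() for _ in range(4)]
keys = [f"user{i}" for i in range(20)]
for i, key in enumerate(keys):
    nodes[node_index(key, len(nodes))].data[key] = i

values = parallel_io_mget(nodes, keys)
print(values)
```

The extra moving parts (a pool, futures, and failure or timeout handling that this sketch omits) are the added programming complexity this solution pays for.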

(4). Hash-tag implementation

Section II mentioned that the hash function scatters keys across the nodes more or less at random. Is there a way to force a group of keys onto one specified node?

Redis provides such a feature, called hash-tag. What does that mean? Suppose we use Redis-Cluster (10 Redis nodes) and have 1000 key-value pairs. Under the hash function (CRC16) rule, these 1000 keys are spread across the 10 nodes, so the time complexity is still that of (1) to (3) above. Can we fetch all the keys in one go, as we would with standalone Redis? Hash-tag makes this possible: if the keys are rewritten as shown below, so that the same content is enclosed in braces, all of these keys go to the same node.

Before: user1, user2, user3 ... user1000
After: {user}1, {user}2, {user}3 ... {user}1000

In this case, operation time = 1 network time + N command times.
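The effect of the braces can be checked with a small sketch of the slot calculation. It follows the Redis Cluster rule that, when a key contains a non-empty `{...}` section, only the text between the first `{` and the next `}` is hashed. The function name `key_slot` is an illustrative choice, and `binascii.crc_hqx` with an initial value of 0 is used as the CRC16 implementation.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in Redis Cluster


def key_slot(key: str) -> int:
    """CRC16 slot of a key, hashing only the hash tag when a non-empty one exists."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # a closing brace exists and the tag is not empty
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


plain_slots = {key_slot(f"user{i}") for i in range(1, 1001)}
tagged_slots = {key_slot(f"{{user}}{i}") for i in range(1, 1001)}
print("plain keys use", len(plain_slots), "slots")
print("tagged keys use", len(tagged_slots), "slot")
```

All tagged keys share one slot and therefore one node, which is what allows a single MGET; it is also why a popular tag can concentrate data on one node.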

4. Comparison of the four batch operation solutions:

Serial MGET
Advantages: 1. Simple programming 2. With few keys, performance meets requirements
Disadvantages: High latency with many keys
Network IO: O(keys)

Serial IO
Advantages: 1. Simple programming 2. With few nodes, performance meets requirements
Disadvantages: High latency with many nodes
Network IO: O(nodes)

Parallel IO
Advantages: 1. Uses parallelism 2. Latency depends on the slowest node
Disadvantages: 1. Complex programming 2. Difficult to troubleshoot
Network IO: O(max_slow(node))

Hash tag
Advantages: Highest performance
Disadvantages: 1. Business cost of maintaining tag keys 2. Tag distribution easily causes data skew
Network IO: O(1)

IV. Summary and suggestions

The bottomless pit does have some impact on resources and performance, but most systems do not need to worry about it, because:

1. 99% of companies have nowhere near Facebook's data volume and traffic.

2. Redis / Memcache distributed clusters are usually split by project group. In our experience, a cluster generally does not exceed 50 master-replica pairs.

So this article only offers some optimization ideas to broaden your view.

V. References

Facebook's Memcached Multiget Hole: More Machines != More Capacity

The MultiGet bottomless pit problem

More on the MultiGet hole (bottomless pit)