Environment

Ubuntu-18.04 \
redis-5.0.0 \
make

Cluster layout

redis-1: 172.16.10.124 master:7001 7002 7003 slave: 7004 7005 7006 \
redis-2: 172.16.10.125 slave:7001 7002 7003

Build and install

The following steps are performed on redis-1.

1. Create the build and install directories

mkdir -p /home/dsp/redis/
cd /home/dsp/redis
mkdir redis-cluster redis-dist
cd redis-cluster
mkdir 7001 7002 7003 7004 7005 7006

2. Download the Redis 5.0 source and change the install prefix before building

cd /home/dsp/redis/
wget http://download.redis.io/releases/redis-5.0.0.tar.gz
tar xzvf redis-5.0.0.tar.gz 
mv redis-5.0.0 redis-build
cd redis-build
vim src/Makefile
# PREFIX?=/usr/local
PREFIX?=/home/dsp/redis/redis-dist

3. Build and install

cd /home/dsp/redis/redis-build
make && make install

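Node configuration

4. Edit the configuration file (the ## annotations below are explanatory only; redis.conf does not accept a trailing comment on a directive line)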

cd /home/dsp/redis/redis-cluster/7001/
cp ../../redis-build/redis.conf .
vim redis.conf
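bind 172.16.10.124                              ##  set to this machine's IP so other machines can connect; otherwise the cluster cannot be created
#protected-mode yes                             ##  protected mode: when enabled with no bind address and no password set, only connections from 127.0.0.1 are accepted
port 7001                                       ##  client port; Redis also opens a cluster bus port at port + 10000 (17001 here) for node-to-node cluster communication
daemonize yes                                   ##  run Redis in the background
pidfile /var/run/redis_7001.pid                 ##  pid file location; takes effect only when daemonize is yes
logfile "redis_7001.log"                        ##  log file; may include a directory and file name; Redis does not rotate the log automatically
save 900 1                                      ##  policy for flushing snapshots (RDB) to disk; tune to your needs. "save 900 1" triggers a save after 900 seconds if at least one key changed; the others work the same way
#save 300 10                                    ##
#save 60 10000                                  ##
appendonly yes                                  ##  when both AOF and RDB are enabled, Redis loads only the AOF at startup, since it contains the full dataset. If Redis is used as a queue and enqueue load is very high, this can be set to no
cluster-enabled yes                             ##  run in cluster mode; no means non-cluster mode
cluster-config-file nodes-7001.conf             ##  plain file name with no directory, stored in the node's own directory (7001, 7002, ...); maintained automatically by redis-server, do not edit by hand
cluster-node-timeout 15000                      ##  milliseconds; the longest a node may be unreachable before it is considered failed. A master unreachable for longer than this is failed over, so do not set this too small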

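Note: copy the configuration file just edited under 7001 into 7002, 7003, 7004, 7005, and 7006, and be sure to change the port number in each copy (including the port embedded in the pidfile, logfile, and cluster-config-file names).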
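
The copy-and-edit step above can be scripted. The following is a minimal sketch, assuming the string 7001 appears in redis.conf only in the port-related settings shown above (port, pidfile, logfile, cluster-config-file). `clone_node_configs` is a hypothetical helper name, not part of Redis.

```shell
# Hypothetical helper: copy the 7001 config into the other node directories,
# replacing every occurrence of 7001 with the target port.
# Assumption: "7001" occurs only in port-related settings of redis.conf.
clone_node_configs() {
    local cluster_dir="$1"
    local port
    for port in 7002 7003 7004 7005 7006; do
        mkdir -p "$cluster_dir/$port"
        sed "s/7001/$port/g" "$cluster_dir/7001/redis.conf" \
            > "$cluster_dir/$port/redis.conf"
    done
}

# Usage (directory taken from step 1):
# clone_node_configs /home/dsp/redis/redis-cluster
```

Running `grep -E '^(port|pidfile|logfile|cluster-config-file)' 7002/redis.conf` afterwards is a quick way to confirm that each copy carries its own port.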

Start the nodes

5. Change into each node's directory before starting it

cd /home/dsp/redis/redis-cluster/7001
../../redis-dist/bin/redis-server redis.conf    ##  start node redis-7001
cd ../7002
../../redis-dist/bin/redis-server redis.conf    ##  start node redis-7002
cd ../7003
../../redis-dist/bin/redis-server redis.conf    ##  start node redis-7003
cd ../7004
../../redis-dist/bin/redis-server redis.conf    ##  start node redis-7004
cd ../7005
../../redis-dist/bin/redis-server redis.conf    ##  start node redis-7005
cd ../7006
../../redis-dist/bin/redis-server redis.conf    ##  start node redis-7006
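
The six start commands follow one pattern, so they can be written as a loop. This is a sketch only: `start_nodes` is a hypothetical helper, and it expects an absolute path to redis-server because it changes directory before each launch.

```shell
# Hypothetical helper: start one redis-server per node directory.
# Each server is launched from inside its own directory, as step 5 requires.
# $1: cluster directory, $2: absolute path to redis-server, rest: ports.
start_nodes() {
    local cluster_dir="$1" redis_server="$2"
    shift 2
    local port
    for port in "$@"; do
        # The subshell keeps the caller's working directory unchanged.
        ( cd "$cluster_dir/$port" && "$redis_server" redis.conf ) || return 1
    done
}

# Usage (paths taken from the steps above):
# start_nodes /home/dsp/redis/redis-cluster \
#     /home/dsp/redis/redis-dist/bin/redis-server 7001 7002 7003 7004 7005 7006
```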

Check that Redis started

6. Check the Redis processes

ps -ef | grep redis

$ ps -ef | grep redis
dsp        1391      1  0 Mar14 ?        00:08:18 ../../redis-dist/bin/redis-server 172.16.10.124:7003 [cluster]
dsp        1407      1  0 Mar14 ?        00:08:18 ../../redis-dist/bin/redis-server 172.16.10.124:7005 [cluster]
dsp       58160      1  0 09:27 ?        00:00:00 ../../redis-dist/bin/redis-server 172.16.10.124:7001 [cluster]
dsp       58170      1  0 09:28 ?        00:00:00 ../../redis-dist/bin/redis-server 172.16.10.124:7002 [cluster]
dsp       58178      1  0 09:28 ?        00:00:00 ../../redis-dist/bin/redis-server 172.16.10.124:7004 [cluster]
dsp       58195      1  0 09:28 ?        00:00:00 ../../redis-dist/bin/redis-server 172.16.10.124:7006 [cluster]
dsp       58204  57762  0 09:28 pts/0    00:00:00 grep --color=auto redis

7. Check the ports Redis is listening on

netstat -tunpl | grep redis

$ netstat -tunpl | grep redis
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 172.16.10.124:17001     0.0.0.0:*               LISTEN      58160/../../redis-d 
tcp        0      0 172.16.10.124:17002     0.0.0.0:*               LISTEN      58170/../../redis-d 
tcp        0      0 172.16.10.124:17003     0.0.0.0:*               LISTEN      1391/../../redis-di 
tcp        0      0 172.16.10.124:17004     0.0.0.0:*               LISTEN      58178/../../redis-d 
tcp        0      0 172.16.10.124:17005     0.0.0.0:*               LISTEN      1407/../../redis-di 
tcp        0      0 172.16.10.124:17006     0.0.0.0:*               LISTEN      58195/../../redis-d 
tcp        0      0 172.16.10.124:7001      0.0.0.0:*               LISTEN      58160/../../redis-d 
tcp        0      0 172.16.10.124:7002      0.0.0.0:*               LISTEN      58170/../../redis-d 
tcp        0      0 172.16.10.124:7003      0.0.0.0:*               LISTEN      1391/../../redis-di 
tcp        0      0 172.16.10.124:7004      0.0.0.0:*               LISTEN      58178/../../redis-d 
tcp        0      0 172.16.10.124:7005      0.0.0.0:*               LISTEN      1407/../../redis-di 
tcp        0      0 172.16.10.124:7006      0.0.0.0:*               LISTEN      58195/../../redis-d

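At this point the six nodes are independent of one another. The next step is to create the cluster.

Create the cluster

8. Use the redis-cli command-line tool to add the six nodes to the cluster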

/home/dsp/redis/redis-dist/bin/redis-cli -h 172.16.10.124 -p 7001
172.16.10.124:7001> cluster meet 172.16.10.124 7002   ##  meet node 7002
OK
172.16.10.124:7001> cluster meet 172.16.10.124 7003   ##  meet node 7003
OK
172.16.10.124:7001> cluster meet 172.16.10.124 7004   ##  meet node 7004
OK
172.16.10.124:7001> cluster meet 172.16.10.124 7005   ##  meet node 7005
OK
172.16.10.124:7001> cluster meet 172.16.10.124 7006   ##  meet node 7006
OK
172.16.10.124:7001> cluster nodes                    ##  list the nodes (this output shows the state after all configuration was complete)
0d5f60ff7924fa50481be549468c5573fdfcec61 172.16.10.124:7005@17005 master - 0 1552818771245 33 connected 0-332 911-1465 6371-6412 8213-10922 11832-13653
36e9750c9a96d4f7d7ce426c33e480c2e9495556 172.16.10.124:7006@17006 slave 6d76523a04caf087b10ce7a9a142e61226160706 0 1552818768000 31 connected
35f038cc759359c5b667ff0eec340ae6fb74281b 172.16.10.125:7001@17001 master - 0 1552818767236 38 connected 333-910 1466-3617 4549-6370 10923-11831
ce9fdb802e79b267cd53ceabdd5f6eebc9c8c62b 172.16.10.124:7001@17001 myself,slave 35f038cc759359c5b667ff0eec340ae6fb74281b 0 1552818768000 28 connected
93f1c6aad0638d18afabf6aab359ca7a2bb67be0 172.16.10.124:7002@17002 slave 0d5f60ff7924fa50481be549468c5573fdfcec61 0 1552818770243 33 connected
9e1791317caf8f73ee5806c33050b8e2d6c8a35d 172.16.10.125:7002@17002 slave 0d5f60ff7924fa50481be549468c5573fdfcec61 0 1552818769241 33 connected
6d76523a04caf087b10ce7a9a142e61226160706 172.16.10.124:7003@17003 master - 0 1552818768000 31 connected 3618-4548 6413-8212 13654-16383
1f2fb0de1f9c228854b54160bd6a802dac22b4a3 172.16.10.124:7004@17004 slave 35f038cc759359c5b667ff0eec340ae6fb74281b 0 1552818766000 38 connected
2fec3d068a518897769a4d3074ae3dadedf8b73c 172.16.10.125:7003@17003 slave 6d76523a04caf087b10ce7a9a142e61226160706 0 1552818768239 31 connected

9. Assign slots to the master nodes

## Slots 0 to 16383, 16384 in total, split evenly across the three masters
../../redis-dist/bin/redis-cli -h 172.16.10.124 -p 7001 cluster addslots {0..5461}     
../../redis-dist/bin/redis-cli -h 172.16.10.124 -p 7002 cluster addslots {5462..10922}
../../redis-dist/bin/redis-cli -h 172.16.10.124 -p 7003 cluster addslots {10923..16383}
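
The three ranges above come from dividing 16384 slots by 3 and giving the one leftover slot to the first master. The sketch below reproduces that arithmetic for any number of masters; `slot_ranges` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: print "start end" slot ranges, one line per master,
# splitting the 16384 slots as evenly as possible.
slot_ranges() {
    local masters="$1"
    local total=16384
    local base=$(( total / masters ))
    local extra=$(( total % masters ))   # the first `extra` masters get one more slot
    local start=0 i=0 size end
    while [ "$i" -lt "$masters" ]; do
        size=$base
        if [ "$i" -lt "$extra" ]; then
            size=$(( base + 1 ))
        fi
        end=$(( start + size - 1 ))
        echo "$start $end"
        start=$(( end + 1 ))
        i=$(( i + 1 ))
    done
}

slot_ranges 3
# prints:
# 0 5461
# 5462 10922
# 10923 16383
```

Each range can then be expanded, for example with `seq "$start" "$end"`, and passed to `cluster addslots` in place of the brace expansion.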

10. Assign the slave nodes

../../redis-dist/bin/redis-cli -h 172.16.10.124 -p 7004 cluster replicate "master-node-id"   ## general form
../../redis-dist/bin/redis-cli -h 172.16.10.124 -p 7004 cluster replicate ce9fdb802e79b267cd53ceabdd5f6eebc9c8c62b   ## make 7004 a slave of 7001
../../redis-dist/bin/redis-cli -h 172.16.10.124 -p 7005 cluster replicate 93f1c6aad0638d18afabf6aab359ca7a2bb67be0   ## make 7005 a slave of 7002
../../redis-dist/bin/redis-cli -h 172.16.10.124 -p 7006 cluster replicate 6d76523a04caf087b10ce7a9a142e61226160706   ## make 7006 a slave of 7003
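
The node IDs passed to `cluster replicate` are different on every installation, so they have to be looked up in the `cluster nodes` output rather than copied from this guide. A small sketch of that lookup follows; `node_id_for` is a hypothetical helper, and it assumes the second field of each line has the form ip:port@bus-port, as Redis 4.0 and later print it.

```shell
# Hypothetical helper: read `cluster nodes` output on stdin and print the ID
# of the node whose address (ip:port) equals $1.
node_id_for() {
    # Strip the "@bus-port" suffix from the address field, then compare.
    awk -v addr="$1" '{ sub(/@.*/, "", $2); if ($2 == addr) print $1 }'
}

# Usage:
# cli=/home/dsp/redis/redis-dist/bin/redis-cli
# master_id=$("$cli" -h 172.16.10.124 -p 7001 cluster nodes | node_id_for 172.16.10.124:7001)
# "$cli" -h 172.16.10.124 -p 7004 cluster replicate "$master_id"
```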

The cluster is now created. The master-to-slave mapping is as follows:

master-7001→slave-7004
master-7002→slave-7005
master-7003→slave-7006

Verify the cluster

11. Use redis-cli --cluster check to view the cluster status

../../redis-dist/bin/redis-cli --cluster check 172.16.10.124:7004

12. The output below shows three masters, six slaves, and the slot assignment of each master (this is the state after the failure simulation and after nodes were added dynamically; for reference only)

$ ../../redis-dist/bin/redis-cli --cluster check 172.16.10.124:7004
172.16.10.124:7003 (6d76523a...) -> 6 keys | 5461 slots | 2 slaves.
172.16.10.125:7001 (35f038cc...) -> 4 keys | 5461 slots | 2 slaves.
172.16.10.124:7005 (0d5f60ff...) -> 0 keys | 5462 slots | 2 slaves.
[OK] 10 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 172.16.10.124:7004)
S: 1f2fb0de1f9c228854b54160bd6a802dac22b4a3 172.16.10.124:7004
   slots: (0 slots) slave
   replicates 35f038cc759359c5b667ff0eec340ae6fb74281b
M: 6d76523a04caf087b10ce7a9a142e61226160706 172.16.10.124:7003
   slots:[3618-4548],[6413-8212],[13654-16383] (5461 slots) master
   2 additional replica(s)
S: 36e9750c9a96d4f7d7ce426c33e480c2e9495556 172.16.10.124:7006
   slots: (0 slots) slave
   replicates 6d76523a04caf087b10ce7a9a142e61226160706
S: 9e1791317caf8f73ee5806c33050b8e2d6c8a35d 172.16.10.125:7002
   slots: (0 slots) slave
   replicates 0d5f60ff7924fa50481be549468c5573fdfcec61
S: ce9fdb802e79b267cd53ceabdd5f6eebc9c8c62b 172.16.10.124:7001
   slots: (0 slots) slave
   replicates 35f038cc759359c5b667ff0eec340ae6fb74281b
S: 2fec3d068a518897769a4d3074ae3dadedf8b73c 172.16.10.125:7003
   slots: (0 slots) slave
   replicates 6d76523a04caf087b10ce7a9a142e61226160706
M: 35f038cc759359c5b667ff0eec340ae6fb74281b 172.16.10.125:7001
   slots:[333-910],[1466-3617],[4549-6370],[10923-11831] (5461 slots) master
   2 additional replica(s)
M: 0d5f60ff7924fa50481be549468c5573fdfcec61 172.16.10.124:7005
   slots:[0-332],[911-1465],[6371-6412],[8213-10922],[11832-13653] (5462 slots) master
   2 additional replica(s)
S: 93f1c6aad0638d18afabf6aab359ca7a2bb67be0 172.16.10.124:7002
   slots: (0 slots) slave
   replicates 0d5f60ff7924fa50481be549468c5573fdfcec61
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

13. Test the cluster with set/get

Log in to different nodes and verify with set and get; the details are omitted here.
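
For readers who want a starting point, here is one possible way to run that check. It is a sketch under assumptions: `check_set_get` is a hypothetical helper, and the key and value are arbitrary. It relies on the redis-cli `-c` option, which follows MOVED and ASK redirections to the node that owns the key's slot.

```shell
# Hypothetical helper: write a key through the first listed port, then read
# it back through every listed port.
# $1: path to redis-cli, $2: host, $3: key, $4: value, rest: ports.
check_set_get() {
    local redis_cli="$1" host="$2" key="$3" value="$4"
    shift 4
    local port got
    "$redis_cli" -c -h "$host" -p "$1" set "$key" "$value" > /dev/null || return 1
    for port in "$@"; do
        got=$("$redis_cli" -c -h "$host" -p "$port" get "$key") || return 1
        if [ "$got" != "$value" ]; then
            echo "port $port returned '$got', expected '$value'" >&2
            return 1
        fi
    done
    echo "set/get OK on ports: $*"
}

# Usage:
# check_set_get /home/dsp/redis/redis-dist/bin/redis-cli 172.16.10.124 \
#     demo:key hello 7001 7002 7003 7004 7005 7006
```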

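Dynamically add slave nodes

**14. Repeat steps 1, 2, 3, 4, and 5 to create and start the new nodes on the redis-2 server \
Pay attention to the bind setting in the configuration file. After startup, add the new nodes to the cluster as in step 8, then make them slaves as in step 10**

15. Check the cluster status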

../../redis-dist/bin/redis-cli --cluster check 172.16.10.124:7004

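Dynamically add a master node (add node 7004 on the redis-2 server as a master)

16. Repeat steps 1, 2, 3, 4, 5, and 8

17. Reshard the slots

../../redis-dist/bin/redis-cli --cluster reshard 172.16.10.125:7004

A newly added master has no slots assigned, so it holds no data until slots are moved to it.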

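Remove a node

"172.16.10.124:7001" is any node in the cluster other than the one being removed, and "node-id" is the ID of the node to remove. If the node being removed is a master, first migrate the slots it owns to other masters, as described in step 17.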
../../redis-dist/bin/redis-cli --cluster del-node 172.16.10.124:7001 2fec3d068a518897769a4d3074ae3dadedf8b73c

If other nodes can still see the node after it is removed, use the "forget" command; it must be run on every node that still sees the removed node.

cluster forget "node-id"
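
Because `cluster forget` has to be issued on each remaining node, a loop saves some typing. The following is a sketch only; `forget_node` is a hypothetical helper, and the list of node addresses is supplied by the caller.

```shell
# Hypothetical helper: send `cluster forget <node-id>` to each listed node.
# $1: path to redis-cli, $2: ID of the removed node, rest: host:port pairs.
forget_node() {
    local redis_cli="$1" node_id="$2"
    shift 2
    local addr
    for addr in "$@"; do
        # ${addr%:*} is the host part, ${addr##*:} is the port part.
        "$redis_cli" -h "${addr%:*}" -p "${addr##*:}" cluster forget "$node_id" \
            || echo "forget failed on $addr" >&2
    done
}

# Usage:
# forget_node /home/dsp/redis/redis-dist/bin/redis-cli "node-id" \
#     172.16.10.124:7001 172.16.10.124:7002 172.16.10.125:7001
```

Redis only keeps a forgotten node on its ban list for 60 seconds, so the command should reach all remaining nodes within that window; otherwise gossip from a node that has not yet forgotten it can add it back.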

Failure simulation and disaster recovery