Worth Bookmarking: Calico BGP Route Reflector Strategy in Practice
This article is shared from the Huawei Cloud community post "Calico BGP RouteReflector Strategy in Practice", author: 可以交个朋友.
1 Background
Calico, a container networking component, supports several backend modes: the overlay modes IPIP and VXLAN, and the underlay, pure-routing BGP mode.
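Compared with an overlay network model, an underlay network offers higher data-plane forwarding performance. Pure-routing mode itself has two options. The first is Calico's BGP full mesh, which only suits small Kubernetes clusters: every node peers with every other node, so a cluster of n nodes needs n(n-1)/2 BGP sessions, and each new node must establish a session with every existing node. The number of sessions grows quadratically with the cluster size and consumes a large amount of network resources. The second is Route Reflector mode, also called RR mode. In RR mode, one or more BGP speakers are designated as route reflectors and peer with the other speakers in the network; each speaker only needs a BGP session with a route reflector to learn the routes of the whole network.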
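To make the scaling difference concrete, here is a small sketch that compares session counts. The function names are illustrative, and the route reflector count assumes every node peers with every reflector and ignores sessions among the reflectors themselves.

```shell
#!/bin/bash
# Sketch: compare BGP session counts for full mesh vs. route reflector mode.

# Full mesh: every pair of nodes shares one session -> n*(n-1)/2.
full_mesh_sessions() {
    local n=$1
    echo $(( n * (n - 1) / 2 ))
}

# Route reflector: each of the n nodes peers with each of the r reflectors.
# Assumption: sessions between the reflectors themselves are not counted.
rr_sessions() {
    local n=$1 r=$2
    echo $(( n * r ))
}

for n in 4 50 200; do
    echo "nodes=$n full_mesh=$(full_mesh_sessions "$n") rr_with_2_reflectors=$(rr_sessions "$n" 2)"
done
```

For the four-node lab in this article the difference is negligible; it only becomes significant as the cluster grows.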
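2 Calico BGP Route Reflector Networking Architecture
Without changing the internal network topology of the IDC data center, the access-layer switches establish BGP sessions with the core-layer switches and rely on the routing policies that already exist in the data center. Pod CIDRs are allocated according to the physical location of each node, and every node advertises its Pod CIDR to its access-layer switch over BGP, which gives the whole network end-to-end reachability. The figure below uses a leaf-spine architecture to explain this in detail.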
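Networking principles:
- Each access-layer switch has layer-2 connectivity with the nodes it manages, and together they form one AS. Every node runs a BGP service that advertises the routes of that node.
- Among the core-layer and access-layer switches, each router has its own AS. They are physically directly connected and run BGP. The core-layer switches learn the routes of the whole network; each access-layer switch learns the routes of the nodes directly connected to it.
- Pods on the same host reach each other through the host's own routing (the Linux host acts as a router).
- Pods on different nodes in the same rack communicate through the ToR (leaf) switch.
- Pods in different racks communicate through the core switches.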
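The three forwarding rules at the end of the list can be sketched as a small classifier. The function name `pod_path` and its return values are hypothetical; the node and rack names follow the lab cluster built later in this article.

```shell
#!/bin/bash
# Sketch: decide which device forwards traffic between two pods,
# given the node and rack each pod runs on.
# Usage: pod_path SRC_NODE SRC_RACK DST_NODE DST_RACK
pod_path() {
    local src_node=$1 src_rack=$2 dst_node=$3 dst_rack=$4
    if [ "$src_node" = "$dst_node" ]; then
        echo "host"     # same node: routed by the Linux host itself
    elif [ "$src_rack" = "$dst_rack" ]; then
        echo "leaf"     # same rack, different node: via the ToR (leaf) switch
    else
        echo "spine"    # different racks: via the core (spine) switches
    fi
}

pod_path calico-bgp-rr-worker rack0 calico-bgp-rr-worker2 rack1
```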
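3 Building an Environment that Simulates the Production Network
Prepare a machine running Ubuntu 22.04 in advance (8 vCPUs and 16 GB of memory is enough). Install the following tools on this virtual machine: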
- Docker
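- Go development environment
- Kind (a Kubernetes-in-Docker tool developed by a Kubernetes special interest group for quickly setting up Kubernetes test environments; this walkthrough installs kind through Go, so Go must be installed on the host first; v0.20.0 is a suitable version)
- ContainerLab (a virtual networking platform built on container technology that can use a VyOS image to build virtual switches and routers; containerlab v0.42.0 is recommended)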
3.1 Setting up the Kubernetes environment
Kubernetes cluster version: 1.27.3
Cluster size: 1 master and 3 worker nodes
The cluster build script, 1-setup-env.sh, is as follows:
#!/bin/bash
date
set -v

# 1. prep noCNI env
cat <<EOF | kind create cluster --name=calico-bgp-rr --image=kindest/node:v1.27.3 --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  podSubnet: "10.244.0.0/16"
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.5.10
        node-labels: "rack=rack0"
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.5.11
        node-labels: "rack=rack0"
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.8.10
        node-labels: "rack=rack1"
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.8.11
        node-labels: "rack=rack1"
EOF

# 2. remove taints
kubectl taint nodes $(kubectl get nodes -o name | grep control-plane) node-role.kubernetes.io/control-plane:NoSchedule-
kubectl get nodes -o wide

# 3. install tools
for i in $(docker ps -a --format "table {{.Names}}" | grep calico-bgp-rr)
do
    echo $i
    docker cp /usr/bin/ping $i:/usr/bin/ping
    docker cp /usr/local/bin/calicoctl $i:/usr/local/bin/
    # docker exec -it $i bash -c "apt-get -y update > /dev/null && apt-get -y install net-tools tcpdump lrzsz > /dev/null 2>&1"
done
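Run the script to create the cluster. Because no CNI plugin is installed yet, some pods stay Pending and the nodes stay NotReady. This is expected and is resolved once the Calico CNI plugin is installed.
3.2 Creating the bridges
Create bridges on the host. Their main purpose is to connect the Kubernetes nodes created by kind with the switches built by containerlab.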
brctl addbr br-leaf0;ifconfig br-leaf0 up;brctl addbr br-leaf1;ifconfig br-leaf1 up
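On hosts where `brctl` and `ifconfig` are not installed, the same bridges can be created with iproute2. The sketch below is one way to do it; the `run` and `create_bridge` helpers and the `DRY_RUN` variable are illustrative names, and by default the script only prints the commands. Set `DRY_RUN=0` and run as root to actually create the bridges.

```shell
#!/bin/bash
# Sketch: create the two lab bridges with iproute2 instead of brctl/ifconfig.

# Print the command when DRY_RUN=1 (the default here), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a Linux bridge and bring it up.
create_bridge() {
    local name=$1
    run ip link add name "$name" type bridge
    run ip link set dev "$name" up
}

for br in br-leaf0 br-leaf1; do
    create_bridge "$br"
done
```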
3.3 Building layer-3 switches with containerlab and configuring BGP
The containerlab script that builds the switches, 2-setup-clab.sh, is as follows:
#!/bin/bash
set -v

# Write the topology file first, then deploy it, so that clab never reads a half-written file.
cat <<EOF > clab.yaml
name: calico-bgp-rr
topology:
  nodes:
    spine0:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/spine0-boot.cfg:/opt/vyatta/etc/config/config.boot
    spine1:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/spine1-boot.cfg:/opt/vyatta/etc/config/config.boot
    leaf0:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/leaf0-boot.cfg:/opt/vyatta/etc/config/config.boot
    leaf1:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/leaf1-boot.cfg:/opt/vyatta/etc/config/config.boot
    br-leaf0:
      kind: bridge
    br-leaf1:
      kind: bridge
    server1:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-control-plane
      exec:
        - ip addr add 10.1.5.10/24 dev net0
        - ip route replace default via 10.1.5.1
    server2:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-worker
      exec:
        - ip addr add 10.1.5.11/24 dev net0
        - ip route replace default via 10.1.5.1
    server3:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-worker2
      exec:
        - ip addr add 10.1.8.10/24 dev net0
        - ip route replace default via 10.1.8.1
    server4:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-worker3
      exec:
        - ip addr add 10.1.8.11/24 dev net0
        - ip route replace default via 10.1.8.1
  links:
    - endpoints: ["br-leaf0:br-leaf0-net0", "server1:net0"]
    - endpoints: ["br-leaf0:br-leaf0-net1", "server2:net0"]
    - endpoints: ["br-leaf1:br-leaf1-net0", "server3:net0"]
    - endpoints: ["br-leaf1:br-leaf1-net1", "server4:net0"]
    - endpoints: ["leaf0:eth1", "spine0:eth1"]
    - endpoints: ["leaf0:eth2", "spine1:eth1"]
    - endpoints: ["leaf0:eth3", "br-leaf0:br-leaf0-net2"]
    - endpoints: ["leaf1:eth1", "spine0:eth2"]
    - endpoints: ["leaf1:eth2", "spine1:eth2"]
    - endpoints: ["leaf1:eth3", "br-leaf1:br-leaf1-net2"]
EOF

clab deploy -t clab.yaml
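The containerlab topology is now deployed. The BGP configuration of the VyOS switches is listed at the end of this article.
3.4 Deploying the Calico CNI plugin
Calico installs in IPIP mode by default, so IPIP has to be turned off manually. With neither IPIP nor VXLAN encapsulation enabled, Calico runs in pure BGP mode.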
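One way to turn off IPIP before applying the manifest is to flip the default pool setting in calico.yaml. The sketch below assumes the manifest sets the `CALICO_IPV4POOL_IPIP` environment variable to "Always" on the line after its name, as the stock Calico manifests do; the `disable_ipip` helper is an illustrative name, and the result should be checked by hand before applying.

```shell
#!/bin/bash
# Sketch: set CALICO_IPV4POOL_IPIP to "Never" in a Calico manifest.
# Assumption: the value line directly follows the "name: CALICO_IPV4POOL_IPIP" line.
disable_ipip() {
    local manifest=$1 tmp
    tmp=$(mktemp)
    awk '
        /name: CALICO_IPV4POOL_IPIP/ { print; flip = 1; next }
        flip { sub(/"Always"/, "\"Never\""); flip = 0 }
        { print }
    ' "$manifest" > "$tmp" && cat "$tmp" > "$manifest"
    rm -f "$tmp"
}

# Only touch the manifest if it is present in the current directory.
if [ -f calico.yaml ]; then
    disable_ipip calico.yaml
    grep -A1 'name: CALICO_IPV4POOL_IPIP' calico.yaml
fi
```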
kubectl apply -f calico.yaml
#kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.23/manifests/calico.yaml
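After the Calico components are installed, the nodes establish BGP sessions with one another in a full mesh.
3.5 Enabling Calico BGP RR mode
A full mesh does not suit large clusters, so we disable the BGP full mesh and use BGP route reflectors instead.
The script that does this, 3-disable-bgp-full-mesh.sh, is as follows: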
#!/bin/bash
set -v

# 1. disable bgp fullmesh
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
items:
- apiVersion: projectcalico.org/v3
  kind: BGPConfiguration
  metadata:
    name: default
  spec:
    logSeverityScreen: Info
    nodeToNodeMeshEnabled: false
kind: BGPConfigurationList
metadata:
EOF
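3.6 Configuring BGP RR rules on the Calico nodes
The nodes of the Kubernetes cluster act as clients of the BGP route reflectors, so each node needs a peer configuration that points at its route reflector in order to synchronize routes.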
#!/bin/bash
set -v

# 1.3. add() bgp configuration for the nodes
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  annotations:
  labels:
    rack: rack0
  name: calico-bgp-rr-control-plane
spec:
  addresses:
  - address: 10.1.5.10
    type: InternalIP
  bgp:
    asNumber: 65005
    ipv4Address: 10.1.5.10/24
  orchRefs:
  - nodeName: calico-bgp-rr-control-plane
    orchestrator: k8s
EOF

cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  labels:
    rack: rack0
  name: calico-bgp-rr-worker
spec:
  addresses:
  - address: 10.1.5.11
    type: InternalIP
  bgp:
    asNumber: 65005
    ipv4Address: 10.1.5.11/24
  orchRefs:
  - nodeName: calico-bgp-rr-worker
    orchestrator: k8s
EOF

cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  labels:
    rack: rack1
  name: calico-bgp-rr-worker2
spec:
  addresses:
  - address: 10.1.8.10
    type: InternalIP
  bgp:
    asNumber: 65008
    ipv4Address: 10.1.8.10/24
  orchRefs:
  - nodeName: calico-bgp-rr-worker2
    orchestrator: k8s
EOF

cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  labels:
    rack: rack1
  name: calico-bgp-rr-worker3
spec:
  addresses:
  - address: 10.1.8.11
    type: InternalIP
  bgp:
    asNumber: 65008
    ipv4Address: 10.1.8.11/24
  orchRefs:
  - nodeName: calico-bgp-rr-worker3
    orchestrator: k8s
EOF

# 1.4. peer to leaf0 switch
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack0-to-leaf0
spec:
  peerIP: 10.1.5.1
  asNumber: 65005
  nodeSelector: rack == 'rack0'
EOF

# 1.5. peer to leaf1 switch
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack1-to-leaf1
spec:
  peerIP: 10.1.8.1
  asNumber: 65008
  nodeSelector: rack == 'rack1'
EOF
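Log in to any node in the cluster and check the BGP status: the peering is no longer a BGP full mesh. The peer type "node specific" means the peer comes from a BGPPeer resource that applies to selected nodes rather than from the node-to-node mesh. The node is now a client of its route reflector, which is 10.1.5.1 for the nodes in rack0 (10.1.8.1 for the nodes in rack1).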
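The BGP status on a node can be read with `calicoctl node status`, which prints a table of peers. The sketch below filters that table for peers whose session is established. It assumes the usual column order (PEER ADDRESS, PEER TYPE, STATE, SINCE, INFO); `established_peers` is an illustrative helper name.

```shell
#!/bin/bash
# Sketch: list the BGP peers of this node whose session is Established.
# Reads the table printed by "calicoctl node status" on stdin.
# Assumption: columns are | PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |.
established_peers() {
    awk -F'|' '$6 ~ /Established/ { gsub(/[[:space:]]/, "", $2); print $2 }'
}

# Run against the live node only when calicoctl is available.
if command -v calicoctl >/dev/null 2>&1; then
    calicoctl node status | established_peers
fi
```

On a rack0 node in this lab, the only peer expected in the list is the leaf0 switch.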
4 Verifying BGP by accessing pods from outside the cluster
Deploy a test workload
apiVersion: apps/v1
kind: DaemonSet
#kind: Deployment
metadata:
  labels:
    app: app
  name: app
spec:
  #replicas: 2
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
        name: nettoolbox
---
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  type: NodePort
  selector:
    app: app
  ports:
  - name: app
    port: 8080
    targetPort: 80
    nodePort: 32000
Log in to any cluster node and check the routing table
For example, the entry 10.244.210.64/26 via 10.1.5.1 dev net0 proto bird is a route learned through BGP; bird is the BGP client in Calico.
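On a node, the routes installed by bird can be listed with `ip route show proto bird`. A minimal sketch that extracts each destination and its next hop from that output follows; `bird_next_hops` is an illustrative helper name, and the sample line in the comment is the route quoted above.

```shell
#!/bin/bash
# Sketch: print "<destination> <next hop>" for every BGP-learned route.
# Reads "ip route" output on stdin; lines without a "via" next hop
# (for example local blackhole routes) are skipped.
bird_next_hops() {
    awk '$2 == "via" { print $1, $3 }'
}

# Example with the route from this article:
#   10.244.210.64/26 via 10.1.5.1 dev net0 proto bird
# On a cluster node, feed it the live routing table instead:
if command -v ip >/dev/null 2>&1; then
    ip route show proto bird 2>/dev/null | bird_next_hops
fi
```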
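Log in to the leaf0 switch and check the BGP information and routing table
Check the routing table:
The leaf0 switch holds routes to the pods in the Kubernetes cluster, which means it can reach those pods.
Check the BGP information: show ip bgp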
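The output clearly shows:
- The destinations 10.1.8.0/24, 10.244.192.0/26, and 10.244.210.64/26 each have two next hops, 10.1.12.2 and 10.1.10.2. These are eBGP routes, and they are load-balanced with ECMP.
- The destinations 10.244.81.64/26 and 10.244.205.64/26 have the next hops 10.1.5.10 and 10.1.5.11 respectively. These are iBGP routes.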
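Whether a session is iBGP or eBGP follows directly from the AS numbers on both ends: the same AS means iBGP, different ASes mean eBGP. A minimal sketch using the AS numbers of this lab (leaf0 and its nodes in 65005, spine0 in 500, spine1 in 800); `bgp_session_type` is an illustrative helper name.

```shell
#!/bin/bash
# Sketch: classify a BGP session from the local and remote AS numbers.
bgp_session_type() {
    local local_as=$1 remote_as=$2
    if [ "$local_as" -eq "$remote_as" ]; then
        echo "iBGP"
    else
        echo "eBGP"
    fi
}

# Sessions as seen from leaf0 (AS 65005):
echo "leaf0 -> node 10.1.5.10: $(bgp_session_type 65005 65005)"
echo "leaf0 -> spine0:         $(bgp_session_type 65005 500)"
echo "leaf0 -> spine1:         $(bgp_session_type 65005 800)"
```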
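Access tests
Pod-to-pod access within the cluster
Access from a core switch to pods in the cluster
If the core switches also exchange routes with the public network over eBGP, public traffic can reach the Kubernetes cluster as well.
5 Configuration files for the VyOS container images that emulate switches in containerlab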
spine0-boot.cfg:
interfaces {
    ethernet eth1 {
        address 10.1.10.2/24
        duplex auto
        speed auto
    }
    ethernet eth2 {
        address 10.1.34.2/24
        duplex auto
        speed auto
    }
    loopback lo {
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.10.0/24 {
                }
                network 10.1.34.0/24 {
                }
            }
        }
        neighbor 10.1.10.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 65005
        }
        neighbor 10.1.34.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 65008
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
        }
        system-as 500
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 9600
        }
    }
    host-name spine0
    login {
        user vyos {
            authentication {
                encrypted-password $6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/
                plaintext-password ""
            }
        }
    }
    time-zone UTC
}
// Warning: Do not remove the following line.
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// Release version: 1.4-rolling-202307070317
spine1-boot.cfg:
interfaces {
    ethernet eth1 {
        address "10.1.12.2/24"
        duplex "auto"
        mtu "9000"
        offload {
            gso {
            }
            sg {
            }
        }
        speed "auto"
    }
    ethernet eth2 {
        address "10.1.11.2/24"
        duplex "auto"
        mtu "9000"
        offload {
            gso {
            }
            sg {
            }
        }
        speed "auto"
    }
    loopback lo {
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.11.0/24 {
                }
                network 10.1.12.0/24 {
                }
            }
        }
        neighbor 10.1.11.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as "65008"
        }
        neighbor 10.1.12.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as "65005"
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax {
                    }
                }
            }
            router-id "10.1.8.1"
        }
        system-as "800"
    }
}
system {
    config-management {
        commit-revisions "100"
    }
    conntrack {
        modules {
            ftp {
            }
            h323 {
            }
            nfs {
            }
            pptp {
            }
            sip {
            }
            sqlnet {
            }
            tftp {
            }
        }
    }
    console {
        device ttyS0 {
            speed "9600"
        }
    }
    host-name "spine1"
    login {
        user vyos {
            authentication {
                encrypted-password "$6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/"
                plaintext-password ""
            }
        }
    }
    time-zone "UTC"
}
// Warning: Do not remove the following line.
//
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
//
// Release version: 1.4-rolling-202307070317
leaf0-boot.cfg:
interfaces {
    ethernet eth1 {
        address 10.1.10.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth2 {
        address 10.1.12.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth3 {
        address 10.1.5.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    loopback lo {
    }
}
nat {
    source {
        rule 100 {
            outbound-interface eth0
            source {
                address 10.1.0.0/16
            }
            translation {
                address masquerade
            }
        }
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.5.0/24 {
                }
                network 10.1.10.0/24 {
                }
                network 10.1.12.0/24 {
                }
            }
        }
        neighbor 10.1.5.10 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65005
        }
        neighbor 10.1.5.11 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65005
        }
        neighbor 10.1.10.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 500
        }
        neighbor 10.1.12.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 800
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
            router-id 10.1.5.1
        }
        system-as 65005
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 9600
        }
    }
    host-name leaf0
    login {
        user vyos {
            authentication {
                encrypted-password $6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/
                plaintext-password ""
            }
        }
    }
    time-zone UTC
}
// Warning: Do not remove the following line.
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// Release version: 1.4-rolling-202307070317
leaf1-boot.cfg:
interfaces {
    ethernet eth1 {
        address 10.1.34.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth2 {
        address 10.1.11.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth3 {
        address 10.1.8.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    loopback lo {
    }
}
nat {
    source {
        rule 100 {
            outbound-interface eth0
            source {
                address 10.1.0.0/16
            }
            translation {
                address masquerade
            }
        }
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.8.0/24 {
                }
                network 10.1.11.0/24 {
                }
                network 10.1.34.0/24 {
                }
            }
        }
        neighbor 10.1.8.10 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65008
        }
        neighbor 10.1.8.11 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65008
        }
        neighbor 10.1.11.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 800
        }
        neighbor 10.1.34.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 500
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
            router-id 10.1.8.1
        }
        system-as 65008
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 9600
        }
    }
    host-name leaf1
    login {
        user vyos {
            authentication {
                encrypted-password $6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/
                plaintext-password ""
            }
        }
    }
    time-zone UTC
}
// Warning: Do not remove the following line.
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// Release version: 1.4-rolling-202307070317