Notes on Learning Open vSwitch in OpenShift OKD

Background

Open vSwitch (OVS) is the software OpenShift uses to ensure that pods in the cluster can talk to each other using cluster-internal IP addresses. With default settings, hosts outside the cluster are not able to connect directly to pods; instead, HAProxy software (running in 'openshift router' pods) listens on host ports and does the work of distributing HTTP requests to each application's pods.
To follow this blog post you will need root-level OS access on the hosts / OpenShift nodes, so it is geared towards system administrators of the OpenShift platform, not the casual application developer.

Examining the OVS bridge

[root@node3 ~]# ovs-vsctl list-br
br0

So we know the OVS bridge is named br0.

[root@node1 ~]# ovs-ofctl -O OpenFlow13 dump-ports-desc br0
OFPST_PORT_DESC reply (OF1.3) (xid=0x2):
 1(vxlan0): addr:62:b2:af:23:90:ea
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 2(tun0): addr:b2:23:c3:0f:e4:56
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 4(vetheffe37e7): addr:ae:2d:49:6f:63:e5
     config:     0
     state:      0
     current:    10GB-FD COPPER
     speed: 10000 Mbps now, 0 Mbps max
 5(vethf5f23dab): addr:1e:50:3b:2b:c2:66
     config:     0
     state:      0
     current:    10GB-FD COPPER
     speed: 10000 Mbps now, 0 Mbps max
...<redacted>..


This command lists the ports attached to br0, with their OpenFlow port numbers, names, and MAC addresses.

[root@node1 ~]# ovs-vsctl list Interface  
_uuid               : 6a087201-8904-42b1-98e3-25b0cc705be3
admin_state         : up
bfd                 : {}
bfd_status          : {}
cfm_fault           : []
cfm_fault_status    : []
cfm_flap_count      : []
cfm_health          : []
cfm_mpid            : []
cfm_remote_mpids    : []
cfm_remote_opstate  : []
duplex              : full
error               : []
external_ids        : {}
ifindex             : 40019
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current        : []
link_resets         : 0
link_speed          : 10000000000
link_state          : up
lldp                : {}
mac                 : []
mac_in_use          : "a6:2a:94:13:6e:2c"
mtu                 : 1450
mtu_request         : []
name                : "vethc20c6e49"
ofport              : 5747
ofport_request      : []
options             : {}
other_config        : {}
statistics          : {collisions=0, rx_bytes=113334297, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=1286765, tx_bytes=134317929, tx_dropped=0, tx_errors=0, tx_packets=1963619}
status              : {driver_name=veth, driver_version="1.0", firmware_version=""}
type                : ""
...<redacted>..

This command shows more detail for every interface registered with OVS, including traffic statistics and the OpenFlow port number (ofport).

Determining pod IP address

oc get pods dev4-prod-120-58wwz -o yaml
...<redacted>
  phase: Running
  podIP: 10.130.0.226
  startTime: 2020-07-31T15:40:42Z

This command, usually run from a client machine, shows the pod information available in OpenShift, including the podIP.
We can also use jsonpath formatting to extract just the podIP:

oc get pods dev4-prod-120-58wwz -o jsonpath={.status.podIP}
10.130.0.226
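jsonpath can extract several fields at once. One field that is useful for the steps that follow is the node hosting the pod, since the ovs-* commands must be run on that node. A sketch, reusing the example pod name from above (substitute your own pod):

```shell
# Print the pod IP and the node it runs on (run from a client with oc access).
# Pod name "dev4-prod-120-58wwz" is the example pod from above.
oc get pods dev4-prod-120-58wwz \
  -o jsonpath='{.status.podIP}{" "}{.spec.nodeName}{"\n"}'
```

The node name tells you which host to log in to as root for the ovs-ofctl and ovs-vsctl commands.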

Tracing OVS Flow 

[root@node1 ~]#  ovs-ofctl -O OpenFlow13 dump-flows br0  | grep 10.130.0.226
 cookie=0x0, duration=35168.268s, table=20, n_packets=5811, n_bytes=244062, priority=100,arp,in_port=7854,arp_spa=10.130.0.226,arp_sha=ba:f5:f5:e6:e5:39 actions=load:0->NXM_NX_REG0[],goto_table:21
 cookie=0x0, duration=35168.265s, table=20, n_packets=98642, n_bytes=18472691, priority=100,ip,in_port=7854,nw_src=10.130.0.226 actions=load:0->NXM_NX_REG0[],goto_table:21
 cookie=0x0, duration=35168.262s, table=40, n_packets=5823, n_bytes=244566, priority=100,arp,arp_tpa=10.130.0.226 actions=output:7854
 cookie=0x0, duration=35168.262s, table=70, n_packets=119641, n_bytes=225619754, priority=100,ip,nw_dst=10.130.0.226 actions=load:0->NXM_NX_REG1[],load:0x1eae->NXM_NX_REG2[],goto_table:80

The dump-flows command shows the flow rules installed (by OpenShift). By using 'grep' we filter the rules related to one IP address. In the resulting flows we see that traffic with destination 10.130.0.226 is handled in table 70: the value 0x1eae (7854 in decimal) is loaded into register REG2, and processing jumps to table 80.
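The register value in the flow rule is printed in hex; converting between hex and decimal can be done with printf:

```shell
# Convert the hex value loaded into REG2 to decimal, and back.
printf '%d\n' 0x1eae    # prints 7854
printf '0x%x\n' 7854    # prints 0x1eae
```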

[root@node1 ~]#  ovs-ofctl -O OpenFlow13 dump-flows br0  | grep "table=80"
 cookie=0x0, duration=16006044.135s, table=80, n_packets=388568503, n_bytes=32628601914, priority=300,ip,nw_src=10.130.0.1 actions=output:NXM_NX_REG2[]
 cookie=0x0, duration=16006044.132s, table=80, n_packets=0, n_bytes=0, priority=0 actions=drop
 cookie=0x0, duration=7894349.775s, table=80, n_packets=3837887096, n_bytes=2263560750277, priority=200 actions=output:NXM_NX_REG2[]

Next we grep using the table condition shown before (table=80). We find that the matching action outputs the packet to the port stored earlier in REG2 (which is 7854).
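Instead of grepping through the flow tables by hand, OVS can also walk them for a hypothetical packet using ofproto/trace. A sketch, assuming the same bridge and pod IP; the in_port value here is an illustrative assumption (port 2, tun0, for traffic arriving from the host side):

```shell
# Ask OVS to show, table by table, how br0 would process an IP packet
# destined for the pod (run as root on the node).
ovs-appctl ofproto/trace br0 in_port=2,ip,nw_dst=10.130.0.226
```

The trace output lists each table consulted, the matching rule, and the final output action, which should end at the same ofport we found manually.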

[root@node1 ~]# ovs-vsctl list Interface | grep -C 1 7854
name                : "veth819d4ff6"
ofport              : 7854
ofport_request      : []


By searching for the keyword '7854' in the interface list, we find that the virtual ethernet interface veth819d4ff6 corresponds to the pod IP address from before. Armed with this information we can run tcpdump on that interface to troubleshoot application issues.
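A minimal capture sketch, assuming the interface name and pod IP found above:

```shell
# Capture 20 packets to or from the pod on its veth interface
# (run as root on the node hosting the pod).
tcpdump -nn -i veth819d4ff6 -c 20 host 10.130.0.226
```

The -nn flag skips name and port resolution, which keeps the output readable and avoids DNS lookups from the node.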

Note the role mismatch: using this method we need system-administrator access to troubleshoot application issues. Other methods of running tcpdump might be more appropriate (such as the sidecar approach described in https://developers.redhat.com/blog/2019/02/27/sidecars-analyze-debug-network-traffic-kubernetes-pod/ ).


