Notes on Learning Open vSwitch in OpenShift OKD

Background

Open vSwitch (OVS) is the software OpenShift uses to ensure that pods in the cluster can talk to each other using cluster-internal IP addresses. With default settings, hosts outside the cluster are not able to connect directly to pods; instead, HAProxy software (running in 'openshift router' pods) listens on host ports and does the work of distributing HTTP requests to each application's pods.
To follow this blog post you will need root-level OS access on the hosts / OpenShift nodes, so it is geared towards system administrators of the OpenShift platform, not the casual application developer.

Examining the OVS bridge

[root@node3 ~]# ovs-vsctl list-br
br0

So we know the OVS bridge is named br0.

[root@node1 ~]# ovs-ofctl -O OpenFlow13 dump-ports-desc br0
OFPST_PORT_DESC reply (OF1.3) (xid=0x2):
 1(vxlan0): addr:62:b2:af:23:90:ea
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 2(tun0): addr:b2:23:c3:0f:e4:56
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 4(vetheffe37e7): addr:ae:2d:49:6f:63:e5
     config:     0
     state:      0
     current:    10GB-FD COPPER
     speed: 10000 Mbps now, 0 Mbps max
 5(vethf5f23dab): addr:1e:50:3b:2b:c2:66
     config:     0
     state:      0
     current:    10GB-FD COPPER
     speed: 10000 Mbps now, 0 Mbps max
...<redacted>..


This command lists the ports attached to br0, with their OpenFlow port numbers, names, and MAC addresses.

[root@node1 ~]# ovs-vsctl list Interface  
_uuid               : 6a087201-8904-42b1-98e3-25b0cc705be3
admin_state         : up
bfd                 : {}
bfd_status          : {}
cfm_fault           : []
cfm_fault_status    : []
cfm_flap_count      : []
cfm_health          : []
cfm_mpid            : []
cfm_remote_mpids    : []
cfm_remote_opstate  : []
duplex              : full
error               : []
external_ids        : {}
ifindex             : 40019
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current        : []
link_resets         : 0
link_speed          : 10000000000
link_state          : up
lldp                : {}
mac                 : []
mac_in_use          : "a6:2a:94:13:6e:2c"
mtu                 : 1450
mtu_request         : []
name                : "vethc20c6e49"
ofport              : 5747
ofport_request      : []
options             : {}
other_config        : {}
statistics          : {collisions=0, rx_bytes=113334297, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=1286765, tx_bytes=134317929, tx_dropped=0, tx_errors=0, tx_packets=1963619}
status              : {driver_name=veth, driver_version="1.0", firmware_version=""}
type                : ""
...<redacted>..

This command shows more detail for every interface registered with OVS, including traffic statistics and the OpenFlow port number (ofport).

Determining pod IP address

oc get pods dev4-prod-120-58wwz -o yaml
...<redacted>
  phase: Running
  podIP: 10.130.0.226
  startTime: 2020-07-31T15:40:42Z

This command, usually run from a client machine, shows the pod information available in OpenShift, including the podIP.
We can also use jsonpath formatting to extract just the podIP:

oc get pods dev4-prod-120-58wwz -o jsonpath={.status.podIP}
10.130.0.226
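jsonpath can extract several fields at once. One field that is useful for the steps that follow is the node hosting the pod, since the ovs-* commands must be run on that node. A sketch, reusing the example pod name from above (substitute your own pod):

```shell
# Print the pod IP and the node it runs on (run from a client with oc access).
# Pod name "dev4-prod-120-58wwz" is the example pod from above.
oc get pods dev4-prod-120-58wwz \
  -o jsonpath='{.status.podIP}{" "}{.spec.nodeName}{"\n"}'
```

The node name tells you which host to log in to as root for the ovs-ofctl and ovs-vsctl commands.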

Tracing OVS Flow 

[root@node1 ~]#  ovs-ofctl -O OpenFlow13 dump-flows br0  | grep 10.130.0.226
 cookie=0x0, duration=35168.268s, table=20, n_packets=5811, n_bytes=244062, priority=100,arp,in_port=7854,arp_spa=10.130.0.226,arp_sha=ba:f5:f5:e6:e5:39 actions=load:0->NXM_NX_REG0[],goto_table:21
 cookie=0x0, duration=35168.265s, table=20, n_packets=98642, n_bytes=18472691, priority=100,ip,in_port=7854,nw_src=10.130.0.226 actions=load:0->NXM_NX_REG0[],goto_table:21
 cookie=0x0, duration=35168.262s, table=40, n_packets=5823, n_bytes=244566, priority=100,arp,arp_tpa=10.130.0.226 actions=output:7854
 cookie=0x0, duration=35168.262s, table=70, n_packets=119641, n_bytes=225619754, priority=100,ip,nw_dst=10.130.0.226 actions=load:0->NXM_NX_REG1[],load:0x1eae->NXM_NX_REG2[],goto_table:80

The dump-flows command shows the flow rules installed (by OpenShift). By using 'grep' we filter the rules related to one IP address. In the resulting flows we see that traffic with destination 10.130.0.226 is handled in table 70: the value 0x1eae (7854 in decimal) is loaded into register REG2, and processing jumps to table 80.
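The register value in the flow rule is printed in hex; converting between hex and decimal can be done with printf:

```shell
# Convert the hex value loaded into REG2 to decimal, and back.
printf '%d\n' 0x1eae    # prints 7854
printf '0x%x\n' 7854    # prints 0x1eae
```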

[root@node1 ~]#  ovs-ofctl -O OpenFlow13 dump-flows br0  | grep "table=80"
 cookie=0x0, duration=16006044.135s, table=80, n_packets=388568503, n_bytes=32628601914, priority=300,ip,nw_src=10.130.0.1 actions=output:NXM_NX_REG2[]
 cookie=0x0, duration=16006044.132s, table=80, n_packets=0, n_bytes=0, priority=0 actions=drop
 cookie=0x0, duration=7894349.775s, table=80, n_packets=3837887096, n_bytes=2263560750277, priority=200 actions=output:NXM_NX_REG2[]

Next we grep using the table condition shown before (table=80). We find that the matching action outputs the packet to the port stored earlier in REG2 (which is 7854).
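Instead of grepping through the flow tables by hand, OVS can also walk them for a hypothetical packet using ofproto/trace. A sketch, assuming the same bridge and pod IP; the in_port value here is an illustrative assumption (port 2, tun0, for traffic arriving from the host side):

```shell
# Ask OVS to show, table by table, how br0 would process an IP packet
# destined for the pod (run as root on the node).
ovs-appctl ofproto/trace br0 in_port=2,ip,nw_dst=10.130.0.226
```

The trace output lists each table consulted, the matching rule, and the final output action, which should end at the same ofport we found manually.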

[root@node1 ~]# ovs-vsctl list Interface | grep -C 1 7854
name                : "veth819d4ff6"
ofport              : 7854
ofport_request      : []


By searching for the keyword '7854' in the interface list, we find that the virtual ethernet interface veth819d4ff6 corresponds to the pod IP address from before. Armed with this information we can run tcpdump on that interface to troubleshoot application issues.
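A minimal capture sketch, assuming the interface name and pod IP found above:

```shell
# Capture 20 packets to or from the pod on its veth interface
# (run as root on the node hosting the pod).
tcpdump -nn -i veth819d4ff6 -c 20 host 10.130.0.226
```

The -nn flag skips name and port resolution, which keeps the output readable and avoids DNS lookups from the node.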

Note the role mismatch: using this method we need system-administrator access to troubleshoot application issues. Other methods of running tcpdump might be more appropriate (such as the sidecar approach described in https://developers.redhat.com/blog/2019/02/27/sidecars-analyze-debug-network-traffic-kubernetes-pod/ ).


