The mystery of the TCP segmentation offload bug

Several incidents share the generic description 'TCP segmentation offload bug' and affect multiple virtualization platforms. The workaround is the same in each case: disable the offload feature.

Case one

Virtualization Platform : KVM/QEMU
Symptom : Periodically, the guest loses network connectivity after heavy load. Restarting the guest's network does not fix the problem; the guest recovers only after a reboot.
Reference : https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978/comments/134
Workaround :  ethtool -K eth0 tx off sg off tso off ufo off gso off gro off lro off
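
Before and after applying the workaround, it can help to confirm which offloads are actually enabled. `ethtool -k` (lowercase k) prints the current feature state. The sketch below filters that output down to the features still reported as on. The helper name list_enabled_offloads and the interface eth0 are assumptions for illustration, not part of the referenced bug report.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k` output on stdin and print the
# names of the features whose state is "on".
list_enabled_offloads() {
    # Feature lines look like "tcp-segmentation-offload: on" and may carry
    # a trailing "[fixed]" marker. Sub-features are indented, so leading
    # whitespace is stripped. The "Features for eth0:" header has no value
    # after the colon, so it never matches.
    awk -F': ' '$2 ~ /^on/ { gsub(/^[ \t]+/, "", $1); print $1 }'
}

# Usage on a live system (eth0 is an assumed interface name):
#   ethtool -k eth0 | list_enabled_offloads
```

If the list is empty after running the workaround, every offload that the driver allows to be changed has been switched off.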

Case two

Virtualization Platform : Xen
Symptom : DomU hangs after heavy network load (at around 10 MB/s).
Reference : https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978/comments/132
Workaround : Disable offloading using ethtool (the command needs an interface name; eth0 is used here as in case one)
ethtool --offload eth0 gso off tso off sg off gro off

Case three

Virtualization Platform : VMware
Symptom :
1. A page could not be displayed after VM migration to the same ESX host. Traffic passed through a Cisco Nexus 1000V and an F5 before reaching the IIS VM.
2. In another incident, a Cisco Nexus 1000V sending a large TCP segment caused a Purple Screen of Death on the ESXi host.
Reference : https://supportforums.cisco.com/discussion/11883926/tcp-segmentation-offload-tso-and-vmxnet31000v-bug
Workaround : Turn off TSO in the VM

Case four

Virtualization Platform : VMware
Symptom :
When enabling Traffic Shaping on a Distributed vSwitch (DVS), Linux virtual machines using the VMXNET3 driver experience network throughput degradation.
Reference : http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2030927
Workaround : Disable TSO and LRO in Guest VM
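
For a Linux guest, this workaround can be expressed with the same `ethtool -K` syntax used in case one. The sketch below only builds and prints the command so it can be reviewed before running it as root. The helper name build_offload_cmd and the interface eth0 are assumptions, and whether a given driver allows lro to be changed depends on the driver.

```shell
#!/bin/sh
# Hypothetical helper: print (but do not run) an ethtool command that
# turns off each listed offload feature on the given interface.
build_offload_cmd() {
    iface=$1
    shift
    cmd="ethtool -K $iface"
    for feature in "$@"; do
        cmd="$cmd $feature off"
    done
    printf '%s\n' "$cmd"
}

# Assumed interface name eth0; prints: ethtool -K eth0 tso off lro off
build_offload_cmd eth0 tso lro
```

Settings applied with ethtool do not survive a reboot, so the command has to be reapplied at boot if the workaround should stick.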


Conclusion

Whatever your virtualization network problem may be, disabling offloads such as TSO and LRO might fix it; it was the workaround in every case above.
