The mystery of the TCP segmentation offload bug

Several incidents on different virtualization platforms share the same generic description, 'TCP segmentation offload bug'. The workaround is the same in each case: disable the feature.

Case one

Virtualization Platform : KVM/QEMU
Symptom : Periodically, the guest loses network connectivity after heavy load. Restarting the guest's network does not fix the problem. The guest is fine again after a reboot.
Reference :
Workaround :  ethtool -K eth0 tx off sg off tso off ufo off gso off gro off lro off
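
The command above switches off many features in one go. On some driver and kernel combinations ethtool may reject an individual flag, so the sketch below applies each flag separately and reports any that fail. The function name disable_offloads and the DRY_RUN variable are made up for this illustration, and eth0 is assumed to be the affected interface.

```shell
#!/bin/sh
# Hypothetical helper: disable offload features one at a time.
# Set DRY_RUN=1 to print the commands instead of running them.
disable_offloads() {
    iface=$1
    shift
    for feature in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ethtool -K $iface $feature off"
        else
            # Report a rejected flag but keep going with the remaining ones.
            ethtool -K "$iface" "$feature" off ||
                echo "could not disable $feature on $iface" >&2
        fi
    done
}

# Example (needs root; eth0 is an assumption):
#   disable_offloads eth0 tx sg tso ufo gso gro lro
```

Changes made with ethtool -K are not persistent, so they have to be reapplied after the guest reboots.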

Case two

Virtualization Platform : Xen
Symptom : DomU hangs under heavy network load (around 10 MB/s).
Reference :
Workaround : Disable offloading using ethtool (the command needs an interface name; eth0 here is an example)
ethtool --offload eth0 gso off tso off sg off gro off

Case three

Virtualization Platform : VMware
Symptom :
1. The page could not be displayed after VM migration to the same ESX host. A Cisco Nexus 1000V and an F5 sit in the path before traffic reaches the IIS VM.
2. In another incident, a Cisco Nexus 1000V sending a large TCP segment caused a Purple Screen of Death on the ESXi host.
Reference :
Workaround : Turn off TSO in the VM

Case four

Virtualization Platform : VMware
Symptom :
When Traffic Shaping is enabled on a Distributed vSwitch (DVS), Linux virtual machines using the VMXNET3 driver experience network throughput degradation.
Reference :
Workaround : Disable TSO and LRO in Guest VM
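
To confirm that the workaround took effect in a Linux guest, the current state can be read back with ethtool -k (lowercase k). This is a small sketch that assumes the usual "feature-name: on" or "feature-name: off" output layout. offload_state is a hypothetical helper name, and eth0 is an assumed interface name.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k` output on stdin and print the
# state ("on" or "off") of the top-level feature named in $1.
offload_state() {
    awk -F': ' -v name="$1" '$1 == name { split($2, parts, " "); print parts[1] }'
}

# Example (eth0 is an assumption):
#   ethtool -k eth0 | offload_state tcp-segmentation-offload
#   ethtool -k eth0 | offload_state large-receive-offload
```

Both features should report off once the workaround has been applied.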


Whatever your virtualization network problem is, disabling TSO and LRO in the guest is worth trying.

