The mystery of the TCP segmentation offload bug
Several incidents share the generic description 'TCP segmentation offload bug' and affect multiple virtualization platforms. The workaround is the same in each case: disable the offload feature.
Case one
Virtualization Platform : KVM/QEMU
Symptom : Periodically, the guest loses network connectivity after heavy load. Restarting the guest's networking doesn't fix the problem; the guest recovers only after a reboot.
Reference : https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978/comments/134
Workaround : ethtool -K eth0 tx off sg off tso off ufo off gso off gro off lro off
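The workaround above can be wrapped in a small helper so the same feature list is applied to more than one interface. This is a minimal sketch, assuming a POSIX shell inside the guest. The function name offload_off_cmd and the FEATURES variable are hypothetical, and the feature names are the ones from the workaround line. The helper only prints the command rather than running it, since some drivers may not support every feature listed.

```shell
#!/bin/sh
# Hypothetical helper: print the ethtool command that turns off the given
# offload features on one interface. Printing instead of executing keeps the
# sketch safe to try; review the output, then run it as root to apply it.
offload_off_cmd() {
    iface="$1"
    shift
    cmd="ethtool -K $iface"
    for feature in "$@"; do
        cmd="$cmd $feature off"
    done
    printf '%s\n' "$cmd"
}

# Feature list taken from the Case one workaround.
FEATURES="tx sg tso ufo gso gro lro"

# Example: print the command for eth0 (the interface name is an assumption).
offload_off_cmd eth0 $FEATURES
```

The setting made by ethtool -K is not persistent, so it would need to be reapplied after the guest reboots.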
Case two
Virtualization Platform : Xen
Symptom : DomU hangs after heavy network load (at 10 Mbyte/s).
Reference : https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978/comments/132
Workaround : disable offloading using ethtool
ethtool --offload <interface> gso off tso off sg off gro off
Case three
Virtualization Platform : VMware
Symptom :
1. Page could not be displayed after VM migration to the same ESX host. A Cisco Nexus 1000V and an F5 sit in the path before traffic reaches the IIS VM.
2. In another incident, a Cisco Nexus 1000V sending a large TCP segment caused a Purple Screen of Death on the ESXi host.
Reference : https://supportforums.cisco.com/discussion/11883926/tcp-segmentation-offload-tso-and-vmxnet31000v-bug
Workaround : Turn off TSO in VM
Case four
Virtualization Platform : VMware
Symptom :
When enabling Traffic Shaping on a Distributed vSwitch (DVS), Linux virtual machines using the VMXNET3 driver experience network throughput degradation.
Reference : http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2030927
Workaround : Disable TSO and LRO in Guest VM
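After disabling TSO and LRO it is worth confirming that the guest really reports them as off. The following is a sketch only, assuming a Linux guest where ethtool -k prints one "feature-name: on|off" line per feature. The helper name offloads_still_on is hypothetical, and eth0 in the usage comment is an assumed interface name.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k <iface>` output on stdin and print
# the names of the TSO/LRO features that are still reported as "on".
# An empty result means both features are off.
offloads_still_on() {
    awk -F': ' '
        $1 ~ /^(tcp-segmentation-offload|large-receive-offload)$/ && $2 ~ /^on/ {
            print $1
        }
    '
}

# Typical use inside the guest (interface name is an assumption):
#   ethtool -k eth0 | offloads_still_on
```

Reading from stdin keeps the helper independent of the interface name, so the same check can be reused for every NIC in the guest.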
Conclusion
Disabling TSO and LRO might fix your virtualization network problem, whatever it may be.