HTB traffic control burst calculation inaccuracy

My previous Kubuntu 7.04 system's Linux 2.6.20 kernel uses a 1000 Hz timer (refer to the HZ constant definition, and to the fourth value of the kernel ABI /proc/net/psched, printed by psched_show in sch_api.c). The tc binary calculates HTB's burst using the formula: bitrate / timer frequency + maximum transfer unit (MTU).
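To make the arithmetic concrete, here is a minimal sketch of that computation (the function name and constants are mine for illustration, not tc's actual source). With a 1000 Hz clock and a 100 Mbit/s rate, the default burst comes out to a healthy 14000 bytes:

```c
#include <stdio.h>

/* Bytes that accumulate in one scheduler tick, plus one MTU
 * (illustrative sketch of the formula described above). */
static unsigned int default_burst(unsigned long long rate_bps,
                                  unsigned int mtu,
                                  unsigned int timer_hz)
{
    return (unsigned int)(rate_bps / 8 / timer_hz) + mtu;
}

int main(void)
{
    /* 100 Mbit/s on a 1000 Hz clock: 100e6/8/1000 + 1500 = 14000 bytes */
    printf("burst = %u bytes\n", default_burst(100000000ULL, 1500, 1000));
    return 0;
}
```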
Ubuntu 8.04's Linux 2.6.24 kernel has an improved packet scheduler timer resolution: it now uses high-resolution timers with nanosecond accuracy (refer to psched_show in sch_api.c; the fourth value of /proc/net/psched now returns 1 G, i.e. 10^9). The HTB burst calculation thus becomes bitrate / 10^9 + MTU. This will always be a small value, and as a side effect the HTB QoS scheduler is no longer capable of accurately delivering high bitrates to stations.
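For reference, userspace can probe the scheduler clock the same way, by reading the fourth hex field of /proc/net/psched; this is a small sketch (the first three variable names are my guesses at the fields' meanings). Plugging the 10^9 it returns into the burst formula above leaves essentially just the MTU:

```c
#include <stdio.h>

int main(void)
{
    unsigned int t2us, us2t, res, clock_res;
    FILE *f = fopen("/proc/net/psched", "r");

    /* The file holds four hex words; the fourth is the scheduler
     * clock rate: HZ on 2.6.20, 10^9 on 2.6.24. */
    if (f && fscanf(f, "%x %x %x %x", &t2us, &us2t, &res, &clock_res) == 4)
        printf("scheduler clock: %u ticks per second\n", clock_res);
    if (f)
        fclose(f);
    return 0;
}
```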
My analysis, based on my limited knowledge of the packet scheduler: the Linux packet scheduler is driven by calls to the dequeue function, which might be triggered by a TX-complete interrupt or something else, so it is not timer-driven. But the packet scheduler routines (such as sch_htb) keep track of passing time using the value of the packet scheduler clock (previously the 1 kHz timer, now the 1 GHz nanosecond-resolution clock). HTB adds tokens to the token bucket based on the time elapsed between the previously recorded time of change in the class (cl->t_c) and the current time (q->now) (see sch_htb.c:660).

Herein lies the problem: q->now is no longer really the current time. Because the clock resolution is now nanoseconds, tens if not hundreds of instructions execute between the assignment of q->now (see htb_dequeue at sch_htb.c:894) and its use. And maybe an interrupt has occurred in between (I don't really know; dequeue is not called in the context of an NMI, is it?), and maybe a whole millisecond has passed. This inaccuracy in the size of the added tokens causes HTB to prevent packets from being sent in a timely manner: it decides the class is not eligible to send a packet because the remaining tokens are less than the packet size, since the token count is actually less than what it is supposed to be.
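To illustrate the mechanism I am describing, here is a simplified sketch of HTB-style token accounting (the structure and function are mine, not the kernel's actual code); the comment on the last assignment marks where a stale q->now under-credits the bucket:

```c
typedef long long s64;

/* Simplified stand-in for an HTB class; tokens are kept in clock
 * units, mirroring the cl->tokens / cl->t_c bookkeeping described
 * above (illustrative only). */
struct htb_class_sketch {
    s64 tokens;   /* tokens currently in the bucket */
    s64 buffer;   /* burst: upper bound on accumulated tokens */
    s64 t_c;      /* clock value at the last token update */
};

static void account_tokens(struct htb_class_sketch *cl, s64 now, s64 pkt_time)
{
    s64 diff = now - cl->t_c;    /* time credited since the last update */

    cl->tokens += diff;          /* add tokens for the elapsed time */
    if (cl->tokens > cl->buffer)
        cl->tokens = cl->buffer; /* clamp to the (now tiny) burst */

    cl->tokens -= pkt_time;      /* charge the packet's transmission time */
    cl->t_c = now;               /* if `now` was sampled early in dequeue,
                                  * diff under-counts and the bucket drains
                                  * faster than real time passes */
}
```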
This is what I think, for now. The cure seems to be changing the tc source code to assume a 1 kHz timer (which amounts to 1 ms accuracy) whenever tc finds that a nanosecond-resolution clock is in use, so that the token buffer is large enough to absorb the timing inaccuracies described above.
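A sketch of that workaround, purely hypothetical and not an actual tc patch, could look like this:

```c
/* Hypothetical tc-side fix: when the kernel advertises a nanosecond
 * clock, compute the default burst as if the clock still ticked at
 * 1 kHz, keeping one millisecond's worth of bytes plus an MTU. */
static unsigned int safe_default_burst(unsigned long long rate_bps,
                                       unsigned int mtu,
                                       unsigned int clock_res)
{
    unsigned int effective_hz = clock_res;

    if (clock_res >= 1000000000U)   /* nanosecond clock detected */
        effective_hz = 1000;        /* fall back to 1 ms granularity */

    return (unsigned int)(rate_bps / 8 / effective_hz) + mtu;
}
```

With clock_res = 10^9 and the same 100 Mbit/s rate as before, this yields 14000 bytes again instead of a bare 1500.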
