Friday, December 28, 2012

Nginx and Linux Kernel Update Blues


We have a three-server landscape that serves our internal applications. All three run nginx on RHEL Linux, installed from EPEL repository packages. Two days ago I rebuilt the nginx package on one server (let's say server A) to add HttpSubsModule. During compilation, the build complained about missing kernel functionality. To get the compile to succeed, I updated the kernel-devel and kernel packages on server A to the latest versions. After the compilation succeeded, I copied the resulting rpm to server B and installed it.

The problem

After installing the rpm package on server B, I forgot to restart nginx. After installing the new kernel on server A, I hadn't restarted it either. But the next day, for one reason or another, server A got restarted, and the application on server A stopped working. Server B's nginx also got restarted, and afterwards it was unable to serve any web pages (not even static pages). So we had two servers down out of three. That's not good.

Investigation: Eventfd not implemented

The nginx error log shows that server B's nginx complains about failed eventfd() calls (ref):
eventfd() failed (38: Function not implemented)
I suddenly recalled that this problem had already happened before on server C, where the cause was that the newer nginx package needs a backported kernel feature. The solution is to install the latest kernel and reboot.
yum update kernel
The root cause is that the nginx EPEL package doesn't correctly specify the minimum kernel version it requires to work.
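To check another server for the same failure mode, something like the following could help. It is only a sketch: the function names are made up for illustration, the log path in the usage comment is an assumption about where nginx writes its error log, and install order from rpm is used as a stand-in for "newest kernel", which is usually but not always the same thing.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of nginx or rpm.

# Count occurrences of the eventfd failure in an nginx error log ($1).
# Error 38 is ENOSYS: the running kernel lacks the system call.
count_eventfd_failures() {
    grep -F -c 'eventfd() failed (38: Function not implemented)' "$1" || true
}

# Read `rpm -q --last kernel` output on stdin (most recent install first)
# and print the first entry without its "kernel-" prefix.
newest_kernel_from_listing() {
    awk 'NR == 1 { sub(/^kernel-/, "", $1); print $1; exit }'
}

# Succeed when the running kernel ($1) differs from the newest installed
# one ($2), which suggests a reboot is still pending.
reboot_pending() {
    [ "$1" != "$2" ]
}

# Possible usage (log path is an assumption):
#   count_eventfd_failures /var/log/nginx/error.log
#   reboot_pending "$(uname -r)" "$(rpm -q --last kernel | newest_kernel_from_listing)" \
#       && echo "new kernel installed but not booted yet"
```

Running the second check right after a kernel update would have flagged server A as waiting for a reboot.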

Investigation: SELinux prevents text relocation

After the kernel update, both servers A and B were still unable to serve the application pages.
Browsing the nginx and PHP error logs showed that the php-oci8 extension was not loaded (it is needed to connect to Oracle, which is the database backend of our legacy applications).
The clue to why the extension failed to load was found in /var/log/messages: SELinux complained that Oracle's dynamic library (libclntsh) was trying to do text relocation and was denied (see the Red Hat bug report here and an article about PHP, SELinux, and Oracle here).
setroubleshoot: SELinux is preventing php from loading /opt/oracle/instantclient/ which requires text relocation. For complete SELinux messages. run sealert -l ac5924e8-3af4-4f26-8d79-e5eedf9d9d7a
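When there are many such entries, it can be handy to pull the suggested sealert commands out of the log in one go. A minimal sketch, assuming the setroubleshoot lines look like the one above (sealert_commands is a hypothetical helper name):

```shell
#!/bin/sh
# Sketch only: sealert_commands is a hypothetical helper name.

# Print each distinct "sealert -l <id>" command mentioned in the
# setroubleshoot lines of a syslog file ($1).
sealert_commands() {
    grep 'setroubleshoot' "$1" | grep -o 'sealert -l [0-9a-f-]*' | sort -u
}

# Possible usage:
#   sealert_commands /var/log/messages
```

Each printed command can then be run by hand to read the full SELinux explanation for that denial.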
The solution is to give the textrel_shlib_t SELinux type to all Oracle Instant Client dynamic libraries.
chcon -t textrel_shlib_t *.so
chcon -t textrel_shlib_t *.so.10.1
You can see the resulting attributes on all files using
ls -lZ
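The two chcon commands can be wrapped in a small script that first shows what it would relabel. This is a sketch under assumptions: the helper names are hypothetical, the directory is the one from the log message above, and only files directly inside it are considered.

```shell
#!/bin/sh
# Sketch only: list_shared_libs and relabel_libs are hypothetical helper names.

# List shared libraries (*.so and versioned *.so.N) directly under directory $1.
list_shared_libs() {
    find "$1" -maxdepth 1 -type f \( -name '*.so' -o -name '*.so.*' \) | sort
}

# Give each library in $1 the textrel_shlib_t type.
# With "dry-run" as $2, only print the chcon commands instead of running them.
relabel_libs() {
    list_shared_libs "$1" | while IFS= read -r lib; do
        if [ "$2" = "dry-run" ]; then
            echo "chcon -t textrel_shlib_t $lib"
        else
            chcon -t textrel_shlib_t "$lib"
        fi
    done
}

# Possible usage:
#   relabel_libs /opt/oracle/instantclient dry-run   # preview
#   relabel_libs /opt/oracle/instantclient           # apply (as root)
```

As far as I know, labels set with chcon do not survive a full filesystem relabel. Recording the rule with semanage fcontext and applying it with restorecon is the usual way to make it stick, but I have not tried that here.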
Why hadn't this occurred before the kernel update? Maybe SELinux permissions have become more restrictive in recent kernels, but I am not certain about that. I wonder whether SELinux attributes can be specified in rpm packages. If so, a better solution for the world would be to build an rpm package containing the Instant Client, oci8, and the required SELinux permissions. But I wonder whether that would violate Oracle's EULA.


Thinking more generally, these incidents occurred because we updated the kernel and the nginx package. Was that wrong? No. The best practice in server administration is to apply all security updates as soon as possible. But we had better plan to do updates outside the application's busy hours, so that if problems crop up we have sufficient breathing room :)
