Murphy's law says that anything that can go wrong will go wrong, and at the worst possible time. Or something like that. Anyway, more often than not, our software doesn't behave as it should, and I often run into web apps that wait endlessly for something. That made me curious about what on earth causes the app to wait. In this case, the app is the OpenShift Origin console. Since it is Ruby-based, there is supposedly a way to dump the stacks of its running threads.
At first I tried to borrow the OpenShift Ruby cartridge's method of dumping threads. After reverse engineering the cartridge (OK, I just snooped around in some files such as this), I was surprised to find that all the Ruby cartridge does is send the ABRT signal to the process whose title starts with Rack: . I tried the same procedure on the running openshift-console process, and the result was a killed process and some confusion.
Another reference, the Phusion Passenger user's guide, says that Ruby and Python processes that receive ABRT print a backtrace and then abort. The backtrace was missing from every known log file after I sent ABRT, which made me skeptical about how usable this technique is. The user's guide also states that the QUIT signal can be sent to Ruby processes, and that it is supposed to give the same result without killing the process in cold blood. But sending QUIT to openshift-console's Rack process gave no solid result either.
kill -s QUIT <pid>
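To fill in <pid>, the Rack processes have to be found first. Here is a minimal sketch in Ruby, assuming the process titles start with Rack: as described above and that ps accepts -e -o pid= -o args= (header-less pid and command line). The helper names rack_pids and send_quit_to_rack_processes are made up for this example; they are not part of OpenShift or Passenger.

```ruby
# Hypothetical helper: pick out the pids whose command line starts with "Rack:".
# Expects text shaped like the output of `ps -e -o pid= -o args=`,
# i.e. one process per line: the pid, whitespace, then the command line.
def rack_pids(ps_output)
  ps_output.each_line.map do |line|
    pid, title = line.strip.split(/\s+/, 2)
    pid.to_i if title && title.start_with?("Rack:")
  end.compact
end

# Hypothetical helper: send QUIT to every Rack process found.
# The ps command only runs when no text is passed in.
def send_quit_to_rack_processes(ps_output = `ps -e -o pid= -o args=`)
  rack_pids(ps_output).each do |pid|
    puts "sending QUIT to #{pid}"
    Process.kill("QUIT", pid)
  end
end
```

Calling send_quit_to_rack_processes as a user allowed to signal those processes (typically root) does the same thing as running the kill command above once per Rack process.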
Knowing that openshift-console's processes have to contact openshift-broker's API, I started sending QUIT to openshift-broker's Rack process. This time, when I reviewed the log files, I got several clues about what was freezing openshift-console.
Another valuable reference for debugging frozen processes is a New Relic blog entry.
Unfortunately, this post has to end before the story is quite finished. But the technique of sending QUIT to the Rack process (or was it the httpd process? I forgot) can shed some light on which region of Ruby code is executing at that moment.
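For the curious, here is a minimal sketch of what a QUIT-triggered thread dump can look like in plain Ruby. This is not Passenger's actual implementation, only an illustration of the idea; the helper names dump_threads and install_quit_dump are hypothetical.

```ruby
# Hypothetical helper: write the status and backtrace of every live thread to io.
def dump_threads(io = $stderr)
  Thread.list.each do |thread|
    io.puts "Thread #{thread.object_id} status=#{thread.status.inspect}"
    # Thread#backtrace returns nil for a dead thread, so fall back to a note.
    (thread.backtrace || ["(no backtrace available)"]).each do |frame|
      io.puts "    #{frame}"
    end
  end
end

# Hypothetical helper: dump all threads whenever the process receives QUIT.
# The process keeps running after the dump.
def install_quit_dump(io = $stderr)
  Signal.trap("QUIT") { dump_threads(io) }
end
```

After calling install_quit_dump early in an app, kill -s QUIT <pid> should print one backtrace per thread to standard error, which shows where each thread is stuck.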