Saturday, October 4, 2014

Debugging Ruby code - Mcollective server

In this post I record steps that I took to debug some ruby code. Actually the code was an ruby mcollective server code that were installed as part of openshift origin Node. The bug is that the server consistently fails to respond to client queries in my configuration. I documented the steps taken  even though I hadn't nailed the bug yet.

First thing first

First we need to identify the entry point. These commands would do the trick:
[root@broker ~]# service ruby193-mcollective status
mcollectived (pid  1069) is running...
[root@broker ~]# ps afxw | grep 1069
 1069 ?        Sl     0:03 ruby /opt/rh/ruby193/root/usr/sbin/mcollectived --pid=/opt/rh/ruby193/root/var/run/mcollectived.pid --config=/opt/rh/ruby193/root/etc/mcollective/server.cfg
12428 pts/0    S+     0:00          \_ grep 1069

We found out that the service is :
  • running with pid 1069
  • running with configuration file /opt/rh/ruby193/root/etc/mcollective/server.cfg
  • service's source code is at /opt/rh/ruby193/root/usr/sbin/mcollectived

The most intrusive way yet the simplest

The simplest way is to insert 'puts' calls inside the code you want to debug. For objects, you want to call the inspect method.

But the code I am interested in is deep inside call graph of the mcollectived. I want to find out the details of activemq subscription. Skipping hours of skimming mcollective source (https://github.com/puppetlabs/marionette-collective/) and openshift origin mcollective server source (https://github.com/openshift/origin-server/tree/master/plugins/msg-node/mcollective), lets jump to the activemq.rb file :
[root@broker ~]# locate activemq.rb
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/connector/activemq.rb

Lets hack some code (if you're doing this for real, do some backup first):
[root@broker ~]# vi /opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/connector/activemq.rb

Add some puts here and there ..
      # Subscribe to a topic or queue
      def subscribe(agent, type, collective)
        source = make_target(agent, type, collective)
        puts "XXXX subscribe to "
        puts agent
        puts type
        puts collective

And.. it doesnt work. Because the service reassigns standard output to /dev/null. Ah. And not far from the make_target call we have a Log.debug call, lets imitate it :
      def subscribe(agent, type, collective)
        source = make_target(agent, type, collective)
        Log.debug("XXXX subscribe to #{agent} - #{type} - #{collective}")
        unless @subscriptions.include?(source[:id])
          Log.debug("Subscribing to #{source[:name]} with headers #{source[:headers].inspect.chomp}")

And we need to know where the log goes.. check out the configuration file (or the /proc/[pid]/fd directory, if you want):

vi /opt/rh/ruby193/root/etc/mcollective/server.cfg

topicprefix = /topic/
main_collective = mcollective
collectives = mcollective
libdir = /opt/rh/ruby193/root/usr/libexec/mcollective
logfile = /var/log/openshift/node/ruby193-mcollective.log
loglevel = debug
daemonize = 1
direct_addressing = 1
registerinterval = 30

Restart the service :
service ruby193-mcollective restart

View the logs:
cat /var/log/openshift/node/ruby193-mcollective.log | grep XXX

[root@broker ~]# cat /var/log/openshift/node/ruby193-mcollective.log | grep XXX
D, [2014-10-04T09:59:22.392472 #17552] DEBUG -- : activemq.rb:371:in `subscribe' XXXX subscribe to discovery - broadcast - mcollective
D, [2014-10-04T09:59:26.049920 #17552] DEBUG -- : activemq.rb:371:in `subscribe' XXXX subscribe to openshift - broadcast - mcollective
D, [2014-10-04T09:59:26.095865 #17552] DEBUG -- : activemq.rb:371:in `subscribe' XXXX subscribe to rpcutil - broadcast - mcollective
D, [2014-10-04T09:59:26.191664 #17552] DEBUG -- : activemq.rb:371:in `subscribe' XXXX subscribe to mcollective - broadcast - mcollective
D, [2014-10-04T09:59:26.202263 #17552] DEBUG -- : activemq.rb:371:in `subscribe' XXXX subscribe to mcollective - directed - mcollective

There, I find what I came for, parameters for the subscribe method calls.

The nonintrusive way, but not yet successful

Actually, we are not supposed to hack source codes like that. Lets learn the real ruby debugger.
Check the command line and then stop the service.
[root@broker ~]# ps auxw | grep mcoll
root     17552  0.5  4.3 378212 44520 ?        Sl   09:59   0:03 ruby /opt/rh/ruby193/root/usr/sbin/mcollectived --pid=/opt/rh/ruby193/root/var/run/mcollectived.pid --config=/opt/rh/ruby193/root/etc/mcollective/server.cfg
root     19873  0.0  0.0 103240   852 pts/0    S+   10:08   0:00 grep mcoll
[root@broker ~]# service ruby193-mcollective stop
Shutting down mcollective:                                 [  OK  ]
[root@broker ~]# ruby -rdebug  /opt/rh/ruby193/root/usr/sbin/mcollectived --pid=/opt/rh/ruby193/root/var/run/mcollectived.pid --config=/opt/rh/ruby193/root/etc/mcollective/server.cfg
/usr/lib/ruby/1.8/tracer.rb:16: Tracer is not a class (TypeError)
        from /usr/lib/ruby/1.8/debug.rb:10:in `require'
        from /usr/lib/ruby/1.8/debug.rb:10

Oops. Something is wrong. I used the built in ruby, which is 1.8, not 1.9.3. Lets try again.

[root@broker ~]# scl enable ruby193 bash
[root@broker ~]# ruby -rdebug  /opt/rh/ruby193/root/usr/sbin/mcollectived --pid=/opt/rh/ruby193/root/var/run/mcollectived.pid --config=/opt/rh/ruby193/root/etc/mcollective/server.cfg
Debug.rb
Emacs support available.

/opt/rh/ruby193/root/usr/sbin/mcollectived:3:require 'mcollective'
(rdb:1)

Now we are in RDB, Ruby Debugger. What are the commands?
(rdb:1) help
Debugger help v.-0.002b
Commands
  b[reak] [file:|class:]
  b[reak] [class.]
                             set breakpoint to some position
  wat[ch]       set watchpoint to some expression
  cat[ch] (|off)  set catchpoint to an exception
  b[reak]                    list breakpoints
  cat[ch]                    show catchpoint
  del[ete][ nnn]             delete some or all breakpoints
  disp[lay]     add expression into display expression list
  undisp[lay][ nnn]          delete one particular or all display expressions
  c[ont]                     run until program ends or hit breakpoint
  s[tep][ nnn]               step (into methods) one line or till line nnn
  n[ext][ nnn]               go over one line or till line nnn
  w[here]                    display frames
  f[rame]                    alias for where
  l[ist][ (-|nn-mm)]         list program, - lists backwards
                             nn-mm lists given lines
  up[ nn]                    move to higher frame
  down[ nn]                  move to lower frame
  fin[ish]                   return to outer frame
  tr[ace] (on|off)           set trace mode of current thread
  tr[ace] (on|off) all       set trace mode of all threads
  q[uit]                     exit from debugger
  v[ar] g[lobal]             show global variables
  v[ar] l[ocal]              show local variables
  v[ar] i[nstance]  show instance variables of object
  v[ar] c[onst]     show constants of object
  m[ethod] i[nstance]  show methods of object
  m[ethod]    show instance methods of class or module
  th[read] l[ist]            list all threads
  th[read] c[ur[rent]]       show current thread
  th[read] [sw[itch]]  switch thread context to nnn
  th[read] stop        stop thread nnn
  th[read] resume      resume thread nnn
  p expression               evaluate expression and print its value
  h[elp]                     print this help
           evaluate

Lets checkout where we are (w).
(rdb:1) w
--> #1 /opt/rh/ruby193/root/usr/sbin/mcollectived:3
(rdb:1)

Ok, and list the source code (l):
(rdb:1) l
[-2, 7] in /opt/rh/ruby193/root/usr/sbin/mcollectived
   1  #!
   2
=> 3  require 'mcollective'
   4  require 'getoptlong'
   5
   6  opts = GetoptLong.new(
   7    [ '--help', '-h', GetoptLong::NO_ARGUMENT ],

Step to next line (n) :
(rdb:1) n
/opt/rh/ruby193/root/usr/sbin/mcollectived:4:require 'getoptlong'

The execution proceeds to the next line.
(rdb:1) n
/opt/rh/ruby193/root/usr/sbin/mcollectived:6:opts = GetoptLong.new(
(rdb:1) n
/opt/rh/ruby193/root/usr/sbin/mcollectived:12:if MCollective::Util.windows?
(rdb:1) n
/opt/rh/ruby193/root/usr/share/ruby/mcollective/util.rb:1:module MCollective

I found a little strange that the debugger steps in to another source file. 

(rdb:1) n
/opt/rh/ruby193/root/usr/sbin/mcollectived:15:  configfile = "/opt/rh/ruby193/root/etc/mcollective/server.cfg"
(rdb:1) l
[10, 19] in /opt/rh/ruby193/root/usr/sbin/mcollectived
   10  )
   11
   12  if MCollective::Util.windows?
   13    configfile = File.join(MCollective::Util.windows_prefix, "etc", "server.cfg")
   14  else
=> 15    configfile = "/opt/rh/ruby193/root/etc/mcollective/server.cfg"
   16  end
   17  pid = ""
   18
   19  opts.each do |opt, arg|

But it quickly returns to the original source.

(rdb:1) n
/opt/rh/ruby193/root/usr/sbin/mcollectived:17:pid = ""
(rdb:1) n
/opt/rh/ruby193/root/usr/sbin/mcollectived:19:opts.each do |opt, arg|
(rdb:1) n
/opt/rh/ruby193/root/usr/sbin/mcollectived:31:config = MCollective::Config.instance
(rdb:1) n
/opt/rh/ruby193/root/usr/share/ruby/mcollective/config.rb:1:module MCollective
(rdb:1) n
/opt/rh/ruby193/root/usr/sbin/mcollectived:33:config.loadconfig(configfile) unless config.configured
(rdb:1) n
warn 2014/10/04 10:16:16: config.rb:117:in `block in loadconfig' Use of deprecated 'topicprefix' option.  This option is ignored and should be removed from '/opt/rh/ruby193/root/etc/mcollective/server.cfg'
/opt/rh/ruby193/root/usr/share/ruby/psych/core_ext.rb:16: `' (NilClass)
        from /opt/rh/ruby193/root/usr/share/rubygems/rubygems/custom_require.rb:36:in `require'
        from /opt/rh/ruby193/root/usr/share/rubygems/rubygems/custom_require.rb:36:in `require'
        from /opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/facts/yaml_facts.rb:3:in `'
        from /opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/facts/yaml_facts.rb:2:in `'
        from /opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/facts/yaml_facts.rb:1:in `'
        from /opt/rh/ruby193/root/usr/share/ruby/mcollective/pluginmanager.rb:169:in `load'
        from /opt/rh/ruby193/root/usr/share/ruby/mcollective/pluginmanager.rb:169:in `loadclass'
        from /opt/rh/ruby193/root/usr/share/ruby/mcollective/config.rb:142:in `loadconfig'
        from /opt/rh/ruby193/root/usr/sbin/mcollectived:33:in `
'
/opt/rh/ruby193/root/usr/share/ruby/psych/core_ext.rb:16:  remove_method :to_yaml rescue nil
(rdb:1)

This is a NilClass error, similar to NullPointerException, but I could proceeds further into another code by keeping doing (n):

(rdb:1)
n
/opt/rh/ruby193/root/usr/share/ruby/psych/core_ext.rb:17:  alias :to_yaml :psych_to_yaml
(rdb:1) n
/opt/rh/ruby193/root/usr/share/ruby/psych/core_ext.rb:20:class Module
(rdb:1) n
/opt/rh/ruby193/root/usr/share/ruby/psych/core_ext.rb:29: `' (NilClass)
        from /opt/rh/ruby193/root/usr/share/rubygems/rubygems/custom_require.rb:36:in `require'
        from /opt/rh/ruby193/root/usr/share/rubygems/rubygems/custom_require.rb:36:in `require'
        from /opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/facts/yaml_facts.rb:3:in `'
        from /opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/facts/yaml_facts.rb:2:in `'
        from /opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/facts/yaml_facts.rb:1:in `'
        from /opt/rh/ruby193/root/usr/share/ruby/mcollective/pluginmanager.rb:169:in `load'
        from /opt/rh/ruby193/root/usr/share/ruby/mcollective/pluginmanager.rb:169:in `loadclass'
        from /opt/rh/ruby193/root/usr/share/ruby/mcollective/config.rb:142:in `loadconfig'
        from /opt/rh/ruby193/root/usr/sbin/mcollectived:33:in `
'
/opt/rh/ruby193/root/usr/share/ruby/psych/core_ext.rb:29:  remove_method :yaml_as rescue nil
(rdb:1) n
/opt/rh/ruby193/root/usr/share/ruby/psych/core_ext.rb:30:  alias :yaml_as :psych_yaml_as
(rdb:1) n
/opt/rh/ruby193/root/usr/share/ruby/psych/core_ext.rb:33:if defined?(::IRB)
(rdb:1) n
/opt/rh/ruby193/root/usr/share/ruby/psych.rb:12:require 'psych/deprecated'
(rdb:1) n
/opt/rh/ruby193/root/usr/share/ruby/psych/deprecated.rb:79: `' (NilClass)
        from /opt/rh/ruby193/root/usr/share/rubygems/rubygems/custom_require.rb:36:in `require'
        from /opt/rh/ruby193/root/usr/share/rubygems/rubygems/custom_require.rb:36:in `require'
        from /opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/facts/yaml_facts.rb:3:in `'
        from /opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/facts/yaml_facts.rb:2:in `'
        from /opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/facts/yaml_facts.rb:1:in `'
        from /opt/rh/ruby193/root/usr/share/ruby/mcollective/pluginmanager.rb:169:in `load'
        from /opt/rh/ruby193/root/usr/share/ruby/mcollective/pluginmanager.rb:169:in `loadclass'
        from /opt/rh/ruby193/root/usr/share/ruby/mcollective/config.rb:142:in `loadconfig'
        from /opt/rh/ruby193/root/usr/sbin/mcollectived:33:in `
'
/opt/rh/ruby193/root/usr/share/ruby/psych/deprecated.rb:79:  undef :to_yaml_properties rescue nil

(rdb:1) n
/opt/rh/ruby193/root/usr/share/ruby/psych.rb:94:module Psych
(rdb:1) n
/opt/rh/ruby193/root/usr/share/ruby/yaml.rb:86:    engine = 'psych'
(rdb:1) n
/opt/rh/ruby193/root/usr/share/ruby/yaml.rb:96:module Syck # :nodoc:
(rdb:1) n
/opt/rh/ruby193/root/usr/share/ruby/yaml.rb:100:module Psych # :nodoc:
(rdb:1) n
/opt/rh/ruby193/root/usr/share/ruby/yaml.rb:104:YAML::ENGINE.yamler = engine
(rdb:1) n
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/facts/yaml_facts.rb:10:    class Yaml_facts
(rdb:1) n
/opt/rh/ruby193/root/usr/share/ruby/mcollective/facts/base.rb:1:module MCollective
(rdb:1) n
/opt/rh/ruby193/root/usr/share/ruby/mcollective/config.rb:143:        PluginManager.loadclass("Mcollective::Connector::#{@connector}")
(rdb:1) l
[138, 147] in /opt/rh/ruby193/root/usr/share/ruby/mcollective/config.rb
   138          if @logger_type == "syslog"
   139            raise "The sylog logger is not usable on the Windows platform" if Util.windows?
   140          end
   141
   142          PluginManager.loadclass("Mcollective::Facts::#{@factsource}_facts")
=> 143          PluginManager.loadclass("Mcollective::Connector::#{@connector}")
   144          PluginManager.loadclass("Mcollective::Security::#{@securityprovider}")
   145          PluginManager.loadclass("Mcollective::Registration::#{@registration}")
   146          PluginManager.loadclass("Mcollective::Audit::#{@rpcauditprovider}") if @rpcaudit
   147          PluginManager << {:type => "global_stats", :class => RunnerStats.new}

It seems that the context of those errors were facts loading process. The connector is a particular interest, because :

(rdb:1) n
/opt/rh/ruby193/root/usr/share/ruby/mcollective/config.rb:144:        PluginManager.loadclass("Mcollective::Security::#{@securityprovider}")
(rdb:1) PluginManager["connector_plugin"]
#"unset", "activemq.pool.size"=>"1", "activemq.pool.1.host"=>"broker.openshift.local", "activemq.pool.1.port"=>"61613", "activemq.pool.1.user"=>"mcollective", "activemq.pool.1.password"=>"marionette", "yaml"=>"/opt/rh/ruby193/root/etc/mcollective/facts.yaml"}, @connector="Activemq", @securityprovider="Psk", @factsource="Yaml", @identity="broker.openshift.local", @registration="Agentlist", @registerinterval=30, @registration_collective=nil, @classesfile="/var/lib/puppet/state/classes.txt", @rpcaudit=false, @rpcauditprovider="", @rpcauthorization=false, @rpcauthprovider="", @configdir="/opt/rh/ruby193/root/etc/mcollective", @color=true, @configfile="/opt/rh/ruby193/root/etc/mcollective/server.cfg", @logger_type="file", @keeplogs=5, @max_log_size=2097152, @rpclimitmethod=:first, @libdir=["/opt/rh/ruby193/root/usr/libexec/mcollective"], @fact_cache_time=300, @loglevel="debug", @logfacility="user", @collectives=["mcollective"], @main_collective="mcollective", @ssl_cipher="aes-256-cbc", @direct_addressing=true, @direct_addressing_threshold=10, @default_discovery_method="mc", @default_discovery_options=[], @ttl=60, @mode=:client, @publish_timeout=2, @threaded=false, @logfile="/var/log/openshift/node/ruby193-mcollective.log", @daemonize=true>, @subscriptions=[], @msgpriority=0, @base64=false>

Yes, it loads the Activemq connector. Lets create a breakpoint in the subscribe method :
(rdb:1) b PluginManager["connector_plugin"].subscribe
Set breakpoint 1 at #.subscribe

Then continue (c)..

(rdb:1) c
[root@broker ~]#

Wait. It stops. Browsing some source code tells me that the code forks somewhere after that. And, the forked code seems to be unrelated to the debugger.. So it is a dead end for now.

Well thats all now for the record. I can't promise that there will be a more successful debugging session, but surely I hope that there will be.

No comments: