Wednesday, January 30, 2013

SE Linux Cure (mini mini post)

I think it's only natural for admins to avoid SELinux. With SELinux enabled, the simplest change can result in system failure. Sometimes even no change at all (at least, no change that you remember making).

The Cure

In the past, I have depended upon a few commands that can shed some light on SELinux troubles.
The commands are :
  1. ls -Z : this option shows an additional column, the security context, attached to each file
  2. chcon : this command changes a file's context to the given context argument. For example, chcon -t mysqld_db_t mysql sets the security context of the mysql directory to mysqld_db_t
  3. restorecon : this command restores a file's context to the default
But a recent problem made me realize that more tools are needed. For example, on Ubuntu systems (which use AppArmor rather than SELinux), we might need to poke around the /etc/apparmor.d directory and edit the rule files there.
In a recent CentOS incident, these commands came in handy (a sample troubleshooting session follows the list):
  • yum install setroubleshoot - installs sealert and related tools (the semanage tool itself ships in the policycoreutils-python package)
  • sealert -a /var/log/audit/audit.log - renders the audit log as readable messages
  • semanage fcontext - changes the context of many files at once, according to a wildcard path expression
  • setenforce - temporarily allows SELinux violations during troubleshooting: setenforce 0 switches to permissive mode, setenforce 1 back to enforcing
  • semodule -DB - disables dontaudit clauses, so otherwise-hidden denials show up in the audit log
  • semodule -B - re-enables dontaudit clauses
  • restorecon -vF filename_or_directory - resets the SELinux context of the file or directory to its default, usually restoring access that SELinux would otherwise deny
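
To make this concrete, here is a minimal troubleshooting session for a hypothetical case: mysqld denied access to a datadir relocated to /srv/mysql (the path is made up for illustration):

sealert -a /var/log/audit/audit.log                      # read the denials in plain English
setenforce 0                                             # permissive while testing; re-enable later!
semanage fcontext -a -t mysqld_db_t "/srv/mysql(/.*)?"   # register a context rule for the path
restorecon -RvF /srv/mysql                               # apply the registered contexts
ls -Z /srv/mysql                                         # verify the new contexts
setenforce 1                                             # back to enforcing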

Tuesday, January 29, 2013

MySQL Corrupt Tables and How to Avoid Them

Once in a while, MySQL tables become corrupted. This post is not about the repair process (see other posts for that; the most general advice is to run REPAIR TABLE tablename;).
In my humble opinion, a real-life production database must not suffer any corruption; it must have sufficient failsafe mechanisms to prevent it.

Causes of Corruption

MyISAM tables can become corrupted when (refer to http://dev.mysql.com/doc/refman/5.1/en/corrupted-myisam-tables.html):
  • mysqld is killed in the middle of a write
  • the computer shuts down unexpectedly
  • hardware fails
  • an external program (for example, myisamchk) modifies a table while mysqld is running
  • a software bug exists in the MySQL/MyISAM code

Tips to mitigate data corruption

Do not use MyISAM for long-lived data; use InnoDB, which is far less corruption-prone. Also enable the innodb_file_per_table option.
Check your disk space, and check database availability periodically. On one occasion my MySQL data partition filled up, and I noticed that queries had stopped responding. I deleted some unused log files and MySQL returned to normal service, without any corruption: the engine simply pauses when it runs out of disk space and resumes once space is available.
Turn on binary logging. It helps in disaster forensics (such as when your table suddenly has zero rows and you need to find out which app or person is responsible).
Install a secondary MySQL instance as a slave server. If you only have virtual machines, make sure the slave runs in another region, or at least on another physical server.
Ensure MySQL's memory usage fits the available memory. This means no other application may be allowed to dynamically eat memory: out-of-memory conditions turn Linux into a process killer (the OOM killer), so their probability should be kept as close to zero as possible.
Back up periodically; daily automated backups are ideal. Also think about where the backups will be stored, which should be somewhere outside the server.
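
As a rough sketch (the paths, retention, and cron placement are my own assumptions, not canonical values), the relevant my.cnf lines and a daily dump job could look like this:

# /etc/my.cnf fragment -- illustrative values
[mysqld]
innodb_file_per_table = 1
log-bin          = /var/lib/mysql/mysql-bin   # binary logging for forensics and replication
expire_logs_days = 7                          # keep a week of binlogs

# daily cron job; ship the dump somewhere outside the server afterwards
mysqldump --all-databases --single-transaction | gzip > /backup/mysql-$(date +%F).sql.gz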

Saturday, January 19, 2013

Learning Cloud Foundry - NATS

All Cloud Foundry components communicate with each other using the NATS publish-subscribe mechanism.
Let's look in detail at what NATS is like.

Cloud Foundry Router as NATS client

During startup of the Cloud Foundry Router (router.rb), there is a call that starts the NATS client:
NATS.start(:uri => config['mbus'])
This means the NATS client will be started with the uri parameter obtained from the configuration parameter 'mbus'.
config_path = ENV["CLOUD_FOUNDRY_CONFIG_PATH"] || File.join(File.dirname(__FILE__), '../config')
config_file = File.join(config_path, 'router.yml')
...
opts.on("-c", "--config [ARG]", "Configuration File") do |opt|
    config_file = opt
  end
...
We find that the configuration is read from a YAML file ("YAML Ain't Markup Language") whose location can come from multiple sources, in this order:
  1. the file specified by the command-line parameter -c or --config
  2. the environment variable CLOUD_FOUNDRY_CONFIG_PATH (if set) concatenated with '/router.yml'
  3. ../config/router.yml relative to the location of the Ruby file router.rb (currently router/lib/router.rb)
Note that the code doesn't check whether the file exists at all; it only checks for the presence of the configuration switch or the environment variable.
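
So, assuming an illustrative configuration directory of /etc/vcap, the router can be pointed at its configuration in either of two ways:

ruby router.rb -c /etc/vcap/router.yml      # explicit configuration file

export CLOUD_FOUNDRY_CONFIG_PATH=/etc/vcap  # or via the environment;
ruby router.rb                              # this reads /etc/vcap/router.yml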

After starting the NATS client, the router publishes router start events:

@router_id = VCAP.secure_uuid
@hello_message = { :id => @router_id, :version => Router::VERSION }.to_json.freeze
 
# This will check on the state of the registered urls, do maintenance, etc..
Router.setup_sweepers
 
# Setup a start sweeper to make sure we have a consistent view of the world.
EM.next_tick do
  # Announce our existence
  NATS.publish('router.start', @hello_message)

  # Don't let the messages pile up if we are in a reconnecting state
  EM.add_periodic_timer(START_SWEEPER) do
    unless NATS.client.reconnecting?
      NATS.publish('router.start', @hello_message)
    end
  end
end

This ensures that other components realize there is a new router starting up (which might need a situation update).

NATS client module

The NATS client module starts an EventMachine-based network connection to the NATS server given in the uri parameter, but it also has defaults (client.rb):
# line 13 of client.rb
DEFAULT_PORT = 4222
DEFAULT_URI = "nats://localhost:#{DEFAULT_PORT}".freeze

# ... line 102 of client.rb
opts[:uri] ||= ENV['NATS_URI'] || DEFAULT_URI
We see that the uri parameter is obtained from, in this order:
  1. the uri parameter passed to NATS.start
  2. the NATS_URI environment variable
  3. DEFAULT_URI, which is nats://localhost:4222
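
For example (grounded in the line quoted above, with a made-up server address), we can redirect the router's NATS connection without touching any configuration file:

export NATS_URI=nats://10.0.0.5:4222
ruby router.rb                              # the NATS client now connects to 10.0.0.5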
A publish call is implemented by sending a text line to the NATS server:

# Publish a message to a given subject, with optional reply subject and completion block
# @param [String] subject
# @param [Object, #to_s] msg
# @param [String] opt_reply
# @param [Block] blk, closure called when publish has been processed by the server.
def publish(subject, msg=EMPTY_MSG, opt_reply=nil, &blk)
  return unless subject
  msg = msg.to_s

  # Accounting
  @msgs_sent += 1
  @bytes_sent += msg.bytesize if msg

  send_command("PUB #{subject} #{opt_reply} #{msg.bytesize}#{CR_LF}#{msg}#{CR_LF}")
  queue_server_rt(&blk) if blk
end

The NATS server takes care of delivering each message to all the subscribers interested in its subject.
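
Since the protocol is plain text, we can exercise it by hand with netcat; the subject and payload below are made up, and 17 is the byte length of the payload:

# terminal 1: subscribe to the subject (1 is an arbitrary subscription id)
printf 'SUB router.start 1\r\n' | nc localhost 4222

# terminal 2: publish a 17-byte message to the same subject
printf 'PUB router.start 17\r\n{"id":"abcd1234"}\r\n' | nc localhost 4222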

NATS Server


The NATS server is also implemented in Ruby. The source code shows that the startup sequence is as follows:
  1. perform setup using the given command-line arguments
  2. start EventMachine on the given host and port, using the NATSD::Connection module to serve connections
  3. if an http_port parameter is given, start HTTP monitoring on that port
The server class (which performs the setup) is split across a few files: the core server (server.rb) and option handling (options.rb).
The startup options are first read from the command line (see the parser method here); the server then reads a configuration file (if given) for additional parameters. Options given on the command line are never overridden; only missing ones are filled in from the YAML-formatted configuration file. The available parameters are (a hypothetical configuration sketch follows the list):
  1. addr (listen network address)
  2. port
  3. http: net, port, user, password (for the HTTP monitoring port)
  4. authorization: user, password, auth_timeout
  5. ssl (request SSL on connections)
  6. debug
  7. trace
  8. syslog (activate syslog logging)
  9. pid_file
  10. log_file
  11. log_time
  12. ping: interval, max_outstanding
  13. max_control_line, max_payload, max_connections (limits)
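
Put together, a configuration file might look like the sketch below; the key names are my guesses based on the list above, so check options.rb before relying on them:

# hypothetical nats-server.yml
addr: 0.0.0.0
port: 4222
authorization:
  user: nats
  password: secret
  auth_timeout: 5
pid_file: /var/run/natsd.pid
log_file: /var/log/natsd.log
debug: false
trace: false
ping:
  interval: 120
  max_outstanding: 2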
The subscription list is kept in the server class, along with the route_to_subscribers method for sending messages to the registered parties.
The Connection module is the heart of the NATS server's operation. The server can be configured to require authentication or SSL connections. The operations it recognizes are:
  1. PUB (publish), which sends a payload to the registered parties
  2. SUB (subscribe), which registers the caller for a message subject
  3. UNSUB (unsubscribe), which unregisters a subscription
  4. PING, which makes the server send a PONG response
  5. PONG - not really an operation, but part of the mechanism for verifying client health
  6. CONNECT, which reconfigures the verbose and pedantic connection options
  7. INFO, which makes the server send an information JSON string
  8. CTRL-C/CTRL-D, both of which make the server close the connection
The server sends ping messages to each client periodically, according to Server.ping_interval. If the number of unanswered pings exceeds the max_outstanding parameter, the server tells the client that it is unresponsive and closes the connection afterwards.

Friday, January 11, 2013

Learning Cloud Foundry Router

Background

The Cloud Foundry PaaS platform serves web applications. The Cloud Foundry router component is different from a router in the plain IT sense: it is a Ruby-based software component that determines which backend droplet should serve each HTTP request addressed to one of the applications deployed on the platform.

Starting Point

To understand the mechanism of the router, we start from nginx, the HTTP server that acts as the primary gatekeeper. The nginx configuration is my primary concern, so let's look at the vcap source code that defines the nginx configuration on each and every Cloud Foundry server. In the vcap/dev_setup/cookbooks/nginx folder we find a templates directory that stores nginx configuration templates.
These templates are, I suppose, used by the dev_setup installation procedure.

Configuration templates

We have the cc-nginx.conf.erb Ruby template along with the router-nginx.conf.erb template and some other files. Let's look at the router configuration (router-nginx.conf.erb).

user root root;
worker_processes 1;

The first two lines say that nginx runs as root with only one worker process. This should not be much of a limitation, because nginx uses an event-driven architecture that can handle many concurrent connections from a single process/thread. Still, it means that allocating two CPUs to the VM hosting the router component would not be effective, since only one CPU would run the router process.
 
error_log <%= node[:nginx][:log_home] %>/nginx_router_error.log debug;
pid /var/run/nginx_router.pid;

The next two lines configure error logging and the pid file. This is getting less interesting, so let's skip to the routing parts..
    location = /vcapuls {
      internal;
      # We should use rewrite_by_lua to scrub subrequest headers
      # as uls doesn't care those headers at all.
      # Given there are some exceptions to clear some headers,
      # we just leave them as is.
      proxy_pass http://unix:/tmp/router.sock:/;
    }
This defines an nginx-internal URL with the path '/vcapuls'. Another comment says 'upstream locator server', which matches the ULS acronym. So the ULS is a server that locates an upstream server, which is very much the purpose of the Router component.
        set $backend_addr ''; # Backend server address returned from uls for this request
        set $uls_req_tags ''; # Request tags returned from uls for this request to catalog statistics
        set $router_ip '';
        set $timestamp 0;
        set $trace '';
        set $sticky '';
        access_by_lua '
          local uls = require ("uls")
          uls.pre_process_subrequest(ngx, "<%= node[:router][:trace_key] %>")
          local req = uls.generate_uls_request(ngx)
          -- generate one subrequest to uls for querying
          local res = ngx.location.capture(
            "/vcapuls", { body = req }
          )
          uls.post_process_subrequest(ngx, res)
        ';
We find that for a generic URL, nginx consults the /vcapuls location, which was described above as an HTTP pass-through to the Unix socket /tmp/router.sock.
The Lua script inside the nginx configuration file is something new to me. It calls a Lua module named uls. After the Lua script runs, nginx passes the request through to $backend_addr. No assignment to backend_addr appears in the Lua script above, so it must happen inside the ULS module.
        proxy_pass http://$backend_addr;
        # Handling response from backend servers
        header_filter_by_lua '
          local uls = require ("uls")
          uls.post_process_response(ngx)
        ';


ULS Module

The ULS module can be found in the vcap router component source code. Its description confirms what ULS stands for.
 -- Description:         Helper for nginx talking to uls(Upstream Locator Server)

 
Two functions in the ULS module matter most for the routing mechanism: generate_uls_request and post_process_subrequest.

function generate_uls_request(ngx)
  local uls_req_spec = {}
  -- add host in request
  uls_req_spec[uls.ULS_HOST_QUERY] = ngx.var.http_host
  -- add sticky session in request
  local uls_sticky_session = retrieve_vcap_sticky_session(
          ngx.req.get_headers()[COOKIE_HEADER])
  if uls_sticky_session then
    uls_req_spec[ULS_STICKY_SESSION] = uls_sticky_session
    ngx.log(ngx.DEBUG, "req sticks to backend session:"..uls_sticky_session)
  end
  -- add status update in request
  local req_stats = uls.serialize_request_statistics()
  if req_stats then
    uls_req_spec[ULS_STATS_UPDATE] = req_stats
  end
  return cjson.encode(uls_req_spec)
end
function post_process_subrequest(ngx, res)
  if res.status ~= 200 then
    ngx.exit(ngx.HTTP_NOT_FOUND)
  end
  local msg = cjson.decode(res.body)
  ngx.var.backend_addr = msg[ULS_BACKEND_ADDR]
  ngx.var.uls_req_tags = msg[ULS_REQEST_TAGS]
  ngx.var.router_ip = msg[ULS_ROUTER_IP]
  ngx.var.sticky = msg[ULS_STICKY_SESSION]
  ngx.var.app_id = msg[ULS_APP_ID]
  ngx.log(ngx.DEBUG, "route "..ngx.var.http_host.." to "..ngx.var.backend_addr)
end

Reading the code, we understand that the Lua module encodes the requested server name (ngx.var.http_host, which I assume contains the DNS name of the deployed application) into a JSON structure. The post_process_subrequest function decodes the result of calling /vcapuls (through the router's Unix socket) into the backend_addr nginx variable. I don't fully understand the implication of using an nginx variable for this instead of a return value, though as far as I know nginx variables are scoped per request, so there should be no race conditions under high workloads. The backend_addr variable is then used by the proxy_pass directive in the nginx configuration file to forward the request to the backend server.
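
To see the exchange concretely, we could poke the ULS by hand through the router's Unix socket (requires a curl new enough to support --unix-socket); the JSON key names are my guesses for what the ULS_* constants resolve to:

curl --unix-socket /tmp/router.sock -X GET \
     --data '{"host":"myapp.vcap.me"}' http://localhost/
# a successful lookup should answer with something like:
# {"backend_addr":"10.0.1.7:61003","router_ip":"10.0.0.2", ...}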

ULS Server

The ULS server is written in Ruby using Sinatra (a library or a framework? I am a noob in Ruby). Let's jump to the source code.
get "/" do

The Ruby source tells us that this section handles HTTP GET requests at the / URL.
    # Parse request body
    uls_req = JSON.parse(body, :symbolize_keys => true)
    raise ParserError if uls_req.nil? || !uls_req.is_a?(Hash)
    stats, url = uls_req[ULS_STATS_UPDATE], uls_req[ULS_HOST_QUERY]
    sticky = uls_req[ULS_STICKY_SESSION]
    if stats then
      update_uls_stats(stats)
    end

This part parses the request body and updates statistics if the stats field is present. It seems that the ULS module keeps track of statistics and forwards them to the ULS server.
    if url then
      # Lookup a droplet
      unless droplets = Router.lookup_droplet(url)
        Router.log.debug "No droplet registered for #{url}"
        raise Sinatra::NotFound
      end

The ULS server then tries to find which droplets are responsible for the requested URL.
      # Pick a droplet based on original backend addr or pick a droplet randomly
      if sticky
        droplet = droplets.find { |d| d[:session] == sticky }
        Router.log.debug "request's __VCAP_ID__ is stale" unless droplet
      end
      droplet ||= droplets[rand*droplets.size]
      Router.log.debug "Routing #{droplet[:url]} to #{droplet[:host]}:#{droplet[:port]}"
      # Update droplet stats
      update_droplet_stats(droplet)
From these droplets, one is chosen at random, but the sticky session flag can be used to pin a session to a particular droplet. Reading once more through the ULS module, we find that the sticky session is stored on the client using a VCAP cookie (__VCAP_ID__).
But how does the router perform the URL lookup? Let's look at the Router class.

Router

The router listens on the NATS bus, which, judging from its API, is a messaging bus using the publish/subscribe model. We are most interested in the lookup_droplet method.
    def lookup_droplet(url)
      @droplets[url.downcase]
    end
    def register_droplet(url, host, port, tags, app_id, session=nil)
      return unless host && port
      url.downcase!
      tags ||= {}
      droplets = @droplets[url] || []
      # Skip the ones we already know about..
      droplets.each { |droplet|
        # If we already now about them just update the timestamp..
        if(droplet[:host] == host && droplet[:port] == port)
          droplet[:timestamp] = Time.now
          return
        end
      }
      tags.delete_if { |key, value| key.nil? || value.nil? }
      droplet = {
        :app => app_id,
        :session => session,
        :host => host,
        :port => port,
        :clients => Hash.new(0),
        :url => url,
        :timestamp => Time.now,
        :requests => 0,
        :tags => tags
      }
      add_tag_metrics(tags)
      droplets << droplet
      @droplets[url] = droplets
      VCAP::Component.varz[:urls] = @droplets.size
      VCAP::Component.varz[:droplets] += 1
      log.info "Registering #{url} at #{host}:#{port}"
      log.info "#{droplets.size} servers available for #{url}"
    end

The lookup is simple enough: the droplets field holds an associative array keyed by the lower-cased URL. The register_droplet function tells us that each droplet started by the cloud controller registers itself with the router, specifying the application id, URL, droplet host, and droplet port. The Router stores a mapping from each lower-cased URL to an array of droplets.
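
Combining this with the NATS wire format from the previous post, we could even hand-register a fake droplet. The router subscribes to the router.register subject, but the payload field names below are illustrative guesses; the real schema is whatever the router's NATS handler unpacks before calling register_droplet:

PAYLOAD='{"app":42,"host":"10.0.1.7","port":61003,"uris":["myapp.vcap.me"]}'
printf 'PUB router.register %d\r\n%s\r\n' "${#PAYLOAD}" "$PAYLOAD" | nc localhost 4222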

In this blog post, we've explored the source of Cloud Foundry's router component and its method of operation, following the flow from the nginx server, through the Lua helper and the Ruby ULS server, to the droplet running our applications. That's all for now..

Thursday, January 10, 2013

Learning Cloud Foundry PHP staging and deployment

Background

VMware Cloud Foundry is an open source PaaS (platform as a service). I see Cloud Foundry as a tool to simplify the deployment of multiple applications onto multiple servers. This post describes things I found by reading the source code of vcap (VMware Cloud Application Platform) at github here and here.
My point of interest is the PHP deployment capability in vcap, which was contributed by PHP Fog developers.
My previous exploration of Cloud Foundry resulted in the picture below, which describes my current understanding of the Cloud Foundry platform.


Starting point

The starting point for PHP support is an interesting commit I had seen before, in which the PHP Fog team implements the PHP functionality. At first it took me about 10 minutes of browsing the vcap network graph (https://github.com/cloudfoundry/vcap/network); then I realized there is a distinct phpfog branch in the vcap git..
The interesting commit is titled 'Support for deploying PHP applications through a standard Apache configuration with built-in support for APC, memcache, mongo and redis', authored by 'cardmagic' about two years ago (see the commit on github here).

Staging

PHP applications are deployed using the vmc client. For now I will ignore the client part. The client communicates with the cloud controller, which in turn commands the DEA (Droplet Execution Agent) to deploy applications. The DEA executes the start_droplet function, which invokes the staging plugin associated with the application's runtime.
[maybe I will research further the relation between start_droplet and the plugins]

The PHP plugin (ref) prepares the application by executing this Ruby fragment:
 def stage_application
    Dir.chdir(destination_directory) do
      create_app_directories
      Apache.prepare(destination_directory)
      copy_source_files
      create_startup_script
    end
  end

I am no Ruby programmer, so I sure hope I am reading this correctly..
First, the plugin changes the current directory to the droplet instance directory. There it calls create_app_directories (which I guess creates some required directories), then the prepare method of the Apache class. Reading apache.rb, we learn that what Apache::prepare does is copy apache.zip from the plugin directory and extract it into the droplet instance directory. The apache.zip archive contains the configuration directories of an Apache httpd server, with some modifications so that it honors several environment variables injected through apache/envvars (described below). The generate_apache_conf script is also copied from the plugin resource directory.
I guess the copy_source_files method copies the application source code into the droplet instance directory.
After that, the startup script is created by the startup_script method:
 def startup_script
    vars = environment_hash
    generate_startup_script(vars) do
      <<PHPEOF
env > env.log
ruby resources/generate_apache_conf $VCAP_APP_PORT $HOME $VCAP_SERVICES
PHPEOF
    end
  end
This conveniently executes the generate_apache_conf script, which in turn creates several Apache configuration files and a shell script based on the application parameters. The files are (a hypothetical sketch of the start script follows the list):
  1. apache/sites-available/default, which defines the DocumentRoot, ErrorLog file, log format, and the VCAP_SERVICES environment variable
  2. apache/envvars, which defines the apache user, group, pid file, and base directory
  3. apache/ports.conf, which defines the port where apache listens
  4. apache/start.sh, the script that starts the apache server in the droplet directory with the generated configuration files
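
I have not pasted the generated start.sh itself; based on the file list above, it plausibly looks something like the sketch below (every line is reconstructed from the descriptions, not copied from a real generated file):

#!/bin/bash
# hypothetical reconstruction of apache/start.sh
. ./envvars                                  # inject user, group, pid file, base directory
exec apache2ctl -d "$APACHE_BASEDIR" -k start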

Running the App

Two methods in the PHP plugin tell us how the platform starts the application:
  # The Apache start script runs from the root of the staged application.
  def change_directory_for_start
    "cd apache"
  end
  def start_command
    "bash ./start.sh"
  end
So the platform starts the application by running apache/start.sh, which was created by the generate_apache_conf script above.

The Most Current Version 

I tried to find the latest lib/vcap/staging/plugin/php/plugin.rb; at first I found none, because it has been migrated from the vcap repository to the vcap-staging repository. Refer here for the newer version.
I noticed an improvement which allows us to define how much memory is allocated to the PHP application, and also a stop command which invokes kill -9:

  def stop_command
    cmds = []
    cmds << "CHILDPIDS=$(pgrep -P ${1} -d ' ')"
    cmds << "kill -9 ${1}"
    cmds << "for CPID in ${CHILDPIDS};do"
    cmds << " kill -9 ${CPID}"
    cmds << "done"
    cmds.join("\n")
  end
  private
  def startup_script
    generate_startup_script do
      <<-PHPEOF
env > env.log
ruby resources/generate_apache_conf $VCAP_APP_PORT $HOME $VCAP_SERVICES #{application_memory}m
PHPEOF
    end
  end

The kill -9 part is really handy, because on numerous occasions I have been forced to run such a command manually to stop a stuck or hung PHP process. The generate_apache_conf script is enhanced to create an additional PHP configuration file (apache/php/memory.ini) which imposes a memory limit:
output_path = 'apache/php/memory.ini'
template = <<-ERB
memory_limit = <%= php_ram %>
ERB

This tells us that the memory limit applies to a single apache/PHP process. The collective application memory usage comes to n * (php_ram + x), where n is the number of apache processes running and x is the memory apache uses on its own. That makes me wonder about MaxClients in apache's configuration (in apache.zip); here is the latest version:

    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients          150
    MaxRequestsPerChild   0

The configuration fragment above essentially says that the number of running apache child processes can range from 5 to 150, typically around 10 when idle.
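
For illustration (numbers assumed, not measured): with php_ram = 128 MB and, say, 25 MB of apache overhead per child, 150 busy clients could in the worst case demand about 150 x (128 + 25) MB, roughly 22 GB, which is far more memory than a typical droplet would be granted. Capping MaxClients to match the allocated memory therefore looks like the safer configuration.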
 
There is also an additional line in the stage_application method to copy the PHP configuration files as well:
system "cp -a #{File.join(resource_dir, "conf.d", "*")} apache/php"
This environment variable export in the generate_apache_conf script makes PHP scan the apache/php directory for configuration files:
export PHP_INI_SCAN_DIR=$APACHE_BASEDIR/php

Finishing remarks

I hope reading this will help us customize Cloud Foundry's PHP support as needed. I might need to add extra PHP extensions, which must be placed in PHP's conf.d directory (shown above, copied from the resource directory). It might also be interesting to implement a way to change MaxClients through the cloud API.

Thursday, January 3, 2013

Interfacing Joget Workflow Engine with .NET and PHP

The Joget Workflow Engine has a JSON API that can be used to query and control workflow processes. This is useful when we need to implement a custom front-end UI for Joget workflow, especially if the development team has minimal, or even zero, Java development experience.

I have created two API test applications: one in PHP and the other written in C# on the .NET framework.
The PHP version is written using the wonderful Yii Framework.
The C# version is a Windows Forms application, but the WorkflowService class could theoretically be used in an ASP.NET application.

Feel free to download and try them...
To start using these test applications, you must first configure Joget to allow login using a master login username and master password (click System Settings : General Setting and find the System Administration Settings section). Fill in the master login and password (in my example, superuser and password00). These must match the credentials at the top of protected/components/YREST.php or WorkflowService.cs.

Then create a workflow or choose a sample one. Open the workflow page (2 Design Apps : [Application Name]) in Joget's Web Console and click the 'Show Additional Info' link.
Write down the Process Definition Id, replacing the '#' character with ':'.
This process definition ID can then be used to start a process in the tester application.
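
For reference, starting a process through the JSON API boils down to a single HTTP call. The URL pattern and parameter names below are from memory and should be verified against YREST.php or WorkflowService.cs:

curl -d "j_username=superuser" -d "j_password=password00" -d "loginAs=someuser" \
  "http://localhost:8080/jw/web/json/workflow/process/start/myapp:1:process1"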

Fill in the process definition id on the Joget-start page, and fill in the userid with an existing Joget user ID.
Afterwards, you can verify that the workflow has started in the Joget Web Console's Running Processes view.
To query the inbox of a certain user, fill in the userid on the Joget-Inbox page.
To advance the workflow, fill in the activity id and userid on the Joget-Complete page; up to 3 workflow variables can be set from this page.
Of course, all of this can be done from the Joget Web Console's UI, but the purpose of these two test apps is to demonstrate how to access the Joget API from the PHP and .NET platforms.