Friday, January 11, 2013

Learning Cloud Foundry Router

Background

The Cloud Foundry PaaS platform serves web applications. The Cloud Foundry router component is different from a router in plain IT-talk: it is a Ruby-based software component that determines which backend droplet should serve each HTTP request sent to one of the applications deployed on the platform.

Starting Point

In order to understand the mechanism of the router, we start from nginx, the HTTP server that serves as the primary gatekeeper. The nginx configuration is my primary concern, so let's look at the vcap source code that describes the nginx configuration on each Cloud Foundry server. In the vcap/dev_setup/cookbooks/nginx folder we find a templates directory that stores nginx configuration templates. I suppose these templates are used by the dev_setup installation procedure.

Configuration templates

There is a cc-nginx.conf.erb Ruby template along with a router-nginx.conf.erb template and some other files. Let's look at the router configuration (router-nginx.conf.erb).

user root root;
worker_processes 1;

The first two lines say that nginx runs as root and that there is only one worker process. A single worker should not be much of a limitation, because nginx uses an event-driven architecture that can process many concurrent connections from a single process. Still, it means that allocating two CPUs to the VM hosting the router component would bring little benefit to nginx, since its single worker runs on only one CPU.
 
error_log <%= node[:nginx][:log_home] %>/nginx_router_error.log debug;
pid /var/run/nginx_router.pid;

The next two lines describe error logging and the pid file. This is getting less interesting, so let's skip to the routing parts.
    location = /vcapuls {
      internal;
      # We should use rewrite_by_lua to scrub subrequest headers
      # as uls doesn't care those headers at all.
      # Given there are some exceptions to clear some headers,
      # we just leave them as is.
      proxy_pass http://unix:/tmp/router.sock:/;
    }
This defines an nginx-internal location at the path '/vcapuls'. Another comment in the file says 'upstream locator server', which matches the ULS acronym. So ULS is a server that locates the upstream server, which is very much the purpose of the Router component.
        set $backend_addr ''; # Backend server address returned from uls for this request
        set $uls_req_tags ''; # Request tags returned from uls for this request to catalog statistics
        set $router_ip '';
        set $timestamp 0;
        set $trace '';
        set $sticky '';
        access_by_lua '
          local uls = require ("uls")
          uls.pre_process_subrequest(ngx, "<%= node[:router][:trace_key] %>")
          local req = uls.generate_uls_request(ngx)
          -- generate one subrequest to uls for querying
          local res = ngx.location.capture(
            "/vcapuls", { body = req }
          )
          uls.post_process_subrequest(ngx, res)
        ';
We find that for a generic URL nginx consults the /vcapuls location, which was described above as an HTTP pass-through to the Unix socket /tmp/router.sock.
The Lua script inside the nginx configuration file is something new to me. It calls a Lua module named uls. After running the Lua script, nginx proxies the request to backend_addr. No assignment to backend_addr is seen in the Lua script, so it must be set inside the ULS module.
        proxy_pass http://$backend_addr;
        # Handling response from backend servers
        header_filter_by_lua '
          local uls = require ("uls")
          uls.post_process_response(ngx)
        ';


ULS Module

The ULS module can be found in the vcap router component source code. Its description comment confirms what ULS stands for.
 -- Description:         Helper for nginx talking to uls(Upstream Locator Server)

 
Two functions in the ULS module matter most for the routing mechanism: generate_uls_request and post_process_subrequest.

function generate_uls_request(ngx)
  local uls_req_spec = {}
  -- add host in request
  uls_req_spec[uls.ULS_HOST_QUERY] = ngx.var.http_host
  -- add sticky session in request
  local uls_sticky_session = retrieve_vcap_sticky_session(
          ngx.req.get_headers()[COOKIE_HEADER])
  if uls_sticky_session then
    uls_req_spec[ULS_STICKY_SESSION] = uls_sticky_session
    ngx.log(ngx.DEBUG, "req sticks to backend session:"..uls_sticky_session)
  end
  -- add status update in request
  local req_stats = uls.serialize_request_statistics()
  if req_stats then
    uls_req_spec[ULS_STATS_UPDATE] = req_stats
  end
  return cjson.encode(uls_req_spec)
end
function post_process_subrequest(ngx, res)
  if res.status ~= 200 then
    ngx.exit(ngx.HTTP_NOT_FOUND)
  end
  local msg = cjson.decode(res.body)
  ngx.var.backend_addr = msg[ULS_BACKEND_ADDR]
  ngx.var.uls_req_tags = msg[ULS_REQEST_TAGS]
  ngx.var.router_ip = msg[ULS_ROUTER_IP]
  ngx.var.sticky = msg[ULS_STICKY_SESSION]
  ngx.var.app_id = msg[ULS_APP_ID]
  ngx.log(ngx.DEBUG, "route "..ngx.var.http_host.." to "..ngx.var.backend_addr)
end

Reading the code, we understand that the Lua module encodes the requested server name (ngx.var.http_host, which I assume contains the DNS name of the deployed application) into a JSON structure. The post_process_subrequest function decodes the result of calling /vcapuls, which goes through the router's Unix socket, into the backend_addr nginx variable. I don't fully understand the implications of using an nginx variable for this instead of a return value. My first worry was race conditions under high workloads, but nginx variables are scoped to a single request, so concurrent requests should not overwrite each other's values. The backend_addr variable is then used by the proxy_pass directive in the nginx configuration file to forward the request to the backend server.
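
To make the exchange concrete, here is a minimal Ruby sketch of the two JSON payloads that travel between the Lua module and the ULS server. The key names ("host", "sticky", "backend_addr"), the helper names and the sample addresses are assumptions made for illustration; the real values of the ULS_* constants are not shown in the excerpts above.

```ruby
require "json"

# Hypothetical key names: the actual ULS_* constant values are not shown in the excerpts.
ULS_HOST_QUERY     = "host"
ULS_STICKY_SESSION = "sticky"
ULS_BACKEND_ADDR   = "backend_addr"

# Mirrors generate_uls_request: build the JSON body sent to /vcapuls.
def build_uls_request(http_host, sticky_session = nil)
  spec = { ULS_HOST_QUERY => http_host }
  spec[ULS_STICKY_SESSION] = sticky_session if sticky_session
  JSON.generate(spec)
end

# Mirrors post_process_subrequest: a non-200 status means "no route",
# otherwise the backend address is read from the JSON body.
def backend_from_uls_response(status, body)
  return nil unless status == 200
  JSON.parse(body)[ULS_BACKEND_ADDR]
end

request_body  = build_uls_request("myapp.example.com")
response_body = JSON.generate({ ULS_BACKEND_ADDR => "10.0.0.5:61001" })
puts request_body
puts backend_from_uls_response(200, response_body)
```

In the real setup the second step ends with nginx writing the address into $backend_addr rather than returning it, and a non-200 answer is turned into a 404 for the client.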

ULS Server

The ULS server is written in Ruby using Sinatra (library or framework? I am new to Ruby). Let's jump to the source code.
get "/" do

The Ruby source code tells us that this block handles HTTP GET requests for the / URL.
    # Parse request body
    uls_req = JSON.parse(body, :symbolize_keys => true)
    raise ParserError if uls_req.nil? || !uls_req.is_a?(Hash)
    stats, url = uls_req[ULS_STATS_UPDATE], uls_req[ULS_HOST_QUERY]
    sticky = uls_req[ULS_STICKY_SESSION]
    if stats then
      update_uls_stats(stats)
    end

This part parses the request and updates the statistics if the request carries a stats entry. It seems that the ULS module keeps track of statistics and forwards them to the ULS server.
    if url then
      # Lookup a droplet
      unless droplets = Router.lookup_droplet(url)
        Router.log.debug "No droplet registered for #{url}"
        raise Sinatra::NotFound
      end

The ULS server tries to find which droplets are responsible for the requested URL.
      # Pick a droplet based on original backend addr or pick a droplet randomly
      if sticky
        droplet = droplets.find { |d| d[:session] == sticky }
        Router.log.debug "request's __VCAP_ID__ is stale" unless droplet
      end
      droplet ||= droplets[rand*droplets.size]
      Router.log.debug "Routing #{droplet[:url]} to #{droplet[:host]}:#{droplet[:port]}"
      # Update droplet stats
      update_droplet_stats(droplet)
From the droplets, one is chosen randomly, unless a sticky session value links the request to a certain droplet. Reading once more through the ULS module, we find that the sticky session is stored on the client in a VCAP cookie.
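
The selection rule can be tried in isolation with a small Ruby sketch. The pick_droplet helper and the sample droplet hashes are hypothetical, and Array#sample stands in for the index expression used in the original code.

```ruby
# Prefer the droplet whose session matches the sticky value;
# fall back to a random droplet when there is no match or no sticky value.
def pick_droplet(droplets, sticky = nil)
  droplet = droplets.find { |d| d[:session] == sticky } if sticky
  droplet || droplets.sample
end

# Sample data, made up for illustration.
droplets = [
  { :host => "10.0.0.5", :port => 61001, :session => "aaa" },
  { :host => "10.0.0.6", :port => 61002, :session => "bbb" }
]

chosen = pick_droplet(droplets, "bbb")
puts "#{chosen[:host]}:#{chosen[:port]}"
```

A stale sticky value (one that matches no registered droplet) simply falls through to the random choice, which is the case the debug message about __VCAP_ID__ reports.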
But how does the router do the URL lookup? Let's see the Router class.

Router

The router listens to the NATS bus. Judging from the API, it is a messaging bus using a publish/subscribe model. We are most interested in the lookup_droplet method.
    def lookup_droplet(url)
      @droplets[url.downcase]
    end
    def register_droplet(url, host, port, tags, app_id, session=nil)
      return unless host && port
      url.downcase!
      tags ||= {}
      droplets = @droplets[url] || []
      # Skip the ones we already know about..
      droplets.each { |droplet|
        # If we already now about them just update the timestamp..
        if(droplet[:host] == host && droplet[:port] == port)
          droplet[:timestamp] = Time.now
          return
        end
      }
      tags.delete_if { |key, value| key.nil? || value.nil? }
      droplet = {
        :app => app_id,
        :session => session,
        :host => host,
        :port => port,
        :clients => Hash.new(0),
        :url => url,
        :timestamp => Time.now,
        :requests => 0,
        :tags => tags
      }
      add_tag_metrics(tags)
      droplets << droplet
      @droplets[url] = droplets
      VCAP::Component.varz[:urls] = @droplets.size
      VCAP::Component.varz[:droplets] += 1
      log.info "Registering #{url} at #{host}:#{port}"
      log.info "#{droplets.size} servers available for #{url}"
    end

The lookup is simple enough. The droplets field contains an associative array keyed by lower-case URL. The register_droplet function suggests that each droplet started by the cloud controller gets registered with the router, specifying application id, URL, droplet host and droplet port. The Router stores a mapping from each lower-case URL to an array of droplets.
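
The data structure is easy to reproduce outside the router. The following is a simplified sketch, not the router's own class: DropletRegistry and its method names are made up, tags, varz counters and logging are left out, and it uses the non-mutating downcase instead of downcase!.

```ruby
# Simplified stand-in for the Router's droplet table: lower-case URL => array of droplets.
class DropletRegistry
  def initialize
    @droplets = {}
  end

  # Adds a droplet for the URL, or refreshes its timestamp if host:port is already known.
  # Returns the number of droplets registered for the URL (nil if host or port is missing).
  def register(url, host, port)
    return unless host && port
    key = url.downcase
    list = (@droplets[key] ||= [])
    existing = list.find { |d| d[:host] == host && d[:port] == port }
    if existing
      existing[:timestamp] = Time.now
    else
      list << { :url => key, :host => host, :port => port, :timestamp => Time.now }
    end
    list.size
  end

  # Case-insensitive lookup; returns nil for an unknown URL.
  def lookup(url)
    @droplets[url.downcase]
  end
end

registry = DropletRegistry.new
registry.register("MyApp.example.com", "10.0.0.5", 61001)
registry.register("myapp.example.com", "10.0.0.6", 61002)
registry.register("myapp.example.com", "10.0.0.5", 61001) # known host:port, only refreshes the timestamp
puts registry.lookup("MYAPP.example.com").size
```

The last line reports two droplets: the third registration repeats a known host and port, so it does not add an entry.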

In this blog post, we've explored the source of Cloud Foundry's router component and how it operates, following the flow from the nginx server, through the Lua module and the Ruby ULS server, to the droplets running our applications. That's all for now.
