Saturday, January 19, 2013

Learning Cloud Foundry - NATS

All Cloud Foundry components communicates with each other using NATS publish-subscribe mechanism.
Lets see in detail what is NATS like.

Cloud Foundry Router as NATS client

During startup of Cloud Foundry Router (router.rb), there is a call to start NATS client below :
NATS.start(:uri => config['mbus'])
It means that NATS client will be started with the uri parameter obtained from configuration parameter 'mbus'. 
config_path = ENV["CLOUD_FOUNDRY_CONFIG_PATH"] || File.join(File.dirname(__FILE__), '../config')
config_file = File.join(config_path, 'router.yml')
opts.on("-c", "--config [ARG]", "Configuration File") do |opt|
    config_file = opt
We find that the configuration parameter will be read from a YAML (yet Another Markup Language) that whose location could came from multiple sources, in these order :
  1. specified from command line parameter -c or --config
  2. environment variable CLOUD_FOUNDRY_CONFIG_PATH (if exists) concatenated with '/router.yml'
  3. ../config/router.yml relative to location of Ruby file router.rb (in current condition, router/lib/router.rb)
Note that the code didn't check whether the file exists at all, but it only checks the existence of the configuration switch or environment variable.

After starting the NATS client, the router publishes router start events :

@router_id = VCAP.secure_uuid
@hello_message = { :id => @router_id, :version => Router::VERSION }.to_json.freeze
# This will check on the state of the registered urls, do maintenance, etc..
# Setup a start sweeper to make sure we have a consistent view of the world.
EM.next_tick do
# Announce our existence
NATS.publish('router.start', @hello_message)
# Don't let the messages pile up if we are in a reconnecting state
EM.add_periodic_timer(START_SWEEPER) do
unless NATS.client.reconnecting?
NATS.publish('router.start', @hello_message)

This ensures other components realizes that there is a new router starting up (and might need situation update).

NATS client module

NATS client module starts an Eventmachine-based network connection to the NATS server given in the uri parameter. But it also has defaults (client.rb):
(line 13@client.rb)
DEFAULT_URI = "nats://localhost:#{DEFAULT_PORT}".freeze
.. (line 102@client.rb)
opts[:uri] ||= ENV['NATS_URI'] || DEFAULT_URI
We see that the uri parameter are obtained from  (in such order) :
  1. uri parameter when calling NATS.start
  2. NATS_URI environment variable
  3. DEFAULT_URI which is nats://localhost:4222
Publish call is implemented by sending text line to the NATS server :

# Publish a message to a given subject, with optional reply subject and completion block
# @param [String] subject
# @param [Object, #to_s] msg
# @param [String] opt_reply
# @param [Block] blk, closure called when publish has been processed by the server.
def publish(subject, msg=EMPTY_MSG, opt_reply=nil, &blk)
return unless subject
msg = msg.to_s
# Accounting
@msgs_sent += 1
@bytes_sent += msg.bytesize if msg
send_command("PUB #{subject} #{opt_reply} #{msg.bytesize}#{CR_LF}#{msg}#{CR_LF}")
queue_server_rt(&blk) if blk

The NATS server takes care of messaging all the subscribers that are interested in the message.

NATS Server

The Nats server is also implemented in Ruby. The source code shows us that the startup sequence is as follows :
  1. do setup using given command line arguments
  2. start  eventmachine on given host and port parameters, using NATSD::Connection module to serve connections
  3. if given http_port parameter, starts http monitoring on such port
The server class (which does setup) is separated into a few files : core server (server.rb) and option handling (options.rb).
The startup options is first being read from the command-line (see parser method here), then the server would read a configuration file (if given) for additional parameters. The parameters given from command-line will not be changed, only missing options would be read from the YAML-formatted configuration file. The available parameters are :
  1. addr (listen network address)
  2. port
  3. http: net,port,user, password (for http monitoring port)
  4. authorization : user, password, auth_timeout
  5. ssl (will request ssl on connections)
  6. debug
  7. trace
  8. syslog (activate syslogging)
  9. pid_file
  10. log_file
  11. log_time
  12. ping : interval,max_outstanding
  13. max_control_line, max_payload, max_connections (setup limits)
The subscription list is being keep in the server class, along with route_to_subscribers method for sending message to registered parties. 
 The Connection module is the heart of  NATS server's operations. NATS server could be configured to require authentication or to require SSL connecition. The operation it recognizes are :
  1. PUB (publish), which would send a payload to registered parties
  2. SUB (subscribe), which would register the caller to a message subject.
  3. UNSUB (unsubscribe), unregister a subscribtion
  4. PING, which would make the server send a response message
  5. PONG - actually not an operation but a mechanism to ensure client health
  6. CONNECT, to reconfigure verbose and pedantic connection options
  7. INFO, which would make the server send an information json string
  8. CTRL-C/CTRL-D, both would make the server close the connection.
The server sends ping messages periodically to each client according to Server.ping_interval. If outstanding non-reply is above max_outstanding parameter, the server will tell the client that it is unresponsive and close the client connection afterwards.

No comments: