Wednesday, November 4, 2015

Setting Default Application in Openshift Nodes

Background

The default behavior of Openshift nodes is to redirect requests for unknown applications to host/getting_started.html, which usually causes an endless redirect loop.
In some cases we might want to change this behavior, for example when we want a default page ("Application not found") to show up, or when a tool such as the Acunetix scanner incorrectly flags the redirect as a medium-severity vulnerability because the redirect echoes the injected Host header.

The Openshift Origin platform in use for this article is Origin Release 4, with the nodes using the apache-mod-rewrite frontend plugin (rubygem-openshift-origin-frontend-apache-mod-rewrite-0.7.1.1-1.el6.noarch).

Mechanism

The default mechanism can be read in /etc/httpd/conf.d/000001_openshift_origin_node.conf :
As we can see, routes are loaded from openshift_route.include.
The file is full of route rules, but the interesting part is the set of RewriteMap clauses at the top of the file :


Nodes and aliases are loaded from DBM files : nodes.db, aliases.db. These files are created from nodes.txt and aliases.txt. 
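For illustration only (the exact map names and directives vary between Origin releases), the RewriteMap clauses look roughly like this, assuming the default paths used in this article :

# illustrative sketch - compare with openshift_route.include on your own node
RewriteMap nodes dbm=db:/etc/httpd/conf.d/openshift/nodes.db
RewriteMap aliases dbm=db:/etc/httpd/conf.d/openshift/aliases.db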

Solution Strategy

To implement a default route for unknown applications, we are going to add a __default__ route at the bottom of nodes.txt, inspired by the Openshift comprehensive deployment guide (https://docs.openshift.org/origin-m4/oo_deployment_guide_comprehensive.html#apache-mod-rewrite-plugin) :
[ Reference ]
cat <<EOF > /tmp/nodes.broker_routes.txt
__default__ REDIRECT:/console
__default__/console TOHTTPS:127.0.0.1:8118/console
__default__/broker TOHTTPS:127.0.0.1:8080/broker
EOF

From the reference above, we understand that the __default__ keyword can be used on the left-hand side. For the right-hand side, look up the application that we are going to use as the default application.
[ /etc/httpd/conf.d/openshift/nodes.txt ]
testmed54-test.aon.telkom.co.id 127.12.87.130:8080|550d80da98988b065b000002|550d80da98988b065b000002
testmed54-test.aon.telkom.co.id/health HEALTH|550d80da98988b065b000002|550d80da98988b065b000002
testmed54-test.aon.telkom.co.id/haproxy-status 127.12.87.131:8080/|550d80da98988b065b000002|550d80da98988b065b000002

Choose the line with neither /health nor /haproxy-status, which is the first line. Copy its right-hand side.
 
Back up the nodes.txt and nodes.db files before changing anything.

At the end of nodes.txt, append a new line, combining __default__ on the left-hand side with 127.12.87.130:8080|550d80da98988b065b000002|550d80da98988b065b000002 on the right-hand side :
__default__ 127.12.87.130:8080|550d80da98988b065b000002|550d80da98988b065b000002

Convert nodes.txt to nodes.db :
httxt2dbm -f DB -i /etc/httpd/conf.d/openshift/nodes.txt -o /etc/httpd/conf.d/openshift/nodes.db
It might be necessary to restart httpd after the conversion.
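Putting it together, a minimal sketch of the whole change (the appended value is the one copied from nodes.txt earlier; adjust the paths to your installation) :

# back up the current route files
cp -a /etc/httpd/conf.d/openshift/nodes.txt /etc/httpd/conf.d/openshift/nodes.txt.bak
cp -a /etc/httpd/conf.d/openshift/nodes.db /etc/httpd/conf.d/openshift/nodes.db.bak
# append the default route
echo '__default__ 127.12.87.130:8080|550d80da98988b065b000002|550d80da98988b065b000002' >> /etc/httpd/conf.d/openshift/nodes.txt
# regenerate the DBM map and restart Apache
httxt2dbm -f DB -i /etc/httpd/conf.d/openshift/nodes.txt -o /etc/httpd/conf.d/openshift/nodes.db
service httpd restart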

Result

After the change outlined above, the node will direct requests for unknown applications to the default application (in this example, testmed54-test).


Wednesday, October 14, 2015

Reclaiming Free Space in Centos 6 / RHEL 6 VirtualBox Disk Images

Background


Using virtualization has its benefits, but it also has its shortcomings. For example, creating and deleting files in a virtual disk can make the disk image larger than it needs to be. In this case, disk usage of filesystem /dev/sdb1 in the VM is 45 GB, but the VirtualBox disk image (VDI) is 57 GB, as shown below :

The VDI (cdh-node1_secondary_hdd.vdi) is about 57 GB, yet the usage in /dev/sdb1 is 45 GB, so there is about 12 GB of wasted space.

The cure

Reading various posts, the cure seems to consist of two steps :
  1. run zerofree to fill the unused space with zeros
  2. run vboxmanage with the --compact option
The references can be read in several blog posts (for VMware : here, here, for VirtualBox : here and here).
The caveat is that there is no zerofree RPM for RHEL 6. One of the blog posts hinted that we could create one by recompiling the zerofree source RPM (SRPM) for CentOS 5.

Running zerofree

First, obtain the source rpm :
Using RPM search (RPM search zerofree), we have these URLs :
Total files: 7
Site  Filename  Distribution  File size
ftp.muug.mb.ca  zerofree-1.0.1-5.el5.src.rpm  RedHat EL 5  17490
mirror.switch.ch  zerofree-1.0.1-5.el5.src.rpm  RedHat EL 5  17490
ftp.gwdg.de  zerofree-1.0.1-5.el5.src.rpm  RedHat EL 5  17490
ftp.icm.edu.pl  zerofree-1.0.1-5.el5.src.rpm  RedHat EL 5  17490
ftp.pbone.net  zerofree-1.0.1-5.el5.src.rpm  RedHat EL 5  17490
ftp.sunet.se  zerofree-1.0.1-5.el5.src.rpm  RedHat EL 5  17490
ftp.is.co.za  zerofree-1.0.1-5.el5.src.rpm  RedHat EL 5  17490

Download one of the files from the list, and build it :
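Roughly, the build sequence is as follows (the resulting RPM file name is an assumption; check what rpmbuild actually writes under ~/rpmbuild/RPMS) :

# build prerequisites: rpm-build plus e2fsprogs-devel (see the note below)
yum install rpm-build e2fsprogs-devel
# rebuild the CentOS 5 source RPM on the RHEL/CentOS 6 box
rpmbuild --rebuild zerofree-1.0.1-5.el5.src.rpm
# install the resulting binary RPM (name/path may differ on your system)
rpm -ivh ~/rpmbuild/RPMS/x86_64/zerofree-1.0.1-5.*.x86_64.rpm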

e2fsprogs-devel needs to be installed first, I suppose.
Re-executing the rebuild :


After a successful build, install the RPM :

Afterwards, stop the services using /dfs, unmount the partition, then run zerofree on /dev/sdb1.
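A minimal sketch of that step, assuming /dfs is the mount point of /dev/sdb1 and nothing else is using it :

# unmount the filesystem and zero its free blocks (-v shows progress)
umount /dfs
zerofree -v /dev/sdb1
# remount when done
mount /dev/sdb1 /dfs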

Running VBoxmanage


The final step is shutting down the VM and running vboxmanage with the modifyhd command, using the --compact option and pointing to the VDI image.
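With the VM powered off, the command looks like this (run on the host; prepend the full path if the VDI is not in the current directory) :

VBoxManage modifyhd cdh-node1_secondary_hdd.vdi --compact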
After all is complete, we have this :

57 GB has turned into 45 GB, yielding 12 GB of additional space.

Conclusion

By running zerofree first and then compacting the VDI, we can reclaim the wasted space in the VirtualBox VM.

Sunday, July 26, 2015

Enhancing MySQL Query Performance - July 2015

Background

This post is meant to document steps that I took in July 2015 to fix a MySQL-related performance problem. 

Problem Identification

The symptom is that pages of one application have a very large response time. The first thing we check is the CPU usage of the application servers, which is < 25%, meaning there is no issue with the application servers' CPU. We also need to check logical limits, in this case Apache's MaxClients, and compare it with the concurrent HTTP connections to the server; this is also < 25%. The second thing we check is the database server. Check the CPU usage... and gotcha, it was > 70%. With the application servers under almost no load, this means query execution in the database server was not optimal. Next we check the queries.

MySQL Processlist

To examine queries running in a MySQL database server, open mysql command line client, and we could check the processlist using :

SHOW FULL PROCESSLIST \G

This should be run as the MySQL root user. But if you only have a login for a normal database user, try running SHOW FULL PROCESSLIST anyway; sometimes we are granted access to see the processlist of our own queries.
However, 'FULL' is not full enough: for very long queries the output will be cut. The alternative on modern MySQL servers is the information_schema.processlist table.

SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST

On a busy database server that is serving multiple applications, grouping by user and state is handy for a brief overview of the current load :

select user,db,state,count(*) from information_schema.processlist group by user,db,state;

The state that warrants our attention is 'Copying to tmp table'. For the July events (yes, there were 2 separate occasions) a high number of processes in this state indicated something wrong with the query.
Use this query to find the full query text :

select * from information_schema.processlist where state <> '' and user = 'someuser' and db = 'somedb';

This gets us a snapshot of the queries currently being run; the query that pops up most often is the one with the most significant impact on the system at that time.
That is sufficient if we do the analysis during the incident. If we need to check which queries were run in the past, we use the AppDynamics for Databases product, in which we can select a time range and find the top queries in that period. The MySQL slow query log can also be used as an alternative if such a tool is not available, but then you need to account for the frequency and elapsed time of each query on your own.
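If you go the slow query log route, a minimal sketch of enabling it at runtime (assuming MySQL 5.1 or later and a user with the SUPER privilege) :

# log every statement slower than 1 second; the output file is pointed to by slow_query_log_file
mysql -u root -p -e "SET GLOBAL slow_query_log = ON; SET GLOBAL long_query_time = 1;"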

Verify query runtime

To verify query execution time, use the MySQL client to execute the select queries. If the runtime is not consistent, like 0.4 seconds, 10 seconds, 0.3 seconds, 0.2 seconds, it may mean that the query cache is serving the executions that finish in under 1 second. To measure actual query performance, use SQL_NO_CACHE like :

SELECT  SQL_NO_CACHE
col1,col2 from T1 WHERE ...

Case A

For the first case, the query is a simple select with one join clause. The query is executed very often, and its execution drags the database server to a crawl. For each problematic query, it is a good idea to run an explain plan first :

EXPLAIN SELECT a.id, a.title, a.description, a.publish_date, a.category_id, b.name, a.upload_date,   HOUR(TIMEDIFF(NOW(), a.publish_date)) as hour_dif  FROM intra_berita a, intra_user b WHERE a.upload_by = b.id  AND status = '3'  AND a.category_id = 3 ORDER BY publish_date DESC LIMIT 3;

In this case, the result shows that the index related to the WHERE condition is not being used. This might be related to the bug https://mariadb.atlassian.net/browse/MDEV-7084, where ANALYZE TABLE must be run manually. Anyway, the problem was fixed by running ANALYZE TABLE on the table :

ANALYZE TABLE intra_berita;

Case B

The second problematic query is shown below :
SELECT
  b.OBJIDPOSISI OBJID_POSISI, t.N_NIK, t.V_NAMA_KARYAWAN, b.V_LONG_POSISI V_SHORT_POSISI,
  b.V_SHORT_UNIT, b.V_SHORT_DIVISI, b.V_PERSONNEL_SUBAREA, b.V_BAND_POSISI band,
  (case when plan.peminat is null then 0 else plan.peminat end) peminat, b.pmu PMU
FROM
  `FT_T_RPT0001` `t`
  right join (
    SELECT
      tom.OBJIDPOSISI, tom.C_KODE_DIVISI, tom.V_BAND_POSISI, tom.V_LONG_POSISI,
      tom.V_SHORT_UNIT, tom.V_SHORT_DIVISI, tom.V_PERSONNEL_SUBAREA, pmu.matchup pmu
    FROM
      FT_T_RPTOM tom
      INNER JOIN FT_T_JOBID_OBJID tjo ON (tjo.OBJID = tom.OBJIDPOSISI AND tjo.C_TAMPIL = 1)
      INNER JOIN (
        SELECT m.JOBID, m.NILAI_AKHIR matchup
        FROM FT_T_PMU_ITEM_SUM_ALL_50 m
        WHERE m.NIK = '#redacted#'
      ) pmu ON (pmu.JOBID = tjo.JOBID)
    WHERE
      SUBSTRING(tom.C_KODE_DIVISI,1,3) <> "JVC"
  ) b on (b.OBJIDPOSISI = t.OBJID_POSISI)
  LEFT JOIN (
    SELECT p.OBJIDPOSISI, COUNT(p.OBJIDPOSISI) peminat
    FROM FT_T_CAREER_PLAN p
    JOIN FT_T_JOBID_OBJID obj on (obj.OBJID = p.OBJIDPOSISI)
    GROUP BY p.OBJIDPOSISI
  ) plan ON (plan.OBJIDPOSISI = b.OBJIDPOSISI)
WHERE
  b.V_BAND_POSISI IN ('IV', 'V')
ORDER BY
  b.OBJIDPOSISI desc
LIMIT 10

The second case is more complicated, because it involves nested queries and multiple joins (4-5 tables in one query). After running ANALYZE TABLE on one of the biggest tables (FT_T_PMU_ITEM_SUM_ALL_50), the runtime improved somewhat, but after the data got updated the query still ran in the range of 6-15 seconds.
What we found is two query blocks that each take no more than 3 seconds, but when joined using LEFT JOIN the whole thing becomes more than 10 seconds. Changing to JOIN improves the execution time to under 2 seconds, but of course an inner join returns fewer rows than the LEFT JOIN, so that by itself is not a solution.

The workaround is to avoid the LEFT JOIN. The purpose of the left join is only to obtain one column, 'peminat', which is a grouped row count of FT_T_CAREER_PLAN. So we changed the application to do the row counting outside the primary query: we issue n small queries just for the row counting, simplifying the primary query.

The workaround works nicely, turning an 8-15 second page into a 0.5 second page. However, many other queries use a similar LEFT JOIN clause, and the application team is working through them one by one, removing the clause and converting it into an additional query.
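For illustration only (table and column names taken from the query above; the position id is supplied by the application for each row of the main result), the separated count query looks something like this :

# hypothetical example: count 'peminat' for a single position instead of LEFT JOINing the whole aggregate
OBJID=12345   # placeholder position id taken from the primary query's result
mysql -u someuser -p somedb -e "
SELECT COUNT(*) AS peminat
FROM FT_T_CAREER_PLAN p
JOIN FT_T_JOBID_OBJID obj ON (obj.OBJID = p.OBJIDPOSISI)
WHERE p.OBJIDPOSISI = $OBJID;"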

Conclusion

Several things can become factors in MySQL performance problems: one is that table index statistics might not be updated in a timely manner, and another is that a complicated query can become a burden for both the database and the developer. If I remember correctly, Facebook uses MySQL servers (hundreds of them) and gets blazingly fast performance because they never do JOIN queries.

Wednesday, June 24, 2015

The mystery of TCP segmentation offload bug

There are incidents that share the generic description 'TCP segmentation offload bug' and that affect multiple virtualization platforms. The workaround is the same in every case: disable the feature.

Case one

Virtualization Platform : KVM/QEMU
Symptom : Periodically, the guest loses network connectivity after heavy load. Restarting the guest network doesn't fix the problem. The guest is fine again after a reboot.
Reference : https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978/comments/134
Workaround :  ethtool -K eth0 tx off sg off tso off ufo off gso off gro off lro off

Case two

Virtualization Platform : Xen
Symptom : DomU hangs after heavy network load (~10 MByte/s).
Reference : https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978/comments/132
Workaround : disable offloading using ethtool
ethtool --offload gso off tso off sg off gro off

Case three

Virtualization Platform : VMWare
Symptom :
1. Page could not be displayed after VM migration to the same ESX host. A Cisco Nexus 1000V and an F5 are involved before traffic reaches the IIS VM.
2. In another incident, the Cisco Nexus 1000V sent a large TCP segment, causing a Purple Screen of Death on the ESXi host.
Reference : https://supportforums.cisco.com/discussion/11883926/tcp-segmentation-offload-tso-and-vmxnet31000v-bug
Workaround : Turn off TSO in VM

Case four

Virtualization Platform : VMWare
Symptom :
When enabling Traffic Shaping on a Distributed vSwitch (DVS), Linux virtual machines using the VMXNET3 driver experience network throughput degradation.
Reference : http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2030927
Workaround : Disable TSO and LRO in Guest VM
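For reference, the guest-side commands are along these lines (the interface name eth0 is an assumption; on RHEL/CentOS the setting can be persisted via ETHTOOL_OPTS in the ifcfg file) :

# turn the offloads off at runtime
ethtool -K eth0 tso off gso off gro off lro off
# verify the result
ethtool -k eth0 | grep -E 'segmentation|receive-offload'
# persist across reboots (RHEL/CentOS): add to /etc/sysconfig/network-scripts/ifcfg-eth0
#   ETHTOOL_OPTS="-K eth0 tso off gso off gro off lro off"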


Conclusion

Disabling TSO and LRO might fix your virtualization network problem, whatever it may be.

Thursday, June 18, 2015

SAP System Copy Procedures

Background

My company uses SAP AG's Enterprise Resource Planning software. It is a complex system, but being developed by Germans has its advantages; for example, there is plenty of documentation in the form of SAP Notes and other material. One of the challenging tasks in using SAP's ERP is system migration and cloning. There are two purposes for the migration: first, when you want to move the system to other hardware; second, when we need a cloned system to do things without compromising the original system. SAP's term for this migration activity is 'System Copy'.

Where to start

To find out the steps needed for a system copy, you could read this slideshare-hosted SAP document here (Best Practices : SAP System Landscape Copy). For another definitive starting point, check out SAP's wiki DOC-8324 System Copy and Migration. I noted that SAP has released the Software Provisioning Manager, from which we could do a system copy; my previous system copy experiences did not involve that software. Anyway, let's get down to the basics.

Homogeneous or Heterogeneous ?

SAP differentiates between homogeneous and heterogeneous system copies. It states that when the OS and database stay the same, the migration can be done using a homogeneous system copy. When either the database or the operating system is changed, the migration must be done using a heterogeneous system copy, with several exceptions. For a homogeneous system copy, we can use either the database-independent copy procedure or a database-specific copy procedure; usually the database-independent one is the slower of the two. For a heterogeneous system copy, there is no option: we can only use the database-independent procedure.

Documentation and documentation

According to http://scn.sap.com/docs/DOC-48323, the first thing to do is download the latest system copy guide from https://service.sap.com/sltoolset - Software Logistics Toolset 1.0 - Documentation - System Provisioning, which points us to either the Netweaver 7.0 System Copy Guide or the Netweaver 7.1 System Copy Guide. The old Best Practices document refers to Best Practice update SAP Note 885343. The newer System Copy Guide refers us to either SAP Note 1768158 for NW 7.0 or SAP Note 1738258 for NW 7.1, SAP Note 885343 SAP System Landscape Copy, and SAP Note 82478 - SAP system OS/DB migration. I noted that the System Copy Guide doesn't contain the steps needed to ensure archived data can still be accessed; for those we need to refer to the SAP Library Administrator's Guide -> Technical Operations for SAP NetWeaver -> General Administration Tasks -> Data Archiving and the blog DOC-7856 SAP Netweaver Application Lifecycle Management. The guide doesn't specifically mention general Oracle tuning documents (it does refer to R3load optimization for Oracle in SAP Note 936441), but the blog refers to SAP Note 1918774 (Performance issues when running a SAP Installation / System Copy), which really should be called "Oracle Performance issues when running a SAP Installation / System Copy".

Procedures

In general, the process of system copy consists of :
1. exporting source system
2. transfer the exported data ('load') to the target server
3. perform SAP installation procedures on the target system
4. perform DB load on the target system

Test run

For production system migrations, it is highly recommended to do a test run, to measure the amount of downtime needed and to ensure potential problems (what SAP calls customer-specific problems) are detected and noted before we really do the system copy. The test run consists of :
1. export the source database
2. move the exported data to the target system
3. import the data on the target system.
This seems simple, but for large production systems it is not: terabytes of data can be challenging to move, and the export / import is just as challenging. For the real migration run, the production system downtime spans steps 1 to 3. The SAP installation can be done before the test run.

Conclusion

The SAP documentation contains many procedures to follow when doing a system copy. It is imperative that the relevant procedures are identified and gathered before actually doing the process.

 



Sunday, June 7, 2015

Cloud Storage Price

This post is a place where I would note prices relating to Cloud Storage.

As A consumer

Amazon :
S3 (Simple Storage Service) : $0.03 /GB/month for first TB  (ref: https://aws.amazon.com/s3/pricing/)
Glacier : $0.01 /GB/month
EC2 Elastic Block Storage : SSD $0.1 /GB/month  magnetic : $0.05 /GB/month  + $0.05 /million IO

Google : (ref : https://cloud.google.com/storage/pricing)
Standard Storage : $0.026 /GB/month, $0.01 /GB/month nearline  

Microsoft OneDrive : (ref : https://onedrive.live.com/about/en-us/plans/)
15 GB : free
100 GB : $1.99 /month (about $0.02 /GB/month)
200 GB : $3.99 /month
1 TB : $7 /month (includes Office 365)

As a provider 

SwiftStack controller : (http://itknowledgeexchange.techtarget.com/storage-soup/swiftstack-enters-software-defined-storage-race/)
subscription : $10 / TB/month (means $0.01 /GB/month)

EMC VIPR :  (http://silvertonconsulting.com/blog/2013/09/30/emc-vipr-virtues-vexations-but-no-virtualization/#sthash.CEFmpF86.dpbs)
Controller base platform subscription : $0.01 /GB/month
with Object Data Service license subscription : $0.02 /GB/month

Hardware price: 
2012 ref :  http://www.slideshare.net/joearnold/7-steps-to-roll-out-a-private-open-stack-swift-cluster-joe-arnold-swiftstack-20120417
$42 520 for 105 TB .. $0.4 / GB CAPEX
2015 Backblaze storage pod ref : https://www.backblaze.com/blog/storage-pod-4-5-tweaking-a-proven-design/
$13 843 for 270 TB ..  $0.051 /GB CAPEX
Swiftstack ref : https://swiftstack.com/docs/admin/hardware.html (need to calculate for your HW prices)
HP Swift Ref : http://h20195.www2.hp.com/v2/getpdf.aspx/4AA5-2798ENW.pdf?ver=1.0
HP Proliant DL 360p with 8TBx3 LFF SAS + 100GB SSD : $12066 .. $0.5 /GB
with 4TB x3 LFF SAS + 100 GB SSD : $8466 .. $0.7 /GB
HP Proliant DL 380p with 8TBx7 LFF SAS + 120 GB SSD : $20361 .. $0.36 / GB
HP Proliant DL 180 with 2 drive cage kit, 2 TBx23 SFF + 120 GB SSD : $30602 .. $0.66 /GB
HP Proliant DL 360e with 6TBx13 LFF SAS + 120 GB SSD: $21678 .. $0.277 / GB

For HP drives, the SFF drives themselves cost about $0.5 /GB, and the LFF drives about $0.2 /GB.

SuperMicro 6027R-TDARF with 7x4TB SATA + 120 GB SSD : $3953 .. $0.141 / GB (ref http://www.broadberry.com/superservers-supermicro-servers/sys-6027r-tdarf)
SuperMicro SSG 6037r-e1r16l with 15x4TB SATA + 120 GB SSD: $5765 .. $0.096 / GB
SuperMicro SSG 6047r-e1r24l with 23x4TB SATA + 120 GB SSD : $7284 .. $0.079 / GB
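The $/GB figures above are simply price divided by raw capacity; for example, for the SSG 6047r-e1r24l (23 x 4 TB at $7284) :

# $7284 over 92,000 GB of raw disk (before any RAID / replication overhead)
echo "scale=3; 7284 / (23 * 4 * 1000)" | bc
# prints .079, i.e. about $0.079 per GB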

Thursday, April 30, 2015

The PTR DNS record, Email, SSL and You

What is the relationship between the PTR DNS record, email, and SSL ?
This post will answer that. The background for this post :
  • I was unable to make my new SMTP server instance deliver mail to a certain company's mailboxes. The company uses Trend Micro's reputation list to block unwanted senders.
  • I had problems connecting to the SMTP server over SSL from Android, and no problem from my laptop and iPhone. I connect to the server through a load balancer IP address.

The analysis

First, to determine why the emails were not being delivered, I used the 'Delivery Status' display that is available in the Cisco IronPort Reports menu. It is quite useful for showing the reason why an email is not being delivered.


Using the URL shown in the 550 rejection message, I found out that the IP address used by the email server is listed as having a 'Bad' reputation.


What is the meaning of DUL ? Quoting from the Trend Micro website :
Dynamic User List (DUL) includes IP addresses in dynamic ranges identified by ISPs. Most legitimate mail sources have static IP addresses.
What is that supposed to mean? It seems that the IP given to the SMTP server is identified as part of a dynamic IP range handed out by an ISP. I submitted a request to remove the IP from the DUL. The nice person (or is it a bot? I don't know) from Trend Micro suggested creating a PTR record that matches the A record given for the IP. That is, for each A record that states mysmtp.server.com is 1.2.3.4, we must create a PTR record that states 4.3.2.1.in-addr.arpa is mysmtp.server.com. Of course it is not that simple if you are in a large organization; finding out which DNS server should hold the PTR and who is in charge of it is quite a challenge. Hint : it is the ISP that owns the IP range, not the DNS server that is authoritative for the server.com domain. After creating the reverse PTR record, replying to Trend Micro and asking them to recheck the IP, and waiting a few hours, there was no longer any problem delivering email to the said company's servers.
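A quick way to check that the forward and reverse records match is dig, using the illustrative name and IP from the paragraph above :

dig +short mysmtp.server.com A    # should print 1.2.3.4
dig +short -x 1.2.3.4             # should print mysmtp.server.com.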

For the second case, the symptom of the connection problem to the said SMTP server is quite strange. There are no problems using a laptop, either on the local WiFi or on internet WiFi from a GSM operator. As for the Android tablet, we can connect using SSL to the load-balanced virtual server when using our internal WiFi network, and also on WiFi from a GSM provider. But switching to 3G / HSDPA causes connections from the Android tablet to fail ('Cannot safely connect to server'). At first I worried about misconfigurations on the load balancer side, which can be tricky. But the same configuration works over local WiFi; if the LB were misconfigured then local WiFi shouldn't be able to connect to the SMTP server either.

Enlightenment came after comparing packet traces over 3G with packet traces over WiFi. I used tPacketCapture for Android. That and a calm mind were needed for me to be able to do the next analysis.
Unsuccessful SSL connection
The client (10.8.0.1) establishes a three-way handshake, which completes in frame 48 with the 'Ack' from the client. Compare the packet capture above to the healthy packet capture (a very useful technique, IMHO) below :
Successful SSL connection
After the three-way handshake (which finishes in packet 31), the client immediately initiates the TLS handshake. This is different from the unsuccessful capture above, where the client idles for a long time and the server closes the connection in frame 54 (with the F/finish flag). Why does the client idle for that long ? The TLS Hello doesn't happen until about 28 seconds later, in frame 73. What did the client do in the meantime ?
I set the packet filter to '(FrameNumber > 48) && (FrameNumber < 73)' to check (I am using Microsoft Network Monitor to view the captured packets).


We found that there are DNS queries in frames #51, #53 and #58, all unanswered.
So the client does a reverse DNS lookup, looking for the PTR record that corresponds to the server's IP address. When it times out, the client closes the connection. So the issue is that a PTR record doesn't exist yet for the load balancer's public IP, and the GSM/3G provider's DNS server is awfully quiet about that, causing timeouts and connections not being made. We will try again after registering a PTR record and an A record for the load-balanced SMTP server.

The conclusion

If you are going to create an A record mapping a domain name to an IP address, please be sure to create a matching PTR record too, in the right DNS server.
Android's SSL client does a DNS reverse lookup before it establishes the SSL connection.

Friday, March 6, 2015

Learn Rails on Windows, part 1

Background

Ruby on Rails is a very well known framework, inspiring many frameworks that follow Rails' way of organizing a web application. The frameworks I have used that are similar to Rails are :
  • In PHP : CakePHP, CodeIgniter, Yii Framework
  • In Java : Spring Roo
Rails is also used in Openshift Origin for the implementation of Origin's web console and broker API. That, combined with our need to run Openshift Origin in our private cloud, seems a good reason to learn Rails.

The tutorial

Currently I am following the Ruby on Rails tutorial by Michael Hartl (link). The book strongly encourages using a Linux-based environment for development, but I am used to working from my Windows machine, so I tried to do the development from my Windows laptop.

Preparation

I installed the x64 version of the Ruby 2.0.0 installer from http://rubyinstaller.org/downloads/ on my Windows 8.1 system. I have had cases where the 32-bit installer doesn't work on 32-bit Windows XP (I know, this is one ancient platform); the workaround is to use Ruby 1.9.3 instead.

In order to follow the tutorial, we need to be able to run the rails command. Start a Ruby command prompt, and install Rails :

gem install rails

If you want a specific version of Rails, like me, you can specify which version you need :

gem install rails -v 3.2.16

We also need bundler :

gem install bundler

Several gems with native extensions require the Ruby Development Kit to be installed (for x64 Ruby 2.0, this is DevKit-mingw64-64-4.7.2-20130224-1432-sfx.exe).
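After extracting the DevKit self-extracting archive, it still has to be bound to the Ruby installation; from the DevKit directory, roughly :

ruby dk.rb init      # generates config.yml listing the detected Ruby installs
ruby dk.rb install   # patches those installs so native gems build against the DevKit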
The MySQL gem also requires either the MySQL Connector or MySQL Server to be installed, and the gem installation needs to refer to the include and lib directories as well :

gem install mysql2 -- '--with-mysql-lib="c:\Program Files\MySQL\MySQL Server 5.5\lib" --with-mysql-include="c:\Program Files\MySQL\MySQL Server 5.5\include"' 

Rails template for Origin


I used https://github.com/openshift/rails-example as the starting point of my application. Just use the GitHub URL as the source when creating the application in Origin. But if your private cloud doesn't have access to the internet, then you need to clone the repository and force-push it into your application's gear, as sketched below :

git clone https://github.com/openshift/rails-example railtest
cd railtest
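The force-push part is roughly as follows; the remote URL below is made up, so use the real Git URL of your Origin application (shown in the web console or by rhc app show) :

# add the gear's git repository as a remote (URL is a hypothetical example)
git remote add openshift ssh://5511a1b2c3d4e5f6a7b8c9d0@railtest-mydomain.example.com/~/git/railtest.git/
# overwrite the template code that Origin placed in the gear
git push openshift master --force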

Install all requirements for your local (i.e. Windows) system  :

D:\SourceCode\www\railtest>bundle install
DL is deprecated, please use Fiddle
Using rake 10.1.0
Using i18n 0.6.9
Using multi_json 1.8.2
Using activesupport 3.2.16
Using builder 3.0.4
Using activemodel 3.2.16
Using erubis 2.7.0
Using journey 1.0.4
Using rack 1.4.5
Using rack-cache 1.2
Using rack-test 0.6.2
Using hike 1.2.3
Using tilt 1.4.1
Using sprockets 2.2.2
Using actionpack 3.2.16
Using mime-types 1.25.1
Using polyglot 0.3.3
Using treetop 1.4.15
Using mail 2.5.4
Using actionmailer 3.2.16
Using arel 3.0.3
Using tzinfo 0.3.38
Using activerecord 3.2.16
Using activeresource 3.2.16
Using coffee-script-source 1.6.3
Using execjs 2.0.2
Using coffee-script 2.2.0
Using rack-ssl 1.3.3
Using json 1.8.1
Using rdoc 3.12.2
Using thor 0.14.6
Using railties 3.2.16
Using coffee-rails 3.2.2
Using jquery-rails 3.0.4
Using minitest 5.1.0
Using mysql2 0.3.14
Using pg 0.17.0
Using bundler 1.8.3
Using rails 3.2.16
Using sass 3.2.12
Using sass-rails 3.2.6
Using sqlite3 1.3.8
Using uglifier 2.3.2
Bundle complete! 11 Gemfile dependencies, 43 gems now installed.
Use `bundle show [gemname]` to see where a bundled gem is installed.

We could run the local development server using rails server:

D:\SourceCode\www\railtest>rails server
=> Booting WEBrick
=> Rails 3.2.16 application starting in development on http://0.0.0.0:3000
=> Call with -d to detach
=> Ctrl-C to shutdown server
[2015-03-07 09:39:34] INFO  WEBrick 1.3.1
[2015-03-07 09:39:50] INFO  ruby 2.0.0 (2014-05-08) [x64-mingw32]
[2015-03-07 09:39:50] INFO  WEBrick::HTTPServer#start: pid=5684 port=3000

Conclusion for Part 1

Running Rails on Windows is not an out-of-the-box experience, but it is not so difficult given the many clues on StackOverflow.

Sunday, January 11, 2015

Processing CSV Files using Hive / Hadoop / HDFS

Background

When there is a need to process large CSV files, Apache Hive becomes a good option since it allows us to query these files directly. I will try to describe my recent experience using Apache Hive. In this case I need to group the rows and count the rows in each group. I will compare it with my existing systems using a MySQL database, one built using PHP and the other built using a combination of Pentaho and PHP.

Installation & Configuration

We have many components in the Hadoop-Hive-HDFS ecosystem :
  • HDFS : Namenode service, Datanode services
  • MapReduce : ResourceManager service, NodeManager services
  • Hive
  • ZooKeeper
Each component has its own configuration file (or files) and its own log files.
For simplicity, in my opinion nothing beats the Apache-MySQL-PHP stack; minus points for Hadoop-Hive-HDFS from a complexity standpoint. I think we need an additional management layer to be able to cope with the complexity, maybe something like Cloudera Manager or Apache Ambari, which I haven't explored yet.

My experiences here are the result of using Vagrant to provision 4 VMs, as I have described in my previous post.
The Hadoop ecosystem consists of many Java components. Be prepared to review the configuration of each one. For example, ZooKeeper on my server is configured for a maximum of 50 connections.

The load process

The CSV files to be processed are stored in a directory. For my existing systems that use a MySQL database, the first processing step is parsing the CSV files and inserting the rows into a MySQL table. Using Hive, we only need to upload the files to HDFS :

hdfs$ hadoop fs -put filename2014????.log /user/myuser

And then load the file into the 'staging' table :

hive> LOAD DATA INPATH '/user/myuser/filename20140101.log' INTO TABLE my_staging_table PARTITION (batch_id=20140101);

Due to limited time I haven't explored automating Hive from PHP or Java, so I ran the statements manually for each file that needed to be processed. But the statements execute quickly, since there is no row processing done here.
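For reference, a sketch of what scripting the loads could look like from the shell (untested at the time; it assumes the file naming pattern above and that the hadoop and hive clients are on the PATH) :

for f in filename2014????.log; do
  batch=${f#filename}     # strip the prefix...
  batch=${batch%.log}     # ...and the extension, leaving e.g. 20140101
  hadoop fs -put "$f" /user/myuser/
  hive -e "LOAD DATA INPATH '/user/myuser/$f' INTO TABLE my_staging_table PARTITION (batch_id=$batch);"
done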

My existing systems need to perform row field splitting just to be able to create insert statements. So: plus points for Hive on the efficiency front, being able to quickly load the files into Hive tables.

Hive is able to ingest CSV files that end with normal line endings very quickly.

The map and reduce processing

First I tried to follow the same process as my PHP-based system: creating an additional field in the existing rows, derived from an existing field. This additional field becomes one of the important grouping keys. Because Hive tables can only be appended to, and are somewhat immutable for all other purposes, I created a new table to contain the augmented rows.

hive> CREATE TABLE my_stage2_table (... ) PARTITIONED BY (batch_id) STORED AS SEQUENCEFILE;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> INSERT INTO TABLE my_stage2_table PARTITION (batch_id) SELECT ...with additional columns using UDF functions... from my_staging_table;

The process takes quite a long time, but it is definitely several times faster than my existing PHP process: about 40 minutes, vs 8 hours for my existing MySQL-PHP system on the same server. I found the ResourceManager web app very useful for checking running processes (containers); I would need to write a lot of PHP code to get similar process visibility. But error cases are not uncommon: processes hang or get terminated and I was unable to quickly check the reason for the failures (it is not shown in the Hue UI). In some cases I needed to hunt for clues in the log files (there are many, because there are multiple VMs and components). Maybe installing the ELK stack or Splunk for log file analysis would help, which I haven't done yet. Or maybe my install is not a good one, because clicking some log links in the web consoles (such as the ResourceManager web UI on port 8088) results in error messages.
I cannot settle my opinion on this: either the difficulty of configuring the Hive stack caused me trouble in this part, or there are many edge cases where the Hadoop ecosystem simply is not user friendly.

If there is trouble with your MapReduce process and the Hue UI gives very little clue, you might want to check the other Hadoop web consoles, and maybe you need to check all of the log files in the system (hive-server2, yarn and zookeeper come to mind).

I also find it very peculiar that Hive's compression setting is stored in the session and not in the metadata :

SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;

But I had trouble using the compression options above; I restarted the VMs and skipped the compression step to be able to run the INSERT INTO above correctly.

The summarizing part is done on the stage2 table, creating a new summary table :

hive> CREATE TABLE summary_table AS select count(*) cnt, key1, key2,.... from my_stage2_table group by key1,key2,..

The summarizing part runs quickly: about 5 minutes using a few key fields, nearly 10 minutes using the real set of keys we need in production.


In the second iteration, I remembered that the less data we need to write to the hard disk, the faster any process will perform. So I forgo the intermediate table and go directly to summarizing, using expressions in the GROUP BY clause, and still get about 10 minutes of running time.

hive> CREATE TABLE summary_table AS select count(*) cnt, key1, key2,..,key expression 1 as keyname1,key expression 2 as keyname2 from my_stage2_table group by key1,key2,..,key expression 1, key expression 2

Hive can be very fast on single-server hardware if we manage to use processes that write a small amount of output.
I noted that the memory capacity of Hadoop nodes is a bit tricky to set up. First, a missing memory entry in the configuration file results in an 8 GB default, which causes strange behavior when we only have 2 GB of VM RAM and the volume of data to be processed is quite large. Allocating too little memory to Hadoop MapReduce/YARN will leave the CPU cores underutilized. However, because of the single HDD in the test server (which puts its disk I/O capability well below enterprise-class servers), underutilizing the CPU cores doesn't make much difference.
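For reference, the limits in question live in yarn-site.xml (with per-job sizes in mapred-site.xml); a minimal sketch for a 2 GB VM might look like this (the values are illustrative assumptions, not tuned recommendations) :

<!-- yarn-site.xml : cap what each NodeManager offers to containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>1536</value>
</property>
<!-- and the largest single container the scheduler will grant -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>1536</value>
</property>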

Post processing

After the summary table is created, the table is exported to CSV and processed using Excel's pivot table function for analysis. Because the millions of rows are now summarized into hundreds of rows, Excel can easily process them.

Summary

Apache Hive's performance for this CSV data processing is several times that of the custom PHP application using MySQL. However, the user needs to be aware that the complex configuration might cause difficulties, namely in troubleshooting errors and in configuring the optimal RAM and CPU cores to be used by the Hadoop VM nodes.


Friday, January 9, 2015

Openshift Log Aggregation And Analysis using Splunk

Splunk is one of the popular tools we use to analyze log files. In this post I will describe how to configure an Openshift cluster to send all of the platform log files (note that this excludes gear log files) to Splunk.

Configure Splunk to listen on TCP port

From the Splunk web console home, choose 'Add Data', 'monitor', 'TCP/UDP', fill in port 10514 (TCP), click 'Next', and select sourcetype Operating System - linux_messages_syslog.

Configure Rsyslog Forwarding

These steps should be done on every Openshift node, broker and console host.
As root, create an /etc/rsyslog.d/forward.conf file as follows (change splunkserver to your Splunk server IP; the @@ means TCP, instead of @ for UDP) :

$WorkDirectory /var/lib/rsyslog # where to place spool files
$ActionQueueFileName fwdRule1 # unique name prefix for spool files
$ActionQueueMaxDiskSpace 1g   # 1gb space limit (use as much as possible)
$ActionQueueSaveOnShutdown on # save messages to disk on shutdown
$ActionQueueType LinkedList   # run asynchronously
$ActionResumeRetryCount -1    # infinite retries if host is down
*.* @@splunkserver:10514

I also don't want Watchman metrics to fill up the local log files, so I discard the metrics before they are written to the local file. Edit /etc/rsyslog.conf :

#kern.*                                                 /dev/console
:msg, contains, "type=metric"                           ~
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none                /var/log/messages


On my system SELinux blocks rsyslog's remote connections, so we need to enable allow_ypbind :

setsebool -P allow_ypbind 1

Restart rsyslog :

service rsyslog stop
service rsyslog start

Configure Node Logging

Edit /etc/openshift/node.conf and add these lines if they don't already exist :

PLATFORM_LOG_CONTEXT_ENABLED=1
PLATFORM_LOG_CONTEXT_ATTRS=request_id,container_uuid,app_uuid
PLATFORM_LOG_CLASS=SyslogLogger
# enable metrics, beware of log volume increase
WATCHMAN_METRICS_ENABLED=true
# select fewer metrics
CGROUPS_METRICS_KEYS="cpu.stat,cpuacct.stat,memory.usage_in_bytes"

Restart the mcollective service.

service ruby193-mcollective stop
service ruby193-mcollective start

Configure Broker Logging

Edit the /etc/openshift/broker.conf and add this line :

SYSLOG_ENABLED=true

Restart broker 

service openshift-broker stop
service openshift-broker start

Configure Web Console Logging

Edit the /etc/openshift/console.conf and add this line :

SYSLOG_ENABLED=true

Restart the web console

service openshift-console stop
service openshift-console start

Epilog

We could analyze, for example, gear memory usage :
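For example, a simple Splunk search over the forwarded syslog data (sourcetype as configured earlier) that pulls up the Watchman memory metric events could look like this; the search terms come from the metric keys selected above :

sourcetype="linux_messages_syslog" "type=metric" "memory.usage_in_bytes"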


And gear create/destroy activities :