Friday, December 28, 2012

Nginx and Linux Kernel Update Blues

Background

We have a three-server landscape that serves our internal applications. All of them run nginx on RHEL Linux, installed from EPEL repository packages. Two days ago I rebuilt the nginx package on one server (let's call it server A) in order to enhance it with HttpSubsModule; during the compilation process it complained about missing kernel functionality. To get the build to succeed, I updated the kernel-devel and kernel packages (on server A) to the latest versions. After the compilation succeeded, I copied the resulting rpm to server B and installed it.

The problem

After reinstalling the rpm package on server B, I didn't remember to restart nginx. After installing the new kernel on server A, I hadn't rebooted it either. But the next day, because of one thing and another, server A got restarted, and the application on server A no longer worked. Server B's nginx also got restarted, and afterwards was unable to serve any web pages (not even static pages). So we had two servers down out of three. That's not good.

Investigation : Eventfd not implemented

Investigation of the nginx error log shows that server B's nginx complains about failed eventfd() calls (ref):
eventfd() failed (38: Function not implemented)
I suddenly recalled that this problem had already happened before on server C: the cause was that the newer nginx package needs a backported kernel feature. The solution is to install the latest kernel and reboot.
yum update kernel
reboot
The root cause of this is that the nginx EPEL package doesn't correctly specify the true minimum kernel version required for it to work.
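A quick way to confirm whether the running kernel matches the newest installed one (a sanity check I should do after every kernel update):
uname -r          # kernel currently running
rpm -q kernel     # kernel packages installed; if a newer one is listed, a reboot is still pending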

Investigation: SELinux prevents text relocation

Even after the kernel update, both server A and B were still unable to serve the application pages.
Browsing nginx's and PHP's error logs showed that the php-oci8 extension was not loaded (it is needed to connect to Oracle, which is the database backend of our legacy applications).
The clue to why the extension failed to load was found in /var/log/messages: SELinux complained that Oracle's dynamic library (libclntsh) was trying to do text relocation and failed (see the Red Hat bug report here and an article about PHP, SELinux, and Oracle here).
setroubleshoot: SELinux is preventing php from loading /opt/oracle/instantclient/libclntsh.so which requires text relocation. For complete SELinux messages. run sealert -l ac5924e8-3af4-4f26-8d79-e5eedf9d9d7a
The solution is to give the textrel_shlib_t SELinux type to all Oracle instant client dynamic libraries (run these inside the instant client directory):
chcon -t textrel_shlib_t *.so
chcon -t textrel_shlib_t *.so.10.1
You can see the resulting attributes on all files using
ls -lZ
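Note that chcon changes do not survive a filesystem relabel. Assuming the instant client lives in /opt/oracle/instantclient (as in the log message above), a more persistent sketch uses semanage (from the policycoreutils-python package) plus restorecon:
semanage fcontext -a -t textrel_shlib_t '/opt/oracle/instantclient/.*\.so(\.[0-9.]*)?'
restorecon -Rv /opt/oracle/instantclient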
Why hadn't this occurred before the kernel update? Maybe SELinux policy has become more restrictive in recent kernels, but I am not certain about that. I wonder whether SELinux attributes could be specified in rpm packages. If so, a better solution for everyone would be to build a better rpm package containing the instant client, oci8, and the required SELinux permissions. But I wonder whether that would violate Oracle's EULA..

 Afterthought

Thinking in a more generic frame of mind, these incidents occurred because we updated the kernel and the nginx package. Is that wrong? No. Best practice in server administration is that all security updates are applied as soon as possible. But we had better plan updates outside the application's busy hours, so that if problems crop up we have sufficient breathing room :)

Tuesday, December 25, 2012

Basic Approval Workflow in Joget

The case

When building enterprise applications, many activities need an electronic form. Usually such a form requires approval before it is recognized by the company. Applications of this kind are workflow-based applications; they are more structured and maintainable when built on an existing workflow engine. One open source workflow engine is Joget Workflow. This article tries to explain the first stage of building a workflow application: creating the workflow diagram.

Supported scenario

A request document is created by the creator actor, then submitted to an approver actor (usually a superior). Each actor is allowed to reject the document. The approver can return the document (return) or approve it (approve).

Basic Workflow

In this workflow, the creator can perform the 'BuatDokumen' (create document) activity. The activity button chosen (submit, reject/cancel, or return) is written into the 'Keputusan' (decision) variable. This variable is the basis for routing the flow. BuatDokumen appears as two boxes; it is really just one activity, but the workflow engine forbids a flow (arrow) into the starting activity from anywhere other than start. So a second BuatDokumen is created to handle documents returned by the superior.

Variation - Different variables

This variation uses the variable KeputusanPembuat to store the creator's action and the variable KeputusanApprover1 to store approver1's action.
The advantage is that debugging the flow is easier, because the basis of each decision is stored in a separate variable.
The disadvantage is that every arrow must be checked to make sure it uses the matching decision variable name.
If this variation is not used, i.e. only one workflow variable is used, the creator's decision value will be overwritten by approver1's decision. Each decision can actually be recorded outside Joget, which simplifies the workflow to a single Keputusan variable.

Enhance - Otherwise

When there is no otherwise clause and the workflow engine finds that the variable's value satisfies none of the conditionals, the workflow is terminated. One way to prevent unwanted termination (for example, when a front-end bug leaves the decision variable empty) is to add an otherwise clause to every conditional.




Monday, December 24, 2012

Nginx Http Subs Module for CentOS - Packaging Howto

The case

I was recently involved in a reverse proxy project using nginx as the reverse proxy server. It turns out that nginx has a built-in HttpSubModule that allows us to replace URLs in the HTTP stream, which is a very important requirement for us. But the problem is that HttpSubModule only allows one replacement per location.
After a few searches, I found that the additional HttpSubsModule (notice the additional s) will do the task, allowing multiple replacements per location. The nginx wiki is kind enough to provide installation instructions for HttpSubsModule, but provides no rpm package.

DISCLAIMER: This blog post is a step-by-step tutorial for producing an RPM package file. If you are only interested in installing nginx with HttpSubsModule, please jump to the last heading, 'Installation'. But if you're not on CentOS 6 x86_64, you really should follow all the steps anyway.

Repository Hunt

I prefer repository packages over compiling manually. The EPEL repository carries nginx 0.8.55 for CentOS 5 (here) or 1.0.15 for CentOS 6 (here), but unfortunately these are compiled without HttpSubsModule. The nginx site points us to a repository at http://nginx.org/packages/centos/ which carries nginx 1.2.6, but this too is built without HttpSubsModule.

SRPM install

First, install the source RPM for nginx (do this as a normal user, not as root):
wget http://nginx.org/packages/centos/6/SRPMS/nginx-1.2.6-1.el6.ngx.src.rpm
rpm -i nginx-1.2.6-1.el6.ngx.src.rpm
 Prepare your system for rpmbuild and install devel packages (do these as root):
yum install rpm-build
yum install zlib-devel pcre-devel openssl-devel


Rpmbuild

Time to bite the bullet and rebuild the nginx package.
rpmbuild is the tool that builds rpms. It is a packaging tool with functionality similar to Java's Ant, albeit a complex one. In executing rpmbuild we go through these phases:
  1. %prep stage
  2. %build stage
  3. %install stage
The spec file serves a similar function to Ant's build.xml: it specifies the build steps and sources.

After wrestling with some rpmbuild online documentation, I found that essentially I need to (do these as a normal user):
  • download the HttpSubsModule sources using git, compress them into tar.gz format, and host the file on a web server somewhere (see the sketch after these steps)
  • add an additional Source line in nginx.spec, and also place the tar-gzipped file in the ~/rpmbuild/SOURCES directory (rpmbuild expects the tar.gz file to exist there with the same name as in the URL given below; it won't download it for you)
Source5: nginx.vh.default.conf
Source6: nginx.vh.example_ssl.conf
Source7: nginx.suse.init
Source8: http://zzz.yourwebserver.com/httpsubs.tar.gz
  • add an additional %setup invocation to extract httpsubs.tar.gz. The -q means quiet (disable tar's chattiness), -a 8 means extract source 8 after cd-ing to the package directory, -T means don't extract source 0 again, and -D means don't delete the top-level directory before extracting. A really complex line for simple behavior.
%prep
%setup -q
%setup -q -T -D -a 8
  • add these clauses to the configure command (the spec file contains two of them):
        --with-http_addition_module \
        --with-http_sub_module \
        --add-module=ngx_http_substitutions_filter_module \
        --with-http_dav_module \
        --with-http_flv_module \
And run rpmbuild (still as a normal user), praying that there are no errors:
  • rpmbuild -ba nginx.spec
Voila, we get the rpm files at /home/username/rpmbuild/RPMS/x86_64.
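For completeness, here is the sketch promised above for producing httpsubs.tar.gz (the GitHub URL is where the module sources live at the time of writing; the hosting step is whatever server you named in the Source8 line):
git clone https://github.com/yaoweibin/ngx_http_substitutions_filter_module.git
tar czf httpsubs.tar.gz ngx_http_substitutions_filter_module
cp httpsubs.tar.gz ~/rpmbuild/SOURCES/
# then upload httpsubs.tar.gz to the web server referenced by the Source8 line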

Installation

If you don't want to create the rpm file but only want to install one, and you happen to be running CentOS 6 on the x86_64 platform, feel free to download my resulting rpm file at https://sites.google.com/site/yudhiwsite/files/nginx-1.2.6-1.el6.ngx.x86_64.rpm?attredirects=0&d=1
Whichever way you obtained the rpm file (downloaded, or built following the steps above), install it (as root):

rpm -i nginx-1.2.6-1.el6.ngx.x86_64.rpm
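To verify that the module actually made it into the binary, nginx -V prints the configure arguments the binary was built with; the --add-module line should appear among them:
nginx -V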
That's all.. I hope these steps work for you too.

Sunday, December 2, 2012

Useless combination of logical expressions

Recently I stumbled upon this SQL where clause :
status <> 2  OR status <> 0
or in equivalent form for you PHP developers :
(status != 2 )  || (status != 0)

To analyze why such a combination is useless, let's define two variables to simplify things:

A = (status <> 0)
B = (status <> 2)

And draw it in one table:

status   | A (status <> 0) | B (status <> 2) | A OR B
0        | false           | true            | true
2        | true            | false           | true
other    | true            | true            | true
Then I wonder, what good does that do? It always evaluates to true..
It only makes sense if we combine the status comparisons with the AND operator:
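status <> 2 AND status <> 0
which evaluates to true only when status is neither 0 nor 2 - presumably what the author of that clause intended.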

Wednesday, October 3, 2012

Case of Session Identifier not Updated

Background

Rational AppScan is an automated web testing tool that produces reports of web application vulnerabilities. We usually use it to ensure our apps are well protected before releasing them to the wild internet.

The problem

The problem with detection tools is that they sometimes raise a false alarm - such as when it declared that the session identifier was not updated:

[1 of 2]  Session Identifier Not Updated
Severity: High
Test Type:  Application
Vulnerable URL:  https://myinternalapp.com/application name/  
Remediation Tasks: Do not accept externally created session identifiers
Variant 1 of 1  [ID=26]
The following may require user attention:
My normal reaction, because the app is a Yii framework-based PHP application, was to add a Yii::app()->session->regenerateID() call during the login action.

Imagine my surprise when, upon retesting with Rational AppScan, it spat out errors like this:
Stopping scan due to out of session detection
I vaguely recall that the reason for errors like this is that the session ID in the HTTP header is not what the tool expected.

Analysis

So I opened Firefox and started checking for HTTP header anomalies. Examining Firebug's output, it seems that during each login, two session ID headers are issued in the HTTP response by the PHP application. Both of them differ from the session ID sent in the HTTP request. So Rational AppScan gets confused when two session ID headers are issued.
Further searching revealed that the Yii framework's login mechanism already issues a regenerateID call, making my addition in the actionLogin function a duplicate.
If the application already regenerates the session ID, why on earth does it report that the session ID is not updated?
Then I realized.. the application is being tested twice, first without logging in and second with a login. During the first phase the tool doesn't know what username/password to try, so it naively applies a blank username and blank password - and naively expects the session ID to be regenerated. Which doesn't happen, because the login is not successful...

The Cure ..(Updated)

For now, I will add a regenerateID call IF the login _failed_. Let's see tomorrow whether that shuts up the testing tool. UPDATE: it seems that is not enough. It still complains that the session ID is not regenerated. Debugging the alternative flow paths in the controller shows that a blank username and password leave $_POST['LoginForm'] empty.
So I added another regenerateID call for the case where $_POST['LoginForm'] is empty. This did the trick.

Monday, September 10, 2012

SAP Business (Data) Warehouse

Today is the second day of SAP Business Warehouse training. I have had similar training from SAP before; this time the material and the instructor come from one of SAP's vendors. If you have never tried it, type tcode RSA1 in your SAP BW system. If BW is really installed, the Data Warehousing Workbench will appear.

Even though this is my second time, I find SAP BW genuinely quite complicated. It is flexible, admittedly. The first complication is that it has many uncommon terms, even for people used to customizing SAP and developing applications. Inventing terminology and forcing people to understand it is typical SAP..
The following table of equivalences might help:
-> Characteristic InfoObject = Table. In SAP BW we can freely create tables under the term 'InfoObject'. These are the tables that can be turned into Dimensions. InfoObject tables can be the target of foreign keys from fact tables.
-> PSA = Persistent Staging Area, a temporary holding area. Data pulled from files or other systems is not processed directly but first lands in the PSA. And if you pull data multiple times, the PSA contents may end up duplicated. I was confused once when I accidentally ran a data extraction twice and the subsequent data processing kept failing; as far as I remember, I hit the same problem (duplicated PSA contents) during my first training too.
-> Data Transfer Process = a data flow in a DFD. In other ETL tools, such as Pentaho Data Integrator or Talend DI, a data flow is represented by a line. In SAP, however, a DTP is stored as an icon in a tree. A DTP is the data flow from the PSA to an InfoObject or another SAP BW entity.
-> InfoPackage = the schedule for pulling data from a source into the PSA.
-> Transformation = Mapping. For this, SAP provides a graphical tool for drawing lines from source columns to target columns. SAP's tool is quite good, and more responsive than Talend's Java-based GUI; I don't know how it compares to Pentaho.
-> A Data Source is analogous to a PSA: each data source has exactly one PSA.
From transaction RSA1 we can define the whole data extraction and transformation process. What remains unclear to me is how to navigate effectively to view the contents of the PSA, or the contents of an InfoObject table already loaded by a DTP. We also have to activate objects again and again whenever we change certain screens. Sometimes the activation button is inactive even though the object has been modified and the compiled active version differs from the modified version; the workaround is to pretend to make a change and then click activate.

Persistent Staging Area (PSA)

To view the contents of the PSA, right-click the Data Source and then click Manage. In SAP there are usually several ways to reach a function, but for this one I have found only one way so far.
If the PSA turns out to contain more than one set of data, we need to delete the unneeded ones. Select the one to delete, right-click, and choose 'Delete Request from DB'.
In my opinion SAP hands us a rather complicated tool if we have to go as far as deleting PSA contents. The other data integration tool architectures I know do not keep state in intermediate elements, so their users don't have to do maintenance at this level.

InfoPackage and DTP

SAP distinguishes InfoPackages from DTPs because of the PSA's existence. This makes things more complicated: both are just data flows, which other tools simply draw as lines. An InfoPackage is the flow from the data source into the PSA; a DTP is the flow from the PSA into the data warehouse tables such as InfoObjects.
SAP provides fairly complete monitoring, though it does mean many screens to look at. To monitor an InfoPackage or DTP, click the Monitor icon (the oscilloscope picture) on the second toolbar row (third, if we count the TCode navigation row) while the InfoPackage/DTP is selected.
From DTP monitoring, for example, we can double-click the latest process result, and an 'Administer Data Target' icon appears in the toolbar. This is a handy but not very obvious navigation path: after running a DTP and seeing that it succeeded, we can jump straight to the InfoObject table produced by that DTP.

That concludes part one on the SAP data warehouse. I may write a part two, but no promises..

Recovering from Deleted Row Disaster in MySQL

This post is dedicated to the accidents that every so often result in deleted rows of data.
Of course everyone should have backups. But things do happen: the backups are nowhere to be found, or not taken often enough, or we're storing data in a forgotten database server.
In the event of such an incident.. deactivate your apps. Prevent unnecessary writes to the db that could overwrite your data. With an Oracle database, you could try a flashback query, which assumes the transaction is still fresh in the redo log. With a MySQL database, copy the whole data directory to a safe location, from which we would try to dump the deleted rows using the Percona InnoDB recovery tool.
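A minimal sketch of that copy step (the paths are the usual defaults; stop mysqld first, or at the very least stop all writes):
service mysqld stop
cp -a /var/lib/mysql /root/mysql-rescue-copy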
But such methods fail miserably when we find out about the accident much too late. I think every person should be responsible for their actions, even if the action is deleting data. But the consequence of mass-delete actions is very different from single-row deletes. That is why mass delete functions must be avoided at all cost.. just kidding. There are a few legitimate reasons to provide mass delete functionality, and we should always try to help our users do their tasks. But if there is such functionality, we must:
a. ensure an undo mass delete functionality is implemented as well, or
b. ensure the deleted information is saved into some other tables (folders) upon mass delete execution, or
c. ensure the deleted information still remains in the system in another form (another table stores similar data and is not affected by the delete command on the first table).
The functionality to actually recover the mass-deleted data could be left unimplemented until the accident happens, but you must be certain that the information stored during a mass delete is correct and sufficient for recovery.
The same rules apply to mass update functionality: the safeguards must be implemented, and a snapshot of the changed data should be serialized and stored in a different database table.
I hope that if such an accident ever happens to your app, recovery can be done and no work is lost.

Saturday, September 1, 2012

Troubleshooting Enterprise App Performance

Have you ever deployed an application, only to find that its real-world performance is less than you expected? Maybe you haven't had time to do proper load testing, or maybe the production environment has different performance characteristics. Fear not: for some common issues we can still improve performance even when time is running out. In this blog post I will try to build a conceptual framework for troubleshooting enterprise application performance. An enterprise application in this context is essentially synonymous with an information system (as Martin Fowler put it in his Patterns of Enterprise Application Architecture). I will use three applications that I have engineered as examples. The first, application P, is primarily used for viewing the results of complex business logic calculations; it has transactional features, but transactional usage is 1:10 relative to the view/report functions. The second, application C, is primarily used for transactions. The third, application S, is connected to three external systems and three database systems. I will explain the steps I took to improve the performance of these three apps. All three are in production; application P has about 20000 potential users, application C about 20000 users, and application S only about 50 users. Daily data volume is extremely small for P, small for C, and medium for S.

Performance Hypotheses

There is a rule of thumb when improving performance, namely the Pareto rule: 80% of application time is spent in 20% of the entire system. That means our first objective is to identify the system bottlenecks. Emphasis here: there might be more than one bottleneck, and removing all of them might not be feasible in terms of time or cost.
For simple systems, such as when the app connects only to a single database, we might put forward one hypothesis that the bottleneck is in the database, and another that the bottleneck is in the application server.
For other systems it is not that simple, such as application S, which has more than one external system interface and more than one connected database.
So as the first step in the framework we enumerate all systems involved in the operation of the application: all databases, all services, pools, etc. Each system is one bottleneck candidate.

Step 1. Enumerate all systems involved; for each system, create a hypothesis that it is the one causing the performance bottleneck

I have three applications as examples. Let me write the hypotheses for each application.
Application P is connected only to an Oracle database and an LDAP directory, so the hypotheses are:
P1. Bottleneck in the PHP Web server
P2. Bottleneck in the Oracle DB
P3. Bottleneck in the LDAP directory
P4. Bottleneck in interconnection between systems (networking)
Application C is connected only to a MySQL database that is replicated back to back. The hypotheses are:
C1. Bottleneck in the PHP web server
C2. Bottleneck in the MySQL DB
C3. Bottleneck in the LDAP directory
C4. Bottleneck in interconnection between systems (networking)
Application S is connected to an SAP ERP system, a Joget workflow service, two MySQL databases, and one Oracle database. The hypotheses are:
S1. Bottleneck in the PHP web server
S2. Bottleneck in the SAP ERP system
S3. Bottleneck in Joget workflow service
S4. Bottleneck in MySQL A
S5. Bottleneck in MySQL B
S6. Bottleneck in Oracle DB
S7. Bottleneck in interconnection between systems (networking)
Each host/system has its own performance characteristics that might contribute to an application bottleneck. In general, for each host we have:
  1. CPU usage. A CPU bottleneck might occur if CPU usage is > 70 or 80%.
  2. Memory usage. Memory can become a bottleneck if swap usage is > 0 AND swap activity per second is high.
  3. Disk usage -> this is much less related to performance. If free space is 0, the host is disabled in some respect.
  4. Disk IO activity -> this is more likely related to performance. 20..40% IO wait already indicates some kind of disk IO bottleneck.
  5. Network usage -> in some conditions this can impact performance.
For database hosts, in addition to these 5 characteristics, we have:
  • query execution bottlenecks. High disk IO is an indication of such a bottleneck.
  • query parsing bottlenecks. High CPU is an indication of such a bottleneck.
Each database system might have tools, built in or additional, that help us detect bottlenecks. A shell sketch of the host-level checks follows.
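On a Linux host, these characteristics map to a handful of standard commands (a sketch; iostat and sar come from the sysstat package):
top              # CPU per process: %us, %sy, and %wa (IO wait)
free -m          # memory and swap totals
vmstat 5         # si/so columns show swap activity per second
df -h            # disk space usage
iostat -x 5      # per-device IO utilization and wait times
sar -n DEV 5     # network traffic per interface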

Diagnosis

From each hypothesis we can derive tests or checks that strengthen or weaken the hypothesis. For example, do repeated load testing while watching the SQL queries from a DBA session. If there is a query bottleneck, we will find out from the SQL text shown most often in the database session monitoring. If the web server CPU is high, then the bottleneck is more likely in the application script parsing.

Step 2. Enlist more specific performance bottlenecks, and how we could test or check for each.

Not all hypotheses could be checked, because of limits on what I am allowed to do in each system. Let me list some diagnosis steps I have done:
P1. check: do a 'top' on the Linux web server to check for a CPU bottleneck while load testing the application using Apache Bench ('ab'). Watch for high CPU (%us), memory use, and I/O waits (%sy and %wa). I had to change the application to skip authentication to make the test easier.
P2. check: monitor database sessions while the load test runs repeatedly. The SQL query shown most often is identified as the query bottleneck.
For application C, it is similar:
C1. check: use the 'top' command. Because the web and database servers share the same system, watch for high CPU in the php-cgi and mysqld processes. Other configurations might want to watch the apache processes instead.
C2. check a: connect as mysql's root user and run SHOW FULL PROCESSLIST while the application is being used by more than 100 users. Frequently appearing queries were identified, but could not be acted upon: there were too many of them, and no query-specific improvement could be found just by examining them.
C2. check b: enumerate the pages being used, and configure the PHP framework to display timing results and the SQL queries executed on each page. Thus the bottleneck was identified: the pages are not written in an optimal way, inefficiently issuing a lot of queries for simple displays. Part of one page executes a query that fetches a lot of data that is never used. Another part fetches a lot of data just to determine a simple fact.
For application P, it is a lot more complicated. Because actual users are too few, performance indications will be weak (users will perceive terrible performance, but by the time we go hunting for bottlenecks the indicators will already have returned to normal), so we must simulate a larger load to get better indications. In the past we used testing tools like The Grinder or JMeter to simulate large loads. But I must confess that I simply refreshed my Firefox page repeatedly to simulate load during application P's troubleshooting.
S1. check using the top command on the web server. With APC activated in the PHP web server, CPU usage is low enough; this bottleneck hypothesis is weakened.
S2. In the SAP server, OS-level measurements are simply off limits to me, or not informative enough. So I measured the time of each SAP call. This was done by using PHP profiling functions to mark the start and end of each SAP RFC execution; the profiling functions used are provided by the PHP framework we use (Yii::beginProfile and Yii::endProfile). Activating the profile component in the PHP application's configuration shows the profiling result in the lower part of the application page.
S3. Joget workflow is a Tomcat-based Java application. Doing a 'top' on the Joget server and on Joget's database server shows a lot of CPU activity by the MySQL and Java processes, so the hypothesis is strengthened. Further bottleneck identification was done using jvisualvm. It was a hassle to set up (it has a bug that prevented me from monitoring remote apps); in short, I used its thread dump functionality repeatedly to identify which part of the Joget workflow service was the bottleneck.
For S4 and S6, no checks were done, but conceptually the checks I did for P2 and C2 would apply.
For S5, I ran SHOW FULL PROCESSLIST repeatedly, and found that some queries indeed are query bottlenecks, appearing often in the list.
For P4, C4, and S7 I haven't done any checks yet. Conceptually we could use iperf (wikiref, jperf) to test end-to-end network bandwidth and packet loss. We could also plot network traffic using Munin (ref) on each host to determine whether traffic saturation occurs. Better yet, traffic graphs from the network switches involved could help strengthen or weaken these network-related hypotheses. A sketch of the basic load-and-watch commands follows.
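As a concrete sketch of these basic checks (the hostname and page URL are illustrative):
ab -n 1000 -c 50 http://appserver/index.php    # generate repeatable load
mysql -u root -p -e 'SHOW FULL PROCESSLIST'    # run repeatedly during the test; recurring queries are suspects
iperf -s                                       # on one host: start an iperf server
iperf -c appserver                             # on the other host: measure end-to-end bandwidth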
These actions I summarize as :

Step 3. Check each bottleneck hypothesis. If the check confirms (strengthens) the hypothesis, break the bottleneck down into smaller hypotheses about smaller parts of the system. Check each smaller hypothesis; if it is found to be true, break it down further into smaller parts if possible.

Step 3-A. The check-and-breakdown step should be applied recursively (repeatedly, for you non-programmers) if possible.

Step 3-B. A database bottleneck can be broken down into:

  • an SQL query (or queries). Frequently seen SQL queries can be a clue that that SQL is a bottleneck.

  • SQL parse or SQL execution. An SQL parse problem can be fixed by changing the application to use query binding.

  • a database capability problem. If almost all SQL queries are affected, then database capability is identified as the bottleneck. This is further broken down into: A) database CPU bottleneck, B) database IO bottleneck, C) other database problems.

Step 3-C. For bottlenecks in application script, identify which part of the entire application is the bigger bottleneck by using sampling or profiling techniques (choose one of the two approaches). Execute repeated load testing in parallel with the execution sampling/profiling. If using manual profiling, examine the timing results and break down the larger part further.

Sampling techniques essentially find out what part of the application logic is running at a given moment. The part that shows up most often in the stack traces is identified as a bottleneck.
Profiling essentially measures the time taken by some part of the application logic. If we find, for example, that one page is slow, then we apply profiling functions (changing the application) to measure the time spent on accesses to external systems and the time taken by parts of our application logic. If the result is that, of application logic parts A, B, and C, part C takes 90% of the time, then we profile the parts of C, breaking it into C1, C2, C3. Repeatedly checking the results and profiling the largest part further will identify the smallest part of the application responsible for the largest share of execution time.
For applications written in C/C++, execution sampling can be done using GDB (the GNU debugger). The application should be compiled with debug information enabled. Do 3x-5x consecutive thread dumps, with 2 seconds between dumps, using the thread apply all backtrace command, as sketched below. In a single-threaded multi-process configuration we have to execute the backtrace command on each and every process.
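A non-interactive sketch of those consecutive dumps (the process name myapp is illustrative):
for i in 1 2 3; do
  gdb -batch -ex 'thread apply all backtrace' -p $(pidof myapp)
  sleep 2
done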
For applications written in Java, execution sampling can be done using the thread dump functionality of the jvisualvm tool. Automated Java profiling tools could also be used, but because of the performance degradation associated with automated profiling techniques, I seldom use them.
For applications written in PHP, the approach from one of my previous blog posts can be used to take stack dumps of running PHP pages. Profiling can be done by modifying the application to call profiling functions. This degrades performance, but selective implementation will keep the performance impact negligible.
The reader is encouraged to find out what kind of method works for other languages.

Performance Improvement Techniques

After the specific bottlenecks are found, we need to try to improve the situation by various means. Because of the many factors involved, not all bottlenecks can be removed.

Step 4. For each positively identified bottleneck, try to improve performance by some means

For application P, I improved the application by adding indexes to the tables being used, after examining which columns appear in the bottleneck queries. Note that adding unnecessary indexes will slow down data manipulation (inserts, deletes, updates).
For application C, the improvement was rewriting the PHP logic in the slow pages found during testing. Because we use Yii's ActiveRecord classes, embedding a few with() calls into some find() calls in the controller results in automatic joins that reduce the number of queries sent to the database. To some other find() calls we added more condition criteria so that only the rows we need are retrieved.
In another part of the application, a flag column was added to one of the tables. This flag is a substitute for querying another table. Call it performance-improving denormalization: we add another place to store data that is normally obtained by aggregating another table, change each and every activity that modifies the source table so that it also updates the flag, and replace the aggregation query on the source table with a simple query on the table containing the flag. The logic is that we trade the reduced time of the aggregation query against the increased transaction time on the source table.
For application S, it is much more complex. Examining the S2 profiling results, it was found that excessive SAP RFC calls were being made for data that changes slowly. The step taken was to add a caching layer for such RFC calls; Yii's cache component was found to be a good solution for this problem.
The S3 bottleneck was shown to exist, and the thread dumps showed that SLA (service level agreement) calculation is done by Joget even when we only need general workflow activity information. The solution we took was modifying the Joget workflow service to accept an optional parameter called nosla that skips the SLA calculation; the PHP application was also modified to pass nosla=1 on the relevant calls to Joget workflow.
While the S4 bottleneck hypothesis was found to be false, the S5 hypothesis was shown to be strong.
Analyzing the query bottleneck from the S5 checks, it seems that MySQL 5.1 has trouble optimizing an IN query that actually has only one element in the IN clause. The solution was replacing MySQL 5.1 with MariaDB 5.3.

These explanations are summarised as follows:

Step 4-A. Consider upgrading your database server if the bottleneck is shown to be fixed in a newer version.

Step 4-B. Consider using caches. Data caches can reduce calls to external systems for requests that don't manipulate data. PHP opcode caches speed up PHP parsing.

Step 4-C. Consider changing your data schema, denormalize data if necessary.  

Step 4-D. Consider changing the application logic implementation to access less data. Fewer rows has more impact than fewer columns. Avoid unnecessarily querying BLOB columns when we only need data from other columns.

Step 4-E. Consider creating database indexes to enhance select query performance

For completeness, I include other ideas that I have applied when optimizing other applications not discussed in this post:

Step 4-F. Consider implementing part of the application logic in another, faster language, such as Java or PL/SQL, when a CPU bottleneck in the web app is an issue. But try to improve the algorithm first before porting logic to another language.

Step 4-G. Consider increasing the memory pool configuration in the database server if the buffer hit ratio is small (for example, less than 95%). A sketch of checking this on MySQL follows.
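On MySQL/InnoDB, for instance, the buffer pool hit ratio can be estimated from two status counters:
mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'"
# hit ratio = 1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)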

Step 4-H. Consider increasing the number of parallel processes in the web server if there is a large number of concurrent users and memory use is below 70%. This should only be done if we are certain that no database bottleneck exists and there is no CPU bottleneck in the web server. Extrapolate memory usage correctly (take the maximum memory used by each php-cgi or httpd process and multiply by the maximum number of processes expected). Remember that the OS automatically uses free memory as disk cache, and too little free memory will degrade IO performance.

And also some ideas that haven't been tested:

Step 4-I. Consider adding more disks and enabling striping with large blocks if there is an I/O bottleneck. Consider adding more CPUs if the CPU bottleneck cannot be reduced in other ways.


Summary

In this blog post I described the steps I took to identify performance bottlenecks, and the steps I took to improve performance. I also tried to formulate a troubleshooting framework as a 4-step process:

Step 1. Enumerate all systems involved; for each system, create a hypothesis that it is the one causing the performance bottleneck

Step 2. Enlist more specific performance bottlenecks, and how we could test or check for each.

Step 3. Check each bottleneck hypothesis. If the check confirms (strengthens) the hypothesis, break the bottleneck down into smaller hypotheses about smaller parts of the system. Check each smaller hypothesis; if it is found to be true, break it down further into smaller parts if possible.

Step 4. For each positively identified bottleneck, try to improve performance by some means

The practical implementation of each step is explained for each of the three example applications chosen for this blog post. I hope this can serve as a guideline for others (and myself) when the need to troubleshoot application performance arises.

HA Storage Cost Comparison : NetApp MetroCluster vs DRBD

One of the cost-saving techniques used by Google is that they use commodity SATA drives in their GFS clusters, as opposed to a specialized storage cluster. That makes me wonder just how much saving could be realized by using SATA drives.
For our comparison case, let's build a highly available MySQL server. This MySQL is not clustered, but will fail over to the second host upon the first host's failure. Our core solution primarily depends on two servers: one runs mysqld, and the other will run mysqld upon failure. Both mysqld instances are configured to store data on the same storage, so only one mysqld server is allowed to run at a time. The storage solution used is the NetApp solution described in http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1001783, which supplies storage redundancy and availability zone redundancy. NetApp's MetroCluster does synchronous mirroring between two separate NetApp disk shelves. For comparison we will use DRBD synchronous storage replication on commodity SATA drives (http://dev.mysql.com/doc/refman/5.5/en/ha-drbd.html) to similar effect. The Heartbeat open source cluster software will be our means of failover in the DRBD solution.

Requirement 

A highly available MySQL server with 5-6 TB of storage.
Modes of failure that need to be supported :

HA Scenario :

  • Host failure -> hardware problem on one of the servers
  • Storage failure -> hardware problem on one of the database storage drives
  • Availability zone failure -> an entire zone disabled by some problem, such as total network loss or total power disruption. We assume we have two availability zones, maybe one in one building and the other in the building across the street.

Note: network failure is not discussed here because network redundancy is a blog topic for another day :). And disaster recovery is not a requirement, because it cannot be fulfilled using MetroCluster or DRBD; MySQL master-slave replication is a better solution for disaster recovery.

Expected failover time is one minute or less. In practice, achieving this depends on the InnoDB log size configured in the MySQL server: the larger the log, the longer the failover time.

Solution Design

Virtual machines or physical servers could be used interchangeably. The performance implications of choosing physical or virtual might be significant, but let's assume that is already decided by the IT organization's policy. VM servers are notably easier to configure for HA, because we could enable the Fault Tolerance feature on the VMs; but in the open source solution we deliberately choose not to, because the Heartbeat cluster software installed on the two machines already provides fault tolerance in our solution.
This solution requires two servers. If we must support availability zone failure, the servers will be deployed across two availability zones, with the consequence of some performance impact. If we are not required to support such failure, deploying the two servers in the same room, connected to the same switch, will give us better performance.

A. NetApp solution

NetApp disk shelves, each with one controller and configured in a MetroCluster, are enough to provide storage failure and site failure protection. There is still a local SPOF in each controller, but let's ignore it for now because NetApp stuff is pricey :). The MetroCluster configuration ensures that each disk is mirrored in the other availability zone.
A host failure will prompt VMware to restart the VM, or if that is not possible, to start the MySQL VM on the other host. Storage mirroring, and storage failover in the event of failure, are taken care of by NetApp MetroCluster. An availability zone failure requires manual intervention: a forced takeover at the surviving VM site (see http://communities.vmware.com/message/1394567).
Product required :
Two disk shelves with 5.4 TB total capacity, est. price $32,002.00 each (ref: http://storagemojo.com/storagemojos-pricing-guide/netapp-price-guide/)
(DSK SHLF,12x450GB,10K,6Gb SAS,IOM6,-C,R5).
Two Brocade 200E Distance Bundle @ $750
Two FAS 3170 FAS3170A-CL-BASE-R5 @ $51,224
Note: for the Brocade I don't know which one to buy, so I chose the lower-priced one.
Total: $167,952.00 (not included: VMware licenses and VM server nodes)

Storage-specific price: price per redundant GB is 2 x $32,002 / 5400 GB ≈ $11.8/GB

B. DRBD solution

Our DRBD solution uses commodity SATA drives in place of NetApp disk shelves. The server's price (an HP ProLiant DL180) is counted together with the storage price because of the server's 4-drive-bay limitation. If we need more storage, we need to buy another couple of servers and share the additional storage via NFS or iSCSI (and also mirror it via DRBD).
Product required : (ref: HP Proliant 100 product line)
2x HP Proliant DL 180 G6  with 4 GB memory and Xeon E5606 quad core: $1,824.00
8 x  HP 2TB 3G SATA 7.2K Hot Plug 3.5 MDL Hard Drive - 1-year warranty  @ $ 509.00
(4 drive in each server)
2 TB configured as root filesystem
6 TB configured as mysql data, giving 6TB mirrored storage
Total : $18,664.00
Upper-bound storage-specific cost, price per redundant GB: $18,664 / 6000 GB ≈ $3.11/GB

The DRBD solution typically fails over in under 1 minute; this is mostly the time taken to start MySQL, because the Heartbeat system responds in under 1 second (see the note above regarding InnoDB log size impact). When Heartbeat detects that a filesystem failure occurred on the primary disk, it will initiate failover. The same happens when it detects that the primary host no longer has a heartbeat, which could be caused by host failure or site failure. Heartbeat will start the MySQL server on the secondary node and can also take over the primary's IP if necessary. A minimal DRBD resource definition for this setup is sketched below the illustration.
HA MySQL configuration with DRBD
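For reference, such a resource definition might look like the following sketch (DRBD 8.3-style syntax; the hostnames, backing disk, and addresses are illustrative):
resource mysql {
  protocol C;                  # synchronous replication: no acknowledged write is lost on failover
  on db-a {
    device    /dev/drbd0;
    disk      /dev/sdb1;       # backing partition holding the MySQL data
    address   10.0.0.1:7789;
    meta-disk internal;
  }
  on db-b {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7789;
    meta-disk internal;
  }
}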



Short Summary

Comparing the upper-bound storage cost of our DRBD-with-SATA solution against the NetApp solution (disk shelves only), we have $3.11/GB versus $11.8/GB. Tremendous cost savings indeed when choosing SATA, even if we take the whole system cost of the DRBD solution into account.
Comparing total system cost for similar capacity (6 TB vs 5.4 TB redundant storage), we have $18,664.00 vs $167,952.00 .. a difference of almost 1:9 between our DRBD solution and the NetApp solution. This is primarily caused by the cost of the synchronization support devices.

What does the large price difference buy us with NetApp? Twelve-disk performance versus four-disk performance, for one. But that many disks could give us no performance gain if the conditions are not right (such as when striping and data spread are not good). Vendor support, for another. We could buy Linbit's support for DRBD solutions, but I haven't found a price list yet. From the point of view of high availability features, though, the DRBD solution seems adequate, giving us failover times under 1 minute.

Other Scenarios

This configuration could be modified to support a disaster recovery scenario by adding another MySQL server, with the same redundant storage, located at a remote site. This MySQL server is configured as a slave of the original server, and by giving it the same capability as the first, it will be ready to perform with full performance and availability when the need arises.

More IO performance could be gained by configuring the DRBD solution with smaller-capacity (but faster) disks and more servers, sharing storage using iSCSI or NFS (maybe following the HA NFS configuration in my previous post). Thus we trade cost for performance, while keeping the system cost below NetApp's.

Another idea for a next article is implementing HA on an Oracle database server .. while looking for solutions I found these reference docs: http://eval.symantec.com/mktginfo/enterprise/white_papers/b-ha_for_oracle_db_vcs_hadr_WP_14216725.en-us.pdf and http://www.vmware.com/files/pdf/partners/oracle/Oracle_Databases_on_VMware_-_High_Availability_Guidelines.pdf. It seems Symantec has two products for lower-cost Oracle HA: Symantec ApplicationHA and Veritas Cluster Server.

Friday, August 24, 2012

Fault (Tolerance) Ideas

Murphy's Law says that if anything can go wrong, it will (ref: Captain Edward A. Murphy http://www.murphys-laws.com/murphy/murphy-true.html). In our world of computing this includes:

  • our network switches and wiring; they can fail, or worse, flip bits in the data sent through the network
  • TCP checksums; errors the checksum detects just get retransmitted, but a double bit flip can corrupt a packet in a way the checksum still passes, so the TCP layer never knows it is corrupted
  • HDD wiring; installing the wrong cable, or wrongly installing the correct cable. Swap a good Ultra DMA ATA cable for a bad one (it will still be detected as Ultra DMA) and we get a large Ultra DMA CRC error rate. And we also have CRC-undetectable error rates of something like 5x10^-13 (illogically taken from http://doc.utwente.nl/64267/1/schiphorst.pdf); this corrupt data (on average one bit per two terabytes) will get stored to our disks
  • HDD failure; commodity disks fail in 2-3 years, and might fail sooner. Our industry-standard RAID 5 no longer suffices for large disk deployments; better use RAID 6 or RAID 1
So the ideas for a large-scale fault-tolerant system will include:
  • end-to-end data corruption detection. Put it at the application/database level and we get pretty good coverage of the things that can be detected when they go wrong. Two different CRC algorithms will suffice.
  • redundancy of at least 3x for each data block, or object. Suddenly RAID 1 no longer suffices (it is only 2x redundancy).
  • auto-replication or self-healing. When an HDD gets replaced with a new one, data ought to be re-mirrored to the new HDD automatically.
  • multihomed systems. That means at least two network interfaces on each host, each connected to a different network switch, providing network redundancy.
  • monitoring. The drawback of automatic healing or automatic failover is that the human operator doesn't know a failure is happening. Even if nothing must be done (such as when automatic re-mirroring is in progress), it can be an indication that something is wrong (like a bad switch that corrupts TCP packets while keeping the same checksum).

Open Source Cloud Computing on the rise

VMware Player shows us that virtualization has many benefits even on a dual-processor laptop. Amazon Elastic Compute Cloud (EC2) set the cloud computing standard and showed us that cloud computing is feasible and can be cheap (after all, they only make us pay for hourly usage). So here in 2012 we find that there are many open source cloud computing solutions out there..

Core Service

Virtualization is provided by several commonly known hypervisors :

  • VMWare vSphere. This one is not open source at all.
  • Xen. Xen is a mature hypervisor that was fully open-sourced by Citrix in 2009. There is an open source version (Xen Cloud Platform), a free version (XenServer Free), and a paid version.
  • Kernel-based Virtual Machine (KVM). KVM is open source virtualization software that is tightly integrated with the Linux kernel as the host OS.
On top of these core hypervisors, GUI and management layers are built, resulting in cloud computing platforms.

Cloud computing platform

Let's see how many open source cloud computing platforms I can find as of today:

OpenNebula

OpenNebula was initially released in March 2008 by the OpenNebula community. OpenNebula has an appliance marketplace (similar to VMware's) where users can share their VMs for free or for a fee (though it seems to have no VMs for sale right now). The only HA feature is hooks that can be set to resubmit (restart) a VM if it is found in the ERROR state.

OpenNebula users :
  • Telefonica Germany
  • China Telecom
  • Akamai
  • IBM Global Business Services
  • SAP
  • RIM


Eucalyptus 

Initially released in 2008, this is one mature cloud computing platform. Canonical formerly used Eucalyptus for its Ubuntu Enterprise Cloud. Previous releases were differentiated into enterprise and open source editions, but the latest (3.1) version merges both into one open source release. High availability is provided by storage controller HA, Walrus storage HA, and cloud controller HA; each controller is implemented as a pair of services, with automatic removal of a failed service from operation.
Eucalyptus is being used by:
  • NTT Data (Japan IT company)
  • NetApp
  • Fujitsu 
  • NASA
  • Sony
  • Trend Micro

CloudStack

CloudStack was formerly known as Cloud.com (May 2010), which was bought by Citrix (July 2011). The entire software was contributed to Apache in April 2012. The high availability features are VM monitoring, fencing (disabling misbehaving VMs), and restarting failed VMs. The web UI uses server-side Java and has no HA out of the box (but can be set up for HA with some knowledge). There is a lot of integration with Citrix XenServer and Citrix NetScaler hardware/software.
Among CloudStack clients are :
  • Tata Communications (India giant company)
  • KT (Korea landline operator)
  • Godaddy.com (Popular Web hosting provider)
  • Nokia Research Center

OpenStack 

OpenStack is a newer cloud computing platform (started by NASA & Rackspace in 2010); it has many supporters and a quickly growing feature list. 'Quickly growing' has an unstable connotation, because the newer features haven't endured the test of time. A full HA (high availability) feature set is not released yet (as of 25 August 2012); such features are targeted for the Folsom release in September 2012. Among the expected new features are integration with Corosync and Pacemaker to ensure HA of OpenStack services, the new Cinder block storage system, and the Quantum virtual network service. The existing HA feature is the Swift object storage service, which replicates objects automatically. The web UI is based on Python's Django framework.
OpenStack is used by :
  • Canonical (yes, creator of Ubuntu)
  • Intel (the giant that almost monopolizes processor manufacturing)
  • Deutsche Telekom (service provider)
  • Rackspace hosting
  • AT&T Communications
  • NASA

High Availability using Citrix (Apache) CloudStack

CloudStack dashboard
In April 2012, Apache accepted CloudStack from Citrix as one of the open source projects in the Apache Incubator.
CloudStack is a software platform that pools computing resources into an IaaS (infrastructure as a service) cloud. Think of an Amazon EC2 that can be installed in our own private data center.
Service offerings - similar to Amazon EC2
With CloudStack, we can offer cloud services to our users.. They can launch servers as they wish, as long as they stay within their quota and the system's capacity.

VM template choices

The main component of CloudStack is actually a Java-based web application that provisions against hypervisor software: XenServer, VMware, or KVM. A Cluster in CloudStack terminology is a set of hosts controlled by a uniform hypervisor.

What you get from CloudStack includes:
  • Ajax-enabled, web-based central management for VMs based on Xen, VMware vSphere, or KVM
  • a repository for storing VM template images, which can be integrated with OpenStack Swift
  • VM offering configuration: users launch a VM by choosing a package with a preconfigured memory size and number of CPUs

  • integration with Citrix NetScaler for elastic load balancing and elastic IP
  • ordinary load balancing with a Virtual Router or F5 BigIP
  • firewall and NAT configuration for the Virtual Router or Juniper SRX
  • high availability for VMs running on CloudStack (if there is a problem with one host, CloudStack boots the VM on a healthy host, assuming the VM uses a virtual hard disk stored on shared storage)
  • VLAN provisioning
General CloudStack deployment concept
CloudStack is deployed with two kinds of storage, primary and secondary. Secondary storage holds VM templates and ISOs, the base images for creating virtual machines. Primary storage holds the VMs' hard disk data. CloudStack itself stores its information in a MySQL database. It is recommended to provide three network paths on every host: the public internet network, the private (LAN) network, and a dedicated management and storage network.
One possible CloudStack deployment configuration


An important aspect of high availability is which failure modes are supported. In the system above there are several failure modes:
  • Host failure: failure of or damage to a host that is part of a Cluster
  • Management server failure: failure of or damage to the CloudStack server
  • Primary storage failure: failure of the main data storage (virtual hard disks)
  • Secondary storage failure: failure of the template/ISO storage
  • Management server MySQL database failure: failure of the CloudStack database
Host-failure failover mechanism in CloudStack

The failure mode handled by CloudStack itself is host failure. Secondary storage failure is handled by the OpenStack Swift infrastructure, which stores images redundantly on its own storage cluster. Management server failure can be handled by running CloudStack on two load-balanced hosts. Management server MySQL database failure is handled by putting DRBD under the MySQL storage, or by configuring MySQL in a master-slave setup.

Combined with a highly available storage system (CloudStack does not provide primary storage itself, only an interface to iSCSI- or NFS-based storage systems), CloudStack can be a fairly robust platform for system deployments.


Thursday, August 23, 2012

(Inexpensive) Highly Available Storage Systems

The internet has pampered our users by giving them 99.9% uptime. Now every corporation needs similar availability, up to 99.9%. One part of a high availability solution is the storage. In the good old proven corner we have SAN and NAS storage solutions. They are not always highly available (you must ensure the system has more than one controller, in case one controller breaks; RAID 5 is nowhere near enough these days, so choose RAID 6; and don't forget network redundancy), but they are almost always expensive. In an era where IT directors are forced to think 'cost transformation', we always welcome cost-saving alternatives.
New hardware developments have influenced our infrastructure options; let's write the factors down:
  • abundance of cheap gigabit ethernet cards & routers -> this allows us to forget expensive fiber connectivity, and to leverage newly developed distributed systems
  • cheap large SATA drives -> large commodity storage for the masses
Developments in infrastructure support software also give us more options:
  • open source iSCSI target software -> converting our SATA drives into shared storage
  • CLVM -> clustered LVM allows distributed volume management
  • OCFS2 -> Oracle Cluster File System 2, an open source cluster file system developed by Oracle
  • Ceph RADOS massive cluster storage technology
  • Amazon Elastic Block Storage -> we need to run our servers in AWS to use these
  • DRBD -> Distributed Replicated Block Device driver; it allows us to choose between synchronous and asynchronous mirroring between disks attached to different hosts
A question on Server Fault asks about mature high-availability storage systems; summarizing the answers:
  • The surprising fact is that the Oracle-developed OCFS2 is not the tool of choice, because it needs downtime to add capacity to the cluster filesystem
  • Ceph RADOS is a promising newcomer in the arena, giving Google File System-like characteristics with a standard block interface (RBD); old enterprises usually hesitate to use such new technology
  • The ultimate choice of sysadmins is HA NFS (highly available NFS)

HA NFS

The requirements for highly available NFS are as follows:
  • two hosts (PC or server)
  • two sets of similarly sized storage (identical SATA II disks preferred)
  • cluster resource manager: Pacemaker
  • cluster messaging: Heartbeat (alternative: Corosync)
  • logical volume manager: LVM2
  • NFS daemon
  • DRBD (Distributed Replicated Block Device)
Distributed Replicated Block Device ( http://www.drbd.org/ )
Installation instructions are available at the references below; in short (a configuration sketch follows this list):
  1. install DRBD devices (drbd0) on top of physical disks (ex: sda1): fill in /etc/drbd.d/nfs. Do the same on both hosts (servers)
  2. configure LVM to ignore the physical disk used (ex: sda1), configure LVM to read volumes in DRBD (ex: drbd0), and disable the LVM cache (fill in /etc/lvm/lvm.conf)
  3. create the LVM physical volume (pvcreate), LVM volume group (vgcreate), and LVM logical volume (lvcreate); see your favorite LVM tutorial for details
  4. install & configure Heartbeat
  5. install & configure Pacemaker. Pacemaker must be configured to have:
    1. a drbd resource (ocf:linbit:drbd) that automatically sets the DRBD master/slave mode according to the situation at hand
    2. an nfs daemon resource (lsb:nfs or lsb:nfs-kernel-server)
    3. lvm resources (ocf:heartbeat:LVM, ocf:heartbeat:Filesystem)
    4. an nfs exports resource (ocf:heartbeat:exportfs)
    5. a floating IP address resource (ocf:heartbeat:IPaddr2)
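A minimal configuration sketch of steps 1 and 5; the hostnames, IP addresses, and backing device below are assumptions, adjust them to your environment:

# /etc/drbd.d/nfs.res
resource nfs {
  protocol C;                      # synchronous replication
  on node1 {
    device    /dev/drbd0;
    disk      /dev/sda1;           # backing physical partition
    address   192.168.10.1:7788;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sda1;
    address   192.168.10.2:7788;
    meta-disk internal;
  }
}

# Pacemaker resources, entered via the crm shell; the LVM, Filesystem,
# nfs daemon, and exportfs primitives follow the same pattern
primitive p_drbd_nfs ocf:linbit:drbd params drbd_resource="nfs" \
  op monitor interval="15s"
ms ms_drbd_nfs p_drbd_nfs meta master-max="1" clone-max="2" notify="true"
primitive p_ip_nfs ocf:heartbeat:IPaddr2 params ip="192.168.10.100" cidr_netmask="24"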
Illustration of a two-host HA NFS system
Automatic failover can be activated to give seamless NFS operation during failover. The advantages of this HA NFS configuration are:
- clients use the old, proven NFS interface
- realtime synchronous replication ensures no data loss

The disadvantages of the shown configuration are:
- no horizontal scalability
- the standby capacity of the second host is not leveraged

Linbit (www.linbit.com) provides enterprise support for DRBD.

Ceph RADOS

Ceph is a distributed object storage system. Ceph (http://ceph.com/docs/master/) has these features (a short RBD usage sketch follows the list):
  • massive clustering with thousands of Object Storage Devices (OSDs); Ceph can also run with a minimum of 2 OSDs
  • automated data replication with per-pool replication settings (ex: metadata: 3x replication, data: 2x replication)
  • data striping to improve performance across the cluster
  • a POSIX filesystem client (CephFS), an OpenStack Swift-compatible interface, and even a REST interface
  • a block device interface (RBD) -> suitable for virtualization, with OpenStack cloud support
  • horizontal scalability: add more OSDs and/or disks for more storage or performance
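For a taste of the block interface, a short RBD sketch; the image name and size are assumptions, and this presumes a running cluster plus the rbd kernel module:

rbd create myvolume --size 10240     # create a 10 GB image in the default pool
rbd map myvolume                     # exposes the image as /dev/rbd0
mkfs.ext4 /dev/rbd0                  # then use it like any local disk
mount /dev/rbd0 /mnt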
Illustration of a Ceph RADOS cluster
The disadvantages of a Ceph RADOS cluster are:
- new technology; it needs training and hands-on experience to operate
- stability is not yet industry-proven, though large deployments have already been seen
Inktank (http://www.inktank.com/) and Hastexo GmbH (www.hastexo.com) provide enterprise support for Ceph cluster deployments.

Conclusion

New, inexpensive storage system technologies now exist that can be leveraged to provide high-availability storage and/or performance for our legacy applications that are still cluster-unaware.

Monday, August 13, 2012

Popup Text area using jQuery

In times when we have too little screen real estate (primarily because there is too much information on the screen), we settled on using a popup window containing a textarea for comment input. Just implemented such a simple popup-textarea functionality using jquery-popbox - a small wonder. OSS and sharing and all. A minimal sketch of the idea is shown below.
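For illustration, a plain-jQuery sketch of the same pattern; the markup and class names here are assumptions (not jquery-popbox's actual API), and jQuery itself must already be loaded. jquery-popbox wraps this idea in a nicer, reusable plugin.

<a href="#" class="comment-link">Add comment</a>
<div class="comment-popup" style="display:none">
  <textarea rows="4" cols="40"></textarea>
</div>

<script>
// toggle the hidden popup containing the textarea on each click
$(function () {
  $('.comment-link').click(function (e) {
    e.preventDefault();
    $(this).next('.comment-popup').toggle();
  });
});
</script>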



Thursday, August 2, 2012

Anti pattern : Ignoring Exceptions

Did you ever find that your app didn't do what it was expected to do, with no clue whatsoever about the cause? You might have stumbled upon the 'Ignored Exceptions' antipattern. Or worse: a special case of this antipattern is the 'whatever passes the acceptance test' mindset that causes the programmer to code this way.
The most primitive example of this antipattern is this one-liner from Visual Basic:
   ON ERROR RESUME NEXT

In the Java programming language, an example of the Ignored Exceptions antipattern:

   try {
      // ... some code ...
      // ... more code ...
   }
   catch (Exception ex) {
   }

Note that the catch block is empty: it silently ignores any errors.
Yes, there are cases where this sort of code is hard to avoid, but in most cases better alternatives exist, such as:
- using Log4j to log the error (see the sketch below)
- converting the exception to a message that can be understood by the user
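For example, a catch block that logs the error through Log4j and converts it into a message the user can act on; the class and messages here are hypothetical, for illustration only:

import org.apache.log4j.Logger;

public class PortConfig {
    private static final Logger log = Logger.getLogger(PortConfig.class);

    public int parsePort(String value) {
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException ex) {
            // keep the full stack trace in the log for troubleshooting
            log.error("Invalid port number in configuration: " + value, ex);
            // rethrow with a message the user can understand
            throw new IllegalArgumentException("Port must be a number, got: " + value, ex);
        }
    }
}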
At the minimum, the error should be written to the console, so the error can be troubleshot:
   catch (Exception ex) {
      ex.printStackTrace();
   }


Wednesday, May 30, 2012

Migrating Joget 3 Instance

On some occasions we need to move a joget 3 workflow service from one server to another. We need to migrate two parts: the server application files and the database. Our aim is to migrate the joget server with data intact.

Identify running instance

First we need to determine where joget is installed. In an ideal situation this information is stored in a CMDB. But since we have yet to see a Joget - Cloud Foundry package, and we haven't deployed a CMDB enterprise-wide, this step is necessary.
Look for java processes on the source machine:
ps auxw | grep java
On my machine we got this output:
501      26733  0.0 22.2 759176 457552 ?       Sl   Apr26  18:20 /home/admin-dev/p/jdk1.6.0_21/bin/java -XX:MaxPermSize=128m -Xmx512M -Dwflow.home=./wflow/ -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/home/admin-dev/p/joget-linux-3.0.0/apache-tomcat-6.0.18/conf/logging.properties -Djava.endorsed.dirs=/home/admin-dev/p/joget-linux-3.0.0/apache-tomcat-6.0.18/endorsed -classpath :/home/admin-dev/p/joget-linux-3.0.0/apache-tomcat-6.0.18/bin/bootstrap.jar -Dcatalina.base=/home/admin-dev/p/joget-linux-3.0.0/apache-tomcat-6.0.18 -Dcatalina.home=/home/admin-dev/p/joget-linux-3.0.0/apache-tomcat-6.0.18 -Djava.io.tmpdir=/home/admin-dev/p/joget-linux-3.0.0/apache-tomcat-6.0.18/temp org.apache.catalina.startup.Bootstrap start

I really don't know why the user is shown as 501 here; it should be the user name that the java process runs as (ps falls back to the numeric UID when it cannot resolve the name). Let's ignore that for now.
From the parameters we note that joget is installed in /home/admin-dev/p/joget-linux-3.0.0

Identify database connection

Let's check where the Joget database is located:
[admin-dev@ead-dev wflow]$ cd /home/admin-dev/p/joget-linux-3.0.0/
[admin-dev@ead-dev joget-linux-3.0.0]$ cd wflow
[admin-dev@ead-dev wflow]$ ls
app_datasource-default.properties  app_datasource.properties  app_plugins
app_datasource-esshr.properties    app_forms                  app_xpdlImages
[admin-dev@ead-dev wflow]$

Let's see the profile 'pointer' file:
[admin-dev@ead-dev wflow]$ cat app_datasource.properties
#
#Mon May 25 15:33:51 SGT 2009
currentProfile=esshr
 
And the real database config is:
[admin-dev@ead-dev wflow]$ cat app_datasource-esshr.properties
workflowDriver=com.mysql.jdbc.Driver
workflowUrl=jdbc\:mysql\://10.65.10.150\:3306/jwdb?characterEncoding\=UTF-8
workflowUser=joget
profileName=
workflowPassword=
Now we're ready to start the migration.

Shutdown server


We want to minimize data inconsistency, so let's do a cold migration. Kill the joget server.

[admin-dev@ead-dev joget-linux-3.0.0]$ ./tomcat6.sh stop
Using CATALINA_BASE:   /home/admin-dev/p/joget-linux-3.0.0/apache-tomcat-6.0.18
Using CATALINA_HOME:   /home/admin-dev/p/joget-linux-3.0.0/apache-tomcat-6.0.18
Using CATALINA_TMPDIR: /home/admin-dev/p/joget-linux-3.0.0/apache-tomcat-6.0.18/temp
Using JRE_HOME:       /home/admin-dev/p/jdk1.6.0_21
[admin-dev@ead-dev joget-linux-3.0.0]$
The shell script runs but says nothing about whether it succeeded or not. Review the java process:
[admin-dev@ead-dev apache-tomcat-6.0.18]$ ps auxw | grep java
501      26238  0.0  0.0  61204   740 pts/1    R+   09:09   0:00 grep java
501      26733  0.0 22.4 758848 461972 ?       Sl   Apr26  18:30 /home/admin-dev/p/jdk1.6.0_21/bin/java -XX:MaxPermSize=128m -Xmx512M -Dwflow.home=./wflow/ -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/home/admin-dev/p/joget-linux-3.0.0/apache-tomcat-6.0.18/conf/logging.properties -Djava.endorsed.dirs=/home/admin-dev/p/joget-linux-3.0.0/apache-tomcat-6.0.18/endorsed -classpath :/home/admin-dev/p/joget-linux-3.0.0/apache-tomcat-6.0.18/bin/bootstrap.jar -Dcatalina.base=/home/admin-dev/p/joget-linux-3.0.0/apache-tomcat-6.0.18 -Dcatalina.home=/home/admin-dev/p/joget-linux-3.0.0/apache-tomcat-6.0.18 -Djava.io.tmpdir=/home/admin-dev/p/joget-linux-3.0.0/apache-tomcat-6.0.18/temp org.apache.catalina.startup.Bootstrap start

It's still running, but let's check netstat too:
[admin-dev@ead-dev apache-tomcat-6.0.18]$ netstat -anp | grep 8080
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
[admin-dev@ead-dev apache-tomcat-6.0.18]$

Port 8080 is closed, so it's stopped, but the java process is stuck somehow. Let's fix that:
[admin-dev@ead-dev apache-tomcat-6.0.18]$ kill 26733
[admin-dev@ead-dev apache-tomcat-6.0.18]$ netstat -anp | grep 8080
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
[admin-dev@ead-dev apache-tomcat-6.0.18]$ ps auxw | grep java
501      26252  0.0  0.0  61204   744 pts/1    R+   09:11   0:00 grep java

Copying joget server files


Now let's do an scp to the target:

[admin-dev@ead-dev joget-linux-3.0.0]$ cd ..
[admin-dev@ead-dev p]$ pwd
/home/admin-dev/p
[admin-dev@ead-dev p]$ scp -r joget-linux-3.0.0/ admin-sppd@10.65.10.156:/home/admin-sppd/p
The authenticity of host '10.65.10.156 (10.65.10.156)' can't be established.
RSA key fingerprint is e2:a8:38:8a:5f:30:8a:77:24:57:b2:9a:9d:28:ef:6d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.65.10.156' (RSA) to the list of known hosts.
admin-sppd@10.65.10.156's password:
CHANGES.txt                                   100%   23KB  23.4KB/s   00:00
tmlog1310.log                                 100%  169KB 168.8KB/s   00:00
README.txt                                    100% 1716     1.7KB/s   00:00
tm.out                                        100%  251KB 250.7KB/s   00:00
...

Database migration

Dump the database from any convenient server:

[admin-dev@ead-dev p]$ mysqldump -h 10.65.10.150 -u joget -p jwdb > jogetjwdb.sql
Enter password:
[admin-dev@ead-dev p]$ ls -l
total 86120
drwxrwxr-x 9 admin-dev admin-dev     4096 Feb 15 18:16 grinder-3.4
drwxr-xr-x 9 admin-dev admin-dev     4096 Feb 13 09:18 jdk1.6.0_21
-rw-rw-r-- 1 admin-dev admin-dev 26635581 May 31 09:17 jogetjwdb.sql
...



Create a new database on the target system (in this case, the same host as the original):
-bash-3.2$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1421944
Server version: 5.0.77-log Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show databases;
...
mysql> create database jwdbprod;
Query OK, 1 row affected (0.29 sec)

mysql> grant all on jwdbprod.* to jogetprod@'%' identified by 'passwordhere';
Query OK, 0 rows affected (0.09 sec)

mysql>

And then we import the database from the dump we made earlier:

[admin-dev@ead-dev p]$ mysql -u jogetprod -h 10.65.10.150 -p jwdbprod
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1422462
Server version: 5.0.77-log Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> source ./jogetjwdb.sql; 
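Alternatively, the dump can be fed straight from the shell, avoiding the interactive session:

mysql -u jogetprod -h 10.65.10.150 -p jwdbprod < jogetjwdb.sql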

Reconfigure joget

[admin-sppd@sppd-online1 ~]$ cd p/joget-linux-3.0.0/wflow/
[admin-sppd@sppd-online1 wflow]$ vi app_
app_datasource-default.properties  app_forms/
app_datasource-esshr.properties    app_plugins/
app_datasource.properties          app_xpdlImages/
[admin-sppd@sppd-online1 wflow]$ vi app_datasource-esshr.properties
[admin-sppd@sppd-online1 wflow]$

OK, let's start it up:
[admin-sppd@sppd-online1 wflow]$ cd ..
[admin-sppd@sppd-online1 joget-linux-3.0.0]$ ls
10.2.16.13.tm0.epoch  data                             tmlog1310.log
10.2.16.19.tm7.epoch  docs                             tm.out
127.0.0.1.tm30.epoch  d:\participantLog.joget.txt      tomcat6.sh
apache-ant-1.7.1      d:\participantLog.joget.txt.lck  VERSION.txt
apache-tomcat-6.0.18  LICENSE.txt                      wflow
build.xml             README.txt
CHANGES.txt           setup.sh
[admin-sppd@sppd-online1 joget-linux-3.0.0]$ ./tomcat6.sh start
Using CATALINA_BASE:   /home/admin-sppd/p/joget-linux-3.0.0/apache-tomcat-6.0.18
Using CATALINA_HOME:   /home/admin-sppd/p/joget-linux-3.0.0/apache-tomcat-6.0.18
Using CATALINA_TMPDIR: /home/admin-sppd/p/joget-linux-3.0.0/apache-tomcat-6.0.18/temp
Using JRE_HOME:       /home/admin-dev/p/jdk1.6.0_21
No clue about the startup process, as usual..
Let's check the logs:
[admin-sppd@sppd-online1 joget-linux-3.0.0]$ cd apache-tomcat-6.0.18/
[admin-sppd@sppd-online1 apache-tomcat-6.0.18]$ cd logs
[admin-sppd@sppd-online1 logs]$ tail -f catalina.out
The last lines were:
INFO: Stopping Coyote HTTP/1.1 on http-8080
apache-tomcat-6.0.18/bin/catalina.sh: line 292: /home/admin-dev/p/jdk1.6.0_21/bin/java: No such file or directory

There is a hardwired path that we must fix (this might not exist in your installation; in my installation I had changed some shell scripts). Check where our jdk is:

[admin-sppd@sppd-online1 joget-linux-3.0.0]$ which java
~/p/jdk1.6.0_31/bin/java
Now reconfigure the path (note the different home dir and jdk version):
[admin-sppd@sppd-online1 joget-linux-3.0.0]$ vi tomcat6.sh
#!/bin/sh

export JAVA_OPTS="-XX:MaxPermSize=128m -Xmx512M -Dwflow.home=./wflow/ "
#export JAVA_HOME=/usr/java/jdk1.6.0_21
export JAVA_HOME=/home/admin-sppd/p/jdk1.6.0_31
apache-tomcat-6.0.18/bin/catalina.sh $*

And try starting again, but another problem looms:

[admin-sppd@sppd-online1 joget-linux-3.0.0]$ tail -n 100 -f apache-tomcat-6.0.18/logs/catalina.out
May 31, 2012 9:41:26 AM org.apache.catalina.startup.Catalina start
SEVERE: Catalina.start:
LifecycleException:  service.getName(): "Catalina";  Protocol handler start failed: java.net.BindException: Address already in use:8080
        at org.apache.catalina.connector.Connector.start(Connector.java:1138)
        at org.apache.catalina.core.StandardService.start(StandardService.java:531)
        at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
        at org.apache.catalina.startup.Catalina.start(Catalina.java:578)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)


Port 8080 is in use.
Which app is using it?
[admin-sppd@sppd-online1 joget-linux-3.0.0]$ netstat -anp | grep 8080
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 0.0.0.0:8080                0.0.0.0:*                   LISTEN      28482/java
[admin-sppd@sppd-online1 joget-linux-3.0.0]$ ps auxw | grep 28482
500       7345  0.0  0.0  61204   744 pts/2    R+   09:43   0:00 grep 28482
500      28482  0.0 10.4 920632 215888 ?       Sl   Feb17  42:03 /home/admin-sppd/p/jdk1.6.0_31/bin/java -Djava.util.logging.config.file=/home/admin-sppd/p/vfabric-tc-server-developer/insight-instance/conf/logging.properties -Xmx512M -Xss192K -Dinsight.base=/home/admin-sppd/p/vfabric-tc-server-developer/insight-instance/insight -Dinsight.logs=/home/admin-sppd/p/vfabric-tc-server-developer/insight-instance/logs -Djava.awt.headless=true -Dgemfire.disableShutdownHook=true -Dgemfire.ignoreMisses=true -XX:MaxPermSize=256m -Djava.util.logging.manager=com.springsource.tcserver.serviceability.logging.TcServerLogManager -Djava.endorsed.dirs=/home/admin-sppd/p/vfabric-tc-server-developer/tomcat-7.0.23.A.RELEASE/endorsed -classpath /home/admin-sppd/p/vfabric-tc-server-developer/insight-instance/bin/insight-bootstrap-tcserver-1.5.1.SR2.jar:/home/admin-sppd/p/vfabric-tc-server-developer/insight-instance/lib/aspectjweaver-1.6.11.SR1.jar:/home/admin-sppd/p/vfabric-tc-server-developer/tomcat-7.0.23.A.RELEASE/bin/bootstrap.jar:/home/admin-sppd/p/vfabric-tc-server-developer/tomcat-7.0.23.A.RELEASE/bin/tomcat-juli.jar -Dcatalina.base=/home/admin-sppd/p/vfabric-tc-server-developer/insight-instance -Dcatalina.home=/home/admin-sppd/p/vfabric-tc-server-developer/tomcat-7.0.23.A.RELEASE -Djava.io.tmpdir=/home/admin-sppd/p/vfabric-tc-server-developer/insight-instance/temp org.apache.catalina.startup.Bootstrap start

Ah, a vFabric tc Server developer instance. I don't think we have any need of it running on the production server. Let's kill it.
[admin-sppd@sppd-online1 joget-linux-3.0.0]$ kill 28482

and restart the Joget's tomcat

[admin-sppd@sppd-online1 joget-linux-3.0.0]$ ./tomcat6.sh stop
[admin-sppd@sppd-online1 joget-linux-3.0.0]$ ps auxw | grep java
500       7388  0.0  0.0  61204   744 pts/2    R+   09:47   0:00 grep java
[admin-sppd@sppd-online1 joget-linux-3.0.0]$
[admin-sppd@sppd-online1 joget-linux-3.0.0]$ ./tomcat6.sh start
Using CATALINA_BASE:   /home/admin-sppd/p/joget-linux-3.0.0/apache-tomcat-6.0.18
Using CATALINA_HOME:   /home/admin-sppd/p/joget-linux-3.0.0/apache-tomcat-6.0.18
Using CATALINA_TMPDIR: /home/admin-sppd/p/joget-linux-3.0.0/apache-tomcat-6.0.18/temp
Using JRE_HOME:       /home/admin-sppd/p/jdk1.6.0_31

Check the logs..
Oh no, yet another problem:

[admin-sppd@sppd-online1 joget-linux-3.0.0]$ tail -n 100 -f apache-tomcat-6.0.18/logs/catalina.out 
ERROR 31 May 2012 09:49:32 org.hibernate.util.JDBCExceptionReporter  - Cannot create PoolableConnectionFactory (Access denied for user 'jogetprod'@'%' to database 'jwdb')
ERROR 31 May 2012 09:49:32 org.springframework.web.context.ContextLoader  - Context initialization failed
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'setupSessionFactory' defined in class path resource [commonsApplicationContext.xml]: Invocation of init method failed; nested exception is org.hibernate.HibernateException: Hibernate Dialect must be explicitly set
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1338)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:473)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory$1.run(AbstractAutowireCapableBeanFactory.java:409)
Seems I overlooked the database name in the workflowUrl parameter. I edit that line in wflow/app_datasource-esshr.properties and restart again with tomcat6.sh stop and start. For reference, the corrected entries should look something like the sketch below (the password value is an assumption, reusing the grant from earlier):
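workflowDriver=com.mysql.jdbc.Driver
workflowUrl=jdbc\:mysql\://10.65.10.150\:3306/jwdbprod?characterEncoding\=UTF-8
workflowUser=jogetprod
workflowPassword=passwordhere

The result: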
[admin-sppd@sppd-online1 joget-linux-3.0.0]$ tail -f apache-tomcat-6.0.18/logs/catalina.out
May 31, 2012 9:58:07 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive spring2.war
May 31, 2012 9:58:10 AM org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-8080
May 31, 2012 9:58:10 AM org.apache.jk.common.ChannelSocket init
INFO: JK: ajp13 listening on /0.0.0.0:8009
May 31, 2012 9:58:10 AM org.apache.jk.server.JkMain start
INFO: Jk running ID=0 time=0/15  config=null
May 31, 2012 9:58:10 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 25291 ms
OK, let's see the joget web.


Seems that the migration is done.
That's all for now; I hope your migration succeeds as well.