Friday, August 24, 2012

Fault (Tolerance) Ideas

Murphy's Law says that anything that can go wrong, will go wrong (ref: Captain Edward A. Murphy, http://www.murphys-laws.com/murphy/murphy-true.html). In our world of computing this includes:

  • our network switches and wiring: they could be disabled, or worse, silently flip bits in data sent through the network
  • TCP checksums: single-bit errors are detected (and the packet retransmitted), but a double bit flip can corrupt a packet in a way the TCP layer cannot detect
  • HDD wiring: installing the wrong cable, or wrongly installing a correct one. Swap a good Ultra DMA ATA cable with a bad one (it will still be detected as Ultra DMA) and we get a large Ultra DMA CRC error rate. There are also CRC-undetectable error rates of roughly 5x10^-13 (loosely taken from http://doc.utwente.nl/64267/1/schiphorst.pdf); this corrupt data (on average one bad bit per two terabytes) will get stored to our disks
  • HDD failure: commodity disks will fail in 2-3 years, possibly sooner. The industry-standard RAID 5 no longer suffices for large disk deployments; better to use RAID 6 or RAID 1
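The TCP weakness above is easy to demonstrate: the Internet checksum is a 16-bit ones'-complement sum, so two bit flips that cancel each other out leave the checksum unchanged. A minimal Java sketch (the payload values and class name are made up for illustration):

```java
// Two payloads differing in two bits, yet sharing the same
// RFC 1071-style 16-bit ones'-complement checksum.
public class ChecksumCollision {

    // Ones'-complement sum over 16-bit words, carry folded back in.
    static int checksum(int[] words) {
        long sum = 0;
        for (int w : words) {
            sum += w & 0xFFFF;
            sum = (sum & 0xFFFF) + (sum >> 16); // fold the carry bit
        }
        return (int) (~sum & 0xFFFF);
    }

    public static void main(String[] args) {
        int[] original  = {0x0001, 0x0002, 0x1234};
        int[] corrupted = {0x0000, 0x0003, 0x1234}; // bit 0 flipped in two words
        // The two flips cancel out, so the checksums match:
        System.out.println(checksum(original) == checksum(corrupted)); // prints true
    }
}
```

Because the flips cancel in the sum, the receiver's TCP layer would accept the corrupted packet as valid.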
So the ideas for a large-scale fault-tolerant system include:
  • End-to-end data corruption detection. Put it at the application/database level and we get pretty good coverage of anything that could go wrong along the way. Two different CRC algorithms should suffice.
  • Redundancy, at least 3x for each data block or object. Suddenly RAID 1 no longer suffices (it gives only 2x redundancy).
  • Auto-replication, or self-healing. When an HDD gets replaced with a new one, data ought to be re-mirrored to the new HDD automatically.
  • Multihomed systems. That means at least two network interfaces on each host, each connected to a different network switch, providing network redundancy.
  • Monitoring. The drawback of automatic healing or automatic failover is that the human operator doesn't know a failure is happening. Even when nothing must be done (such as when automatic re-mirroring is in progress), it could be an indication that something is wrong (like a bad switch that corrupts TCP packets while keeping the same checksum).
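The first idea, end-to-end detection with two different checksums, can be sketched in Java with the JDK's CRC-32 and Adler-32 (chosen here as stand-ins for "two different CRC algorithms"; the sample data and method names are illustrative):

```java
import java.util.zip.Adler32;
import java.util.zip.CRC32;

// Sketch: store two independent checksums at write time,
// verify both at read time to catch silent corruption.
public class EndToEndCheck {

    static long crc32(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data);
        return c.getValue();
    }

    static long adler32(byte[] data) {
        Adler32 a = new Adler32();
        a.update(data);
        return a.getValue();
    }

    // Both checksums recorded at write time must still match.
    static boolean verify(byte[] data, long storedCrc, long storedAdler) {
        return crc32(data) == storedCrc && adler32(data) == storedAdler;
    }

    public static void main(String[] args) {
        byte[] block = "some application data".getBytes();
        long crc = crc32(block), adler = adler32(block);

        block[3] ^= 0x10; // simulate a silent bit flip on the storage path

        System.out.println(verify(block, crc, adler)); // prints false
    }
}
```

A single bit flip is always caught by CRC-32 alone; keeping a second, different checksum guards against the (rare) multi-bit patterns that one algorithm misses.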

Open Source Cloud Computing on the rise

VMware Player shows us that virtualization has many benefits, even on a dual-processor laptop. Amazon Elastic Compute Cloud (EC2) set the cloud computing standard, showing us that cloud computing is feasible and can be cheap (after all, we only pay for hourly usage). So here in 2012 we find there are many open source cloud computing solutions out there.

Core Service

Virtualization is provided by several commonly known hypervisors:

  • VMware vSphere. This one is not open source at all.
  • Xen. Xen is a mature hypervisor that was fully open-sourced by Citrix in 2009. There are an open source version (Xen Cloud Platform), a free version (XenServer Free), and a paid version.
  • Kernel-based Virtual Machine (KVM). KVM is open source virtualization software that is tightly integrated with the Linux kernel as the host OS.
On top of these core hypervisors, GUI and management layers are built, resulting in cloud computing platforms.

Cloud computing platform

Let's see how many open source cloud computing platforms I have found as of today:

OpenNebula

OpenNebula was initially released in March 2008 by the OpenNebula community. OpenNebula has an appliance marketplace (similar to VMware's) where users can share their VMs for free or for a fee (though it seems to have no VMs for sale right now). The only HA feature is hooks that can be set to resubmit (restart) a VM if it is found to be in the ERROR state.

OpenNebula users:
  • Telefonica Germany
  • China Telecom
  • Akamai
  • IBM Global Business Services
  • SAP
  • RIM


Eucalyptus 

Initially released in 2008, this is a mature cloud computing platform. Canonical formerly used Eucalyptus for its Ubuntu Enterprise Cloud. Previous releases were differentiated into enterprise and open source editions, but the latest (3.1) version merges both into one open source release. High availability is provided by storage controller HA, Walrus storage HA, and cloud controller HA; each controller is implemented as a pair of services, with a failing service automatically removed from operation.
Eucalyptus is being used by:
  • NTT Data (Japan IT company)
  • NetApp
  • Fujitsu 
  • NASA
  • Sony
  • Trend Micro

CloudStack

CloudStack was formerly known as Cloud.com (May 2010), which was bought by Citrix (July 2011). The entire software was contributed to Apache in April 2012. Its high availability features are VM monitoring, fencing (disabling misbehaving VMs), and restarting failed VMs. The web UI uses server-side Java and has no HA out of the box (but can be set up to be HA with some knowledge). There is a lot of integration with Citrix XenServer and Citrix NetScaler hardware/software.
Among CloudStack's clients are:
  • Tata Communications (Indian telecom giant)
  • KT (Korean landline operator)
  • GoDaddy.com (popular web hosting provider)
  • Nokia Research Center

OpenStack 

OpenStack is a newer cloud computing platform (started by NASA & Rackspace in 2010) with many supporters and a quickly growing feature list. 'Quickly growing' has an unstable connotation, because newer features haven't endured the test of time. The full HA (high availability) feature set is not released yet (as of 25 August 2012); it is targeted for the Folsom release in September 2012. Among the expected new features are integration with Corosync and Pacemaker to ensure HA of OpenStack services, the new Cinder block storage system, and the Quantum virtual network service. An existing HA feature is in the Swift object storage service, which replicates objects automatically. The web UI is based on Python's Django framework.
OpenStack is used by:
  • Canonical (yes, the creator of Ubuntu)
  • Intel (the giant that almost monopolizes processor manufacturing)
  • Deutsche Telekom (service provider)
  • Rackspace Hosting
  • AT&T Communications
  • NASA

High Availability Using Citrix (Apache) CloudStack

The CloudStack dashboard
In April 2012, Apache accepted CloudStack from Citrix as one of the open source projects in the Apache Incubator.
CloudStack is a software platform that pools computing resources into an IaaS (infrastructure as a service) cloud. Think of an Amazon EC2 that we can install in our own private data center.
Service offerings, similar to Amazon EC2
With CloudStack, we can provide cloud services to our users. They can launch servers as they wish, as long as they stay within their quota and the system's capacity.

VM template choices

The main component of CloudStack is actually a Java-based web application that does provisioning against hypervisor software: XenServer, VMware, or KVM. A Cluster, in CloudStack terminology, is a set of hosts controlled by a uniform hypervisor type.

What we get from CloudStack includes:
  • Ajax-enabled, web-based central management for VMs running on Xen, VMware vSphere, or KVM
  • A repository for storing VM template images, which can be integrated with OpenStack Swift
  • VM offering configuration: users can launch a VM by choosing a package with a preconfigured memory size and number of CPUs
  • Integration with Citrix NetScaler for elastic load balancing and Elastic IP
  • Plain load balancing with a Virtual Router or F5 BIG-IP
  • Firewall and NAT configuration for a Virtual Router or Juniper SRX
  • High availability for VMs running on CloudStack (if one host has a problem, CloudStack will boot those VMs on a host that is still healthy, assuming the VMs use virtual hard disks stored on shared storage)
  • VLAN provisioning
General CloudStack Deployment Concepts
CloudStack is deployed with two kinds of storage, primary and secondary. Secondary storage holds the VM templates or ISOs used as base images for building virtual machines. Primary storage holds the virtual hard disk data belonging to the VMs. CloudStack itself stores its information in a MySQL database. It is recommended to provide three network paths to every host: the public internet network, a private network (LAN), and a dedicated management and storage network.
One possible CloudStack deployment configuration


An important aspect of high availability is which failure modes are supported. In the system above there are several failure modes:
  • Host failure: failure or damage of a host that is part of a Cluster
  • Management server failure: failure or damage of the CloudStack server
  • Primary storage failure: failure of the main data storage (virtual hard disks)
  • Secondary storage failure: failure of the template or ISO storage
  • Management server MySQL database failure: failure of the CloudStack database
CloudStack's failover mechanism for host failure

The failure mode handled by CloudStack itself is host failure. Secondary storage failure is handled by the OpenStack Swift infrastructure, which stores images redundantly on its own storage cluster. Management server failure can be handled by running CloudStack on two load-balanced hosts. Management server MySQL database failure is handled by putting DRBD under the MySQL storage, or by configuring MySQL in a master-slave setup.

Combined with a highly available storage system (CloudStack does not provide the primary storage service itself, only an interface to iSCSI- or NFS-based storage systems), CloudStack can be a reasonably robust platform for system deployments.


Thursday, August 23, 2012

(Inexpensive) Highly Available Storage Systems

The internet has pampered our users by giving them 99.9% uptime. Now every corporation needs similar availability, close to 99.9%. One part of a high availability solution is storage. In the good old proven corner we have SAN and NAS storage solutions. They are not always highly available (you must ensure the system has more than one controller in case one controller breaks; RAID 5 is nowhere near enough these days, so choose RAID 6; and don't forget network redundancy), but they are almost always expensive. In an era where IT directors are forced to think 'cost transformation', cost-saving alternatives are always welcome.
New hardware developments have influenced our infrastructure options; let's write the factors down:
  • abundance of cheap gigabit Ethernet cards & routers -> this allows us to forgo expensive fiber connectivity and to leverage newly developed distributed systems
  • cheap large SATA drives -> large commodity storage for the masses
Developments in infrastructure support software also give us more options:
  • Open-iSCSI and iSCSI target software -> converting our SATA drives into shared storage
  • CLVM -> clustered LVM allows distributed volume management
  • OCFS2 -> Oracle Cluster File System 2, an open source cluster file system developed by Oracle
  • Ceph RADOS massive cluster storage technology
  • Amazon Elastic Block Store -> we need to run our servers in AWS to use this
  • DRBD -> the Distributed Replicated Block Device driver, which lets us choose between synchronous and asynchronous mirroring between disks attached to different hosts
A question on Server Fault asks about mature high availability storage systems; summarizing the answers:
  • The surprising fact is that the Oracle-developed OCFS2 is not the tool of choice, because it needs downtime to add capacity to the cluster filesystem
  • Ceph RADOS is a promising newcomer in the arena, giving Google File System-like characteristics with a standard block interface (RBD). Old enterprises usually hesitate to adopt such new technology.
  • The ultimate choice of sysadmins is HA NFS (highly available NFS).

HA NFS

The requirements for highly available NFS are as follows:
  • two hosts (PCs or servers)
  • two sets of similarly sized storage (identical SATA II disks preferred)
  • cluster resource manager: Pacemaker
  • cluster messaging: Heartbeat (alternative: Corosync)
  • logical volume manager: LVM2
  • NFS daemon
  • DRBD (Distributed Replicated Block Device)
Distributed Replicated Block Device (http://www.drbd.org/)
Installation instructions are available in the references below; in short:
  1. install DRBD devices (drbd0) on top of physical disks (e.g. sda1): fill in /etc/drbd.d/nfs. Do the same on both hosts (servers)
  2. configure LVM to ignore the physical disk used (e.g. sda1), configure LVM to read volumes from DRBD (e.g. drbd0), and disable the LVM cache (all in /etc/lvm/lvm.conf)
  3. create an LVM physical volume (pvcreate), a volume group (vgcreate), and a logical volume (lvcreate); see your favorite LVM tutorial for details
  4. install & configure Heartbeat
  5. install & configure Pacemaker. Pacemaker must be configured with:
    1. a DRBD resource (ocf:linbit:drbd), to automatically set DRBD master/slave mode according to the situation at hand
    2. an NFS daemon resource (lsb:nfs or lsb:nfs-kernel-server)
    3. LVM and filesystem resources (ocf:heartbeat:LVM, ocf:heartbeat:Filesystem)
    4. an NFS exports resource (ocf:heartbeat:exportfs)
    5. a floating IP address resource (ocf:heartbeat:IPaddr2)
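The Pacemaker resources listed above might be wired together with a crm shell fragment roughly like this. All resource names, device paths, the export subnet, and the floating IP below are illustrative assumptions, not values from an actual setup:

```shell
# Illustrative crm configure fragment for the HA NFS stack sketched above.
# Adapt resource names, devices, subnet and IP to the real environment.
crm configure primitive p_drbd_nfs ocf:linbit:drbd \
    params drbd_resource="nfs" op monitor interval="30s"
crm configure ms ms_drbd_nfs p_drbd_nfs \
    meta master-max="1" clone-max="2" notify="true"
crm configure primitive p_lvm_nfs ocf:heartbeat:LVM params volgrpname="vg_nfs"
crm configure primitive p_fs_nfs ocf:heartbeat:Filesystem \
    params device="/dev/vg_nfs/lv_nfs" directory="/srv/nfs" fstype="ext4"
crm configure primitive p_nfsserver lsb:nfs-kernel-server
crm configure primitive p_exportfs ocf:heartbeat:exportfs \
    params directory="/srv/nfs" clientspec="192.168.1.0/24" options="rw" fsid="1"
crm configure primitive p_ip ocf:heartbeat:IPaddr2 params ip="192.168.1.100"
# Group the NFS stack, keep it on the DRBD master, and start DRBD first.
crm configure group g_nfs p_lvm_nfs p_fs_nfs p_nfsserver p_exportfs p_ip
crm configure colocation c_nfs_on_drbd inf: g_nfs ms_drbd_nfs:Master
crm configure order o_drbd_before_nfs inf: ms_drbd_nfs:promote g_nfs:start
```

The colocation and order constraints are what make failover work: when the DRBD master moves to the other host, the whole NFS group follows it.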
Illustration of a two-host HA NFS system
An automatic failover mechanism can be activated to give seamless NFS operation during failover. The advantages of this HA NFS configuration are:
- clients use the old, proven NFS interface
- realtime synchronous replication ensures no data loss

The disadvantages of the shown configuration are:
- no horizontal scalability
- the standby capacity of the second host is not leveraged

Linbit (www.linbit.com) provides enterprise support for DRBD.

Ceph RADOS

Ceph is a distributed object storage system. Ceph (http://ceph.com/docs/master/) has these features:
  • massive clustering with thousands of Object Storage Devices (OSDs); Ceph can run with a minimum of 2 OSDs
  • automated data replication with per-pool replication settings (e.g. metadata: 3x replication, data: 2x replication)
  • data striping to improve performance across the cluster
  • a POSIX filesystem client (CephFS), an OpenStack Swift-compatible interface, and even a REST interface
  • a block device interface (RBD) -> suitable for virtualization; OpenStack cloud support
  • horizontal scalability: add more OSDs and/or disks for more storage or performance
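The per-pool replication setting mentioned above is adjusted with the ceph CLI. A sketch, assuming the default CephFS pool names and mirroring the replication counts from the example:

```shell
# Illustrative: set per-pool replica counts on a Ceph cluster.
# Pool names assume the default CephFS pools; adjust for your cluster.
ceph osd pool set metadata size 3   # keep 3 replicas of filesystem metadata
ceph osd pool set data size 2       # keep 2 replicas of file data
```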
Illustration of Ceph RADOS cluster
The disadvantages of a Ceph RADOS cluster are:
- new technology; it needs training and hands-on experience to operate
- stability not yet industry-proven, though there have already been large deployments
Inktank (http://www.inktank.com/) and Hastexo GmbH (www.hastexo.com) provide enterprise support for Ceph cluster deployments.

Conclusion

New, inexpensive storage system technologies now exist that can be leveraged to provide highly available storage and/or extra performance for our legacy applications that are still cluster-unaware.

Monday, August 13, 2012

Popup Text area using jQuery

In times when we have too little screen real estate (primarily because there is too much information on the screen), we settled on using a popup window containing a textarea for entering comments. We just implemented such simple popup-textarea functionality using jquery-popbox. A small wonder: OSS and sharing and all.



Thursday, August 2, 2012

Anti pattern : Ignoring Exceptions

Did you ever find that your app didn't do what it was expected to do, with no clue whatsoever about the cause? You might have stumbled upon the 'Ignored Exceptions' antipattern. Or worse: a special case of this antipattern is the 'whatever passes the acceptance test' mindset that causes programmers to code this way.
The most primitive example of this antipattern is:
   ON ERROR RESUME NEXT
[this one-liner is an example from Visual Basic]

In the Java programming language, an example of the Ignored Exceptions antipattern:

   try {
      // ... some code ...
      // ... more code ...
   }
   catch (Exception ex) {
   }

Note that the catch block is empty: it silently ignores any errors.
Yes, there are cases where this sort of code would be hard to avoid, but in most cases better alternatives exist, such as:
- using Log4j to log the error
- converting the exception to a message that can be understood by the user
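Both alternatives amount to the same thing: pass the exception somewhere useful instead of discarding it. A self-contained sketch using the JDK's built-in java.util.logging (so it runs without the Log4j jar; with Log4j the call would be the analogous logger.error(message, ex)). The class, method, and message names are made up for illustration:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: log the exception with its stack trace, then return a
// message the user can actually understand.
public class OrderService {

    private static final Logger LOG = Logger.getLogger(OrderService.class.getName());

    static String saveOrder(String order) {
        try {
            if (order == null) {
                throw new IllegalArgumentException("order must not be null");
            }
            // ... persist the order here ...
            return "Order saved";
        } catch (Exception ex) {
            // keep the full stack trace for troubleshooting...
            LOG.log(Level.SEVERE, "failed to save order", ex);
            // ...and convert the exception into a user-readable message
            return "Sorry, your order could not be saved. Please try again.";
        }
    }

    public static void main(String[] args) {
        System.out.println(saveOrder(null));
    }
}
```

The key difference from the antipattern: when something goes wrong, both the operator (via the log) and the user (via the message) find out about it.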
At the minimum, the error should be written to the console, so that troubleshooting can be done:
  catch (Exception ex) {
     ex.printStackTrace();
  }