Posts

Copying Big Oracle Tables into Iceberg

During my piloting of the Trino query engine (formerly PrestoSQL), I tried several data warehouse destination options. The first option is using Trino's Hive connector with the data stored in Minio storage accessed through the S3 API. The Minio services were run on IBM hardware (ppc64le architecture), but that's another story for another blog post. The metadata was served by a Hive metastore, which takes some effort because at some point the metastore needs to access the S3 storage itself (which I don't fully understand why) and thus needs the proper Hadoop AWS jars. The second option is using Trino's Iceberg connector to store the data in the same Minio storage and Hive metastore, with the Iceberg table format. For reference's sake, I will note the versions of the software used in this experiment: Trino version 442, deployed on OpenShift OKD 4.13 using Pulumi with the Trino Helm template as a starting point, using the pristine Trino image taken from Docker Hub (docker.io...
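As a rough illustration of the destination side of that setup, here is a minimal sketch using the trino Python client to copy a table into an Iceberg table via CTAS. It assumes a Trino catalog named iceberg (backed by the Hive metastore and Minio described above) and a hypothetical Oracle catalog named oracle_src; hostnames, schema and table names are placeholders, not taken from the post.

```python
import trino

# Hypothetical connection details; the "iceberg" catalog is assumed to be
# configured against the Hive metastore + Minio storage described above.
conn = trino.dbapi.connect(
    host="trino.example.internal",
    port=8080,
    user="etl",
    catalog="iceberg",
    schema="staging",
)
cur = conn.cursor()

# CREATE TABLE AS SELECT from a (hypothetical) Oracle catalog into Iceberg.
cur.execute("""
    CREATE TABLE IF NOT EXISTS staging.customers_copy
    WITH (format = 'PARQUET')
    AS SELECT * FROM oracle_src.appschema.customers
""")
cur.fetchall()  # drain the result so the CTAS runs to completion
```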

Rants On NFS Lack of File Handle Visibility To Sysadm

NFS is a not-so-recent solution for sharing a filesystem across Linux nodes. It has some capabilities that are currently indispensable for Linux clusters: locking files across nodes and allowing either exclusive or non-exclusive access to the same file. Fault Tolerance / Recovery I have read some papers on NFS; it should be able to recover from a restarting host / server. Unfortunately, on several occasions we found this to be not quite true: after a host serving NFS is restarted, we get stale file handle errors on the clients. The workaround is to restart the NFS client, and if that still doesn't fix the situation, restart the NFS server. In our case we sometimes need to restart twice across the cluster (because the clients hang while running a program over NFS). Some might say programs shouldn't be run over NFS (only data files should), but we have deployed an SAP-documented cluster architecture that requires such use of NFS. Locks When a file is locked over NFS, a lock is created in the hos...

Decision Making Puzzle 1

Let's say that you are currently managing a big multi-year project that has a significant impact on your enterprise application landscape. Suddenly there is another significant business initiative that also requires a thoughtful, distinct change in the core applications to support it, and has a somewhat shorter timeline. Do you: a) request help from one of your system implementation vendors on the first project, knowing that some parts require expertise that this vendor's team has, and knowing it would delay your first project because the whole team moves from working on the first project to the second one; b) find another team to support the change, knowing that some effort will need to be put into preparing the budget and the procurement process for the new team, so your involvement in the first project will also be reduced; or c) manage the changes between yourself and some peers, knowing that the only person to understand the needed change in the core applic...

Copying Big Oracle tables Using Apache Spark

Background Sometimes we need to copy table data from one database to another. Logically the best way to do this is a database-specific export (expdp in Oracle lingo) followed by an import into the destination database (impdp in Oracle). But sometimes there are deficiencies in this method, such as being unable to parallelize the process for a single table, and requiring DBA access to the database. This post shows how to do a table copy using Apache Spark and Apache Zeppelin. Preparations To allow Apache Spark to open Oracle JDBC connections, we need to add a dependency on ojdbc6.jar. To do this, write this paragraph in Zeppelin: %dep z.reset() z.load("/path/to/ojdbc6.jar") Basic Approach The most basic approach to copying table data is to retrieve the data in one query and save the resulting records into the target database. val tableNameSrc = "TABLENAME" val tableNameTrg = "TABLENAME" import java.util.Properties Class.forName("ora...
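For a flavor of what the parallel read looks like, here is a PySpark equivalent sketch of the approach (the post itself uses Scala in Zeppelin). The JDBC URLs, credentials, table name and partition column are hypothetical placeholders; the point is the partitionColumn/lowerBound/upperBound/numPartitions options, which split a single big table into parallel JDBC reads.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle-table-copy").getOrCreate()

src_url = "jdbc:oracle:thin:@//src-db.example:1521/SRCPDB"   # hypothetical
trg_url = "jdbc:oracle:thin:@//trg-db.example:1521/TRGPDB"   # hypothetical

# Read the source table in parallel, partitioned on a numeric key column.
df = (spark.read.format("jdbc")
      .option("url", src_url)
      .option("dbtable", "APPSCHEMA.BIGTABLE")
      .option("user", "app_user").option("password", "secret")
      .option("driver", "oracle.jdbc.OracleDriver")
      .option("partitionColumn", "ID")
      .option("lowerBound", "1")
      .option("upperBound", "100000000")
      .option("numPartitions", "16")
      .load())

# Write the records into the target database over JDBC.
(df.write.format("jdbc")
   .option("url", trg_url)
   .option("dbtable", "APPSCHEMA.BIGTABLE")
   .option("user", "app_user").option("password", "secret")
   .option("driver", "oracle.jdbc.OracleDriver")
   .mode("append")
   .save())
```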

Automating tasks using Python for sending files through SFTP

Scheduled background tasks are a staple of the IT world. For example, some things (programs) should be done at a certain time each day, or some procedure should be run a few times an hour. In this post we will discuss how to send a file to another server using SFTP. Crontab Most of the time, background tasks are triggered using a UNIX-like crontab; there are some alternatives as well, but let's consider them out of the current post's scope. The crontab entry usually calls a shell script that in turn triggers another program, such as calling a web site using wget or curl, or invoking an application command (php yiic commandname). The existing crontab entry for sending the file to another server uses a shell script to prepare the files, create control files, and send them across the network using sshpass with input redirection. The Challenge The problem we're facing is that sometimes the remote server doesn't respond normally. The cause is still unknown, and after getting incomplete file...
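As a minimal sketch of replacing the sshpass-based shell step with Python, here is an upload using paramiko's SFTP client. Host, credentials and paths are hypothetical; uploading under a temporary name and renaming at the end is one way to avoid the receiver picking up an incomplete file, which is the kind of failure described above.

```python
import paramiko

HOST, PORT = "sftp.partner.example", 22          # hypothetical endpoint
USERNAME, PASSWORD = "transfer", "secret"        # hypothetical credentials

client = paramiko.SSHClient()
client.load_system_host_keys()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # or pin the host key explicitly
client.connect(HOST, port=PORT, username=USERNAME, password=PASSWORD, timeout=30)

try:
    sftp = client.open_sftp()
    # Upload to a temporary name, then rename so a partial file is never visible.
    sftp.put("/data/out/report.csv", "/incoming/report.csv.part")
    sftp.rename("/incoming/report.csv.part", "/incoming/report.csv")
    # Send the control file last, signalling that the data file is complete.
    sftp.put("/data/out/report.ctl", "/incoming/report.ctl")
finally:
    client.close()
```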

Tips using HMC for VMWare Users

Creating a virtual machine is a necessary step in most software development processes. The reason is that we want to optimize our hardware use and isolate one application instance from another application instance running on the same hardware. For that purpose I recently tried to create a VM instance on an IBM machine using PowerVM. Creating such a VM is quite different from creating one in VMWare, and this post will show some tips for the challenges of creating a PowerVM instance. HMC Console First, we need to access the HMC (Hardware Management Console), which is the interface from which we configure our hardware and logical partitions. One HMC can manage more than one system (a system being a single box of server hardware). Choose Resource >> All Systems to get an overview of which systems are available to manage. From the HMC we can create a new Logical Partition. The menu path is Resource >> All Systems >> [ choose system ], then clicking on Cre...

Displaying running PHP 7.4 stack trace on Red Hat UBI Container

When running a PHP application in a container, sometimes we need to find out what bottleneck is impacting the app's response time. This post documents an approach to do so, though it still requires access to the root user in the container. The platform where I was running the PHP app is Red Hat OpenShift Community (OKD) version 3.6, which still uses Docker to run containers instead of Red Hat's podman. Preparation - Locating the pod and the worker node Open the OpenShift console home, select your project (namespace) containing the pod, select Application => Pods, choose your pod from the list, and note the worker node hosting the pod. Install php debug-info Normally, the OpenShift platform doesn't allow containers to run as root, so in this case the container will be running as a normal random-UID user. But we need to install php debuginfo packages, so we need to run as root. The trick is to access the node running the application's pods, so we need admin access on the w...