Saturday, October 21, 2017

Running Pods as Anyuid in Openshift Origin

When using OpenShift Origin, by default all pods run under the 'restricted' security context constraint (SCC), which forces them to use a generated user id. Some containers just don't work that way, so we need to relax the restriction a bit. Reference : https://blog.openshift.com/understanding-service-accounts-sccs/

Creating A service account

First, create a service account in your project (see https://docs.openshift.com/enterprise/3.0/admin_guide/manage_scc.html). Here is a sample YAML to do that :
kind: ServiceAccount
apiVersion: v1
metadata:
  name: mysvcacct
Note that underscores are not allowed in service account names, even though the official OpenShift example contains one.
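To create the service account from this definition, a minimal sketch (assuming the YAML above is saved as mysvcacct.yaml and you are already switched to the target project):

oc create -f mysvcacct.yaml
oc get serviceaccount mysvcacct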

Assigning anyuid

Then, a cluster administrator should log in to the project and assign the anyuid SCC :


oc login
oc project theproject
oc adm policy add-scc-to-user anyuid -z mysvcacct

Using the service account

Now, edit the deployment config or the replication controller config to use the service account :

apiVersion: v1
kind: ReplicationController
metadata:
  name: spark-master-controller
  namespace: sparkz
  selfLink: /api/v1/namespaces/sparkz/replicationcontrollers/spark-master-controller
  uid: a1f26de8-b6e3-11e7-846c-005056a56b12
  resourceVersion: '129053544'
  generation: 2
  creationTimestamp: '2017-10-22T04:44:04Z'
  labels:
    component: spark-master
spec:
  replicas: 1
  selector:
    component: spark-master
  template:
    metadata:
      creationTimestamp: null
      labels:
        component: spark-master
    spec:
      containers:
        - name: spark-master
          image: 'gcr.io/google_containers/spark:latest'
          command:
            - /start-master
          ports:
            - containerPort: 7077
              protocol: TCP
            - containerPort: 8080
              protocol: TCP
          resources:
            requests:
              cpu: 100m
          terminationMessagePath: /dev/termination-log
          imagePullPolicy: Always
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: mysvcacct
      securityContext: {}
status:
  replicas: 1
  fullyLabeledReplicas: 1
  readyReplicas: 1
  availableReplicas: 1
  observedGeneration: 2

Note the serviceAccountName field at the same level as containers inside the pod spec. Add the line if it doesn't exist yet.
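Instead of editing the config by hand, the same change can be applied with oc patch; a sketch using the replication controller from the example above (existing pods keep their old spec, so they must be deleted and recreated by the controller to pick up the service account):

oc patch rc spark-master-controller -n sparkz -p '{"spec":{"template":{"spec":{"serviceAccountName":"mysvcacct"}}}}'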

Cleaning Openshift Origin Images Registry

When using and tending an OpenShift Origin cluster (for example, Origin version 3.7), it is normal to start with a small storage allocation. However, we soon find that the registry storage gets filled up quickly with images from each build process. This post shows how to clean them up.

Preparation before pruning

First you need the oc (origin client) binary and a user account with cluster administration capability.
If the OpenShift docker registry is installed inside the cluster without external access, then you are also going to need OS access to one of the hosts inside the cluster.
The first step is to log in to the cluster, either from your client or from inside one of the hosts:
oc login

Prune steps

Reading the documentation (https://docs.openshift.com/enterprise/3.0/admin_guide/pruning_resources.html), we find that pruning starts with deployments, then builds, and lastly images.

Pruning Deployments

Run this to preview which deployments are going to be pruned:
oc adm prune deployments
Then execute the pruning :
oc adm prune deployments --confirm
We can use either the oadm (origin adm) CLI utility or the oc adm command, depending on which executable is available.
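oc adm prune deployments also accepts retention flags to control how much deployment history is kept; a hedged example with purely illustrative values:

oc adm prune deployments --orphans --keep-complete=5 --keep-failed=1 --keep-younger-than=60m --confirm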

Pruning Builds

Run this to preview the builds :
oc adm prune builds
or
oadm prune builds
Then execute the pruning :
oc adm prune builds --confirm
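As with deployments, retention flags let us keep recent build history while pruning the rest; the values below are only illustrative:

oc adm prune builds --orphans --keep-complete=5 --keep-failed=1 --keep-younger-than=60m --confirm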

Pruning Registry Images

And finally, we prune images. An image cannot be pruned while deployments still refer to it, which is why the prune deployment and prune build steps above need to be done first.

oadm prune images
Confirm the pruning with the additional --confirm flag:
oadm prune images --confirm
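Image pruning also supports retention flags so that the most recent revisions of each tag survive; a hedged example with illustrative values:

oc adm prune images --keep-tag-revisions=3 --keep-younger-than=60m --confirm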

If the registry is not accessible, we get this message :
error: error communicating with registry: Get http://IPredacted:5000/healthz: dial tcp IPredacted:5000: getsockopt: operation timed out
Such an error means we need to SSH into one of the hosts inside the cluster to be able to prune images.
 

Lessons Upgrading MySQL DB 5.1 to Percona 5.7

I recently upgraded a database server that was previously running MySQL 5.1 (the standard Sun/Oracle version) to Percona Server 5.7. A few quirks were notable enough to warrant this blog post.

Planning and Preparation

A Percona blog post (mysql-upgrade-best-practices) states that the best way to upgrade across such a large gap in major versions (5.1 to 5.7) is to take a full logical dump of every database except mysql, dump the users and grants, uninstall the database and remove the datafiles, then install the new version and import the logical dump and grants. Alas, the database we were going to upgrade is so big, and the IO subsystem enough of a bottleneck during a logical dump, that our colleagues' mysqldump attempt took more than 2 days before we cancelled it (otherwise it would have interfered with workday application usage of the database). Reading the blog I noted that :
  1. for major version upgrades, using a logical dump and restore is the safest way to go.
  2. for minor version upgrades, an in-place upgrade is possible.
  3. for major version upgrades, do not skip versions. I deduced that the sequence is : 5.1 ⇨ 5.5 ⇨ 5.6 ⇨ 5.7
  4. two mandatory readings for my upgrade - http://dev.mysql.com/doc/refman/5.5/en/upgrading-from-previous-series.html and http://dev.mysql.com/doc/refman/5.6/en/upgrading-from-previous-series.html
  5. another two mandatory readings for Percona Server : https://www.percona.com/doc/percona-server/5.5/upgrading_guide_51_55.html#changes-in-server-configuration and https://www.percona.com/doc/percona-server/5.6/changed_in_56.html
The main thing is to be careful. In the IT world, being careful means doing backups before you do something big.
Backups are mandatory for any upgrade task relating to a production system. Our database server runs on the VMware platform, so we prepared to take a snapshot before doing the upgrade. But an alternative backup is in order, so we installed Percona XtraBackup to create one. Another option is to use mydumper (see https://www.percona.com/blog/2015/11/12/logical-mysql-backup-tool-mydumper-0-9-1-now-available/), but in the past I had problems compiling it, so I avoided it.

Percona Xtrabackup

For open-source software, it is strange that the Percona XtraBackup PDF manual is not easily found. You need to register your name, address, company, et cetera, just to get hold of the PDF version of the manual. Anyway, the quirk is that sometimes the manual says 'innobackupex' and sometimes 'xtrabackup'. Checking the installed files reveals that innobackupex is a symbolic link to xtrabackup; it seems the two are now interchangeable, but no such statement is found in the PDF manual.
Quoting the Percona XtraBackup 2.4.8 PDF manual, chapter 10 :
innobackupex is the tool which provides functionality to backup a whole MySQL database instance using the xtrabackup in combination with tools like xbstream and xbcrypt
The paragraph above is confusing, because it seems the correct way to do a backup is by using innobackupex. But in the web version of the manual we get :

innobackupex is the symlink for xtrabackup. innobackupex still supports all features and syntax as 2.2 version did, but is now deprecated and will be removed in next major release.

It seems the web version is the best one to follow; unfortunately I noticed this after completing the backup task.
Our backup command is like this : 
xtrabackup --backup --compress --compress-threads=4  --target-dir=/targetdir/compressed/ --user=root --password=xxxxxx
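For reference, a compressed backup like this must be decompressed and prepared before it can be restored; a minimal sketch, assuming the qpress binary is installed and the same target directory is used:

xtrabackup --decompress --target-dir=/targetdir/compressed/
xtrabackup --prepare --target-dir=/targetdir/compressed/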
One quirk is that xtrabackup doesn't like the built-in InnoDB engine (error : Built-In InnoDB 5.1 is not supported in this release), so I needed to add these lines to /etc/my.cnf and restart the database before running the xtrabackup command :

ignore-builtin-innodb
plugin-load=innodb=ha_innodb_plugin.so

Replacing MySQL 5.1 with Percona Server 5.1

To replace MySQL 5.1, we uninstall it and then install Percona Server. Both (MySQL and Percona Server) use my.cnf. The steps are :
  1. stop the mysqld service (note: mysqld is the Oracle-based service; the Percona-based service on RHEL is mysql)
  2. ensure backups are done; if not, create one.
  3. uninstall MySQL with : yum remove mysql mysql-server
  4. install Percona Server with : yum install Percona-Server-client-51 Percona-Server-server-51 (refer to the yum-related installation steps in https://www.percona.com/doc/percona-server/5.1/installation.html)
These steps are quite straightforward, but Percona Server 5.1 doesn't like the InnoDB engine plugin that was configured earlier (Error: The option ignore-builtin-innodb is incompatible with Percona Server with XtraDB), forcing me to remove/comment these two lines :


#ignore-builtin-innodb
#plugin-load=innodb=ha_innodb_plugin.so
After that and starting the mysql service (not mysqld), all is well.

Upgrading Percona Server 5.1 to 5.5

  1. stop mysql service
    • service mysql stop
  2. check installed packages 
    • rpm -qa | grep Percona-Server
  3. uninstall 
    • rpm -qa | grep Percona-Server | xargs rpm -e --nodeps
  4. install 5.5 version by 
    • yum install Percona-Server-server-55 Percona-Server-client-55
  5. run in skip grant tables mode:
    • /usr/sbin/mysqld --skip-grant-tables --user=mysql &
  6. then do the actual upgrade process:
    • mysql_upgrade
  7. stop and start :
    • service mysql stop
    • service mysql start
The process ran smoothly with no quirks.

Upgrading Percona Server 5.5 to 5.6


For 5.5 ⇨  5.6 upgrade, similar steps are found in https://www.percona.com/doc/percona-server/5.6/upgrading_guide_55_56.html :
  1. stop mysql service
    • service mysql stop
  2. uninstall by 
    • rpm -qa | grep Percona-Server | xargs rpm -e --nodeps
  3. install by
    • yum install Percona-Server-server-56 Percona-Server-client-56
  4. run in skip grant tables mode:
    • /usr/sbin/mysqld --skip-grant-tables --user=mysql &
  5. then do the actual upgrade process:
    • mysql_upgrade
  6. stop and start :
    • service mysql stop
    • service mysql start
In step 4, the server refused to start because of the unknown 'log_slow_queries' option in /etc/my.cnf. It seems that in 5.6 this is replaced by slow_query_log_file (see https://stackoverflow.com/questions/10755151/mysql-what-is-the-difference-between-slow-query-log-vs-log-slow-queries), so I replaced log_slow_queries with slow_query_log_file.
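A sketch of the my.cnf change, using a hypothetical slow log path; note that in 5.6 turning the log on is controlled by the separate slow_query_log variable, while slow_query_log_file only names the file:

#log_slow_queries = /var/lib/mysql/slow.log   (old 5.1 option, no longer recognized in 5.6)
slow_query_log = 1
slow_query_log_file = /var/lib/mysql/slow.log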
After that was resolved, we proceeded to step 5, where we found another error :
mysqlcheck: Got error: 1045: Access denied for user 'root'@'localhost' (using password: NO) when trying to connect
This error popped up when I tried to run mysql_upgrade. It seems this is a known bug (https://bugs.mysql.com/bug.php?id=72896) that didn't get fixed, in which mysql_upgrade calls FLUSH PRIVILEGES, which rereads the grant tables. Our solution was to use an alternate syntax to execute mysql_upgrade :
mysql_upgrade -u root -p 
 After doing that, all is well.

Upgrading Percona Server 5.6 to 5.7

The reference for the 5.6 ⇨ 5.7 upgrade is https://www.percona.com/doc/percona-server/5.7/upgrading_guide_56_57.html, which is essentially the same as the 5.5 ⇨ 5.6 upgrade, so I will not duplicate it here. But there are a few major differences :

  • By default, mysql_upgrade will convert tables that use DATE, TIME, and TIMESTAMP columns to the MySQL 5.6.4 format (note that the 5.6 upgrade process does not issue a warning about these tables, and doesn't suggest a conversion process either). The new binary date/time formats are more space-efficient and allow fractional-second types (such as TIMESTAMP(4)). Referring to https://www.percona.com/blog/2016/04/27/upgrading-to-mysql-5-7-focusing-on-temporal-types/, the majority of mysql_upgrade's running time is now spent converting these tables (shown as ALTER TABLE ... FORCE). A workaround to prevent this is to run mysql_upgrade with the -s / --upgrade-system-tables flag, which does not upgrade data tables.
  • The default sql_mode now includes ONLY_FULL_GROUP_BY and STRICT_TRANS_TABLES. ONLY_FULL_GROUP_BY enforces proper GROUP BY usage, but unfortunately this breaks many sloppy SQL statements that were previously allowed to run. STRICT_TRANS_TABLES changes MySQL's behavior on INSERT and UPDATE queries regarding invalid field contents. See the sketch after this list.
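If too many legacy queries break, one stopgap is to pin sql_mode in my.cnf without the two new defaults until the application SQL is fixed; a hedged sketch based on the documented 5.7 default (this trades strictness for compatibility):

sql_mode = "NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"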

Conclusion

In-place upgrades from 5.1 to 5.7 are indeed feasible, especially when the data size is quite large and mysqldump takes too much time to run. For backups, Percona XtraBackup performs the task with better speed than mysqldump (but produces a binary backup as a result).





Thursday, March 9, 2017

Securing Openshift Origin Nodes

Background

We have deployed an OpenShift Origin cluster based on the Origin Milestone 4 release. When a security assessment was performed on several of the applications in the cluster, some issues cropped up and needed further remediation. Some issues related to application code; others related to the OpenShift node configuration, which we discuss here.

SSH issues

One of the issues is SSH weak algorithm support.
To remediate that, we need to tweak /etc/ssh/sshd_config by inserting additional lines :

# mitigation for security assessment finding: SSH weak algorithm support
Ciphers aes128-ctr,aes192-ctr,aes256-ctr
MACs hmac-sha1,hmac-ripemd160,hmac-ripemd160@openssh.com
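After editing, the sshd configuration can be syntax-checked and the daemon restarted; a minimal sketch, assuming a systemd-based host (the restart command may differ on older init systems):

sshd -t
systemctl restart sshd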

SSL issues

The other issue is related to SSL crypto algorithms. The 3DES cipher suite is no longer considered secure, so we need to tweak /etc/httpd/conf.d/000001_openshift_origin_node.conf (line 63) by adding !3DES:!DES-CBC3-SHA :

SSLCipherSuite kEECDH:+kEECDH+SHA:kEDH:+kEDH+SHA:+kEDH+CAMELLIA:kECDH:+kECDH+SHA:kRSA:+kRSA+SHA:+kRSA+CAMELLIA:!aNULL:!eNULL:!SSLv2:!RC4:!DES:!EXP:!SEED:!IDEA:!3DES:!DES-CBC3-SHA


We also need to disable SSLv2 and SSLv3 in 000001_openshift_origin_node.conf (line 58) :

SSLProtocol ALL -SSLv2 -SSLv3

And, because SSL certificate chains are a bit tricky, we are required to have an SSLCertificateChainFile line too (inserted at line 32 of the same file) :

SSLCertificateChainFile /etc/pki/tls/certs/localhost.crt

The httpd SSL virtual host configuration conflicts with OpenShift's, so we need to delete all virtual host lines in /etc/httpd/conf.d/ssl.conf.

As the final step, the files localhost.crt and localhost.key in /etc/pki/tls/certs/localhost.crt and /etc/pki/tls/private/localhost.key respectively need to be replaced with ones from the company's valid SSL certificates.

Restart httpd afterwards.
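To verify from another machine that 3DES and SSLv3 are really rejected, openssl s_client can be used; a hedged example assuming a hypothetical node hostname (the -ssl3 check only works if the local openssl was built with SSLv3 support):

openssl s_client -connect node.example.com:443 -cipher 'DES-CBC3-SHA' < /dev/null
openssl s_client -connect node.example.com:443 -ssl3 < /dev/null

Both handshakes should now fail.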

SSL in node proxy issue

The Node.js websocket proxy runs on port 8443 and also has SSL issues. We use the websocket proxy when an application on OpenShift requires websocket technology.

In /etc/openshift/web-proxy-config.json (between the private key line at line 125 and the } at line 126), we need to add this line :

"ciphers" : "kEECDH:+kEECDH+SHA:kEDH:+kEDH+SHA:+kEDH+CAMELLIA:kECDH:+kECDH+SHA:kRSA:+kRSA+SHA:+kRSA+CAMELLIA:!aNULL:!eNULL:!SSLv2:!RC4:!DES:!EXP:!SEED:!IDEA:+3DES:!DES-CBC3-SHA"

We also need to replace the file /opt/rh/nodejs010/root/usr/lib/node_modules/openshift-node-web-proxy/lib/utils/http-utils.js with the latest version from https://raw.githubusercontent.com/openshift/origin-server/master/node-proxy/lib/utils/http-utils.js. Just edit the file in vi, delete all lines, and insert the raw lines from GitHub.

Conclusion

Some maintenance is needed to ensure OpenShift Origin nodes are not a security liability. These steps reduce the number of security issues that need to be dealt with when securing apps in the OpenShift Origin cluster.