Sunday, January 31, 2016

Long running process in Linux using PHP


I usually build web-based applications in PHP. Sometimes we need to run something that takes a long time, far longer than the 10-second psychological limit for web pages.
A bit of googling on Stack Overflow found us this, but I will tell a similar story with a different solution. One of the long-running tasks we need to run is a Pentaho Data Integration transformation.

Difficulties in long running PHP scripts

I encountered several problems when trying to make PHP run long tasks:
  1. PHP script timeout. This can be solved by calling set_time_limit(0); before starting the long-running task.
  2. Memory leaks. The framework I normally use has some memory issues. This can be solved either by patching the framework (admittedly a bit difficult, but I did something similar in the past) or by splitting the data into several batches. If you loop over the batches in one PHP run, make sure that no dangling references to the processed objects remain after each batch.
  3. In an Apache-PHP environment, a browser disconnect terminates the PHP script. During my exploration I found that:
    1. Some firewalls disconnect an HTTP connection after 60 seconds.
    2. Firefox has a long timeout (300 seconds or so, ref here).
    3. Chrome has a timeout similar to Firefox (about 300 seconds, ref here), and a longer one for AJAX (a Stack Overflow ref reports no timeout even after 15 hours).
  4. Running Pentaho transformations is difficult, because the PHP module runs as www-data and cannot access the Kettle repository stored in another user's home directory.
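The batching idea in item 2 can be sketched as follows. This is a Python illustration rather than PHP, and `process_in_batches` is a hypothetical job; the point is that each batch is processed and then dropped, so only the running result survives between batches.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items taken from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def process_in_batches(items: Iterable[int], size: int) -> int:
    """Hypothetical job: sum the items one batch at a time.

    Only the running total survives between batches; each batch list is
    released once the loop moves on to the next one.
    """
    total = 0
    for chunk in batches(items, size):
        total += sum(chunk)  # stand-in for the real per-batch work
    return total
```

For example, `list(batches(range(5), 2))` gives `[[0, 1], [2, 3], [4]]`. A PHP version should follow the same shape: fetch a batch, process it, and unset everything tied to it before fetching the next.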


I have used these workarounds to force PHP to serve long-running web pages:
  • Workaround 1: call set_time_limit(0); and ignore_user_abort(true); so the script keeps running even after the client disconnects. Unfortunately, the user will no longer see the result of the script.
  • Workaround 2: use HTTPS, so the firewall is unable to do layer 7 processing and does not disconnect the connection. If the user closes the browser, the script still terminates unless you also apply workaround 1.
I haven't tried detaching a child process yet, but my other solutions use a separate process for background processing, with similar benefits.
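For comparison, detaching a child process could look roughly like this. It is a Python sketch, not something tried in this post; `spawn_detached` is a hypothetical helper. The child gets its own session (`start_new_session=True`), stdin from /dev/null, and output appended to a log file, so it does not depend on the request that started it.

```python
import subprocess
from typing import List


def spawn_detached(cmd: List[str], log_path: str) -> int:
    """Start `cmd` in a new session and return its PID without waiting for it.

    stdin is /dev/null and stdout/stderr are appended to `log_path`, so the
    child does not depend on the caller's (e.g. a web request's) streams.
    """
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # runs setsid() in the child
        )
    # The child holds its own duplicate of the log descriptor, so closing
    # ours here does not affect its output.
    return proc.pid
```

The user would still need some way to see the result later, for example by reading the log file, which is the same problem the solutions below address.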

Solution A - Polling task tables using cron

It is better to separate the user interface part (PHP web script) from the background processing part. My first solution is a cron task that runs every 3 minutes and starts a PHP CLI script, which checks a background task table for tasks with state 'SUBMITTED'. When it starts processing a task, the script updates the state to 'PROCESSING'.
So the user interface/front end only reads the background task table and, when the user requests it, inserts a task there with the specification the task requires, setting the state to 'SUBMITTED'.
When cron runs the PHP CLI script, the script checks for tasks and, if there are any, changes the first task's state to 'PROCESSING' and begins processing. When processing completes, the PHP CLI script changes the state to 'COMPLETED'.
Complications happen, so we need to do risk management by:
  1. logging the phases of the process in a database table, including any warnings issued during processing.
  2. recording error rows, if there are any, in another database table, so the user can view the problematic rows.
This solution currently works, but recently I came across another solution that might be a better fit for running a Linux process.
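The task table and its SUBMITTED → PROCESSING → COMPLETED lifecycle can be sketched like this, using Python with SQLite purely for illustration. The table and column names (`background_task`, `spec`, `state`) are assumptions. The conditional `UPDATE` makes claiming a task safe even if two cron runs overlap.

```python
import sqlite3
from typing import Callable, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS background_task (
    id    INTEGER PRIMARY KEY AUTOINCREMENT,
    spec  TEXT NOT NULL,
    state TEXT NOT NULL DEFAULT 'SUBMITTED'
)
"""


def submit_task(conn: sqlite3.Connection, spec: str) -> int:
    """Front end: insert a new task in state 'SUBMITTED' and return its id."""
    cur = conn.execute("INSERT INTO background_task (spec) VALUES (?)", (spec,))
    conn.commit()
    return cur.lastrowid


def claim_next_task(conn: sqlite3.Connection) -> Optional[Tuple[int, str]]:
    """Worker: move the oldest 'SUBMITTED' task to 'PROCESSING' and return it."""
    row = conn.execute(
        "SELECT id, spec FROM background_task "
        "WHERE state = 'SUBMITTED' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # The state check in WHERE makes the claim fail (rowcount 0) if another
    # worker grabbed the same task in between; the next cron run retries.
    cur = conn.execute(
        "UPDATE background_task SET state = 'PROCESSING' "
        "WHERE id = ? AND state = 'SUBMITTED'",
        (row[0],),
    )
    conn.commit()
    return (row[0], row[1]) if cur.rowcount == 1 else None


def run_once(conn: sqlite3.Connection, process: Callable[[str], None]) -> bool:
    """What one cron invocation does; returns False when there was nothing to do."""
    task = claim_next_task(conn)
    if task is None:
        return False
    task_id, spec = task
    process(spec)
    conn.execute(
        "UPDATE background_task SET state = 'COMPLETED' WHERE id = ?", (task_id,)
    )
    conn.commit()
    return True
```

If `process` raises, the task stays in 'PROCESSING', which is exactly the situation the logging described next is meant to make visible.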
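Both risk-management tables can be sketched in the same style. The table names `task_log` and `task_error_row` and the integer-summing job are hypothetical; the idea is that a bad row is recorded with its reason instead of aborting the whole task.

```python
import sqlite3
from typing import Iterable

LOG_SCHEMA = """
CREATE TABLE IF NOT EXISTS task_log (
    task_id INTEGER NOT NULL,
    level   TEXT NOT NULL,
    message TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS task_error_row (
    task_id  INTEGER NOT NULL,
    row_no   INTEGER NOT NULL,
    raw_data TEXT NOT NULL,
    reason   TEXT NOT NULL
);
"""


def log_phase(conn: sqlite3.Connection, task_id: int, level: str, message: str) -> None:
    """Record one phase or warning so the user can follow the task's progress."""
    conn.execute(
        "INSERT INTO task_log (task_id, level, message) VALUES (?, ?, ?)",
        (task_id, level, message),
    )
    conn.commit()


def process_rows(conn: sqlite3.Connection, task_id: int, rows: Iterable[str]) -> int:
    """Hypothetical job that sums integer rows.

    A bad row does not abort the task: it is stored in task_error_row with
    the reason, and a warning is logged at the end.
    """
    log_phase(conn, task_id, "INFO", "processing started")
    total = 0
    errors = 0
    for row_no, raw in enumerate(rows, start=1):
        try:
            total += int(raw)
        except ValueError as exc:
            errors += 1
            conn.execute(
                "INSERT INTO task_error_row (task_id, row_no, raw_data, reason) "
                "VALUES (?, ?, ?, ?)",
                (task_id, row_no, raw, str(exc)),
            )
    conn.commit()
    if errors:
        log_phase(conn, task_id, "WARNING", f"{errors} row(s) rejected")
    log_phase(conn, task_id, "INFO", "processing finished")
    return total
```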

Solution B - Using inotifywait and control files

In this solution, I created a control file that contains only one line of CSV. I prepared a PHP CLI script that parses the CSV and executes a long-running process, and a PHP web page that writes to the control file. inotifywait, from inotify-tools, listens for file system notifications from the Linux kernel related to changes to the control file.
The scenario is as follows:
  1. The user opens the PHP web page, chooses parameters for the background task, and clicks Submit.
  2. The PHP web page receives the submitted parameters and writes them, including a job id, into the control file. The user receives a page stating 'Task submitted'.
  3. A shell script running inotifywait waits for notifications on the control file, specifically for the close_write event.
  4. After the close_write event is received, the shell script continues and runs the PHP CLI script to do the background processing.
  5. The PHP CLI script reads the control file for the parameters and job id.
  6. The PHP CLI script executes the Linux process, redirecting its output to a file named after the job id in a specific directory.
  7. The 'Task submitted' page could periodically poll the output file for that job id and show the output to the end user (OK, this one I still need to actually try).
  8. The PHP CLI script returns, and the shell script loops endlessly by going back to step 3.
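Writing and reading the one-line control file might look like this in Python. The field names `transformation` and `period` are hypothetical stand-ins for whatever parameters the job needs. The writer opens, writes, and closes the file directly; closing a file that was opened for writing is what raises inotify's close_write event.

```python
import csv
from typing import Dict

# Hypothetical column layout: job id first, then the job parameters.
FIELDS = ["job_id", "transformation", "period"]


def write_control_file(path: str, job_id: str, params: Dict[str, str]) -> None:
    """Web side: overwrite the control file with a single CSV line.

    Closing the file after writing is what triggers close_write for the watcher.
    """
    row = [job_id] + [params[name] for name in FIELDS[1:]]
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(row)


def read_control_file(path: str) -> Dict[str, str]:
    """CLI side: parse the single CSV line back into named fields."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    if len(rows) != 1 or len(rows[0]) != len(FIELDS):
        raise ValueError(f"expected one CSV line with {len(FIELDS)} fields")
    return dict(zip(FIELDS, rows[0]))
```

Using a CSV library on both sides, rather than joining strings by hand, means a value containing a comma is quoted and read back intact.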
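Steps 3 to 8 can be sketched in Python as below. This is an illustration of the loop, not the actual shell and PHP scripts; the helper names are hypothetical, and it assumes inotify-tools is installed so `inotifywait -q -e close_write <file>` is available. Without `-m`, inotifywait exits after the first matching event, which gives the "wait, handle, repeat" loop.

```python
import os
import subprocess
from typing import Callable, List, Tuple


def output_path(output_dir: str, job_id: str) -> str:
    """Output file for one job, e.g. <output_dir>/42.log."""
    return os.path.join(output_dir, f"{job_id}.log")


def run_job(argv: List[str], output_dir: str, job_id: str) -> int:
    """Steps 5-6: run the process with stdout and stderr appended to the job's file."""
    with open(output_path(output_dir, job_id), "ab") as out:
        result = subprocess.run(argv, stdout=out, stderr=subprocess.STDOUT)
    return result.returncode


def read_new_output(output_dir: str, job_id: str, offset: int) -> Tuple[str, int]:
    """Step 7: return output written since byte `offset`, plus the new offset.

    A polling page would call this repeatedly, passing back the offset it got.
    """
    try:
        with open(output_path(output_dir, job_id), "rb") as f:
            f.seek(offset)
            data = f.read()
    except FileNotFoundError:
        return "", offset  # job has not started writing yet
    return data.decode("utf-8", errors="replace"), offset + len(data)


def watch_forever(control_path: str, handle: Callable[[str], None]) -> None:
    """Steps 3, 4 and 8: wait for close_write on the control file, handle it, repeat."""
    while True:
        # Without -m, inotifywait exits after the first matching event.
        subprocess.run(
            ["inotifywait", "-q", "-e", "close_write", control_path], check=True
        )
        handle(control_path)
```

One property of this design (shared by any loop that restarts inotifywait after each event): a write to the control file while a job is still running is not seen. `inotifywait -m` keeps watching continuously if that matters.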


By using Linux file system notifications, we can trigger task execution with parameters specified from a PHP web page. The task can run as another Linux user, provided that user is the one running the shell script. Data sanitization is done by PHP, so no arbitrary commands can be passed to the background task.
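The sanitization step can be sketched as a strict check of each control-file field before anything is executed. The whitelist, the patterns, and the wrapper script path `run_transformation.sh` are all hypothetical. The result is an argument list meant for `subprocess` without `shell=True`, so no shell parses these values on their way to the wrapper.

```python
import re
from typing import Dict, List

# Hypothetical whitelist and formats; adjust to the real job parameters.
ALLOWED_TRANSFORMATIONS = {"load_sales", "load_inventory"}
JOB_ID_RE = re.compile(r"[0-9]{1,10}")
PERIOD_RE = re.compile(r"[0-9]{4}-(0[1-9]|1[0-2])")
RUNNER = "/usr/local/bin/run_transformation.sh"  # hypothetical wrapper script


def build_argv(params: Dict[str, str]) -> List[str]:
    """Validate control-file fields and return an argument list for the job.

    Every value must match a strict pattern or whitelist; anything else
    (e.g. '42; rm -rf /') is rejected with ValueError.
    """
    job_id = params.get("job_id", "")
    transformation = params.get("transformation", "")
    period = params.get("period", "")
    if not JOB_ID_RE.fullmatch(job_id):
        raise ValueError(f"invalid job_id: {job_id!r}")
    if transformation not in ALLOWED_TRANSFORMATIONS:
        raise ValueError(f"transformation not allowed: {transformation!r}")
    if not PERIOD_RE.fullmatch(period):
        raise ValueError(f"invalid period: {period!r}")
    return [RUNNER, transformation, period, job_id]
```

Validating on the worker side as well as in the web page is a cheap safeguard: anyone who can write the control file can otherwise influence what runs.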

These solutions are built entirely from open source components. I saw that Azure has WebJobs, which might fulfill requirements similar to mine, but it is on the Azure platform, which I have never used.