Wednesday, November 21, 2007

The children of my children are my enemies... What about their children?

I had to deal with this process synchronization problem some months ago, and since Kwort 2.4 core edition is waiting for Linux 2.6.24, I will show you what my problem was and how I dealt with it.

I was trying to run a process that forks itself and runs some bash scripts. So far there's nothing wrong with that. The problem is that those shell scripts create children of their own, and it is almost certain that the parent (the main script) will die before those children, so I'll get some orphaned processes. And the main C program shouldn't die before all the children die.

So, let's take as an example the code from the first post on this blog, "Checking the network with bash". In this case a wait at the end would fix the problem, but let's just assume that we can't modify this script:
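#!/usr/bin/env bash

function pinging(){
    # Assumed: the subnet prefix and the last octet are the two arguments
    SUBNET=${1}
    LAST=${2}
    PING="$(which ping) -c 1 -W 1"
    ${PING} ${SUBNET}.${LAST} > /dev/null 2>&1
    if [ $? -eq 0 ]; then
        echo -e "${SUBNET}.${LAST} is up"
    fi
}

if [ -z "${2}" ]; then
    # No host given: re-run this script in the background for every host
    for((x=1;x<255;x++)); do
        ${0} ${1} ${x} &
    done
else
    pinging ${1} ${2}
fi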

So, this was my first attempt to control this mess. Unfortunately it didn't work as expected, because waitpid() only waits for the caller's own children, even when it is given a process group ID (please skip the comments about checking the return value of fork and such, since this is an example of something that didn't work):
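#include <sys/wait.h>
#include <unistd.h>
int main(int argc, char *argv[]){
    int status;
    if(fork() != 0)
        waitpid(-1*getpgrp(), &status, 0); /* only reaps direct children */
    else
        execvp(argv[1], argv+1); /* run the script given on the command line */
    return 0;
}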

So, basically, the idea was that the main C program shown above shouldn't die until all the children of the scripts die (that's why I was using getpgrp(), which didn't work as expected, for the reason explained above).
On Linux, running ps axfj showed me that the small children (the ones generated by the script) are in the same group, but I couldn't wait for them, as they are not my children but the children of my child process.
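To make that last point concrete, here is a minimal sketch that is not part of the original program. The function name grandchild_hidden_from_waitpid is made up for illustration, and the 2-second sleep is an arbitrary choice. It creates a child that leads its own process group and leaves a grandchild behind. It then checks that waitpid() on that group fails with ECHILD while killpg() with signal 0 still finds someone alive in it.

```c
#define _XOPEN_SOURCE 700   /* declares killpg() under strict -std= modes */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper: returns 1 if a live grandchild is visible to
 * killpg() but not to waitpid(), 0 otherwise, and -1 if the setup fails. */
int grandchild_hidden_from_waitpid(void)
{
    int status;
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        setpgid(0, 0);                 /* child leads a new process group */
        pid_t grandchild = fork();
        if (grandchild == 0) {         /* grandchild outlives its parent */
            sleep(2);
            _exit(0);
        }
        _exit(grandchild < 0 ? 1 : 0); /* child exits right away */
    }

    /* Reap the direct child; the grandchild is still sleeping. */
    if (waitpid(child, &status, 0) != child)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* "Any child in group <child>" now fails: no child of ours is left there. */
    int wait_failed = (waitpid(-child, &status, 0) == -1 && errno == ECHILD);

    /* Signal 0 sends nothing, it only checks that the group has members. */
    int group_alive = (killpg(child, 0) == 0);

    return wait_failed && group_alive;
}
```

Calling it from a small main() and printing the result should be enough to see the behaviour. The grandchild is left to whoever adopts it and goes away on its own after the sleep.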

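After fighting with it for a while I came up with this program, which keeps track of all the children, grandchildren and the rest of the "family". The child puts itself in a new process group before running the script, and the parent, after reaping the child, keeps probing that group with signal 0 until nobody is left. The code is very simple and explains itself: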
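#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[]){
    int status;
    pid_t pid;
    if((pid=fork()) > 0) {
        waitpid(pid, &status, 0);
        /* signal 0 only probes: loop until the group is empty */
        while(killpg(pid, 0) != -1)
            sleep(1); /* poll once per second (interval is arbitrary) */
    }
    else if(pid==0){
        setpgid(getpid(), getpid()); /* become leader of a new process group */
        execv(argv[1], argv+1);
    }
    return 0;
}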
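The loop above stops on any killpg() failure and has no upper bound. A slightly more defensive variant could look like the sketch below. Everything in it is an assumption for illustration rather than the original code: the helper names spawn_group, wait_for_group and run_family, the 100 ms polling interval, the max_polls limit, and the switch to execvp() so the command is looked up in PATH. It treats only ESRCH as "the group is gone" and gives up after a fixed number of probes.

```c
#define _XOPEN_SOURCE 700   /* declares killpg() and nanosleep() under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] as the leader of a new process group.
 * Returns the child's PID (also the group ID), or -1 if fork() fails. */
pid_t spawn_group(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        setpgid(0, 0);
        execvp(argv[0], argv);
        _exit(127);                 /* exec failed */
    }
    if (pid > 0)
        setpgid(pid, pid);          /* set it from both sides to avoid a race;
                                       failing after the exec is harmless */
    return pid;
}

/* Hypothetical helper: probe the group every 100 ms, at most max_polls times.
 * Returns 0 once the group is empty, 1 if it is still there after the last
 * probe, and -1 on an unexpected error. */
int wait_for_group(pid_t pgid, int max_polls)
{
    const struct timespec interval = {0, 100 * 1000 * 1000};
    assert(pgid > 1);               /* killpg() is undefined for pgrp <= 1 */
    for (int i = 0; i < max_polls; i++) {
        if (killpg(pgid, 0) == -1) {
            if (errno == ESRCH)
                return 0;           /* no process left in the group */
            if (errno != EPERM)
                return -1;
            /* EPERM: the group exists but we may not signal it; keep polling */
        }
        nanosleep(&interval, NULL);
    }
    return 1;
}

/* Hypothetical helper tying both together. Stores the child's wait status
 * in *status and returns 0 only if the whole family is gone. */
int run_family(char *const argv[], int max_polls, int *status)
{
    pid_t pid = spawn_group(argv);
    if (pid < 0 || waitpid(pid, status, 0) != pid)
        return -1;
    return wait_for_group(pid, max_polls);
}
```

With these helpers, the main() above would shrink to a single call such as run_family(argv+1, 600, &status), which waits up to about a minute for the family before giving up.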

Thanks to Marcel, who helped me a lot with some concepts and ideas to find a workaround for this.
