Log in

activity wholist changelog info go back go back go forward go forward
wtf pthreads - now if you've got a pair of headphones...
you'd better get 'em on and get 'em cranked up
wtf pthreads
So I had this code that did essentially:
for (int i = 0; i < num_files; i++)
    if ((fd = open(files[i]->name, O_RDONLY)))
And I figured, hey, I've got an I/O scheduler and my disks have command queueing now, let's mitigate some of that seek death and do multiple stuffs in parallel. Well, here in ***america*** where fork() is cheap you'd do:
for (int i = 0; i < num_files; i++) {
    if (num_kids == MAX_KIDS)
    if ((fd = open(files[i]->name, O_RDONLY))) {
        if (!fork()) {
        } else {
    while (waitpid(-1, NULL, WNOHANG) > 0)
while (num_kids--)
And then it's off to the bar. For some reason I thought that maybe doing it in pthreads would be a good exercise, and tried it that way first. (Actually I tried POSIX AIO zeroth. Ha! Good one, guys. You almost got me.) That went something like:
for (int i = 0; i < num_files; i++) {
    if ((fd == open(files[i]->name, O_RDONLY)))
        pthread_create(&thread[min_thread_index], NULL, do_stuff, (void *)fd);
    /* hmm, how to throttle thread creation? */
    pthread_join(&thread[i % MAX_KIDS], NULL); /* bonghits in the hood */
    /* and then reset min_thread_index somehow? ugh */
Except that doesn't work, because you can only join with named threads, and not just whichever one happens to die first. I have no idea how long each of these threads is actually going to take, so round-robin is a waste of time. I could stick in a semaphore that every thread downs on init and ups on termination, and block the dispatch loop on that, but that's still not enough, since then I still need to figure out - somehow - which thread died so I can reuse its slot in thread[]. Why pthread_create() doesn't just return the new pthread_t is beyond me.

Of course ideally we'd have waitfd() that was smart enough to cope with thread IDs too, and you'd message the thread ID back out to the master process somehow, and you'd still have to allocate your own pthread_t's anyway. Hooray for unix.

Tags: ,
music: nine inch nails - gave up

pong! (x12) || ping?
From: (Anonymous) Date: November 17th, 2008 03:45 pm (UTC) (link)

OpenMP could be the solution

have a look at openmp, this is really easy and powerfull
ajaxxx From: ajaxxx Date: November 17th, 2008 08:34 pm (UTC) (link)

Re: OpenMP could be the solution

That's sort of like swatting a fly with an elephant.
From: (Anonymous) Date: November 17th, 2008 03:48 pm (UTC) (link)

detach + counting semaphore?

You could detach the thread after creation and use a counting POSIX semaphore to throttle the creation, something like this:

sem_init(sem, 0, MAX_KIDS);
for (...) {

void *whatever_thread(...) {

Now, if you also need to wait for the completion of all the threads (and fortunately the cumulative number of spawned threads is known a priori) you'll have to use a barrier (sem_getvalue + sem_wait is not atomic).
The alternative - which works also when the total number of threads is not known - is a mutex + count + conditional variable.
From: kevin_kofler Date: November 19th, 2008 06:51 am (UTC) (link)

Re: detach + counting semaphore?

Careful there, in some testing I did not too long ago, I found that POSIX semaphores are a lot slower than pthread mutexes. It's actually faster to use an array of mutexes than a counted semaphore.
From: gsbarbieri Date: November 17th, 2008 04:34 pm (UTC) (link)


First, why don't you open files inside threads themselves? Open is quite an expensive operation as well.

Second, seems your first example with fork() might have problems when you reach MAX_KIDS, because you don't decrease kids count! Last but not least you might have SIGCHLD from other stuff, so be careful to not have that in code that might have other childs.

Third I guess semaphore is the best situation, maybe you can try barriers as well, but then you'd not be able to reuse thread slots and you'd wait until the last thread ends. But if you look at pthread_join(3P) you'll read:

The pthread_join() function is a convenience that has proven useful in
multi-threaded applications. It is true that a programmer could simu-
late this function if it were not provided by passing extra state as
part of the argument to the start_routine(). The terminating thread
would set a flag to indicate termination and broadcast a condition that
is part of that state; a joining thread would wait on that condition
variable. While such a technique would allow a thread to wait on more
complex conditions (for example, waiting for multiple threads to termi-
nate), waiting on individual thread termination is considered widely
useful. Also, including the pthread_join() function in no way precludes
a programmer from coding such complex waits. Thus, while not a primi-
tive, including pthread_join() in this volume of IEEE Std 1003.1-2001
was considered valuable.

but really, if they advise that I wonder why it was not made easier. :-P
ajaxxx From: ajaxxx Date: November 17th, 2008 08:33 pm (UTC) (link)

Re: idea

Second, seems your first example with fork() might have problems when you reach MAX_KIDS, because you don't decrease kids count!

D'oh. Nice catch.
From: ext_133931 Date: November 17th, 2008 04:42 pm (UTC) (link)

The "bag of tasks" idiom is easier with threads

pthreads isn't so bad if you use the common "bag of tasks" idiom:
1. Have a multi-producer-consumer safe queue on hand like the 'Queue' module in the Python standard library with get, put, task_done, and join methods.
2. Create an empty queue
3. Create MAX_KIDS threads each doing 'queue.get; do_stuff; queue.task_done'
4. In the main thread, open num_files files, put them in the queue and call 'queue.join'
neillparatzo From: neillparatzo Date: November 17th, 2008 04:51 pm (UTC) (link)
Would this be a bad time to point out that Win32 has first-class threads and WaitForMultipleObjects()?
From: (Anonymous) Date: November 17th, 2008 05:53 pm (UTC) (link)


WaitForMultipleObjects is fine, but it's limited to 64 objects. In case you have to deal with greater amount of threads, you're back to a 'manual' solution.
neillparatzo From: neillparatzo Date: November 18th, 2008 02:26 am (UTC) (link)

Re: WaitForMultipleObjects

Raymond Chen makes a well-timed post... apparently RegisterWaitForSingleObject aggregates multiple WaitForMultipleObjects into multiple threads, 64 per thread.
ajaxxx From: ajaxxx Date: November 17th, 2008 08:37 pm (UTC) (link)
Not at all. They're still more expensive than Linux processes, but I do have to admit that WaitForMultipleObjects() is exactly the correct API. You can get infuriatingly close to it in posix by turning everything into an fd, but it's infuriating because of all the grossness you have to do to make them into fds.
From: (Anonymous) Date: November 18th, 2008 03:52 am (UTC) (link)

make -j

You could do what make -j does: create a pipe, write MAX_KIDS bytes to the pipe, and always read a byte from the pipe before creating a thread. Each thread in turn writes a byte to the pipe when it finishes.

If you need specific array indexes, use them as the values going through the pipe.
pong! (x12) || ping?