now if you've got a pair of headphones...
you'd better get 'em on and get 'em cranked up
SIGIO considered harmful
The X server has a pseudothread for handling input. It's not really a thread, it's something much worse. Recently I got to discover just how much worse.

fcntl(F_SETOWN) on a file descriptor, together with setting O_ASYNC on it, gets you a SIGIO whenever input is available for reading. So, we do that. We don't use threads because this hack predates the wide availability of thread support, but we still want minimal latency when updating the cursor position. It's slightly ugly because, once you've received the signal, there's no way of knowing which file descriptor it was about, so we call select() on all the input file descriptors we've got and walk through every one that appears to be readable. You'd like to avoid this: input is latency-sensitive, so you want to handle it in as few system calls as possible.
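Here's a minimal sketch of that setup in C, assuming pipes as stand-ins for input devices. The names (dev_read_fd, sigio_handler, enable_sigio, setup_devices) are made up for illustration and aren't from the X server. The handler has no idea which fd woke it, so it select()s over all of them with a zero timeout and reads whichever look readable.

```c
#define _GNU_SOURCE   /* O_ASYNC, usleep() */
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

#define NDEV 2                       /* hypothetical: two "input devices" */
static int dev_read_fd[NDEV];        /* what the server reads from */
static int dev_write_fd[NDEV];       /* what the "hardware" writes to */
static volatile sig_atomic_t bytes_seen[NDEV];

/* SIGIO says *something* is readable, not *what*, so check every fd. */
static void sigio_handler(int sig)
{
    int saved_errno = errno;
    fd_set readable;
    int maxfd = -1;
    (void)sig;
    FD_ZERO(&readable);
    for (int i = 0; i < NDEV; i++) {
        FD_SET(dev_read_fd[i], &readable);
        if (dev_read_fd[i] > maxfd)
            maxfd = dev_read_fd[i];
    }
    struct timeval zero = {0, 0};    /* never block inside the handler */
    if (select(maxfd + 1, &readable, NULL, NULL, &zero) > 0) {
        for (int i = 0; i < NDEV; i++) {
            char buf[64];
            ssize_t n;
            if (FD_ISSET(dev_read_fd[i], &readable) &&
                (n = read(dev_read_fd[i], buf, sizeof buf)) > 0)
                bytes_seen[i] += n;
        }
    }
    errno = saved_errno;
}

/* Ask the kernel to send this process SIGIO when fd becomes readable. */
static int enable_sigio(int fd)
{
    if (fcntl(fd, F_SETOWN, getpid()) < 0)
        return -1;
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK);
}

static int setup_devices(void)
{
    for (int i = 0; i < NDEV; i++) {
        int p[2];
        if (pipe(p) < 0)
            return -1;
        dev_read_fd[i] = p[0];
        dev_write_fd[i] = p[1];
    }
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigio_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;
    for (int i = 0; i < NDEV; i++)
        if (enable_sigio(dev_read_fd[i]) < 0)
            return -1;
    return 0;
}
```

Writing a byte to dev_write_fd[1] should produce one handler call that select()s over both pipes and reads only from the second one.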

Linux has an additional feature, fcntl(F_SETSIG), which hands you the file descriptor that generated the signal as part of the signal information (the si_fd field of siginfo_t, if the handler is installed with SA_SIGINFO). So I implemented this, with the obvious semantics: receiving a SIGIO would call the handler for just that file descriptor and then return.
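A sketch of the F_SETSIG variant, again using a pipe as a hypothetical input device. The handler has to be installed with SA_SIGINFO to receive the siginfo_t, and it services only info->si_fd; enable_setsig, setsig_handler and last_fd are illustrative names, not anything from the X server.

```c
#define _GNU_SOURCE   /* F_SETSIG, O_ASYNC, usleep() */
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t last_fd = -1;   /* hypothetical bookkeeping */

/* With F_SETSIG, siginfo names the fd that became ready (si_fd), so the
 * handler services just that device and skips the select() scan. */
static void setsig_handler(int sig, siginfo_t *info, void *ctx)
{
    int saved_errno = errno;
    char buf[64];
    (void)sig;
    (void)ctx;
    last_fd = info->si_fd;
    while (read(info->si_fd, buf, sizeof buf) > 0)
        ;                            /* drain this device; fd is non-blocking */
    errno = saved_errno;
}

static int enable_setsig(int fd, int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = setsig_handler;
    sa.sa_flags = SA_SIGINFO;        /* required to receive siginfo_t */
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) < 0)
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) < 0)
        return -1;
    if (fcntl(fd, F_SETSIG, signo) < 0)   /* any nonzero value, even SIGIO, enables si_fd */
        return -1;
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK);
}
```

Calling enable_setsig(fd, SIGIO) keeps the signal number unchanged while still turning on the extra siginfo data.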

Not long after, people started complaining about weird keyboard repeat bugs. Keys would, rarely but often enough to notice, behave as though they were still pressed after being released, until the next time you pressed a key on the same device. After staring at it for a while, I figured it out. SIGIO is a one-shot, non-queued signal: its delivery is blocked while a handler for it is already executing, and at most one instance of it can be pending, so any extras are simply discarded. So, if you released a key while you were processing a mouse event, the SIGIO for the key release could be lost; the kernel would try to deliver it, find SIGIO blocked because the mouse handler was still running, with a notification already pending, and drop the signal. Since the key would still appear to be pressed from X's perspective, our soft timer for generating key repeats would fire. And fire. And fire... Eventually you'd press another key, the kernel would send a signal for that event, and you'd finally read both the key release from earlier and the new key press.
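You can watch the non-queued behaviour without any input devices at all. This sketch (deliveries_after_blocked_burst is a hypothetical helper) blocks a signal, as the kernel does with SIGIO while its handler runs, raises it several times, then unblocks it. POSIX leaves it implementation-defined whether a repeated standard signal is delivered more than once; on Linux the copies collapse into a single pending signal, so the handler runs once.

```c
#define _GNU_SOURCE   /* make sure SIGIO and the POSIX signal API are declared */
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t deliveries;

static void count_handler(int sig)
{
    (void)sig;
    deliveries++;
}

/* Hypothetical helper: raise `times` copies of signo while it is blocked
 * (as SIGIO is while its own handler runs), then unblock it and return how
 * many times the handler actually ran. */
static int deliveries_after_blocked_burst(int signo, int times)
{
    struct sigaction sa;
    sigset_t block, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) < 0)
        return -1;
    sigemptyset(&block);
    sigaddset(&block, signo);
    deliveries = 0;
    sigprocmask(SIG_BLOCK, &block, &old);
    for (int i = 0; i < times; i++)
        raise(signo);                /* stays pending: signo is blocked */
    sigprocmask(SIG_SETMASK, &old, NULL);   /* pending signal delivered here */
    return deliveries;
}
```

Three SIGIOs go in; one handler call comes out.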

One subtle thing about this discovery is that the race is there in the old, non-F_SETSIG code too! If the keyboard release event comes in just after the call to select() returns but while we're still in the signal handler, exactly the same thing happens. In this case it's much less noticeable, though, because a subsequent wakeup on any input device will clear things up: the select() call will see both the new input event and the release event we missed last time. So merely moving the mouse will stop the spurious repeats. Gross.

There would appear to be an obvious solution: use realtime signals. fcntl(F_SETSIG) not only lets you get the notifying fd, it also lets you set the signal to receive. POSIX realtime signals have queued delivery, unlike SIGIO, so if one comes in while another is currently being handled, the kernel will queue it up for you, and deliver the second signal after the first one returns.
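For contrast, a sketch of the same experiment with a realtime signal: three SIGRTMIN instances, each carrying its own integer payload via sigqueue(), are sent while the signal is blocked. On unblocking, the handler should run three times, in the order the signals were queued. queue_burst, rt_handler and the seen array are hypothetical names used only for this demo.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_SEEN 16
static volatile sig_atomic_t nseen;
static volatile sig_atomic_t seen[MAX_SEEN];   /* payloads, in delivery order */

static void rt_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    if (nseen < MAX_SEEN)
        seen[nseen++] = info->si_value.sival_int;
}

/* Queue one realtime signal per payload while it is blocked, then unblock.
 * Unlike SIGIO, every instance is kept and delivered. */
static int queue_burst(const int *payloads, int n)
{
    int signo = SIGRTMIN;
    struct sigaction sa;
    sigset_t block, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = rt_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) < 0)
        return -1;
    sigemptyset(&block);
    sigaddset(&block, signo);
    nseen = 0;
    sigprocmask(SIG_BLOCK, &block, &old);
    for (int i = 0; i < n; i++) {
        union sigval v;
        v.sival_int = payloads[i];
        if (sigqueue(getpid(), signo, v) < 0)
            break;                   /* EAGAIN: realtime queue limit reached */
    }
    sigprocmask(SIG_SETMASK, &old, NULL);   /* all queued instances delivered */
    return nseen;
}
```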

There are two problems with realtime signals, though. One is that, due to a detail of how Linux event devices work, you get signal notifications far more often than you would expect. You get one for every "event" packet, but the evdev driver doesn't do anything meaningful with them until it sees a synchronization packet. A keyboard release, for example, is three event packets: keycode, scancode, and sync. They get queued up internally very quickly, though, so the first signal handler call will read all three packets from the device, and then the handler will be called twice more to do nothing. (evdev tries to minimize the number of times it calls read() for the same reason we're trying to minimize the number of times we call select(): context switches are expensive, and you never get latency back.) This is more expensive than the plain SIGIO method: you're doing three signal deliveries instead of one, and three read() calls instead of one read() and one select().
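To make the "three notifications, one useful read" point concrete, here's a sketch that writes a key release as three struct input_event packets (from <linux/input.h>) into a non-blocking pipe standing in for an event device, in the order listed above. Each call to the hypothetical handle_one_notification() models one signal delivery: the first read() returns all three packets, and the next two come back empty with EAGAIN. write_key_release is likewise a made-up helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <linux/input.h>
#include <string.h>
#include <unistd.h>

/* Simulate the kernel queuing a key release as three packets
 * (keycode, scancode, sync); a pipe stands in for the event device. */
static int write_key_release(int fd, unsigned short key, int scancode)
{
    struct input_event ev[3];
    memset(ev, 0, sizeof ev);
    ev[0].type = EV_KEY; ev[0].code = key;        ev[0].value = 0;   /* 0 = release */
    ev[1].type = EV_MSC; ev[1].code = MSC_SCAN;   ev[1].value = scancode;
    ev[2].type = EV_SYN; ev[2].code = SYN_REPORT; ev[2].value = 0;
    return write(fd, ev, sizeof ev) == (ssize_t)sizeof ev ? 0 : -1;
}

/* One signal delivery's worth of work: a single non-blocking read of as
 * many packets as are available. Returns the packet count, 0 for a wasted
 * wakeup (nothing there), or -1 on a real error. */
static int handle_one_notification(int fd)
{
    struct input_event buf[64];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n < 0)
        return errno == EAGAIN ? 0 : -1;
    return (int)(n / (ssize_t)sizeof buf[0]);
}
```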

The other problem is that the kernel will only queue a limited number of realtime signals. What happens when that limit is exceeded? Well, in that case the kernel falls back to sending you SIGIO instead of the requested realtime signal. So you have to implement the SIGIO handler anyway. And that's the real killer. You can't write a SIGIO handler correctly, because there's always a race. Even if you select() in a loop until there are no more ready fds, there's still a gap between the last select() call and returning from the signal handler where an input event can come in without the process being notified. It's small, but it's there.
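The "select() in a loop" handler looks roughly like this sketch (drain_until_idle is a hypothetical name). It keeps polling every fd with a zero timeout until a pass finds nothing ready; the comment at the end marks the gap that no amount of looping closes.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>
#include <unistd.h>

/* Service every ready fd, repeating until one select() pass finds nothing.
 * Returns the number of read() calls that returned data. */
static int drain_until_idle(const int *fds, int nfds)
{
    int serviced = 0;
    for (;;) {
        fd_set ready;
        int maxfd = -1;
        FD_ZERO(&ready);
        for (int i = 0; i < nfds; i++) {
            FD_SET(fds[i], &ready);
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }
        struct timeval zero = {0, 0};
        if (select(maxfd + 1, &ready, NULL, NULL, &zero) <= 0)
            break;                   /* nothing ready (or error): stop */
        for (int i = 0; i < nfds; i++) {
            char buf[256];
            if (FD_ISSET(fds[i], &ready) && read(fds[i], buf, sizeof buf) > 0)
                serviced++;
        }
    }
    /* The window: anything that becomes readable after the last select()
     * but before the handler returns is not serviced by this pass; it is
     * only picked up if another SIGIO is delivered later. */
    return serviced;
}
```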

The moral is: don't use signals. Use a damn thread already.
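For completeness, a minimal sketch of the alternative: a dedicated input thread blocked in poll() on every device, plus a self-pipe used only to tell it to quit. Because poll() is level-triggered, an event that arrives while the thread is busy is simply reported by the next poll() call, so there's no handler-return window for it to fall into. The struct and function names are hypothetical, and a real server would hand events to the main loop rather than just count them.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

#define MAX_DEVS 8

/* Hypothetical input-thread state: device fds plus a self-pipe for shutdown. */
struct input_thread {
    int fds[MAX_DEVS];
    int nfds;
    int quit_rd, quit_wr;
    atomic_int events;               /* successful device reads so far */
    pthread_t tid;
};

static void service_devices(struct input_thread *t, struct pollfd *dev, int n)
{
    for (int i = 0; i < n; i++) {
        char buf[256];
        if ((dev[i].revents & POLLIN) && read(dev[i].fd, buf, sizeof buf) > 0)
            atomic_fetch_add(&t->events, 1);   /* real code would dispatch here */
    }
}

static void *input_loop(void *arg)
{
    struct input_thread *t = arg;
    struct pollfd pfd[MAX_DEVS + 1];
    pfd[0] = (struct pollfd){ .fd = t->quit_rd, .events = POLLIN };
    for (int i = 0; i < t->nfds; i++)
        pfd[i + 1] = (struct pollfd){ .fd = t->fds[i], .events = POLLIN };

    for (;;) {
        if (poll(pfd, t->nfds + 1, -1) < 0) {
            if (errno == EINTR)
                continue;
            return NULL;
        }
        service_devices(t, pfd + 1, t->nfds);
        if (pfd[0].revents & POLLIN) {
            /* Final non-blocking sweep so input queued before quit isn't lost. */
            if (poll(pfd + 1, t->nfds, 0) > 0)
                service_devices(t, pfd + 1, t->nfds);
            return NULL;
        }
    }
}

static int input_thread_start(struct input_thread *t)
{
    int q[2];
    if (pipe(q) < 0)
        return -1;
    t->quit_rd = q[0];
    t->quit_wr = q[1];
    atomic_init(&t->events, 0);
    return pthread_create(&t->tid, NULL, input_loop, t) == 0 ? 0 : -1;
}

static int input_thread_stop(struct input_thread *t)
{
    if (write(t->quit_wr, "q", 1) != 1)
        return -1;
    return pthread_join(t->tid, NULL) == 0 ? 0 : -1;
}
```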

