now if you've got a pair of headphones... - SIGIO considered harmful
you'd better get 'em on and get 'em cranked up
ajaxxx
SIGIO considered harmful
The X server has a pseudothread for handling input. It's not really a thread, it's something much worse. Recently I got to discover just how much worse.

fcntl(F_SETOWN) on a file descriptor lets you get SIGIO whenever input is available for reading. So, we do that. We don't do threads because this hack predates the wide availability of thread support, but we still want minimal latency in updating the cursor position. It's slightly ugly because, once you've received the signal, there's no way of knowing which file descriptor the signal was about, so we call select() on all the input file descriptors we've got and walk through all of them that appear to be readable. You'd like to avoid this; input is latency-sensitive, so you'd like to do it in as few system calls as possible.

Linux has the additional feature of fcntl(F_SETSIG), which lets you get the file descriptor that generated the signal back as part of the signal information. So I implemented this, with the obvious semantics: receiving a SIGIO would call the handler for just that file descriptor and then return.
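A sketch of that variant, assuming Linux and glibc (F_SETSIG needs _GNU_SOURCE). The handler is installed with SA_SIGINFO and reads only the descriptor named in `si_fd`. `watch_fd_with_siginfo()` and the two counters are hypothetical names used for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* F_SETSIG is a Linux extension */
#endif
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t last_fd = -1;       /* fd named by the last signal */
static volatile sig_atomic_t bytes_read_total;   /* stand-in for event handling */

/* Service only the descriptor the kernel says caused this signal. */
static void sigio_info_handler(int signo, siginfo_t *info, void *ucontext)
{
    int saved_errno = errno;
    char buf[256];
    ssize_t n;

    (void)signo;
    (void)ucontext;
    last_fd = info->si_fd;
    while ((n = read(info->si_fd, buf, sizeof buf)) > 0)
        bytes_read_total += (int)n;
    errno = saved_errno;
}

int watch_fd_with_siginfo(int fd)
{
    struct sigaction sa;
    int flags;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sigio_info_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;

    if (fcntl(fd, F_SETOWN, getpid()) < 0)
        return -1;
    /* Naming the signal explicitly is what makes the kernel fill in si_fd. */
    if (fcntl(fd, F_SETSIG, SIGIO) < 0)
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK) < 0)
        return -1;
    return 0;
}
```

Compared with the select()-based handler this saves a system call per wakeup, but it also means a signal that never arrives leaves its descriptor unread.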

Not long after, people started complaining of weird keyboard repeat bugs. Keys would - rarely, but enough to notice - behave as though they were still pressed when released, until the next time you pressed a key on the same device. After staring at it for a while, I figured it out. SIGIO is a one-shot signal, and its delivery is blocked while a handler for it is already executing. So, if you released a key while you were processing a mouse event, the SIGIO for the key release would never be heard; the kernel would try to deliver it, but see that the mouse handler was already running, and would drop the signal. Since the key would still appear to be pressed from X's perspective, our soft timer for generating key repeats would fire. And fire. And fire... Eventually you'd press another key, and the kernel would queue up a signal for that event, and you'd finally read both the key release from earlier and the new key press.

One subtle thing about this discovery is that this race is there in the old, non-F_SETSIG code too! If the keyboard release event comes in just after the call to select() returns but while still in the signal handler, the exact same thing will happen. But in this case it's much less noticeable, because a subsequent wakeup on any input device will clear things up; the select() call will see both the new input event, and the release event we missed last time. So merely moving the mouse will stop the spurious repeats. Gross.

There would appear to be an obvious solution: use realtime signals. fcntl(F_SETSIG) not only lets you get the notifying fd, it also lets you set the signal to receive. POSIX realtime signals have queued delivery, unlike SIGIO, so if one comes in while another is currently being handled, the kernel will queue it up for you, and deliver the second signal after the first one returns.

There are two problems with realtime signals though. One is that, due to a detail of how Linux event devices work, you get signal notifications way more often than you would expect. You get one for every "event" packet, but the evdev driver doesn't do anything meaningful with them until it hears a synchronization packet. A keyboard release, for example, is three event packets: keycode, scancode, and sync. They get queued up really fast internally though, so the first signal handler call will read all three packets from the device, and then the handler will be called twice more to do nothing. (evdev attempts to minimize the number of times it calls read(), for the same reason we're trying to minimize the number of times we call select(): context switches are expensive, and you never get latency back.) This is more expensive than the plain SIGIO method: you're doing three signal deliveries instead of one, and three read() calls instead of one read() and one select().
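To make the "read all three, then get called twice more for nothing" pattern concrete, here is a sketch of a batch read over `struct input_event` records from `<linux/input.h>`. It is an illustration under assumptions, not the evdev driver's code: `drain_events()` and `set_nonblocking()` are hypothetical helpers, and the descriptor is assumed to be nonblocking.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/input.h>
#include <unistd.h>

/* Put fd into nonblocking mode so a drained device returns EAGAIN. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Read every event currently queued on fd in as few read() calls as
 * possible. Stores the number of packets read in *n_events and returns
 * the number of sync packets among them (complete reports), or -1 on error.
 */
int drain_events(int fd, int *n_events)
{
    struct input_event ev[64];
    int syncs = 0;

    *n_events = 0;
    for (;;) {
        ssize_t n = read(fd, ev, sizeof ev);
        size_t i;

        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN)
                break;             /* nothing left: the common "no-op" call */
            return -1;
        }
        if (n == 0)
            break;
        for (i = 0; i < (size_t)n / sizeof ev[0]; i++) {
            (*n_events)++;
            if (ev[i].type == EV_SYN)
                syncs++;           /* only now is the report actionable */
        }
    }
    return syncs;
}
```

With one signal per packet, the first handler invocation for a key release sees three packets and one sync; the next two invocations call this, find nothing, and return.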

The other problem is that the kernel is only guaranteed to queue so many realtime signals. What happens when this queue limit is exceeded? Well, in this case, the kernel reverts to sending you SIGIO instead of the requested realtime signal. So you have to implement the SIGIO handler anyway. And that's the killer, really. You can't do a SIGIO handler correctly, because there's always a race. Even if you select() in a loop until there are no more ready fds, there's still a gap between the last select() call and returning from the signal handler where an input event can come in without the process being notified. It's small, but it's there.
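The "select() in a loop" fallback might look like the sketch below, with the remaining gap marked in a comment. `drain_until_idle()` is a hypothetical name, and the byte counting stands in for real event processing.

```c
#include <sys/select.h>
#include <unistd.h>

/*
 * Keep polling all fds with a zero timeout until a pass finds nothing
 * to read. Returns the total number of bytes consumed.
 */
long drain_until_idle(const int *fds, int nfds)
{
    long total = 0;

    for (;;) {
        struct timeval zero = { 0, 0 };
        fd_set readable;
        int i, maxfd = -1, progress = 0;

        FD_ZERO(&readable);
        for (i = 0; i < nfds; i++) {
            FD_SET(fds[i], &readable);
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }
        if (select(maxfd + 1, &readable, NULL, NULL, &zero) <= 0)
            break;                 /* nothing ready (or error): stop */

        for (i = 0; i < nfds; i++) {
            char buf[256];
            ssize_t n;

            if (!FD_ISSET(fds[i], &readable))
                continue;
            n = read(fds[i], buf, sizeof buf);
            if (n > 0) {
                total += n;
                progress = 1;
            }
        }
        if (!progress)
            break;                 /* only EOF or errors left: avoid spinning */
    }
    /*
     * The gap: when this runs inside the SIGIO handler, an event that
     * arrives after the last select() above but before the handler
     * returns is the one that can go unnoticed.
     */
    return total;
}
```

Looping shrinks the window but cannot close it, because the final check and the return from the handler are never a single atomic step.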

The moral is: don't use signals. Use a damn thread already.
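For comparison, a sketch of the threaded alternative, assuming POSIX threads and C11 atomics. A dedicated thread sits in poll(), which is level-triggered: data that arrives between a read() and the next poll() simply makes that poll() return at once, so there is no equivalent of the lost wakeup. `struct input_thread` and its functions are hypothetical names, and the byte counter stands in for real event handling.

```c
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

struct input_thread {
    pthread_t tid;
    int fd;            /* device descriptor to watch */
    int stop_fd[2];    /* self-pipe used to ask the thread to exit */
    atomic_long bytes; /* stand-in for real event handling */
};

static void *input_thread_main(void *arg)
{
    struct input_thread *t = arg;
    struct pollfd pfd[2] = {
        { t->fd, POLLIN, 0 },
        { t->stop_fd[0], POLLIN, 0 },
    };
    char buf[256];

    for (;;) {
        if (poll(pfd, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        if (pfd[1].revents)
            break;                          /* asked to stop */
        if (pfd[0].revents & POLLIN) {
            ssize_t n = read(t->fd, buf, sizeof buf);

            if (n > 0)
                atomic_fetch_add(&t->bytes, (long)n);
            else if (n == 0)
                break;                      /* end of file */
        } else if (pfd[0].revents) {
            break;                          /* POLLHUP or POLLERR */
        }
    }
    return NULL;
}

int input_thread_start(struct input_thread *t, int fd)
{
    t->fd = fd;
    atomic_init(&t->bytes, 0);
    if (pipe(t->stop_fd) < 0)
        return -1;
    if (pthread_create(&t->tid, NULL, input_thread_main, t) != 0) {
        close(t->stop_fd[0]);
        close(t->stop_fd[1]);
        return -1;
    }
    return 0;
}

void input_thread_stop(struct input_thread *t)
{
    char wake = 'x';

    while (write(t->stop_fd[1], &wake, 1) < 0 && errno == EINTR)
        ;
    pthread_join(t->tid, NULL);
    close(t->stop_fd[0]);
    close(t->stop_fd[1]);
}
```

The cost moves from signal-safety to ordinary locking: whatever the thread updates (here a single atomic counter) has to be shared safely with the main loop.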

music: skindive - space age lullaby

Comments
From: neillparatzo Date: December 18th, 2009 06:05 pm (UTC)
It's like they started out by mimicking multiplexed interrupts, but never added a way to properly acknowledge them.
From: tfheen Date: December 19th, 2009 01:43 pm (UTC)

sigprocmask

Can't you call sigprocmask to unblock SIGIO before calling select?

Of course, this would mean making the signal handler reentrant, which might be its own can of piranhas.
From: http://www.google.com/profiles/maxim.yegorushkin Date: December 20th, 2009 02:10 am (UTC)

Re: sigprocmask

Or use pselect to avoid the race between the signal handler and select return.
From: ajaxxx Date: December 20th, 2009 07:26 pm (UTC)

Re: sigprocmask

I don't think pselect helps me. I already have SIGIO delivery blocked; I'm not using SA_NODEFER. If I did use SA_NODEFER, then regardless of whether I use pselect or not, I'd need a condition variable somewhere that means "I got SIGIO while already in the handler, there's more to do", and there's always going to be a gap between the last time I check that variable and the call to rt_sigreturn that ends the signal handler. Signals can happen between any two userspace instructions.
From: (Anonymous) Date: December 19th, 2009 06:04 pm (UTC)
Can you turn off SIGIO when starting the handler and turn it back on when leaving (or will the realtime queue get flushed)?
From: ajaxxx Date: December 20th, 2009 07:15 pm (UTC)
Define "turning off". If you mean "block delivery of the signal while the handler is already running", then that's what we already do. And the problem is then the signal semantic. SIGIO is one-shot, so "blocking delivery" means dropping the signal on the floor. Realtime signals queue, so "blocking delivery" means queuing them up in the kernel, which means excess signal delivery and the need to still implement a SIGIO handler anyway.

If you mean "turn off signal delivery at all", then... maybe? I guess you'd turn it off at the start of the handler, then read a bunch, then turn it back on next time through the main loop. But to make that work you'd have to hope that turning on SIGIO delivery with input already queued delivers a signal. (Also, you're adding two more system calls on every input event.)
From: ajaxxx Date: December 20th, 2009 07:55 pm (UTC)
"But to make that work you'd have to hope that turning on SIGIO delivery with input already queued delivers a signal."

Having just checked the source: it doesn't. F_SETOWN merely sets the PID or PGID that will receive the signal, it does not check for queued data at all.
From: (Anonymous) Date: December 19th, 2009 08:51 pm (UTC)

Getting the fd

You can easily get the fd from the signal.
Just use sigaction to set the signal handler, then set the sa_sigaction member, not sa_handler. Then set SA_SIGINFO in sa_flags. Your signal handler will look like:

void signal_callback (int signal, siginfo_t *info, void *ucontext)

Then the fd is available in info->si_fd.

Of course, you still want to use realtime signals to get the queue.

/ Alex

From: ajaxxx Date: December 20th, 2009 07:09 pm (UTC)

Re: Getting the fd

That's what I did, yes. That's what F_SETSIG means. If you don't do F_SETSIG, then even for SIGIO, the si_fd member is not filled in by the kernel.
From: neillparatzo Date: December 24th, 2009 08:10 pm (UTC)
This entry is now Google result #2 for "SIGIO". You've got the juice...
From: farnkerl Date: December 30th, 2009 07:36 pm (UTC)

Related Kernel Bug

I've been having similar keyboard repeat problems for a while now but it's gotten a lot worse now in Fedora 12. The explanation you've given is really good. If you've got a workaround, it might be worth making a mention in the following kernel bug. It doesn't appear they've figured out why it's happening yet:
http://bugzilla.kernel.org/show_bug.cgi?id=9448
From: amyncognito Date: January 13th, 2010 06:47 pm (UTC)

Re: Related Kernel Bug

Making a mention in this kernel bug would appear to be more appropriate:

http://bugzilla.kernel.org/show_bug.cgi?id=9147
From: amyncognito Date: January 13th, 2010 07:04 pm (UTC)

Re: Related Kernel Bug

Sorry! I should have said that a mention in 9147 as well as 9448 would be a good idea.

(I got 9448 mixed up with another, less relevant, bug)
From: amyncognito Date: January 13th, 2010 06:51 pm (UTC)

When did all this start?

[quote]
Linux has the additional feature of fcntl(F_SETSIG), which lets you get the file descriptor that generated the signal back as part of the signal information. So I implemented this, with the obvious semantics: receiving a SIGIO would call the handler for just that file descriptor and then return.

Not long after, people started complaining of weird keyboard repeat bugs.
[/quote]

When exactly did you do this?

The reason I ask is that there have been so many different experiences regarding the keyboard repeat bugs, and for such a long time, that I'd like to do some googling and see if anyone experienced them prior to your implementing fcntl(F_SETSIG) as described.
From: ivansorokin Date: August 11th, 2012 07:54 am (UTC)
Why not use epoll to handle input events in X?
From: ajaxxx Date: August 14th, 2012 03:47 am (UTC)
Well, it's not portable, but whatever, we could abstract that away if we wanted. And select's fd_sets are accidentally part of the driver ABI, but that's fixable.

But the whole point of using SIGIO and/or a thread is to get input events asynchronously of the rest of the main loop, because you want to update cursor position immediately. Switching out select for epoll_wait is simply changing one synchronous event mux call for another. It might make running input from the main loop marginally faster, but running input from the main loop is the whole thing we're trying to not do.
Adam Jackson
User: ajaxxx
Name: Adam Jackson