RHEL6 shipped yesterday. This is the first RHEL release I've seen all the way through from inception to release, and it's definitely been an interesting (and mostly enjoyable!) experience.

I got curious about what exactly I'd been doing in that time:

synephrine:~/xserver% git log -p --author=ajax xorg-server-1.1.1..xorg-server-1.7.7 | diffstat | tail -1
3402 files changed, 55908 insertions(+), 307854 deletions(-)

Not a bad start.

music: girls against boys - the come down

I'm currently fighting the Intel graphics driver on Ironlake (Core i3/i5/i7). It's nice that I have some documentation for it? Except that a typical page looks like this:

[image: a typical page from the documentation]

Strong work guys.

music: placebo - you don't care about us

The X server has a pseudothread for handling input. It's not really a thread, it's something much worse. Recently I got to discover just how much worse.

fcntl(F_SETOWN) on a file descriptor lets you get SIGIO whenever input is available for reading. So, we do that. We don't do threads because this hack predates the wide availability of thread support, but we still want minimal latency in updating the cursor position. It's slightly ugly because, once you've received the signal, there's no way of knowing which file descriptor the signal was about, so we call select() on all the input file descriptors we've got and walk through all of them that appear to be readable. You'd like to avoid this; input is latency-sensitive, so you'd like to do it in as few system calls as possible.
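
A minimal sketch of that arrangement (the names here — `watch_fd`, the fd list, `process_events` — are illustrative, not the server's actual code):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <sys/select.h>
#include <unistd.h>

#ifndef O_ASYNC
#define O_ASYNC 020000          /* aka FASYNC on Linux */
#endif

#define MAX_INPUT_FDS 8
static int input_fds[MAX_INPUT_FDS];
static int num_input_fds;
static volatile sig_atomic_t events_handled;

/* SIGIO says "something is readable" but not what, so the handler has
 * to select() over every input fd and drain each one that's ready. */
static void sigio_handler(int sig)
{
    fd_set readable;
    struct timeval zero = { 0, 0 };
    char buf[256];
    int i;

    FD_ZERO(&readable);
    for (i = 0; i < num_input_fds; i++)
        FD_SET(input_fds[i], &readable);
    if (select(FD_SETSIZE, &readable, NULL, NULL, &zero) <= 0)
        return;
    for (i = 0; i < num_input_fds; i++)
        if (FD_ISSET(input_fds[i], &readable) &&
            read(input_fds[i], buf, sizeof(buf)) > 0)
            events_handled++;   /* process_events(buf) would go here */
}

/* Ask the kernel to send us SIGIO whenever fd becomes readable. */
static int watch_fd(int fd)
{
    if (num_input_fds == MAX_INPUT_FDS)
        return -1;
    input_fds[num_input_fds++] = fd;
    if (fcntl(fd, F_SETOWN, getpid()) < 0)
        return -1;
    return fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_ASYNC);
}
```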

Linux has the additional feature of fcntl(F_SETSIG), which lets you get the file descriptor that generated the signal back as part of the signal information. So I implemented this, with the obvious semantics: receiving a SIGIO would call the handler for just that file descriptor and then return.

Not long after, people started complaining of weird keyboard repeat bugs. Keys would - rarely, but enough to notice - behave as though they were still pressed when released, until the next time you pressed a key on the same device. After staring at it for a while, I figured it out. SIGIO is a one-shot signal, and its delivery is blocked while a handler for it is already executing. So, if you released a key while you were processing a mouse event, the SIGIO for the key release would never be heard; the kernel would try to deliver it, but see that the mouse handler was already running, and would drop the signal. Since the key would still appear to be pressed from X's perspective, our soft timer for generating key repeats would fire. And fire. And fire... Eventually you'd press another key, and the kernel would queue up a signal for that event, and you'd finally read both the key release from earlier and the new key press.

One subtle thing about this discovery is that this race is there in the old, non-F_SETSIG code too! If the keyboard release event comes in just after the call to select() returns but while still in the signal handler, the exact same thing will happen. But in this case it's much less noticeable, because a subsequent wakeup on any input device will clear things up; the select() call will see both the new input event, and the release event we missed last time. So merely moving the mouse will stop the spurious repeats. Gross.

There would appear to be an obvious solution: use realtime signals. fcntl(F_SETSIG) not only lets you get the notifying fd, it also lets you set the signal to receive. POSIX realtime signals have queued delivery, unlike SIGIO, so if one comes in while another is currently being handled, the kernel will queue it up for you, and deliver the second signal after the first one returns.

There are two problems with realtime signals though. One is that, due to a detail of how Linux event devices work, you get signal notifications far more often than you would expect. You get one for every "event" packet, but the evdev driver doesn't do anything meaningful with them until it hears a synchronization packet. A keyboard release, for example, is three event packets: keycode, scancode, and sync. They get queued up really fast internally though, so the first signal handler call will read all three packets from the device, and then be called twice more to do nothing. (evdev attempts to minimize the number of times it calls read(), for the same reason we're trying to minimize the number of times we call select(): context switches are expensive, and you never get latency back.) This is more expensive than the plain SIGIO method: you're doing three signal deliveries instead of one, and three read()s instead of one read() and one select().

The other problem is that the kernel is only guaranteed to queue so many realtime signals. What happens when this queue limit is exceeded? Well, in this case, the kernel reverts to sending you SIGIO instead of the requested realtime signal. So you have to implement the SIGIO handler anyway. And that's the killer, really. You can't write a SIGIO handler correctly, because there's always a race. Even if you select() in a loop until there are no more ready fds, there's still a gap between the last select() call and the return from the signal handler where an input event can come in without the process being notified. It's small, but it's there.

The moral is: don't use signals. Use a damn thread already.
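
The threaded version is straightforward by comparison. A minimal sketch (the `input_loop` structure and the shutdown pipe are inventions for this example, not X server code):

```c
#include <pthread.h>
#include <sys/select.h>
#include <unistd.h>

#define MAX_INPUTS 8

struct input_loop {
    int fds[MAX_INPUTS];
    int nfds;
    int stop_fd;                /* write a byte here to shut the loop down */
    int events_handled;
};

/* One dedicated thread blocks in select() and drains whatever is ready.
 * No signals, no one-shot delivery, no race on the way out of a handler. */
static void *input_thread(void *arg)
{
    struct input_loop *loop = arg;
    char buf[256];

    for (;;) {
        fd_set readable;
        int i, maxfd = loop->stop_fd;

        FD_ZERO(&readable);
        FD_SET(loop->stop_fd, &readable);
        for (i = 0; i < loop->nfds; i++) {
            FD_SET(loop->fds[i], &readable);
            if (loop->fds[i] > maxfd)
                maxfd = loop->fds[i];
        }
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) <= 0)
            continue;
        for (i = 0; i < loop->nfds; i++)
            if (FD_ISSET(loop->fds[i], &readable) &&
                read(loop->fds[i], buf, sizeof(buf)) > 0)
                loop->events_handled++;  /* process_events(buf) here */
        if (FD_ISSET(loop->stop_fd, &readable))
            return NULL;
    }
}
```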

music: skindive - space age lullaby

The Portland State University machine room - host of annarchy, aka people.freedesktop.org and the storage for personal git repos - had a rather catastrophic series of power failures yesterday. As in, both of the independent, UPS-backed power rails to the room failed simultaneously, multiple times. As best we can tell, one of those failures was during ext3 journal recovery of /home from one of the previous failures. fsck was utterly, utterly unable to cope with this.

So, /home got a reformat. /home on that machine was not backed up (and never has been advertised as permanent storage), so anything you had there is lost. Git repos can be restored from your own clones, of course.

music: kmfdm - diy

I see Intel is continuing their excellent support story for Poulsbo.  I wonder just how much goodwill they're willing to lose over this chip.

music: download - affirmed

Monitor plug and play works by storing a descriptor called EDID in the monitor. The most important thing in this descriptor is the list of modes that the monitor supports. You would think this would be a fairly straightforward thing, but it's actually not, because you've only got 128 bytes to work with. So there end up being multiple ways to encode a mode in EDID:
  • the Established Timings fields I and II and Manufacturer's Timings, which is a three-byte bitmap of 17 industry-standard timings, plus seven bits of "manufacturer's mask" which could in principle be more modes but no manufacturer publishes what they mean
  • the Standard Timings field, which is an array of eight two-byte codes, thus: eight bits for width in multiples of 8 with 0x01 meaning 256; two bits for image aspect ratio (with one of the combinations overloaded to mean 16:10 or 1:1 depending on the spec revision); six bits for field refresh rate minus 60
  • Up to four of various 18-byte detailed monitor descriptors, including:
    • Detailed Timing descriptors, which encode more or less every timing parameter including sync intervals and polarity, borders, interlacing, stereo, star sign, etc.
    • Display Range Limit descriptors, which can describe sync range intervals and optionally mark them as supporting the GTF or CVT timing formulas, implying that any mode conforming to those sync ranges and timing formulas is supported
    • arrays of up to six Standard Timing identifiers, encoded as above
    • arrays of up to four 3-byte CVT descriptors, which encode twelve bits for image height, two bits for aspect ratio, a two-bit enum for preferred refresh rate, and a five-bit mask for supported refresh rates
    • the Established Timings III descriptor, which comprises 44 bits of yet more industry-standard timings as published in the DMT spec, and 52 unused bits set to zero
    • potentially any of sixteen manufacturer-specific detailed blocks, none of which are documented, any of which could contain mode support information
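
As a concrete example of how compact (and fiddly) these encodings are, here's how the Standard Timings two-byte code unpacks. `decode_standard_timing` is a made-up helper for illustration, and the aspect-ratio table follows the EDID 1.3 reading of the 16:10 code:

```c
#include <stdint.h>

/* Decode one 2-byte EDID Standard Timing code (hypothetical helper).
 * Returns 0 on success, -1 if the slot is unused. */
static int decode_standard_timing(uint8_t b0, uint8_t b1,
                                  int *width, int *height, int *refresh)
{
    if (b0 == 0x01 && b1 == 0x01)
        return -1;                      /* 0x0101 marks an unused slot */
    *width = (b0 + 31) * 8;             /* 8 bits: width/8 - 31, so 0x01 = 256 */
    *refresh = (b1 & 0x3f) + 60;        /* low 6 bits: refresh rate - 60 */
    switch (b1 >> 6) {                  /* top 2 bits: image aspect ratio */
    case 0: *height = *width * 10 / 16; break;  /* 16:10 (1:1 pre-1.3) */
    case 1: *height = *width * 3 / 4;   break;  /* 4:3 */
    case 2: *height = *width * 4 / 5;   break;  /* 5:4 */
    case 3: *height = *width * 9 / 16;  break;  /* 16:9 */
    }
    return 0;
}
```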
Not content with the restrictive expressive power of base EDID, the industry introduced an extension mechanism.  You can have up to 255 128-byte extension blocks, including:
  • the Consumer Electronics Association extension, which lets you specify:
    • Arbitrary numbers of Short Video Descriptors, which are one-byte indices into a list of timings in the CEA spec
    • Arbitrary numbers of detailed timing blocks, encoded as in base EDID
  • the Video Timing Block extension, which lets you specify:
    • Zero to six detailed timing blocks, encoded as in base EDID
    • Zero to forty CVT 3-byte descriptors, encoded almost as in base EDID except with fewer legal aspect ratios
    • Zero to sixty-one Standard Timing descriptors, encoded as in base EDID
  • the Device Information extension, which lets you specify:
    • A bit under the Miscellaneous Display Capabilities section for "VGA/DOS Legacy Timing Modes" support, which is a predefined set of 24 low-resolution modes you remember from EGA
  • Manufacturer-specific extension blocks, which are naturally undocumented and could contain timing info
Naturally, the display industry found this lack of flexibility entirely unacceptable.  So, soon and very soon, you will also begin to see displays that conform to the newer DisplayID specification.  Fortunately, to minimize time to market and maximize software reuse, DisplayID timing descriptors are entirely compatible with EDID:
  • Type I detailed timings, which are encoded exactly like the detailed timing descriptors in EDID, except for using an additional byte for pixel clock, moving the stereo and interlace bits, adding an aspect ratio field, eliminating borders, and widening the storage for all the horizontal and vertical timing numbers
  • Type II detailed timings, which are encoded exactly like the detailed timing descriptors in EDID, except for being only 11 bytes long, using an additional byte for pixel clock, moving the stereo and interlace bits, eliminating borders, storing all horizontal timing parameters in terms of eight-pixel character cells, and widening the storage for vertical timing numbers
  • Type III short timings, which encode CVT timings in three bytes exactly like CVT 3-byte descriptors in base EDID, but do it as a bit for preferred-timing-or-not, a bit for reduced-blanking-or-not, four bits for aspect ratio, eight bits for horizontal image size in eight-pixel character cells, a bit for interlaced-or-not, and seven bits for refresh rate
  • Type IV timing codes, which are one-byte indices into the DMT timing list, unlike any EDID encoding method
  • Supported VESA Timings data blocks, which are ten-byte bitfields corresponding to various modes in DMT, but not in the order that they're in the DMT spec, nor in the same order as the Established Timings I through III in base EDID
  • Supported CEA Timings data blocks, which are seven-byte bitfields corresponding (shockingly) to a superset of the CEA mode list in the most recent version of the spec I have handy, and even in the same order as in the spec
  • Display Range Limit descriptors, which are exactly like the encoding in base EDID, except for widening the storage for pixel clock, shrinking the storage for horizontal and vertical sync limits, adding maxima for horizontal and vertical blanking, and removing support for GTF
  • CEA-defined blocks, which are presumably in a newer version of the spec than I have
  • Manufacturer-specific data blocks
Did I say compatible?  That other thing.

So the other great part about DisplayID is that it requires host software updates to work, because - afaict - it's published at the same I2C address as EDID was.  This is kind of okay for laptops, since you know what software is shipping on them.  It's less okay for standalone displays, because now it raises the very real possibility of being able to buy a monitor that is too new to work with your old operating system.

Even better, drivers using VESA BIOS Extensions (which admittedly are kind of boned already) are about to be left out in the cold.  The VBE spec only defines a call to get EDID, and no legal DisplayID block can possibly be confused for an EDID block, so if your video BIOS actually checks for the EDID block header, there's no way to get it to return DisplayID anyway.  You now have a combination of video card, cable, and monitor, that can not be used together with VBE.  For people running the unfortunate sort of OS you have to pay for, there's the other kind of connector conspiracy too, where you probably can't get a combination of (native driver that supports DisplayID) and (operating system with native driver support) for your hardware either.

Thanks VESA.

music: reverend horton heat - bales of cocaine

I really am stunned that anyone thinks this is a worthwhile use of time.

Stunned is one word for it. Dismayed is another. Take your pick.

music: rob zombie - dragula

I've been poking at (tearing my hair out over) DisplayPort support for the various graphics chips under Linux. Rather than post something here that would soon be out of date, I made a status page on the wiki. The executive summary right now is No It Doesn't Work, but that should change soon.

DP actually promises to be a pretty cool technology. Compared to DVI it seems a bit overengineered, but that's sort of like comparing a car to a skateboard. Given the per-port licensing tax on HDMI, I think DP's eventual dominance is inevitable, but HDMI is pretty entrenched so that won't be any time soon. We'll just have to deal with the connector conspiracy until HDMI dies.

music: depeche mode - wrong

I realized while at the show last night that I've been going to see Local H perform for over ten years.

No real story there or anything, but it was kind of pleasant to note, given how my life feels like it tends towards the transient. Four years is an eternity. Ten is incomprehensible. Maybe some things last after all.

music: local h - cynic

So I had this code that did essentially:
for (int i = 0; i < num_files; i++)
    if ((fd = open(files[i]->name, O_RDONLY)) >= 0)
        do_stuff(fd);
And I figured, hey, I've got an I/O scheduler and my disks have command queueing now, let's mitigate some of that seek death and do multiple stuffs in parallel. Well, here in ***america*** where fork() is cheap you'd do:
for (int i = 0; i < num_files; i++) {
    if (num_kids == MAX_KIDS)
        wait(NULL), num_kids--;         /* block until a child exits */
    if ((fd = open(files[i]->name, O_RDONLY)) >= 0) {
        if (!fork()) {
            do_stuff(fd); _exit(0);     /* child */
        } else {
            num_kids++;
        }
    }
    while (waitpid(-1, NULL, WNOHANG) > 0)
        num_kids--;                     /* reap finished children */
}
while (num_kids--)
    wait(NULL);
And then it's off to the bar. For some reason I thought that maybe doing it in pthreads would be a good exercise, and tried it that way first. (Actually I tried POSIX AIO zeroth. Ha! Good one, guys. You almost got me.) That went something like:
for (int i = 0; i < num_files; i++) {
    if ((fd = open(files[i]->name, O_RDONLY)) >= 0)
        pthread_create(&thread[min_thread_index], NULL, do_stuff, (void *)(intptr_t)fd);
    /* hmm, how to throttle thread creation? */
    pthread_join(thread[i % MAX_KIDS], NULL); /* bonghits in the hood */
    /* and then reset min_thread_index somehow? ugh */
}
Except that doesn't work, because you can only join with named threads, and not just whichever one happens to die first. I have no idea how long each of these threads is actually going to take, so round-robin is a waste of time. I could stick in a semaphore that every thread downs on init and ups on termination, and block the dispatch loop on that, but that's still not enough, since then I still need to figure out - somehow - which thread died so I can reuse its slot in thread[]. Why pthread_create() doesn't just return the new pthread_t is beyond me.
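
For what it's worth, one way past the slot-reuse problem is to never join at all: create the workers detached, and let the semaphore do all of the bookkeeping. A sketch (mine, not code from the program in question; the `job` struct stands in for the fd and the work):

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

#define MAX_KIDS 4

static sem_t slots;             /* counts free worker slots */

struct job { int *counter; };   /* stand-in for the fd + work to do */

static void *worker(void *arg)
{
    struct job *j = arg;

    __sync_fetch_and_add(j->counter, 1);  /* do_stuff(fd) would go here */
    free(j);
    sem_post(&slots);           /* free our slot; nobody has to join us */
    return NULL;
}

static int dispatch(int *counter)
{
    pthread_t t;
    pthread_attr_t attr;
    struct job *j = malloc(sizeof(*j));

    if (!j)
        return -1;
    j->counter = counter;
    sem_wait(&slots);           /* throttles us to MAX_KIDS workers */
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (pthread_create(&t, &attr, worker, j)) {
        sem_post(&slots);
        free(j);
        return -1;
    }
    pthread_attr_destroy(&attr);
    return 0;
}
```

When the dispatch loop finishes, reacquiring every slot proves all the workers have exited, which replaces the final waitpid() loop from the fork version.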

Of course ideally we'd have waitfd() that was smart enough to cope with thread IDs too, and you'd message the thread ID back out to the master process somehow, and you'd still have to allocate your own pthread_t's anyway. Hooray for unix.

music: nine inch nails - gave up

I find it entirely appropriate that the ACPI name for the embedded controller in a Sony Vaio would be H8EC.

music: junkie xl - more

There's a new kind of toy that keeps showing up at the office. Laptop vendors are really starting to push multiple-GPU machines. The theory goes that you can use one low-power GPU while you're on battery, and a high-performance GPU when you're plugged into wall power. I've played with a couple of these so far and people are starting to ask about them, so I figured I'd write down what I know and what I think the plans are.

The best case is that you get a laptop like the Lenovo Thinkpad W500, where the BIOS has options controlling the GPU setup. You can pick GPU A, or GPU B, or switchable graphics. If you pick one or the other, that's all that will show up on the PCI bus, and so X will pick it up and run with it. Hooray!

If you pick switchable graphics, we'll see two GPUs on the PCI bus. And now, things get tricky. Which one is active? Well, you could look at the VGA routing bits in the PCI configuration and attempt to figure out which GPU the BIOS enabled. But on the above-mentioned Lenovo, that doesn't work: VGA is just not routed anywhere. Maybe you could look to see whether only one device has memory and I/O decoding enabled, but again, that doesn't seem to be reliable, and what do you do if there's more than one?

Ideally this is where ACPI would come to our rescue, and there'd be some platform method to call to tell us which ASIC to talk to. Maybe there is, but we haven't found it yet. Neither do we seem to get any unexpected ACPI events when switching from battery power to wall power. The Lenovo has an option in the BIOS that claims to automatically detect whether the OS supports GPU switching, but it doesn't seem to be reliable, in that if I turn detection on, I still see both chips on the PCI bus. Nonetheless this does indicate that there's some platform support there somewhere and we just need to look harder.

Of course, if you're unlucky, you got a machine like the Sony Vaio Z540, where the BIOS has no GPU options, period. If you end up in a situation where you see two video devices on PCI, just write out a minimal xorg.conf that picks the driver and the PCI slot, and hopefully things will work. If not, you have two pieces, and you can keep them or not.
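
Something like this, where the Driver and BusID are whatever your second chip actually is (the values here are placeholders; get the slot from lspci):

```
Section "Device"
    Identifier "wired-down-gpu"
    Driver     "radeon"            # or "intel", "nouveau", ... your chip here
    BusID      "PCI:1:0:0"         # from lspci, in bus:device:function form
EndSection
```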

Anyway, that gets you as far as doing the same boring single GPU stuff you've always done. As far as runtime switching goes, we're still pretty far off from making that a reality in the open drivers. We could hack up the relevant drivers such that we initialize them both but just only feed commands to one or the other, and then write the serialization exercise to move pixmaps and such from one to the other on switch events. Or, we might be able to start one X server on each GPU and then stick an Xdmx in front of them. In neither case will GLX work the way you expect (if at all), and there will be all kinds of fun corner cases trying to get the second chip to come up exactly compatibly with the first.

Getting this to work well should actually be a lot of fun, and there's lots of opportunity to sweep away old bad design and come up with something good.

In tangentially related news: my LCA talk was accepted! I'll be talking about shatter, which is a project to rewrite the X rendering layer to work around various hardware and coordinate limitations. This is not unrelated to the above problem, hopefully we'll get rendering abstracted far enough away from the driver to make it easier to switch among drivers at runtime.

music: local h - michelle (again)

The Fedora 10 name elections are open. My totally awesome name suggestion for F9 (Bathysphere) lost by a slim margin, so now I'm lobbying harder for decent names. So here's what you should vote for:
  • Whiskey Run, because whiskey is awesome, and the potential for a whiskeyrunner theme in the artwork is excellent.

  • Saltpetre, because it's a double-whammy connection with Sulphur (British variant spelling, and component of gunpowder), and gunpowder -> steampunk art -> awesome.

  • Farnsworth, because Futurama.

  • Cambridge, because Red Hat Linux 10 would have been Cambridge, and we finally made it to version 10 of something.

Every time you vote for something else, God detonates a kitten.

music: dethklok - awaken

<airlied> is a blur-facist, someone who doesn't like Damon Albarn, or someone who thinks Damon Albarn is the new Hitler..

music: system of a down - kill rock 'n roll

The Symbian Foundation: Because you didn't have enough bad free software already.

music: hybrid - higher than a skyscraper

I tire of people objecting to using git on the basis of bad documentation and UI obscurity. There seems to be a common blind spot here, which is that CVS (which everyone learned first) is just as bad, if not worse. The CVS manual has been getting steadily worse over time, and I challenge anyone reading about cvs up -j for the first time to correctly explain it, let alone sticky tags or binary file handling.

More fundamentally, I think people forget that source control is a very weird and very hard problem. Much like the lower levels of particle physics, you kind of have to implement the behaviour you expect using a model that's not intuitive. That the doc writers for git seem to revel in spewing nerdporn words like "tree-ish" at you is unfortunate, but people like seeing dmesg spew at boot for the same reason: it's obscure and therefore makes you feel cool.

Of course, no one needs to know how pretty much any vcs works under the skin on a daily basis, and that's why sensible git documentation exists. HELLO PEOPLE THE INTERNET HAS SEARCH NOW.

music: front line assembly - future fail
