Yay, long periods without updates

So, work being work, I haven’t had much time to do stuff. This post is mostly just a reminder that projects are being maintained.

There have been a few minor updates to prismscript: a few of the math.random functions previously fell into recursion loops, which have been fixed; regular expression support was added; and some types and namespaces were extended. All of these changes are backwards-compatible.

pystrix has seen a few new features added and a couple of internal tweaks for compatibility with changes in Asterisk, though everything remains 1.10.x-compatible. I’ve also been working with Asterisk, as time permits (sadly, not often enough), to improve its conferencing module.

Lastly, I’ve started work on a new game built on libgdx, intended to be playable (if not mostly complete) in time for the Ouya launch, but meant to be tablet-and-phone-friendly, too. That’s kinda back-burner, really only getting time while I’m in transit to and from the office, but progress is promising.

I’m still watching for feature-requests and bug notifications, so submit ’em if you’ve got ’em.

staticDHCPd 1.5.7

staticDHCPd 1.5.7 is now available. It is identical to 1.5.6, except that PXE requests now work even if you have your network configured to use relays only: PXE requests are strictly unicast and, in many cases, omit gateway information. Chances are this will affect very few people, but it’s understandably confusing for those it does.

staticDHCPd 1.5.6

Okay, so I’m a terrible liar, because I’ve had no time to really do anything related to project-work, but have staticDHCPd 1.5.6 anyway. It should now really do PXE, not break when setting complex RFC options, time out when sending e-mail to unreachable servers, and generally do other things more goodishly.

No, you won’t need to update your config files, since missing values will be set to sane defaults, but you should read the new one to see what the ‘pxe’ parameter has become. Using ‘if pxe’ like before, even though it didn’t work, will not result in any breakages, though.

Python threads that mysteriously appear to stop executing

I just solved a weird problem that, once understood, actually makes a lot of sense, but would probably be pretty hard to identify without a lot of guesswork.

My scenario, simplified:

  • One thread that runs in an infinite loop, polling a C-implemented function (from Cython), with a five-second timeout, to populate a queue
  • Any number of worker threads that block on the queue (timeout=7.5s) to get events to process

Now, this should seem like a fairly straightforward thing: a handful of threads, each capable of running in isolation, except for a common dependency on a threadsafe queue. The problem, however, is that the worker threads all eventually seemed to freeze, doing nothing while the infinite-looping thread ran fine.
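
The layout described above can be sketched in plain Python. This is only an illustration of the shape of the design, not the real code: poll_events is a hypothetical stand-in for the C-implemented polling function, and the other names (poller, worker, run_demo) are made up for the example. Only the five-second and 7.5-second timeouts come from the scenario, and the demo uses much shorter ones so it finishes quickly.

```python
import queue
import threading
import time

POLL_TIMEOUT = 5.0    # seconds the poller waits on the C function (from the scenario)
WORKER_TIMEOUT = 7.5  # seconds a worker blocks on the queue (from the scenario)


def poll_events(timeout):
    """Hypothetical stand-in for the C-implemented polling function."""
    time.sleep(min(timeout, 0.01))  # pretend to wait for something to happen
    return ["event"]


def poller(events, stop, poll=poll_events, timeout=POLL_TIMEOUT):
    """Loop until told to stop, pushing whatever the poll returns onto the queue."""
    while not stop.is_set():
        for event in poll(timeout):
            events.put(event)


def worker(events, stop, handled, timeout=WORKER_TIMEOUT):
    """Block on the queue; on timeout, loop around and check the stop flag again."""
    while not stop.is_set():
        try:
            event = events.get(timeout=timeout)
        except queue.Empty:
            continue
        handled.append(event)  # "process" the event


def run_demo(worker_count=3, duration=0.2):
    """Run one poller and a few workers briefly; return the handled events."""
    events = queue.Queue()
    stop = threading.Event()
    handled = []
    threads = [threading.Thread(target=poller, args=(events, stop))]
    threads += [
        # Short worker timeout so the demo shuts down promptly.
        threading.Thread(target=worker, args=(events, stop, handled, 0.05))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    time.sleep(duration)
    stop.set()
    for thread in threads:
        thread.join()
    return handled


if __name__ == "__main__":
    print(len(run_demo()), "events handled")
```

Because the stand-in poll here sleeps via time.sleep, which lets other threads run while it waits, this version behaves the way the design was expected to.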

Symptoms: every thread could still be enumerated, printouts confirmed that the threads were, indeed, alive, and the freezing seemed to be related to the logging module.

After commenting out every logging statement in the threads, the problem persisted, so they weren’t the issue. After that, I tried replacing queue.get() with a simple time.sleep(7.5) to see if the threads were still operating and the queue was at fault. The same behaviour occurred, with threads freezing when they slept. This implied that the problem was related to blocking.

It wasn’t until I started pinging someone uninvolved as a sounding board that the pattern started to make sense: the threads may not be reacquiring the GIL, so they might not ever be able to resume, even after they’re supposed to wake up. I tried waiting for ten minutes and, sure enough, one of the threads showed signs of life.

The problem was that my C polling function never released the GIL, so, as far as Python was concerned, the entire five-second timeout window was one big instruction. Instead of the other threads getting to run during those extended I/O delays, every one of them was stuck waiting for each poll to complete, and, with the default 100-instruction context-switch interval, the process was taking forever.
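
A rough way to see the effect without writing any C is with ctypes, assuming a POSIX system and Python 3: ctypes.PyDLL calls foreign functions with the GIL held, while ctypes.CDLL releases it for the duration of the call. The sketch below is not the Cython code from the scenario; it just calls libc’s sleep() both ways and counts how often a background thread gets to run in the meantime. ticks_during is a name made up for this example.

```python
import ctypes
import threading
import time

# Assumption: POSIX. Passing None loads the symbols already linked into the
# running process, which includes libc's sleep().
holds_gil = ctypes.PyDLL(None)    # calls are made with the GIL held
releases_gil = ctypes.CDLL(None)  # the GIL is released around each call


def ticks_during(blocking_call, tick=0.05):
    """Count how often a background thread runs while blocking_call executes."""
    ticks = 0
    stop = threading.Event()

    def ticker():
        nonlocal ticks
        while not stop.is_set():
            ticks += 1
            time.sleep(tick)

    thread = threading.Thread(target=ticker)
    thread.start()
    time.sleep(tick)  # give the ticker a moment to get going
    before = ticks
    blocking_call()
    during = ticks - before
    stop.set()
    thread.join()
    return during


if __name__ == "__main__":
    # Expect close to zero ticks in the first case and many in the second.
    print("GIL held:    ", ticks_during(lambda: holds_gil.sleep(1)))
    print("GIL released:", ticks_during(lambda: releases_gil.sleep(1)))
```

In Cython, the corresponding fix is to make the blocking C call inside a `with nogil:` block, with the C function declared nogil, so that other threads can run while it waits.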

Simple to fix, but really, really hard to diagnose when just looking at the obvious symptoms. Hopefully, anyone who reads this will jump to a conclusion faster than I did, since it’s the sort of issue that can be really frustrating in what seems like a common design.

staticDHCPd 1.5.4

Yeah, I skipped 1.5.2 and 1.5.3 and went straight to 1.5.4. I didn’t want to push out three versions within the span of two days (the others are tagged in Subversion if you really want them on their own).

staticDHCPd 1.5.4 is now available for download, obviously.

1.5.2 fixes an issue with Cisco relays not liking packets sent from ports other than 67 and adds support for Oracle (thanks to Matthew Boedicker for making both of these possible). 1.5.3 makes PXE booting work when the client insists on using 4011 and you don’t want to set up another server (and it makes it easier to process client-vendor options). 1.5.4 makes upgrading from older versions a very simple process.

Full details, as always, are in the changelog.

Feelin’… networky

As I get things set up to start working on DHCPv6 support for staticDHCPd, I find that I now have five distinct subnets in a sub-1000sqft space. Yes, they all have a purpose (and will continue to exist long after 2.0 is out, ’cause I like my resources clearly defined by boundaries), yet I can’t help but feel that I’m doing something just a little unusual.