Brief update -- PyCI on a real RouterStation


riskable
07-31-2009, 02:26 PM
I just thought I'd post an update regarding something rather pertinent: I finally got my hands on a RouterStation and loaded up PyCI on it moments after I had it up and running...

It didn't work :(
...at all. Period. It appears that there's something broken with the math libraries or some math part of the Linux kernel on the image that ships with the RouterStation that causes Python to completely crap out on anything more complex than 1+1.

So I spent some time building and loading a firmware image from the latest OpenWRT SVN repository... That fixed the math problem! PyCI was now running on an actual RouterStation!

INCREDIBLY SLOWLY

Yes, it was PAINFUL to sit there while it took 30+ seconds to load the login page. Ouch! However, being the resourceful guy that I am I tried a few things to see if I could figure out what was slowing it down...

Was it the number of threads? No. Increasing the thread count didn't help; it was still ridiculously slow.
Was it Python? No. Benchmarking Python revealed that it was plenty fast enough.
Was it CherryPy on MIPS? Not that either. It seemed to run everything ELSE just fine.
Was it the compression I was using? YES! It turns out that the flash filesystem on a RouterStation is just too slow at reading/decompressing gzipped tarballs.

So I changed the code and the build script to just gzip each file individually (skipping the tarball part) to see if that had an impact... That fixed it (!) and it gave me a chance to try something I've had on my TODO list: Would it be possible to serve the gzip-compressed files to a browser without first decompressing them?
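For anyone wanting to try the same build-time change, here is a minimal sketch of "gzip each file individually" in Python. It is not the actual PyCI build script (that isn't shown in this thread); the function name `gzip_tree` and the `.gz` suffix convention are assumptions made for illustration.

```python
import gzip
import shutil
from pathlib import Path


def gzip_tree(src_dir, dest_dir):
    """Write one .gz file per source file, mirroring the directory layout.

    Hypothetical helper: dest_dir should live outside src_dir so the
    output is not picked up as input.
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    written = []
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file():
            continue
        # e.g. static/js/app.js -> <dest>/static/js/app.js.gz
        dest = dest_dir / (str(src.relative_to(src_dir)) + ".gz")
        dest.parent.mkdir(parents=True, exist_ok=True)
        with open(src, "rb") as fin, gzip.open(dest, "wb", compresslevel=9) as fout:
            shutil.copyfileobj(fin, fout)
        written.append(dest)
    return written
```

Each file can then be opened on its own, so reading one page asset no longer means walking through a whole compressed tarball.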

After about 10 minutes of trying various things I had it working: The files are just served up pre-compressed to the web browser--skipping the decompression part completely (and everything displays fine since every browser supports gzip these days--even IE6)! How did it impact performance? Here's a rundown on what I did to speed everything up:

1) Switching from tarballs to individually gzip-compressed files cut the average page load time from 30+ seconds to about 8 seconds. MUCH better but still kinda slow. This increased the size of PyCI by about 100k but that isn't enough to be a big deal. There's still plenty of room left on the RS (4.5MB free as I write this with many extras installed).

2) Switching from decompressing the gzip files to sending them as-is brought the average page load time down to about 2-3 seconds. That's pretty good! Some complicated AJAX pages still take 5-7 seconds and there's one page that takes 10 seconds but I can fix it (just need to have the page do a bit more on-demand loading of things hidden from view when the page loads). Regardless, the pages that take longer still load quite fast--it just takes an extra second or two to finish loading things like extra informational panels (which, from a usability perspective, is precisely how it should work).
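The "send them as-is" trick in 2) boils down to handing the browser the stored gzip bytes along with a `Content-Encoding: gzip` header. Below is a rough sketch of that idea as a plain WSGI app. It is not PyCI's CherryPy code; `make_gzip_static_app` and the `<name>.gz` on-disk layout are assumptions for illustration only.

```python
import mimetypes
from pathlib import Path


def make_gzip_static_app(root):
    """Return a WSGI app that serves '<root>/<path>.gz' without decompressing it."""
    root = Path(root).resolve()

    def app(environ, start_response):
        rel = environ.get("PATH_INFO", "/").lstrip("/")
        target = (root / (rel + ".gz")).resolve()
        # Refuse paths that escape the root, and files that do not exist.
        if root not in target.parents or not target.is_file():
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        # The post relies on every browser accepting gzip; this sketch checks anyway.
        if "gzip" not in environ.get("HTTP_ACCEPT_ENCODING", ""):
            start_response("406 Not Acceptable", [("Content-Type", "text/plain")])
            return [b"gzip required"]
        body = target.read_bytes()  # still compressed; sent untouched
        # Guess the type from the original name (without the .gz suffix).
        ctype = mimetypes.guess_type(rel)[0] or "application/octet-stream"
        start_response("200 OK", [
            ("Content-Type", ctype),
            ("Content-Encoding", "gzip"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app
```

The router never spends CPU on decompression here; the browser does that work, and fewer bytes go over the wire.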

Testing shows that just about every page displays the relevant/important pieces in under 2 seconds so I'm pretty happy with the performance. Of course, things can be tweaked further :) .

So I went from freaking out (PyCI doesn't work!) to being pleasantly surprised. I wish I had time to post a new screencast since I've added like 90% of the remaining required features.

I just finished the speed test requirement and I really like how it turned out!

MaximumISP
08-01-2009, 05:22 AM
Thanks for keeping us updated
best of luck in the dash for cash

riskable
08-01-2009, 11:48 AM
I just thought I'd post some other things I noticed about PyCI running on the RouterStation...

* The memory usage is much lower on the RouterStation than on x86. I knew it would be better, but not by this much. I'm not sure why, but when I run a QEMU instance of OpenWRT-x86 with 64MB of RAM, PyCI takes up about 42% of available memory (as displayed by the 'top' command). That is with 5 threads. On an actual RouterStation PyCI only takes up 25% of available memory. Pumping it up to 10 threads only increases the memory footprint to 26%, whereas on x86 going to 10 threads pushed it to 52% memory utilization. Anyone know why x86 is so memory inefficient?

Since 10 threads is much faster than 5 on pages that do a lot of AJAX calls I'm going to release PyCI with a default of 10 threads. If people want that extra 1% memory available to them they can change it on-the-fly via the Admin plugin. Overall I'm very pleased with the performance at this point so I'm not going to fiddle with it anymore (too many other things to finish in the next 30 days). Post-contest there will be plenty of time to make things faster.

For reference, PyCI running on Circuits.web instead of CherryPy uses up a lot less memory. I haven't done a full port yet (only old versions of the Network and Admin plugins work--mostly =) but post-contest I suspect this is going to be one of my first tasks. I should be able to get PyCI down to using less than 10% of the 64MB. Porting PyCI plugins from CherryPy to Circuits.web is fairly trivial but it is tedious and time-consuming.

* The mkfwimage command included with OpenWRT doesn't let me make RS-compatible firmware images over 6 megabytes in size. This means that I can't make an image with Python and PyCI pre-installed. Anyone know of a workaround or a way to fix it?

riskable
08-05-2009, 07:40 AM
I finally got PyCI included in a firmware image and flashed it to my RouterStation. Here's the output of df:

root@OpenWrt:/usr/PyCI# df -h
Filesystem           Size    Used  Available  Use%  Mounted on
/dev/root            8.9M    8.9M          0  100%  /rom
tmpfs               30.3M    1.1M      29.2M    4%  /tmp
tmpfs              512.0K       0     512.0K    0%  /dev
mini_fo:/tmp/root    8.9M    8.9M          0  100%  /tmp/root
/dev/mtdblock3       6.0M    1.0M       5.0M   17%  /jffs
mini_fo:/jffs        8.9M    8.9M          0  100%  /

Having PyCI in the squashfs partition (ROM) appears to save about 300k. The actual image file is about 10MB:

$ ls -lh PyCI-openwrt-ubnt-rs-squashfs.bin
-rw-r--r-- 1 riskable riskable 9.9M 2009-08-05 10:36 PyCI-openwrt-ubnt-rs-squashfs.bin

Arx
08-06-2009, 06:20 PM
riskable wrote: "Would it be possible to serve the gzip-compressed files to a browser without first decompressing them?"

I guess you can rely on filesystem compression - both JFFS2 and squashfs are already compressed. And make sure to update to the beta version. The default factory version comes with a bad bootloader version which underpowers RAM and CPU, and a recent OpenWRT version is a good idea as well. With the default firmware and bootloader, performance is quite poor.

(De)compression speed depends quite a lot on RAM speed and CPU cache setup. Unfortunately, both are configured poorly with the stock firmware and boot loader... :(.

So you may want to try the beta (this should update your boot loader to a better one) and use recent revisions of OpenWRT (if not already done).

P.S. Wasting RAM is a bad idea - there could be a dozen networked processes running besides the configuration UI, there could be NAT with a large connection tracking table, and an OOM condition is a thing which really suxx. So you really do not want to trigger it...

riskable
08-09-2009, 11:35 AM
Arx wrote: "I guess you can rely on filesystem compression - both JFFS2 and squashfs are already compressed. And make sure to update to the beta version. The default factory version comes with a bad bootloader version which underpowers RAM and CPU, and a recent OpenWRT version is a good idea as well. With the default firmware and bootloader, performance is quite poor. (De)compression speed depends quite a lot on RAM speed and CPU cache setup. Unfortunately, both are configured poorly with the stock firmware and boot loader... :(."

Using gzip compression (generally) saves more space than jffs2 and squashfs while at the same time speeding up delivery of the files to the browser. Not only are the files read from the filesystem faster, they are also delivered to the browser faster. Essentially, it saves both file size and bandwidth.

Arx wrote: "So you may want to try the beta (this should update your boot loader to a better one) and use recent revisions of OpenWRT (if not already done)."

My current PyCI firmware does use the latest SVN release of OpenWRT. You're right: It is a lot faster (probably the memory speed bug was fixed?).

Arx wrote: "P.S. Wasting RAM is a bad idea - there could be a dozen networked processes running besides the configuration UI, there could be NAT with a large connection tracking table, and an OOM condition is a thing which really suxx. So you really do not want to trigger it..."

I'm not sure about "a dozen networked processes" but NAT shouldn't be a problem. Right now, without any optimizations, PyCI is hovering at around 27% of RAM according to 'top'. That leaves somewhere around 44,962,938 bytes free for connection tracking (not counting the other default stuff which uses up very little memory). Each tracked connection requires 350 bytes of non-swappable kernel memory. So that leaves >120,000 tracked connections free while running PyCI. Or to put it another way, PyCI is currently using up the equivalent of 51,769 tracked connections.
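The connection-tracking arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the 350 bytes per connection and the 27% figure come straight from the post, 64MB is assumed to mean 64 * 1024 * 1024 bytes, and `conntrack_capacity` is a made-up helper name. Note that the quoted 44,962,938 bytes works out to about 67% of 64MB rather than the 73% that a 27% footprint would leave, so the sketch prints both.

```python
TOTAL_RAM = 64 * 1024 * 1024   # assumption: "64MB" means 64 MiB
BYTES_PER_CONN = 350           # per tracked connection, as stated in the post
PYCI_SHARE = 0.27              # PyCI's share of RAM according to 'top'


def conntrack_capacity(free_bytes, bytes_per_conn=BYTES_PER_CONN):
    """How many tracked connections fit into free_bytes of kernel memory."""
    return free_bytes // bytes_per_conn


pyci_bytes = int(TOTAL_RAM * PYCI_SHARE)

# PyCI's footprint expressed as tracked connections.
print(conntrack_capacity(pyci_bytes))              # 51769
# Capacity using the free-memory figure quoted in the post.
print(conntrack_capacity(44962938))                # 128465
# Capacity if everything except PyCI's 27% were free.
print(conntrack_capacity(TOTAL_RAM - pyci_bytes))  # 139969
```

Either way the result stays above the 120,000 connections mentioned, so the conclusion holds.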

It won't matter as much once I complete the PyCI port to circuits.web. Circuits.web uses MUCH less memory than CherryPy (and a little less disk space). I don't have the time to get this done before the contest ends but it shouldn't take more than a few weeks to port once the contest is over.
