View Full Version : very slow RAM performance on Routerstation? (only 180MB/sec)
markit
03-20-2009, 05:47 AM
Hi
After realizing that some software runs much slower than expected on the RouterStation, I tried to narrow it down with some simple ad hoc benchmarks and compared the results with my WRT54GL.
For example, my WRT54GL managed to gzip /dev/zero nearly twice as fast as the RouterStation, and was on par with it for /dev/urandom input.
So I compiled this simple memtest for the RouterStation,
http://forum.openwrt.org/viewtopic.php?pid=58789#p58789
and, as a reference, for my old WRT54GL as well,
and the results I found do not look nice!!
./memtest_ar71xx_16mb
Allocating 16 MB... ok
Renicing to -20... ok
64 loops over 16MB in 6144812 usecs: 182.26 MB/s
L1: 2576.68 MB/s
./memtest_wrt_8mb
Allocating 8 MB... ok
Renicing to -20... ok
32 loops over 8MB in 371023 usecs: 754.67 MB/s
L1: 907.59 MB/s
Is this a known issue?
(Or is my RouterStation just defective?)
Can anyone try this on their RouterStation too?
My compiled binary can be downloaded here: memtest binary (http://193.238.157.78/~markus/olsrd/rs/memtest_ar71xx_16mb)
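For readers who cannot fetch the binary, here is a minimal sketch of what a read-throughput test of this kind might look like. It is not the source of the memtest linked above; the function names `now_usecs` and `read_throughput_mbs` are made up for illustration, and the MB definition (1024 * 1024 bytes) is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Monotonic wall-clock time in microseconds (hypothetical helper). */
static uint64_t now_usecs(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/*
 * Sequentially read a buffer of `bytes` bytes `loops` times and return
 * the observed read throughput in MB/s (1 MB = 1024 * 1024 bytes here).
 * Returns a negative value if the buffer cannot be allocated.
 */
static double read_throughput_mbs(size_t bytes, unsigned loops)
{
    assert(bytes >= sizeof(uint32_t) && bytes % sizeof(uint32_t) == 0);
    assert(loops > 0);

    uint32_t *buf = malloc(bytes);
    if (buf == NULL)
        return -1.0;
    memset(buf, 0x5a, bytes);      /* touch every page before timing */

    size_t words = bytes / sizeof(uint32_t);
    volatile uint32_t sink = 0;    /* keeps the reads from being optimized away */
    uint64_t start = now_usecs();
    for (unsigned l = 0; l < loops; l++) {
        uint32_t sum = 0;
        for (size_t i = 0; i < words; i++)
            sum += buf[i];
        sink += sum;
    }
    uint64_t elapsed = now_usecs() - start;
    free(buf);

    if (elapsed == 0)
        elapsed = 1;               /* avoid dividing by zero on very fast runs */
    double mbytes = (double)bytes * loops / (1024.0 * 1024.0);
    return mbytes / ((double)elapsed / 1e6);
}
```

Calling `read_throughput_mbs(16 * 1024 * 1024, 64)` would roughly correspond to the "64 loops over 16MB" line, and calling it with a buffer small enough to stay in the L1 data cache (a few KB, which is an assumption about the cache size) would correspond to the "L1" line.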
markit
03-20-2009, 12:48 PM
In the meantime I ran another test, using the benchmark from http://wiki.openwrt.org/HardwarePerformance
That page also lists values for a Ubiquiti LS SR71 (same processor as the RS),
but the outcome is the same: my RS produced bad results again,..
board || mem || pi || e || float
my RS || 8.6s || 19.9s || 15.9s || 10.0s
LS SR71 (values taken from the OpenWrt wiki) || 3.8s || 4.8s || 4.8s || 2.0s
my wrt54G (again faster than the RS )-;) || 6.5s || 13.1s || 14.6s || 9.3s
Hmm, at the moment I'm starting to think I may have more than just RAM problems on my RS,..
Maybe somebody here can repeat my benchmarks?
(imho they should get results like the LS SR71)
ar71xx openwrt binary (http://193.238.157.78/~markus/olsrd/rs/openwrt_cpu_bench_v06_ar71xx)
UBNT-Mike.Taylor
03-20-2009, 01:31 PM
Hi markit,
The problem is not with the hardware, but with a co-processor register which is not set correctly by the boot loader (shared by these two boards).
The register in question controls the behavior of the cached segment of RAM (KSEG1) and is left in a default/debug configuration. The issue is addressed in the next firmware release which is still being prepared/tested.
For now, you can build a later snapshot of OpenWRT/Kamikaze which has a fix for this (in the Linux kernel).
The new image should be posted soon.
Thanks,
Mike
Hi Mike,
Great news!
Might this be related to the bugs in USB storage?
Regards, OSCAR.
hi,
I tried the latest OpenWrt trunk from 2009-03-18 and the memory performance is still the same:
root@OpenWrt:~# uname -a
Linux OpenWrt 2.6.28.8 #1 Wed Mar 18 10:22:26 CET 2009 mips unknown
root@OpenWrt:~# ./memtest_ar71xx_16mb
Allocating 16 MB... ok
Renicing to -20... ok
64 loops over 16MB in 6006312 usecs: 186.47 MB/s
L1: 2591.96 MB/s
I looked through the OpenWrt config, but I can't find any option related to the KSEG1 caching you mentioned. Can you tell me where it can be set to a non-debug state?
Thanks
Jan
markit
03-21-2009, 06:42 AM
Maybe also try the openwrt cpu benchmark with the new firmware,..
As I have no access to this RouterStation at the moment, I just ran the same binaries on an rb411 (AR7131, 300 MHz CPU, under RouterOS) to get another reference.
I also got unexpectedly low memory results,.. but at least the CPU performance was better.
board || mem || pi || e || float
my RS (680 MHz) || 8.6s || 19.9s || 15.9s || 10.0s
rb411 (300 MHz, RouterOS) || 4.6s || 12.9s || 12.9s || 5.8s
./memtest_ar71xx_16mb
Allocating 16 MB... ok
Renicing to -20... ok
64 loops over 16MB in 5741235 usecs: 195.07 MB/s
L1: 956.81 MB/s
But still, imho 200 MB/s is way too low!!
markit
03-21-2009, 06:44 AM
In the evening I will test an rb493AH, whose specs are much closer to the RS.
markit
03-21-2009, 06:47 AM
Pls. try the openwrt_cpu_test also.
I just tested on an rb411 (Mikrotik board with a 300 MHz AR7131) and got similar results on the memory benchmark, but much better ones on the CPU bench,..
Hi, here are results from routerstation:
root@OpenWrt:~# ./openwrt_cpu_bench_v06_ar71xx
This is CPU and memory benchmark for OpenWRT v0.6. This will then take some time... (typically 30-60 seconds on a 200MHz computer)
Overhead for getting time: 28us
Time to run memory bench: 3.78[secs]
Time to run computation of pi (2400 digits, 10 times): 4.78[secs]
Time to run computation of e (9009 digits): 4.76[secs]
Time to run float bench: 2.03[secs]
Total time: 15.4s
You can copy/paste the following line in the wiki table at: http://wiki.openwrt.org/HardwarePerformance
|| 1970-01-20 || ''Author'' || 3.8s || 4.8s || 4.8s || 2.0s || v0.6 || ''OS'' || ''DeviceModel'' || ''CPU model'' || ''CPU Frequency'' || ''LinkToHwPage'' ||
root@OpenWrt:~#
Can you tell me something about the KSEG1 caching? I tried to look for it, but without much success.
Jan
markit
03-21-2009, 10:19 AM
3.8s || 4.8s || 4.8s || 2.0s
OK, very good! Now we have the same results as the LS SR71,
but still only <200 MB/s from the DDR RAM??
hmm, is this going to stay that slow?
still the same results...
root@OpenWrt:~# ./memtest_ar71xx_16mb
Allocating 16 MB... ok
Renicing to -20... ok
64 loops over 16MB in 6010697 usecs: 186.33 MB/s
L1: 2591.78 MB/s
Still no info about the config register and KSEG1? I looked through the sources and found something similar, maybe in the ifxmips port, but not in ar71xx..
Jan
lorenzo.allegrucci
04-08-2009, 02:05 AM
Still no news about this?
I'm using OpenWrt r14912 and the problem is still there; I get about 180 MB/s.
Can you please be more specific about this KSEG1 register? Are there any patches around?
Thank you
UBNT-Robert
04-08-2009, 01:31 PM
Guys --
Just an update -- we will release a fix for this on Friday.
Robert
freezer2k
04-15-2009, 04:59 AM
Where can we find the update/fix?
UBNT-Mike.Taylor
04-16-2009, 05:10 PM
Guys --
Sorry for the delay on the beta. I'm working on it. It's important to me that it be solid. Given the delay, I also want to let everyone know where I am on this and get some feedback.
* I've got stable builds based on OpenWRT trunk r15225.
* KSEG0 write-back caching is enabled prior to transfer of control to Linux.
* The 'cpu' performance issue is fixed, but the RAM throughput problem is still a work in progress.
Background on "KSEG0" issue -
On MIPS processors, address space is divided into segments. "kseg0" is used by convention for the kernel-mode code to access ram with caching.
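As a rough illustration of the segment layout on a 32-bit MIPS core: kseg0 and kseg1 are two fixed, unmapped windows onto the same low 512 MB of physical memory, with kseg0 going through the cache and kseg1 bypassing it. The sketch below only does the address arithmetic; the helper names are hypothetical and not taken from any boot loader or kernel source.

```c
#include <assert.h>
#include <stdint.h>

#define KSEG0_BASE 0x80000000u  /* unmapped, cached window           */
#define KSEG1_BASE 0xa0000000u  /* unmapped, uncached window         */
#define KSEG_SIZE  0x20000000u  /* each window covers 512 MB of RAM  */

/* Virtual address that reaches physical address `phys` through the cache. */
static uint32_t phys_to_kseg0(uint32_t phys)
{
    assert(phys < KSEG_SIZE);
    return phys | KSEG0_BASE;
}

/* Virtual address that reaches the same physical address uncached. */
static uint32_t phys_to_kseg1(uint32_t phys)
{
    assert(phys < KSEG_SIZE);
    return phys | KSEG1_BASE;
}

/* Physical address behind a kseg0 or kseg1 virtual address. */
static uint32_t kseg_to_phys(uint32_t vaddr)
{
    assert(vaddr >= KSEG0_BASE && vaddr < KSEG1_BASE + KSEG_SIZE);
    return vaddr & (KSEG_SIZE - 1u);
}

/* Non-zero if `vaddr` lies in the cached kseg0 window. */
static int is_kseg0(uint32_t vaddr)
{
    return vaddr >= KSEG0_BASE && vaddr < KSEG1_BASE;
}
```

For example, physical address 0x00100000 is visible at 0x80100000 (cached) and at 0xa0100000 (uncached).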
Our boot loader uses write-through caching, which is not as fast as 'write-back' caching. Write-through caching is simpler and safer in a compact boot loader -- but Linux handles write-back caching fine and should get the speed boost. With my current stable builds, the boot loader enables write-back caching prior to transferring control to Linux.
NOTE: For those who built their own OpenWRT images, a workaround for this was already placed in openwrt/target/linux/ar71xx/files/arch/mips/include/asm/mach-ar71xx/kernel-entry-init.h . This is why your test numbers improved.
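For anyone wondering what such a fix boils down to: on MIPS32 the kseg0 cache policy is selected by the 3-bit K0 field (bits 2:0) of the CP0 Config register. The sketch below shows only the bit manipulation in portable C and is not a copy of kernel-entry-init.h. The constant names mirror Linux's asm/mipsregs.h, the two helper functions are hypothetical, and the meaning of codes 0 and 1 is core-dependent (write-through variants on many MIPS32 cores, which is an assumption for the AR71xx).

```c
#include <assert.h>
#include <stdint.h>

/* Cache policy codes for the K0 field (bits 2:0) of CP0 Config. */
#define CONF_CM_CACHABLE_NO_WA       0u  /* core-dependent; typically write-through, no write-allocate */
#define CONF_CM_CACHABLE_WA          1u  /* core-dependent; typically write-through, write-allocate    */
#define CONF_CM_UNCACHED             2u
#define CONF_CM_CACHABLE_NONCOHERENT 3u  /* cached, write-back */
#define CONF_CM_CMASK                7u

/* Return `config` with the kseg0 policy replaced and all other bits kept. */
static uint32_t config_set_kseg0_policy(uint32_t config, uint32_t policy)
{
    assert(policy <= CONF_CM_CMASK);
    return (config & ~CONF_CM_CMASK) | policy;
}

/* Extract the current kseg0 policy code from a Config value. */
static uint32_t config_get_kseg0_policy(uint32_t config)
{
    return config & CONF_CM_CMASK;
}

/*
 * On real hardware the value would be read with mfc0 from CP0 register 16
 * (select 0), modified as above, and written back with mtc0 before the
 * kernel starts relying on kseg0. That part needs MIPS assembly and is
 * deliberately left out of this sketch.
 */
```

With this, `config_set_kseg0_policy(cfg, CONF_CM_CACHABLE_NONCOHERENT)` turns a write-through setting into write-back while leaving the rest of the register untouched.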
I am still working on DDR performance tuning AND I have now seen (and am investigating) the intermittent issue with high speed USB devices [timing out and/or being reset by the master].
I'm not big on closed betas, but given the nature of this DDR timing fix - I'd like to limit exposure to a small group of folks who know the risk, (such as those on this thread). Please email me at mike.taylor at ubnt.com if you are interested.
Public beta for this should be posted in the next couple of days. I do want this RAM issue fixed and some degree of confidence that the build results in stable systems.
Thanks,
Mike
UBNT-Mike.Taylor
04-16-2009, 05:14 PM
BTW,
The reason other AR71xx-based boards will tend to have a similar memory benchmark is that they are all based on the same reference hardware/software and probably also used the pb42 reference board's conservative DDR timing values.
- M
freezer2k
04-16-2009, 06:11 PM
Hi Mike,
I already compiled and flashed the latest trunk.
The kernel-entry-init.h is in there (it has been there for 2 months or so already), but the memory benchmark did not really get any better -- still about 180 MB/s.
There are other people in this thread who reported the same issue even with a recent trunk build, so i was wondering if this fix has any effect. Might it be possible that this header file isn't included properly, therefore has no effect?
UBNT-Mike.Taylor
04-16-2009, 08:33 PM
freezer2k,
The fix is there and working in your build - it just doesn't happen to affect this test. This test you are looking at measures uncached sequential reads, while the kseg0 write-back fix affects cached writes only.
Enabling write-back in kseg0 allows writes [to cached locations] to be buffered in the L1 cache for speed. The L1 cache is on-die and extremely fast compared to RAM bus accesses. Programs that read/write variables frequently get the most benefit, as they can defer accessing the memory bus when transferring values from registers back into variables, and keep the variables in the L1 cache for subsequent faster access, repeated writes, etc. So anything that reads/writes variables intensely, like your CPU tests, will be affected more dramatically. In the raw memory read test I think any savings in the loop counters, etc. are dwarfed by the RAM bus access times.
So, your memtest_ar71xx_Xmb tests will not be affected but your cputest_ar71xx will show improvements because a lot of the calculations in the test read/write a lot and benefit from the speed increase.
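The difference can be made concrete with a toy cache model that simply counts memory-bus transactions. This is only a sketch under loud assumptions: a direct-mapped cache with 32-byte lines and 1024 lines, both policies allocating a line on a miss, and made-up function names. It does not model the real AR71xx cache.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32u    /* assumed line size                 */
#define NUM_LINES  1024u  /* assumed 32 KB direct-mapped cache */

enum policy { WRITE_THROUGH, WRITE_BACK };

struct cache {
    enum policy policy;
    uint32_t tag[NUM_LINES];
    uint8_t valid[NUM_LINES];
    uint8_t dirty[NUM_LINES];
    uint64_t bus_reads;   /* line fills from RAM                 */
    uint64_t bus_writes;  /* word writes or dirty-line evictions */
};

static void cache_access(struct cache *c, uint32_t addr, int is_write)
{
    uint32_t line = addr / LINE_BYTES;
    uint32_t idx = line % NUM_LINES;
    uint32_t tag = line / NUM_LINES;

    if (!(c->valid[idx] && c->tag[idx] == tag)) {  /* miss */
        if (c->valid[idx] && c->dirty[idx])
            c->bus_writes++;                       /* write back evicted line */
        c->bus_reads++;                            /* fill the line from RAM  */
        c->tag[idx] = tag;
        c->valid[idx] = 1;
        c->dirty[idx] = 0;
    }
    if (is_write) {
        if (c->policy == WRITE_THROUGH)
            c->bus_writes++;                       /* every store hits the bus */
        else
            c->dirty[idx] = 1;                     /* store stays in the cache */
    }
}

/* Bus transactions for n read-modify-write updates of one variable. */
static uint64_t bus_traffic_counter_loop(enum policy p, unsigned n)
{
    struct cache c;
    assert(n > 0);
    memset(&c, 0, sizeof(c));
    c.policy = p;
    for (unsigned i = 0; i < n; i++) {
        cache_access(&c, 0x1000u, 0);
        cache_access(&c, 0x1000u, 1);
    }
    return c.bus_reads + c.bus_writes;
}

/* Bus transactions for reading `bytes` bytes sequentially, word by word. */
static uint64_t bus_traffic_stream_read(enum policy p, uint32_t bytes)
{
    struct cache c;
    assert(bytes % 4u == 0);
    memset(&c, 0, sizeof(c));
    c.policy = p;
    for (uint32_t a = 0; a < bytes; a += 4u)
        cache_access(&c, a, 0);
    return c.bus_reads + c.bus_writes;
}
```

In this model, 1000 updates of a single variable cost 1001 bus transactions under write-through but only 1 under write-back, while streaming through 1 MB costs 32768 line fills under either policy. That matches the picture above: the CPU tests benefit, the raw read test does not.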
As I say, I'm working on tuning all the timings now.
- M
Again, great news, Mike.
You should have an email from me expressing my interest in testing these new betas.
Do you think these issues also affect the RS-PRO board?
Is a 'hardware' fix foreseeable (something like cutting or wiring a new trace on the board)?
Regards, OSCAR.
UBNT-Mike.Taylor
04-17-2009, 12:14 PM
ogry,
Yes, thanks, I received your email and you shall have something to try shortly.
The basic design of RSPro is the same and will benefit from improvements we are making now.
Before this beta build is released it will support both boards. After some field testing, it will become the image used at the factory for new boards.
NOTE: initial firmware images for RSPro were based on a branch which I have merged but have *not* yet re-tested. I'll let you know when it is safe to flash this image onto RSPro boards. (Most important is to verify that the GigE PHY driver for RSPro in RedBoot works in the merged boot loader, so you can still undo the change if you run into problems.)
There will be a few minor missing tweaks to OWRT base files for RSPro probably but I think I already have most of those.
I'll keep you posted.
Thanks,
Mike
freezer2k
05-11-2009, 11:57 AM
Hi,
Just flashed the newest bootloader:
root@OpenWrt:/tmp# ./memtest_ar71xx_16mb
Allocating 16 MB... ok
Renicing to -20... ok
64 loops over 16MB in 3204474 usecs: 349.51 MB/s
L1: 2586.56 MB/s
Nearly double the speed here %)
Can we expect more, or is this all the hardware can do?