Bananian

Pro & M2 Tuning

skraw  
Hm, it seems all of you do not read my writing. The only reason I joined this forum is that none of the bananas I bought (dual core and quad core) deliver the network speed I expected from the specs. I bought the RPI2 only to have a comparison. Unfortunately, every single test showed that the bananas deliver (much) worse results than the (tuned) RPI2. It may well be that none of you ever tried my RPI2 tuning above and that you have in fact no idea of the performance it delivers. A single hdparm result, together with around 30 years of programming experience, tells you from the cached read performance (around 550 MB/sec compared to around 400 MB/sec on all tested bananas) that they all lack core and RAM speed. Therefore the attached Gbit LAN chip is useless. You may be able to flood ping (or iperf) such a box and see good performance. But real data transfer will suffer badly, because quite a lot of RAM is accessed at high rates. If you had tried it even once, you would have noticed.
Still, nobody seems to know how to bring core and RAM speed up to 500 as in the config.txt above. I do expect quite a performance boost from it, as there is on the RPI2. But the other big banana problem is the complete lack of tuning and patches.
It may well be that only a few echos to /sys/XYZ are needed, but which ones and where ... ?

tkaiser  
Edited by tkaiser at Mon May 25, 2015 14:07
skraw replied at Mon May 25, 2015 13:54
Hm, it seems all of you do not read my writing.


True from now on. A final note: there is a subforum here called Network and servers. There you will find the stuff you pretend to be interested in: real-world throughput and tuning.

skraw  
And just for the sake of it, here are some hdparm results c&p from console:

root@bananapi ~ # hdparm -Tt /dev/sda

/dev/sda:
Timing cached reads:   720 MB in  2.00 seconds = 360.12 MB/sec
Timing buffered disk reads: 100 MB in  3.04 seconds =  32.93 MB/sec
root@bananapi ~ # hdparm -Tt /dev/sda

/dev/sda:
Timing cached reads:   752 MB in  2.00 seconds = 375.38 MB/sec
Timing buffered disk reads: 100 MB in  3.04 seconds =  32.89 MB/sec
root@bananapi ~ # hdparm -Tt /dev/sda

/dev/sda:
Timing cached reads:   732 MB in  2.00 seconds = 365.61 MB/sec
Timing buffered disk reads: 100 MB in  3.04 seconds =  32.88 MB/sec

root@raspberrypi:~# hdparm -Tt /dev/sda

/dev/sda:
Timing cached reads:   1130 MB in  2.00 seconds = 564.73 MB/sec
Timing buffered disk reads:  92 MB in  3.02 seconds =  30.46 MB/sec
root@raspberrypi:~# hdparm -Tt /dev/sda

/dev/sda:
Timing cached reads:   1116 MB in  2.00 seconds = 558.27 MB/sec
Timing buffered disk reads: 100 MB in  3.05 seconds =  32.84 MB/sec
root@raspberrypi:~# hdparm -Tt /dev/sda

/dev/sda:
Timing cached reads:   1086 MB in  2.00 seconds = 542.96 MB/sec
Timing buffered disk reads: 100 MB in  3.06 seconds =  32.70 MB/sec
root@raspberrypi:~# hdparm -Tt /dev/sda

/dev/sda:
Timing cached reads:   1082 MB in  2.00 seconds = 540.68 MB/sec
Timing buffered disk reads:  94 MB in  3.05 seconds =  30.78 MB/sec
root@raspberrypi:~# hdparm -Tt /dev/sda

/dev/sda:
Timing cached reads:   1112 MB in  2.00 seconds = 555.55 MB/sec
Timing buffered disk reads:  94 MB in  3.04 seconds =  30.93 MB/sec

The console prompts tell you which box it was (the banana in this case is the M2; the Pro behaves alike).
The "buffered disk reads" show more or less the USB bottleneck at around 30 MB/sec, while the "cached reads" show the lack of core/RAM performance (around 400 MB/sec compared to around 550 MB/sec).
Drop the idea that I do not have the boxes I am talking about, please, because this is just ridiculous.
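
For anyone who wants to compare the two boxes from the runs quoted above rather than from rounded figures, here is a minimal Python sketch. The numbers are the "Timing cached reads" values copied from the console output in this post; the function and variable names are made up for illustration.

```python
from statistics import mean

# "Timing cached reads" values in MB/sec, copied from the hdparm runs above.
BANANA_M2 = [360.12, 375.38, 365.61]
RPI2 = [564.73, 558.27, 542.96, 540.68, 555.55]


def cached_read_summary(slow, fast):
    """Return (mean_slow, mean_fast, fast/slow ratio) for two lists of MB/sec."""
    mean_slow = mean(slow)
    mean_fast = mean(fast)
    return mean_slow, mean_fast, mean_fast / mean_slow


if __name__ == "__main__":
    m2, rpi, ratio = cached_read_summary(BANANA_M2, RPI2)
    print(f"M2: {m2:.1f} MB/sec, RPI2: {rpi:.1f} MB/sec, ratio: {ratio:.2f}")
```

On these particular runs the means come out at roughly 367 MB/sec for the M2 and 552 MB/sec for the RPI2, a ratio of about 1.5. Three and five samples are a small basis, so treat this as a rough indication only.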

silentcreek  
Ok, so - your cached reads on the Raspberry are faster. So what? That's nice in theory but has no practical relevance. You were talking about server usage. How will you benefit from faster cached reads if the network connection of the device is limited to 12MB per second? If you add a USB GBit network adapter, you get somewhat better performance, but that's still way slower than even the continuous read performance of your attached USB storage.

As said before - on the Banana Pi, you can achieve much faster transfer rates in real world applications. If you use a SATA harddrive, even way beyond the USB2.0 limit.

Now, about your actual questions: There is tweaking you can do. The easiest would be to overclock the CPU by doing:
echo 1200000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
(this would overclock it to 1.2GHz)
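
To check what the echo above actually changed, the cpufreq values can be read back from the same sysfs directory. The following is a small Python sketch, not a Bananian tool: the function name is hypothetical, and it assumes the usual cpufreq files (scaling_min_freq, scaling_max_freq, scaling_cur_freq) are present, which can depend on the kernel. Files that are missing are simply skipped.

```python
from pathlib import Path

# Standard cpufreq sysfs directory for the first core; passed as a parameter
# so the function can be pointed at another core or a test directory.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_freqs_mhz(cpufreq_dir=CPUFREQ_DIR,
                   names=("scaling_min_freq", "scaling_max_freq", "scaling_cur_freq")):
    """Read cpufreq files (values in kHz) and return {name: MHz} for those present."""
    result = {}
    for name in names:
        path = Path(cpufreq_dir) / name
        try:
            result[name] = int(path.read_text().strip()) / 1000
        except (OSError, ValueError):
            # File missing, unreadable, or not a plain number: skip it.
            continue
    return result


if __name__ == "__main__":
    for name, mhz in read_freqs_mhz().items():
        print(f"{name}: {mhz:.0f} MHz")
```

The sysfs values are in kHz, which is why 1200000 corresponds to 1.2GHz.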

It depends a little bit on the kernel version you use, but since you're posting in the Bananian forum, I assume it's kernel 3.4.x. If you use a newer mainline kernel, then there's more to it. You would have to edit the device tree file before compiling because the maximum frequency is hardcoded there.

As for RAM: To my knowledge the kernel or Linux don't set the RAM frequency but the bootloader (U-Boot) does. If you are on the old 3.4.x kernel you could maybe change that by editing the fex file, or you would have to edit the U-Boot source before compilation (with mainline U-Boot you would certainly have to do that in order to change the DRAM frequency). That being said, I never tried this. And I still think you're going at it from the wrong angle. If you have 30 years of programming experience, then you had better report tests that have actual meaning for real-world applications (not cached reads) and ask how you can tweak those.

tkaiser  
Edited by tkaiser at Mon May 25, 2015 14:57
silentcreek replied at Mon May 25, 2015 14:17
As for RAM: To my knowledge the kernel or Linux don't set the RAM frequency but the bootloader (U-Boot) does


I did some tests with a PCDuino Nano v3 (comparable to the 'original' Banana Pi) at 408 and 480 MHz and it made a little difference regarding I/O performance: http://forum.lemaker.org/forum.php?mod=redirect&goto=findpost&ptid=12167&pid=66862

But even without any tweaking of DRAM or CPU settings, A20-based boards are always multiple times faster than any RPi in NAS scenarios.

skraw  
Ok, at least someone read half of my writing :-)

The interesting point about "cached reads" is not that they are cached reads, but the question of _why_ there is such a tremendous difference, a gain of around 37% on the RPI2. Since the CPU frequency is nearly the same on both boxes, it obviously comes down to RAM and core timing. It makes no real sense to fiddle with only one of them, because if they do not match you lose almost everything you should have won by increasing just one of the two. This is the main reason why you saw almost no difference between 408 and 480 MHz. You only increased the DRAM clock, which makes little sense as there is no additional throughput inside the core. The same goes for increasing only the CPU frequency. Sure, it is better than raising only the DRAM clock because the CPU can make something out of the gained cycles, but it still has to go through the core bottleneck very often.
Reading between your lines, I understand that the possibilities for tuning all three frequencies are next to nil with the current boot scheme; yet another win for the RPI, where this is dead simple.
Still, I would be interested in hearing results of a trivial scp to your server setups onto USB HDs at more than 100 Mbit/s. I tried really hard and found it simply impossible. Keep in mind that we don't want to time scp to cache here, so you have to use a file large enough. So my claim stands: the banana's Gbit LAN is _useless_ in setups with several HDs (which means true file servers (>8TB), not talking of a few GB here).

silentcreek  
I don't use scp (at least not often), but I do see why you would get less than ideal performance on a Banana Pi. The problem with scp on Allwinner devices is that the encryption has to be done all by the CPU. The SoC has a crypto accelerator, but the driver for that is still being worked on. With that applied, scp transfers should be faster. Of course overclocking the CPU should help too. You can search the mailing lists for the patch if you want to try it; it's already in a fairly advanced state. Nevertheless, if you want to achieve faster transfer rates, then you might want to use samba or nfs or ftp instead.

mikronauts  
Edited by mikronauts at Mon May 25, 2015 20:29
skraw replied at Sun May 24, 2015 22:29
It is obvious to me that some things were changed on M2 specs on the fly. There are several places o ...


Well, we all are happy that you like your Raspberry Pi 2.

I like mine as well - but then again, I also like my BPi's and BPro.

FYI, it would be appreciated if you did not write totally incorrect information in your messages - NEITHER the RPi2 NOR any BPi can get 400MB/sec through USB.

USB is limited to 480Mbps, which after protocol overhead, translates to a maximum of around 35MB/sec for USB hard drives.
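
The arithmetic behind these figures can be written down as a short Python sketch. The helper names are made up for illustration; the 35 MB/sec figure is the one quoted above, not a measurement of mine.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert a line rate in Mbit/s to MB/s (8 bits per byte, no overhead)."""
    return mbit_per_s / 8


def efficiency(observed_mb_s, line_rate_mbit_s):
    """Fraction of the raw line rate that shows up as payload throughput."""
    return observed_mb_s / mbit_to_mbyte(line_rate_mbit_s)


if __name__ == "__main__":
    print(mbit_to_mbyte(480))   # raw USB 2.0 signalling rate in MB/s
    print(mbit_to_mbyte(100))   # raw 100 Mbit Ethernet rate in MB/s
    print(efficiency(35, 480))  # share of the raw USB 2.0 rate at ~35 MB/s
```

So 480Mbps is 60 MB/sec before any overhead, and about 35 MB/sec for a USB hard drive corresponds to a bit under 60% of that raw rate. The same conversion gives 12.5 MB/sec for a 100 Mbit network link.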

The "cached" read results from hdparm might be useful as an indicator of single-core memory bandwidth, but are TOTALLY useless for hard drive benchmarks.

These ARM boards have 1GB of ram, you will not realistically have a lot in the disk cache.

tkaiser  
Edited by tkaiser at Tue May 26, 2015 02:17
silentcreek replied at Mon May 25, 2015 20:03
The problem with scp on Allwinner devices is that the encryption has to be done all by the CPU.


The problem with scp is that it's neither a storage nor a network benchmark at all! Unless he provides the whole scp command line, the whole output of 'scp -vv' to see which cipher suites have been negotiated, and top output from both client and 'servers', no one can have a clue why performance differed. It's about encryption when you use scp/ssh. And there's a strong relationship between performance and strength of encryption (should be easily understandable by someone who claims '30 years of programming experience')

The performance differences between SSH ciphers are huge. SSH server and client always try to negotiate the best/'strongest' cipher available on both sides (which is why internally we always use an OpenSSH build that provides the 'none' cipher, or at least use rsync -e 'ssh -c arcfour' when doing incremental syncs). Bananian for example uses a hardened SSH configuration, probably only allowing strong ciphers like aes256, and I doubt that Raspbian does the same. So results will always vary heavily depending on the individual settings SSH client and server negotiated dynamically when establishing the connection.
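
To see which cipher a given transfer really used, the negotiated ciphers can be pulled out of saved 'scp -vv' or 'ssh -vv' debug output. Below is a rough Python sketch. It assumes the debug lines look like the "kex: server->client ..." lines OpenSSH prints; the exact wording differs between OpenSSH versions, so the pattern may need adjusting. The sample lines are illustrative only and were not captured from any of the boxes in this thread.

```python
import re

# Matches both the older ("kex: server->client aes128-ctr ...") and the newer
# ("kex: server->client cipher: aes128-ctr MAC: ...") debug line layouts.
# This is an assumption about the log format, not a documented interface.
KEX_LINE = re.compile(r"kex: (server->client|client->server) (?:cipher: )?(\S+)")


def negotiated_ciphers(debug_output):
    """Return {direction: cipher} found in ssh/scp -v debug output."""
    return {m.group(1): m.group(2) for m in KEX_LINE.finditer(debug_output)}


# Illustrative sample lines, not real output from the thread.
SAMPLE = """\
debug1: kex: server->client cipher: aes256-ctr MAC: hmac-sha2-256 compression: none
debug1: kex: client->server cipher: aes256-ctr MAC: hmac-sha2-256 compression: none
"""

if __name__ == "__main__":
    print(negotiated_ciphers(SAMPLE))
```

Running the same transfer against both boxes and comparing the result shows whether the two runs used the same cipher at all, which is the first thing to rule out before blaming anything else.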

The approach of using scp as a benchmark is worthless. Unfortunately this applies to all his 'benchmarks'. And the conclusions he draws from the results are even more weird. And since he wants to believe in irrelevant core, RAM and other timings, he won't realise that he's looking at the wrong numbers. It's still a no-brainer to realize that a Raspberry Pi with its internal 4 or 6 port USB hub and its single USB connection isn't able to serve as a performant NAS. And when testing is done right this is simply obvious. And all this stuff is outlined in the appropriate subforum covering exactly these questions. If he had started reading instead of complaining...

BTW: At the time of this writing the only SBC with a working/useable crypto accelerator is the Beaglebone Black. And you would have to build the crypto suites you want to use manually to make use of it. So it's most likely due to different ciphers and a single CPU core being the bottleneck why his results differ. But since using scp for 'benchmarks' is always a bad idea (same applies to hdparm)...

skraw  
Ok you smart guy, then please explain the obviously bad performance in cached read hdparm on all bananas compared to the RPI2 and Atom, which are nearly equal. I am quite interested in your explanation not touching core and RAM speed ...
