Openmediavault RAID1 NAS Performance Test on Banana Pi

@bpiuser:

Thanks for the interesting report. Regarding the SATA performance: have you done or considered any testing with a newer mainline kernel? BPi support in the mainline kernel is added in kernel 3.19, which will arrive as a stable version sometime in mid-February. For testing purposes the current release candidates should suffice. It might be interesting to see whether this changes anything.

Cheers,

Timo

TooMeeK  
Let's compare this to a single-drive unit.

Lubuntu 14.04.1
kernel 3.4.90
slightly tuned Samba version 4.1.6-Ubuntu (see https://calomel.org/samba_optimize.html)
and a single 2.5" 7.2k RPM disk with an ext4 volume

no switch (crossover cable)

Results:
average raw read speed with dd on disk connected to BPi:
10122395648 bytes (10 GB) copied, 292.976 s, 34.6 MB/s
BPi-Samba-perf.png

tkaiser  
Edited by tkaiser at Tue Jan 20, 2015 02:24
silentcreek replied at Fri Jan 16, 2015 08:11
BPi support in the mainline kernel is added in Kernel 3.19 which will arrive as a stable version sometime in mid-February


I've had a first look at Igor's jessie image (kernel 3.19.0-rc4). The cpufreq stuff seems to be not working:
  [    0.002186] /cpus/cpu@0 missing clock-frequency property
  [    0.002216] /cpus/cpu@1 missing clock-frequency property
I used sysbench to get an idea of the speed the BPi currently runs at. Since sysbench finished in 55 secs it should be 1008 MHz according to http://forum.lemaker.org/forum.p ... d=153&pid=12447. EDIT: This assumption was wrong, the current default in mainline is 912 MHz.

This, as well as the CPU governor in use, is quite important for benchmarking, because on a platform like the Banana Pi both filesystem and network performance scale almost linearly with the CPU frequency. So comparing different setups without knowing which cpufreq settings were applied in each case is pretty useless.
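
To record the clock and governor alongside each benchmark run, something like the following could be used. It is only a sketch: it assumes the standard Linux cpufreq sysfs layout below /sys/devices/system/cpu, and the function name `read_cpufreq` and its `root` parameter are made up for illustration. On a kernel without a working cpufreq driver the files are simply absent, and the function returns None for both values.

```python
from pathlib import Path


def read_cpufreq(cpu=0, root="/sys/devices/system/cpu"):
    """Return (governor, current frequency in MHz) for one CPU core.

    Either value is None when the corresponding sysfs file is missing,
    e.g. on a kernel without a working cpufreq driver.
    """
    base = Path(root) / f"cpu{cpu}" / "cpufreq"

    def read(name):
        try:
            return (base / name).read_text().strip()
        except OSError:
            return None

    governor = read("scaling_governor")
    khz = read("scaling_cur_freq")  # sysfs reports the frequency in kHz
    mhz = int(khz) / 1000 if khz is not None else None
    return governor, mhz
```

Calling `read_cpufreq()` right before and after a benchmark and noting the result next to the numbers makes it easier to compare results between setups.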

I used iozone on a Samsung EVO 840 connected via SATA ("iozone -a -g 2000m -s 2000m -i 0 -i 1" with different blocksizes) and got results as expected:
       KB  reclen   write  rewrite    read  reread
  2048000       4   43161    43540  205450  207404
  2048000      32   42965    43246  185958  186141
  2048000     512   42654    42905  175532  179017
  2048000   16384   42535    42843  182920  184055
Testing with iperf looks a bit promising (network settings tweaked as in http://pastebin.com/f8dnrxgX):

OS X --> Banana Pi:
  [  4] local 192.168.83.44 port 5001 connected with 192.168.83.247 port 61761
  [ ID] Interval       Transfer     Bandwidth
  [  4]  0.0-10.0 sec   796 MBytes   668 Mbits/sec
  [  5] local 192.168.83.44 port 5001 connected with 192.168.83.247 port 61762
  [  5]  0.0-10.0 sec  1.09 GBytes   937 Mbits/sec
  [  4] local 192.168.83.44 port 5001 connected with 192.168.83.247 port 61763
  [  4]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
  [  5] local 192.168.83.44 port 5001 connected with 192.168.83.247 port 61764
  [  5]  0.0-10.0 sec  1.09 GBytes   935 Mbits/sec
Banana Pi --> OS X:
  [  4] local 192.168.83.247 port 5001 connected with 192.168.83.44 port 40688
  [ ID] Interval       Transfer     Bandwidth
  [  4]  0.0-10.0 sec   554 MBytes   464 Mbits/sec
  [  4] local 192.168.83.247 port 5001 connected with 192.168.83.44 port 40689
  [  4]  0.0-10.0 sec   530 MBytes   444 Mbits/sec
  [  4] local 192.168.83.247 port 5001 connected with 192.168.83.44 port 40690
  [  4]  0.0-10.0 sec   596 MBytes   499 Mbits/sec
  [  4] local 192.168.83.247 port 5001 connected with 192.168.83.44 port 40691
  [  4]  0.0-10.0 sec   528 MBytes   443 Mbits/sec
  [  4] local 192.168.83.247 port 5001 connected with 192.168.83.44 port 40692
  [  4]  0.0-10.0 sec   586 MBytes   491 Mbits/sec
Same situation as with kernel 3.4.x: you can push data from a client to a Banana Pi acting as server way faster than in the other direction. With SATA it's the other way round (reads are much faster than writes), so the overall performance is limited in either direction.
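
To put a number on the asymmetry, the per-run bandwidth lines from the two iperf outputs can be averaged. This is just a quick sketch; the variable names and the helper `mean_mbits` are made up, and the input is nothing more than the summary lines copied from the runs above.

```python
import re

# Bandwidth summary lines copied from the two iperf outputs above.
OSX_TO_BPI = """
[  4]  0.0-10.0 sec   796 MBytes   668 Mbits/sec
[  5]  0.0-10.0 sec  1.09 GBytes   937 Mbits/sec
[  4]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
[  5]  0.0-10.0 sec  1.09 GBytes   935 Mbits/sec
"""

BPI_TO_OSX = """
[  4]  0.0-10.0 sec   554 MBytes   464 Mbits/sec
[  4]  0.0-10.0 sec   530 MBytes   444 Mbits/sec
[  4]  0.0-10.0 sec   596 MBytes   499 Mbits/sec
[  4]  0.0-10.0 sec   528 MBytes   443 Mbits/sec
[  4]  0.0-10.0 sec   586 MBytes   491 Mbits/sec
"""


def mean_mbits(iperf_output):
    """Average all 'NNN Mbits/sec' values found in the given iperf output."""
    rates = [float(m) for m in re.findall(r"([\d.]+) Mbits/sec", iperf_output)]
    return sum(rates) / len(rates)


rx = mean_mbits(OSX_TO_BPI)  # Banana Pi receiving
tx = mean_mbits(BPI_TO_OSX)  # Banana Pi sending
print(f"OS X -> Banana Pi: {rx:.1f} Mbits/sec")
print(f"Banana Pi -> OS X: {tx:.1f} Mbits/sec")
print(f"receive/send ratio: {rx / tx:.2f}")
```

Note that the first OS X --> Banana Pi run (668 Mbits/sec) is clearly lower than the following three and pulls that average down.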



Nice but not as good as with 3.4.10x and 1.2 GHz: http://forum.lemaker.org/thread-7102-2-1-2.html

tkaiser replied at Sun Jan 18, 2015 09:30
I've had a first look at Igor's jessie image (kernel 3.19.0-rc4). The cpufreq stuff seems to be no ...

About the missing cpufreq functionality: according to the sunxi wiki, a cpufreq driver is planned to land in Linux 3.20. You could try to build the sunxi-next kernel, which already includes the patchset.

Nevertheless, it's interesting to see that mainline shows the same discrepancy between read and write as the sunxi-3.4 kernel. I guess the only thing that might change that in the future is the missing DMA driver that is still being worked on.

tkaiser  
Since rebuilding Igor's images from scratch seems to be pretty easy, I will give
  #define CONFIG_CLK_FULL_SPEED 1200000000
in sun7i.h a try in the meantime. The only thing that makes me wonder is that the default in this file is 912 MHz (unlike the other sunxi variants, where it's 1008 MHz). So maybe I'm editing the wrong file. I'll see in the evening, when I can burn the new image to an SD card and test.

If it works I will benchmark again, rebuild the kernel with port multiplier support, test again with the very same SSD behind the PM and will then start with more than one disk behind the PM.

tkaiser  
Regarding the SATA performance: neither read nor write speed maxes out the SATA 2.0 spec. Maybe it's just a hardware limitation, and if so I doubt that it will ever be fixed. The main markets for the A10 and A20 have been tablets and HTPCs. A few of the latter made use of the SATA port, but that's it, and no new devices have been built on these old SoCs in the meantime, except for some SBC designs.

According to Olimex, Allwinner will produce older SoCs on demand if you order 50K pcs or more. But I really doubt that they will make a new hardware revision in 2015, even if the SATA port is the key feature of A20-based SBCs.

tkaiser replied at Mon Jan 19, 2015 07:32
Since rebuilding Igor's images from scratch seems to be pretty easy I will givein sun7i.h a try in t ...

Maybe I'm mistaken, but I think I read somewhere that without cpufreq support, the kernel will just keep using the clock speed that is set in U-Boot. So, in case your experiments don't bring the intended changes, you might have a look at U-Boot.

tkaiser  
Edited by tkaiser at Mon Jan 19, 2015 17:03
silentcreek replied at Mon Jan 19, 2015 10:55
I think I read somewhere that without cpufreq support, the Kernel will just keep using the clock speed that is set in U-Boot


I modified sun7i.h in the U-Boot sources. First I changed the default 912000000 to 1200000000, but this led to the kernel not starting. Then I tried it step by step and succeeded with 1008000000 Hz. The environment is somewhat different because I now use 3.19.0-rc5 with Wheezy:

default (912 MHz): sysbench execution time: 53.8241 sec
1008 MHz: sysbench execution time: 51.4150 sec

Then I tried 1104 MHz, but the kernel doesn't start, just as with 1200 MHz before. Maybe there's a hard limit that has to be tweaked as well. Or maybe I would have to change voltage tables (no idea how/where).

But it's interesting that with the mainline kernel the old A20 default of 912 MHz applies (also to my tests from yesterday).

tkaiser  
Edited by tkaiser at Tue Jan 20, 2015 10:02

Some final words regarding mainline. Yesterday I spent a few hours testing, just to realize that there's so much more to come regarding CPU frequency stuff and thermal issues in the next months that I will stay with 3.4.x in the meantime: http://www.spinics.net/lists/arm-kernel/msg388446.html

I got the BPi to start the kernel (3.19.0-rc5) with 1056 MHz but anything above failed. With 1056 MHz sysbench executed in 48.9100 secs. And while the execution times look reasonable and might suggest mainline kernel executes code faster -- see below -- this might just be the result of board initialisation (e.g. different DRAM clocking or the effect of the 'on demand' governor on thatsbanana's tests) or differing base load (X vs. headless). Therefore no conclusion is possible based on these values.

These were thatsbanana's results back in August running Lubuntu (with kernel 3.4.90 I would assume):
  61s * 0.9: 54.9
  47s * 1.2: 56.4
  43s * 1.3: 55.9
My results with Wheezy and 3.19.0-rc5:
  53.8s * 0.912: 49.066
  51.4s * 1.008: 51.811
  48.9s * 1.056: 51.638
It's also important to keep in mind that the CPU clock can only be adjusted in steps of 48 MHz. So when you set 900 you will end up with 864, 1000 --> 960 and so on.
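
As a small illustration of the 48 MHz granularity: the sketch below assumes that a requested value is simply rounded down to the nearest multiple of 48 MHz, which is what the two examples above (900 --> 864, 1000 --> 960) suggest. The function name `effective_clock` is made up.

```python
STEP_MHZ = 48  # step size as stated in the post


def effective_clock(requested_mhz, step=STEP_MHZ):
    """Round a requested clock down to the nearest multiple of `step`.

    Rounding down is an assumption derived from the examples in the post.
    """
    return (requested_mhz // step) * step


for requested in (900, 1000, 1056, 1200):
    print(requested, "-->", effective_clock(requested))
```

Values that are already multiples of 48 (912, 1008, 1056, 1200, 1296) come out unchanged.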

EDIT: Frichenbruder contributed two sysbench results for 1200 and 1296 MHz that perfectly match my measurements:
  43s   * 1.200: 51.6
  39.8s * 1.296: 51.58
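
The "time * clock" products in this post can be reproduced with a few lines. This is only a sketch of the arithmetic; the labels and the helper name `normalized` are made up, and the input values are the ones quoted above. If sysbench scaled perfectly with the clock, the product would be constant within one setup.

```python
# (sysbench execution time in seconds, CPU clock in GHz) as quoted in this post
RUNS = {
    "thatsbanana (Lubuntu)": [(61.0, 0.9), (47.0, 1.2), (43.0, 1.3)],
    "tkaiser (Wheezy, 3.19.0-rc5)": [(53.8, 0.912), (51.4, 1.008), (48.9, 1.056)],
    "Frichenbruder": [(43.0, 1.2), (39.8, 1.296)],
}


def normalized(seconds, ghz):
    """Execution time scaled by clock speed (seconds * GHz)."""
    return seconds * ghz


for label, runs in RUNS.items():
    products = [round(normalized(s, g), 2) for s, g in runs]
    print(f"{label}: {products}")
```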

tkaiser  
With 3.19.0-rc5 there seems to be no patching needed when switching between PM mode and directly attached SATA disks. First results (Igor's image, Wheezy, 3.19.0-rc5, 1056 MHz CPU clock):

Directly attached Samsung EVO 840 128G, ext4:
       KB  reclen   write  rewrite    read  reread
  2048000       4   43053    43643  202636  203209
  2048000      32   42743    42949  184831  183612
  2048000     512   41774    42533  174186  175606
  2048000   16384   41897    42500  175786  172778
1056 MHz with JMB321 in between Banana Pi and the very same SSD/fs:
       KB  reclen   write  rewrite    read  reread
  2048000       4   42308    43433  137458  137479
  2048000      32   42247    42690  137394  137445
  2048000     512   41443    42048  136712  136934
  2048000   16384   42056    42056  136465  136525
The PM decreases read performance further even in single disk mode without real switching activity involved.
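
To quantify the drop, the read columns of the two tables can be compared per record length. A rough sketch only: the dictionary and function names are made up, and the values are the iozone read results from the two tables above (iozone reports KB/s by default).

```python
# iozone read throughput in KB/s, keyed by record length in KB (from the tables above)
DIRECT = {4: 202636, 32: 184831, 512: 174186, 16384: 175786}
VIA_PM = {4: 137458, 32: 137394, 512: 136712, 16384: 136465}


def read_drop_percent(direct, via_pm):
    """Relative read slowdown behind the port multiplier, per record length."""
    return {r: 100.0 * (direct[r] - via_pm[r]) / direct[r] for r in direct}


for reclen, drop in read_drop_percent(DIRECT, VIA_PM).items():
    print(f"reclen {reclen:>5} KB: read is {drop:.1f}% slower behind the PM")
```

Write performance is practically identical in both tables, so the comparison is limited to reads.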
