Discussion

Banana Pi & Pro SATA and USB Hard Drive Tests

Hi,

I ran a lot of SATA tests on both the Banana Pi and the Banana Pro to nail down the performance boost we get from SATA.

There are four drives competing here… a 5400rpm laptop drive, a 7200rpm 3TB desktop drive, a slow SSD and a fast SSD.

As expected, it is a LOT faster :-) than the same drives over a USB-SATA bridge. I do wonder about the very slow SATA writes.

Read: http://www.mikronauts.com/2015/0 ... ts-and-experiments/

Regards,

Bill
cxy  
Good Share . . .

hawhill  
What kernel was used? Did you try a current mainline one?

hawhill replied at Tue Mar 10, 2015 01:38
What kernel was used? Did you try a current mainline one?

Hi,

I used 3.4.103; here is the uname -a output:

Linux BPiNAS 3.4.103 #4 SMP PREEMPT Thu Dec 18 12:55:58 CST 2014 armv7l GNU/Linux

I suspect that the SATA port is configured as SATA 1.0 even though the A20 is supposedly capable of SATA 2.0. Further, there is something wonky about writes to the drive, as the write performance is much lower than should be feasible even with SATA 1.0.

I looked at all available A20 data sheets, but the SATA section is redacted, so there is no easy way for me to dig deeper.

When I have a bit more time maybe I'll see if I can dig into the kernel sources.

tkaiser  
Edited by tkaiser at Mon Mar 16, 2015 10:47
mikronauts replied at Tue Mar 10, 2015 07:57
I suspect that the SATA port is configured as SATA 1.0 even thought the A20 is supposedly capable of SATA 2.0


If you didn't use a port multiplier, the SATA negotiation will be SATA 2.0 ("dmesg | grep ata"). But that won't help with the slow write performance of the A20's SATA implementation. BTW: Since the Banana Pi and Banana Pro are absolutely identical when it comes to features like the SATA implementation, every difference you measured has to be blamed on the setup or on measuring inaccuracies. One example: if you study the benchmark values I collected in the link below, you will realize that I got a lot more SATA throughput on a PCDuino3 (same A20, but RAM clocked at either 408 or 480 MHz).
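For example, the negotiated speed shows up in the kernel log roughly like this (device numbering depends on your setup, and the exact wording may differ between kernels):

    dmesg | grep -i 'sata link'
    # a SATA 2.0 link is reported as something like:
    # ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)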

The LeMaker guys set the DRAM frequency on the Banana Pi higher in the first days and then lowered it due to stability issues. So a test made half a year ago with a fex file that set the RAM clock to 480 MHz on the Banana Pi might show better SATA throughput values compared to a Banana Pro, since the Banana Pro's fex files (and the device tree stuff in mainline as well) started at 432 MHz. But their performance is identical, and if you use a more recent distro on the Banana Pi you'll realize that your Pi got slower, since it is also just using 432 MHz for DRAM now.
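If you want to verify what your image actually uses, a quick sketch (assuming sunxi-tools is installed and script.bin sits in the usual /boot location; adjust the path to your image):

    # decompile the fex and look for the configured DRAM clock:
    bin2fex /boot/script.bin | grep dram_clk
    # on current images this should show: dram_clk = 432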

In my opinion you should adjust your benchmark setup. Neither hdparm nor dd are good choices as used; you partially tested filesystem caches/buffers in RAM (if you exceed 35 MB/s over USB, that is a clear indication of such behaviour). You should test with file sizes at least twice the amount of RAM and use the appropriate dd flags. I would rather go with iozone and bonnie++.
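A sketch of what I mean (paths and sizes are only examples; pick a file size of at least twice your RAM, i.e. 2 GB on these 1 GB boards):

    # write test bypassing the page cache entirely:
    dd if=/dev/zero of=/mnt/sata/test.img bs=1M count=2048 oflag=direct
    # or include the final flush in the measured time:
    dd if=/dev/zero of=/mnt/sata/test.img bs=1M count=2048 conv=fdatasync
    # iozone write/read tests with a 2 GB file and two record sizes:
    iozone -e -I -a -s 2048m -r 4k -r 1024k -i 0 -i 1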

On slow platforms like the A20 many things depend on tunables, and the record/block size of accesses matters as well. You will get different results when you choose a different kernel config: the A20 is an SMP system lacking many pro features of x86 and other architectures, and since everything has to be done in software, scheduler settings and the like influence SATA performance too - at least a whopping 10 MB/s more when testing a fast SSD!
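Two examples of such tunables (standard sysfs paths; whether and how much they help depends on the workload):

    # keep the CPU at full clock during I/O tests:
    echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    # try a different I/O scheduler for the SATA disk:
    echo deadline > /sys/block/sda/queue/scheduler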

The good news: you can get read speeds above 200 MB/s and write speeds close to 45 MB/s over SATA. But then the SoC is almost completely busy doing disk I/O, so if you have a mixed workload that is also CPU intensive (or network accesses in parallel), you can forget about these maximum throughput values:

http://forum.lemaker.org/forum.php?mod=viewthread&tid=12167

Doing benchmarks on such a weak platform without having an eye on 'iostat 5' output to see where the bottlenecks are is a bit useless if one wants to get a clue how benchmarking correlates with 'real world scenarios'.
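Something as simple as this in a second SSH session is enough (iostat comes with the sysstat package):

    # watch disk utilisation and CPU load in 5 second intervals:
    iostat -x 5
    # if %idle drops towards 0 while %util on the disk stays low,
    # the CPU and not the disk is the bottleneck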

I did not use a SATA port multiplier.

I used the same SD card image and the same hard drives for the tests, specifically to eliminate differences - I did not use previous results - and I am also puzzled by the slight differences between the Banana Pi and Pro. I am aware of the DRAM re-clocking; Tony posted about it quite a while ago.

I totally agree that hdparm buffered read is not a great benchmark; however, it does return a quick metric.
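For the record, the quick metric is just this (device name depends on the setup):

    hdparm -tT /dev/sda
    # -T: cached read speed (memory), -t: buffered disk read speed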

There should be no caching effects on the dd reads, as I used a file 1.5x the size of the total memory and performed the read after the writes, so the head of the read file would no longer be in the cache.

There is a slight write-buffering gain on the writes and the copy, as some buffers will be flushed afterwards; however, the difference was very small. I timed the dd's so I could compensate, but the difference was small (sync returned almost immediately after the dd).
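Roughly, the sequence looked like this (paths and exact sizes here are illustrative; the test file was 1.5x total RAM):

    # timed write of a 1.5 GB file on a 1 GB board, then flush:
    time dd if=/dev/zero of=/mnt/sata/test.bin bs=1M count=1536
    time sync
    # timed read afterwards, so the head of the file is no longer cached:
    time dd if=/mnt/sata/test.bin of=/dev/null bs=1M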

I am aware of the nice tuning work you have done (thank you).

I've been avoiding iozone and bonnie++ as I am interested in the maximum possible transfer rate out of the box, not file system performance.

tkaiser  
Edited by tkaiser at Mon Mar 16, 2015 16:07
mikronauts replied at Mon Mar 16, 2015 13:25
I am aware of the nice tuning work you have done


It's not about tuning; it's about identifying the parameters that affect the performance of different subsystems.

There are no 'low level firmware' differences between different A20 based boards. What makes a real difference is the PCB layout: the closer RAM and SoC are placed, the higher the DRAM might be clocked and the faster SATA transfers are, because the performance is dependent on memory bandwidth.

But since this is not an x86 or SPARC system with PCIe-connected host bus adapters containing their own controllers, nearly everything has to happen 'in software' on the SoC, and this makes the real difference. If you run one test and it's time for a maintenance cron job in the background, then this specific test run will be way slower than the next one, even using the same SD card image and the same hardware initialisation (fex file, or device tree stuff with a mainline kernel). If you don't control/monitor these boundary conditions you get different results, and it won't even help to run every single test variation ten times and use the average values.

Unless you can ensure that the system is doing nothing but the benchmark, the results are worthless, since many times the limiting factor is CPU power. And if you run a specific benchmark and also keep an eye on resource utilisation ("iostat 5" at least), then you will get a clue what's wrong with benchmarks compared to 'real world scenarios'. If the system runs out of CPU when doing just a single storage test, then you know you will never get the measured values in reality, since the CPU then has to do other things as well. For me this is one of the most important results of a benchmark: an indication of whether the values measured apply to 'real stuff' or just to a synthetic benchmark scenario describing plain nothing.

Excellent point about PCB layout affecting the maximum reliable memory clock.

I think in this case what I ascribed to a possible low level firmware difference can likely be ascribed to a DRAM clocking difference between the Banana Pi and Pro... the frequency would be set during the boot process, when it decides whether to configure for a Banana Pi or a Banana Pro (which could be called a firmware difference).

I controlled the other variables, and did not have any other processes running, just an SSH session to run the tests. Mind you, I did not check to see if Raspbian kicked off any occasional cron tasks.
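Checking that after the fact would have been simple enough, e.g. (standard Debian/Raspbian locations):

    # list the periodic jobs the distro schedules by default:
    cat /etc/crontab
    ls /etc/cron.hourly /etc/cron.daily /etc/cron.weekly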

For what I wanted to measure - the default install performance with various drives - my results are useful. I never claimed I ran perfect "bare metal" benchmarks.

tkaiser  
Edited by tkaiser at Tue Mar 17, 2015 03:23
mikronauts replied at Mon Mar 16, 2015 16:50
For what I wanted to measure, the default install performance with various drives, my results are useful


While I greatly appreciate your work I don't agree on that.

People like numbers and benchmarks. They don't care that much what the implication of a specific test is; they only compare some numbers. In reality the Banana Pi and Banana Pro perform 100% identically (since they share the same A20 SoC and the same settings: currently a 432 MHz DRAM clock). The same applies to SATA and USB performance (both 100% dependent on the SoC used).

What you referred to as 'firmware changes' are tunables instead. If I ensure that the SoC won't overheat, using heatsinks or a fan, then I can clock it at 1.2 GHz or above, which affects not only computing performance but also I/O performance. If I do the same with the DRAM chips and they pass DRAM stability tests, then I might clock the DRAM at 480 MHz, which again affects not only computing performance but also I/O performance. If I, or the maker of the distro I use, someday decide to tweak the kernel config, I will sometimes get slower single-thread computing performance but overall faster I/O performance.

And all this stuff doesn't matter at all. Because if the CPU is the limiting factor while running benchmarks, the results have no relation to normal use. And every performance difference below 30 percent is nothing anyone could notice: you can measure it, but you won't notice it. So it's irrelevant. If it's not, then you probably chose the wrong device and need a more powerful one.

Bottom line: benchmarks should clarify in some final words how they relate to normal usage. And the differences you measured between Pi and Pro are in fact irrelevant; these devices perform identically. (Have you tried running a single benchmark ten times and comparing the averaged results afterwards? Maybe then you would have got nearly the same numbers.)
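A sketch of what I mean, assuming the test file from above already exists (the grep pattern just pulls the throughput figure out of dd's summary line):

    # run the same read test ten times and collect the throughput values:
    for i in $(seq 1 10); do
        dd if=/mnt/sata/test.bin of=/dev/null bs=1M 2>&1 | grep -o '[0-9.,]* MB/s'
    done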

And the most important thing to point out when benchmarks are compared between Bananas and Raspberries is that you can forget about the RPi when it comes to networking or I/O, because all of that is handled over one single, ultra slow USB 2.0 connection between the SoC and a USB hub with an integrated, ultra slow NIC. That's the main difference, and it should be pointed out in a few words without relying on numbers at all. Because of the USB-only design the RPi's interconnections are not only slow but 'expensive' too (wasted CPU cycles for the inefficient USB stuff).

To sum it up: benchmark testers should educate benchmark readers not to trust the raw numbers but instead provide a meaning (a 10% measured difference --> no difference in reality when we speak about SBCs).

tkaiser replied at Tue Mar 17, 2015 01:45
While I greatly appreciate your work I don't agree on that.  

People like numbers and benchmark ...

We can agree to disagree.

I measured the performance as shipped from the factory, with the same (latest) OS release on the same SD card. This gives numbers that end users - the vast majority of whom don't understand DRAM clock speeds etc. - can reproduce. I totally understand that if configured 100% the same, the Banana Pi and Banana Pro would give the same result. However, if they are not so configured "out of the box" and we re-configure them, we are no longer reporting on the "out of the box" experience, and the vast majority of users will not be able to reproduce the results.

Tunables are great, and who knows, when I have the time, I may do some tweaking :-) I liked the tuning you did in your NAS posts btw.

I also agree with you that small differences do not matter; I start to care at around a 20% difference, which I do notice IRL.

I do multiple runs on short benchmarks, but with dd I am interested in peak performance "out of the box", not average performance.

I totally agree with you re/ Ethernet and SATA on Banana vs. Raspberry; here is a quote from the conclusion of my Raspberry Pi 2 review:

"Neither the Raspberry Pi 2 Model B nor the ODROID C1 have SATA, however the Banana Pro has both Gigabit Ethernet and SATA, which is a huge advantage for small servers."

http://www.mikronauts.com/raspbe ... i-2-model-b-review/

and regarding the USB bottleneck, here is a quote from my Raspberry Pi 2 NAS experiments:

"In reality, due to USB overhead, interrupt overhead etc., the combined usable bandwidth of a USB2.0 bus is limited to less than 400Mbps, that is, less than 50MB/sec total bandwidth for Ethernet and all USB devices."


http://www.mikronauts.com/raspbe ... s-experiment-howto/

You won't like the hdparm/dd benchmarking; however, it suits my needs for "out of the box" (untuned) maximum performance.

You might like my analysis of bottlenecks, which was written for average Pi users.

I reiterate... our personal viewpoints are probably not that far apart; where we have a gaping chasm is the intent of the testing, and what we are trying to measure.

For my purposes, hdparm and dd are useful, and for your purpose, hdparm and dd are not appropriate.
