Openmediavault RAID1 NAS Performance Test on Banana Pi

tkaiser  
Edited by tkaiser at Tue Jan 20, 2015 05:47

Some LanTest results with the aforementioned settings (Igor's image, Wheezy, 3.19.0-rc5, 1056 MHz CPU clock):

Without tweaking scheduler settings and SMP affinity of processes:

No PM:


With PM:


After further tuning as outlined in http://forum.lemaker.org/thread-7102-2-1-2.html

No PM:


With PM:


The result variations are OK (watch the orange triangles that show minimum and maximum results) and the good news is: in a single-disk situation the NAS performance seems to be identical with or without PM. Fortunately this is my PM use case: I want to use the Banana Pi as a backup target with different 3.5" HDDs in round-robin fashion. No need for RAID or other insane setups.

tkaiser  
Just a small follow-up: I'm done with this PM stuff since it's totally unreliable: http://forum.lemaker.org/forum.p ... 7&fromuid=33332

But I still wonder whether others who use a cheap JMB321 experience the same set of problems when putting the PM under heavy load and transferring large amounts of data?

@bpiuser: Did you also run iozone on your RAID-1 to push the PM to the limits? Have you had ata related messages in syslog or dmesg output when generating high I/O load? Did you check SMART values for CRC checksum errors?
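
A minimal sketch of how these two checks could be scripted. The function names are made up for illustration, and the grep pattern is only an assumption covering a few common libata messages, not an exhaustive list. Both functions read text on stdin, so they work on saved logs as well.

```shell
#!/bin/sh
# Count kernel log lines that look like libata trouble (reads log text on stdin).
# The pattern is an assumption: it only covers a few typical libata messages.
count_ata_errors() {
    grep -E -c 'ata[0-9]+(\.[0-9]+)?: (exception|failed command|hard resetting link|SError)' || true
}

# Print the raw value of SMART attribute 199 (UDMA CRC error count).
# Reads `smartctl -A` output on stdin; the raw value is the 10th column.
crc_error_count() {
    awk '$1 == 199 { print $10 }'
}
```

Typical use would be `dmesg | count_ata_errors` and `smartctl -A /dev/sda | crc_error_count` (run as root, once before and once after the iozone run). A CRC count that grows under load points at the cabling or the PM rather than at the disk itself.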

tkaiser  
I think I found the solution to let a JMB321 run reliably. It was apparently a problem of heat generated due to too much I/O load: http://forum.lemaker.org/forum.p ... 7&fromuid=33332

I will give the JMB321 a second chance and prepare a continuous test with high I/O load over 2 weeks. If it survives and data integrity is OK (that is why I will use btrfs -- otherwise it's just a joke) I will consider using it as part of a backup appliance.

TooMeeK  
I would like to perform a performance test with an enterprise-class drive - WD VelociRaptor 2,5" 10k RPM - however, the drive does not spin up on the BPi.
It requires an additional 12V 0,3A supply. The BPi provides only a 5V supply.

tkaiser  
TooMeeK replied at Sat Jan 31, 2015 06:41
I would like to perform a performance test with enterprise class drive - WD VelociRaptor 2,5" 10k RP ...

The VelociRaptors we used some time ago needed at least 1.5A to spin up.

Regarding such a test combining two different worlds: The A20's SATA limitations will be responsible for the write bottleneck (max. 45 MB/sec), the VelociRaptor's max. read speed (between 115 MB/sec on the inner tracks and up to 210 MB/sec on the outer) will be the bottleneck in the other direction. And if a cheap PM sits in between, everything will get a bit slower (sometimes even more so when the PM leads to just SATA 1.0 negotiation). I'm curious what you will find out once you have solved the PSU problems :-)

blindpet  
tkaiser replied at Fri Jan 30, 2015 03:50
I think I found the solution to let a JMB321 run reliable. It was apparently a problem of heat gener ...

Would love to hear your update on this. I am planning on building the banaNAS for some basic fault tolerance with additional USB backup. I'd rather not waste the drives if the RAID is unreliable. Does rebuilding still take 20+ hours?

tkaiser  
Edited by tkaiser at Tue Mar 3, 2015 03:16
blindpet replied at Fri Feb 20, 2015 15:45
Would love to hear your update on this. I am planning on building the banaNAS for some basic fault ...


In my opinion, forget about RAID altogether when you want to combine it with a JMB321. RAID-5 is crap with current large-capacity consumer hard disks, especially when you use md-raid: md-raid works on block devices and therefore has to rebuild each and every block, which takes ages because the SATA speed limitation and command-based switching (CBS) slow things down unnecessarily.

Imagine one disk fails and you have only single redundancy (RAID-5). Then all that is needed to lose your whole array is a single failure of a single block on another disk or a failure of the port multiplier itself. Please read the "RECOVERY" section in the md manual page:

If either the write or the re-read fail, md will treat the error the same way that a write error is treated, and will fail the whole device


Given the unrecoverable error rates of desktop drives and a single point of failure between the BPi's SATA controller and all disks (the port multiplier), this is quite likely to happen while the rebuild is running.
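
To put a rough number on that risk, here is a back-of-the-envelope sketch. It assumes independent errors and a datasheet rate of one unrecoverable read error per 10^14 bits read, a figure commonly quoted for desktop drives but not taken from this thread; the function name is made up for illustration.

```shell
#!/bin/sh
# ure_probability TB [BITS_PER_ERROR]
# Approximate probability of hitting at least one unrecoverable read error
# while reading TB terabytes. BITS_PER_ERROR defaults to 1e14, which is an
# assumed datasheet value, not a measurement.
ure_probability() {
    awk -v tb="$1" -v rate="${2:-1e14}" 'BEGIN {
        bits = tb * 8e12                         # 1 TB = 10^12 bytes = 8 * 10^12 bits
        printf "%.3f\n", 1 - exp(-bits / rate)   # Poisson approximation
    }'
}
```

Under these assumptions, `ure_probability 4` (reading one full 4 TB disk during a rebuild) comes out at roughly 27%. Real drives often do better than their datasheet, so treat this as an upper-bound illustration rather than a prediction.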

You would need RAID-6 or double redundancy. That slows things down further, especially writes: you will get 8 MB/s under best conditions, and with write requests smaller than the RAID stripe size even way less, since then the typical RAID rewrite cycle has to happen ('read all blocks from all disks, calculate new checksums, write back to all disks').

In mdraid-6 there was a bug present for over 5 years leading to potential data corruption: https://lkml.org/lkml/2014/8/18/17. This bug was fixed half a year ago and I doubt anyone backported the fix to the aging sunxi kernel 3.4.x, so you would have to use a mainline kernel for RAID-6. If one uses a mainline kernel, the far better alternative would be btrfs. All btrfs features and fixes live in the kernel, so you need a recent kernel version, otherwise you might encounter known bugs. You also need an appropriate version of the user-space tools; Debian wheezy, for example, still ships the horribly outdated 0.19 version.

btrfs would be the much better alternative (with the one exception below): it uses checksums to ensure data integrity, writes these checksums to both disks in RAID-1 (and, if you want, in RAID-0 as well -- see below), can create a nearly unlimited number of snapshots without performance impact, and also contains a volume manager and its own RAID implementation that is file based rather than block based (shorter rebuild times and less stress when disks aren't full). The exception: raid5/6 support in btrfs is still called experimental.

But since performance would drop drastically and a RAID can not improve data security (just availability and in case of btrfs' RAID implementation also data integrity due to distributed checksums) this sort of redundancy is pretty useless.

Better go with 2 disks: the main disk on SATA that will contain the data and a larger disk on USB. Create regular snapshots on the SATA disk and send them to the USB disk using 'btrfs send/receive' (way more efficient than rsync and the like because just the changed file chunks will be sent to the other device, which might even be a second Banana Pi located in another room, city or country).
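
A minimal sketch of one such snapshot-and-send step. All paths and the function name are hypothetical placeholders, and by default the function only prints the commands (DRY_RUN=1) so they can be reviewed before anything touches a disk.

```shell
#!/bin/sh
# Sketch only: all paths below are hypothetical placeholders.
SRC=/mnt/sata                  # btrfs filesystem (or subvolume) holding the data
SNAPDIR=/mnt/sata/.snapshots   # where the read-only snapshots are kept
DEST=/mnt/usb/backups          # btrfs filesystem on the larger USB disk

# backup_step PREV NEW
# PREV: snapshot that already exists on both disks, NEW: snapshot to create.
# With DRY_RUN=1 (the default) the commands are printed instead of executed.
backup_step() {
    prev=$1
    new=$2
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "btrfs subvolume snapshot -r $SRC $SNAPDIR/$new"
        echo "btrfs send -p $SNAPDIR/$prev $SNAPDIR/$new | btrfs receive $DEST"
    else
        # btrfs send only accepts read-only snapshots, hence -r.
        btrfs subvolume snapshot -r "$SRC" "$SNAPDIR/$new" || return 1
        # -p transfers only the difference to the parent snapshot.
        btrfs send -p "$SNAPDIR/$prev" "$SNAPDIR/$new" | btrfs receive "$DEST"
    fi
}
```

The very first transfer has no parent, so it would be a plain `btrfs send` without `-p`. To reach a second machine, the receive side can be run through ssh on the remote host instead of locally. Both need btrfs user-space tools recent enough to support send/receive.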

Anyone can experience the main reason why RAID is unreliable with a JMB321 on their own: do not trust that things work as expected (especially when you sacrifice HDDs for redundancy) but simulate worst case scenarios. I created a btrfs raid-1 containing 2 disks: a WD Green on PM port 0 (ata1.00) and a Seagate Barracuda on PM port 2 (ata1.02). I started an iozone test on the RAID array and simply removed power from the WD on port 0. The kernel logged a message that the JMB321 lost control over all its 5 ports, the remaining disk on port 2 in particular threw errors, and after several attempts to recover from this situation the filesystem was damaged: http://pastebin.com/eveQVsn9. This is what you can expect when one disk is dying in your RAID array: your cheap and unreliable PM will instantly throw the remaining disks away as well, and all your data has vanished in a second.

I also made a few tests with a btrfs raid-1 consisting of one SATA and one USB disk. Write speeds are limited to typical USB 2.0 performance (30 MB/s). Since btrfs chooses the disk to read from in a RAID-1 array more or less randomly (it depends on the process ID of the btrfs worker thread in question), you get read speeds between 30 MB/s and 200 MB/s depending on which disk btrfs reads from. Unfortunately this performance drop also applies to rewrites that are smaller than btrfs' block size (4K by default) since btrfs always has to calculate checksums. Then you run into the 'read the whole block from disk, exchange a few bytes, calculate new checksum, write back to disk' cycle, which slows things down massively if the disk that is read from is connected via USB: not only is USB2 limited to 30 MB/s throughput, USB's BOT mode is also highly inefficient, so you get increased system load and especially more %iowait compared to SATA.

I tried an mdraid-1 using '--write-mostly' to set up a RAID-1 where the data is written to both disks but read only from the SATA member as long as this disk is available. I put a btrfs on top (checksumming, snapshots!) and while this provides constant throughput of 30/150 MB/s (write/read) I still believe that RAID-1 is a waste of disks in this situation.
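
For reference, such a setup could be created roughly like this. This is a sketch, not the exact commands used for the test above: device names, the `/dev/md0` default and the function names are assumptions. With DRY_RUN=1 (the default) the commands are only printed.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1 (default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# create_write_mostly_mirror SATA_DEV USB_DEV [MD_DEV]
# Device names are placeholders; double-check them before disabling DRY_RUN,
# since creating the array destroys existing data on both members.
create_write_mostly_mirror() {
    sata=$1
    usb=$2
    md=${3:-/dev/md0}
    # Devices listed after --write-mostly are flagged write-mostly,
    # so reads are served from the SATA member while it is available.
    run mdadm --create "$md" --level=1 --raid-devices=2 "$sata" --write-mostly "$usb" || return 1
    # btrfs on top adds checksumming and snapshots.
    run mkfs.btrfs "$md"
}
```

Calling `create_write_mostly_mirror /dev/sda1 /dev/sdb1` prints the two commands for review; setting `DRY_RUN=0` and running as root executes them.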

Forget about el cheapo port multipliers when you try to access different disks in parallel (since one failing disk might affect other disks as well), use mainline kernel and btrfs and a data security approach instead of useless plain redundancy (AKA RAID). Snapshots that are sent to another device, regular scrubs and duplicated metadata (checksums) will help way more than crappy RAID.

If you want maximum performance, maximum data security and data integrity with Bananas, you need 2 Banana Pi/Pro. On the first, create a mixture of RAID-1 and RAID-0 between a SATA disk and a USB disk (data will be striped across both disks and metadata/checksums will be mirrored). Then create regular snapshots of this filesystem and scrub at least weekly (since btrfs uses checksums it will notify you when disk read errors occur, so you can restore the corrupted data from backup). Then transfer the snapshots using 'btrfs send/receive' to the second Banana Pi/Pro, which has either the same 2 disk setup or a single but larger SATA disk. The capacity of the second system should exceed that of the first by at least 30% to have enough space to keep a large number of snapshots to go back in time.

I created such a mix of mirror and stripe set:
  mkfs.btrfs -f -d raid0 -m raid1 -L"mirrored and striped" /dev/sda /dev/sdb
  mount -v -t btrfs -o noatime /dev/sda /mnt/mirrored-and-striped
Write speed will increase from 40-45 MB/s (SATA alone) to 60 MB/s; read speed will decrease to about the same value, since with RAID-0 the limited throughput of the USB-connected member is the bottleneck. But since the network is the real bottleneck in NAS situations anyway, ~58 MB/s are enough. All write and especially rewrite requests smaller than the btrfs block size will be slower, though. I used my standard iozone test for such a setup (iozone -a -g 2000m -s 2000m -i 0 -i 1 -r${recordsize}k) with different record sizes between 1K and 16M:
  reclen (KB)   write  rewrite    read  reread   (all KB/s)
            1   31363     6911   64752   65785
            2   47851     7381   65603   66077
            4   59372    59087   58151   58888
           32   60735    60534   58197   58628
          512   55922    58256   58062   58341
        16384   58466    57941   58224   57824
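
A small sketch for repeating this series and making the small-rewrite penalty visible. It assumes the result columns are record size, write, rewrite, read and reread (the order iozone uses for `-i 0 -i 1`); the function names and the list of record sizes are just for illustration.

```shell
#!/bin/sh
# Run the iozone test from above once per record size (in KB).
# Not called automatically; each pass writes and reads 2000 MB.
run_series() {
    for recordsize in 1 2 4 32 512 16384; do
        iozone -a -g 2000m -s 2000m -i 0 -i 1 -r"${recordsize}k"
    done
}

# Read result rows "reclen write rewrite read reread" on stdin and print
# the record size plus rewrite speed as a fraction of write speed.
# Rows that do not start with a number (e.g. a header) are skipped.
rewrite_ratio() {
    awk 'NF >= 3 && $1 ~ /^[0-9]+$/ { printf "%s %.2f\n", $1, $3 / $2 }'
}
```

Fed with the rows above, `rewrite_ratio` reports 0.22 for 1K and 0.15 for 2K records but 1.00 for 4K, which matches the point that rewrites below the 4K btrfs block size trigger the read-modify-write cycle.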
Bonus tip: When finished with benchmarking, use '-o noatime,compress=lzo' as mount options (btrfs also features transparent filesystem compression) and put a heatsink on the A20 SoC. Compression increases the CPU load but helps with real-world data that is compressible: since less data has to be written to and read from disk, the throughput automagically increases.

Final note: since the Bananas lack ECC RAM I would use a Banana Pi as NAS only for backup data and not for anything that isn't already stored safely elsewhere. The problem is called bit rot, and without ECC RAM it's impossible to ensure data integrity. Please compare with the last paragraph in the BPi as file server thread.
