[NAS] BananaPi as a fileserver -- some personal thoughts and experiences.
Edited by tkaiser at Mon Feb 9, 2015 06:35
The basics
The Banana Pi is a small single board computer. Both its name and board layout might suggest it's compatible with the well-known Raspberry Pi but that's definitely not the case. At its heart is a different SoC (system on a chip) which features a different GPU than the RasPi and, more importantly, contains both SATA and GBit Ethernet. The RasPi lacks both and its Ethernet chip is connected to an internal USB hub, so all USB ports and the network adapter share the bandwidth of the single USB port the RasPi's BCM2835 SoC provides.
Both the Ethernet and the SATA connector of the Banana Pi's A20 SoC are attached directly and not via USB. They're able to negotiate GBit link speed and SATA II (Serial ATA 3.0 Gbit/s, SATA Revision 2.x) but neither CPU power nor hardware features are sufficient to reach the theoretical maximum of either interface. Typical SATA throughput without extensive tuning is in the range of approx. 40/200 MB/sec (write/read) and the GMAC's network speed between 470/550-700 Mbits/sec (write/read -- with mainline kernel it seems to be possible to reach 700/940 Mbits/sec). If both interfaces are used concurrently, as will happen most of the time in a file server setup, additional performance decreases will occur since chipset limitations and CPU power become a bottleneck.
The A20 SoC features 2 USB 2.0 (EHCI/OHCI) ports as well as a USB 2.0 OTG (USB On-the-go) Micro USB connector. All three ports are connected directly to the A20 SoC and can achieve real-world read/write speeds of approx. 30 MB/sec (this is a hard limitation of USB 2.0's Mass Storage Bulk-Only Transport). If you share USB disks over the network expect lower speeds since there's always some overhead.
CPU frequency stuff
The A20 SoC can be clocked in a wide range (officially between 60 - 1008 MHz -- fewer MHz means less required voltage). Clock speed has a direct impact on performance. There are also different policies ('governors') for adjusting the clock speed dynamically depending on load. If you always need high performance you should choose 'performance'. The drawback is that the CPU will always clock at the upper level defined in /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq even when there's nothing to do at all. A better idea is to use the 'interactive' or 'ondemand' governor which dynamically clocks between scaling_min_freq and scaling_max_freq depending on the load generated (you have to find out yourself which governor best balances performance with power consumption).
You will need the cpufrequtils package and the governor available in the kernel configuration (eg. 'CONFIG_CPU_FREQ_GOV_ONDEMAND=y' -- compare with 'zcat /proc/config.gz'). Based on my tests slight overclocking is both possible and desirable (use a heatsink and ensure enough air flow):

echo -n 1200000 >/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo -n ondemand >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo 600000 >/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 25 >/sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 >/sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
echo 1 >/sys/devices/system/cpu/cpufreq/ondemand/io_is_busy

Statistics are available below /sys/devices/system/cpu/cpu0/cpufreq/stats/
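To check whether the chosen governor behaves as intended you can read the cpufreq statistics directly from sysfs -- a minimal sketch using the standard paths from above:

# how long each frequency step has been active (usually in units of 10 ms)
cat /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
# the frequency the CPU is currently running at (needs root)
cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq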
Heat dissipation
It's important that the relevant parts of the Banana Pi stay cool since the components try to prevent overheating by dynamically lowering voltage and clock speeds. So think about using heatsinks for both the CPU and the Power Management Unit AXP209. And consider mounting the board vertically to ensure enough air flow. Utilizing the 'chimney effect' might also be a good idea.
The BananaPi's AXP209 PMU has an integrated thermal sensor which can be read (in degrees Celsius) using

awk '{printf ("%0.1f",$1/1000); }' </sys/devices/platform/sunxi-i2c.0/i2c-0/0-0034/temp1_input

The A20 SoC itself also contains an internal temperature sensor but it's somewhat difficult to read and interpret the uncalibrated values:
http://www.cubieforums.com/index.php?topic=2493.0
http://www.cubieforums.com/index.php?topic=2293.0
Update: Community member FPeter provided a better approach: http://forum.lemaker.org/forum.php?mod=redirect&goto=findpost&ptid=8137&pid=47437
Update: A short overview including an archive with modified files to use RPi-Monitor on the BPi can be found here: http://forum.lemaker.org/forum.php?mod=redirect&goto=findpost&ptid=8312&pid=38582
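To keep an eye on thermals under continuous NAS load, the PMU reading from above can be logged periodically. A minimal sketch (the log file location is just an example):

#!/bin/bash
# append the AXP209 PMU temperature to a log file once per minute
while true ; do
    TEMP=$(awk '{printf ("%0.1f",$1/1000); }' </sys/devices/platform/sunxi-i2c.0/i2c-0/0-0034/temp1_input)
    echo "$(date '+%Y-%m-%d %H:%M:%S') ${TEMP} C" >>/var/log/pmu-temp.log
    sleep 60
done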
SMP challenges
With its dual core CPU the BananaPi's A20 has both more power and more problems compared to a single core implementation: How do you assign tasks to the specific CPU cores? Do a web search for 'SMP affinity' and 'IRQ balancing' for details. In short: When all interrupts are handled by one CPU core (usually CPU0) then this CPU might become a bottleneck for network throughput. One approach to this is IRQ balancing: processing all interrupts evenly on different CPU cores in a round robin fashion. Since this doesn't work well in many situations (especially with network IRQs) there are also alternatives like manually controlling the SMP affinity of specific IRQs as well as of processes (the latter can be done using the 'taskset' utility -- see the sketch after the links below).
The simplest solution on the BananaPi is to assign all network related interrupt processing to CPU1 by setting a specific SMP affinity for this IRQ (eg. in /etc/rc.local):

echo 2 >/proc/irq/$(awk -F":" '/eth0/ {print $1}' </proc/interrupts)/smp_affinity

Further reading:
https://lkml.org/lkml/2012/8/4/51
http://comments.gmane.org/gmane.linux.ports.arm.kernel/102251
https://groups.google.com/forum/#!topic/linux.kernel/pNyi-qX9uz8
http://www.alexonlinux.com/why-i ... t-such-a-good-thing
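As mentioned above, the affinity of processes can be controlled as well. A minimal sketch using taskset (the PID is purely hypothetical -- look it up with ps or pgrep first):

# pin the process with PID 1234 to CPU0 so it doesn't compete with
# the network IRQ processing that now happens on CPU1
taskset -pc 0 1234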
TCP/IP settings
The default values of many TCP/IP tunables aren't optimal for GBit network speeds. Increasing buffer sizes and queue lengths helps in most GBit LAN scenarios (please be aware that the following settings might decrease performance on network devices with high latency and low bandwidth):

sysctl -w net/core/rmem_max=8738000
sysctl -w net/core/wmem_max=6553600
sysctl -w net/ipv4/tcp_rmem="8192 873800 8738000"
sysctl -w net/ipv4/tcp_wmem="4096 655360 6553600"
sysctl -w vm/min_free_kbytes=65536
ip link set eth0 txqueuelen 10000
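Set this way the values are gone after the next reboot. To make them persistent they can go into /etc/sysctl.conf -- a sketch (the txqueuelen setting is not a sysctl and still belongs in /etc/rc.local or similar):

# /etc/sysctl.conf -- GBit LAN tuning as above, in dotted notation
net.core.rmem_max = 8738000
net.core.wmem_max = 6553600
net.ipv4.tcp_rmem = 8192 873800 8738000
net.ipv4.tcp_wmem = 4096 655360 6553600
vm.min_free_kbytes = 65536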
Scheduler settings and I/O priority
Setting both CONFIG_SCHED_MC=y and CONFIG_SCHED_SMT=y at kernel compile time seems to increase the possible throughput on a multi-core system like the BananaPi. In case you plan to use a NAS that will neither be used interactively nor concurrently by different users then you might get a performance boost 'at no additional cost' by adjusting the scheduler priority / ionice settings of the processes serving the single network client. You have to get the process ID (PID) of the single process (eg. smbd running under the UID of the client user in question) and could then raise its CPU and I/O priority (see the sketch below).
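A minimal sketch of what that could look like (the user name 'mediaclient' and the use of smbd are assumptions -- adapt to whatever daemon serves your client):

# find the smbd instance serving the (hypothetical) user 'mediaclient'
PID=$(pgrep -u mediaclient smbd | head -n1)
renice -n -10 -p "$PID"      # raise CPU scheduling priority
ionice -c2 -n0 -p "$PID"     # highest best-effort I/O priority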
Jumbo frames / MTU
While it seems possible to use a MTU of up to 3838 bytes and this really helps with synthetic benchmarks like iperf, I didn't manage to get normal network loads stable afterwards and therefore returned to the 'traditional' MTU of 1500. It would be nice if others shared their experiences.
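For those who want to experiment anyway, a sketch (the hostname is an assumption; all switches and clients on the path must support the larger MTU as well):

# raise the MTU on the server side
ip link set dev eth0 mtu 3838
# verify from a Linux client with a non-fragmenting ping
# (payload = MTU - 28 bytes of IP/ICMP headers, here 3838 - 28 = 3810)
ping -M do -s 3810 bananapi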
File systems
For dedicated NAS storage ext4 seems to be the best choice on SATA/USB. Other file systems lack features (for example xattr, ACL, TRIM support) or are problematic in one way or another. XFS on ARM might lead to data loss if the kernel isn't configured correctly at compile time, and while btrfs might seem to be an interesting choice there are two problems associated with it: btrfs heavily depends on the kernel version in use (and at the time of this writing all Banana distros use an outdated 3.4.x kernel) and since it's a checksum based file system with 'end to end data integrity' in mind it must not be used on devices that lack ECC RAM (since intact data on disk might get corrupted while scrubbing due to bit flips in RAM).
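A minimal sketch for preparing a dedicated ext4 data filesystem (device node, label and mount point are assumptions):

mkfs.ext4 -L nasdata /dev/sda1            # create the filesystem
mkdir -p /mnt/nasdata
mount -o noatime /dev/sda1 /mnt/nasdata   # noatime avoids needless metadata writes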
Partition alignment
While it's always a good idea to ensure proper partition alignment (taking the drive's sector sizes and the Erase Block Size of SSDs into account) you most likely won't see any difference in performance when getting it wrong since the A20's SATA implementation or USB's BOT (bulk only transfers) will be the bottleneck.
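If you want to be on the safe side anyway, starting the first partition at a 1 MiB boundary covers the common cases. A sketch using parted (the device node is an assumption):

# GPT label and a single partition starting at 1MiB (aligned for 4K sectors
# and typical SSD erase block sizes)
parted -s /dev/sda mklabel gpt
parted -s /dev/sda mkpart primary ext4 1MiB 100%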
SATA Port Multipliers
Some people report that they work under certain circumstances. Please follow this thread: http://forum.lemaker.org/thread-9207-1-1.html (links may or may not work since the LeMaker guys rearrange the subforums every few days)
Benchmarking
When you benchmark then always work 'from bottom to top': Measure storage performance and network throughput individually first, and only then measure combined throughput. To have a look at what's going on behind the scenes (CPU core utilisation and the like) use "htop" and eg. "dstat -cdnpmgs --top-bio --top-cpu --top-mem".
Good benchmarking tools are eg. iozone/bonnie++ to test local/remote storage and eg. iperf/netperf to measure network speeds without storage interaction. These tools provide switches to adjust parameters like record/block/window sizes that might help to fine tune server settings. And they correctly bypass caches (one of the main mistakes people make when using eg. 'dd' for tests: measuring not solely disk throughput but mainly buffers/caches instead).
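A few example invocations as a sketch (hostname, mount point and sizes are assumptions -- consult the man pages for the parameters that matter to your workload):

# network only: run 'iperf -s' on the BananaPi, then from a client:
iperf -c bananapi -t 60 -i 10
# storage only, with caches bypassed (-I) and flushes included (-e):
iozone -e -I -a -s 100M -r 4k -r 1024k -i 0 -i 1 -f /mnt/nasdata/testfile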
To measure the 'setup as a whole' from a client while considering different performance relevant parameters (not just throughput) I personally prefer Helios' LanTest. Available for free from here: http://webshare.helios.de (user tools, password tools).
Drive health / temperature
Using SATA for storage is not only faster than USB but also provides more ways to get health feedback from the drive (this might work with some USB enclosures/bridge-chips as well but with many it definitely won't).
Using the smartmontools package one can start offline self-tests of the drive or SSD and also read various SMART parameters from the drive (either manually with smartctl or using a special daemon called smartd -- compare with the manual pages). SMART parameters are drive (manufacturer) dependent so you always have to ensure you have the most recent version of smartmontools' drivedb.
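Typical smartctl calls would look like this (the device node is an assumption):

smartctl -t short /dev/sda     # start a short offline self-test
smartctl -l selftest /dev/sda  # show the self-test log afterwards
smartctl -A /dev/sda           # list the vendor specific SMART attributes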
You should monitor the specific health indicators that apply to your drive/SSD (reallocated sectors or wear leveling count for example) and should always have a look at parameter #199 if you add/change a drive. If its value is above 0 or increases when you try to write to the drive then something's wrong with the cabling/connection. Have a look at the well known SMART attributes and their meaning here:
http://en.wikipedia.org/wiki/S.M ... M.A.R.T._attributes
Unfortunately the Debian Wheezy smartmontools package is outdated as hell so one has to patch the update-smart-drivedb script prior to first usage -- do a web search for 'update-smart-drivedb wheezy' to get an idea what needs to be changed.
The same applies to the nice hddtemp package. Most modern drives will be missing from the drive database. But it's easy to add your own drive. Run update-smart-drivedb (and fix it if it complains as outlined above), then use 'smartctl -a /dev/sda' to read all available SMART parameters/values (in case of my SSD the interesting parameter is #190, currently reading 24°C):

Device Model: Samsung SSD 840 EVO 120GB
190 Airflow_Temperature_Cel 0x0032 ... 24

Then check the exact model name pattern hddtemp expects (see the sketch below -- in my case this is "Samsung SSD 840 EVO 120G B" with a space between 120G and B). Then simply add a new line to /etc/hddtemp.db in the following form:

"Samsung SSD 840 EVO 120G B" 190 C "Samsung SSD 840 EVO 120GB"

Afterwards 'hddtemp /dev/sda' should simply work.
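One way to see the model string exactly as hddtemp reads it (assuming your hddtemp build offers the debug switch) is:

# prints the drive identification and all SMART fields hddtemp sees
hddtemp --debug /dev/sda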
Final thoughts
For a device that cheap the network throughput as a NAS is fairly good if the configuration is done right and the components have been chosen wisely (SATA instead of USB, appropriate network infrastructure and so on).
But since it lacks ECC RAM bit rot will happen over time. This problem can only be addressed using checksum based filesystems that provide 'end to end data integrity' and server grade hardware featuring at least simple ECC memory.
"For example, we observe DRAM error rates that are orders of magnitude higher than previously reported, with FIT rates (failures in time per billion device hours) of 25,000 to 70,000 per Mbit and more than 8% of DIMMs affected per year."
http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
Wahhh, thanks a lot for documenting here. I think I read all your posts in the Bananian forum but this is great knowledge, perfectly summarized.
Thanks again
Edited by tkaiser at 2014-10-15 02:32
The idea behind this 'article' was to collect some feedback from other users which can then be compiled into a wiki page. BTW: I forgot one section:
Great job!
In case I have missed anything: where have these very interesting postings from tkaiser been moved to?
T.S. replied at 2014-10-16 14:41 
In case I have missed anything but where are this very interesting postings from tkaiser are moved t ...
Not that I'm aware of...
The posts seem to be modified by him, so I've been asking him to restore them if possible...
Post fixed! Thx!
Regarding your valued analysis I would like to point out the following details:
TCP/IP and scheduler settings currently are not pivotal because the disk write speed bottleneck saturates at 40 MB/s.
Jumbo frames might be interesting in the future. However, going from 1500 to double that would not make too big a difference. The Ethernet chip allows huge frame sizes and imho it looks like a bug in the Ethernet driver that bigger MTUs do not work reliably.
Health status: Checking the supply voltages would also be interesting.
Lack of ECC RAM: Assuming 8% of DIMMs are affected per year, with 2 chips an average of more than 5 years until a failure happens can be expected. This failure rate is most probably lower than that of other glitches on such a board. If reliability of RAM is of concern, intensive RAM testing upon purchase is advisable.
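A simple way to do such a test from userspace is the memtester package (package name and sizes are assumptions -- leave enough free RAM for the OS on a 1 GB board):

# test 512 MB of RAM with 10 iterations
memtester 512M 10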