Bananian

Samba performance

Hi guys,

I'm interested in what kind of read and write speeds you get with Bananian.

I set up a simple Samba file server (the standard samba 3.6 package from the Debian Wheezy repositories) with my Banana Pi and a 2.5" USB hard drive attached (filesystem is ext4). The performance is good, but I'm surprised to see that the read performance falls behind the write performance.

So, these are the values I get for sequential reads and writes from a Windows 7 client to my Banana Pi ("read" means the Windows client is reading data from the Banana Pi Samba server):
Run 1:
Write: 29.6 MB/s
Read: 23.9 MB/s

Run 2:
Write: 30.0 MB/s
Read: 24.2 MB/s

Considering the Banana Pi only features USB 2.0, the write performance to the external drive is great. But I'm wondering why the read performance falls 20% behind, so I'm interested in what kind of performance you achieve. On most boxes I've used so far, the read performance was better than the write speed.

I tried to tweak the performance by playing with the options in smb.conf. However, whenever I set different values for SO_SNDBUF and SO_RCVBUF, the performance drops dramatically, to something like 6 MB/s. I tried values from 8192 to 262144. By the way, I also tested the performance of the drive itself with dd, which gives me transfer rates of 34-35 MB/s for both reading and writing. I also tested a different drive, but that didn't change anything.

My current options, which gave me the best performance so far, are these in the global section of smb.conf:
  workgroup = MYWORKGROUP
  server string = MyServer
  security = user
  load printers = no
  printing = bsd
  printcap name = /dev/null
  disable spoolss = yes
  log file = /var/log/samba/samba.log
  syslog = 0
  max log size = 100
  dns proxy = no
  socket options = TCP_NODELAY IPTOS_LOWDELAY
  use sendfile = yes
With which options and values do you achieve your best Samba performance?


Regards,

Timo
tkaiser  
Post Last Edited by tkaiser at 2014-9-27 00:25

You should take into account that you're measuring two different things at once with your approach: network throughput and file system throughput. To get a clue when results aren't as expected, you should measure both individually (and always have a look at CPU utilization).

'Pure network' speed can be measured with iperf (simply use the Debian package and iperf.exe for Windows). You might notice a dependency on the CPU scaling governor (set it to performance for maximum throughput instead of ondemand). Also keep in mind that Samba acts single-threaded on a per-client basis, and that efficient file access was implemented optionally in USB starting with 3.0, in the form of UASP (USB Attached SCSI Protocol).
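A minimal sketch of such a measurement (assuming the Debian iperf package on the Pi and iperf.exe on the Windows side; `banana-pi` is a placeholder hostname):

```shell
# On the Banana Pi: pin the governor to performance, then start the iperf server.
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
iperf -s

# On the Windows client (iperf 2.x syntax):
#   iperf.exe -c banana-pi -t 30      # client -> Pi
#   iperf.exe -c banana-pi -t 30 -r   # then the reverse direction as well
```

Running the test in both directions matters here, since the thread shows reads and writes behaving differently.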

And regarding 'the Banana Pi only features USB2.0'... mine features a SATA port too (48 MB/sec write and 120 MB/sec read, less overhead compared to USB's dumb mass storage protocol) ;-)

Quote: "You should take into account that you're measuring two different things at once with your approach: Network throughput and file system throughput. To get a clue when results aren't as expected you should measure both things individually (and always have a look at CPU utilization)."

I measured the network throughput with iperf in both directions. I get around 40 MB/s, so this shouldn't be the bottleneck.

CPU utilization is interesting: when reading from a Samba share, the CPU utilization is actually much lower than when writing, so this doesn't explain the difference either. When I write to a share, smbd's CPU utilization is around 90-100%; when I read from the share, it's only 50-60%. (By the way, I tested CPU utilization earlier with the "use sendfile" parameter enabled. With it turned off, smbd's CPU utilization is higher when reading files, but the read performance is similar.)
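One way to watch this per process during a transfer (a sketch; `pidstat` comes from the sysstat package, and it assumes smbd is already running):

```shell
# Sample smbd's CPU usage once per second while the copy runs.
# pidof prints space-separated PIDs; pidstat -p wants a comma-separated list.
pidstat -u -p "$(pidof smbd | tr ' ' ',')" 1
```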

Quote: "And regarding 'the Banana Pi only features USB2.0'... mine features a SATA port too (48 MB/sec write and 120 MB/sec read, less overhead compared to USB's dumb mass storage protocol) ;-)"

Sure. But I'm talking about a USB hard drive. The Banana Pi only offers USB 2.0, but my drive is USB 3.0, so there is obviously a bottleneck here. And, no, I cannot take the drive out of the enclosure to attach it to the SATA port directly, because it's one of those drives where the USB port is soldered onto the hard disk itself. But USB 2.0 does not seem to be the problem here, since the write performance shows it could be faster.

Update:

I'm getting closer. Adding the following lines to smb.conf helped me increase the read speed by about 10%:
  aio read size = 16384
  aio write size = 16384
So with these lines added I get 26.5 MB/s read speed (tested in two runs). Looking good already, but I'm curious whether it's possible to close the gap further.

I also tried the min receivefile size option, but that had a slightly negative effect, so I removed it again.

If anyone has more experience with further tweaking, I'd be glad to hear.

Thanks,

Timo

tkaiser  
Post Last Edited by tkaiser at 2014-9-30 02:41
Quote: "I get around 40MB/s"


With iperf I get a stable 470-480 Mbits/sec while one single CPU core is utilized 100%. I'm currently in the process of comparing Bananian and Igor's new Banana Debian image. Bananian performs better so far, and interestingly the second CPU core sometimes jumps in when reading data from the Banana Pi (with Igor's image that doesn't happen at all when traffic goes in this direction: iperf results).

There are some tunables (smp_affinity and cpufreq stuff), but the most important part seems to be scheduling across CPU cores. Currently I have no idea how to improve this.

Hmm... I played with the scaling governors and the maximum frequency a bit. Raising the frequency to 1.2 GHz improved the write performance (~32 MB/s), but oddly enough the read performance stayed the same. Maybe the problem is not so much the frequency but rather why Samba is not utilizing one core 100% (as I said, during reads the CPU usage is significantly lower than during writes). The scaling governor had no impact on my Samba performance. (I didn't measure network throughput this time.)

I still have to look into smp_affinity. Haven't tested it yet.

tkaiser  
Quote: "Setting the frequency up to 1.2GHz improved the write performance (~32MB/s)"


What is the real throughput from/to your USB disk? 'Measuring' with dd is somewhat problematic, because unless you use really large file sizes and 'oflag=direct' you're testing buffers/caches instead of disk I/O. I would simply use "cd $disk && iozone -a" with different record sizes.
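Such a cache-free disk test might look like this (a sketch; `/mnt/usbdisk` and the 1 GB test size are placeholders, and it must run as root to drop the caches):

```shell
cd /mnt/usbdisk

# Write: oflag=direct bypasses the page cache, conv=fsync waits for the disk.
dd if=/dev/zero of=testfile bs=1M count=1024 oflag=direct conv=fsync

# Read: drop the caches first so the data really comes from the disk.
echo 3 > /proc/sys/vm/drop_caches
dd if=testfile of=/dev/null bs=1M iflag=direct

rm testfile
```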

When you had a look at smbd's CPU utilization, were there other processes creating high load (I'm thinking of the gmac driver handling the interrupts)?

BTW: Since you experienced performance decreases when increasing SO_SNDBUF and SO_RCVBUF, you might want to have a look with sysctl at the system-wide TCP/IP tunables, e.g. net/core, net/ipv4 and of course /proc/sys/kernel (a good starting point for 1G links is http://www.softpanorama.net/Comm ... rmance_tuning.shtml).
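Inspecting the current defaults before changing anything might look like this (a sketch; the dotted sysctl names are equivalent to the net/core and net/ipv4 paths):

```shell
# Show the current socket buffer limits and TCP autotuning ranges.
sysctl net.core.rmem_max net.core.wmem_max
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem
```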

tkaiser  
BTW: Regarding CPU utilization, dstat might be worth a look. Running both "htop" and "dstat -cdnpmgs --top-bio --top-cpu --top-mem" while performing tests might give a better impression of what's going on.

tkaiser  
Just a final note (since I'm giving up on further tweaking).

I would check the IRQ balance ("cat /proc/interrupts") and, in case eth0 and your USB controller share the same CPU, adjust this.
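Checking which core services which interrupts could be done like this (a sketch; the controller names in the grep pattern vary by kernel):

```shell
# Print the header plus the Ethernet/USB interrupt lines; compare the per-CPU
# counters to see whether both devices land on the same core.
head -1 /proc/interrupts
grep -E 'eth0|usb|ehci|ohci|sw_ahci' /proc/interrupts || true
```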

Since I'm using a SATA SSD, I tried to put SATA on CPU 0 and Ethernet on CPU 1, and set the cpufreq governor to performance and the maximum frequency to 1.2 GHz in /etc/rc.local:
  echo 1 > /proc/irq/$(cat /proc/interrupts | grep sw_ahci | cut -f 1 -d ":" | tr -d " ")/smp_affinity
  echo 2 > /proc/irq/$(cat /proc/interrupts | grep eth0 | cut -f 1 -d ":" | tr -d " ")/smp_affinity
  echo 2 > /sys/class/net/eth0/queues/rx-0/rps_cpus
  echo 2 > /sys/class/net/eth0/queues/tx-0/xps_cpus
  echo -n 1200000 >/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
  echo -n 1200000 >/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
  echo -n performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
  echo -n performance >/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
I also tuned network buffers a bit:
  sysctl -w net/core/rmem_max=8738000
  sysctl -w net/core/wmem_max=6553600
  sysctl -w net/ipv4/tcp_rmem="8192 873800 8738000"
  sysctl -w net/ipv4/tcp_wmem="4096 655360 6553600"
  sysctl -w vm/min_free_kbytes=65536
And then I set the three processes serving my client (I'm not using Samba but Netatalk) to run on CPU 0 with the real-time I/O scheduling class:
  taskset -p 01 $PIDs
  ionice -c1 -p $PIDs
And especially helpful was increasing the txqueuelen (approx. 2.5 MB/sec more):
  ip link set eth0 txqueuelen 10000
Before (no tuning at all, Bananian defaults):

[screenshot]

And after:

[screenshot]
I believe you should watch out for possible IRQ collisions (network/USB) and check the load distribution between the CPUs using htop.

Thanks for all this input and the hints. I will be on a trip until next week, so I don't have time (or access) to look into it now.
Just two notes: when I tested drive throughput with dd, I created files of 4-8 GB in size, so the impact of buffering/caching should not be too big.
When I last looked at CPU utilization during Samba transfers, the only other notable process was usb-storage, but its CPU utilization stayed below 20% during reads and was around 10% during writes, if I remember correctly.

But I will perform further tests when I'm back.
