Samba performance

Have you guys tried running with NFS instead of SMB? I tried it yesterday and was surprised by how much quicker it seems to be. The documentation also seems to support the idea that the overhead is really low. I was even able to copy files and play an Xvid movie in MPlayer at the same time.

Windows support is limited but for Mac OS X/Linux it would be a great speedy alternative.
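For anyone who wants to reproduce the comparison, a minimal NFS setup looks roughly like this (the path, subnet and hostname below are placeholders, not from this thread):

```
# /etc/exports on the Banana Pi -- placeholder path and subnet
/srv/media 192.168.1.0/24(ro,async,no_subtree_check)

# then reload the export table and mount from a Linux/OS X client:
#   exportfs -ra
#   mount -t nfs bananapi:/srv/media /mnt/media
```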

I have to disagree completely regarding OS X. While you can use NFS in very special situations between Macs and Linux, it isn't a good choice for general networking.

OS X implements each and every network protocol on a VFS layer and the one for NFS lacks essential features.

Up until now, AFP is the protocol to choose (and netatalk the file sharing daemon to use). Once Samba 4.2/4.3 is out and correctly integrated into Bananian using the vfs_fruit module, and you're on OS X 10.9 or above, it's also OK to use SMB2.
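To make the vfs_fruit remark concrete, a share for OS X clients on Samba 4.2+ could be configured along these lines (the share name and path are made up for illustration; the options belong to the vfs_fruit module):

```
[media]
    path = /srv/media
    # fruit handles OS X resource forks and Finder metadata,
    # streams_xattr stores the underlying streams in extended attributes
    vfs objects = fruit streams_xattr
    fruit:metadata = stream
    fruit:resource = file
```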

NFS fails regarding encodings (it uses the client encoding -- in OS X's case decomposed UTF-8, which no one else uses), fails at storing resource forks and Finder metadata, lacks reconnect support, and so on.

When I tested drive throughput with dd, I created files of 4-8 GB in size, so the impact of buffering/caching should not be too big.

I would give a real I/O benchmark tool like iozone a try, since you can then measure how throughput depends on the record/block size of transfers (some transfers will be pretty fast with a size of 4K while 512K will be painfully slow, and so on). Please remember that USB 2.0 uses a very inefficient protocol which might be timing-dependent, and maybe some Samba tweaks will let you tune read/write settings further once you know at which record sizes your USB storage performs best.
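Short of installing iozone, the record-size dependency can be sketched with a plain dd loop. This is only a rough sketch: the file created here is a small temporary stand-in, and on the Banana Pi you would point it at a multi-GB file on the USB disk and add iflag=direct to dd so the page cache is bypassed.

```shell
# Create a small stand-in file; on real hardware use a multi-GB file
# on the USB disk so caching doesn't dominate the numbers.
FILE=$(mktemp)
dd if=/dev/zero of="$FILE" bs=1M count=8 2>/dev/null

# Read it back at several record sizes; dd's last stderr line
# reports the throughput achieved for each size.
for bs in 4k 64k 512k 1M; do
    printf 'bs=%-4s ' "$bs"
    dd if="$FILE" of=/dev/null bs="$bs" 2>&1 | tail -n 1
done
rm -f "$FILE"
```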

Compare with http://electronics.stackexchange ... wer-than-480-mbit-s for example.

BTW: Unfortunately, I don't know that much about SMB/Samba tuning, since I was able to ignore this stuff completely in the past (no Windows around -- only Unix/Linux/Macs). But since Apple is moving from AFP to SMB2 (and implements some important proprietary SMB2 add-ons), this will change. If we talk again in half a year, I might be able to provide ready-to-use smb.conf settings as well as the appropriate system-wide TCP/IP tunables (always consider buffer sizes and queue lengths!)

Ok, I'm back and I had some time to further pursue this question.

Thanks to tkaiser for pointing me in the right direction. I've reached a point now where the read speed almost matches the write performance (~29 MB/s reading and 29-30 MB/s writing).

But a few steps back to help others understand, too.

1) I tested Samba 4.1 from wheezy-backports (actually for reasons completely unrelated to the performance tests in this thread). Debian Wheezy is still on Samba 3.6 if you use the standard repositories. But some simple performance tests showed that 4.1 actually performs much worse on the Banana Pi: transfer speeds were around 20 MB/s, though I didn't spend any time trying to optimize this. Just a side note. And since 4.1 didn't help me with the other issue I was testing it for, I went back to Samba 3.6.

2) I checked the interrupts as tkaiser suggested. It seems almost all interrupts are handled by the first CPU core, including usb and eth0.
So, I ran these commands to assign some work to the second core:
  echo 2 > /proc/irq/$(grep eth0 /proc/interrupts | cut -f 1 -d ":" | tr -d " ")/smp_affinity
  echo 2 > /sys/class/net/eth0/queues/rx-0/rps_cpus
  echo 2 > /sys/class/net/eth0/queues/tx-0/xps_cpus
And this resulted in read speeds of ~29MB/s, which is almost the same as the write performance. That's a point where I think it's not worth spending additional time on further tweaking (this may be different for someone who uses SATA instead of USB).
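To verify such a change actually took effect, the affinity mask can be read back. This is a sketch: the eth0 name matches the commands above (adjust it for other interfaces), and the irq_of helper is made up here for illustration.

```shell
# irq_of FILE DEVICE -> IRQ number of DEVICE as listed in FILE
# (FILE is normally /proc/interrupts; parameterized for testing).
irq_of() {
    awk -F: -v dev="$2" '$0 ~ dev { gsub(/ /, "", $1); print $1; exit }' "$1"
}

IRQ=$(irq_of /proc/interrupts eth0)
if [ -n "$IRQ" ]; then
    # Bitmask: 1 = CPU0 only, 2 = CPU1 only, 3 = both cores.
    cat "/proc/irq/$IRQ/smp_affinity"
fi
```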

3) I'm using a different hard drive now (still USB, not SATA). hdparm in direct mode shows maximum read speeds of 30-31MB/s. So even if I tested further tweaks, I'm almost at the limit here anyway.

@LaurensBER: My clients are mostly Windows 7. Therefore, and for better control over security parameters, I prefer Samba over NFS.

And one more question to tkaiser: Why do you set smp_affinity for sw_ahci during boot as well? Do IRQs change after rebooting the system if you don't set this to a fixed value?



Reply 14# silentcreek

one more question to tkaiser: Why do you set smp_affinity for sw_ahci during boot as well?

Trial & error, that's all.

I still don't understand the whole thing since normally we do optimization stuff on a different hardware platform with totally different boundary conditions.

On x86 there is an IO-APIC which can handle hardware interrupts and 'assign' them to specific CPU cores. The methods to balance IRQs across CPU cores differ, but it's common knowledge that it's better to assign network IRQs to specific CPU cores manually (since cache misses are costly, especially when you're running networks at 10 GbE or above). So if you use irqbalance on x86, you ban the IRQs for your network adapters (--banirq/--banscript) and assign them manually via smp_affinity.

On ARM, at least with kernel 3.4.x, all hardware IRQs are handled by CPU0 by default. And irqbalance seems to be broken in two ways: it doesn't actually distribute IRQs across the available CPU cores, and versions prior to 1.0.7 have a memory leak on platforms without PCI (like in our case with the Allwinner SoC).

The best way to distribute IRQs across CPU cores (which might have a negative influence on power consumption) seems to be to do it manually. And in fact it makes no difference whether you assign an smp_affinity of 3 (this would mean: use all CPU cores) or 1 (CPU0).
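For reference, the smp_affinity value is a hexadecimal CPU bitmask: bit N set means CPU N may handle the IRQ. A tiny helper makes the encoding explicit (the cpu_mask function name is made up for illustration):

```shell
# cpu_mask CORE... -> hex bitmask as accepted by /proc/irq/*/smp_affinity
cpu_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '%x\n' "$mask"
}

cpu_mask 0     # CPU0 only  -> 1
cpu_mask 1     # CPU1 only  -> 2
cpu_mask 0 1   # both cores -> 3
```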

I did some more tests, also assigning specific daemons (the Netatalk processes serving my MacBook Pro) to CPU1 and changing their scheduler class, which further improved combined throughput (network/disk). But it seems one can gain more throughput more easily by tuning TCP/IP parameters (window sizes and queue lengths) and slightly overclocking the Banana:
  echo -n 1200000 >/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
  echo -n ondemand >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
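The "window sizes and queue lengths" mentioned above map to sysctl tunables like the following. The values here are illustrative starting points only, not measured results from this thread:

```
# /etc/sysctl.d/90-net-tuning.conf -- illustrative values, tune per workload
net.core.rmem_max = 4194304
net.core.wmem_max = 4194304
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 65536 4194304
net.core.netdev_max_backlog = 2500
```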

I see, thanks.

Btw., I guess you noticed the interrupts caused by 'sunxi lcd0'. Did you happen to try to get rid of those? My Banana Pi runs headless, so I'm wondering if these interrupts can be avoided. I only found this thread on cubieforums, but they haven't found a solution.

Reply 16# silentcreek

(lcd0 IRQs) ... Did you happen to try to get rid of these?

No idea at all, since the fex modifications don't work for me. But this is something the LeMaker guys should be able to answer.

Post Last Edited by silentcreek at 2014-10-9 16:29

Hey again,

I experimented a bit today, and as it turns out, the fex modifications DO work. Following the fex guide from sunxi, I successfully disabled the display output, so I now get zero interrupts from 'sunxi lcd0'.
If anybody is interested, I put my fex file up on Pastebin:
Be advised: I also disabled some other stuff I don't use, e.g. audio. The relevant parts for disabling display output should be disp_init, lcd0_para and hdmi_para.
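For reference, the relevant fex sections look roughly like this when display output is disabled (key names follow the sunxi fex guide; treat this as a sketch, not the exact Pastebin file):

```
[disp_init]
disp_init_enable = 0

[lcd0_para]
lcd_used = 0

[hdmi_para]
hdmi_used = 0
```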

I also set disp.screen0_output_mode=0 in my uEnv.txt, even though I haven't tested whether that's really necessary.



Reply 18# silentcreek

Good to know, thx for your efforts. Would you mind sharing the output of 'cat /proc/interrupts'?

Sure, I'll post it tonight, when I'm back home.
