Bananian

SAMBA CPU usage high

Suman  
Hi all,

I ran some Samba performance tests with a 5.5 GB file transfer in the following topology:


                         Link 1          Link 2
Banana PRO Samba Server  ------> Router --------> Windows 7 client PC

The router is an Asus RT-AC68U (the WLAN is running in 802.11n-only mode). The Banana Pro has a Hitachi 160 GB SATA 2 drive attached and runs the latest Bananian Linux, v15.01.

The results were:
(1) If both Link 1 and Link 2 are WiFi, I get very poor performance: 2 MB/s write and 3 MB/s read. The CPU usage of smbd (as reported by top) is around 10%.
(2) If Link 1 is GbE but Link 2 is WiFi, there is an improvement to 4.2 MB/s write and 6 MB/s read. The CPU usage of smbd is around 20%.
(3) If both Link 1 and Link 2 are GbE, I get 31 MB/s write and 37 MB/s read, which is almost in line with results reported in other threads. But smbd CPU usage is 85-95% in this case.
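
One rough way to read these three results is to divide smbd's CPU share by the throughput in each case. The following is only a minimal Python sketch: it assumes 90% (the midpoint of the reported 85-95% range) for case (3), and the names `results` and `cpu_per_mb` are made up for illustration.

```python
# Measured figures from the three cases: (write MB/s, read MB/s, smbd CPU %).
# Assumption: case 3 uses 90 %, the midpoint of the reported 85-95 % range.
results = {
    "1: WiFi + WiFi": (2.0, 3.0, 10.0),
    "2: GbE + WiFi": (4.2, 6.0, 20.0),
    "3: GbE + GbE": (31.0, 37.0, 90.0),
}


def cpu_per_mb(cpu_percent, mb_per_s):
    """Return the CPU percentage spent per MB/s of throughput."""
    return cpu_percent / mb_per_s


for case, (write, read, cpu) in results.items():
    print(f"case {case}: {cpu_per_mb(cpu, write):.1f} %/MB/s write, "
          f"{cpu_per_mb(cpu, read):.1f} %/MB/s read")
```

With these inputs the cost lands between roughly 2.4 and 5 percent of CPU per MB/s in all three cases, which would suggest that the high smbd figure in case (3) mostly follows from the higher throughput rather than from a separate problem.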

Obviously my 2.4 GHz WiFi network is a huge problem, possibly due to interference from neighboring WLANs in the apartment complex (it's quite congested, judging by the Android Wifi Analyzer output).

Are other folks getting similar results for case (3)? And is there a way of tuning Samba to reduce the CPU usage?

I want to be able to run other daemons (deluge, sabnzbdplus, ssh sessions, etc.) in parallel with occasional, unpredictable Samba file server access. It looks like I will either lose out on Samba performance or cause some temporary disruption to the other services.

Regards
Suman
tkaiser  
Edited by tkaiser at Wed Apr 1, 2015 08:34

Please post the output of
  1. cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
  2. cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
The values you reported sound normal (the A20 SoC has two CPU cores, so I suspect that one was busy serving smbd and the other, less busy, was handling low-level work such as network packet processing and SATA).
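
If it is more convenient, the two sysfs reads requested above can also be scripted. This is a minimal Python sketch, not an official tool: the function name `read_cpufreq` and its `base` parameter are invented here, and it assumes the kernel exposes cpufreq under the usual sysfs path.

```python
from pathlib import Path


def read_cpufreq(cpu=0, base="/sys/devices/system/cpu"):
    """Return scaling_max_freq (kHz) and scaling_governor for one CPU core.

    `base` is a parameter only so the function can be pointed at a test
    directory; on the board the default sysfs path applies.
    """
    cpufreq_dir = Path(base) / f"cpu{cpu}" / "cpufreq"
    max_khz = int((cpufreq_dir / "scaling_max_freq").read_text().strip())
    governor = (cpufreq_dir / "scaling_governor").read_text().strip()
    return {"scaling_max_freq_khz": max_khz, "scaling_governor": governor}


if __name__ == "__main__":
    try:
        print(read_cpufreq())
    except (OSError, ValueError) as exc:
        # Not every kernel or platform exposes cpufreq in sysfs.
        print(f"cpufreq not available: {exc}")
```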

You can have a look at some possible tuning (cpufreq, network settings, process scheduler) outlined here: http://forum.lemaker.org/forum.php?mod=viewthread&tid=7102

And in this thread there are some smb.conf settings mentioned that help with performance and CPU utilisation: http://forum.lemaker.org/forum.php?mod=viewthread&tid=5802

BTW: To get a 'bigger picture' it would be interesting if you could provide the output of HELIOS LanTest, since it measures not only sequential transfer speeds but other performance-relevant actions as well (the results posted here by others, for example in this thread, aren't that great between Windows and Samba on the Banana Pi/Pro).

Suman  
(1) cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq  output is 1008000
(2) cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor  output is  ondemand
(3) I have incorporated the samba tune settings in  http://forum.lemaker.org/forum.php?mod=viewthread&tid=5802. The global section of my /etc/samba/smb.conf file looks like:

    netbios name = bananapro
    server string = Banana PRO Single Board Computer
    workgroup = J602
    #usershare allow guests = yes
    #security=share
    security=user
    load printers = no
    printing = bsd
    printcap name = /dev/null
    disable spoolss = yes
    log file = /var/log/samba/samba.log
    syslog=0
    max log size = 100
    dns proxy = no
    socket options = TCP_NODELAY IPTOS_LOWDELAY
    use sendfile = yes
    aio read size = 16384
    aio write size = 16384
    follow symlinks = yes
    wide links = no
    unix extensions = no
    lock directory = /var/cache/samba

I will try out the other stuff you suggested.

tkaiser  
Edited by tkaiser at Wed Apr 1, 2015 09:49
Suman replied at Wed Apr 1, 2015 08:49
(1) cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq  output is 1008000


OK, if you can ensure appropriate airflow over the SoC's surface then you might be able to raise this to 1200000 (you should verify, e.g. with the 'stress' utility, that your Banana Pro is able to operate at this speed for at least an hour, as outlined here). The difference between an A20 without cooling in an enclosure with bad thermal design and one with a good heatsink and enough airflow might be above 10°C.

And it might help others if you could post LanTest results here from before and after the "tuning" (I would rather call it adjusting settings appropriately).

Suman  
The results of the LANTest tool on the un-tuned system are shown below:

LanTest_Report_BPRO_NoTuning.jpg

The results are not very different from what I got with the 5.5 GB transfer test, but I noticed that for the 300 MB read and write tests done by this tool, the CPU usage stays in the 50-70% band.

Currently the Banana Pro and the SATA HDD are in the standard acrylic case. I see very little clearance between the board and the case body on the lower side, not enough to install even a passive heatsink, and the airflow is rather questionable. So I do not think overclocking is a very good idea until I can find (or assemble) a better case. Besides, I am trying to use this board as an always-on device (24x7x365) and I am unsure what overclocking will do to the life of the hardware.

tkaiser  
Edited by tkaiser at Thu Apr 2, 2015 00:16
Suman replied at Wed Apr 1, 2015 20:36
The results of the LANTest tool on the un-tuned system are shown below:


The different LanTest settings affect not only the amount of data but the block size as well: http://www.helios.de/web/EN/support/TI/157.html (the block size plays an important role, and the page also explains how these synthetic benchmarks should be compared to real-world scenarios).

And regarding enclosures: Unfortunately all the commercially available ones are crap since they ignore the thermal challenges of the Banana Pi/Pro, with all the hot stuff on the lower side of the PCB. Please compare the temperatures of this 'solution' with the values I got when using heatsink/convection: http://forum.lemaker.org/forum.php?mod=viewthread&tid=9677&

Fun fact: SinoVoip's BPi-M1+ will fit far better in LeMaker's 'Banana Pro' enclosure since the relevant chips are on the upper side of the PCB.

tkaiser  
Edited by tkaiser at Thu Apr 2, 2015 04:28
Suman replied at Wed Apr 1, 2015 20:36
The results of the LANTest tool on the un-tuned system are shown below:


Regarding write speed: Your results vary between 24 and 38 MB/s, so there's something wrong or, better said, there's potential for improvement, since the average result is closer to the lower mark than the upper. Personally I would continue with individual measurements of storage and network (iozone/iperf) to get a clue where the bottleneck is. And adjust the network settings, since this really helps (the defaults aren't suitable for GBit networking).

Regarding "overclocking": In fact there's no such thing, because the A20 SoC has no defined upper limits. It's not an Intel CPU that is sold in different configurations for hundreds of bucks, where they individually test each and every CPU and GPU core on the die for its limits and later decide which CPU will be sold under which name, with which features and at which price. It's a really cheap SoC where you as the end user have to do this testing yourself, or otherwise shouldn't try to push your device to the limits.

The LeMaker guys shipped their first OS images with fantasy CPU governor settings, leading to the Banana Pi operating most of the time in single-threaded operation at just 336 MHz (explained in detail here). They then adjusted that, and now with LeMaker OS images the Banana Pi might increase the CPU clock up to 912 MHz (while some people complained about 'overclocking').

The Bananian makers decided to raise this to 1008 MHz.

And if you're able to ensure enough airflow then it might be safe to use 1200 MHz (some people reported even more, but that might work with their A20 and not with yours). But since no spec exists that states a maximum CPU clock, all this is based on experience. Same with DRAM. The LeMaker guys started with a DRAM clock of 480 MHz (this makes a difference especially regarding I/O: both network and SATA performance scale almost linearly with DRAM clock) but later reduced this to 432 MHz to be on the safe side (and while the majority of Banana Pis might run flawlessly with a 480 MHz DRAM clock, this seems to be a good decision since data integrity matters and crashes aren't worth the few percent more performance).

The linux-sunxi wiki outlines in detail what's necessary to get a clue where the hardware limits are: http://linux-sunxi.org/Hardware_Reliability_Tests.

And there is a relationship between expected lifespan and higher clock rates: the voltage needed (the higher you clock the SoC, the more voltage is needed; have a look at "dvfs_table", e.g. in the Banana Pro's official fex file). I read in some developer conversations that more voltage increases stability but might decrease the lifespan of the SoC. But since you're using the ondemand governor, the SoC will operate at the lowest configured voltage most of the time and will increase voltage to the upper limit only when peak performance is needed.

And while dynamically switching CPU clock (and voltage) might help increase battery life (the A20's target market is tablets), it might also decrease the lifespan of the SoC, so staying with just one clock speed and voltage might increase lifespan? Nobody knows. But it can be taken for granted that a long lifespan of the SoC wasn't the primary design goal of this chip.

Suman  
The scaling_max_freq setting for some other linux computers I have access to:

x86 PC (Intel 3.4 GHz Core i7-2600K, Ubuntu 14.04.2) ---> 3401000
BeagleBone Rev. C (TI AM3358 Sitara 1 GHz CPU, Debian) ---> 1000000
Odroid C1 (Amlogic S805 1.5 GHz CPU, Lubuntu 14.04) ---> 1536000
Raspberry Pi 2 Model B (900 MHz Broadcom CPU, Raspbian) ---> 900000

So what does this setting actually mean? What does it do?

Looking at the above data, I think it represents the max. CPU clock frequency as rated by the SoC/CPU vendor. The A20 is 1 GHz, and hence a value of 1008000 on Bananian for the Banana Pro looks reasonable and in line with other systems to me.
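
For reference, the cpufreq files in sysfs report frequencies in kHz, and scaling_max_freq is the upper limit the governor is currently allowed to use (it can be changed at runtime), so it is not necessarily a vendor rating. A small Python sketch converting the values quoted in this thread to MHz; the helper name `khz_to_mhz` and the dictionary are made up for illustration.

```python
# scaling_max_freq values quoted in this thread, in kHz as reported by sysfs.
scaling_max_freq_khz = {
    "Banana Pro (Bananian)": 1008000,
    "x86 PC (Core i7-2600K)": 3401000,
    "BeagleBone": 1000000,
    "Odroid C1": 1536000,
    "Raspberry Pi 2 Model B": 900000,
}


def khz_to_mhz(khz):
    """Convert a cpufreq value from kHz to MHz."""
    return khz / 1000


for board, khz in scaling_max_freq_khz.items():
    print(f"{board}: {khz_to_mhz(khz):.0f} MHz")
```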

Suman  
tkaiser replied at Wed Apr 1, 2015 08:32
Please post the output of ... The values you reported sound normal (the A20 SoC has two CPU cores so I wo ...

Your suspicion is correct. I just noticed that with top's Irix mode on, the CPU usage of a Samba file transfer is 90-95%, and with Irix mode off it is around 45%. This clearly means that only one core is being used for the Samba file transfer.
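
The two readings are consistent with how top scales %CPU: with Irix mode on, a process is shown relative to a single core (so 100% means one full core), and with it off the figure is divided by the number of cores. A quick Python check, assuming the A20's two cores; the function name `irix_to_solaris` is invented for illustration.

```python
def irix_to_solaris(irix_percent, n_cores):
    """Convert top's Irix-mode %CPU (relative to one core) to the
    Irix-off figure, which is averaged over all cores."""
    return irix_percent / n_cores


N_CORES = 2  # the A20 SoC is dual-core

for irix in (90.0, 95.0):
    solaris = irix_to_solaris(irix, N_CORES)
    print(f"Irix on: {irix:.0f}%  ->  Irix off: {solaris:.1f}%")
```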

I was able to run a small single-threaded C program with a big computational loop to keep one core 100% busy, start the file transfer in parallel on the other core, and still get more than 24 MB/s write and 32 MB/s read. So maybe it's possible to have two parallel Samba file transfer sessions running and achieving such transfer rates simultaneously on both.

Apologies to everyone for creating confusion.

While it's not critical for what I am doing, maybe I could still see whether some Samba tuning can lower the CPU usage or, better, whether we can raise the file transfer rate for a single client by somehow using both cores on the server side. At present it seems to be limited by single-core CPU speed.

And definitely, as you say, raising the frequency (to 1200 MHz) will most likely boost performance.


Suman  
The iperf client-server throughput test on topology (3) [all links on GbE] returns results in the 380-400 Mbit/s range. While this test is underway, the CPU usage peaks at 60% on one core (running the iperf server) and around 30% on the other (ksoftirqd). That is effectively 50 MB/s max. This should be the theoretical limit of how much data you can move to and from this board using GbE. And I am already able to move 37 MB/s using Samba in the read direction.
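
The Mbit/s to MB/s conversion behind that 50 MB/s figure can be checked in a few lines. This is only a sketch in Python: it ignores protocol overhead, and the function and variable names are made up for illustration.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert Mbit/s to MB/s (8 bits per byte, protocol overhead ignored)."""
    return mbit_per_s / 8


low, high = mbit_to_mbyte(380), mbit_to_mbyte(400)
samba_read = 37  # MB/s, the Samba read rate measured with both links on GbE

print(f"iperf range: {low:.1f}-{high:.1f} MB/s")
print(f"Samba read reaches {samba_read / high:.0%} of the iperf ceiling")
```

So the measured iperf range corresponds to about 47.5-50 MB/s, and the 37 MB/s Samba read is already roughly three quarters of the upper end.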

I have seen faster Samba transfer rates using my FreeNAS server on my GbE network. I would say this board will probably struggle to move beyond 45 MB/s (read or write) even with perfect tuning.
