
Reliable temperature monitoring

tkaiser  
pikitsan replied at Wed Jul 29, 2015 00:35
Great! Could you please push the changes to the RPi-Monitor GitHub repository?

[x] done: https://github.com/XavierBerger/RPi-Monitor/pull/110

I also added a simple systemd service for the new temperature daemon (without knowing what I'm doing) and a README.

tkaiser replied at Wed Jul 29, 2015 10:05
[x] done: https://github.com/XavierBerger/RPi-Monitor/pull/110

I also added a simple systemd servi ...

I confirm that it works under Arch Linux!

I also edited the following two lines [1] [2], switching to double square brackets to get rid of the "unary operator expected" message I got:
  if [[ ${1} -lt 0 ]]; then
          echo -n 0
  elif [[ ${1} -gt 100000 ]]; then
          echo -n 100000
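For anyone wondering where the message comes from: the demo below is plain bash behaviour, not code from the daemon. It uses a hypothetical empty variable `value` as a stand-in for a sensor reading that came back empty, and shows that single brackets choke on an empty unquoted operand while double brackets don't.

```shell
#!/bin/bash
# Demo of the "unary operator expected" message. "value" is a hypothetical
# stand-in for a sensor reading that came back empty.
value=""

# Single brackets: the unquoted empty variable disappears, so [ only sees
# "-lt 0", prints "[: -lt: unary operator expected" and fails (status 2).
single_status=0
[ ${value} -lt 0 ] 2>/dev/null || single_status=$?

# Double brackets: no word splitting, and the empty operand is evaluated
# arithmetically as 0, so the comparison is simply false (status 1).
double_status=0
[[ ${value} -lt 0 ]] || double_status=$?

echo "single brackets: ${single_status}, double brackets: ${double_status}"
```

Quoting the operand in single brackets ([ "${value}" -lt 0 ]) only swaps the message for "integer expression expected", so handling empty values explicitly is the cleaner fix.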

tkaiser  
pikitsan replied at Wed Jul 29, 2015 11:02
I confirm that it works under Arch Linux!

I also edited the following two lines [1] [2] and added  ...

Can you please elaborate on the situation in which you get the errors? Please try running
  /bin/bash -x /usr/share/rpimonitor/scripts/sunxi-temp-daemon.sh 2>&1 | curl -F 'sprunge=<-' http://sprunge.us
Press [Ctrl]-[C] after 10 seconds and post the sprunge URL (or copy & paste the output to pastebin.com or a similar service :-) )
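If hitting [Ctrl]-[C] at the right moment is awkward, timeout(1) from GNU coreutils can stop the traced run automatically. This is a minimal sketch; `TraceFor` and its arguments are hypothetical, and the resulting log still has to be pasted somewhere by hand.

```shell
#!/bin/bash
# Sketch: trace a script with bash -x for a fixed time and keep the output in
# a log file, instead of stopping it by hand with [Ctrl]-[C]. Uses timeout(1)
# from GNU coreutils; TraceFor and its arguments are hypothetical.
TraceFor() {
    local seconds="${1}" script="${2}" log="${3}" rc=0
    timeout "${seconds}" /bin/bash -x "${script}" > "${log}" 2>&1 || rc=$?
    # timeout exits with 124 when it had to stop the script; that's expected
    if [ "${rc}" -eq 124 ]; then
        rc=0
    fi
    return "${rc}"
}
```

Usage would be something like `TraceFor 10 /usr/share/rpimonitor/scripts/sunxi-temp-daemon.sh /tmp/sunxi-temp-trace.log`, after which the log file can be pasted to pastebin.com or similar.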

Edited by pikitsan at Thu Jul 30, 2015 00:06

It happens when I start the systemd service or when I run the script from the terminal. Here is a log: http://fpaste.org/249654/

tkaiser  
Edited by tkaiser at Thu Jul 30, 2015 01:00
pikitsan replied at Wed Jul 29, 2015 23:16
It happens when I start the systemd service, or when I run the script from the terminal. Here is a l ...

Thanks a bunch. Silly me, I forgot to return from the function when an empty value is supplied. It should rather look like this:
  SanitizeValue() {
      # return empty values as empty and keep thermal values (millidegrees
      # Celsius) in the range of 0°C-100°C
      if [ -z "${1}" ]; then
          return
      fi
      if [ "${1}" -lt 0 ]; then
          echo -n 0
      elif [ "${1}" -gt 100000 ]; then
          echo -n 100000
      else
          echo -n "${1}"
      fi
  } # SanitizeValue
or, even better:
  SanitizeValue() {
      # keep thermal values (millidegrees Celsius) in the range of 0°C-100°C;
      # inside [[ ]] an empty ${1} is evaluated arithmetically as 0, so an
      # empty value matches neither branch and is echoed back as empty
      if [[ ${1} -lt 0 ]]; then
          echo -n 0
      elif [[ ${1} -gt 100000 ]]; then
          echo -n 100000
      else
          echo -n "${1}"
      fi
  } # SanitizeValue
But I will rework the disk section anyway: there's no need to query (possibly nonexistent) disks every 5 seconds, since their temperatures don't change that often. If /sys/block/sdN doesn't exist, the whole check will be skipped; otherwise it will run only every 30 or even 60 seconds. One option is to keep the result of $(( $(date "+%s") / 60 )) in a variable and only execute the function when that value has increased by 1.
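A minimal sketch of that idea, assuming the daemon's main loop calls a function every few seconds. `SysBlock`, `HaveDisks`, `CheckDisks`, `ReadDiskTemps` and `LastDiskMinute` are hypothetical names (not taken from the actual daemon), and `ReadDiskTemps` is only a placeholder for the real readout.

```shell
#!/bin/bash
# Sketch only (hypothetical names, not the actual daemon code): query the
# disks at most once per minute and skip the check entirely when no
# /sys/block/sdN device exists.

# Assumption: base directory made overridable so the logic can be tested
SysBlock="${SysBlock:-/sys/block}"
# Minute bucket of the last disk query; -1 means "never queried yet"
LastDiskMinute=-1

ReadDiskTemps() {
    # Placeholder for the real (smartctl based) disk readout
    echo "querying disks"
}

HaveDisks() {
    local dev
    # An unmatched glob stays literal and therefore fails the -e test
    for dev in "${SysBlock}"/sd?; do
        [ -e "${dev}" ] && return 0
    done
    return 1
}

CheckDisks() {
    local minute
    HaveDisks || return 0
    minute=$(( $(date "+%s") / 60 ))
    # Only query the disks when the minute counter has moved on
    if [ "${minute}" -gt "${LastDiskMinute}" ]; then
        LastDiskMinute=${minute}
        ReadDiskTemps
    fi
}
```

Called from the 5-second main loop, CheckDisks then touches the disks at most once per minute; dividing by 30 instead of 60 would give the 30-second variant.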

EDIT: It would be great if you could give it a try and report back: http://pastebin.com/F71PpeYF

Great! The only problem I found is the smartctl command for external USB disks (line 101).

The command returns two matching lines, so the readout fails and SanitizeValue always clamps it to 100°C, e.g.:
  $ /usr/sbin/smartctl -d sat -a /dev/sda | awk -F" " '/Temperature_Cel/'
  190 Airflow_Temperature_Cel 0x0022   050   038   045    Old_age   Always   In_the_past 50 (Min/Max 40/50)
  194 Temperature_Celsius     0x0022   050   062   000    Old_age   Always       -       50 (0 15 0 0 0)

So we only need the "Temperature_Celsius" attribute:
  $ /usr/sbin/smartctl -d sat -a /dev/sda | awk -F" " '/Temperature_Celsius/'
  194 Temperature_Celsius     0x0022   050   062   000    Old_age   Always       -       50 (0 15 0 0 0)

After changing line 101 accordingly, I now get the correct temperature:
  /usr/sbin/smartctl -d sat -a /dev/sda | awk -F" " '/Temperature_Celsius/ {printf("%0.0f", $10*1000); }'


tkaiser  
Edited by tkaiser at Thu Jul 30, 2015 08:37
pikitsan replied at Thu Jul 30, 2015 07:37
Great! The only problem I found is the smartctl command for USB external disks (line 101).

The comm ...

Well, this line is commented out for two reasons:

1) smartctl always wakes up disks that would otherwise be in standby/sleep mode (if your disk enclosure supports SAT, you might be able to set the standby timeout using "hdparm -S", even over USB!)

2) Choosing the correct SMART attribute is not easy. With your change my EVO 840 couldn't be read out, since it only supports attribute 190, not 194.

So it's still left as an exercise for the reader/user. Even "-d sat" won't work everywhere. I try to address this problem in README_sunxi.md, which I've corrected in the meantime: http://kaiser-edv.de/tmp/PdjVxY/README_sunxi.html
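As one possible starting point for that exercise, here is a sketch that prefers attribute 194 and falls back to 190, based on the two attribute lines quoted above. It assumes the usual `smartctl -A` table layout (attribute ID in column 1, RAW_VALUE in column 10); `ParseSmartTemp` is a hypothetical helper, and drives that encode the raw value differently will still need manual tweaking.

```shell
#!/bin/bash
# Sketch: read `smartctl -A` style output on stdin and print the disk
# temperature in whole degrees Celsius. Attribute 194 (Temperature_Celsius)
# is preferred; 190 (Airflow_Temperature_Cel) is used as a fallback for
# drives such as the EVO 840 that only report 190. Prints nothing if neither
# attribute is present. ParseSmartTemp is a hypothetical helper name.
ParseSmartTemp() {
    awk '
        # Column 1 holds the attribute ID, column 10 the RAW_VALUE
        $1 == 194 && t194 == "" { t194 = $10 }
        $1 == 190 && t190 == "" { t190 = $10 }
        END {
            if (t194 != "") print t194
            else if (t190 != "") print t190
        }'
}
```

Usage would be along the lines of `/usr/sbin/smartctl -d sat -A /dev/sda | ParseSmartTemp`; the caller still has to scale the result to whatever representation the daemon uses. smartctl's `-n standby` option may help with point 1, since it makes smartctl skip a disk that is in standby instead of waking it up (whether that works through a given USB bridge is another question).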

BTW: In the meantime I fixed a lot of other stuff (e.g. smoothing temperature graphs for SoC/PMU, less frequent disk checks, a configurable check interval and so on). New temp daemon source here: http://pastebin.com/9NXQ0HJY

The whole package is updated as well (there was also a bug in the config that prevented the disktemp RRD from being created):

http://kaiser-edv.de/downloads/sunxi-monitor.tar.gz (MD5: 23355b03c551039e62540dda6e40e700)

WARNING: I changed the representation of temperature values (former versions used degrees * 1000, now just * 10) to get exactly one decimal place when displaying values (I also introduced a mechanism to 'smoothen' the graphs a bit, which really helps, especially with the PMU values). So you have to replace both the daemon and the config (it's pretty simple and outlined here). You won't need to trash the RRD databases since the new config creates them fresh with slightly different names.
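To make the change in representation concrete: a value that former versions stored as 48312 (degrees * 1000) becomes 483 (degrees * 10) and is displayed as 48.3. The sketch below shows only that arithmetic; `MilliToDeci` and `FormatDeci` are hypothetical helpers, not part of the package, and they assume non-negative integer values.

```shell
#!/bin/bash
# Sketch of the old/new temperature representations (hypothetical helpers,
# not part of the package). Assumes non-negative integer values.

# degrees * 1000 (old) -> degrees * 10 (new); integer division truncates
MilliToDeci() {
    echo $(( ${1} / 100 ))
}

# degrees * 10 -> human readable value with one decimal place
FormatDeci() {
    printf '%d.%d\n' $(( ${1} / 10 )) $(( ${1} % 10 ))
}
```

For example, `MilliToDeci 48312` prints 483 and `FormatDeci 483` prints 48.3.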

tkaiser  
pikitsan replied at Thu Jul 30, 2015 07:37
Great! The only problem I found is the smartctl command for USB external disks (line 101).

The comm ...

BTW: Have a look at how the function that reads out disk temperatures looks now: http://pastebin.com/pxxKaTQ4

I hope the comments together with the README address all needs (obviously for more experienced users only)?

Looks good to me, and the comments are straightforward for someone who needs help.

The only thing I might add would be a check whether there are any disks in the system at all. When there's no disk, the graph shows a "File stat/null.rrd is not a valid RRD archive" message, "freezing" the statistics page. It would be nice to write a value to /tmp/disktemp in that case anyway, even if there's no disk.

tkaiser  
Edited by tkaiser at Fri Jul 31, 2015 02:02
pikitsan replied at Fri Jul 31, 2015 01:05
When there's no disk, the graph returns a "File stat/null.rrd is not a valid RRD archive" message "freezing" the statistics page

Good catch. I thought RPi-Monitor would interpret an empty value as undefined. Can you please give it a try replacing
  touch > "${MyTempDir}/${file}"
with
  echo -n "0" > "${MyTempDir}/${file}"
in the temp daemon and report back? Disk temperature will then always be shown as 0, but more importantly the statistics graphs should work.
And don't forget to restart the daemon:
  pkill -f '/bin/bash /usr/share/rpimonitor/scripts/sunxi-temp-daemon.sh' && (cd /tmp && nohup /usr/share/rpimonitor/scripts/sunxi-temp-daemon.sh & )
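The underlying idea (never leave the file RPi-Monitor reads empty) can be wrapped in a small helper. This is a sketch of a hypothetical `WriteDiskTemp` that writes the reading if there is one and 0 otherwise; `MyTempDir` is borrowed from the daemon snippet, while defaulting it to /tmp and using the file name `disktemp` are assumptions based on the /tmp/disktemp path mentioned in this thread.

```shell
#!/bin/bash
# Sketch: never leave the temperature file empty, so RPi-Monitor always gets
# a number for its RRD. WriteDiskTemp is hypothetical; the /tmp default and
# the file name "disktemp" are assumptions based on /tmp/disktemp.
MyTempDir="${MyTempDir:-/tmp}"

WriteDiskTemp() {
    local file="disktemp"
    # ${1:-0} substitutes 0 when no reading was passed (e.g. no disk present)
    echo -n "${1:-0}" > "${MyTempDir}/${file}"
}
```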

I'm currently thinking about providing two config files: renaming the current axp209_cpu_pmu_temp.conf to axp209_cpu_pmu_temp_r1.conf and creating a new axp209_cpu_pmu_temp.conf suitable for 'normal' A20 devices.

The main differences on Lamobo R1 are:

  • the SATA disk is also powered through the PMU, which puts more load on the PMU
  • the crappy Micro-USB connector can carry only 1.8A at 5V and easily becomes a bottleneck when USB peripherals are connected
  • it's best to feed the beast through the LiPo battery connector (then it doesn't suffer from the voltage drops that many USB cables cause, and the 1.8A limitation is gone)
  • there's no way to read out the voltage when feeding the board through the LiPo battery connector if the power source isn't a battery

For 'normal' sunxi devices it might be more helpful to also see the voltage on the status page, since voltage instability is one of the main problems with Banana Pi/Pro and other boards powered through Micro-USB.

This is the very same test (cd /mnt/sda/ && stress -t 900 -c 2 -m 2 -i 2 -d 2), run first with the R1 powered through the LiPo connector and afterwards through the crappy Micro-USB power-in (consumption peaks and the voltage drops to 4.7V: the PSU provides 5.0V, and with just a 20 cm cable and Micro-USB we lose between 0.1 and 0.3V at idle/load).

Powered through battery connector:

[graph not preserved]

Powered the crappy way through Micro-USB connector:

[graph not preserved]
