Reliable temperature monitoring

Edited by seblucas at Sun Nov 2, 2014 13:56

@tkaiser

You said you're using Bananian and Igor's image. I've been thinking about doing that for a few days (mainly to test sunxi_ss properly and see whether it's as efficient as the crypto module of the Kirkwood). Is it as simple as I think:
* Install the new kernel on /dev/mmcblk0p1
* Update the modules in /lib/modules

Is there something I'm missing?

Thanks in advance.

tkaiser  
I don't know. I 'upgraded' Olimex' Debian image for their A20-Lime2 with Igor's kernel and it worked (but it was still way slower in many regards compared to Igor's or Bananian's kernel running on Banana Pi so I still assume there's something missing with hardware initialization in U-Boot).

I would go with a couple of 2 GB SD cards and give it a try. When you mix up too many things you get lost.

tkaiser  
Edited by tkaiser at Mon Nov 3, 2014 04:15
T.S. replied at Sun Nov 2, 2014 12:23
One other way to monitor HDD temperature is using hddtemp directly.


In my tests that didn't work reliably when I put the disk (in my case a Samsung SSD) under load, since hddtemp then won't get the temperature value from the disk within the short amount of time RPi-Monitor is willing to wait. That's one of the reasons I use a kind of daemon that collects data from different probes and writes the values to files, which can then be read by RPi-Monitor. This has its own disadvantages, especially permanently writing to the rootfs, which might wear out faster if it's on an SD card with a weak wear-leveling algorithm.
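The collector idea described above could look roughly like the following. This is only a minimal sketch, not the daemon actually used in this thread: the function name `publish_probe`, the `PROBE_DIR` variable and its `/run/probes` default are hypothetical, and it assumes GNU `timeout` and `mktemp` are installed. Pointing `PROBE_DIR` at a tmpfs would also avoid the SD card wear mentioned above.

```shell
# Sketch of a probe collector: run a slow probe with a time limit and publish
# its value through a small file that a monitoring tool can read instantly.
# PROBE_DIR and publish_probe are made-up names, not part of RPi-Monitor.

PROBE_DIR="${PROBE_DIR:-/run/probes}"   # assumed location, ideally on tmpfs

# publish_probe NAME TIMEOUT_SECONDS COMMAND [ARGS...]
# Runs COMMAND with a time limit. On success the output atomically replaces
# $PROBE_DIR/NAME. On failure, timeout or empty output the old file is kept.
publish_probe() {
    local name="$1" limit="$2" value tmp
    shift 2
    mkdir -p "$PROBE_DIR" || return 1
    value="$(timeout "$limit" "$@" 2>/dev/null)" || return 1
    [ -n "$value" ] || return 1
    tmp="$(mktemp "$PROBE_DIR/.${name}.XXXXXX")" || return 1
    printf '%s\n' "$value" > "$tmp"
    chmod 644 "$tmp"                    # let the monitoring user read it
    mv -f "$tmp" "$PROBE_DIR/$name"     # rename is atomic within one filesystem
}

# Example loop, commented out so that sourcing this file has no side effects:
# while true; do
#     publish_probe hdd_temp 30 hddtemp -n /dev/sda
#     sleep 60
# done
```

The monitoring side then only has to read a one-line file, so a disk that answers slowly under load no longer causes a missed sample; the reader simply sees the last value that was collected successfully.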

May I ask which disk you use? I ask because I don't see a thermal value of 0 being returned when my SSD is idle. (And in case you're using WD Green/Blue, be aware that their default behaviour of aggressively parking the heads might lead to an early death in server use cases; compare with http://idle3-tools.sourceforge.net if in doubt.)

And I would assume neither disk nor Banana Pi are in an enclosure? The PMU's thermal value and the temperature reported by the HDD (almost ambient temperature when idle) match perfectly (PMU 5-6° above ambient when the system is idle and no power hungry devices are connected via USB).

mouse  
tkaiser replied at Sun Nov 2, 2014 03:44
Sorry, no idea what got messed up

I bundled all the config stuff together and outlined the nec ...

Thank you!
I will try another OS again.

T.S.  
And I would assume neither disk nor Banana Pi are in an enclosure? The PMU's thermal value and the temperature reported by the HDD (almost ambient temperature when idle) match perfectly (PMU 5-6° above ambient when the system is idle and no power hungry devices are connected via USB).
Everything is mounted vertically on a wall with "powerstrips"; see picture:

Bpi with HDD mounted

T.S.  
May I ask which disk you use?
It's a TOSHIBA MQ01ABD100:
    root@bpi:~# /usr/sbin/hddtemp -D /dev/sda1

    ================= hddtemp 0.3-beta15 ==================
    Model: TOSHIBA MQ01ABD100

    field(1)         = 0
    field(2)         = 0
    field(3)         = 202
    field(4)         = 92
    field(5)         = 0
    field(7)         = 0
    field(8)         = 0
    field(9)         = 243
    field(10)        = 0
    field(12)        = 54
    field(191)       = 0
    field(192)       = 1
    field(193)       = 141
    field(194)       = 26
    field(196)       = 0
    field(197)       = 0
    field(198)       = 0
    field(199)       = 0
    field(220)       = 0
    field(222)       = 211
    field(223)       = 0
    field(224)       = 0
    field(226)       = 13
    field(240)       = 0

Since I don't experience a thermal value of 0 <snip>
It returns "drive is sleeping" or so, plus an empty string on the second line, which rpimonitor evaluates as '0'.
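One way to keep such a non-numeric answer from turning into a bogus 0 is to filter the probe output before it reaches the monitoring tool. The following is a minimal sketch under assumptions: `sanitize_temp` is a made-up helper, and the exact text hddtemp prints for a sleeping drive is not known here, so anything that is not a plain integer is treated as "no reading".

```shell
# sanitize_temp RAW LAST_GOOD
# Prints RAW if it is a plain non-negative integer, otherwise LAST_GOOD.
# Hypothetical helper: it does not depend on the exact wording of the
# "drive is sleeping" message, it rejects every non-numeric string.
sanitize_temp() {
    local raw="$1" last="$2"
    case "$raw" in
        ''|*[!0-9]*) printf '%s\n' "$last" ;;   # empty or contains a non-digit
        *)           printf '%s\n' "$raw" ;;    # looks like a temperature
    esac
}

# Possible use (commented out, needs hddtemp and root privileges):
# last=26
# last="$(sanitize_temp "$(hddtemp -n /dev/sda 2>/dev/null)" "$last")"
```

Repeating the last good value hides the fact that the drive is asleep; passing a marker such as `U` as the fallback would be an alternative if the graphing side can handle unknown samples.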

One other thing I have changed: I moved the .rrd files to /var/log/rpimonitor/stat. With Igor's image that's in RAM.

tkaiser  
Edited by tkaiser at Tue Nov 4, 2014 02:15

Thanks for the follow-up. Two things to mention regarding disk health:

1) If the disk is not really fixed in place, vibrations might increase access times. You won't notice that using dumb pseudo benchmarks like dd/hdparm; you need appropriate tools that measure access times instead. There is a funny video showing one of the performance/system analytics gurus shouting into a rack, which immediately has an impact on the disks: http://www.datacenterknowledge.c ... in-the-data-center/

2) I don't know about Toshiba and their head parking strategy. It would be a good idea to check with "hdparm -B /dev/sda" whether the drive supports the somewhat old-fashioned APM modes and can be set to a less aggressive strategy if "smartctl -a /dev/sda" indicates a high "Load_Cycle_Count".
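A rough sketch of how the Load_Cycle_Count check could be scripted follows. It assumes the usual ten-column attribute table that `smartctl -A` prints for ATA drives, where the raw value is the last column; `load_cycle_count` is a made-up helper name, and drives that do not report this attribute will simply produce no output.

```shell
# load_cycle_count
# Reads `smartctl -A` output on stdin and prints the raw value (10th column)
# of the Load_Cycle_Count attribute. Exits non-zero if the attribute is absent.
load_cycle_count() {
    awk '$2 == "Load_Cycle_Count" { print $10; found = 1 }
         END { exit !found }'
}

# Possible use (commented out, needs smartmontools/hdparm and root):
# smartctl -A /dev/sda | load_cycle_count
# hdparm -B /dev/sda        # query the current APM level, if APM is supported
# hdparm -B 254 /dev/sda    # least aggressive APM level that keeps APM enabled
```

Sampling the counter twice, some hours apart, shows how quickly it grows, which says more about the parking behaviour than a single absolute number.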

And adjusting the path of RPi-Monitor's RRD databases to RAM/ramlog is always a good idea, unless you do heavy load testing and are interested in system probes prior to a crash, freeze or emergency shutdown.

tkaiser  
Edited by tkaiser at Wed Nov 5, 2014 08:06

After gaining some experience with 2 DHT22 probes for measuring temperatures, I wanted to know more about the effectiveness of heatsinks, using two different Banana Pis.

Unfortunately I had to discover that the values that can be read out from the A20's internal thermal sensor seem to be somehow calibrated, and therefore it's almost useless to rely on them without further calibration (or without disabling calibration, see below).

I measured temperatures in three different setups, always using the same method ("cd /data && stress -t 900 -c 2 -m 2 -i 2 -d 2", where /data points to a Samsung EVO 840 SSD). These are the expected results when clocking the A20 at 816 MHz. Since I'm using SATA, the PMU is not involved in powering the disk (in contrast to using a Pi-powered USB disk):

- SoC temp difference between idle and end of test: 6-9 °C (heatsink or not)
- PMU temp difference between idle and end of test: approx. 6 °C
- SSD temp difference between idle and end of test: approx. 10-11 °C
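The idle-versus-end-of-test deltas listed above can be collected with a small wrapper around the load command. This is only a sketch under assumptions: it expects each probe value to be available as the first line of a file (the path in the usage comment is hypothetical), and `temp_delta` and `measure_delta` are made-up helper names.

```shell
# temp_delta IDLE END
# Prints END - IDLE with one decimal place.
temp_delta() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", b - a }'
}

# measure_delta PROBE_FILE COMMAND [ARGS...]
# Reads the probe before and after running COMMAND and prints the difference.
# PROBE_FILE is assumed to hold a temperature in °C on its first line.
measure_delta() {
    local probe="$1" t_idle t_end
    shift
    t_idle="$(head -n 1 "$probe")" || return 1
    "$@" > /dev/null 2>&1            # run the load, discard its output
    t_end="$(head -n 1 "$probe")" || return 1
    temp_delta "$t_idle" "$t_end"
}

# Possible use (commented out; the probe path is hypothetical):
# cd /data && measure_delta /run/probes/soc_temp stress -t 900 -c 2 -m 2 -i 2 -d 2
```

Comparing deltas rather than absolute values is what makes the three setups comparable even when the absolute SoC readings are off.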

These probes are all internal. The green DHT22 graphs below represent ambient temperature, and the second (purple) DHT22 graphs show roughly the ambient temperature measured 3-4 cm above DRAM/CPU (I ran the Bananas vertically to ensure air flow).

The expected temperature deltas apply to all three tests but the reported absolute internal SoC temperature differs a lot:

1) 1st Banana Pi with SMD heatsink / thermal paste

SoC temp idle/stress: 42°/48°



2) 2nd Banana Pi without heatsinks

SoC temp idle/stress: 47°/55°



3) The very same 2nd Banana Pi a few minutes later with a different heatsink

SoC temp idle/stress: 23°/29° (that's a whopping 25°C lower than before)



An internal temperature of the main chip that is both below ambient temperature and the temperature measured a few centimeters away is impossible:



To me it seems that the A20 does some sort of internal thermal calibration when being initialized, maybe combined with a power-on self-test running at high speed and measuring how fast the temperature at the internal thermal sensor rises? And an applied heatsink, leading to faster heat dissipation, makes the A20 believe it's colder outside than it is?

No idea, but I found in the A20's user manual a register called CHOP_TEMP_EN (Chop temperature calibration enable: 0: Disable, 1: Enable). Unfortunately I have no idea how this register is set by default, nor how it can be read or set.

But unless there's further investigation into how the A20's thermal register can be read, no one should trust the extracted values at all, especially when operating the A20 in a mode it is not designed for (using a heatsink, which in fact helps in load situations but seems to influence the temperature reported from inside the A20).
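The reasoning used here (a running chip cannot report less than the air around it) can be turned into a simple plausibility check for logged values. A minimal sketch, assuming both readings are in °C; `plausible_soc_temp` is a made-up helper and the numbers in the usage comment are illustrative only.

```shell
# plausible_soc_temp SOC AMBIENT
# Succeeds if the SoC reading is at least as high as the ambient reading.
# A powered chip reporting less than ambient points to a sensor or
# register setup problem rather than to a real temperature.
plausible_soc_temp() {
    LC_ALL=C awk -v soc="$1" -v amb="$2" 'BEGIN { exit !(soc + 0 >= amb + 0) }'
}

# Possible use (illustrative values, not measurements from this thread):
# plausible_soc_temp 23 24.5 || echo "SoC reading below ambient, do not trust it"
```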

Indeed, I now understand your post in the linux-sunxi group better.

tkaiser  
I think we made some progress. I've been completely wrong regarding the "calibration" of the A20's internal thermal sensor (for details see here). The method I used before (based on Heiko's work over at cubieforums) cleared CHOP_TEMP_EN (bit 7, 1 by default).

Preparing the registers with only bit 4 set produced strange readouts, depending on whether a heatsink was applied or not (see the graphs in this thread above):

    echo 'f1c25004:10' > /sys/devices/virtual/misc/sunxi-dbgreg/rw/write

If I also set bit 7 back to its default (Chop temperature calibration: enable = 1) by using this instead:

    echo 'f1c25004:90' > /sys/devices/virtual/misc/sunxi-dbgreg/rw/write

I get temperature values that look way better.
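How the two register values relate to the bits named in this post can be shown with a little shell arithmetic. This is only an illustration: `reg_value` is a made-up helper, and the only facts taken from the thread are that bit 4 alone gives hex 10 and that adding CHOP_TEMP_EN (bit 7) gives hex 90.

```shell
# reg_value BIT [BIT...]
# Prints the hexadecimal value of a register with exactly the given bits set.
reg_value() {
    local v=0 bit
    for bit in "$@"; do
        v=$(( v | (1 << bit) ))
    done
    printf '%x\n' "$v"
}

reg_value 4      # bit 4 only:               prints 10
reg_value 4 7    # bit 4 plus CHOP_TEMP_EN:  prints 90

# Possible use (commented out, needs the sunxi-dbgreg interface and root):
# echo "f1c25004:$(reg_value 4 7)" > /sys/devices/virtual/misc/sunxi-dbgreg/rw/write
```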

This is my 1st Banana Pi with SMD heatsinks (first run 30 minutes, second 15 min.):




This is my second Banana Pi with a different heatsink:



The purple graphs (temperatures measured a few centimeters above SoC/heatsink) can not be compared directly due to different positions of the DHT22 sensors.

But the 'big picture' looks way better now: the idle/load temperatures of the SoC are approx. 10°/16° (1st Banana) and 12°/19° (2nd Banana) above ambient temperature, which is an indicator that the combination of SMD heatsink and thermal paste outperforms a simpler heatsink.

I have one A20 board left without a heatsink: my A20-OLinuXino-LIME2. I will measure temperatures this weekend with both the 0x10 and 0x90 register settings, with and without heatsink, and will then report back.
