Hello,
I'm trying to understand why my pine64+ keeps hanging after 3 days or so.
I'm using it as a Plex server (direct play) and it works very well, but after a few days it becomes unreachable from the network and I suspect the board hangs. How can I debug the problem? Which logs can help me?
Maybe a temperature problem?
Any hint would be great.
I installed Ubuntu and Plex as this guide says:
http://jez.me/article/plex-server-on-a-pine64-how-to. I also have rpi-monitor installed, and sometimes the temperature is near 90ºC, but normally it is 70-75ºC while playing.
Thanks in advance.
(09-18-2017, 05:17 PM)XaRz Wrote: I'm trying to understand why my pine64+ keeps hanging after 3 days or so ... it becomes unreachable from the network and I suspect the board hangs. How can I debug the problem?
Any hint would be great.
One simple thing you can do is to set up a heartbeat monitor ( a blinking LED ) on the system LED pins, directly next to the IR port. Use the code below:
sysled_heartbeat.sh
Code:
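#!/bin/sh
# Blink the LED on GPIO pin $1 for as long as the OS keeps running.
PIN=$1
echo "$PIN" > /sys/class/gpio/export
echo out > "/sys/class/gpio/gpio$PIN/direction"
# Release the pin when the script is interrupted or terminated.
trap 'echo "$PIN" > /sys/class/gpio/unexport; exit 0' INT TERM
# Loop forever: 0.35 s in one state, 0.65 s in the other.
while true; do
    echo 0 > "/sys/class/gpio/gpio$PIN/value"
    sleep .35
    echo 1 > "/sys/class/gpio/gpio$PIN/value"
    sleep .65
done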
You can run this script with sudo:
sudo ./sysled_heartbeat.sh 359
This assumes you have the system LED plugged in. The ballast resistor is built in; use a 3mm low-power LED ( white is nice ).
The idea is simple: if the board hangs, the light will stop blinking, because the blinking requires a functioning OS to drive the on/off sleep cycles.
I suspect your board is NOT hanging ( the light will prove that ). More likely your network connection has dropped for some reason. Is this a wifi connection? If so, the wifi sometimes shuts down when idle to conserve power, and the connection drops. The eth connection will sometimes do the same. One way to stop this temporarily while you're getting a handle on the problem is to set up a script that wakes up once every few minutes ( use crontab ) and sends three pings to your router. Another thing you can do is to send part ( or all ) of your dmesg log to another machine using scp every so many minutes ( say thirty ).
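The keep-alive idea above could be sketched roughly like this. It is only a minimal sketch: the router address 192.168.1.1, the log path, and the `--run` switch are assumptions made for illustration, not values from this thread.

```shell
#!/bin/sh
# Keep-alive sketch: ping the router three times and log the outcome.
# ROUTER and LOGFILE are placeholder values (assumptions); adjust them.
ROUTER="${ROUTER:-192.168.1.1}"
LOGFILE="${LOGFILE:-/tmp/keepalive.log}"

# Print one timestamped status line for the router.
log_line() {
    echo "$(date '+%Y-%m-%d %T') : router $ROUTER $1"
}

# Send three pings, waiting up to 2 seconds for each reply.
check_router() {
    if ping -c 3 -W 2 "$ROUTER" > /dev/null 2>&1; then
        log_line "reachable"
    else
        log_line "UNREACHABLE"
    fi
}

# Only ping when asked to, so the functions can also be sourced and reused.
if [ "${1:-}" = "--run" ]; then
    check_router >> "$LOGFILE"
fi
```

Saved under a hypothetical name such as keepalive.sh, it could be run every five minutes from a crontab with a line like `*/5 * * * * /bin/sh /path/to/keepalive.sh --run`. If the log keeps growing after the board has dropped off the network, the OS is still alive and the problem is on the network side.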
If your OS is hanging, something is really wrong ( probably a corrupted SD card ). My PineA64+ boards ( both of them ) run 24-7-365; I rarely reboot them, and both have been running for several months now. I have heartbeat monitors on all my boards, and I have a function monitor on my main server.
Thanks for all the information.
Reading your ideas, I remembered that I have a way to narrow down the problem more quickly: a spare 7" TFT for checking whether it's the board or the OS.
I'll post my investigations and my outcomes!
Thanks again!
OK, now I'm fighting with cron to run a script that resets eth0, because I suspect the problem is eth0 dropping. The script is this:
Code:
#!/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/pere/bin
LOGFILE=/home/pere/network-monitor.log
# eth0 counts as up when ifconfig reports an IPv4 address for it
if ifconfig eth0 | grep -q "inet addr:" ;
then
    echo "$(date "+%m %d %Y %T") : Ethernet OK" >> "$LOGFILE"
else
    echo "$(date "+%m %d %Y %T") : Ethernet connection down! Attempting reconnection." >> "$LOGFILE"
    ifup --force eth0
    OUT=$? #save exit status of last command to decide what to do next
    if [ $OUT -eq 0 ] ; then
        STATE=$(ifconfig eth0 | grep "inet addr:")
        echo "$(date "+%m %d %Y %T") : Network connection reset. Current state is" $STATE >> "$LOGFILE"
    else
        echo "$(date "+%m %d %Y %T") : Failed to reset ethernet connection" >> "$LOGFILE"
    fi
fi
But while debugging why cron is not executing it, I've seen these lines in syslog:
Code:
21 10:51:18 pine64 kernel: [ 114.459270] Mali: Set gpu frequency to 144 MHz
Sep 21 10:51:19 pine64 kernel: [ 115.440931] CPU Budget:update CPU 0 cpufreq max to 1056000 min to 480000
Sep 21 10:51:19 pine64 kernel: [ 115.440967] CPU Budget hotplug: cluster0 min:0 max:4
Sep 21 10:51:19 pine64 kernel: [ 115.440981] gpu cooling callback set freq limit 360
Sep 21 10:51:19 pine64 kernel: [ 115.441037] Mali: Set gpu frequency to 360 MHz
Sep 21 10:51:19 pine64 kernel: [ 115.932942] CPU Budget:update CPU 0 cpufreq max to 1008000 min to 480000
Sep 21 10:51:19 pine64 kernel: [ 115.935187] CPU Budget hotplug: cluster0 min:0 max:4
Sep 21 10:51:19 pine64 kernel: [ 115.935201] gpu cooling callback set freq limit 144
Sep 21 10:51:19 pine64 kernel: [ 115.935256] Mali: Set gpu frequency to 144 MHz
Is there a heat problem with my pine64 A+?
By the way, can anyone help me understand why cron is not executing this script?
I set it up with the command: sudo crontab -e
Code:
# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
#
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
#
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
#
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h dom mon dow command
*/1 * * * * root /bin/bash /home/pere/bin/./network-monitor.sh
OK, solved the cron problem:
there is no need for root in the crontab line if I just edit with sudo crontab -e.
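For anyone hitting the same thing: the user field belongs only in system crontab files such as /etc/crontab; in a crontab edited with crontab -e the sixth field is already the start of the command, so cron tries to run a program called root. Below is a small sketch of adding the corrected line without opening an editor. The helper name add_cron_line is made up for illustration, and the job line assumes the script path shown in the syslog output in this thread.

```shell
#!/bin/sh
# Corrected job line for a crontab edited with "sudo crontab -e":
# five time fields followed directly by the command, no user field.
JOB='* * * * * /bin/bash /home/pere/bin/network-monitor.sh'

# Hypothetical helper: read crontab text on stdin and print it with
# the given line appended, unless that exact line is already present.
add_cron_line() {
    line=$1
    current=$(cat)
    if printf '%s\n' "$current" | grep -qxF -e "$line"; then
        printf '%s\n' "$current"
    elif [ -z "$current" ]; then
        printf '%s\n' "$line"
    else
        printf '%s\n%s\n' "$current" "$line"
    fi
}
```

A possible way to use it is `sudo crontab -l | add_cron_line "$JOB" | sudo crontab -`, which rewrites root's crontab with the job added once.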
Now what about my syslog? Is it a heat problem?
Sep 21 18:54:01 pine64 CRON[2144]: (root) CMD (bash /home/pere/bin/network-monitor.sh)
Sep 21 18:54:19 pine64 kernel: [28631.085162] CPU Budget:update CPU 0 cpufreq max to 1104000 min to 480000
Sep 21 18:54:19 pine64 kernel: [28631.087406] CPU Budget hotplug: cluster0 min:0 max:4
Sep 21 18:54:19 pine64 kernel: [28631.087429] CPU Budget:update CPU 0 cpufreq max to 1056000 min to 480000
Sep 21 18:54:19 pine64 kernel: [28631.089643] CPU Budget hotplug: cluster0 min:0 max:4
Sep 21 18:54:19 pine64 kernel: [28631.089657] gpu cooling callback set freq limit 360
Sep 21 18:54:19 pine64 kernel: [28631.089713] Mali: Set gpu frequency to 360 MHz
Sep 21 18:54:20 pine64 kernel: [28632.069082] CPU Budget:update CPU 0 cpufreq max to 1104000 min to 480000
Sep 21 18:54:20 pine64 kernel: [28632.069116] CPU Budget hotplug: cluster0 min:0 max:4
Sep 21 18:54:20 pine64 kernel: [28632.069131] gpu cooling callback set freq limit 0
Sep 21 18:54:20 pine64 kernel: [28632.069188] Mali: Set gpu frequency to 408 MHz
Sep 21 18:54:22 pine64 kernel: [28634.037028] CPU Budget hotplug: cluster0 min:0 max:4
(09-21-2017, 10:55 AM)XaRz Wrote: OK, solved the cron problem:
there is no need for root in the crontab line if I just edit with sudo crontab -e.
Now what about my syslog? Is it a heat problem?
Usually all that is needed is a passive cooling device, generally a 14mm x 14mm aluminum heatsink with 3M thermal tape adhesive. All of my boards are being used as servers of one type or another, and all of them have active cooling: a soft-PWM driven 5V brushless fan switched by either a PN2222 or 2N2222 transistor; a 4N35 optical coupler may be used as well.
The lack of passive cooling may or may not have anything to do with the eth drop; throttling can cause networking problems, but not necessarily. Try the heatsink first ( any Raspberry Pi heatsink will do ) and see what happens, then decide if you need active cooling also; if you do, I can help you set that up.
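To judge whether the heatsink makes a difference, the SoC temperature could be logged before and after fitting it. The sketch below is only an illustration: the sysfs path is an assumption (the thermal zone number differs between kernels), and because some kernels report millidegrees while others report whole degrees, the reading is normalised first.

```shell
#!/bin/sh
# Assumed sysfs path; check which thermal_zone* exists on your kernel.
TEMP_FILE="${TEMP_FILE:-/sys/class/thermal/thermal_zone0/temp}"

# Normalise a raw reading to whole degrees Celsius.
# Values of 1000 or more are treated as millidegrees (e.g. 72000 -> 72).
to_celsius() {
    raw=$1
    if [ "$raw" -ge 1000 ]; then
        echo $((raw / 1000))
    else
        echo "$raw"
    fi
}

# Print a timestamped temperature line, or a note if the file is unreadable.
log_temp() {
    if [ -r "$TEMP_FILE" ]; then
        echo "$(date '+%Y-%m-%d %T') : SoC temperature $(to_celsius "$(cat "$TEMP_FILE")") C"
    else
        echo "$(date '+%Y-%m-%d %T') : cannot read $TEMP_FILE"
    fi
}
```

Calling log_temp from a cron job and appending its output to a file gives a temperature history to compare against the times of the "CPU Budget" and "gpu cooling" messages in syslog.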
Thanks. I've ordered the parts.
I'll post my results here when they are applied!
Thanks for all!
Well, yesterday I finally applied the passive cooling device: a 14x14mm aluminum heatsink with 3M thermal tape.
And the system hung at 01:55 AM with no one using Plex (so no high temperatures there). I don't know how to find out why my pine64 A+ keeps hanging.
Any hints?
Seems it was a false alarm!!!
6 days non-stop and counting! Seems that cooling was the problem in the end.
Thanks!