(10-17-2019, 09:16 AM)venix1 Wrote:(10-17-2019, 06:30 AM)Unkn0wn Wrote: Thanks for your help. Would you say running hwclock -s is a reliable temporary solution?
Those kernel messages are interesting though.
Moved from link; see that for background details. Unfortunately, I've had varied results with hwclock -s. If it works, it's better than nothing, but it can throw the clock a few microseconds in either direction, and software may not like that. However, while module 1 appears to be completely cured, module 2 is still going down. If we look at the PSU change as altering the electrical parameters, then I believe the serial wires attached to module 1 may be affecting this as well. If that is true, it's quite possibly a hardware issue with the board itself. Changing the PSU and having dangling wires could change the noise and parasitic capacitance, causing the SoC to misbehave. I'm not an EE and lack the tools to properly investigate that line of thinking.
I've updated module 2 to use hwclock -w; hwclock -s. This should minimize clock jumps by first writing the system time to the RTC and then setting the system time back from it, but it's only been 24 hours. Next time it goes down, I'm pulling the serial wires, moving them to module 2, and observing what happens.
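For anyone wanting to try a variant of the workaround above, here is a rough sketch that only reloads the system time from the RTC when the two disagree by more than a threshold, instead of running hwclock unconditionally. It is not the exact setup described in the post. It assumes the RTC itself stays correct during a jump (not confirmed in this thread), that the RTC is exposed as rtc0 in sysfs, and that hwclock -s is run as root. The function names, RTC_EPOCH_FILE, and the 5 second MAX_DRIFT are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: resync from the RTC only when the clocks disagree.
# Assumption: the RTC is rtc0 and keeps correct time when the system clock jumps.
RTC_EPOCH_FILE="${RTC_EPOCH_FILE:-/sys/class/rtc/rtc0/since_epoch}"
MAX_DRIFT="${MAX_DRIFT:-5}"   # seconds; arbitrary illustrative threshold

abs_diff() {
    # Print |$1 - $2| for two integer epoch values.
    if [ "$1" -ge "$2" ]; then
        echo $(( $1 - $2 ))
    else
        echo $(( $2 - $1 ))
    fi
}

needs_resync() {
    # Succeed when system ($1) and RTC ($2) differ by more than $3 seconds.
    [ "$(abs_diff "$1" "$2")" -gt "$3" ]
}

check_and_resync() {
    # Read both clocks as epoch seconds and reload from the RTC if they diverge.
    sys=$(date +%s)
    rtc=$(cat "$RTC_EPOCH_FILE") || return 1
    if needs_resync "$sys" "$rtc" "$MAX_DRIFT"; then
        echo "clock drift detected: system=$sys rtc=$rtc"
        hwclock -s   # set the system time from the RTC (needs root)
    fi
}
```

Calling check_and_resync from a root cron job every minute or so would be one way to use it. Unlike hwclock -w; hwclock -s, this never writes a possibly wrong system time into the RTC, which is the reason for the different approach.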
I'm no EE either, but shouldn't the electrical noise from the PSU be filtered out? Anyway, I had varying results as to which node went haywire first.
In your other comment you said this:
Quote:However, in my case the time jump results in a network outage so I believe both issues are symptoms of the same underlying root problem.
When one of my nodes is affected, it does remain accessible over the network. I'm able to SSH in and it still has a valid IP address. The OS more or less keeps working; it's just that everything that depends on the time stops working (certificates, apt, SSL, Kubernetes). Are you completely unable to access a node over the network?
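Since an affected node stays reachable over SSH, a quick way to see which node has jumped is to compare clocks between nodes. This is only a sketch: the function names are made up, the peer hostname is whatever you pass in, and SSH latency means the tolerance should be several seconds rather than anything precise.

```shell
#!/bin/sh
# Hypothetical sketch: compare this node's clock with a peer's over SSH.

classify_offset() {
    # $1 = local epoch, $2 = peer epoch, $3 = tolerance in seconds.
    # Prints "ok", "ahead" (local is later) or "behind" (local is earlier).
    delta=$(( $1 - $2 ))
    if [ "$delta" -gt "$3" ]; then
        echo "ahead"
    elif [ "$delta" -lt $(( 0 - $3 )) ]; then
        echo "behind"
    else
        echo "ok"
    fi
}

compare_with_peer() {
    # $1 = peer hostname (placeholder), $2 = tolerance in seconds.
    peer=$(ssh "$1" date +%s) || return 1
    now=$(date +%s)
    echo "$1: local clock is $(classify_offset "$now" "$peer" "$2")"
}
```

Running something like compare_with_peer against each of the other nodes with a tolerance of 10 should make a node whose clock has jumped stand out as "ahead" or "behind" relative to the rest.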