Network problems (actually bad power supply)
#12
(10-14-2019, 08:20 AM)venix1 Wrote: Some additional notes, and a possible fix that doesn't involve an ATX power supply. During the last week, I ran module 1 with a cron job that resets the system clock from the RTC by running /sbin/hwclock -s at 5-minute intervals. The module would usually go down within 24 hours, but it has not. Instead, module 2 was the first and only one to go down.

The systems also run chronyd to maintain time, and it is set to synchronize to the RTC every 11 minutes. This by itself was not sufficient. After adding the cron job to module 2, both modules are at 2 days of uptime, so it has had a positive effect. Five minutes is arbitrary; I tried 1 minute, but it confused chronyd. Five minutes has the effect of keeping the "Update Interval" at 60 seconds, so an internal time server is probably recommended with this to avoid frequent polling of external servers.
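A minimal sketch of how the resync described above could also record how far each hwclock -s call steps the system clock, which may help correlate time jumps with the outages. This is an illustration, not what venix1 actually ran: the function names wall_clock_drift and resync_from_rtc are hypothetical, and only the /sbin/hwclock -s command comes from the post. It assumes the monotonic clock is not stepped when the wall clock is set.

```python
import subprocess
import time

HWCLOCK = "/sbin/hwclock"  # path taken from the post


def wall_clock_drift(wall_before, mono_before, wall_after, mono_after):
    """Return how many seconds the wall clock moved beyond the monotonic clock."""
    return (wall_after - wall_before) - (mono_after - mono_before)


def resync_from_rtc(run=subprocess.run):
    """Set the system clock from the RTC and return the applied step in seconds.

    Needs root. `run` is injectable (hypothetical design choice) so the
    function can be exercised without touching the real clock.
    """
    wall_before, mono_before = time.time(), time.monotonic()
    run([HWCLOCK, "-s"], check=True)
    wall_after, mono_after = time.time(), time.monotonic()
    return wall_clock_drift(wall_before, mono_before, wall_after, mono_after)
```

Called from the 5-minute cron job in place of the bare hwclock command, the returned value could be printed or logged; a large value would suggest the system clock had jumped since the previous resync.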

Even with this, I still get the following errors, which may be related to the underlying issue. These messages have only appeared on modules 1 and 2, which so far have been the only devices to exhibit time jumps and network outages.
Code:
[Mon Oct 14 08:00:36 2019] rcu: INFO: rcu_sched self-detected stall on CPU
[Mon Oct 14 08:00:36 2019] rcu:         1-...!: (102 GPs behind) idle=23e/0/0x1 softirq=5005523/5005524 fqs=12 
[Mon Oct 14 08:00:36 2019] rcu:          (t=259866 jiffies g=12543365 q=28)
[Mon Oct 14 08:00:36 2019] rcu: rcu_sched kthread starved for 259842 jiffies! g12543365 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=3
[Mon Oct 14 08:00:36 2019] rcu: RCU grace-period kthread stack dump:
[Mon Oct 14 08:00:36 2019] rcu_sched       I    0    10      2 0x00000028
[Mon Oct 14 08:00:36 2019] Call trace:
[Mon Oct 14 08:00:36 2019]  __switch_to+0x94/0xd8
[Mon Oct 14 08:00:36 2019]  __schedule+0x1e8/0x640
[Mon Oct 14 08:00:36 2019]  schedule+0x24/0x80
[Mon Oct 14 08:00:36 2019]  schedule_timeout+0x90/0x398
[Mon Oct 14 08:00:36 2019]  rcu_gp_kthread+0x550/0x8f8
[Mon Oct 14 08:00:36 2019]  kthread+0x128/0x130
[Mon Oct 14 08:00:36 2019]  ret_from_fork+0x10/0x1c
[Mon Oct 14 08:00:36 2019] Task dump for CPU 1:
[Mon Oct 14 08:00:36 2019] swapper/1       R  running task        0     0      1 0x0000002a
[Mon Oct 14 08:00:36 2019] Call trace:
[Mon Oct 14 08:00:36 2019]  dump_backtrace+0x0/0x1a0
[Mon Oct 14 08:00:36 2019]  show_stack+0x14/0x20
[Mon Oct 14 08:00:36 2019]  sched_show_task+0x160/0x198
[Mon Oct 14 08:00:36 2019]  dump_cpu_task+0x40/0x50
[Mon Oct 14 08:00:36 2019]  rcu_dump_cpu_stacks+0xc0/0x100
[Mon Oct 14 08:00:36 2019]  rcu_check_callbacks+0x594/0x780
[Mon Oct 14 08:00:36 2019]  update_process_times+0x2c/0x58
[Mon Oct 14 08:00:36 2019]  tick_sched_handle.isra.5+0x30/0x48
[Mon Oct 14 08:00:36 2019]  tick_sched_timer+0x48/0x98
[Mon Oct 14 08:00:36 2019]  __hrtimer_run_queues+0xe4/0x1f8
[Mon Oct 14 08:00:36 2019]  hrtimer_interrupt+0xf4/0x2b0
[Mon Oct 14 08:00:36 2019]  arch_timer_handler_phys+0x28/0x40
[Mon Oct 14 08:00:36 2019]  handle_percpu_devid_irq+0x80/0x138
[Mon Oct 14 08:00:36 2019]  generic_handle_irq+0x24/0x38
[Mon Oct 14 08:00:36 2019]  __handle_domain_irq+0x5c/0xb0
[Mon Oct 14 08:00:36 2019]  gic_handle_irq+0x58/0xa8
[Mon Oct 14 08:00:36 2019]  el1_irq+0xb0/0x140
[Mon Oct 14 08:00:36 2019]  arch_cpu_idle+0x10/0x18
[Mon Oct 14 08:00:36 2019]  do_idle+0x1d4/0x298
[Mon Oct 14 08:00:36 2019]  cpu_startup_entry+0x24/0x28
[Mon Oct 14 08:00:36 2019]  secondary_start_kernel+0x18c/0x1c8
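
To get a feel for the interval reported in the stall message above, the jiffies count can be converted to seconds. A rough sketch, assuming a tick rate of HZ=250; the real value depends on the kernel configuration (CONFIG_HZ), which the post does not state, and the helper name stall_seconds is hypothetical.

```python
import re

# Matches the "(t=<N> jiffies" part of an RCU stall report line.
STALL_RE = re.compile(r"\(t=(\d+) jiffies")


def stall_seconds(line, hz=250):
    """Convert the jiffies count in an RCU stall line to seconds.

    hz is an ASSUMPTION (not stated in the post); pass the kernel's
    actual CONFIG_HZ value if known. Returns None if the line has no count.
    """
    match = STALL_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)) / hz
```

With the assumed HZ=250, the t=259866 jiffies from the log would correspond to about 1039 seconds (roughly 17 minutes); with HZ=1000 it would be about 260 seconds. The log alone does not show whether that interval is real or inflated by a clock jump.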

Thanks for your help. Would you say running hwclock -s is a reliable temporary solution?
Those kernel messages are interesting though.

Also, would everyone mind continuing in the other thread, as the original problem of this one has been fixed :) (link)


Messages In This Thread
RE: Network problems - by Dreamwalker - 09-11-2019, 10:31 AM
RE: Network problems - by Unkn0wn - 09-11-2019, 12:45 PM
RE: Network problems - by Unkn0wn - 09-27-2019, 04:07 AM
RE: Network problems - by Dreamwalker - 09-17-2019, 10:38 AM
RE: Network problems - by Unkn0wn - 09-17-2019, 03:50 PM
RE: Network problems (actually bad power supply) - by Unkn0wn - 10-17-2019, 06:30 AM
