05-10-2021, 12:41 AM
Unfortunately, none of those suggestions are going to help. The problem has nothing to do with the timezone, and nothing to do with DRAM. There may be a relationship to CPU clocks or thermals, but if there is one, it is only a weak effect. The issue is a bug in the timer in the A64 SoC (all of them, to some degree -- a replacement board could be the same). The timer tends to jump backwards, which the kernel (incorrectly) interprets as the counter wrapping around, so the date jumps 2^56/24000000 seconds (roughly 95 years) forward.
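To see where that number comes from, here is a rough sketch of the arithmetic. It is a simplified model, not the kernel's actual code: it assumes the elapsed time is computed as the difference between two counter reads, masked to the 56-bit counter width. The names (elapsed_ticks, COUNTER_BITS, TIMER_HZ) and the 512-tick glitch size are made up for illustration.

```python
# Simplified model (an assumption, not kernel code): elapsed time is the
# difference between two counter reads, taken modulo the counter width.
COUNTER_BITS = 56          # width implied by the 2^56 figure above
TIMER_HZ = 24_000_000      # 24 MHz, the divisor in the figure above
MASK = (1 << COUNTER_BITS) - 1


def elapsed_ticks(previous: int, current: int) -> int:
    """Ticks between two reads, assuming the counter only ever counts up."""
    return (current - previous) & MASK


# Normal case: the counter advanced by 240 ticks (10 microseconds at 24 MHz).
previous = 1_000_000
normal = elapsed_ticks(previous, previous + 240)

# Glitch: the second read is 512 ticks *behind* the first (hypothetical size).
# The masked subtraction turns this into almost one full counter period.
bogus = elapsed_ticks(previous, previous - 512)
bogus_seconds = bogus / TIMER_HZ
bogus_years = bogus_seconds / (365.25 * 24 * 3600)

print(f"normal delta: {normal} ticks")
print(f"bogus delta:  {bogus} ticks = {bogus_seconds:.0f} s = {bogus_years:.1f} years")
```

In this model even a tiny backward step looks like almost one full wraparound, which is why the date lands decades in the future instead of a few microseconds in the past.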
There is already a workaround in Linux to filter out those timer jumps. However, as you can plainly see, the workaround is insufficient. I have been aware of this for a while, due to reports from users like yourself; but I had so far been unable to trigger any further jumps on my handful of A64 boards/devices. So I didn't know how to improve the workaround. I recently improved my testing tools, which are now able to catch some jumps on my PinePhone that the kernel workaround misses.
However, the likelihood of any one timer jump causing the date to skip forward (and thus an RCU stall) is incredibly low. Since you have been experiencing many of these, it appears you have close to a "worst-case" A64 chip as it relates to this bug. This is helpful, as making the workaround work for you will ensure it covers as many chips as possible.
Can you try running the tool at https://github.com/smaeul/timer-tools and paste the output here? On the PinePhone, you should be able to build it with "make". Then please try running it for an hour or so: "for i in $(seq 30); do ./build/target/src/timer_test -Cc; done". You'll want to do this with the phone plugged in, since it will use quite a bit of CPU. If there are any lines that say "Failed after XXXXX reads", that means something got past the kernel's filter.
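If the output gets long, a small script along these lines could pick out the interesting lines. It is only a sketch: it assumes the output was saved to a text file, and it knows nothing about the tool's output format beyond the "Failed after XXXXX reads" wording quoted above. The function name and the log file name in the usage note are hypothetical.

```python
import re

# Matches the failure message quoted above; the rest of the tool's output
# format is not assumed and is simply ignored.
FAILURE = re.compile(r"Failed after (\d+) reads")


def find_failures(lines):
    """Return the read count from every 'Failed after N reads' line."""
    counts = []
    for line in lines:
        match = FAILURE.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts
```

For example, with the output saved to a (hypothetical) file timer_test.log, find_failures(open("timer_test.log")) returns one number per failed run, and an empty list means nothing got past the filter.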
Thanks!