Thread: Best way to avoid SMP internal errors when building RAID? (/showthread.php?tid=13242)
Forum: PINE64 (https://forum.pine64.org), ROCKPRO64, Linux on RockPro64
Best way to avoid SMP internal errors when building RAID? - kuleszdl - 02-26-2021

Hi,

as some of you might know, there is a known issue with the handling of PCIe errors on the RP64, as discussed here:

https://forum.pine64.org/showthread.php?tid=8374
https://forum.pine64.org/showthread.php?tid=6329

I am getting this error when I try to build/rebuild my RAID for the first time. Sometimes I am lucky and it works, but most of the time it does not, and I don't have much confidence in putting my backup on a machine with a malfunctioning PCIe interface. If it happens, I see entries like these in the logs:

Code:
kernel:[ 658.490457] Internal error: synchronous external abort: 96000210 [#1] SMP

As I am still encountering these issues when running the latest Debian unstable kernel, I fear that this issue won't be fixed in the upcoming Debian stable either (because it's a hardware issue and not Debian's fault).

Now, I wonder what the best workaround could be. Recompiling the kernel with the hack discussed in said thread seems to work, but it's not a long-term solution if we want to get regular security updates for our kernel without the need for manual patching and recompiling...

Is there any workaround we could apply in software to avoid these issues? I would be happy with anything, even at the cost of performance, like disabling all but one CPU core etc.

Thank you!

I tried the most radical approach and completely disabled SMP by adding the following kernel command line parameter:

Code:
nosmp

As a result, my RP64 now runs with only one Cortex-A53 core. Yet, the performance seems to be enough to build the RAID:

Code:
%Cpu(s): 0.7 us, 44.6 sy, 0.0 ni, 49.4 id, 0.0 wa, 0.0 hi, 5.2 si, 0.0 st

At least, I haven't gotten any SMP error yet, so I am optimistic this will work... Suggestions for less drastic and more performant workarounds (e.g. enabling one of the A72 cores) are welcome!
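A side note on the log line quoted above: the hexadecimal number after "synchronous external abort" looks like the ARM64 exception syndrome (ESR) value, which can be split into fields to see what kind of access failed. The following is only a minimal sketch under that assumption. The function name `decode_esr` and the two small lookup tables are made up for illustration and cover only the values relevant here, not the full architecture tables.

```python
# Minimal sketch: split an AArch64 ESR value into its main fields.
# Assumption: the hex number in the "Internal error" line is the ESR.
# decode_esr and the lookup tables are illustrative names, not a kernel API.

EC_NAMES = {
    0x24: "data abort from a lower exception level",
    0x25: "data abort taken without a change in exception level",
}

DFSC_NAMES = {
    0x10: "synchronous external abort, not on translation table walk",
}


def decode_esr(esr: int) -> dict:
    """Return the main ESR fields for a data abort."""
    iss = esr & 0x1FFFFFF            # instruction specific syndrome, bits 24:0
    return {
        "ec": (esr >> 26) & 0x3F,    # exception class, bits 31:26
        "il": (esr >> 25) & 0x1,     # instruction length bit
        "iss": iss,
        "ea": (iss >> 9) & 0x1,      # external abort type bit
        "wnr": (iss >> 6) & 0x1,     # 1 = write access, 0 = read access
        "dfsc": iss & 0x3F,          # data fault status code, bits 5:0
    }


fields = decode_esr(int("96000210", 16))
print("EC  :", hex(fields["ec"]), EC_NAMES.get(fields["ec"], "unknown"))
print("DFSC:", hex(fields["dfsc"]), DFSC_NAMES.get(fields["dfsc"], "unknown"))
print("access:", "write" if fields["wnr"] else "read")
```

For 96000210 this gives exception class 0x25 and fault status 0x10 with the write bit clear, i.e. a read by kernel code that was answered with an external abort. That would fit a failed bus access, but the decoding alone does not prove the PCIe link is the cause.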
Bad news and correction - I have now encountered the same issue even with SMP disabled :-(

Code:
kernel:[ 922.683235] Internal error: synchronous external abort: 96000210 [#1] SMP

Details in dmesg:

Code:
[ 922.696664] CPU: 0 PID: 171 Comm: scsi_eh_1 Not tainted 5.10.0-3-arm64 #1 Debian 5.10.13-

RE: Best way to avoid SMP internal errors when building RAID? - kuleszdl - 02-26-2021

One more update: After I patched the dtb file that uses the faster gen2 link speed, my resync has been progressing for half an hour without issues so far (before, it usually failed after approx. 5-10 minutes).

Marking as solved, since this does the trick for me.
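For anyone who wants to compare how long a rebuild survives before and after such a change, here is a rough sketch that pulls the abort events out of saved kernel log text. It is only an illustration: the names `find_aborts` and `find_tasks` are hypothetical, the regular expressions are written against the log lines quoted in this thread, and the timestamps are seconds since boot, not seconds since the rebuild started.

```python
import re

# Patterns written against the log lines quoted in this thread (assumption:
# other kernels may format these lines differently).
ABORT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Internal error: "
    r"synchronous external abort: (?P<esr>[0-9a-fA-F]+)"
)
TASK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) "
    r"Comm: (?P<comm>\S+)"
)


def find_aborts(log_text: str) -> list:
    """Return one dict per abort line: uptime in seconds and the hex code."""
    events = []
    for line in log_text.splitlines():
        m = ABORT_RE.search(line)
        if m:
            events.append({"uptime_s": float(m["ts"]), "esr": int(m["esr"], 16)})
    return events


def find_tasks(log_text: str) -> list:
    """Return the CPU, PID and task name from the 'CPU: ... Comm: ...' lines."""
    tasks = []
    for line in log_text.splitlines():
        m = TASK_RE.search(line)
        if m:
            tasks.append(
                {"cpu": int(m["cpu"]), "pid": int(m["pid"]), "comm": m["comm"]}
            )
    return tasks


# The three log lines quoted in the posts above.
sample = """\
kernel:[ 658.490457] Internal error: synchronous external abort: 96000210 [#1] SMP
kernel:[ 922.683235] Internal error: synchronous external abort: 96000210 [#1] SMP
[ 922.696664] CPU: 0 PID: 171 Comm: scsi_eh_1 Not tainted 5.10.0-3-arm64 #1 Debian 5.10.13-
"""

aborts = find_aborts(sample)
tasks = find_tasks(sample)
for event in aborts:
    print(f"abort at {event['uptime_s']:.0f} s uptime, code {event['esr']:08x}")
for task in tasks:
    print(f"task {task['comm']} (PID {task['pid']}) on CPU {task['cpu']}")
```

Feeding it the output of `dmesg` saved to a file after each attempt would make it easy to note the uptime at which each attempt failed and which task was running at the time.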