Kernel oops after big-ish writes - gaeb - 01-21-2021

I've been trying to set up a RockPro64 as a home server for backups. I flashed the SPI so that I can boot from USB. Back in December I had an official Debian installation on an SD card with an encrypted root, and I noticed that any write larger than a few MB (using tools like scp and rsync) would give me a kernel oops. When I left the machine running for a couple of weeks over the holidays, it crashed on its own. The problem seems to be independent of the OS: I've been trying to switch to Manjaro by flashing a USB stick and then rsyncing the contents of the root partition to a LUKS volume on an SD card, and I get a similar oops.

It seemed very similar to the problem described in this thread, but the solution offered there (compiling my own device tree blobs with Trusted Firmware) didn't change the behavior. The thread does lead me to suspect that it may have something to do with my having flashed the SPI: perhaps firmware from when the RockPro64 booted off SPI is still running and the kernel bumps into it? However, I've used ayufan's as well as sigmaris' latest releases (sigmaris seems to be using ATF for the dtb), and I continue to get kernel crashes.

What follows are the kernel messages from the most recent crash. I'm fairly new to dealing with things like this, so please let me know if I'm missing information that would be helpful.

Code:
#### lots of rsync to sd happening before this, when suddenly... ####

          3,797 100%   14.60kB/s    0:00:00 (xfr#7615, ir-chk=1015/10770)
rsync: connection unexpectedly closed (478578 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(228) [generator=v3.2.3]
[  283.352255] Unable to handle kernel paging request at virtual address 000000000000b200
[  283.353084] Mem abort info:
[  283.353853]   ESR = 0x96000004
[  283.354546]   EC = 0x25: DABT (current EL), IL = 32 bits
[  283.355484]   SET = 0, FnV = 0
[  283.356229]   EA = 0, S1PTW = 0
[  283.356735] Data abort info:
[  283.356994]   ISV = 0, ISS = 0x00000004
[  283.357335]   CM = 0, WnR = 0
[  283.357603] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000f5a93000
[  283.358171] [000000000000b200] pgd=0000000000000000, p4d=0000000000000000
[  283.358775] Internal error: Oops: 96000004 [#2] SMP
[  283.359207] Modules linked in: btrfs blake2b_generic xor xor_neon zstd_compress raid6_pq dm_crypt zram des_generic libdes cfg80211 8021q md4 garp mrp stp llc realtek rc_cec hci_uart snd_soc_simple_card dw_hdmi_cec snd_soc_audio_graph_card snd_soc_simple_card_utils btqca pwm_fan btrtl btbcm dw_hdmi_i2s_audio panfrost btintel dwmac_rk gpu_sched stmmac_platform bluetooth rockchip_rga hantro_vpu(C) stmmac rockchip_vdec(C) videobuf2_dma_sg ecdh_generic v4l2_h264 v4l2_mem2mem snd_soc_rockchip_i2s ecc snd_soc_rockchip_pcm videobuf2_vmalloc rfkill dw_wdt mdio_xpcs videobuf2_dma_contig videobuf2_memops phylink videobuf2_v4l2 videobuf2_common snd_soc_es8316 rtc_rk808 rockchip_saradc snd_soc_hdmi_codec rockchip_thermal gpio_keys rockchipdrm analogix_dp dw_hdmi cec rc_core dw_mipi_dsi drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm drm_panel_orientation_quirks
[  283.366001] CPU: 1 PID: 690 Comm: rsync Tainted: G      D  C        5.9.13-1-MANJARO-ARM #1
[  283.366735] Hardware name: Pine64 RockPro64 v2.1 (DT)
[  283.367185] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
[  283.367688] pc : fsnotify_destroy_marks+0x1f4/0x420
[  283.368124] lr : fsnotify_grab_connector+0x28/0xf0
[  283.368548] sp : ffff800010c03d60
[  283.368843] x29: ffff800010c03d60 x28: ffff0000f6f51c80
[  283.369315] x27: 0000000000000000 x26: 0000000000000000
[  283.369786] x25: 0000000000000000 x24: 0000000000000000
[  283.370258] x23: ffff800012daa000 x22: ffff800012daa0c8
[  283.370730] x21: 0000000000000000 x20: ffff800013004020
[  283.371201] x19: 000000000000b200 x18: 0000000000000000
[  283.371673] x17: 0000000000000000 x16: 0000000000000000
[  283.372143] x15: 0000000000000000 x14: 0000000000000000
[  283.372614] x13: 0000000000000000 x12: 0000000000000000
[  283.373085] x11: 0000000000000000 x10: 0000000000000000
[  283.373555] x9 : 0000000000000000 x8 : 0000000000000000
[  283.374027] x7 : 000000000000003f x6 : ffff0000e0cd5600
[  283.374498] x5 : 0000000000000305 x4 : 0000000000000000
[  283.374970] x3 : 0000000000000001 x2 : 0000000000000001
[  283.375440] x1 : 0000000000000000 x0 : 0000000000000000
[  283.375912] Call trace:
[  283.376137]  fsnotify_destroy_marks+0x1f4/0x420
[  283.376540]  fsnotify_find_mark+0x1c/0xa4
[  283.376899]  dnotify_flush+0x58/0x180
[  283.377229]  filp_close+0x50/0x90
[  283.377528]  __close_fd+0x24/0x40
[  283.377826]  __arm64_sys_close+0x24/0x60
[  283.378178]  el0_svc_common.constprop.0+0x6c/0x170
[  283.378605]  do_el0_svc+0x24/0x90
[  283.378905]  el0_sync_handler+0x90/0x19c
[  283.379256]  el0_sync+0x158/0x180
[  283.379559] Code: 88037c82 35ffff83 17fffaba f9800271 (885ffe60)
[  283.380100] ---[ end trace 051f29ebb0bb4ed4 ]---
[  330.448063] kauditd_printk_skb: 9 callbacks suppressed
Segmentation fault
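
For anyone reading the trace above, here is a minimal sketch of how the ESR value in it breaks down. The field layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts WnR in ISS bit 6 and DFSC in ISS bits 5:0) is taken from the Arm architecture documentation; the function name `decode_esr` and the small lookup table are made up for illustration and only cover translation faults.

```python
# Sketch: decode an arm64 ESR value as printed in a kernel oops.
# decode_esr and DFSC_NAMES are illustrative names, not a kernel API.

DFSC_NAMES = {
    0b000100: "translation fault, level 0",
    0b000101: "translation fault, level 1",
    0b000110: "translation fault, level 2",
    0b000111: "translation fault, level 3",
}


def decode_esr(esr):
    """Split ESR into EC / IL / ISS and, for data aborts, WnR and DFSC."""
    ec = (esr >> 26) & 0x3F      # exception class, bits 31:26
    il = (esr >> 25) & 0x1       # instruction length bit (1 = 32-bit)
    iss = esr & 0x1FFFFFF        # instruction specific syndrome, bits 24:0
    out = {"EC": ec, "IL": il, "ISS": iss}
    if ec in (0x24, 0x25):       # data abort from lower EL / current EL
        dfsc = iss & 0x3F
        out["WnR"] = (iss >> 6) & 0x1   # 0 = read, 1 = write
        out["DFSC"] = dfsc
        out["fault"] = DFSC_NAMES.get(dfsc, "other (not decoded in this sketch)")
    return out


if __name__ == "__main__":
    # Value copied from the oops above.
    for key, value in decode_esr(0x96000004).items():
        print(key, "=", hex(value) if isinstance(value, int) else value)
```

For 0x96000004 this matches what the kernel already printed (EC = 0x25, ISS = 0x4, WnR = 0): a read at an unmapped address, here the 000000000000b200 held in x19. On its own that doesn't say whether the cause is hardware or software.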



RE: Kernel oops after big-ish writes - gaeb - 02-11-2021

Still haven't figured it out. Tried some new things, found some new things:

I erased the SPI partition on the RockPro64. Still crashed.
I compiled my own idbloader.img and u-boot.itb and dd'd them onto the SD card. Still crashed.
Switched to Manjaro and tried a similar encrypted setup. dd'd the aforementioned firmware. Still crashed.
        (^^^ Found out that upgrading from the 5.9 kernel in the image to 5.10 breaks cryptsetup: I can no longer unlock a LUKS volume that I can otherwise unlock on my desktop. Should I report this?)
I tried the official Manjaro image with no encryption at all. Also dd'd the firmware. Still crashed.

On the plain (unencrypted) Manjaro install, I found something interesting. I logged in over serial, set dmesg -n 8, then ssh'd in and ran a continuous hexdump of the SD card; Manjaro crashed after about 15 minutes. But when I ran the same hexdump loop again, this time logged in over serial only, it has been going for well over an hour with no issues.
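
A rough sketch of a more repeatable version of that read test, in case it helps anyone reproduce this: read the same device or file several times and compare checksums, so that silent corruption shows up as well as outright crashes. The function names and the device path in the comment are placeholders, not anything from my actual setup. Repeated reads of something smaller than RAM will mostly come from the page cache, so this only exercises the card when the target is large.

```python
# Sketch: repeated read passes over a file or block device, comparing SHA-256
# digests between passes. hash_pass / repeated_read_check are illustrative names.
import hashlib
import sys


def hash_pass(path, chunk_size=1 << 20):
    """Read path sequentially in chunk_size pieces and return its SHA-256."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()


def repeated_read_check(path, passes=3):
    """Return (all_passes_identical, list_of_digests)."""
    digests = [hash_pass(path) for _ in range(passes)]
    return len(set(digests)) == 1, digests


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 readcheck.py /dev/mmcblk1   (placeholder device path)
    ok, digests = repeated_read_check(sys.argv[1])
    for i, d in enumerate(digests, 1):
        print("pass", i, d)
    print("consistent" if ok else "MISMATCH between passes")
```

Running it once over ssh and once over serial only would show whether the network path is what makes the difference.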

Does anybody have any advice?
Can anybody recommend a distribution that will let me back up my files?