Freezes and kernel panics with Debian trixie - jssfr - 01-13-2026
Hi there,
my ROCKPro64 ran quite smoothly for several years, but since the upgrade to Debian trixie it has become quite unstable.
I am using a JMicron Technology Corp. JMB58x AHCI SATA controller in the PCIe slot. The first symptom was that on the first reboot after the upgrade, the status LEDs on the controller did not turn on. I connected a monitor via HDMI and got no signal.
I have now gone through a couple of debugging iterations, and here is what I have so far.
If the PCIe card is in, I get panics related to PCIe, such as:
Code:
[ 4.965205] SError Interrupt on CPU5, code 0x00000000bf000002 -- SError
[ 4.965230] CPU: 5 UID: 0 PID: 52 Comm: kworker/u25:3 Tainted: G M 6.12.63+deb13-arm64 #1 Debian 6.12.63-1
[ 4.965249] Tainted: [M]=MACHINE_CHECK
[ 4.965253] Hardware name: Pine64 RockPro64 v2.1 (DT)
[ 4.965260] Workqueue: events_unbound deferred_probe_work_func
[ 4.965285] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 4.965297] pc : rockchip_pcie_rd_conf+0x194/0x2c0
[ 4.965315] lr : rockchip_pcie_rd_conf+0x188/0x2c0
[ 4.965326] sp : ffff8000828037a0
[ 4.965331] x29: ffff8000828037a0 x28: ffff000001fbf800 x27: 0000000000000001
[ 4.965348] x26: 0000000000000000 x25: 0000000000000001 x24: 0000000000000001
[ 4.965362] x23: ffff800082485000 x22: 0000000000000000 x21: ffff8000828037e4
[ 4.965377] x20: 0000000000000000 x19: 0000000000000004 x18: ffffffffffffffff
[ 4.965390] x17: 30302f30303a3030 x16: 30306963702f6569 x15: 63702e3030303030
[ 4.965404] x14: ffff8000824bb460 x13: 0000000000000326 x12: 0000000000000000
[ 4.965418] x11: 0000000000000001 x10: 0000000000000000 x9 : ffff8000808698d0
[ 4.965431] x8 : 0000000124f798bc x7 : ffff000005740380 x6 : ffff000005747000
[ 4.965445] x5 : ffff000001fbf800 x4 : ffff800087000000 x3 : 0000000000c00008
[ 4.965458] x2 : 000000000080000a x1 : ffff800087c00008 x0 : ffff800087c0000c
[ 4.965475] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 4.965481] CPU: 5 UID: 0 PID: 52 Comm: kworker/u25:3 Tainted: G M 6.12.63+deb13-arm64 #1 Debian 6.12.63-1
[ 4.965496] Tainted: [M]=MACHINE_CHECK
[ 4.965500] Hardware name: Pine64 RockPro64 v2.1 (DT)
[ 4.965505] Workqueue: events_unbound deferred_probe_work_func
[ 4.965517] Call trace:
[ 4.965521] dump_backtrace+0xd8/0x130
[ 4.965534] show_stack+0x20/0x38
[ 4.965543] dump_stack_lvl+0x60/0x80
[ 4.965556] dump_stack+0x18/0x28
[ 4.965566] panic+0x164/0x378
[ 4.965582] nmi_panic+0x90/0x98
[ 4.965598] arm64_serror_panic+0x78/0x90
[ 4.965608] do_serror+0x30/0x80
[ 4.965617] el1h_64_error_handler+0x30/0x48
[ 4.965629] el1h_64_error+0x64/0x68
[ 4.965638] rockchip_pcie_rd_conf+0x194/0x2c0
[ 4.965650] pci_bus_read_config_dword+0x8c/0x140
[ 4.965663] pci_bus_generic_read_dev_vendor_id+0x38/0x178
[ 4.965678] pci_scan_single_device+0xb4/0x120
[ 4.965691] pci_scan_slot+0x60/0x230
[ 4.965703] pci_scan_child_bus_extend+0x4c/0x2e0
[ 4.965717] pci_scan_bridge_extend+0x180/0x5a8
[ 4.965731] pci_scan_child_bus_extend+0x1c4/0x2e0
[ 4.965744] pci_scan_root_bus_bridge+0x6c/0xe8
[ 4.965758] pci_host_probe+0x38/0xe0
[ 4.965771] rockchip_pcie_probe+0x3a0/0x530
[ 4.965782] platform_probe+0x70/0xe8
[ 4.965796] really_probe+0xc8/0x3a0
[ 4.965806] __driver_probe_device+0x84/0x160
[ 4.965815] driver_probe_device+0x44/0x130
[ 4.965825] __device_attach_driver+0xc4/0x170
[ 4.965836] bus_for_each_drv+0x90/0x100
[ 4.965845] __device_attach+0xa8/0x1c8
[ 4.965854] device_initial_probe+0x1c/0x30
[ 4.965864] bus_probe_device+0xb0/0xc0
[ 4.965873] deferred_probe_work_func+0xbc/0x120
[ 4.965883] process_one_work+0x178/0x3e0
[ 4.965895] worker_thread+0x204/0x3f0
[ 4.965907] kthread+0xe8/0xf8
[ 4.965916] ret_from_fork+0x10/0x20
[ 4.965929] SMP: stopping secondary CPUs
[ 4.965945] Kernel Offset: disabled
[ 4.965948] CPU features: 0x08,00002082,c0200000,4200421b
[ 4.965957] Memory Limit: none
[ 4.994272] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
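For reading the SError code in the first line of this trace, here is a minimal sketch that splits an arm64 ESR value into its top-level fields. It assumes the standard ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0). The function name `decode_esr` and the two-entry EC table are made up for illustration and are far from complete.

```python
# Minimal sketch: split an arm64 ESR_ELx value into EC / IL / ISS.
# The EC table is intentionally tiny; it only covers the two exception
# classes relevant to this thread.

EC_NAMES = {
    0x25: "Data Abort, same EL",
    0x2F: "SError interrupt",
}


def decode_esr(esr: int) -> dict:
    ec = (esr >> 26) & 0x3F      # exception class, bits 31:26
    il = (esr >> 25) & 0x1       # instruction length bit, bit 25
    iss = esr & 0x1FFFFFF        # instruction specific syndrome, bits 24:0
    fields = {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not in this table"),
        "il": il,
        "iss": iss,
    }
    if ec == 0x2F:
        # For SError, ISS bit 24 (IDS) set means bits 23:0 are
        # implementation defined and cannot be decoded portably.
        fields["ids"] = (iss >> 24) & 0x1
    return fields


if __name__ == "__main__":
    info = decode_esr(0xBF000002)  # value from the SError line above
    print(f"EC=0x{info['ec']:02x} ({info['ec_name']}), IL={info['il']}, "
          f"ISS=0x{info['iss']:07x}, IDS={info.get('ids')}")
```

For 0xbf000002 the exception class comes out as 0x2f (SError) with the IDS bit set, so the remaining syndrome bits are implementation defined. The code itself therefore says little beyond "asynchronous external error", and the interesting part is the program counter in rockchip_pcie_rd_conf.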
When I booted without the PCIe card, I got what looked like a freeze on HDMI, but the UART logged this kernel panic:
Code:
[ 106.672016] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000037
[ 106.676856] Mem abort info:
[ 106.681157] ESR = 0x0000000096000004
[ 106.685537] EC = 0x25: DABT (current EL), IL = 32 bits
[ 106.690074] SET = 0, FnV = 0
[ 106.694416] EA = 0, S1PTW = 0
[ 106.698736] FSC = 0x04: level 0 translation fault
[ 106.703208] Data abort info:
[ 106.707515] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 106.712069] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 106.716632] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 106.721166] user pgtable: 4k pages, 48-bit VAs, pgdp=000000000df71000
[ 106.725827] [0000000000000037] pgd=0000000000000000, p4d=0000000000000000
[ 106.730563] Internal error: Oops: 0000000096000004 [#1] SMP
[ 106.735036] Modules linked in: nft_limit nft_masq nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables binfmt_misc snd_soc_hdmi_codec hantro_vpu aes_ce_blk rockchip_vdec(C) hci_uart v4l2_jpeg aes_ce_cipher crct10dif_ce v4l2_vp9 btqca polyval_ce v4l2_h264 polyval_generic rockchip_rga btrtl videobuf2_dma_contig btintel ghash_ce videobuf2_dma_sg gf128mul v4l2_mem2mem btbcm sha2_ce videobuf2_memops sha256_arm64 videobuf2_v4l2 snd_soc_audio_graph_card snd_soc_simple_card sha1_ce panfrost snd_soc_rockchip_i2s bluetooth snd_soc_spdif_tx snd_soc_es8316 snd_soc_simple_card_utils snd_soc_core videodev ofpart gpu_sched dw_hdmi_i2s_audio gpio_ir_recv des_generic dw_hdmi_cec pwm_fan snd_compress ecdh_generic leds_gpio spi_nor snd_pcm_dmaengine rk_crypto rfkill drm_shmem_helper snd_pcm videobuf2_common pwrseq_core crypto_engine snd_timer mtd mc libdes snd rockchip_saradc coresight_cpu_debug industrialio_triggered_buffer soundcore coresight_etm4x kfifo_buf rockchip_thermal industrialio coresight
[ 106.735344] cpufreq_dt evdev dm_mod nfsd auth_rpcgss nfs_acl lockd grace sunrpc efi_pstore configfs nfnetlink ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c crc32c_generic raid1 raid0 realtek md_mod xhci_plat_hcd xhci_hcd dwc3 rockchipdrm fusb302 udc_core rk808_regulator dw_hdmi tcpm cec ulpi dwmac_rk typec rc_core stmmac_platform fan53555 stmmac dw_mipi_dsi analogix_dp pwm_regulator gpio_rockchip drm_display_helper pcs_xpcs dwc3_of_simple phylink ohci_platform gpio_keys sdhci_of_arasan ohci_hcd mdio_devres drm_dma_helper ehci_platform phy_rockchip_inno_usb2 ehci_hcd of_mdio drm_kms_helper phy_rockchip_emmc sdhci_pltfm phy_rockchip_typec fixed_phy phy_rockchip_pcie usbcore nvmem_rockchip_efuse pl330 drm dw_wdt fwnode_mdio pwm_rockchip io_domain rockchip_dfi libphy cqhci dw_mmc_rockchip i2c_rk3x usb_common spi_rockchip dw_mmc_pltfm sdhci dw_mmc fixed
[ 106.806464] CPU: 2 UID: 0 PID: 900 Comm: nft Tainted: G C 6.12.63+deb13-arm64 #1 Debian 6.12.63-1
[ 106.811298] Tainted: [C]=CRAP
[ 106.815326] Hardware name: Pine64 RockPro64 v2.1 (DT)
[ 106.819469] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 106.823730] pc : nf_ct_iterate_cleanup+0xd4/0x240 [nf_conntrack]
[ 106.827937] lr : nf_ct_iterate_cleanup+0xc0/0x240 [nf_conntrack]
[ 106.832067] sp : ffff8000832f33b0
[ 106.835817] x29: ffff8000832f33b0 x28: ffff8000832f3450 x27: 0000000000000000
[ 106.839909] x26: ffff80007b451680 x25: ffff00000b594a00 x24: ffff80007b443538
[ 106.844018] x23: ffff80007b452688 x22: ffff80007b451c40 x21: 000000000001eb80
[ 106.848126] x20: 0000000000003d70 x19: ffff000020700000 x18: ffffffffffffffff
[ 106.852210] x17: 000000000f7574be x16: 0000000094c09be4 x15: ffff00000529f895
[ 106.856295] x14: ffff8000832f3240 x13: 0000000000000801 x12: ffff0000f77e0178
[ 106.860330] x11: 000000007fffffff x10: 0000000000000064 x9 : ffff80007b43b308
[ 106.864328] x8 : ffff0000f1072920 x7 : ffff0000f105a9c0 x6 : ffff80007b47eb10
[ 106.868342] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 106.872386] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000
[ 106.876346] Call trace:
[ 106.879844] nf_ct_iterate_cleanup+0xd4/0x240 [nf_conntrack]
[ 106.883757] nf_ct_iterate_cleanup_net+0x50/0x70 [nf_conntrack]
[ 106.887678] nf_ct_netns_do_get+0x1c0/0x220 [nf_conntrack]
[ 106.891556] nf_ct_netns_get+0xc8/0x100 [nf_conntrack]
[ 106.895426] nft_ct_get_init+0xa8/0x1b0 [nft_ct]
[ 106.899134] nf_tables_newrule+0x2d4/0x898 [nf_tables]
[ 106.902984] nfnetlink_rcv_batch+0x698/0x960 [nfnetlink]
[ 106.906751] nfnetlink_rcv+0x16c/0x1b0 [nfnetlink]
[ 106.910483] netlink_unicast+0x304/0x380
[ 106.914126] netlink_sendmsg+0x1ac/0x410
[ 106.917709] __sock_sendmsg+0x64/0xc0
[ 106.921245] ____sys_sendmsg+0x270/0x308
[ 106.924786] ___sys_sendmsg+0xb8/0x118
[ 106.928209] __sys_sendmsg+0x90/0x100
[ 106.931533] __arm64_sys_sendmsg+0x2c/0x40
[ 106.934808] invoke_syscall+0x6c/0x100
[ 106.937919] el0_svc_common.constprop.0+0x48/0xf0
[ 106.941022] do_el0_svc+0x24/0x38
[ 106.943932] el0_svc+0x38/0x150
[ 106.946699] el0t_64_sync_handler+0x120/0x138
[ 106.949477] el0t_64_sync+0x190/0x198
[ 106.952089] Code: 3600009b 1400003b f940037b 3700073b (3940df60)
[ 106.954847] ---[ end trace 0000000000000000 ]---
[ 106.957390] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[ 106.960080] SMP: stopping secondary CPUs
[ 106.962488] Kernel Offset: disabled
[ 106.964664] CPU features: 0x08,00002082,c0200000,4200421b
[ 106.966962] Memory Limit: none
[ 106.968990] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
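The faulting address 0x37 can be cross-checked against the "Code:" line of the oops, where the word in parentheses is the instruction that faulted. Below is a minimal sketch that decodes just one AArch64 encoding, LDRB (immediate, unsigned offset); the helper name `decode_ldrb_uimm` is hypothetical, and any other instruction form simply returns None.

```python
# Minimal sketch: decode the AArch64 "LDRB (immediate, unsigned offset)"
# encoding. Only this one form is handled; everything else returns None.


def decode_ldrb_uimm(insn: int):
    # Bits 31:22 are fixed to 0b0011100101 for this encoding.
    if (insn >> 22) != 0b0011100101:
        return None
    imm12 = (insn >> 10) & 0xFFF   # byte offset (not scaled for LDRB)
    rn = (insn >> 5) & 0x1F        # base register Xn
    rt = insn & 0x1F               # destination register Wt
    return rt, rn, imm12


if __name__ == "__main__":
    # 0x3940df60 is the parenthesised word on the "Code:" line of the oops.
    decoded = decode_ldrb_uimm(0x3940DF60)
    if decoded is not None:
        rt, rn, imm = decoded
        print(f"ldrb w{rt}, [x{rn}, #{imm:#x}]")
```

This decodes to a byte load from x27 plus 0x37. The register dump shows x27 as zero, which matches the reported fault address of 0x37: the code read a field at offset 0x37 of a NULL pointer inside nf_ct_iterate_cleanup.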
These two panics were captured with the Debian trixie kernel 6.12.63+deb13-arm64, but I also reproduced the HDMI-level symptoms of the second panic (cursor stops blinking) with the Debian bookworm kernel 6.1.0-42-arm64.
I am currently running memtest86.com on the board, but so far (58% of the first pass) I see no errors. The panics do not always occur. Sometimes I can get it to boot through completely, at which point it seems to be stable for multiple days. The likelihood of a successful boot is lower if the SATA controller is in, to the point that I haven't yet checked if (other) panics occur if I manage to boot through with the SATA controller installed. I don't want to find out, because that'd likely risk the data on the attached disks.
(There are no peripherals attached to the Pi header except a Raspberry Pico-based UART adapter on UART0. Nothing is connected to any other port except a keyboard on USB, a display on HDMI, and a network cable on the 8P8C/RJ-45 port.)
Once memtest86 is done, I'll try to capture tracebacks with the 6.1.0 kernel. As mentioned, though, the system used to run fine (as far as I can tell: there *were* issues where reboots got stuck, but I had attributed those to something on UART0 interrupting u-boot. There's a non-zero chance that there were, in fact, similar issues before the trixie upgrade).
One thing I already investigated is the kernel_comp_size variable for u-boot, which I found in another thread as a possible cause of odd crashes. I raised it to 128 MiB, which initially seemed to fix things, but then I managed to reproduce the errors. That seems plausible, because the distance between kernel and initramfs (according to kernel_addr_r and ramdisk_addr_r in u-boot) was ~94 MiB anyway and the trixie kernel is only ~38 MiB in size. I'm running u-boot from June 2021 from here: https://github.com/sigmaris/u-boot/releases
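The gap arithmetic can be sketched as below. The two addresses are PLACEHOLDERS, not values read from this board; they were picked only so that the gap lands near the ~94 MiB quoted above. The real values come from `printenv kernel_addr_r ramdisk_addr_r` at the u-boot prompt. The function names are made up for illustration.

```python
# Minimal sketch: check whether an uncompressed kernel of a given size fits
# between two u-boot load addresses without reaching the ramdisk.

MIB = 1024 * 1024


def load_gap_mib(kernel_addr_r: int, ramdisk_addr_r: int) -> float:
    """Distance between the kernel and ramdisk load addresses, in MiB."""
    return (ramdisk_addr_r - kernel_addr_r) / MIB


def kernel_fits(kernel_addr_r: int, ramdisk_addr_r: int,
                kernel_size_mib: float) -> bool:
    """True if the kernel ends at or before the ramdisk load address."""
    return kernel_size_mib * MIB <= ramdisk_addr_r - kernel_addr_r


if __name__ == "__main__":
    # PLACEHOLDER addresses (assumption, not taken from the board).
    kernel_addr_r = 0x02080000
    ramdisk_addr_r = 0x07E00000
    kernel_size_mib = 38  # approximate trixie kernel size from the post

    print(f"gap: {load_gap_mib(kernel_addr_r, ramdisk_addr_r):.1f} MiB")
    print("kernel fits:", kernel_fits(kernel_addr_r, ramdisk_addr_r,
                                      kernel_size_mib))
```

With a gap that is more than twice the kernel size, an overlap between kernel and initramfs looks unlikely, which fits the observation that raising kernel_comp_size did not make the crashes go away.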
To me, this looks like some kind of hardware fault, most likely bad RAM. Does anyone have another idea?
RE: Freezes and kernel panics with Debian trixie - jssfr - 01-13-2026
So memtest86 finished a complete pass successfully ("Finished pass #1 (of 4) (Total errors: 0, ECC errors: 0)"). Given the rate at which boots fail, I don't consider a second pass sensible.
I also ran dpkg -V to check whether the kernel image was corrupted, but that doesn't seem to be the case either.
RE: Freezes and kernel panics with Debian trixie - jssfr - 01-13-2026
Okay, more insights:
- The bookworm kernel 6.1.0-39 manages to enumerate the SATA card reliably if I limit the number of CPUs to 2.
- The trixie kernel 6.12-something reliably fails to boot with the SATA card in no matter the number of CPUs allowed.
- Without the SATA card, the trixie kernel boots cleanly with only two CPUs.
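The posts don't say how the CPU count was limited. One common way is the `maxcpus=` (or `nr_cpus=`) kernel command-line parameter, and that is an assumption here. Under that assumption, this minimal sketch confirms after boot that the limit actually took effect by reading /proc/cmdline and /sys/devices/system/cpu/online. The helper names are hypothetical.

```python
# Minimal sketch: verify a CPU limit on a running Linux system.
# Assumes the limit was set via maxcpus= or nr_cpus= on the kernel
# command line; other mechanisms are not detected.

from pathlib import Path


def parse_cpu_list(text: str) -> list:
    """Expand a sysfs CPU list such as '0-3,5' into [0, 1, 2, 3, 5]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpu_limit_from_cmdline(cmdline: str):
    """Return (parameter, value) for maxcpus=/nr_cpus=, or None."""
    for token in cmdline.split():
        for key in ("maxcpus=", "nr_cpus="):
            if token.startswith(key) and token[len(key):].isdigit():
                return key[:-1], int(token[len(key):])
    return None


if __name__ == "__main__":
    cmdline = Path("/proc/cmdline")
    online = Path("/sys/devices/system/cpu/online")
    if cmdline.exists() and online.exists():
        print("limit on cmdline:", cpu_limit_from_cmdline(cmdline.read_text()))
        print("online CPUs:", parse_cpu_list(online.read_text()))
```

On the RK3399, CPUs 0-3 are the little cores and 4-5 the big ones, so a limit of 2 would normally leave only little cores online. That may matter when judging whether the CPU-count dependence points at a power supply issue.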
For the things dependent on the number of CPUs, I found a suspect:
I suspect this to be L9 or L10 from the schematics (EDIT: it's in fact more likely to be L2000, opening a different thread about that). I'll try to find someone who can help me replace that sucker. Before that, there's probably little sense in trying to resolve the other issue.
RE: Freezes and kernel panics with Debian trixie - jssfr - 01-22-2026
I got the inductor replaced, which didn't change the error pattern at all.
* 6.1.0 from Debian bookworm: boots fine with or without PCIe, sometimes crashes when running nft(1) for the first time after a boot.
* 6.12.63 from Debian trixie: boots fine without PCIe; it may also have the nft(1) issue, but I haven't investigated. With the SATA card in, it always crashes before HDMI comes up, while enumerating the SATA card.
* The 6.17.x backports kernel for trixie has the same issue.