Rockpro64 not stable... crashes now and then.
Hi,
I'm also experiencing this issue on my new RockPro64.
To summarize, here is what I've seen:
- The system is unstable and can crash under fast disk reads/writes. With a SATA disk attached via PCIe, I get the oops almost immediately if I run syncthing. Without a SATA disk, you can still crash it by running hexdump on your eMMC/SD card device. It takes a while, but it will crash.
- If you slow down the reads/writes, it takes longer to crash (I think). For example, if I add lots of debug logs to the mmc driver, I have to wait longer.
- A possible (but ugly) workaround may be to slow down the driver, but I would prefer a clean, proper fix.
- I saw that when the error occurs on the SD card, a DMA read is started but we never get the DMA-complete interrupt. This results in a bus reset. If we are lucky, we get a few errors and a bus reset, and it starts working again. But it can also fail and end in a (a)synchronous external abort (meaning we are dereferencing an invalid address outside the CPU, on the SPI bus).
- I saw that kernel.org fixed bugs in the Rockchip PCIe driver that led to external aborts, but they don't look exactly like my oops (and we also get the oops on eMMC/SD card without PCIe). I tried the latest kernel, 5.5-rc2, anyway and I still have the same issue. On the other hand, there may still be undiscovered bugs.
- I reported a bug in Manjaro where some patches weren't applied. We will get the fix in the next Manjaro release, but it still doesn't solve our issue.
- Same issue with Debian Buster, and with Manjaro using the Arch Linux kernel.
- I am also afraid that this may turn out to be a hardware bug. I don't know yet. If it is, maybe we can find an acceptable software workaround?
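To check the "slower I/O takes longer to crash" observation from userspace, without touching the driver, one option is to read the device in small chunks with a pause between them. This is only a sketch: `throttled_read` is a hypothetical helper, it assumes GNU coreutils (`dd` with `status=none` and the `K` size suffix, `sleep` with fractional seconds), and throttling from userspace is not the same thing as slowing down the mmc driver itself.

```sh
#!/bin/sh
# Hypothetical helper: read a device (or any file) one chunk at a time,
# pausing between chunks, to see whether slower I/O delays the failure.
# Arguments: path, chunk size in KiB, number of chunks, delay in seconds.
throttled_read() {
    dev=$1 kib=$2 chunks=$3 delay=$4
    i=0
    while [ "$i" -lt "$chunks" ]; do
        # Read chunk number $i and discard it; stop on the first read error.
        dd if="$dev" of=/dev/null bs="${kib}K" count=1 skip="$i" status=none || return 1
        sleep "$delay"
        i=$(( i + 1 ))
    done
}
```

For example, as root: `throttled_read /dev/mmcblk0 64 100000 0.01` (64 KiB per read, 10 ms between reads). Comparing the time-to-failure for a few delay values against the plain hexdump loop would show whether the slowdown effect also holds without extra driver logging.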

Here are some logs:


Let's start with a normal, successful transfer:
[ 1028.378665] dwmmc_rockchip fe320000.dwmmc: start command: ARGR=0x00000100 CMDR=0x20000157

[ 1028.379392] dwmmc_rockchip fe320000.dwmmc: sd sg_cpu: 0xffff800011ee5000 sg_dma: 0xebf6b000 sg_len: 32
[ 1028.380238] dwmmc_rockchip fe320000.dwmmc: start command: ARGR=0x00e67708 CMDR=0x20002352
[ 1028.389514] dwmmc_rockchip fe320000.dwmmc: DMA complete
[ 1028.389993] dwmmc_rockchip fe320000.dwmmc: list empty
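To read these lines, it helps to decode CMDR. The sketch below assumes the bit layout of the DesignWare MMC command register as used by the kernel's dw_mmc driver (the `SDMMC_CMD_*` definitions in drivers/mmc/host/dw_mmc.h); `decode_cmdr` is a hypothetical helper and only decodes the flags that matter here.

```sh
#!/bin/sh
# Hypothetical helper: decode a dw_mmc CMDR value from the debug log.
# Bit positions assumed from the kernel's SDMMC_CMD_* definitions.
decode_cmdr() {
    cmdr=$(( $1 ))
    out="CMD$(( cmdr & 0x3f ))"    # bits 5:0 hold the command index
    for pair in 6:RESP_EXP 7:RESP_LONG 8:RESP_CRC 9:DAT_EXP 10:DAT_WR \
                12:SEND_STOP 13:PRV_DAT_WAIT 14:STOP 29:USE_HOLD_REG 31:START; do
        bit=${pair%%:*}
        name=${pair#*:}
        # Append the flag name when its bit is set.
        if [ $(( (cmdr >> bit) & 1 )) -eq 1 ]; then
            out="$out $name"
        fi
    done
    echo "$out"
}
```

Decoded this way, the successful transfer is CMD23 (SET_BLOCK_COUNT, ARGR=0x100, i.e. 256 blocks) followed by CMD18 (READ_MULTIPLE_BLOCK: DAT_EXP set and DAT_WR clear, so a read) with start address 0x00e67708, and the `CMDR=0x2000414c` in the failing sequence below decodes to CMD12 (STOP_TRANSMISSION).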

And now a bad one, the first failure I get: we initiate the transfer and never get the DMA-completion interrupt.
So we get a CTO (command timeout) with an unexpected state of "data busy" (state 3), and we power off.
[ 1028.395158] dwmmc_rockchip fe320000.dwmmc: start command: ARGR=0x00000100 CMDR=0x20000157
[ 1028.395886] dwmmc_rockchip fe320000.dwmmc: sd sg_cpu: 0xffff800011ee5000 sg_dma: 0xebf6b000 sg_len: 32
[ 1028.396739] dwmmc_rockchip fe320000.dwmmc: start command: ARGR=0x00e67808 CMDR=0x20002352
[ 1028.737229] dwmmc_rockchip fe320000.dwmmc: start command: ARGR=0x00000000 CMDR=0x2000414c
[ 1028.772081] dwmmc_rockchip fe320000.dwmmc: Unexpected command timeout, state 3
[ 1029.092087] dwmmc_rockchip fe320000.dwmmc: data error, status 0x00000200
[ 1029.092690] dwmmc_rockchip fe320000.dwmmc: list empty
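The two numbers in the failure lines can be mapped the same way. This sketch assumes the `enum dw_mci_state` order and the `SDMMC_INT_*` interrupt-status bits from the kernel's dw_mmc driver around 5.5 (worth checking against your own kernel source); both helpers are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: name the dw_mmc state printed in
# "Unexpected command timeout, state N" (assumed enum dw_mci_state order).
dwmmc_state_name() {
    case $1 in
        0) echo IDLE ;;
        1) echo SENDING_CMD ;;
        2) echo SENDING_DATA ;;
        3) echo DATA_BUSY ;;
        4) echo SENDING_STOP ;;
        5) echo DATA_ERROR ;;
        6) echo SENDING_CMD11 ;;
        7) echo WAITING_CMD11_DONE ;;
        *) echo UNKNOWN ;;
    esac
}

# Hypothetical helper: list the interrupt bits set in "data error, status X",
# from bit 0 (CD) up to bit 15 (EBE), assuming the SDMMC_INT_* layout.
decode_rintsts() {
    status=$(( $1 ))
    out=""
    bit=0
    for name in CD RESP_ERR CMD_DONE DATA_OVER TXDR RXDR RCRC DCRC \
                RTO DRTO HTO FRUN HLE SBE ACD EBE; do
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$name"
        fi
        bit=$(( bit + 1 ))
    done
    echo "$out"
}
```

With that mapping, state 3 is DATA_BUSY (matching the "data busy" reading above) and status 0x00000200 is the DRTO (data read timeout) bit, consistent with a read whose data never arrived.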

When you get this error, the oops follows within 0 to 10 seconds.

[ 1030.613586] SError Interrupt on CPU3, code 0xbf000000 -- SError
[ 1030.613589] CPU: 3 PID: 475 Comm: systemd-journal Tainted: G L 5.5.0-rc2-1-ARCH #7
[ 1030.613591] Hardware name: Pine64 RockPro64 (DT)
[ 1030.613592] pstate: 40000005 (nZcv daif -PAN -UAO)
[ 1030.613594] pc : allocate_slab+0x210/0x460
[ 1030.613595] lr : allocate_slab+0x1f8/0x460
[ 1030.613596] sp : ffff800011f139f0
[ 1030.613597] x29: ffff800011f139f0 x28: 0000000000000005
[ 1030.613601] x27: ffff000009c5c800 x26: 0000000000000010
[ 1030.613603] x25: 0000000000001000 x24: 0000000000000400
[ 1030.613606] x23: ffff000009c5c500 x22: ffff000009c5c000
[ 1030.613609] x21: fffffe0000071700 x20: 0000000000000001
[ 1030.613612] x19: ffff0000ea079c00 x18: 0000000000000000
[ 1030.613615] x17: 0000000000000000 x16: 0000000000000000
[ 1030.613617] x15: 0000000000000000 x14: 0000000000000000
[ 1030.613620] x13: 0000000000000000 x12: 0000000000000000
[ 1030.613623] x11: 0000000000000000 x10: 0000000000000000
[ 1030.613626] x9 : ffff8000102e8788 x8 : 00000000f7e00000
[ 1030.613629] x7 : ffff8000e5f2e000 x6 : ffff8000e5f2e000
[ 1030.613631] x5 : 000000000000507b x4 : 0000000000000000
[ 1030.613634] x3 : 0000000044042000 x2 : 0000000080010400
[ 1030.613637] x1 : 0000000000000000 x0 : 0000000000000010
[ 1030.613640] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 1030.613643] CPU: 3 PID: 475 Comm: systemd-journal Tainted: G L 5.5.0-rc2-1-ARCH #7
[ 1030.613644] Hardware name: Pine64 RockPro64 (DT)
[ 1030.613645] Call trace:
[ 1030.613646] dump_backtrace+0x0/0x1b0
[ 1030.613647] show_stack+0x1c/0x28
[ 1030.613648] dump_stack+0xac/0xd4
[ 1030.613650] panic+0x154/0x32c
[ 1030.613651] __stack_chk_fail+0x0/0x20
[ 1030.613652] arm64_serror_panic+0x84/0x90
[ 1030.613653] do_serror+0x88/0x140
[ 1030.613654] el1_error+0x84/0x100
[ 1030.613656] allocate_slab+0x210/0x460
[ 1030.613657] new_slab+0x5c/0xb8
[ 1030.613658] ___slab_alloc.constprop.0+0x308/0x500
[ 1030.613660] __slab_alloc.constprop.0+0x24/0x40
[ 1030.613661] kmem_cache_alloc+0x310/0x320
[ 1030.613662] __alloc_file+0x30/0xf8
[ 1030.613663] alloc_empty_file+0x64/0x108
[ 1030.613665] path_openat+0x4c/0x258
[ 1030.613666] do_filp_open+0x7c/0x100
[ 1030.613667] do_sys_open+0x170/0x220
[ 1030.613669] __arm64_sys_openat+0x28/0x30
[ 1030.613670] el0_svc_handler+0x84/0x190
[ 1030.613671] el0_sync_handler+0x138/0x258
[ 1030.613672] el0_sync+0x140/0x180
[ 1030.614142] SMP: stopping secondary CPUs
[ 1030.614144] Kernel Offset: disabled
[ 1030.614145] CPU features: 0x10002,20006008
[ 1030.614146] Memory Limit: none

To reproduce the bug with eMMC (and no PCIe):
- Connect the serial console and increase the log level (dmesg -n 8), so that you will see the oops.
- In an SSH session, run this command and wait: while true; do sudo hexdump /dev/mmcblk0; done
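To spot the failure signature without watching the console, a saved kernel log can be scanned for it. A minimal sketch, assuming the default dmesg timestamp format shown above; `scan_mmc_failures` is a hypothetical helper, and the "data error" lines only appear when the driver's debug messages are enabled.

```sh
#!/bin/sh
# Hypothetical helper: read a kernel log on stdin, print the timestamp of the
# first dw_mmc "Unexpected command timeout" and count the "data error" lines.
scan_mmc_failures() {
    awk '
        /dwmmc.*Unexpected command timeout/ && first == "" {
            # "[ 1028.772081] ..." -> keep only the number between the brackets
            match($0, /\[ *[0-9]+\.[0-9]+\]/)
            first = substr($0, RSTART + 1, RLENGTH - 2)
            gsub(/ /, "", first)
        }
        /dwmmc.*data error/ { errors++ }
        END {
            if (first == "") print "no timeout"
            else printf "first timeout at %s s, %d data error(s)\n", first, errors
        }
    '
}
```

For example, `sudo dmesg | scan_mmc_failures` on a running system, or `scan_mmc_failures < serial-console.log` on a log captured from the serial console, since dmesg is lost after the panic.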


Posted by abdel, 12-19-2019, 08:03 AM
