12-09-2020, 01:41 AM
(09-30-2020, 02:18 PM) simonsouth wrote:
After installing an NVMe SSD in my Pinebook Pro, I began to see Linux crash periodically with output like the following:
Code:
[ 7.153982] SError Interrupt on CPU2, code 0xbf000002 -- SError
[ 7.153986] CPU: 2 PID: 169 Comm: udevd Not tainted 5.8.1-gnu #1
[ 7.153988] Hardware name: PINE64 Pinebook Pro (DT)
[ 7.153989] pstate: 20000005 (nzCv daif -PAN -UAO BTYPE=--)
[ 7.153991] pc : nvme_submit_cmd+0x11c/0x130
[ 7.153992] lr : nvme_queue_rq+0x43c/0x6b8
[ 7.153993] sp : ffff80001409b6f0
[ 7.153995] x29: ffff80001409b6f0 x28: ffff0000f4716000
[ 7.153998] x27: 0000000000000000 x26: 0000000000001000
[ 7.154002] x25: 0000000000000001 x24: 0000000000001000
[ 7.154004] x23: ffff0000eff62000 x22: 0000000000000000
[ 7.154007] x21: 0000000000000001 x20: ffff0000f4536a40
[ 7.154010] x19: ffff800010d1a000 x18: 0000000000000000
[ 7.154014] x17: 0000000000000000 x16: 0000000000000000
[ 7.154016] x15: 0000000000000000 x14: 0000000000000000
[ 7.154019] x13: 0000000000000000 x12: ffff800010226c88
[ 7.154022] x11: 0000000000000000 x10: 0000000000000000
[ 7.154025] x9 : 0000000000000000 x8 : ffffffffffffffff
[ 7.154028] x7 : 00000000e929d000 x6 : 00000000e929d000
[ 7.154031] x5 : 0000000007ef7ac9 x4 : 0000000000000006
[ 7.154034] x3 : 0000000000000000 x2 : 0000000780000007
[ 7.154037] x1 : ffff0000f4536a48 x0 : 0000000000000000
[ 7.154040] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 7.154042] CPU: 2 PID: 169 Comm: udevd Not tainted 5.8.1-gnu #1
[ 7.154044] Hardware name: PINE64 Pinebook Pro (DT)
[ 7.154044] Call trace:
[ 7.154046] dump_backtrace+0x0/0x1d8
[ 7.154047] show_stack+0x14/0x20
[ 7.154048] dump_stack+0xbc/0xf8
[ 7.154049] panic+0x150/0x348
[ 7.154050] add_taint+0x0/0xa8
[ 7.154051] arm64_serror_panic+0x74/0x80
[ 7.154053] do_serror+0x6c/0x168
[ 7.154054] el1_error+0x84/0x100
[ 7.154055] nvme_submit_cmd+0x11c/0x130
[ 7.154056] nvme_queue_rq+0x43c/0x6b8
[ 7.154058] __blk_mq_try_issue_directly+0x104/0x230
[ 7.154059] blk_mq_request_issue_directly+0x50/0x100
[ 7.154061] blk_mq_try_issue_list_directly+0x58/0xe8
[ 7.154062] blk_mq_sched_insert_requests+0xe0/0x150
[ 7.154064] blk_mq_flush_plug_list+0x11c/0x188
[ 7.154065] blk_flush_plug_list+0xd8/0x108
[ 7.154066] blk_finish_plug+0x30/0xa0
[ 7.154067] read_pages+0x154/0x290
[ 7.154069] page_cache_readahead_unbounded+0x160/0x220
[ 7.154070] __do_page_cache_readahead+0x34/0x48
[ 7.154072] force_page_cache_readahead+0xb4/0x108
[ 7.154073] page_cache_sync_readahead+0xe4/0xf0
[ 7.154074] generic_file_buffered_read+0x5d8/0xa28
[ 7.154076] generic_file_read_iter+0xd0/0x180
[ 7.154077] blkdev_read_iter+0x38/0x48
[ 7.154079] new_sync_read+0xec/0x188
[ 7.154080] vfs_read+0x1bc/0x1d0
[ 7.154081] ksys_read+0x68/0xf8
[ 7.154082] __arm64_sys_read+0x14/0x20
[ 7.154083] do_el0_svc+0x68/0xd0
[ 7.154084] el0_sync_handler+0x16c/0x2a0
[ 7.154086] el0_sync+0x140/0x180
[ 7.154112] SMP: stopping secondary CPUs
[ 7.154113] Kernel Offset: disabled
[ 7.154114] CPU features: 0x200022,01006008
[ 7.154116] Memory Limit: none
The crashes became more and more frequent until eventually the system would fail to boot most of the time. The exact backtrace varied, but it always referenced the NVMe driver and reported an asynchronous SError ("system error") interrupt, pointing to an issue with the hardware itself.
After some research, I've found the solution is to remove this line from the Pinebook Pro device tree:
Code:
max-link-speed = <2>;
Since building a new kernel with this change, I've yet to see a single crash from the NVMe driver, and the system appears completely stable.
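If you want to confirm that the rebuilt kernel is really booting with the new device tree, you can read the property back from the live tree. The sketch below assumes the RK3399's PCIe controller shows up as `/proc/device-tree/pcie@f8000000` (that node path is an assumption, so check `ls /proc/device-tree` on your own machine). It relies on the fact that device-tree properties are exposed there as raw big-endian 32-bit cells. Under the generic PCI device-tree binding, 1 means gen 1 and 2 means gen 2. With the board-level line removed, it should no longer report 2.

```python
import struct
from pathlib import Path
from typing import Optional

# Assumed location of the RK3399 PCIe controller node; verify on your system.
DEFAULT_PROP = Path("/proc/device-tree/pcie@f8000000/max-link-speed")

# PCIe generation -> raw transfer rate per lane.
GEN_TO_RATE = {1: "2.5 GT/s", 2: "5.0 GT/s", 3: "8.0 GT/s", 4: "16.0 GT/s"}


def read_max_link_speed(prop: Path = DEFAULT_PROP) -> Optional[int]:
    """Return the max-link-speed cell, or None if the property can't be read."""
    try:
        raw = prop.read_bytes()
    except OSError:
        return None  # property (or node) absent from the live device tree
    # Device-tree cells are stored as big-endian unsigned 32-bit integers.
    (gen,) = struct.unpack(">I", raw[:4])
    return gen


if __name__ == "__main__":
    gen = read_max_link_speed()
    if gen is None:
        print("max-link-speed not found in the live device tree")
    else:
        print(f"max-link-speed = <{gen}> ({GEN_TO_RATE.get(gen, 'unknown')})")
```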
The line raises the link's maximum speed to PCIe "gen 2" (5.0 GT/s). Removing it stops the Linux PCIe driver from trying to run the link faster than 2.5 GT/s ("gen 1"), which is both the default for RK3399-based devices and the maximum rate Rockchip itself says the SoC supports. It seems the RK3399 was originally designed to run its PCIe bus at the faster gen 2 speed. Since the SoC's release, however, the company has downgraded its specifications, presumably because manufacturing variance left many parts unstable at that speed, as my Pinebook Pro demonstrates.
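The cost of the fix can be put in numbers. PCIe gen 1 and gen 2 both use 8b/10b line encoding, so every 10 bits on the wire carry 8 bits of data. That works out to 250 MB/s per lane at 2.5 GT/s and 500 MB/s per lane at 5.0 GT/s. The sketch below does that arithmetic. The four-lane default is an assumption about the Pinebook Pro's NVMe adapter, and the results are theoretical ceilings before packet overhead, not measured throughput.

```python
# Line rate per lane in megatransfers per second, by PCIe generation.
# Only gen 1 and gen 2 are listed because both use 8b/10b encoding.
LINE_RATE_MTS = {1: 2500, 2: 5000}


def link_bandwidth_mb_s(gen: int, lanes: int = 4) -> int:
    """Theoretical one-direction payload bandwidth in MB/s (10^6 bytes/s).

    With 8b/10b encoding, 10 line bits carry one 8-bit byte, so each lane
    delivers line_rate / 10 bytes per second. lanes=4 is an assumed width.
    """
    if gen not in LINE_RATE_MTS:
        raise ValueError(f"this sketch only covers gen 1 and 2, not gen {gen}")
    return LINE_RATE_MTS[gen] // 10 * lanes


if __name__ == "__main__":
    for gen in (1, 2):
        print(f"gen {gen} x4: {link_bandwidth_mb_s(gen)} MB/s")
```

On these assumptions, dropping from gen 2 to gen 1 halves the theoretical ceiling from 2000 MB/s to 1000 MB/s. Whether that is noticeable in practice depends on the drive and the workload.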
I suspect this may be the cause of many of the NVMe-related issues other forum members are experiencing, particularly when failures are intermittent or the drive is known to work in other machines.
In fact, between this and the 2.0 GHz CPU frequency (also unsupported by Rockchip) enabled in the kernels most people are using, I find it remarkable that most Pinebook Pros have been running out of spec by default. I have to think this has something to do with the uneven experiences people report with the machine, as well as the general lack of reliability you sense when skimming the posts in this forum.
In any case, if your Pinebook Pro seems to be having trouble using an NVMe drive, try bringing it back within the manufacturer's specifications by removing the line above from the device tree (and reverting the 2.0 GHz patch, if you've been using it) and building a new kernel. You may find the problems you've been experiencing disappear completely.
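After booting the rebuilt kernel, you may want to check what the system is actually running at. Linux exposes `current_link_speed` and `max_link_speed` attributes for PCIe devices under `/sys/bus/pci/devices/`, and each cpufreq policy's `cpuinfo_max_freq` (in kHz) under `/sys/devices/system/cpu/cpufreq/`. The hypothetical helper below just prints them. It isn't part of any Pinebook Pro tooling, and the exact link-speed string (for example `2.5 GT/s` versus `2.5 GT/s PCIe`) may vary between kernel versions.

```python
from pathlib import Path
from typing import Dict, List

# Standard sysfs locations on Linux; both are parameters so they can be redirected.
PCI_ROOT = Path("/sys/bus/pci/devices")
CPUFREQ_ROOT = Path("/sys/devices/system/cpu/cpufreq")


def pcie_link_speeds(root: Path = PCI_ROOT) -> Dict[str, Dict[str, str]]:
    """Map each PCI device address to its link speed attributes, if any."""
    result: Dict[str, Dict[str, str]] = {}
    if not root.is_dir():
        return result
    for dev in sorted(root.iterdir()):
        speeds: Dict[str, str] = {}
        for attr in ("current_link_speed", "max_link_speed"):
            try:
                speeds[attr] = (dev / attr).read_text().strip()
            except OSError:
                continue  # attribute absent or unreadable for this device
        if speeds:
            result[dev.name] = speeds
    return result


def cpu_max_freqs_khz(root: Path = CPUFREQ_ROOT) -> List[int]:
    """Return cpuinfo_max_freq (reported by the kernel in kHz) per cpufreq policy."""
    freqs: List[int] = []
    if not root.is_dir():
        return freqs
    for policy in sorted(root.glob("policy*")):
        try:
            freqs.append(int((policy / "cpuinfo_max_freq").read_text()))
        except (OSError, ValueError):
            continue  # policy without a readable maximum frequency
    return freqs


if __name__ == "__main__":
    for addr, speeds in pcie_link_speeds().items():
        print(addr, speeds)
    for khz in cpu_max_freqs_khz():
        # 2,000,000 kHz is the 2.0 GHz setting discussed above.
        note = "  (2.0 GHz or higher)" if khz >= 2_000_000 else ""
        print(f"cpufreq max: {khz / 1_000_000:.2f} GHz{note}")
```

With the line removed, you would expect the NVMe drive and its root port to report a current link speed of 2.5 GT/s.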
Hey guys,
I recently put an NVMe SSD (Intel 660p M.2 1TB) in my PBP, and I'm experiencing failures after copying larger amounts of data. The device just disappears from the device list, and it takes a few reboots before it resurfaces.
I've tried copying at lower speeds, but that doesn't seem to do the trick either.
I stumbled on this post and thought I could give it a try, but I'm not really sure how to go about it, not being that proficient in rebuilding kernels and the like. Can someone give me some pointers on how to go about this?
I'm currently running Manjaro ARM 20.10 with kernel 5.9.9-2.
Does this solution mean you have to tinker every time the kernel updates?
Thanks a lot