NVMe-related crashes and instability, plus a solution
#9
(09-30-2020, 02:18 PM)simonsouth Wrote: After installing an NVMe SSD in my Pinebook Pro I began to see Linux crashing periodically with output like the following:

Code:
[    7.153982] SError Interrupt on CPU2, code 0xbf000002 -- SError
[    7.153986] CPU: 2 PID: 169 Comm: udevd Not tainted 5.8.1-gnu #1
[    7.153988] Hardware name: PINE64 Pinebook Pro (DT)
[    7.153989] pstate: 20000005 (nzCv daif -PAN -UAO BTYPE=--)
[    7.153991] pc : nvme_submit_cmd+0x11c/0x130
[    7.153992] lr : nvme_queue_rq+0x43c/0x6b8
[    7.153993] sp : ffff80001409b6f0
[    7.153995] x29: ffff80001409b6f0 x28: ffff0000f4716000
[    7.153998] x27: 0000000000000000 x26: 0000000000001000
[    7.154002] x25: 0000000000000001 x24: 0000000000001000
[    7.154004] x23: ffff0000eff62000 x22: 0000000000000000
[    7.154007] x21: 0000000000000001 x20: ffff0000f4536a40
[    7.154010] x19: ffff800010d1a000 x18: 0000000000000000
[    7.154014] x17: 0000000000000000 x16: 0000000000000000
[    7.154016] x15: 0000000000000000 x14: 0000000000000000
[    7.154019] x13: 0000000000000000 x12: ffff800010226c88
[    7.154022] x11: 0000000000000000 x10: 0000000000000000
[    7.154025] x9 : 0000000000000000 x8 : ffffffffffffffff
[    7.154028] x7 : 00000000e929d000 x6 : 00000000e929d000
[    7.154031] x5 : 0000000007ef7ac9 x4 : 0000000000000006
[    7.154034] x3 : 0000000000000000 x2 : 0000000780000007
[    7.154037] x1 : ffff0000f4536a48 x0 : 0000000000000000
[    7.154040] Kernel panic - not syncing: Asynchronous SError Interrupt
[    7.154042] CPU: 2 PID: 169 Comm: udevd Not tainted 5.8.1-gnu #1
[    7.154044] Hardware name: PINE64 Pinebook Pro (DT)
[    7.154044] Call trace:
[    7.154046]  dump_backtrace+0x0/0x1d8
[    7.154047]  show_stack+0x14/0x20
[    7.154048]  dump_stack+0xbc/0xf8
[    7.154049]  panic+0x150/0x348
[    7.154050]  add_taint+0x0/0xa8
[    7.154051]  arm64_serror_panic+0x74/0x80
[    7.154053]  do_serror+0x6c/0x168
[    7.154054]  el1_error+0x84/0x100
[    7.154055]  nvme_submit_cmd+0x11c/0x130
[    7.154056]  nvme_queue_rq+0x43c/0x6b8
[    7.154058]  __blk_mq_try_issue_directly+0x104/0x230
[    7.154059]  blk_mq_request_issue_directly+0x50/0x100
[    7.154061]  blk_mq_try_issue_list_directly+0x58/0xe8
[    7.154062]  blk_mq_sched_insert_requests+0xe0/0x150
[    7.154064]  blk_mq_flush_plug_list+0x11c/0x188
[    7.154065]  blk_flush_plug_list+0xd8/0x108
[    7.154066]  blk_finish_plug+0x30/0xa0
[    7.154067]  read_pages+0x154/0x290
[    7.154069]  page_cache_readahead_unbounded+0x160/0x220
[    7.154070]  __do_page_cache_readahead+0x34/0x48
[    7.154072]  force_page_cache_readahead+0xb4/0x108
[    7.154073]  page_cache_sync_readahead+0xe4/0xf0
[    7.154074]  generic_file_buffered_read+0x5d8/0xa28
[    7.154076]  generic_file_read_iter+0xd0/0x180
[    7.154077]  blkdev_read_iter+0x38/0x48
[    7.154079]  new_sync_read+0xec/0x188
[    7.154080]  vfs_read+0x1bc/0x1d0
[    7.154081]  ksys_read+0x68/0xf8
[    7.154082]  __arm64_sys_read+0x14/0x20
[    7.154083]  do_el0_svc+0x68/0xd0
[    7.154084]  el0_sync_handler+0x16c/0x2a0
[    7.154086]  el0_sync+0x140/0x180
[    7.154112] SMP: stopping secondary CPUs
[    7.154113] Kernel Offset: disabled
[    7.154114] CPU features: 0x200022,01006008
[    7.154116] Memory Limit: none

The crashes became more and more frequent until eventually the system would fail to boot most times. The exact backtrace varied, but it always referenced the NVMe driver and indicated an "asynchronous system error", pointing to an issue with the hardware itself.
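For anyone who wants to check whether their own crash logs match this pattern, here is a minimal Python sketch. It assumes the log is in the `dmesg` format shown above; the function name `nvme_serror_reports` is made up for illustration. It only reports which NVMe driver symbols appear in a log that also contains an SError report, and does not prove a hardware fault.

```python
import re

# Matches the timestamp prefix on each kernel log line, e.g. "[    7.153982] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")

# Matches a symbol+offset frame from the NVMe driver, e.g. "nvme_submit_cmd+0x11c/0x130".
NVME_FRAME = re.compile(r"\b(nvme_\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def nvme_serror_reports(log_text):
    """Return NVMe driver symbols found in a log that contains an SError report.

    Returns an empty list when the log has no "SError Interrupt" line.
    """
    lines = [TIMESTAMP.sub("", line).strip() for line in log_text.splitlines()]
    if not any("SError Interrupt" in line for line in lines):
        return []
    symbols = []
    for line in lines:
        match = NVME_FRAME.search(line)
        if match and match.group(1) not in symbols:
            symbols.append(match.group(1))
    return symbols


# A few lines taken from the trace above.
sample = """\
[    7.153982] SError Interrupt on CPU2, code 0xbf000002 -- SError
[    7.153991] pc : nvme_submit_cmd+0x11c/0x130
[    7.153992] lr : nvme_queue_rq+0x43c/0x6b8
[    7.154040] Kernel panic - not syncing: Asynchronous SError Interrupt
"""
print(nvme_serror_reports(sample))
```

Feeding it the saved output of `dmesg` (or a serial console capture) gives a quick way to compare several crashes whose backtraces differ.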

After some research, I've found the solution is to remove this line from the Pinebook Pro device tree:

Code:
max-link-speed = <2>;

Since building a new kernel with this change I've yet to see a single crash from the NVMe driver and the system appears completely stable.
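The edit itself is a one-line deletion in the device tree source. As a rough sketch, assuming you have the board's `.dts` source as text, the Python below strips that property. The function name `drop_gen2_link_speed` and the example fragment are illustrative only and are not copied from the kernel tree.

```python
import re

# Matches a whole line of the form "max-link-speed = <2>;" with any indentation.
MAX_LINK_SPEED_GEN2 = re.compile(
    r"^[ \t]*max-link-speed\s*=\s*<2>;[ \t]*\n?", re.MULTILINE
)


def drop_gen2_link_speed(dts_source):
    """Return the DTS text with any 'max-link-speed = <2>;' line removed."""
    return MAX_LINK_SPEED_GEN2.sub("", dts_source)


# Illustrative fragment only; the real node contains more properties.
example = """\
&pcie0 {
\tmax-link-speed = <2>;
\tnum-lanes = <4>;
\tstatus = "okay";
};
"""
print(drop_gen2_link_speed(example))
```

Other properties, and a `max-link-speed = <1>;` line if one is present, are left untouched. The modified source still has to be compiled and installed with the kernel as described in this post.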

What this change does is stop the Linux PCIe driver from trying to operate the PCIe link at rates above 2.5 GT/s, which is both the default for RK3399-based devices and the maximum rate Rockchip themselves claim the SoC will support. It seems the RK3399 was originally designed to operate its PCIe bus at the higher, "gen 2" speed (5 GT/s), but since the SoC's release the company has downgraded its specifications. I assume variances in manufacturing left many parts unstable at that speed, as my Pinebook Pro demonstrates.
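One way to confirm which rate the link actually trained at is the `current_link_speed` attribute the kernel exposes in sysfs for PCI devices. The sketch below assumes the attribute reads something like `2.5 GT/s PCIe` (the exact wording varies between kernel versions); the helper names are hypothetical.

```python
from pathlib import Path


def parse_link_speed(text):
    """Return the rate in GT/s from a sysfs string such as '2.5 GT/s PCIe'."""
    return float(text.split()[0])


def exceeds_gen1(text, limit=2.5):
    """True if the reported rate is above the 2.5 GT/s (gen 1) limit."""
    return parse_link_speed(text) > limit


def report_links(sysfs_root="/sys/bus/pci/devices"):
    """Print the current link speed of every PCI device that reports one."""
    for attr in sorted(Path(sysfs_root).glob("*/current_link_speed")):
        try:
            speed = attr.read_text().strip()
            flag = "above 2.5 GT/s" if exceeds_gen1(speed) else "within 2.5 GT/s"
        except (OSError, ValueError, IndexError):
            # Unreadable attribute or a value such as "Unknown": skip it.
            continue
        print(f"{attr.parent.name}: {speed} ({flag})")


if __name__ == "__main__":
    report_links()
```

On a kernel built with the change, the NVMe device and the root port would be expected to show 2.5 GT/s; `lspci -vv` reports the same information in its `LnkSta` line.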

I suspect this may be the cause of many of the NVMe-related issues other forum members are experiencing, particularly when failures are intermittent or the drive is known to work in other machines.

In fact, between this and the 2.0 GHz CPU frequency (also unsupported by Rockchip) that is enabled in the kernels most people are using, most Pinebook Pros have been running out-of-spec by default. I find that remarkable, and I have to think it has something to do with the uneven experiences people are reporting with the machine, as well as the general lack of reliability you sense skimming the posts in this forum.

In any case, if your Pinebook Pro seems to be having trouble using an NVMe drive, try bringing it back within the manufacturer's specifications by removing the line above from the device tree (and reverting the 2.0 GHz patch, if you've been using it) and building a new kernel. You may find the problems you've been experiencing disappear completely.

Hey guys,
I recently put an NVMe SSD in my PBP (Intel 660p M.2 1TB) and am experiencing failures after copying large amounts of data. The device just disappears from the device lists, and it takes a few reboots before it resurfaces.
I've tried copying at lower speeds, but that doesn't seem to do the trick either.
I stumbled on this post and thought I could give it a try, but I'm not really sure how to go about it, not being that proficient in rebuilding kernels and the like. Can someone maybe give some pointers on how to go about this?
I'm currently running Manjaro ARM 20.10 with kernel 5.9.9-2.
Does this solution mean you have to tinker every time the kernel updates?

Thanks a lot
Messages In This Thread
RE: NVMe-related crashes and instability, plus a solution - by nostro - 12-09-2020, 01:41 AM
