NVMe drive disappears after about an hour of uptime
#1
About a month ago I installed an Intel 660p 2280 2TB NVMe drive using the adapter available on the Pine64 store. It worked great, and I even moved my home directory to it. Two days ago the drive disappeared, which caused Manjaro to kick me back to a login screen that would not let me log in. Rebooting put me into a command prompt saying my home folder wasn't available. I thought it might be the ribbon cable, so I re-seated it and rebooted, and the drive was back. But an hour later it disappeared again. This cycle repeated many more times; each time I assumed the ribbon cable and connectors were the problem and re-seated the cable. I have a second NVMe adapter, so I swapped it in, and I get the same behavior. I also happened to have received another 660p for another project, so I swapped that in to rule out the drive. The drive is fine: the same behavior occurs with the new one.

The NVMe drive ran fine for about a month, but now I can't go beyond about an hour without it disappearing from the system. The issue may be related to heat: if I reboot immediately after the drive disappears, the drive is still missing, but if I wait a while, it appears normally.

Any help or insights are greatly appreciated. Thanks.
#2
I've had this issue happen to me a few times, though luckily only between boots. It turned out my issue was not the seating of the ribbon cable but some standoffs inside the case pinching the cable. When reassembling, check that the cable isn't getting caught on one of the nearby screw posts.
#3
After more testing, I am fairly convinced that this is a heat issue. If I cold-boot and immediately run the command

dd if=/dev/zero of=/dev/nvme0n1p1 bs=1M status=progress conv=fdatasync

then it will get a good 10 minutes in before freezing. The left palm rest (where the NVMe drive is located) is very warm to hot at this point. The freeze is only temporary: progress eventually resumes, but only for a little while. Eventually it gets to the point where it writes only a few blocks every 30 seconds or so.
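To check whether the stalls line up with temperature, one option is to log the drive's temperature and how much data reached it over each interval while dd runs in another terminal. This is a minimal sketch, assuming a kernel that exposes an NVMe hwmon sensor named "nvme" (CONFIG_NVME_HWMON, Linux 5.5 or later) and a namespace called nvme0n1; the function names are made up for this example. It reads the 7th field of /sys/block/<dev>/stat, which counts 512-byte sectors written.

```shell
#!/usr/bin/env bash
# Hypothetical helper: log NVMe temperature and write throughput every few
# seconds, to see whether dd stalls line up with rising temperature.
# Assumptions: an hwmon sensor named "nvme" exists (CONFIG_NVME_HWMON) and
# the block device name is passed as the first argument (e.g. nvme0n1).

# Convert a hwmon reading in millidegrees Celsius to whole degrees Celsius.
mdeg_to_c() {
  echo $(( $1 / 1000 ))
}

# Convert a count of 512-byte sectors to whole MiB.
sectors_to_mib() {
  echo $(( $1 * 512 / 1024 / 1024 ))
}

# Print the path of the first hwmon temp1_input whose driver name is "nvme".
find_nvme_temp() {
  local d
  for d in /sys/class/hwmon/hwmon*; do
    if [ "$(cat "$d/name" 2>/dev/null)" = "nvme" ] && [ -r "$d/temp1_input" ]; then
      echo "$d/temp1_input"
      return 0
    fi
  done
  return 1
}

# Field 7 of /sys/block/<dev>/stat is the number of sectors written so far.
sectors_written() {
  awk '{ print $7 }' "/sys/block/$1/stat"
}

# Print one line per interval: time, temperature, MiB written since last line.
log_nvme() {
  local dev=${1:-nvme0n1} interval=${2:-10} temp_file prev cur
  temp_file=$(find_nvme_temp) || { echo "no nvme hwmon sensor found" >&2; return 1; }
  [ -r "/sys/block/$dev/stat" ] || { echo "no /sys/block/$dev" >&2; return 1; }
  prev=$(sectors_written "$dev")
  while sleep "$interval"; do
    cur=$(sectors_written "$dev")
    printf '%s  %s C  %s MiB written\n' "$(date +%T)" \
      "$(mdeg_to_c "$(cat "$temp_file")")" "$(sectors_to_mib $(( cur - prev )))"
    prev=$cur
  done
}

# Only start logging when executed directly with arguments.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
  log_nvme "$@"
fi
```

Run it as "bash log-nvme.sh nvme0n1 10" (any file name works) and stop it with Ctrl-C. If the MiB column drops to near zero while the temperature is high, that would point toward heat; stalls at a normal temperature would suggest something else.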

Unless someone has another suggestion, I plan to return both drives and get a different make and model.
#4
(09-13-2020, 07:45 PM)codebreaker Wrote: After more testing, I am fairly convinced that this is a heat issue.

Thanks for sharing this insight! I got my PBP 4 days ago. The first thing I did was install an Intel SSD 660p Series 512GB NVMe drive. I have been getting it configured and it's been going well. Today I was compiling PyPy from source, which is very RAM- and CPU-intensive, and the laptop froze. When I rebooted I encountered the same login prompt issue you did, and a root recovery shell showed that /dev/mmcblock2 was nowhere to be found.

After powering it off and letting it cool down, it now boots fine. So I think your theory is right. I can't be certain whether the overheating detection that disables the SSD is on the PBP's board or on the SSD itself, but my guess would be the latter.
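One way to tell whether the SSD itself is reacting to heat is to look at the thermal counters it keeps in its SMART/health log. This is a minimal sketch, assuming nvme-cli's human-readable "smart-log" output uses these label texts (they may differ between nvme-cli versions); the function name is hypothetical. Non-zero values would suggest the drive spent time above its own warning or critical temperature, or throttled itself.

```shell
#!/usr/bin/env bash
# Sketch: show the SMART counters an NVMe drive keeps about its own thermal
# events. Assumption: label text matches nvme-cli's human-readable output.

# Keep only the thermal-related lines from smart-log output read on stdin.
thermal_counters() {
  grep -Ei 'warning temperature time|critical composite temperature time|thermal management'
}

# Only query the drive when executed directly with a device argument.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
  sudo nvme smart-log "$1" | thermal_counters
fi
```

Usage: "bash nvme-thermal.sh /dev/nvme0" right after the drive reappears.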

I'm going to purchase some silicone thermal pads to place inside the chassis to distribute heat. I think those will also help tighten up the keyboard/trackpad button action, which is a little bouncy for my taste due to the faceplate being plastic and the empty space underneath.
#5
It appears that the CPU uses the entire bottom chassis as a heat sink, via a thick blue thermal pad stuck to the chassis.

Something similar should be possible with the NVMe. I'm still awaiting my thermal padding order, but I also ordered an NVMe heat sink which arrived today. It is basically just a thin copper plate. Installed it looks like this: https://imgur.com/gallery/koqCDJr

The heat sink kit included two thin thermal pads, which are installed between the NVMe chips and the copper plate. The plate fits almost too snugly against the magnesium chassis, so perhaps it will have good thermal conduction. I'll be testing that over the next few days. If the NVMe drive still disappears under heavy load, I'm going to try removing the copper plate and replacing it entirely with thermal pads once they arrive in the mail.

My guess is that chip->thick silicone->chassis (as with the CPU) would conduct heat more efficiently than my current chip->thin silicone->copper plate->chassis setup, because the silicone fills all the gaps and so gives more surface area to distribute heat, even though metal<->metal thermal conduction is probably more efficient in the ideal case.
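The trade-off above can be put in terms of the usual one-dimensional model of heat conduction through layers in series. Each layer of thickness t_i and thermal conductivity k_i over contact area A adds a resistance, each interface adds a contact resistance R_c,j, and the temperature rise for a dissipated power P scales with the total. This is a simplified textbook model, not a measurement of this laptop.

```latex
R_{\mathrm{total}} = \sum_{i} \frac{t_i}{k_i A} + \sum_{j} R_{c,j},
\qquad
\Delta T = P \, R_{\mathrm{total}}
```

Copper's k is very high, so the plate's own t/(kA) term is small. The open question is whether the extra interface and any air gaps in the chip->pad->copper->chassis stack add more contact resistance than a single thicker pad that conforms to both surfaces.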

I'll report back here with my findings.

Edit: after reassembling the unit, the touchpad is popping out a little because the heat sink is squeezed in there, so it's definitely too snug a fit. It still works, but it's wonky. So I think thermal pads alone will be the way to go, and I'll need some trial and error to find the right amount. It's worth noting that the keyboard and trackpad action feel a lot less "bouncy" with the pressure of the heat sink/thermal pad inside, which is a win.
#6
Update: so my thermal padding order arrived, and I replaced the copper heat sink I installed yesterday with a 1.5mm layer of padding. I also added gratuitous thermal padding throughout the unit to tighten up the keyboard and touchpad button action. It took a bit of trial and error to find the right amount for under the touchpad; too little meant the buttons were still bouncy, and too much meant the buttons would stick out and wouldn't click/activate. Here's a photo of the unit with thermal padding: https://imgur.com/gallery/CalsmAs

I'm happy to report that the keyboard and touchpad are now much nicer to use. I'll have to see how this performs for thermal conduction. The padding adds some weight to the laptop, but I don't mind. These Pinebook Pros are very lightweight to begin with.

I also added a systemd service to lower the power setting of the NVMe drive.

/etc/systemd/system/nvme-enter-low-power-state.service:


Code:
# systemd service for setting the NVMe drive power state on boot
# Per "Post NVMe install power limiting" from https://wiki.pine64.org/index.php/Pinebook_Pro
# Loosely based on example one-shot service here: https://gist.github.com/drmalex07/d006f12914b21198ee43

[Unit]
Description=Set the NVMe drive power state

[Service]
Type=oneshot
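# -f 2 selects feature 0x02 (Power Management); -v is the value written to it,
# whose bits 4:0 are the power state (PS). Power states for various NVMe SSDs are outlined here:
# https://wiki.pine64.org/index.php?title=Pinebook_Pro_Hardware_Accessory_Compatibility
# Mine is an Intel 660p M.2 and I'm setting it to PS 1 as recommended there
# (systemd older than v239 needs an absolute path here, as printed by `command -v nvme`)
ExecStart=nvme set-feature /dev/nvme0 -f 2 -v 1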
RemainAfterExit=true
User=root
StandardOutput=journal

[Install]
WantedBy=multi-user.target



I installed and started the service with:
Code:
$ sudo systemctl daemon-reload
$ sudo systemctl enable nvme-enter-low-power-state.service
$ sudo systemctl start nvme-enter-low-power-state.service
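To confirm the service actually took effect after a reboot, you can read the Power Management feature back from the drive. This sketch assumes "nvme get-feature <dev> -f 2" prints a line containing "Current value:0x..." (the exact wording can vary between nvme-cli versions); per the NVMe specification, bits 4:0 of that value are the power state. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read back the current NVMe power state set by the oneshot service.
# Assumption: get-feature output contains "Current value:0x<hex>".

# Extract the power state from get-feature output on stdin.
# For feature 0x02 (Power Management), bits 4:0 of the value are the PS.
power_state_from_output() {
  local hex
  hex=$(sed -n 's/.*Current value:[[:space:]]*\(0x[0-9a-fA-F]*\).*/\1/p' | head -n 1)
  [ -n "$hex" ] || return 1
  echo $(( hex & 0x1f ))
}

# Only query the drive when executed directly with a device argument.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
  systemctl status nvme-enter-low-power-state.service --no-pager
  sudo nvme get-feature "$1" -f 2 | power_state_from_output
fi
```

For example, "bash check-ps.sh /dev/nvme0" should print 1 if PS 1 is in effect.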


And here's a one-liner for printing the NVMe drive temperature in Fahrenheit:

Code:
$ sudo nvme smart-log /dev/nvme0 | awk '/^temperature/ { match($0, /[0-9]+/); printf "%.1f°F\n", substr($0, RSTART, RLENGTH) * 9 / 5 + 32 }'

If I still encounter issues with the NVMe drive disappearing, I'll report back.
#7
I no longer think that heat is my problem. I returned the two Intel drives and bought a Sabrent Rocket Q 2TB drive, installed it, and have the exact same issue.

After installing it, I set up an ext4 partition, mounted it, and then ran the following command:

Code:
sudo dd if=/dev/zero of=/dev/nvme0n1p1 bs=1M status=progress conv=fdatasync

This is the one that hangs all the time. Not sure what to do next.
#8
What power state are you using? Can you use a lower-power one? I run my WD Blue NVMe at PS1. See nvme-ps in pbp-tools.
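To see which power states a drive offers before picking one, the controller's identify data lists them. This is a minimal sketch, assuming "nvme id-ctrl" prints one "ps N : mp:..." line per state, as the nvme-cli versions I know of do; the function name is hypothetical. PS0 is the highest-power state and each higher-numbered state may draw at most as much as the one before, so "lower power" means a higher PS number.

```shell
#!/usr/bin/env bash
# Sketch: summarize the power state table from `nvme id-ctrl` output on stdin.
# Assumption: each state starts with a line like "ps    1 : mp:3.60W operational ...".

list_power_states() {
  awk '$1 == "ps" && $2 ~ /^[0-9]+$/ {
         mp = $4; sub(/^mp:/, "", mp)      # "mp:3.60W" -> "3.60W"
         printf "PS%s  max %s  %s\n", $2, mp, $5
       }'
}

# Only query the drive when executed directly with a device argument.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
  sudo nvme id-ctrl "$1" | list_power_states
fi
```

For example, "bash list-ps.sh /dev/nvme0" prints one line per state with its maximum power.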
#9
(09-19-2020, 03:02 PM)codebreaker Wrote: I no longer think that heat is my problem. I returned the two Intel drives and bought a Sabrent Rocket Q 2TB drive, installed it, and have the exact same issue.

After installing, I set up an ext4 partition, mount it, and then run the following command:

Code:
sudo dd if=/dev/zero of=/dev/nvme0n1p1 bs=1M status=progress conv=fdatasync

This is the one that hangs all the time. Not sure what to do next.

If you use dd on a mounted partition, you are writing underneath a filesystem the kernel still has mounted, so I think you'd get undefined behavior, likely crashes. When I've accidentally dd'd to the wrong device (mounted /boot and /) in the past, it has caused a system crash.
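If the goal is a sustained-write stress test, a safer variant is to refuse to touch a device that is mounted and, when the drive holds a mounted filesystem anyway, to write a scratch file through that filesystem instead. This is a minimal sketch with hypothetical function names, assuming util-linux findmnt and GNU dd (for status=progress).

```shell
#!/usr/bin/env bash
# Sketch: mount check plus a file-based write test. Function names are hypothetical.

# Succeeds if the given device is currently the source of any mount.
is_mounted() {
  findmnt -rn -S "$1" >/dev/null
}

# Write SIZE_MB of zeros to a scratch file inside a mounted filesystem,
# flush it to the drive (conf=fdatasync), then delete it.
# Nothing outside that one file is written.
write_test_file() {
  local dir=$1 size_mb=${2:-1024} f rc
  f="$dir/dd-write-test.bin"
  dd if=/dev/zero of="$f" bs=1M count="$size_mb" status=progress conv=fdatasync
  rc=$?
  rm -f "$f"
  return "$rc"
}
```

For example, "is_mounted /dev/nvme0n1p1 && echo 'mounted, not writing to it'", or "write_test_file /mnt/nvme 4096" for a 4 GiB test file on a filesystem mounted at the example path /mnt/nvme.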

Otherwise, it could still be a temperature issue, or it could have to do with power draw. Maybe try lowering the power state as xmixahlx suggested; my previous comment has an example systemd service that does that.

And as an aside, my thermal padding rig has been working well; no more crashes so far.
#10
(10-01-2020, 09:52 AM)simonsouth Wrote:
(09-19-2020, 03:02 PM)codebreaker Wrote: This is the one that hangs all the time. Not sure what to do next.

This is the sort of intermittent failure I suspect may be caused by the PCIe driver trying to operate the link at a speed higher than the RK3399 can reliably support.

I've written about this in another thread. If you remove the "max-link-speed" override from the device tree your system is using, does it improve the situation any?

Any news on this one? I would like to try it, but I don't have a clue how to go about it. Any tips?
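Before changing anything, it can help to see what the running system is doing. This sketch prints every max-link-speed property in the live device tree (a single big-endian 32-bit value, where 1 means Gen1 and 2 means Gen2) and the LnkCap/LnkSta lines from lspci, which show the supported and the negotiated link speed. It assumes the kernel exposes the tree under /sys/firmware/devicetree/base and that GNU od and pciutils are installed; the function names are hypothetical. Actually removing the override means editing or overlaying the device tree your bootloader loads, which depends on the distribution and kernel package, so that part is not shown.

```shell
#!/usr/bin/env bash
# Sketch: inspect the requested and negotiated PCIe link speed.
# Assumptions: live device tree at /sys/firmware/devicetree/base,
# GNU od (for --endian) and lspci (pciutils) available.

# Decode a device-tree property holding one big-endian 32-bit cell.
dt_u32() {
  od -An -tu4 --endian=big "$1" | tr -d ' '
}

show_pcie_speed() {
  local p
  # Any node in the live tree that sets max-link-speed (1 = Gen1, 2 = Gen2).
  while IFS= read -r p; do
    printf '%s = %s\n' "$p" "$(dt_u32 "$p")"
  done < <(find /sys/firmware/devicetree/base -name max-link-speed 2>/dev/null)
  # LnkCap is what the device supports, LnkSta is what was negotiated.
  sudo lspci -vv | grep -E 'LnkCap:|LnkSta:'
}

# Only inspect the hardware when executed directly with --run.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "${1:-}" = "--run" ]; then
  show_pcie_speed
fi
```

Usage: "bash pcie-speed.sh --run". Comparing the output before and after a device tree change would show whether the override was actually dropped.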


