NVMe drive disappears after about an hour of uptime
(01-17-2021, 01:39 PM)dsimic Wrote: It is really strange that reinserting the ribbon cable makes the NVMe drive appear again, but it may very well be caused by the excessive length of the ribbon cable. What is the revision of the NVMe adapter you're using, the first revision (with the wider PCB), or the second revision (which has a narrow PCB)?

I meant to answer this earlier but forgot. Mine is the second revision, I believe: whichever one first replaced the original, which needed an adjustment to fit (mine worked out of the box).

(01-17-2021, 01:39 PM)dsimic Wrote: According to the description of NVMe thermal throttling management, this SMART data says that your drive has spent some time in a light throttling state since the last power-on event (those numbers shouldn't be lifetime counts). That doesn't make much sense, because the drive worked well in your last overnight test.

Could the drive be somehow defective?  Just guessing.

Edit: After checking the NVMe specification (pages 123-124) and the source code of smartmontools, I can confirm the above-stated meaning for those two SMART values.  By the way, the value for total time is in seconds.

Nice finds! I took a glance at the documentation. What makes you say they shouldn't be lifetime counts? I see nothing stating it either way, but I do see: "A value of 0h, indicates that this transition has never occurred or this field is not implemented." In particular, "never occurred", as opposed to something like "has not occurred since power-on", would imply that a lifetime count is more likely (I'm basing this only on the wording; I'm mostly unfamiliar with this technology).
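
For reference, here is a minimal Python sketch of where those four values sit in the raw SMART / Health Information log page (Log ID 02h), going by the byte offsets in the NVMe 1.3 specification. The function name and dictionary keys are made up for illustration, and the sketch says nothing about whether a given drive resets the counters at power-on.

```python
import struct


def thermal_mgmt_fields(log: bytes) -> dict:
    """Extract the thermal management fields from a raw SMART/Health log page.

    Offsets follow the NVMe 1.3 layout: four little-endian 32-bit values
    starting at byte 216. The key names are illustrative, not official.
    """
    if len(log) < 232:
        raise ValueError("need at least 232 bytes of the SMART/Health log page")
    t1_count, t2_count, t1_secs, t2_secs = struct.unpack_from("<4I", log, 216)
    return {
        "tmt1_transition_count": t1_count,  # bytes 216-219
        "tmt2_transition_count": t2_count,  # bytes 220-223
        "tmt1_total_seconds": t1_secs,      # bytes 224-227, in seconds
        "tmt2_total_seconds": t2_secs,      # bytes 228-231, in seconds
    }
```

Assuming nvme-cli is installed and the drive is /dev/nvme0, something like `nvme get-log /dev/nvme0 --log-id=2 --log-len=512 --raw-binary` should produce the bytes to feed into this function.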

If they are lifetime counts, they could reflect thermal throttling from when I first installed the drive, before I had set up the power state configuration. Further, this SSD model does not support saving the PS config to the NVMe itself, so I have to use a systemd task to set PS 2 on every boot; I would guess this potentially leaves room for the "vendor specific thermal management actions" mentioned in the documentation.
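
As a rough illustration of what such a boot-time step boils down to, the Python sketch below builds an nvme-cli command that writes the Power Management feature (Feature ID 02h). The device path /dev/nvme0 and the helper name are assumptions for the example, and this is not necessarily the exact command the systemd task here runs.

```python
def power_state_command(device: str, power_state: int) -> list:
    """Build an nvme-cli argv that requests the given NVMe power state.

    The Power Management feature (FID 02h) carries the power state in
    bits 4:0 of its value, so only 0..31 are valid. The workload hint
    (bits 7:5) is left at zero in this sketch.
    """
    if not 0 <= power_state <= 31:
        raise ValueError("power state must fit in 5 bits (0..31)")
    return [
        "nvme", "set-feature", device,
        "--feature-id=2",            # Power Management
        f"--value={power_state}",
    ]


if __name__ == "__main__":
    # Print the command instead of running it; executing it needs root.
    print(" ".join(power_state_command("/dev/nvme0", 2)))
```

The resulting command could then be used as the `ExecStart=` line of a oneshot systemd service, since the setting is lost on every power cycle when the drive cannot save it.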

It's also unclear what exactly those thermal thresholds are. I'm not seeing anything in my SMART report that indicates the temperatures, so while PS 2 is the only power state currently available to the drive, I'm guessing the threshold may be such that the drive runs whatever logic updates those values without actually switching power states.
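
One place the thresholds might be readable, assuming the drive implements it, is the Host Controlled Thermal Management feature (Feature ID 10h), which the NVMe specification describes as a 32-bit value with TMT1 in the upper 16 bits and TMT2 in the lower 16 bits, both in Kelvin. The Python sketch below only decodes such a value; the function name and the sample value in the comment are hypothetical, not readings from this drive.

```python
def decode_hctm(dword: int) -> dict:
    """Split a Host Controlled Thermal Management value into its thresholds.

    Bits 31:16 hold TMT1 and bits 15:0 hold TMT2, both in Kelvin.
    Celsius is approximated by subtracting 273, as common tools do.
    """
    tmt1_k = (dword >> 16) & 0xFFFF
    tmt2_k = dword & 0xFFFF
    return {
        "tmt1_kelvin": tmt1_k,
        "tmt2_kelvin": tmt2_k,
        "tmt1_celsius": tmt1_k - 273,
        "tmt2_celsius": tmt2_k - 273,
    }


# Hypothetical value for illustration only: 0x01760180
# decodes to TMT1 = 374 K and TMT2 = 384 K.
```

With nvme-cli, `nvme get-feature /dev/nvme0 --feature-id=0x10` should report the current value if the feature is supported; the device path is again an assumption.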

