10-17-2021, 09:02 AM
(This post was last modified: 10-17-2021, 09:47 AM by Dendrocalamus64.)
The OEM 64GB Sandisk eMMC in my first Pinebook Pro just failed from one moment to the next. I'd been using that machine heavily, for everything, including daily web browsing with the cache on the emmc, for about two years.
I'd never used anything with an emmc, sd card or ssd before, so my expectations for reliability were set by traditional spinning platter disks, where you could leave a good one running for a decade with no problems. Looking up emmcs & sd cards now, I had no idea they were so incredibly unreliable and prone to catastrophic failure with no warning at all.
The immediate takeaway is to back up your user data daily. The OS can be reinstalled, and including it adds a lot of bulk; if the backup is slow, you're less likely to run it regularly. Ideally don't just trust sd cards for the backups; you should have network-attached storage with real disks. And turn off web browser disk cache, or put it on a ram disk. I also did a fair amount of compiling, which generates a massive amount of disk writes on large projects, and I didn't switch to zram instead of disk swap until recently.
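A minimal sketch of the "user data only, keep it fast" backup idea, assuming the backup target is an already-mounted directory (for example a NAS share). The function name and the change test (size plus modification time) are my own choices for illustration, not an established tool; rsync does the same job far more thoroughly.

```python
import os
import shutil


def backup_changed(src_root, dst_root):
    """Copy files from src_root to dst_root when missing or changed.

    A file counts as changed when its size or whole-second modification
    time differs from the copy already in dst_root.
    Returns the sorted relative paths that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            # Skip symlinks, sockets and FIFOs; only regular files are copied.
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            s = os.stat(src)
            try:
                d = os.stat(dst)
                unchanged = (d.st_size == s.st_size
                             and int(d.st_mtime) == int(s.st_mtime))
            except FileNotFoundError:
                unchanged = False
            if not unchanged:
                shutil.copy2(src, dst)  # copy2 also preserves the mtime
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

Because unchanged files are skipped, a daily run only touches what was edited that day, which keeps it quick enough to actually do.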
I was running Manjaro Xfce from the sd card, with the emmc mounted for storage. I tried to save a file in Mousepad, and it said the file was read-only. Checked perms; it should be writable by me. I tried to create a new file in Thunar, knowing I shouldn't have to do that, and it said the file system was read-only. That was the "oh shit" moment because I know linux remounts file systems read-only when an I/O error occurs.
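The read-only remount can be spotted before an editor complains. This is a small sketch that parses text in the /proc/mounts format and lists block devices currently mounted with the `ro` option; the function name is made up for illustration, and the device names in any sample input are hypothetical.

```python
def read_only_mounts(mounts_text):
    """Return (device, mount_point) pairs mounted read-only.

    mounts_text is expected in /proc/mounts format:
        device mount_point fstype options dump pass
    Only entries whose device starts with /dev/ are considered, so
    pseudo-filesystems that are read-only by design are ignored.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        device, mount_point, _fstype, options = fields[:4]
        if not device.startswith("/dev/"):
            continue
        # Options are comma-separated; match "ro" exactly, not e.g. "errors=remount-ro".
        if "ro" in options.split(","):
            result.append((device, mount_point))
    return result
```

On a live system it would be called as `read_only_mounts(open("/proc/mounts").read())`. A data partition that shows up here when it was mounted rw is worth treating as an I/O error until proven otherwise.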
I checked the dmesg, and it was full of CQE recovery failed messages for the mmc. The filesystem was still navigable due to caching, but trying to read a file, even with less, would result in the command locking up. On the web, it looked like rebooting solved similar errors temporarily, so I rebooted. After reboot, lsblk showed ~30MB capacity for the emmc instead of the expected 58.9G, and the sd card was flagged read-only in short order; the system would not allow remounting the sd card rw.
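The capacity check that lsblk made obvious can also be scripted. The sketch below reads the size the kernel reports in sysfs (`/sys/block/<dev>/size`, which is always in 512-byte units) and flags a device that is far smaller than expected. The function names, the 5% tolerance, and the `sys_block` parameter (there so the code can be pointed at a test directory) are assumptions for illustration.

```python
import os

SECTOR_BYTES = 512  # /sys/block/<dev>/size counts 512-byte sectors


def device_bytes(dev, sys_block="/sys/block"):
    """Return the capacity in bytes that the kernel reports for dev."""
    with open(os.path.join(sys_block, dev, "size")) as f:
        return int(f.read().strip()) * SECTOR_BYTES


def looks_truncated(actual_bytes, expected_bytes, tolerance=0.05):
    """True when the device is more than `tolerance` smaller than expected."""
    return actual_bytes < expected_bytes * (1 - tolerance)
```

For the failure described above, the expected value would be about `int(58.9 * 2**30)` bytes, assuming lsblk's "G" means GiB, and a device reporting roughly 30 MB fails the check by a wide margin. The device name to pass (something like `mmcblk2`) depends on the system.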
I attempted to reboot again, and the system wouldn't boot. I now know that the boot priority on the PBP always starts with the emmc, and the bootloader on the emmc is supposed to check for bootable media on the sd card, so if the emmc bootloader is out of sorts, you have to flip the emmc disable switch or pull the emmc in order to boot.
I put the pulled emmc on the pine store-supplied emmc-to-usb adapter, and tried it in a second PBP. The USB mass storage device lists as 0b capacity, and reports "Medium not present". testdisk isn't able to read it.
Looking at recovery options, there are at least four levels you can access these devices on. The highest level is a usb adapter, where the simple circuitry on the adapter presents it to the system as a generic block device. Next is putting it into an emmc socket, where it reports as /dev/mmcblkX, and you can manipulate it as an mmc. I'm going to see what I can do with that next. The ideal testbed for that is a single-board computer in an open enclosure, so you don't have to open your running PBP and flip the switch back on.
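If the eMMC does show up as /dev/mmcblkX with a sensible capacity, the first thing to try is a read-only image, so every later experiment runs on a copy. Below is a rough sketch of that, assuming the device is readable at least in places: it copies chunk by chunk and zero-fills any chunk that returns an I/O error, similar in spirit to `dd conv=noerror,sync`. The function name and chunk size are my own choices; GNU ddrescue is the more capable tool for this, and none of it helps while the device reports 0 B.

```python
import os


def image_device(src_path, dst_path, chunk_size=1024 * 1024, total_size=None):
    """Copy src_path to dst_path, zero-filling chunks that cannot be read.

    The source is opened read-only and never written to.
    Returns a list of (offset, length) ranges that failed to read.
    """
    bad_ranges = []
    fd = os.open(src_path, os.O_RDONLY)
    try:
        if total_size is None:
            # Seeking to the end gives the size of a file or block device.
            total_size = os.lseek(fd, 0, os.SEEK_END)
        with open(dst_path, "wb") as out:
            offset = 0
            while offset < total_size:
                want = min(chunk_size, total_size - offset)
                try:
                    data = os.pread(fd, want, offset)
                except OSError:
                    # Unreadable chunk: note it and keep the image aligned.
                    bad_ranges.append((offset, want))
                    data = b"\x00" * want
                if not data:
                    break  # source ended earlier than its reported size
                out.write(data)
                offset += len(data)
    finally:
        os.close(fd)
    return bad_ranges
```

The returned list of bad ranges shows which parts of the image are filler rather than real data, which matters when pointing testdisk or a filesystem checker at the copy.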
Then there's JTAG? I still need to read about that.
And finally, there's reading the NAND directly. This is the best single thread I've found about it so far, including the linked pages and PDFs:
Which NAND flash reader ?
https://web.archive.org/web/202110171442...10&t=33785
It looks like tons of people lose their data on these every year, and data recovery is a solved problem. But, it's all proprietary. Third-party companies have developed the tools & software to read the raw NAND, and sell them at high prices. Sending your chip to commercial data recovery would be privacy suicide; there's no way to guarantee the company doesn't keep a copy, and some just make nand dumps and ftp them to other countries for outsourced processing without telling the customer. The data is usually all there, including deleted files, but most people don't get it recovered.
There should be an open source solution for emmcs widely used by the open source community, like these sandisk emmcs are now.
Steps:
1. Acquire a bunch of emmcs for practice, document the process of exposing & connecting to the nand interface, and develop a training curriculum, like the commercial vendors have, that affected users can follow at home to develop the hardware skills.
2. Start with a commercial nand dumper. It can eventually be replaced with an open source device at lower cost.
3. Develop the software & procedure for the specific combination of the Sandisk hardware & likely linux filesystems. Less work than having to support all mmcs from all manufacturers, and all operating systems.
Next to do for me: Try the mmcblk interface, then start looking at how much of step (3) has already been done.
All Pine devices should have socketed emmcs, not hard-soldered. A common failure mode is that a phone dies for some other reason, and the emmc is still good, but the user never gets it desoldered, so the data is lost anyway. Socketed emmcs are a major step forward compared to the usual way of doing it.
Most of the time, my system wasn't hitting the swap, but I was experimenting with different thread counts during builds to balance compile speed against the system running out of memory in bottlenecks. Just two threads could result in the system swapping massively when the make process was attempting to build two large source files concurrently. That may account for a lot of the reduced life expectancy. Nonetheless, properly designed solid state storage should remain readable in a read-only mode when it runs out of writes, and it appears that common emmcs do not.
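The thread-count balancing act can be roughed out with arithmetic instead of trial and error. This sketch picks a `make -j` value from the MemAvailable figure in /proc/meminfo and an assumed peak memory per compile job. The per-job figure is a guess that has to be measured for the project being built, and the function names are made up for illustration.

```python
import os


def mem_available_kib(meminfo_text):
    """Extract MemAvailable (in kB) from text in /proc/meminfo format."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def safe_jobs(available_kib, per_job_kib, cpu_count=None):
    """Largest job count that fits in memory, capped at the CPU count.

    Always returns at least 1, so a build can still run (and may swap)
    when even a single job is estimated not to fit.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    by_memory = available_kib // per_job_kib
    return max(1, min(cpu_count, by_memory))
```

With about 2.1 GB available and an assumed 1.5 GB per large source file, this gives one job, which matches the experience above that even two threads could push the system into heavy swapping.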