RockPro64 has bad Memory (Software segfaults and kernel panics)
#11
(11-19-2020, 12:27 PM)LMM Wrote:
(11-19-2020, 12:36 AM)wildering Wrote: I cracked open the two other boards and ran memtest on them with no errors reported. I then transplanted that eMMC module onto the date code 5219 board and ran the test again. I was presented with a slew of errors. It's evident that this ROCKPro64 (v2.1, 2018-07-02 5219) is also defective and will warrant an RMA.

I ran memtest with Debian and it seems OK (v2.1, 2018-07-02). What is noticeable is the high temperature reached (70°C) in spite of a heatsink. Then I put it over a fan and it dropped below 50°C after 9 min.

I don't know if it is a good practice (and a good idea) but I cut the conductive pad in order to be able to double the layer on the DDR chip to make the contact with the heatsink. Otherwise it probably does not

I don’t see how doubling the conductive pad would hurt, but I imagine thermal conductivity would be affected. It shouldn’t be necessary though, as I’ve found a single layer makes good contact with the heat sink. In my case I didn’t have the chance to do anything other than run apt update, which shouldn’t cause a temperature spike at all. Both other boards also run at a cool 45-50 degrees under moderate load with the tall heat sink and no fan.

The most likely cause of the segmentation faults we’ve been experiencing is defective chips from the manufacturer or them breaking during product assembly.
#12
I started the RMA process using the Pine64 support portal at: https://support.pine64.org/ and submitting a ticket. Started it just after my last post was made and 4 hours later got a reply to start the process. Probably the quickest RMA reply I’ve ever had, 10/10.
#13
(11-19-2020, 06:53 PM)wildering Wrote:
(11-19-2020, 12:27 PM)LMM Wrote:
(11-19-2020, 12:36 AM)wildering Wrote: I cracked open the two other boards and ran memtest on them with no errors reported. I then transplanted that eMMC module onto the date code 5219 board and ran the test again. I was presented with a slew of errors. It's evident that this ROCKPro64 (v2.1, 2018-07-02 5219) is also defective and will warrant an RMA.

I ran memtest with Debian and it seems OK (v2.1, 2018-07-02). What is noticeable is the high temperature reached (70°C) in spite of a heatsink. Then I put it over a fan and it dropped below 50°C after 9 min.

I don't know if it is a good practice (and a good idea) but I cut the conductive pad in order to be able to double the layer on the DDR chip to make the contact with the heatsink. Otherwise it probably does not

I don’t see how doubling the conductive pad would hurt, but I imagine thermal conductivity would be affected. It shouldn’t be necessary though, as I’ve found a single layer makes good contact with the heat sink. In my case I didn’t have the chance to do anything other than run apt update, which shouldn’t cause a temperature spike at all. Both other boards also run at a cool 45-50 degrees under moderate load with the tall heat sink and no fan.

The most likely cause of the segmentation faults we’ve been experiencing is defective chips from the manufacturer or them breaking during product assembly.

I double the pad on the DDR chips because they are lower than the processor and the heatsink covers both.
You're right, running apt should not hurt!
#14
I'm having a very similar experience with a RockPro64 4GB that I received a few weeks ago. I'm trying to decide whether to RMA the board or not. I'm not sure how to check the hardware version to compare with yours.

Initially I started with Armbian_20.08.1_Rockpro64_focal_current_5.8.6_desktop.img.xz from https://www.armbian.com/rockpro64/ and the desktop came up. But "apt update" was hitting segmentation faults and then showing what looks like memory corruption (non-ASCII characters in dependency names, etc.).

Thinking that the desktop put more stress on the system, I then flashed Armbian_20.08.1_Rockpro64_focal_current_5.8.6.img.xz and booted. In this configuration, I was able to apt update, install, and run the memtester:
memtester 3G 1

It reported 549 errors similar to:
FAILURE: 0x00000000 != 0xa0000000000 at offset 0x47f44688.

Out of the reported errors, 520 have only 1 or 2 bits incorrect. This leads me to think that it's a hardware problem, since software typically overwrites whole bytes.
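For anyone who wants to repeat that bit count on their own log, here is a rough Python sketch. It assumes the failure lines look exactly like the one quoted above; `flipped_bits` and `summarize` are names made up for this example and are not part of memtester.

```python
import re

# Matches the failure line format quoted in the post above.
# Assumption: other memtester versions print the same layout.
FAILURE_RE = re.compile(
    r"FAILURE: (0x[0-9a-fA-F]+) != (0x[0-9a-fA-F]+) at offset (0x[0-9a-fA-F]+)"
)


def flipped_bits(line):
    """Return how many bits differ between the two compared values, or None."""
    m = FAILURE_RE.search(line)
    if m is None:
        return None
    a, b = int(m.group(1), 16), int(m.group(2), 16)
    # XOR leaves a 1 in every position where the two values disagree.
    return bin(a ^ b).count("1")


def summarize(lines, max_bits=2):
    """Return (total failures, failures with at most max_bits differing bits)."""
    counts = [n for n in map(flipped_bits, lines) if n is not None]
    return len(counts), sum(1 for n in counts if n <= max_bits)


if __name__ == "__main__":
    sample = "FAILURE: 0x00000000 != 0xa0000000000 at offset 0x47f44688."
    print(flipped_bits(sample))  # 0xa is binary 1010, so 2 bits differ
```

Feeding the saved memtester output line by line into `summarize` gives the same kind of "N of M errors are 1 or 2 bits" figure as above.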

But I'm still considering the "Older firmware overwrites actively used memory" issue noted at https://wiki.pine64.org/wiki/ROCKPro64#H...ility_Page which was mentioned earlier in this thread.

From the instructions there and additional details at https://forum.pine64.org/showthread.php?tid=8174 I tried to build the bootloader and add it to the SD card with the Ubuntu Focal server image, but the device didn't boot. I ordered the necessary hardware to debug with the serial console but it hasn't arrived yet.

If anyone knows for certain that a particular image doesn't have the "blob" firmware which can overwrite memory, I'd love to flash it and run memtester so that I could determine whether to RMA the board.
#15
Hi!

I'm booting Debian stable, based on the official Debian unstable installer image. It does not include Rockchip binary blobs in u-boot. I roughly outlined my approach here.

You could also try the u-boot version made by sigmaris, which is also built from mainline u-boot (without Rockchip blobs). Read the first post he made; there are links to an eMMC/SD card version you could try, so there is no need to flash SPI to test this out.

On my system I don't see any errors with memtest.
#16
(11-25-2020, 04:42 AM)n4tter4ngell Wrote: I'm booting Debian stable, based on the official Debian unstable installer image. It does not include Rockchip binary blobs in u-boot. I roughly outlined my approach here.

Thanks for this idea, I gather that you're following this:
https://www.kulesz.me/post/140-debian-de...4-install/

I found your comment detailing the differences in your procedure:
http://forum.pine64.org/showthread.php?t...1#pid82701

This looked promising, but fairly involved, so I decided to try your second option.

(11-25-2020, 04:42 AM)n4tter4ngell Wrote: You could also try the u-boot version made by sigmaris, which is also built from mainline u-boot (without Rockchip blobs). Read the first post he made; there are links to an eMMC/SD card version you could try, so there is no need to flash SPI to test this out.

Thanks for the pointer; it seems like sigmaris has been doing some great work. I used dd to install mmc_idbloader.img and mmc_u-boot.itb from https://github.com/sigmaris/u-boot/relea...ckpro64-ci to my SD card. It was really nice to see output from the bootloader on the display, and my Armbian/Debian server image booted fine with this bootloader.

In less than 5 min memtester has already produced many errors. Since this test has eliminated the blob overwriting memory as a potential cause, I'll proceed with the RMA process.

Thanks for your help!

