Cap Error or Bad Memory
#1
Hi all, for the past couple of months I've been having some stability issues with my RockPro64 NAS. It would just hang and require a physical power cycle to get back online. About a week ago it seems to have given up the ghost.

This evening I hooked up an FTDI-232 so I could watch the serial console as it attempts to boot, and I'm pretty consistently getting output similar to the logs below, with slight variations. I'm not deep enough in the weeds with SBC init procedures to make much sense of this, but "Cap error!" seems to indicate that there may be a faulty capacitor somewhere on my board, and the failure to read a given memory address suggests it may be a capacitor related to the RAM chips. My question is: is there a good way to test and identify which capacitor, if any, has gone bad so I can try to solder a new one on? If it's the RAM I'll resign myself to tossing the board, as I don't have a reflow station and the wife would kill me if I tried to use the oven for that. I have a basic multimeter and a cheapo oscilloscope from AliExpress if those tools would help narrow down what I need to replace.

I'd really like to bring this little computer back to life if it's not too much of a headache. This thing lasted me less than 2 years. For $80 I expected much better components and QC. I have an early 1st gen Raspberry Pi that's still chooching at 9 years old.

Code:
Version 1.19 20190305
In
channel 0
CS = 0
MR0=0x98
MR4=0x2
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
CS = 1
MR0=0x18
MR4=0x2
MR5=0xFF
MR8=0x8
MR12=0x0
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 1
CS = 0
MR0=0x98
MR4=0x3
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
CS = 1
MR0=0x18
MR4=0x2
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
change freq to 400MHz 0,1
channel 0
CS = 0
MR0=0x80
MR4=0x2
MR5=0x0
MR8=0x0
MR12=0x2
MR14=0x0
MR18=0x0
MR19=0x0
MR24=0x0
MR25=0x0
CS = 1
MR0=0x18
MR4=0x2
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 1
CS = 0
MR0=0x98
MR4=0x3
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
CS = 1
MR0=0x18
MR4=0x2
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
change freq to 800MHz 1,0
Channel 0: LPDDR4,800MHz
W FF != R
Cap error!
Channel 1: LPDDR4,800MHz
Col error!!!
Cap error!
ERR

Code:
Version 1.19 20190305
In
channel 0
CS = 0
MR0=0x98
MR4=0x2
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
CS = 1
MR0=0x18
MR4=0x3
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 1
CS = 0
MR0=0x98
MR4=0x3
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
CS = 1
MR0=0x18
MR4=0x2
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
change freq to 400MHz 0,1
channel 0
CS = 0
MR0=0x98
MR4=0x2
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
CS = 1
MR0=0x18
MR4=0x3
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 1
CS = 0
MR0=0x98
MR4=0x3
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
CS = 1
MR0=0x18
MR4=0x2
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
change freq to 800MHz 1,0
Channel 0: LPDDR4,800MHz
Bus Width=32 Col=10 Bank=8 Row=15/15 CS=2 Die Bus-Width=16 Size=2048MB
Channel 1: LPDDR4,800MHz
Bus Width=32 Col=10 Bank=8 Row=15/15 CS=2 Die Bus-Width=16 Size=2048MB
256B stride
read addr 0x40008000 = 0x40010000
ERR

Code:
Version 1.19 20190305
In
channel 0
CS = 0
MR0=0x98
MR4=0x3
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
CS = 1
MR0=0x18
MR4=0x3
MR5=0xFF
MR8=0x8
MR12=0x0
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 1
CS = 0
MR0=0x98
MR4=0x3
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x0
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
CS = 1
MR0=0x18
MR4=0x3
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
change freq to 400MHz 0,1
channel 0
CS = 0
MR0=0x80
MR4=0x2
MR5=0x0
MR8=0x0
MR12=0x2
MR14=0x2
MR18=0x0
MR19=0x0
MR24=0x0
MR25=0x0
CS = 1
MR0=0x18
MR4=0x82
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 1
CS = 0
MR0=0x98
MR4=0x82
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
CS = 1
MR0=0x18
MR4=0x3
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
change freq to 800MHz 1,0
Channel 0: LPDDR4,800MHz
W FF != R
Cap error!
Channel 1: LPDDR4,800MHz
Bus Width=32 Col=10 Bank=8 Row=15/15 CS=2 Die Bus-Width=16 Size=2048MB
no stride
read addr 0x8000 = 0xFFFF8000
ERR

Code:
Version 1.19 20190305
In
channel 0
CS = 0
MR0=0x98
MR4=0x2
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
CS = 1
MR0=0x18
MR4=0x3
MR5=0xFF
MR8=0x8
MR12=0x0
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 1
CS = 0
MR0=0x98
MR4=0x3
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
CS = 1
MR0=0x18
MR4=0x3
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
change freq to 400MHz 0,1
channel 0
CS = 0
MR0=0x98
MR4=0x2
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
CS = 1
MR0=0x0
MR4=0x82
MR5=0x0
MR8=0x0
MR12=0x0
MR14=0x2
MR18=0x0
MR19=0x0
MR24=0x0
MR25=0x0
channel 1
CS = 0
MR0=0x98
MR4=0x82
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
CS = 1
MR0=0x18
MR4=0x3
MR5=0xFF
MR8=0x8
MR12=0x72
MR14=0x72
MR18=0x0
MR19=0x0
MR24=0x8
MR25=0x0
channel 0 training pass!
channel 1 training pass!
change freq to 800MHz 1,0
Channel 0: LPDDR4,800MHz
W FF != R
Cap error!
Channel 1: LPDDR4,800MHz
Bus Width=32 Col=10 Bank=8 Row=15/15 CS=2 Die Bus-Width=16 Size=2048MB
no stride
ch 0 ddrconfig = 0x101, ddrsize = 0x2020
pmugrf_os_reg[2] = 0x2AA1E000, stride = 0x18
OUT
U-Boot SPL board iq"Synchronous Abort" handler, esr 0x9600qSynchronous Abort" handler, esr 0x02000000ELR:     adac
LR:      4308
x 0: 0000000000000030 x 1: 00000000000ELR:     adac
LR:      4268
x 0: 0000000000000030 x 1: 00000000000"Synchronous Abort" handler, esr 0x96000"Synchronous Abort" handler, esr 0x02000000
"Synchronous Abort" handler, esr 0x96000""Synchronous Abort" handler, esr 0x86000000
ELR:     17ff0000531f3e00
LR:      3ecc
x 0: 0000000"Synchronous Abort" handler, esr 0x02000000"Synchronous Abort" handler, esr 0x86000qSynchronous Abort" handler, esr 0x02000000
ELR:     8de0
LR:      4308
x 0: 0000000000000030 x 1: 0000000000000000
x 2: 00000000ff1a0014 x 3: 0000000000000030
x 4: 00000000003f9df8 x 5: 0000000000000000
x 6: 0000000000000001 x 7: 07f822e0003fb310
x 8: 00000000ff8c1edc x 9: 0000000000000000
x10: 0000000000000000 x11: 000000000000000c
x12: 0000000080000000 x13: 000000000000000f
x14: 00000qSynchronous Abort" handler, esr 0x02000000"Synchronous Abort" handler, esr 0x02000000"Synchronous Abort" handler, esr 0x02000000ELR:     adac
LR:      3ecc
x 0: 000000000000000a x 1: 0000000000000000
x 2: 0000000000003e00 x 3: 0000000000000030
x 4: 0000000000003e00 x 5: 0000000000000000
x 6: 0000000000000001 x 7: 000000000000000"Synchronous Abort" handler, esr 0x02000000
ELR:     adac
LR:      4308
x 0: 0000000000000061 x 1: 0000000000""Synchronous Abort" handler, esr 0x0200000"Synchronous Abort" handler, esr 0x02000000ELR:     adac
"Synchronous Abort" han"Synchronous Abort" handler, esr 0x02000000
"Synchronous Abort" handler, esr 0x0200qSynchronous Abort" handler, esr 0x02000000qSynchronous Abort" handler, esr 0x02000000"Synchronous Abort" handler, esr 0x02000000
"Synchronous Abort" handler, esr 0x02000000
"Synchronous Abort" handler, esr 0x02000000
"Synchronous Abort" handler, esr 0x02000000
"Synchronous Abort" handler, esr 0x02000000
"Synchronous Abort" handler, esr 0x02000000
"Synchronous Abort" handle"Synchronous Abort" handler, esr 0x02000000"Synchronous Abort" handler, esr 0x02000000"Synchronous Abort" handler, esr 0x02000000"
#2
You can use memtester to check for bad memory

sudo apt-get install memtester
sudo memtester 3700M

note: for 4GB use 3700M, 2GB use 1700M, 1GB use 700M
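
The sizes in the note above are consistent with a simple rule of thumb: leave roughly 300 MB for the OS and round down to the nearest 100 MB. Below is a minimal sketch of that rule. The function name `memtester_size` and the 300 MB headroom are assumptions chosen to reproduce the three values in the note, not something memtester itself provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of memtester): suggest a test size for a
# board with the given nominal RAM in MB.
# Assumption: leave about 300 MB for the OS, then round down to 100 MB.
memtester_size() {
    total_mb=$1
    headroom_mb=300
    usable_mb=$(( total_mb - headroom_mb ))
    echo "$(( usable_mb / 100 * 100 ))M"
}

# Example (needs root and a booted system):
#   sudo memtester "$(memtester_size 2048)" 1
```

This only helps on a board that still boots far enough to run memtester.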
#3
(04-05-2021, 02:18 PM)tllim Wrote: You can use memtester to check for bad memory

sudo apt-get install memtester
sudo memtester 3700M

note: for 4GB use 3700M, 2GB use 1700M, 1GB use 700M

I can't boot into an OS at all. These messages appear as the board attempts to boot, and once they appear the board hangs. I've tried many different OSes, SD cards, and the eMMC module I have, and they all produce the same errors. The only reason I've been able to see the above logging at all is that I hooked up the FTDI-232 to watch the serial console output.
#4
I'm not super familiar with the low-level boot sequence of the Rockchip or how DDR initialization works, but that looks like it's probably output from the Rockchip miniloader version of idbloader.bin (loader_1).

loader_1 is in charge of initializing memory, after which execution returns to the bootrom embedded in the Rockchip SoC, which then loads loader_2.

With this failing during RAM initialization rather than on access to a specific memory location, I'd guess the memory module has failed, rather than only a region of the memory being bad. Overall I doubt it's a capacitor on the board: the CPU and the memory are directly connected, and I don't see any capacitors involved there. All but the last log attached look like a failure before returning to bootrom and loading loader_2.

You could write an SD card with an image that uses a U-Boot TPL instead of miniloader during initialization (you may need to disable SPI flash if it's loading loader_1 from there: https://wiki.pine64.org/index.php/ROCKPr...booting.29 ) and see what that outputs, but I don't expect it would provide anything useful. It would, however, give you a chance to add some debugging statements to the U-Boot code at `drivers/ram/rockchip/sdram_rk3399.c` and experiment with it in various ways, if that's something you would find entertaining.
#5
Yeah, I had tried disabling SPI using that trick. Same issue.

I don't have the bandwidth to write custom debug logging and compile U-Boot from scratch. I'd sooner trash the board and wait until I see Pine's QC improve before they get any more of my money.

It seems that an inordinately high number of RockPro64 boards shipped with faulty memory, which makes me concerned about the Pinebook and PinePhone, which I was otherwise pretty hype for. This kind of turns me off buying from Pine until I can comb the forums for a given product and see a significantly higher ratio of posts about software errors to posts about these much more expensive hardware errors.

When I looked for errors similar to what I hit, I found far more posts about faulty memory than I expected, and many of them were from just the past few months.
#6
I own multiple boards and have had a similar experience with the "rock64".
(I own three "rockpro64" units and four "rock64" units.)

At that time, I applied the following two countermeasures.
1. Slow down the DDR memory controller.
2. Increase the DDR memory voltage.

Both methods improved things.
I don't think many people can do method "2", because it requires a hardware modification.
However, method "1" is easy and needs only the standard "dd" command, so it is worth trying.
In fact, there are some reports that method "1" helped (on the "rock64").

Fortunately, none of my "rockpro64" units has shown such symptoms.
But it would be no surprise if some individual units have similar problems.


The following is a concrete example of applying method "1" to the "rockpro64".

# Write image to "SD-CARD".
Code:
xz -d -c ./stretch-openmediavault-rockpro64-0.9.14-1159-arm64.img.xz | dd of=/dev/mmcblk0 bs=1M status=progress

# Replace rk3399_ddr on "SD-CARD".
Code:
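# Create a zero-filled file of 36 x 2 KiB = 72 KiB.
dd of=./ldr_666.bin if=/dev/zero bs=2K count=36
# Copy the 666 MHz DDR init blob over the start of it, keeping the zero padding.
dd of=./ldr_666.bin if=./rk3399_ddr_666MHz_v1.19.bin bs=2K conv=notrunc
# Write the padded file to the partition, starting 2 KiB (one block) in.
dd of=/dev/mmcblk0p1 if=./ldr_666.bin bs=2K seek=1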

*Note)
"/dev/mmcblk0" and "/dev/mmcblk0p1" are the device nodes of the SD card and its first partition.
The names depend on the OS environment you are working in.
For example, "/dev/sda" and "/dev/sda1", or "/dev/sdc" and "/dev/sdc1", etc.

If you are lucky, it may improve.
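
Since writing to the wrong device node with "dd" is destructive, it can help to build and check the padded file first, without touching the card. The following is a minimal sketch of the first two "dd" steps wrapped in a function. The name `build_padded_loader` is hypothetical, and it uses `bs=2048` instead of `bs=2K` only so that it also runs with non-GNU versions of dd.

```shell
#!/bin/sh
# Hypothetical helper: build the zero-padded loader file from a DDR init blob.
# 36 blocks of 2048 bytes = 73728 bytes (72 KiB), as in the dd commands above.
PADDED_SIZE=$(( 36 * 2048 ))

build_padded_loader() {
    blob=$1
    out=$2
    blob_size=$(( $(wc -c < "$blob") ))
    if [ "$blob_size" -gt "$PADDED_SIZE" ]; then
        echo "blob is larger than $PADDED_SIZE bytes" >&2
        return 1
    fi
    # Step 1: zero-filled file of the full padded size.
    dd if=/dev/zero of="$out" bs=2048 count=36 2>/dev/null || return 1
    # Step 2: overlay the blob at offset 0; conv=notrunc keeps the padding.
    dd if="$blob" of="$out" bs=2048 conv=notrunc 2>/dev/null || return 1
    # Print the resulting size so it can be checked before writing to a card.
    echo "$(( $(wc -c < "$out") ))"
}

# Example:
#   build_padded_loader ./rk3399_ddr_666MHz_v1.19.bin ./ldr_666.bin
```

If the printed size is 73728, the file matches what the commands above produce. Before the final write, running `lsblk -o NAME,SIZE,TYPE,MOUNTPOINT` is one way to confirm which node really is the SD card.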
---

Finally, I would like to point out one thing.
"Cap error!" refers to capacity (the size of the memory installed on the board), not to a capacitor.

An error occurred while detecting the installed memory capacity => the installed memory capacity could not be detected.
(example: 4GB / 2GB / 1GB)
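
Because the failure differs from one boot attempt to the next, it may be useful to save each serial capture to a file and count how often each message shows up. Here is a minimal sketch, assuming the serial output has been saved to a text file. The function name `summarize_ddr_log` is hypothetical, and the marker strings are simply copied from the logs in the first post.

```shell
#!/bin/sh
# Hypothetical helper: count the lines containing each marker in a saved
# serial log. The marker strings are copied from the logs in this thread.
summarize_ddr_log() {
    log=$1
    for marker in 'training pass!' 'W FF != R' 'Cap error!' 'Col error!!!' \
                  'read addr ' 'Synchronous Abort'; do
        # grep -c prints 0 and exits non-zero when nothing matches.
        count=$(grep -cF -- "$marker" "$log" || true)
        printf '%s: %s\n' "$marker" "$count"
    done
}

# Example:
#   summarize_ddr_log ./boot1.log
```

Comparing the counts across several captures shows, for example, whether capacity detection fails on every attempt or only on some of them.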