RE: [Volunteer needed]Too high DRAM clock speed MAY be causing you random crashes/freezes - Djhg2000 - 05-23-2020
(05-23-2020, 03:20 PM)dukla2000 Wrote: (05-23-2020, 01:16 PM)Djhg2000 Wrote: ...
I couldn't get a log sample from a system with bad memory so the entire error handling section is made from reverse engineering of the Linux source code. If you have a kernel log from a system with bad memory I'd greatly appreciate if you could provide real error messages.
But how do I get memtest to execute on boot?
I added memtest=17 in /boot/boot.cmd
Code: $ cat /boot/boot.cmd
if test ${mmc_bootdev} -eq 0 ; then
echo "Booting from SD";
setenv linux_mmcdev 0;
else
echo "Booting from eMMC";
setenv linux_mmcdev 2;
fi;
setenv bootargs console=ttyS0,115200 no_console_suspend panic=10 consoleblank=0 loglevel=7 root=/dev/mmcblk${linux_mmcdev}p1 ro splash memtest=17 plymouth.ignore-serial-consoles vt.global_cursor_default=0
...
but clearly that isn't what is being used at boot
Code: $ uname -a
Linux DuklaPP 5.6-pinephone #5.6.0+pinephone6 SMP PREEMPT Fri May 15 23:20:03 CEST 2020 aarch64 GNU/Linux
chris@DuklaPP:~$ grep -i memtest /boot/config-$(uname -r)
CONFIG_MEMTEST=y
chris@DuklaPP:~$ ./mem.sh
No memtest output found in kernel log, wrong boot option?
$ dmesg | grep Kernel
[ 0.000000] Kernel command line: console=ttyS0,115200 no_console_suspend panic=10 consoleblank=0 loglevel=7 root=/dev/mmcblk2p1 ro splash plymouth.ignore-serial-consoles vt.global_cursor_default=0
...
I'm afraid you'll have to redirect that question to @devrtz; I haven't received my PinePhone yet, so I've been testing what I can on my Debian system.
But I'll try my best to help out anyway. Can you please check whether your kernel has memtest support? You should be able to do that by running "grep -i memtest /boot/config-$(uname -r)" from a terminal. One of five things will happen:
1. It returns "CONFIG_MEMTEST=y"
2. It returns "CONFIG_MEMTEST=n"
3. It returns "# CONFIG_MEMTEST is not set"
4. It returns something like "grep: /boot/config-5.4.0: No such file or directory"
5. It doesn't return anything
If you get 1, the "memtest=17" option is getting lost somewhere before it reaches the kernel command line. If you get 2 or 3, memtest probably isn't enabled in your kernel. If you get 4 or 5, we need to investigate more.
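For anyone who wants to script the check, the five outcomes above can be sketched in Python. This is only an illustration: the function name and the advice strings are made up here, and it classifies text you have already captured from the grep command rather than running it.

```python
def classify_memtest_config(grep_output):
    """Map captured grep output to outcome 1-5 (None if unrecognised)."""
    text = grep_output.strip()
    if not text:
        return 5  # grep printed nothing at all
    if "No such file or directory" in text:
        return 4  # no config file for the running kernel
    for line in text.splitlines():
        line = line.strip()
        if line == "CONFIG_MEMTEST=y":
            return 1
        if line == "CONFIG_MEMTEST=n":
            return 2
        # Disabled options normally appear as a comment; tolerate spacing.
        if line.replace(" ", "") == "#CONFIG_MEMTESTisnotset":
            return 3
    return None


# Hypothetical summary of the advice given in the post.
ADVICE = {
    1: "memtest is built in; the boot option is being lost somewhere",
    2: "memtest is probably not enabled in this kernel",
    3: "memtest is probably not enabled in this kernel",
    4: "needs more investigation",
    5: "needs more investigation",
}

print(ADVICE[classify_memtest_config("CONFIG_MEMTEST=y")])
```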
RE: [Volunteer needed]Too high DRAM clock speed MAY be causing you random crashes/freezes - a-wai - 05-23-2020
(05-23-2020, 03:20 PM)dukla2000 Wrote: I added memtest=17 in /boot/boot.cmd
...
but clearly that isn't what is being used at boot
/boot/boot.cmd can be seen as the source code and /boot/boot.scr as the "compiled" version, which is the one actually used by U-Boot (at least that's how it's done on Mobian, but afaik it's the same on other OSes).
It can be generated using the following command:
Code: mkimage -T script -A arm64 -C none -n pinephone -d /boot/boot.cmd /boot/boot.scr
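After regenerating boot.scr and rebooting, one way to confirm the option really reached the kernel is to look for memtest=N in the kernel command line, as the "dmesg | grep Kernel" output earlier in the thread did. A small Python sketch of that check follows; the helper name is hypothetical, and on a live system the string would come from /proc/cmdline instead of being pasted in.

```python
import re


def memtest_passes(cmdline):
    """Return N from a memtest=N boot parameter, or None if it is absent."""
    match = re.search(r"(?:^|\s)memtest=(\d+)(?=\s|$)", cmdline)
    return int(match.group(1)) if match else None


# Command line from boot.cmd (what was intended).
intended = ("console=ttyS0,115200 no_console_suspend panic=10 consoleblank=0 "
            "loglevel=7 root=/dev/mmcblk2p1 ro splash memtest=17 "
            "plymouth.ignore-serial-consoles vt.global_cursor_default=0")
# Command line reported by dmesg (what the kernel actually got).
actual = ("console=ttyS0,115200 no_console_suspend panic=10 consoleblank=0 "
          "loglevel=7 root=/dev/mmcblk2p1 ro splash "
          "plymouth.ignore-serial-consoles vt.global_cursor_default=0")

print(memtest_passes(intended))  # 17
print(memtest_passes(actual))    # None
```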
RE: [Volunteer needed]Too high DRAM clock speed MAY be causing you random crashes/freezes - dukla2000 - 05-23-2020
(05-23-2020, 03:52 PM)a-wai Wrote: ...
It can be generated using the following command:
Code: mkimage -T script -A arm64 -C none -n pinephone -d /boot/boot.cmd /boot/boot.scr
Thanks
With 600
Code: $ ./mem.sh
17 tests completed with these patterns:
4c494e5558726c7a
eeeeeeeeeeeeeeee
dddddddddddddddd
bbbbbbbbbbbbbbbb
7777777777777777
cccccccccccccccc
9999999999999999
6666666666666666
3333333333333333
8888888888888888
4444444444444444
2222222222222222
1111111111111111
aaaaaaaaaaaaaaaa
5555555555555555
ffffffffffffffff
0000000000000000
Tested memory region 0x0000000040000000 to 0x00000000b7bfb000 (1915MB)
0x0000000040000000 - 0x0000000040080000 GOOD
0x000000004106f000 - 0x0000000049505000 GOOD
0x000000004950e080 - 0x0000000049511000 GOOD
0x000000004a000000 - 0x00000000b7bcd330 GOOD
0x00000000b7bcd361 - 0x00000000b7bcd368 GOOD
0x00000000b7bcd397 - 0x00000000b7bcd398 GOOD
0x00000000b7bfaffc - 0x00000000b7bfb000 GOOD
No errors found
Going to try 624
PS - no problems with 624. It runs for somewhere between 30 and 60 seconds during boot (extended period with white LED), which I guess is too short. Then again, I am not sure if there is something intermittent on my phone! (Standard evasive guess when trying to debug and get something repeatable!!!) Just trying memtester on 1G at 624, then tomorrow I will try memtest=256 or something.
RE: [Volunteer needed]Too high DRAM clock speed MAY be causing you random crashes/freezes - Djhg2000 - 05-23-2020
(05-23-2020, 04:23 PM)dukla2000 Wrote: (05-23-2020, 03:52 PM)a-wai Wrote: ...
Nice to see my script is working.
Let's just hope it still works after encountering an error. I think I managed to figure it out, but I want to know for sure before I dare say it's working properly.
RE: [Volunteer needed]Too high DRAM clock speed MAY be causing you random crashes/freezes - dukla2000 - 05-23-2020
(05-23-2020, 04:33 PM)Djhg2000 Wrote: ...
Nice to see my script is working
Let's just hope it still works after encountering an error, I think I managed to figure it out but I want to know for sure before I dare say it's working properly.
Thank you for the script.
OK, I set the memory speed to 624 and memtest=256. It ran for an extra 13 minutes 45 seconds during boot with the LED white, then got to the clock screen with the WiFi icon lit, but the phone crashed before I could ssh in!
Need to figure out the simplest way to get my phone usable again from here!
RE: [Volunteer needed]Too high DRAM clock speed MAY be causing you random crashes/freezes - Djhg2000 - 05-23-2020
(05-23-2020, 05:16 PM)dukla2000 Wrote: (05-23-2020, 04:33 PM)Djhg2000 Wrote: ...
Happy to help!
I'm not sure "memtest=256" has well-defined behavior; there are only 17 patterns defined in the struct: https://github.com/torvalds/linux/blob/master/mm/memtest.c#L9-L25 . However, the pattern is selected by taking the pass index modulo the number of available patterns here: https://github.com/torvalds/linux/blob/master/mm/memtest.c#L110 .
So while "memtest=256" will work (evidently), looping through the same patterns over and over is unlikely to find new errors. On the other hand, when the memory fails due to overclocking, I think the results will be less consistent because some cells will appear to pass and fail at random. It's an interesting experiment, but without persistence across reboots I don't think it will yield much of a benefit?
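To make the modulo behaviour concrete, here is a Python sketch of which pattern each pass would use. It rests on two assumptions: that the kernel counts the pass index down from N-1 to 0 and indexes the pattern table with index % 17 (my reading of the linked lines), and that the table order is the reverse of the order printed in the "memtest=17" log in this thread. The names are illustrative only.

```python
from collections import Counter

# The 17 patterns in the order the memtest=17 log printed them.
PATTERNS_AS_LOGGED = [
    "4c494e5558726c7a", "eeeeeeeeeeeeeeee", "dddddddddddddddd",
    "bbbbbbbbbbbbbbbb", "7777777777777777", "cccccccccccccccc",
    "9999999999999999", "6666666666666666", "3333333333333333",
    "8888888888888888", "4444444444444444", "2222222222222222",
    "1111111111111111", "aaaaaaaaaaaaaaaa", "5555555555555555",
    "ffffffffffffffff", "0000000000000000",
]
# Assumption: table index 0 is the pattern logged last, index 16 the first.
PATTERNS = PATTERNS_AS_LOGGED[::-1]


def pass_order(memtest_n):
    """Patterns used by memtest=N, in the order the passes would run."""
    return [PATTERNS[i % len(PATTERNS)] for i in range(memtest_n - 1, -1, -1)]


for n in (17, 85, 256):
    counts = Counter(pass_order(n))
    print(n, "passes:", min(counts.values()), "to", max(counts.values()),
          "runs per pattern")
```

Under these assumptions memtest=85 is exactly five full cycles, while memtest=256 is fifteen cycles plus one extra pass of the all-zero pattern.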
Anyway, just to confirm, did changing "/boot/boot.cmd" and running "mkimage -T script -A arm64 -C none -n pinephone -d /boot/boot.cmd /boot/boot.scr" suffice to get it up and running?
Oh, and I almost forgot: put your phone in the freezer to make it boot. The lower temperature should increase stability enough to SSH in and disable memtest. From there you can power off and let it cool down again before you flash U-Boot with 600 MHz DRAM.
RE: [Volunteer needed]Too high DRAM clock speed MAY be causing you random crashes/freezes - dukla2000 - 05-24-2020
(05-23-2020, 09:46 PM)Djhg2000 Wrote: Anyway, just to confirm, did changing "/boot/boot.cmd" and running "mkimage -T script -A arm64 -C none -n pinephone -d /boot/boot.cmd /boot/boot.scr" suffice to get it up and running?
Yes
(05-23-2020, 09:46 PM)Djhg2000 Wrote: I'm not sure "memtest=256" has well defined behavior, there's only 17 patterns defined in the struct: https://github.com/torvalds/linux/blob/master/mm/memtest.c#L9-L25 . However, the pattern selection is done through modulo of the number of available patterns here: https://github.com/torvalds/linux/blob/master/mm/memtest.c#L110 .
So while "memtest=256" will work (evidently), the benefit of looping through the same patterns over and over is unlikely to find new errors. On the other hand, when the memory fails due to overclocking I think the results will be less consistent because some cells will appear to both pass and fail randomly. It's an interesting experiment but without persistence across reboots I don't think it will yield much of a benefit?
I think, in line with the OP's topic of memory instability, it is still a path worth walking. My overclocking days go back to getting 100MHz out of my Pentium 75, and yup, the BraveHeart seems to be susceptible to thermal issues. I don't think we are dealing with "bad" memory; it's just that as we degrade the environment it is in, it starts misbehaving. So running memtest for longer was my aim (and I can't read a line of C code; in my day it was FORTRAN & Assembler). In the old days, systems that were over-overclocked usually took a while to show measurable stress.
(05-23-2020, 09:46 PM)Djhg2000 Wrote: Oh and I almost forgot; put your phone in the freezer to make it boot, the lower temperatures should increase the stability enough to SSH in and disable memtest. From there you can power off and let it cool down again before you flash U-boot with 600 MHz DRAM
Yeah had similar thought: Managed to get into ssh after a "cold" boot with memtest=256 and 624MHz memory.
Code: $ ./mem.sh
No memtest output found in kernel log, wrong boot option?
but
Code: $ dmesg
[ 0.000000] 0x000000004950e080 - 0x0000000049511000 pattern 1111111111111111
[ 0.000000] 0x000000004a000000 - 0x00000000b7bcd330 pattern 1111111111111111
[ 0.000000] 0x00000000b7bcd361 - 0x00000000b7bcd368 pattern 1111111111111111
[ 0.000000] 0x00000000b7bcd397 - 0x00000000b7bcd398 pattern 1111111111111111
[ 0.000000] 0x00000000b7bfaffc - 0x00000000b7bfb000 pattern 1111111111111111
[ 0.000000] 0x0000000040000000 - 0x0000000040080000 pattern aaaaaaaaaaaaaaaa
[ 0.000000] 0x000000004106f000 - 0x0000000049505000 pattern aaaaaaaaaaaaaaaa
[ 0.000000] 0x000000004950e080 - 0x0000000049511000 pattern aaaaaaaaaaaaaaaa
[ 0.000000] 0x000000004a000000 - 0x00000000b7bcd330 pattern aaaaaaaaaaaaaaaa
[ 0.000000] 0x00000000b7bcd361 - 0x00000000b7bcd368 pattern aaaaaaaaaaaaaaaa
[ 0.000000] 0x00000000b7bcd397 - 0x00000000b7bcd398 pattern aaaaaaaaaaaaaaaa
[ 0.000000] 0x00000000b7bfaffc - 0x00000000b7bfb000 pattern aaaaaaaaaaaaaaaa
[ 0.000000] 0x0000000040000000 - 0x0000000040080000 pattern 5555555555555555
[ 0.000000] 0x000000004106f000 - 0x0000000049505000 pattern 5555555555555555
[ 0.000000] 0x000000004950e080 - 0x0000000049511000 pattern 5555555555555555
[ 0.000000] 0x000000004a000000 - 0x00000000b7bcd330 pattern 5555555555555555
[ 0.000000] 0x00000000b7bcd361 - 0x00000000b7bcd368 pattern 5555555555555555
[ 0.000000] 0x00000000b7bcd397 - 0x00000000b7bcd398 pattern 5555555555555555
[ 0.000000] 0x00000000b7bfaffc - 0x00000000b7bfb000 pattern 5555555555555555
... { overall several hundred lines removed from here }
[ 0.000000] 0x00000000b7bcd397 - 0x00000000b7bcd398 pattern 5555555555555555
[ 0.000000] 0x00000000b7bfaffc - 0x00000000b7bfb000 pattern 5555555555555555
[ 0.000000] 0x0000000040000000 - 0x0000000040080000 pattern ffffffffffffffff
[ 0.000000] 0x000000004106f000 - 0x0000000049505000 pattern ffffffffffffffff
[ 0.000000] 0x000000004950e080 - 0x0000000049511000 pattern ffffffffffffffff
[ 0.000000] 0x000000004a000000 - 0x00000000b7bcd330 pattern ffffffffffffffff
[ 0.000000] 0x00000000b7bcd361 - 0x00000000b7bcd368 pattern ffffffffffffffff
[ 0.000000] 0x00000000b7bcd397 - 0x00000000b7bcd398 pattern ffffffffffffffff
[ 0.000000] 0x00000000b7bfaffc - 0x00000000b7bfb000 pattern ffffffffffffffff
[ 0.000000] 0x0000000040000000 - 0x0000000040080000 pattern 0000000000000000
[ 0.000000] 0x000000004106f000 - 0x0000000049505000 pattern 0000000000000000
[ 0.000000] 0x000000004950e080 - 0x0000000049511000 pattern 0000000000000000
[ 0.000000] 0x000000004a000000 - 0x00000000b7bcd330 pattern 0000000000000000
[ 0.000000] 0x00000000b7bcd361 - 0x00000000b7bcd368 pattern 0000000000000000
[ 0.000000] 0x00000000b7bcd397 - 0x00000000b7bcd398 pattern 0000000000000000
[ 0.000000] 0x00000000b7bfaffc - 0x00000000b7bfb000 pattern 0000000000000000
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000040000000-0x00000000bfffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0xb7bcb100-0xb7bccfff]
[ 0.000000] Zone ranges:
... { etc then "normal" boot chit-chat }
Not sure what this means. Very different from the first "No errors found" log I posted earlier but certainly not particularly coherent!
OK - Just worked out that my problem above is that memtest=256 caused the boot log to "overflow" and lose the initial verbiage.
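A rough back-of-the-envelope check of that overflow, sketched in Python. Everything marked "assumed" is a guess and not something measured on this phone: the real ring buffer size depends on the kernel's CONFIG_LOG_BUF_SHIFT (or a log_buf_len= boot parameter), and the per-message overhead is approximate.

```python
# One region line as it appears in the dmesg output, without the timestamp.
SAMPLE_LINE = "0x0000000040000000 - 0x0000000040080000 pattern 4c494e5558726c7a"
REGIONS_PER_PASS = 7             # region lines logged per pass in this thread
ASSUMED_RECORD_OVERHEAD = 16     # assumed bookkeeping bytes per log message
ASSUMED_LOG_BUF_BYTES = 1 << 17  # assumed 128 KiB kernel log buffer


def memtest_log_bytes(passes):
    """Estimate the log space consumed by the per-region pattern lines."""
    per_line = len(SAMPLE_LINE) + ASSUMED_RECORD_OVERHEAD
    return passes * REGIONS_PER_PASS * per_line


def fits_in_log(passes, buf_bytes=ASSUMED_LOG_BUF_BYTES):
    """True if the estimated memtest output alone fits in the buffer."""
    return memtest_log_bytes(passes) <= buf_bytes


for n in (17, 85, 256):
    print(n, memtest_log_bytes(n), "bytes, fits:", fits_in_log(n))
```

With these assumed numbers, 85 passes stay well inside the buffer and 256 passes do not, which would be consistent with the early lines being pushed out of the log.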
I changed memtest=85 (runs for approx 5 mins and sadly despite 624MHz no errors) and get "nice/clean/expected" results:
Code: $ ./mem.sh
85 tests completed with these patterns:
4c494e5558726c7a
eeeeeeeeeeeeeeee
dddddddddddddddd
bbbbbbbbbbbbbbbb
7777777777777777
cccccccccccccccc
9999999999999999
6666666666666666
3333333333333333
8888888888888888
4444444444444444
2222222222222222
1111111111111111
aaaaaaaaaaaaaaaa
5555555555555555
ffffffffffffffff
0000000000000000
4c494e5558726c7a
eeeeeeeeeeeeeeee
dddddddddddddddd
bbbbbbbbbbbbbbbb
7777777777777777
cccccccccccccccc
9999999999999999
6666666666666666
3333333333333333
8888888888888888
4444444444444444
2222222222222222
1111111111111111
aaaaaaaaaaaaaaaa
5555555555555555
ffffffffffffffff
0000000000000000
4c494e5558726c7a
eeeeeeeeeeeeeeee
dddddddddddddddd
bbbbbbbbbbbbbbbb
7777777777777777
cccccccccccccccc
9999999999999999
6666666666666666
3333333333333333
8888888888888888
4444444444444444
2222222222222222
1111111111111111
aaaaaaaaaaaaaaaa
5555555555555555
ffffffffffffffff
0000000000000000
4c494e5558726c7a
eeeeeeeeeeeeeeee
dddddddddddddddd
bbbbbbbbbbbbbbbb
7777777777777777
cccccccccccccccc
9999999999999999
6666666666666666
3333333333333333
8888888888888888
4444444444444444
2222222222222222
1111111111111111
aaaaaaaaaaaaaaaa
5555555555555555
ffffffffffffffff
0000000000000000
4c494e5558726c7a
eeeeeeeeeeeeeeee
dddddddddddddddd
bbbbbbbbbbbbbbbb
7777777777777777
cccccccccccccccc
9999999999999999
6666666666666666
3333333333333333
8888888888888888
4444444444444444
2222222222222222
1111111111111111
aaaaaaaaaaaaaaaa
5555555555555555
ffffffffffffffff
0000000000000000
Tested memory region 0x0000000040000000 to 0x00000000b7bfb000 (1915MB)
0x0000000040000000 - 0x0000000040080000 GOOD
0x000000004106f000 - 0x0000000049505000 GOOD
0x000000004950e080 - 0x0000000049511000 GOOD
0x000000004a000000 - 0x00000000b7bcd330 GOOD
0x00000000b7bcd361 - 0x00000000b7bcd368 GOOD
0x00000000b7bcd397 - 0x00000000b7bcd398 GOOD
0x00000000b7bfaffc - 0x00000000b7bfb000 GOOD
0x0000000040000000 - 0x0000000040080000 GOOD
0x000000004106f000 - 0x0000000049505000 GOOD
0x000000004950e080 - 0x0000000049511000 GOOD
0x000000004a000000 - 0x00000000b7bcd330 GOOD
0x00000000b7bcd361 - 0x00000000b7bcd368 GOOD
0x00000000b7bcd397 - 0x00000000b7bcd398 GOOD
0x00000000b7bfaffc - 0x00000000b7bfb000 GOOD
0x0000000040000000 - 0x0000000040080000 GOOD
0x000000004106f000 - 0x0000000049505000 GOOD
0x000000004950e080 - 0x0000000049511000 GOOD
0x000000004a000000 - 0x00000000b7bcd330 GOOD
0x00000000b7bcd361 - 0x00000000b7bcd368 GOOD
0x00000000b7bcd397 - 0x00000000b7bcd398 GOOD
0x00000000b7bfaffc - 0x00000000b7bfb000 GOOD
0x0000000040000000 - 0x0000000040080000 GOOD
0x000000004106f000 - 0x0000000049505000 GOOD
0x000000004950e080 - 0x0000000049511000 GOOD
0x000000004a000000 - 0x00000000b7bcd330 GOOD
0x00000000b7bcd361 - 0x00000000b7bcd368 GOOD
0x00000000b7bcd397 - 0x00000000b7bcd398 GOOD
0x00000000b7bfaffc - 0x00000000b7bfb000 GOOD
0x0000000040000000 - 0x0000000040080000 GOOD
0x000000004106f000 - 0x0000000049505000 GOOD
0x000000004950e080 - 0x0000000049511000 GOOD
0x000000004a000000 - 0x00000000b7bcd330 GOOD
0x00000000b7bcd361 - 0x00000000b7bcd368 GOOD
0x00000000b7bcd397 - 0x00000000b7bcd398 GOOD
0x00000000b7bfaffc - 0x00000000b7bfb000 GOOD
No errors found
and
Code: $ dmesg
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[ 0.000000] Linux version 5.6-pinephone (mobian@mobian) (gcc version 9.3.0 (Debian 9.3.0-11)) #5.6.0+pinephone6 SMP PREEMPT Fri May 15 23:20:03 CEST 2020
[ 0.000000] Machine model: PinePhone
[ 0.000000] Reserved memory: created CMA memory pool at 0x00000000b8000000, size 128 MiB
[ 0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[ 0.000000] early_memtest: # of tests: 85
[ 0.000000] 0x0000000040000000 - 0x0000000040080000 pattern 4c494e5558726c7a
[ 0.000000] 0x000000004106f000 - 0x0000000049505000 pattern 4c494e5558726c7a
[ 0.000000] 0x000000004950e080 - 0x0000000049511000 pattern 4c494e5558726c7a
[ 0.000000] 0x000000004a000000 - 0x00000000b7bcd330 pattern 4c494e5558726c7a
[ 0.000000] 0x00000000b7bcd361 - 0x00000000b7bcd368 pattern 4c494e5558726c7a
[ 0.000000] 0x00000000b7bcd397 - 0x00000000b7bcd398 pattern 4c494e5558726c7a
[ 0.000000] 0x00000000b7bfaffc - 0x00000000b7bfb000 pattern 4c494e5558726c7a
[ 0.000000] 0x0000000040000000 - 0x0000000040080000 pattern eeeeeeeeeeeeeeee
[ 0.000000] 0x000000004106f000 - 0x0000000049505000 pattern eeeeeeeeeeeeeeee
[ 0.000000] 0x000000004950e080 - 0x0000000049511000 pattern eeeeeeeeeeeeeeee
[ 0.000000] 0x000000004a000000 - 0x00000000b7bcd330 pattern eeeeeeeeeeeeeeee
[ 0.000000] 0x00000000b7bcd361 - 0x00000000b7bcd368 pattern eeeeeeeeeeeeeeee
[ 0.000000] 0x00000000b7bcd397 - 0x00000000b7bcd398 pattern eeeeeeeeeeeeeeee
[ 0.000000] 0x00000000b7bfaffc - 0x00000000b7bfb000 pattern eeeeeeeeeeeeeeee
[ 0.000000] 0x0000000040000000 - 0x0000000040080000 pattern dddddddddddddddd
[ 0.000000] 0x000000004106f000 - 0x0000000049505000 pattern dddddddddddddddd
[ 0.000000] 0x000000004950e080 - 0x0000000049511000 pattern dddddddddddddddd
[ 0.000000] 0x000000004a000000 - 0x00000000b7bcd330 pattern dddddddddddddddd
[ 0.000000] 0x00000000b7bcd361 - 0x00000000b7bcd368 pattern dddddddddddddddd
[ 0.000000] 0x00000000b7bcd397 - 0x00000000b7bcd398 pattern dddddddddddddddd
[ 0.000000] 0x00000000b7bfaffc - 0x00000000b7bfb000 pattern dddddddddddddddd
[ 0.000000] 0x0000000040000000 - 0x0000000040080000 pattern bbbbbbbbbbbbbbbb
[ 0.000000] 0x000000004106f000 - 0x0000000049505000 pattern bbbbbbbbbbbbbbbb
[ 0.000000] 0x000000004950e080 - 0x0000000049511000 pattern bbbbbbbbbbbbbbbb
[ 0.000000] 0x000000004a000000 - 0x00000000b7bcd330 pattern bbbbbbbbbbbbbbbb
[ 0.000000] 0x00000000b7bcd361 - 0x00000000b7bcd368 pattern bbbbbbbbbbbbbbbb
[ 0.000000] 0x00000000b7bcd397 - 0x00000000b7bcd398 pattern bbbbbbbbbbbbbbbb
[ 0.000000] 0x00000000b7bfaffc - 0x00000000b7bfb000 pattern bbbbbbbbbbbbbbbb
etc
RE: [Volunteer needed]Too high DRAM clock speed MAY be causing you random crashes/freezes - dukla2000 - 05-24-2020
OK, I appreciate memtester is a lousy tool (user space application as opposed to system level) but it is the best one I have. Each run on a 1G file takes about 30 minutes: in 4 runs/2 hours I get say 2 fails at 624MHz. For my Brave Heart this really is a transient problem but I am happier and more stable at 552MHz.
Code: $ sudo memtester 1G 4
[sudo] password for chris:
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 1024MB (1073741824 bytes)
got 1024MB (1073741824 bytes), trying mlock ...locked.
Loop 1/4:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
FAILURE: 0xfba98a3204294710 != 0xfba98a3200294710 at offset 0x0eb7a9f8.
Compare SUB : FAILURE: 0x3a6d34484de28c50 != 0xa48db679f9e28c50 at offset 0x0eb7a9f8.
Compare MUL : Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok
Loop 2/4:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok
Loop 3/4:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok
Loop 4/4:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
FAILURE: 0xbafcee1d0048f82a != 0xbafcee1d0448f82a at offset 0x08254d80.
Compare MUL : Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok
Done.
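One way to read the FAILURE lines above is to XOR the two values in each pair and see which bits differ. The Python sketch below does that with the three pairs copied from the output; the helper name is made up, and it makes no claim about which value of a pair is the expected one, since XOR is symmetric.

```python
def flipped_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [bit for bit in range(64) if (diff >> bit) & 1]


# (value, value, offset) copied from the memtester output above.
FAILURES = [
    (0xfba98a3204294710, 0xfba98a3200294710, 0x0eb7a9f8),
    (0x3a6d34484de28c50, 0xa48db679f9e28c50, 0x0eb7a9f8),
    (0xbafcee1d0048f82a, 0xbafcee1d0448f82a, 0x08254d80),
]

for a, b, offset in FAILURES:
    print(hex(offset), flipped_bits(a, b))
```

The first and third pairs differ in a single bit, and it is bit 26 both times even though the offsets differ. The second pair differs in many bits at the same offset as the first, so it may just be a knock-on effect of that earlier failure.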
RE: [Volunteer needed]Too high DRAM clock speed MAY be causing you random crashes/freezes - Djhg2000 - 05-24-2020
(05-24-2020, 03:37 AM)dukla2000 Wrote: ...
Apparently I'm too tired to figure out how to do inline quotes so I'll try to answer all of the above down here instead.
Thanks for confirming, this will help me a lot in getting started with my PinePhone when it arrives. You probably mentioned this somewhere but which OS is your phone currently running? My aim is to support all of the popular ones within reasonable effort.
If you have a serial port adapter for the headphone jack you could enable a serial console. That way you get a working console immediately instead of waiting for it to boot and get network access. As per https://bloggerbust.ca/post/my-first-experience-connecting-to-the-phinephone-via-serial-console/ it should work right out of the box with your favorite terminal application (for simple things like this I'd use "screen /dev/ttyUSB0 115200" on the computer end). I'd speculate you can probably even see the rest running in real time.
Correct, the script is looking for a header that says "early_memtest" and keeps reading until it finds a line without the indented output. Without the header it doesn't know what to do and just assumes a memtest didn't happen.
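To make that lookup concrete, here is a minimal sketch of the idea. It is not the actual mem.sh: the function name extract_memtest is made up for illustration, and it assumes each log line starts with a bracketed timestamp as in the dmesg paste above.

```shell
#!/bin/sh
# Hypothetical helper (not the real mem.sh): read a kernel log on stdin and
# print the early_memtest header plus the address-range lines that follow it.
extract_memtest() {
    awk '
        # Strip the "[ 0.000000] " style timestamp so only the message remains.
        { msg = $0; sub(/^\[[ 0-9.]*\] /, "", msg) }
        # The header line opens the section.
        msg ~ /^early_memtest:/ { in_section = 1; found = 1; print msg; next }
        # Inside the section, keep lines that look like "0x... - 0x... ...".
        in_section && msg ~ /^ *0x[0-9a-f]+ - 0x[0-9a-f]+ / { print msg; next }
        # Any other line closes the section.
        in_section { in_section = 0 }
        # No header at all: report failure through the exit status.
        END { if (!found) exit 1 }
    '
}
```

Something like `dmesg | extract_memtest || echo "No memtest output found in kernel log, wrong boot option?"` would then fall through to the error message whenever the header is missing, which is the situation when memtest= never reached the kernel command line.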
But the list of patterns should never exceed 17 entries (or however many unique patterns are in the source file). That's a bug. I'll see what I can do to fix it tomorrow; it should be fairly easy, but I need some sleep first.
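One way to check how many unique patterns a log really contains is to count them. This is only a sketch under the assumption that every test line has the word "pattern" followed by the hex value, as in the dmesg paste above; count_patterns is a hypothetical name and this is not the fix that will go into mem.sh.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel log on stdin and print each distinct
# memtest pattern once, followed by the number of lines that used it.
count_patterns() {
    awk '
        {
            # Find the word "pattern" and count the field right after it.
            for (i = 1; i < NF; i++)
                if ($i == "pattern") { seen[$(i + 1)]++; break }
        }
        END { for (p in seen) print p, seen[p] }
    ' | LC_ALL=C sort
}
```

Running `dmesg | count_patterns | wc -l` gives the number of unique patterns, so repeated passes over the same pattern collapse into a single entry with a higher count.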
Oh and if you do manage to get a log with errors I'd really appreciate a copy of the output. From the source it looks like it can blacklist memory sections in much smaller chunks than it prints during testing. Getting more than one error between two regular lines would be hitting the jackpot in that case.
RE: [Volunteer needed]Too high DRAM clock speed MAY be causing you random crashes/freezes - devrtz - 05-25-2020
Thanks everyone for your work!
Ok, so I have enabled memtest and booted the 624 version (which made the phosh session crash within 1 or so after booting).
Unfortunately memtest didn't really help. From the .sh script and by looking at dmesg, everything seemed to be in order.
In fact, only the user journal (journalctl --user) revealed that anything was going wrong (phosh segfaulting).
I have attached some of the logs.
*{good,bad}mem.log are the results from running mem.sh.
*_dmesg.log are the dmesg logs
*_journalctl_user.log are journalctl --user logs
*_phosh.log are the journalctl --user logs grepped for phosh
Not quite sure what to investigate next.
Edit: Seems like I can't attach the log files. Oh well:
https://fortysixandtwo.eu/upload/576_dmesg.log
https://fortysixandtwo.eu/upload/576_goodmem.log
https://fortysixandtwo.eu/upload/576_journalctl_user.log
https://fortysixandtwo.eu/upload/576_phosh.log
https://fortysixandtwo.eu/upload/624_dmesg.log
https://fortysixandtwo.eu/upload/624_badmem.log
https://fortysixandtwo.eu/upload/624_journalctl_user.log
https://fortysixandtwo.eu/upload/624_phosh.log