rock64, compile problems "illegal instruction", "memory fault" -> ddr_333Mhz?
#1
Hello,
  ever since I got my rock64 (4GB version) I have had problems compiling larger projects on the device itself.
I tried many things: another AC adapter, different micro-SD cards, compiling on USB, different host images, compiling on only two cores and then on one core, gcc-5, 6, 7, swap on. Nothing helped.
The compiler always ran into errors like "illegal instruction" or "memory fault" that were transient: they disappeared on the next try.
After playing around with the Rockchip boot flow on the rockpro64, I got the idea to check the driver for the external memory interface.
All host images I tried on the rock64 use the same driver file, "rk3328_ddr_786MHz_v1.13.bin", so I gave the rock64 another try with rk3328_ddr_333MHz_v1.13.bin at offset 0x8800 on the boot device.
After a reboot I compiled the ayufan kernel on the device itself, with 4 cores, on the internal SD card. It worked, with none of the earlier errors.
Yesterday I took an external USB HD with the palemoon source, compiled from USB, and wrote the object files to the internal SD card, with 4 cores and swap on. It took the whole night, but the build completed successfully. No error like before.

Can someone tell me the negative implications of using the 333MHz version of this driver?

TIA,
hunderteins
#2
Hello.

I think the DDR initialization done before U-Boot can be modified later in Linux with DMC+DFI. DMC+DFI are enabled in (some) ayufan builds.
If they are enabled, you can set the "DDR" speed dynamically (governor, min, max...). I have stability issues at 1066000000 (4K60 HDR decoding). Maybe "rk3328_ddr_333MHz_v1.13.bin" allows only lower frequencies (please post the output of "cat /sys/class/devfreq/dmc/available_frequencies").

Code:
# grep '' /sys/class/devfreq/dmc/*
/sys/class/devfreq/dmc/available_frequencies:786000000 800000000 850000000 933000000 1066000000
/sys/class/devfreq/dmc/available_governors:dmc_ondemand userspace powersave performance simple_ondemand
/sys/class/devfreq/dmc/cur_freq:786000000
/sys/class/devfreq/dmc/governor:dmc_ondemand
/sys/class/devfreq/dmc/max_freq:1066000000
/sys/class/devfreq/dmc/min_freq:786000000
/sys/class/devfreq/dmc/polling_interval:50
/sys/class/devfreq/dmc/system_status:0x401
/sys/class/devfreq/dmc/target_freq:786000000
/sys/class/devfreq/dmc/trans_stat:   From  :   To
/sys/class/devfreq/dmc/trans_stat:         :7860000008000000008500000009330000001066000000   time(ms)
/sys/class/devfreq/dmc/trans_stat:*786000000:       0       0       0       0       0     97457
/sys/class/devfreq/dmc/trans_stat: 800000000:       0       0       0       0       0         0
/sys/class/devfreq/dmc/trans_stat: 850000000:       0       0       0       0       0         0
/sys/class/devfreq/dmc/trans_stat: 933000000:       0       0       0       0       0         0
/sys/class/devfreq/dmc/trans_stat: 1066000000:       0       0       0       0       0         0
/sys/class/devfreq/dmc/trans_stat:Total transition : 0

# echo 933000000 > /sys/class/devfreq/dmc/max_freq
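
As a rough sketch of the "set governor, min, max" idea, the helper below caps the DDR clock at the lowest frequency the DMC node advertises. The function name cap_dmc_to_lowest is made up for illustration, the sysfs path is the one from the listing above, and writing to it needs root. On kernels without DMC+DFI the node does not exist, so the function just reports that and returns.

```shell
# Sketch only: cap the DDR clock at the lowest advertised frequency.
# cap_dmc_to_lowest is a hypothetical helper, not part of any distribution.
cap_dmc_to_lowest() {   # usage: cap_dmc_to_lowest [DMC_DIR]
    dmc_dir=${1:-/sys/class/devfreq/dmc}
    if [ ! -r "$dmc_dir/available_frequencies" ]; then
        echo "no DMC devfreq node at $dmc_dir" >&2
        return 1
    fi
    # One frequency per line, numeric sort, keep the smallest.
    lowest=$(tr ' ' '\n' < "$dmc_dir/available_frequencies" \
        | grep -E '^[0-9]+$' | sort -n | head -n 1)
    [ -n "$lowest" ] || return 1
    # Write the cap and print the value that was written.
    echo "$lowest" > "$dmc_dir/max_freq" && echo "$lowest"
}
```

Called without arguments (as root) it acts on /sys/class/devfreq/dmc; a different directory can be passed as the first argument.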
I left this community in Aug 2019 due to PINE64 refusal to produce/deliver ROCK64-1G version 3 after more than one year of changing statuses to "planning", "evaluating", "releasing", "availability", "estimated availability" and finally "no schedule" Angry. ROCK64 is dead platform without any advantage. Buy Raspberry PI 4 !
#3
Thank you for your answer, but my board shows basically the same output. The problem is that when DMC is enabled, I get kernel oopses even when just logging in via dropbear.

[quote pid='45731' dateline='1555417705']
$ grep '' /sys/class/devfreq/dmc/*
/sys/class/devfreq/dmc/available_frequencies:786000000 800000000 850000000 933000000 1066000000
/sys/class/devfreq/dmc/available_governors:dmc_ondemand powersave simple_ondemand
/sys/class/devfreq/dmc/cur_freq:786000000
/sys/class/devfreq/dmc/governor:dmc_ondemand
/sys/class/devfreq/dmc/max_freq:1066000000
/sys/class/devfreq/dmc/min_freq:786000000
/sys/class/devfreq/dmc/polling_interval:50
/sys/class/devfreq/dmc/system_status:0x1
/sys/class/devfreq/dmc/target_freq:786000000
/sys/class/devfreq/dmc/trans_stat:   From  :   To
/sys/class/devfreq/dmc/trans_stat:         :7860000008000000008500000009330000001066000000   time(ms)
/sys/class/devfreq/dmc/trans_stat:*786000000:       0       0       0       0       0     45925
/sys/class/devfreq/dmc/trans_stat: 800000000:       0       0       0       0       0         0
/sys/class/devfreq/dmc/trans_stat: 850000000:       0       0       0       0       0         0
/sys/class/devfreq/dmc/trans_stat: 933000000:       0       0       0       0       0         0
/sys/class/devfreq/dmc/trans_stat: 1066000000:       0       0       0       0       0         0
/sys/class/devfreq/dmc/trans_stat:Total transition : 0

[/quote]

I think these frequencies are set in arch/arm64/boot/dts/rockchip/rk3328.dtsi as dmc_opp_table. And 786MHz is just too much.
I'll try my luck with 400MHz and 600MHz. It seems to be a trade-off between stability and memory bandwidth.
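
To see which DDR operating points a given kernel tree would offer, the values can be pulled out of the device tree source. This is only a sketch: list_dmc_opp_hz is a hypothetical helper, and it assumes the table is written in the usual `opp-hz = /bits/ 64 <...>;` form inside a node labelled dmc_opp_table.

```shell
# Sketch only: print the opp-hz values found inside the dmc_opp_table node.
# Assumes one opp-hz property per line, in the "<value>" form.
list_dmc_opp_hz() {   # usage: list_dmc_opp_hz DTS_FILE
    awk '
        # Start at the node definition (label plus opening brace).
        /dmc_opp_table/ && /[{]/ { inside = 1; depth = 0 }
        inside {
            if ($0 ~ /opp-hz/) {
                line = $0
                sub(/.*</, "", line)   # drop everything up to "<"
                sub(/>.*/, "", line)   # drop ">" and what follows
                print line
            }
            # Track brace depth so we stop at the end of the node.
            depth += gsub(/[{]/, "{")
            depth -= gsub(/[}]/, "}")
            if (depth == 0) inside = 0
        }
    ' "$1"
}
```

For example, `list_dmc_opp_hz arch/arm64/boot/dts/rockchip/rk3328.dtsi` run from the top of a kernel source tree.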

Funny thing is, on my rockpro64 the target_freq is 400MHz.

TIA,
hunderteins
#4
Hello,

  400 and 600 MHz are not working with dfi/dmc in rk3328-rock64.dts.
Patching them into rk3328_ddr_333MHz_v1.13.bin gives me some obscure 928 MHz frequency that does not reboot reliably.

786 MHz (sic!) boots, but gives me segfaults or illegal instructions when compiling.

333 MHz is shown in grep '' /sys/class/devfreq/dmc/* as 332000000 and is stable when compiling.

But I found a negative implication: playing HEVC in 720p forces frame dropping in mpv.

So it is either stable compiling or fluid video playback.
#5
A year ago, 400/600 were available in DMC (you can try older kernels).

Code:
# cat /sys/class/devfreq/dmc/available_frequencies
400000000 600000000 786000000 800000000 850000000 933000000 1066000000

Never mind. I suppose that you have some fault in your PCB (like "PCB delamination"), in the memory/rk3328 chip, and/or BGA soldering problems. The memory bus is very sensitive to impedance quality. Do you have another ROCK64 to compare results with? What do you compile (something comparable to the Linux kernel)? I only have 1GB versions to test.
#6
Thank you for this thread. Changing the boot image to 333MHz fixed the "internal compiler error" that was sporadically happening when building software in parallel. This is on Rock64 ver2.

For Arch Linux ARM (boots from SD card, not eMMC), these are the concrete steps:

Get the original image:



Code:
wget http://os.archlinuxarm.org/os/rockchip/boot/rock64/idbloader.img
md5sum idbloader.img
a903e86cc8fa81ae0f0e79915c7dc758  idbloader.img

This image contains rk3328_ddr_786MHz v1.06 and rk3328_miniloader v2.43, plus one other image at the front which I couldn't identify. Next, replace both the DDR init image (to change the frequency to 333MHz and to update to the latest version) and the miniloader (just to update to the latest version).

Download these files from the ayufan GitHub repository: https://github.com/ayufan-rock64/rkbin/tree/master/rk33
a3e3ac380f794d50b06bbd76258b982d  rk3328_ddr_333MHz_v1.13.bin
79bfbe6ba1cde99372685a4be273994b  rk3328_miniloader_v2.46.bin
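
Before patching anything into a boot image it is worth checking the downloads against the sums listed above. A minimal sketch, assuming the coreutils md5sum tool is installed; the helper name verify_md5 is hypothetical.

```shell
# Sketch only: compare a file's MD5 sum with an expected value.
verify_md5() {   # usage: verify_md5 EXPECTED_SUM FILE
    actual=$(md5sum "$2" | cut -d ' ' -f 1)
    if [ "$actual" = "$1" ]; then
        echo "OK   $2"
    else
        echo "FAIL $2 (got $actual)" >&2
        return 1
    fi
}
```

For example: `verify_md5 a3e3ac380f794d50b06bbd76258b982d rk3328_ddr_333MHz_v1.13.bin`.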


Replace the subimages in the image:


Code:
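# seek= counts blocks of size bs (512 bytes by default); bs=1 makes 0x800 and 0x6800 byte offsets.
dd if=rk3328_ddr_333MHz_v1.13.bin of=idbloader.img bs=1 seek=$((0x800)) conv=notrunc
dd if=rk3328_miniloader_v2.46.bin of=idbloader.img bs=1 seek=$((0x6800)) conv=notrunc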
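
A quick way to confirm that a blob really ended up at the intended byte offset is to read the same number of bytes back out of the image and compare them. A sketch only: blob_at_offset is a hypothetical helper, and it treats the offset as a byte count.

```shell
# Sketch only: succeed if BLOB is found in IMAGE at BYTE_OFFSET.
blob_at_offset() {   # usage: blob_at_offset IMAGE BLOB BYTE_OFFSET
    size=$(( $(wc -c < "$2") ))
    # Read exactly "size" bytes starting at the offset and compare with the blob.
    dd if="$1" bs=1 skip="$(( $3 ))" count="$size" 2>/dev/null | cmp -s - "$2"
}
```

For example: `blob_at_offset idbloader.img rk3328_ddr_333MHz_v1.13.bin 0x800 && echo "DDR blob in place"`.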


Write the modified image to SD card (replace X with your device name):

Code:
 sudo dd if=idbloader.img of=/dev/sdX seek=64 conv=notrunc
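
The numbers in this procedure line up with the 0x8800 offset from the first post, assuming dd's default 512-byte block size for seek=64 and a DDR blob that starts 0x800 bytes into idbloader.img. A small sketch of that arithmetic (variable and function names are illustrative only):

```shell
SECTOR_SIZE=512                # dd's default block size
IDB_START_SECTOR=64            # seek=64 in the command above
DDR_OFFSET_IN_IDB=$((0x800))   # assumed byte offset of the DDR blob inside idbloader.img

# Byte offset of the DDR blob on the boot device, printed in hex.
ddr_offset_on_device() {
    printf '0x%x\n' $(( IDB_START_SECTOR * SECTOR_SIZE + DDR_OFFSET_IN_IDB ))
}

ddr_offset_on_device   # prints 0x8800
```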

Also attaching this final image to this post.


Attached Files
.gz   idbloader-ddr-333-mhz-v1.13-miniloader-v2.46.img.gz (Size: 40.31 KB / Downloads: 621)
#7
Glad I found this thread, and thanks. I've been suffering from this segfault issue since I got the board (r64 v2, 1GB) a few weeks ago. I build a lot from source, and it doesn't compile correctly even when using one core. I'm booting from SPI in RAM from ayufan's repo (Aug 2019). I'm not sure how to edit the SPI or his image, but I have to figure it out somehow. As it stands the board isn't really usable for much, since it's really crippled on stock Arch. Maybe there's a distro that already has this in its image? Not that the procedure looks too hard, but it would help others who come along. Maybe there's also some easy edit, instead of replacing idbloader.img?

UPDATE: I used the command grep '' /sys/class/devfreq/dmc/* to find the frequency of ayufan's latest rock64 build. It was set to 786000000 with no others available. I set up the rock64 SD card with build-essential etc. and compiled/tested the ninja package. It works! It never did on Arch or Manjaro. I guess he knows the tweaks. This had me stumped and ready to quit on this board. Tough lesson. We'll see if it lasts.
#8
Still researching this issue. A Manjaro dev says it's a TF-A (Trusted Firmware-A) issue, so it's not a change they would implement. Getting the rock64 v2 to compile is really a difficult process.
#9
I've experienced this as well on my ROCK64 (v2, 4 GB): Often either gcc or Linux itself will crash with an "Illegal instruction" or undefined-instruction error during lengthy builds. Here's a typical kernel dump, to help others searching for a solution:

Code:
[ 2437.611193] kernel BUG at arch/arm64/kernel/traps.c:405!
[ 2437.611663] Internal error: Oops - BUG: 0 [#1] SMP
[ 2437.612084] Modules linked in: ath9k_htc ath9k_common ath9k_hw ath mac80211 cfg80211 libarc4 crct10dif_ce
[ 2437.612933] CPU: 0 PID: 1044 Comm: kworker/0:0H Not tainted 5.4.39-gnu #1
[ 2437.613527] Hardware name: Pine64 Rock64 (DT)
[ 2437.613921] Workqueue:  0x0 (kblockd)
[ 2437.614250] pstate: 00000085 (nzcv daIf -PAN -UAO)
[ 2437.614679] pc : do_undefinstr+0x2a0/0x348
[ 2437.615041] lr : do_undefinstr+0x13c/0x348
[ 2437.615401] sp : ffff80001155bba0
[ 2437.615694] x29: ffff80001155bba0 x28: ffff0000f6879880
[ 2437.616161] x27: ffff0000d602d900 x26: ffff0000f6879880
[ 2437.616627] x25: ffff800010783050 x24: 0000000000000000
[ 2437.617092] x23: 0000000000000085 x22: ffff8000100dcdfc
[ 2437.617559] x21: ffff80001155bd40 x20: ffff80001155bc00
[ 2437.618025] x19: ffff800010af8000 x18: 0000000000000000
[ 2437.618491] x17: 0000000000000000 x16: ffffffffffcfffff
[ 2437.618956] x15: ffffffffffffffff x14: ffff0000fa1fe380
[ 2437.619423] x13: ffff0000ce340000 x12: 0000000000000002
[ 2437.619887] x11: 0000000000000001 x10: ffff0000fa1fe340
[ 2437.620352] x9 : 0000000000000000 x8 : ffff0000fc9a7ab0
[ 2437.620817] x7 : ffff0000f7e00800 x6 : ffff80001155bbf8
[ 2437.621282] x5 : ffff800010b68100 x4 : 0000000000000000
[ 2437.621748] x3 : 00000000d5300000 x2 : ffff800010b01608
[ 2437.622214] x1 : ffff800010b68100 x0 : 0000000000000085
[ 2437.622680] Call trace:
[ 2437.622900]  do_undefinstr+0x2a0/0x348
[ 2437.623235]  el1_undef+0x10/0x84
[ 2437.623528]  deactivate_task+0x5c/0xa8
[ 2437.623865]  __schedule+0x2e8/0x4d0
[ 2437.624176]  schedule+0x30/0xa8
[ 2437.624458]  worker_thread+0xe0/0x4e0
[ 2437.624786]  kthread+0x124/0x128
[ 2437.625076]  ret_from_fork+0x10/0x18
[ 2437.625398] Code: f94013b5 17ffffef a9025bb5 f9001bb7 (d4210000)
[ 2437.625935] ---[ end trace 91edd19288ede6cb ]---

I have a hunch this is due simply to the memory chip getting too hot, and not because it can't run reliably at higher speeds. In addition to the heat it generates itself, the memory chip's position right next to the SoC means it is likely absorbing heat radiated by the CPU cores. The official aluminum case may even aggravate this situation, as I suspect the extra-wide heat pipe intended to wick heat away from both chips may actually conduct it from one to the other at times.

The situation is even worse on boards like mine with the SpecTek memory chip, as its datasheet shows it rated for reliable use up to only 70 degrees Celsius, unlike the other parts that are rated for use up to 85. Meanwhile, the RK3328 can reach temperatures of 90 degrees or more under continuous heavy load.

A bit of experimentation supports my theory: While monitoring the SoC temperature (using "watch cat /sys/class/thermal/thermal_zone0/temp") during lengthy builds, I have yet to see a crash when the temperature stays below 70 degrees. Once it rises above that threshold, though, a crash often happens within minutes.
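
For longer builds it can be handier to log the temperature than to watch it. The sketch below reads the same sysfs file (which reports millidegrees Celsius) and flags readings at or above 70 degrees. The 70000 threshold comes from the SpecTek rating mentioned above, and the function names are made up for illustration.

```shell
THRESHOLD_MC=70000   # 70 degrees C, in millidegrees

# Classify a reading (in millidegrees) against the threshold.
classify_temp() {    # usage: classify_temp MILLIDEGREES
    if [ "$1" -ge "$THRESHOLD_MC" ]; then echo hot; else echo ok; fi
}

# Print one timestamped reading, e.g. "12:34:56 71.234 C hot".
log_temp() {         # usage: log_temp [TEMP_FILE]
    temp_file=${1:-/sys/class/thermal/thermal_zone0/temp}
    [ -r "$temp_file" ] || { echo "cannot read $temp_file" >&2; return 1; }
    t=$(cat "$temp_file")
    printf '%s %d.%03d C %s\n' "$(date +%T)" $((t / 1000)) $((t % 1000)) "$(classify_temp "$t")"
}

# Example: one reading every 5 seconds during a build.
# while log_temp; do sleep 5; done >> temps.log
```

Comparing the timestamps of "hot" readings with the time of a crash gives a slightly firmer basis for the theory than watching the value by eye.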

So what can be done? One solution would be to switch to active cooling by installing a fan. (In some cases, installing a separate heatsink on the memory chip may be enough.) Another is to limit the clock rate of the memory chip and/or the CPUs, as suggested above.

For systems running Linux, a third approach would be to adjust the trip points of the thermal-management driver so it manages the CPU clock rate more aggressively, with the goal of keeping the SoC temperature below 70 degrees for as long as possible (without impacting the machine's performance under normal use). Here's the existing configuration, from the RK3328 device tree:

Code:
trips {
        threshold: trip-point0 {
                temperature = <70000>;
                hysteresis = <2000>;
                type = "passive";
        };
        target: trip-point1 {
                temperature = <85000>;
                hysteresis = <2000>;
                type = "passive";
        };
        soc_crit: soc-crit {
                temperature = <95000>;
                hysteresis = <2000>;
                type = "critical";
        };
};

It appears that by default the driver takes no action at all until the temperature reaches 70 degrees, by which point the system may already be heading for a crash. My guess is that reducing the first two temperature values to (say) 60000 and 70000 will help greatly in keeping ROCK64s with SpecTek memory stable under load. (To my knowledge, the thermal driver does not allow these trip points to be adjusted at runtime.) I'll be experimenting with this to see what sort of difference it makes.
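
Whatever values end up in the device tree, the trip points the running kernel actually uses can be read back from sysfs. A sketch, assuming the trip_point_N_temp and trip_point_N_type files of the Linux thermal framework and that the SoC sensor is thermal_zone0; list_trips is a hypothetical helper.

```shell
# Sketch only: print "name temperature type" for each trip point of a zone.
list_trips() {   # usage: list_trips [ZONE_DIR]
    zone_dir=${1:-/sys/class/thermal/thermal_zone0}
    for f in "$zone_dir"/trip_point_*_temp; do
        [ -r "$f" ] || continue          # also skips the unmatched glob
        trip=${f%_temp}                  # e.g. .../trip_point_0
        printf '%s %s %s\n' "$(basename "$trip")" "$(cat "$f")" \
            "$(cat "${trip}_type" 2>/dev/null)"
    done
}
```

After booting with a modified device tree, `list_trips` should show the new temperatures (in millidegrees) if the change took effect.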
#10
The issue described in the thread below this one, by ejolson, is more likely the culprit. Granted, cooling is of course important for stable operation.

