Crashes when transferring large folders
#1
Hey! Since I got my RockPro64 to use as a NAS, I have constantly experienced crashes when transferring large folders (mostly game files). One example that always causes a crash is my modded ARK install, which spans 160 GB in 110,000 files and 8,000 folders.

When I transfer that folder with rsync, either locally or over the network, one of the following happens about 2-10 GB into the transfer:
  • rsync crashes and the transfer stops (sometimes with a message about hung processes in kmsg)
  • the kernel throws an "Oops", which seems to be different every time, although I don't have the skills to debug it
  • the kernel panics
  • the entire system freezes, with no panic or other indication visible on my HDMI monitor
  • the entire system resets

While trying to narrow down the problem, I found that:
  • it's not a filesystem problem: it occurs on btrfs, ext4, ZFS on Linux and ZFS on FreeBSD
  • it's not a hard drive problem: I tested two HDDs and one SSD
  • it's not a SATA controller problem: it occurs both with my PCIe card and with a USB adapter
  • it's not a network card problem: running iperf for long periods of time is fine
  • it's probably not a RAM problem: I ran `memtest` with the highest value I could give it, and I also ran a MemTest86 test
  • it's not a Linux distribution problem (I used Debian 11 and 12) or a Linux kernel problem: the issues occur in a similar way on FreeBSD
  • it's not a problem with my laptop's hard drive (or with the files themselves, whatever that might be): I can transfer the folder to other x86 machines

The issue mostly appears when transferring computer games. The ARK folder never finished transferring, but some smaller (20-40 GB) games did, although inconsistently, when transferred at lower speeds, e.g. over the network or to/from an HDD. I also tried to tar the folder, both on my laptop before sending it over the network and locally after plugging the drive into the RockPro64. The system still crashed.
Interestingly, I also tried sending my laptop's /dev/urandom to the NAS over netcat, which never crashed it, not even after passing the 100 GB mark at high speeds.

Now, half a year after I started trying to use my RockPro64 as a NAS, I'm truly clueless about the nature of this problem and really don't know what to try next to diagnose it.
Below you can find the journald entries of the kernel oopses and hung processes. I can't upload pictures of the FreeBSD kernel panics because of the image size limit.
I hope someone has a clue about what may be going on with my system.

Journal log
#2
Do you have zram or anything similar in use? The problem could be caused by the limited amount of RAM on the RockPro64.

You could also try adding the -z and/or -P options to your rsync command, to compress data during the transfer and to keep partially transferred files so that interrupted transfers can be resumed. Would you mind sharing the flags you're using with rsync?
#3
I've been using an 8 GB swap partition on my SATA hard drive. Suspecting that swap on a possibly unstable drive or PCIe connection might be the problem, I also tried without it, with the same results. With rsync, I have tried the "-a" and "-a -p" options. Increasing rsync's verbosity does not yield anything useful. I'm afraid rsync is not the main problem here, since the crashes also occur when using sftp, when sending a tar file over netcat (and writing it to disk), and when creating a tar file on the RockPro64 between two different disks. I also just tested transferring the data onto a 64 GB SD card, which froze the system in a similar way. What really puzzles me is that sending /dev/urandom over netcat does not affect the system at all.
#4
It might help to run `top -1`. If it is indeed a running-out-of-RAM or swap problem, you'll see it (if you can still see the "top" output when it crashes; running it over telnet might help).
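Since the top output can be lost when the board freezes, one alternative is to log memory figures to a file and fsync after every sample, so the last readings before a crash can still be read after rebooting. Below is a minimal Python sketch, assuming Linux's /proc/meminfo; the function names, log path and interval are hypothetical, and if the crash involves the storage path itself, the last few samples may still be lost.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: value}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def log_memory(log_path, interval_s=5.0, samples=None,
               meminfo_path="/proc/meminfo"):
    """Append one MemAvailable/SwapFree line per sample to log_path.

    Each line is flushed and fsync'ed before the next sample is taken.
    samples=None means keep logging until interrupted.
    """
    taken = 0
    with open(log_path, "a") as log:
        while samples is None or taken < samples:
            with open(meminfo_path) as f:
                info = parse_meminfo(f.read())
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} MemAvailable={info.get('MemAvailable')}kB "
                      f"SwapFree={info.get('SwapFree')}kB\n")
            log.flush()
            os.fsync(log.fileno())  # force the line onto the disk now
            taken += 1
            if samples is None or taken < samples:
                time.sleep(interval_s)
```

For example, running `log_memory("/root/memlog.txt", interval_s=5)` in the background during an rsync leaves a timestamped trail; passing `samples=N` stops it after N readings.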

Otherwise, it sounds silly, but what kind of power supply are you using?
#5
I'm very certain this is not a power problem. I'm running a 60 W power supply, which also provides the SATA power. After narrowing down the problem, I continued testing only with a SATA SSD, which draws very little power. I also just tested two other power supplies, including an ATX one, with the same results.
The transfer process also uses barely any memory, and running htop over ssh didn't show anything interesting during the crash.
Another interesting crash happened while testing today: after throwing an "Oops", the kernel logged "rcu_sched detected stalls on CPUs/tasks: ... Task dump for CPU 1..." to kmsg.
See this picture for the full log.
Here, the system did not freeze up entirely, but it was not usable either.
#6
Have you tried making it run slower?
(cpufreq-info / cpufreq-set)
#7
(09-13-2023, 11:19 AM) wdt wrote:
> Have you tried to set it to run slower?
> cpufreq-info/set

Limiting the CPU frequency to 408 MHz via cpufreq indeed fixed the issue! Thank you very much for the tip! Transferring the folder both with tar over netcat and with rsync (using rsyncd) succeeded at reasonable speeds. It would be very helpful to know what is causing this behavior, so that I may someday be able to use my RockPro64 to its full potential.
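For reference, the same cap can also be applied without cpufrequtils by writing to the kernel's cpufreq sysfs files. Below is a minimal Python sketch, assuming a Linux kernel that exposes /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq (values in kHz, so 408 MHz is 408000) and root privileges; the function names are hypothetical. The setting is not persistent and is lost on reboot.

```python
import glob
import os

# Standard location of the Linux cpufreq policy directories.
CPUFREQ_ROOT = "/sys/devices/system/cpu/cpufreq"


def read_max_freq(root=CPUFREQ_ROOT):
    """Return {policy name: scaling_max_freq in kHz} for every policy."""
    result = {}
    for policy in sorted(glob.glob(os.path.join(root, "policy*"))):
        with open(os.path.join(policy, "scaling_max_freq")) as f:
            result[os.path.basename(policy)] = int(f.read())
    return result


def cap_max_freq(limit_khz, root=CPUFREQ_ROOT):
    """Write limit_khz to scaling_max_freq of every policy (needs root).

    The kernel may adjust the request (e.g. clamp it to the hardware's
    limits), so the values actually in effect are read back and returned.
    """
    for policy in sorted(glob.glob(os.path.join(root, "policy*"))):
        with open(os.path.join(policy, "scaling_max_freq"), "w") as f:
            f.write(f"{limit_khz}\n")
    return read_max_freq(root)
```

Run as root, e.g. `cap_max_freq(408000)`. On the RockPro64's RK3399 this typically covers two policies, one for the Cortex-A53 cluster and one for the Cortex-A72 cluster.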
#8
>It would be very helpful to know what is causing this behavior

Some 'marginal' chips have slipped into the production line.
Mine has a marginal power chip: if it doesn't get VERY clean power, the red LED just blinks, and I'm ONLY talking about 1 V of ripple.
----
The red LED is connected to the 5 V line; when it blinks, you have 5 V, 0 V, 5 V, 0 V... and so on.
And obviously NOTHING happens... it's dead, Jim.
When I connected it to a bench power supply, it ran fine, down to 7 V, and for a day at 13 V.
Nearly all power bricks had too much ripple; only about the 4th or 5th brick I tried was OK.