i have concluded at this point that this bug has two parts: a powersaving part and a frequency change part. practically all gnu/linux distributions are affected, under both plasma and phosh.
the powersaving part seems more straightforward and rarer. it can be avoided with the udev rule below.
however, in my situation the frequency bug happens only on hardware version 1.2a and not on 1.2b, so i don't rule out a hardware problem in the 1.2a version or in my specific phone.
if you are tech savvy enough and have a regular pinephone, could you test the frequency part of this bug and report your hardware version (1.2, 1.2a, 1.2b etc.)? apply only the powersaving part of the rule, then run glxgears in a user interface with auto suspend, screen lock and screen saver disabled. then run the following script in the background.
Code:
# run as root. flips the gpu max frequency between 432 MHz and 312 MHz in a tight loop
while true
do
    echo 432000000 > /sys/class/devfreq/1c40000.gpu/min_freq
    echo 432000000 > /sys/class/devfreq/1c40000.gpu/max_freq
    echo 312000000 > /sys/class/devfreq/1c40000.gpu/max_freq
done
generic copy paste:
Code:
# udev rule. this prevents the flipping frames bug
# example file location /lib/udev/rules.d/98-preventflippingbug.rules
# powersaving part
KERNEL=="1c40000.gpu", SUBSYSTEM=="platform", DRIVER=="lima", ATTR{power/autosuspend_delay_ms}="-1"
KERNEL=="1c40000.gpu", SUBSYSTEM=="platform", DRIVER=="lima", ATTR{power/control}="on"
# frequency part, may not be needed
KERNEL=="1c40000.gpu", SUBSYSTEM=="devfreq", ATTR{min_freq}="432000000"
KERNEL=="1c40000.gpu", SUBSYSTEM=="devfreq", ATTR{max_freq}="432000000"
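to check that the rule was actually applied after a reboot, something like the sketch below could be used. it only reads sysfs. the default paths are my assumption, derived from the device name in the rule above (the platform device under /sys/bus/platform/devices and the devfreq device under /sys/class/devfreq); the function name is made up for illustration.

```shell
# Sketch: print the GPU runtime-PM and devfreq settings.
# Usage: show_gpu_settings [PM_DIR] [DEVFREQ_DIR]
# Both default paths are assumptions based on the udev rule, adjust if needed.
show_gpu_settings() {
    pm_dir="${1:-/sys/bus/platform/devices/1c40000.gpu/power}"
    freq_dir="${2:-/sys/class/devfreq/1c40000.gpu}"
    for f in "$pm_dir/control" "$pm_dir/autosuspend_delay_ms" \
             "$freq_dir/min_freq" "$freq_dir/max_freq" "$freq_dir/cur_freq"; do
        if [ -r "$f" ]; then
            # Print "path = value"; a failed read just prints an empty value.
            printf '%s = %s\n' "$f" "$(cat "$f" 2>/dev/null)"
        else
            printf '%s = (not readable)\n' "$f"
        fi
    done
}
```

run `show_gpu_settings` with no arguments on the phone. with both parts of the rule active, control should read "on" and min_freq and max_freq should both read 432000000.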
i have now tested two 1.2a phones and both have the frequency part of the bug. i am starting to think something fishy is going on in the hardware when the gpu frequency changes.
fedora f38 with the "megi-kernel-pboot" 6.0.7 kernel does not crash on frequency changes, even on 1.2a. fedora does crash on the powersaving part, though (at least it did in the past).
i have further tested two 1.2a pinephones for the frequency bug, using archlinux plasma, glxgears in the u.i. and the gpu frequency changing script.
the older pinephone crashes only if i flip from 432MHz to 312MHz.
the newer pinephone crashes with both 432MHz <> 312MHz and 432MHz <> 120MHz, but not with 312MHz <> 120MHz.
(the powersaving bug is different: both hw versions, 1.2a and 1.2b, crash. it is also more difficult to replicate.)
Code:
[root@danctnix ~]# pacman -Si mesa
Repository : extra
Name : mesa
Version : 22.3.3-1
Description : An open-source implementation of the OpenGL specification
Architecture : aarch64
URL : https://www.mesa3d.org/
Licenses : custom
Groups : None
Provides : mesa-libgl opengl-driver
Depends On : libdrm wayland libxxf86vm libxdamage libxshmfence libelf libomxil-bellagio libunwind llvm-libs lm_sensors libglvnd zstd vulkan-icd-loader libsensors.so=5-64 libexpat.so=1-64 libvulkan.so
Optional Deps : opengl-man-pages: for the OpenGL API man pages
mesa-vdpau: for accelerated video playback
libva-mesa-driver: for accelerated video playback
Conflicts With : mesa-libgl
Replaces : mesa-libgl
Download Size : 15.24 MiB
Installed Size : 59.51 MiB
Packager : Arch Linux ARM Build System <builder+seattle@archlinuxarm.org>
Build Date : Fri 13 Jan 2023 12:05:11 PM UTC
Validated By : MD5 Sum SHA-256 Sum Signature
[root@danctnix ~]# pacman -Si linux-megi
Repository : danctnix
Name : linux-megi
Version : 6.0.10-1
Description : The Linux Kernel and modules - Megous Kernel
Architecture : aarch64
URL : https://github.com/megous/linux
Licenses : GPL2
Groups : None
Provides : kernel26 linux=6.0.10
Depends On : coreutils kmod mkinitcpio>=0.7
Optional Deps : crda: to set the correct wireless channels of your country
Conflicts With : linux
Replaces : linux-pine64
Download Size : 165.98 MiB
Installed Size : 186.53 MiB
Packager : DanctNIX Build System <builder@main-key.danctnix.org>
Build Date : Fri 02 Dec 2022 03:50:38 PM UTC
Validated By : MD5 Sum SHA-256 Sum Signature
[root@danctnix ~]# uname -a
Linux danctnix 6.0.10-1-danctnix #1 SMP PREEMPT_DYNAMIC Fri Dec 2 15:51:28 UTC 2022 aarch64 GNU/Linux
[root@danctnix ~]#
some log
Code:
[ 222.596592] lima 1c40000.gpu: mmu page fault at 0x4d200c0 from bus id 0 of type read on ppmmu0
[ 222.605485] lima 1c40000.gpu: pp task error 0 int_state=0 status=5
[ 222.611743] lima 1c40000.gpu: pp task error 1 int_state=0 status=0
[ 222.618017] lima 1c40000.gpu: mmu resume
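if you want to watch for this while testing, a small filter like the sketch below can pick out the relevant kernel messages. the pattern only covers the two error messages seen in the log above; the driver may print other messages that i am not matching here. the function name is made up.

```shell
# Sketch: read kernel log text on stdin and keep only the lima error lines
# of the kinds shown in the log above (mmu page fault, pp task error).
lima_errors() {
    grep -E 'lima [^ ]+: (mmu page fault|pp task error)'
}
```

usage would be `dmesg | lima_errors` (as root, if dmesg is restricted on your system).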
the powersaving bug is more difficult to replicate. as far as i know, all pinephones are affected.
leave the default gpu powersaving settings in place. run something visible on screen that uses the gpu somewhat, but in a way that lets the gpu go into powersaving for a while. in the background, maybe over ssh, run something that consumes network and cpu. the bug is more likely to reproduce over mobile data or ethernet and less likely over wifi. it may take hours.
Code:
# udev rule. this prevents frequency part only
# example file location /lib/udev/rules.d/98-test.rules
# reboot required
# powersaving part
#KERNEL=="1c40000.gpu", SUBSYSTEM=="platform", DRIVER=="lima", ATTR{power/autosuspend_delay_ms}="-1"
#KERNEL=="1c40000.gpu", SUBSYSTEM=="platform", DRIVER=="lima", ATTR{power/control}="on"
# frequency part, may not be needed
KERNEL=="1c40000.gpu", SUBSYSTEM=="devfreq", ATTR{min_freq}="432000000"
KERNEL=="1c40000.gpu", SUBSYSTEM=="devfreq", ATTR{max_freq}="432000000"
how to reproduce frequency bug:
only some pinephones are affected. the fedora (dirty) kernel is/was not affected by the frequency part.
the gpu powersaving settings need to be applied first, so that the powersaving part is ruled out. something needs to be running on screen, maybe glxgears. in the background, e.g. over ssh, run the frequency change script. the crash probably happens within minutes.
Code:
# udev rule. this prevents powersaving part only
# example file location /lib/udev/rules.d/98-test.rules
# reboot required
# powersaving part
KERNEL=="1c40000.gpu", SUBSYSTEM=="platform", DRIVER=="lima", ATTR{power/autosuspend_delay_ms}="-1"
KERNEL=="1c40000.gpu", SUBSYSTEM=="platform", DRIVER=="lima", ATTR{power/control}="on"
# frequency part, may not be needed
#KERNEL=="1c40000.gpu", SUBSYSTEM=="devfreq", ATTR{min_freq}="432000000"
#KERNEL=="1c40000.gpu", SUBSYSTEM=="devfreq", ATTR{max_freq}="432000000"
Code:
# stupid bash script to forcibly change the frequency all the time
# root or sudo required
# options are 432000000 312000000 120000000.
# you may need 432<>312 or 432<>120 or 312<>120
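as a sketch only (this is not the original script, just a parameterized take on the loop from the first post), the frequency pair could be passed as arguments. the function names and the optional directory argument are made up for illustration; the default devfreq path is the one used earlier in this thread.

```shell
# Sketch: one flip between two GPU frequencies, same write order as the
# loop in the first post. Needs root on the real sysfs path.
# Usage: flip_once HIGH_HZ LOW_HZ [DEVFREQ_DIR]
flip_once() {
    high="$1"
    low="$2"
    dir="${3:-/sys/class/devfreq/1c40000.gpu}"
    echo "$high" > "$dir/min_freq"
    echo "$high" > "$dir/max_freq"
    echo "$low"  > "$dir/max_freq"
}

# Repeat the flip until interrupted (Ctrl-C).
flip_forever() {
    while true
    do
        flip_once "$@"
    done
}
```

for example `flip_forever 432000000 312000000` for 432<>312, or `flip_forever 432000000 120000000` for 432<>120.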
i started this thread for wider discussion about the mali gpu and its driver. does the mali chip have some deficiencies? does anyone know what's going on in the lima driver? why are some devices affected and others not? is frequency change reliable?
02-15-2023, 09:54 AM (This post was last modified: 02-15-2023, 10:02 AM by e1337.)
I have this happen on 1.2b (Manjaro Edition 3GB), although I don't know which trigger it is (power saving or frequency change). It seems to happen when there's some sort of high CPU and moderate to high network use situation, e.g. it happens sometimes when I launch multiple apps at once with some already running that use the internet. Somehow, the combined activity of all the launches just makes it trip over something. Edit: actually I transferred a mainboard once but I forgot if it was for this one, so it could also be 1.2a (but not older). If someone knows how to check via software, let me know.
(02-15-2023, 09:54 AM)e1337 Wrote: [...] if someone knows how to check via software let me know
i cannot help much with checking the hw version. one could use "lshw", but it does not show sub versions, in this case 1.2, 1.2a or 1.2b. (lshw probably needs to be installed first.) "journalctl -r" also works if you look in the right place near boot up, but this is also inaccurate.
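another thing one could try, as a sketch: on device tree based systems the kernel exposes the board model string in /proc/device-tree/model. i am assuming it is present on your image, and it may well have the same limitation, i.e. it reports whatever device tree was loaded and may not tell 1.2a from 1.2b. the function name is made up.

```shell
# Sketch: print the board model string from the device tree.
# The file holds a NUL-terminated string, so strip the NUL before printing.
# Usage: board_model [MODEL_FILE]  (default path is the usual procfs location)
board_model() {
    f="${1:-/proc/device-tree/model}"
    tr -d '\000' < "$f"
    echo
}
```

run `board_model` with no arguments on the phone and compare the result with what lshw prints.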
i do not recommend opening the phone, but one marking may provide a clue to which version you have. it is above the pogo pins. if it is just "C", i think the mainboard is 1.2b. if there is a date, it probably indicates the production date. the picture is of a 1.2a. however, i cannot say for sure what the marking means.
as a pre-emptive warning: if you or others post pictures, pixelate imei codes and similar.
I haven't faced this problem anymore with Manjaro Phosh Beta 29.
Or more precisely (I think), since Manjaro updated to Mesa 22.3.5.
This used to be a nearly daily issue, and now I have not seen it once for many days, maybe over 20 days or so.
(03-02-2023, 09:31 AM)alaraajavamma Wrote: I haven't faced this problem anymore with Manjaro Phosh Beta 29. [...]
if you are referring to this bug https://gitlab.freedesktop.org/mesa/mesa/-/issues/8198 , i think it is a totally different issue. i encountered that bug as well, but since it was already reported and was fixed soon after, i didn't pay much attention to it.