07-16-2020, 02:55 PM
First of all, a disclaimer: I haven't had a chance to test this on the latest version of Mobian yet.
This will be a quick post about an issue I've discovered but don't have time to investigate (I'll be quite busy for the next few days). Hopefully my research still helps someone.
I managed to capture this in dmesg when it happened (I got lucky and had an SSH session going):
Code:
[ 7251.715549] [drm:lima_sched_timedout_job] *ERROR* lima job timeout
[ 7252.962247] lima 1c40000.gpu: fail to save task state from phoc pid 1663: error task list is full
[ 7252.971463] lima 1c40000.gpu: gp task error int_state=0 status=0
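For anyone who wants to catch the same messages, here is a minimal sketch of a filter for the kernel log. The function name lima_errors is made up for illustration, and the patterns are based only on the three lines above, so other lima messages may slip through.

```shell
#!/bin/sh
# Hypothetical helper: reduce kernel log lines on stdin to the lima
# messages seen in the capture above (job timeouts and per-device errors).
lima_errors() {
    grep -E 'lima_sched_timedout_job|lima [0-9a-f]+\.gpu:'
}

# Example use from an SSH session (assumes util-linux dmesg with --follow):
#   sudo dmesg --follow | lima_errors
```

Leaving this running over SSH before triggering the freeze should make it easier to see whether the timeout always comes first.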
Searching various bug trackers I found this open but stagnant bug report:
https://gitlab.freedesktop.org/lima/linux/-/issues/33
So it looks like the GPU either stalls or falls out of sync with the lima driver for whatever reason. I suspect it has something to do with power delivery, because the issue doesn't occur if I switch the CPU governor to powersave:
Code:
echo powersave | sudo tee /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
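Before changing the governor it may be worth noting what it was set to, so the change can be undone later. A small sketch, assuming the same policy0 path as the command above; POLICY_DIR and the function names are placeholders of mine, and scaling_available_governors is the standard cpufreq sysfs file listing the valid values.

```shell
#!/bin/sh
# Path taken from the command above; override POLICY_DIR for other policies.
POLICY_DIR="${POLICY_DIR:-/sys/devices/system/cpu/cpufreq/policy0}"

# Print the governor currently in use.
show_governor() {
    cat "$POLICY_DIR/scaling_governor"
}

# Print the governors the kernel will accept for this policy.
list_governors() {
    cat "$POLICY_DIR/scaling_available_governors"
}

# Example:
#   old=$(show_governor)    # remember the current value
#   echo powersave | sudo tee "$POLICY_DIR/scaling_governor"
#   echo "$old" | sudo tee "$POLICY_DIR/scaling_governor"    # restore later
```

The setting does not survive a reboot, so it has to be reapplied each time if you rely on it as a workaround.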
The most reliable way I've found to trigger the bug on my phone is to start downloading something big over LTE with wget. After somewhere between a few hundred MB and a GB you will notice the scrolling test going back and forth and you stop being able to interact with the phone through the touch screen.
At first I thought it was a hard crash and the power button only kept working through some weird quirk of interrupts offloaded from the main CPU, but as it turns out the rest of the system is fully functional. You can even recover most of the system (GUI tasks will be killed) by issuing this command:
Code:
sudo systemctl restart phosh.service
This should bring back the lock screen as if the system was freshly booted, but any non-GUI tasks are still running as if nothing happened.
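The two pieces could in principle be combined into a crude watchdog that restarts phosh when a lima timeout shows up in the kernel log. This is an untested sketch, not a fix: recover_on_timeout and RECOVER_CMD are names I made up, it assumes every "lima job timeout" line means the GUI is stuck (which may not be true), and it deliberately stops after the first restart to avoid a restart loop.

```shell
#!/bin/sh
# Command from the post; overridable so the loop can be dry-run first.
RECOVER_CMD="${RECOVER_CMD:-sudo systemctl restart phosh.service}"

# Read kernel log lines on stdin and run RECOVER_CMD once, on the first
# line that reports a lima job timeout, then return.
recover_on_timeout() {
    while IFS= read -r line; do
        case "$line" in
            *"lima job timeout"*)
                $RECOVER_CMD
                return 0
                ;;
        esac
    done
}

# Example (assumes util-linux dmesg with --follow):
#   sudo dmesg --follow | recover_on_timeout
```

Setting RECOVER_CMD to something harmless such as "echo recovered" is a quick way to check the matching without killing your GUI session.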
That's about as far as I've been able to work things out so far. I think the appropriate thing to do here is to test the hack from the lima issue tracker (raising the scheduling error threshold above 1) and see if it helps the driver recover successfully. I'd try it myself if I had a build environment set up. Since this is an upstream issue I didn't post this on the Mobian issue tracker, but in retrospect I probably should've done that as well.