Framebuffer apps work like:
Draw this pixel with this color at this location and send it directly to the display. For a window, or an entire display, you have to do that for every single pixel in your application. The GPU is not involved at all. For simple apps this can be fast, no doubt. But when you want to start using graphical effects (like blurring or shadows), you have to do all of that in your application as well (again, using the CPU, not the GPU).
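To make the "every single pixel" point concrete, here is a minimal sketch of the per-pixel work a framebuffer app does. The resolution, pixel format, and the `put_pixel` helper are all assumptions for illustration; a real app would mmap /dev/fb0 and query the actual mode from the kernel instead of using a bytearray.

```python
# Illustrative sketch only: a bytearray stands in for the mmap'd
# framebuffer memory, with an assumed 1920x1080 32-bit BGRA mode.

WIDTH, HEIGHT = 1920, 1080   # assumed display mode
BYTES_PER_PIXEL = 4          # assumed BGRA layout
STRIDE = WIDTH * BYTES_PER_PIXEL

framebuffer = bytearray(STRIDE * HEIGHT)

def put_pixel(x, y, b, g, r, a=255):
    """The CPU computes the byte offset and writes one pixel's color."""
    offset = y * STRIDE + x * BYTES_PER_PIXEL
    framebuffer[offset:offset + 4] = bytes((b, g, r, a))

# Filling even a small 100x100 window means 10,000 of these calls,
# all on the CPU -- the GPU never sees any of it.
for y in range(100):
    for x in range(100):
        put_pixel(x, y, 0, 0, 255)  # solid red square, top-left corner
```

Every effect (shadows, blur, transparency) is just more of this same per-pixel CPU work layered on top.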
Xorg/Wayland Apps:
Draw a window of dimension X at position Y, with color Z and any shadow/transparency/blur/graphical effect (they don't send individual pixels). Xorg/Wayland takes that and translates those "window" commands into graphics commands (OpenGL, GLESv2/v3) and sends them to Mesa. Mesa translates the OpenGL/GLES data into instructions for the GPU, and the GPU actually draws the individual pixels directly to the display, graphical effects included. If Mesa does not support your GPU (as is the case with mainline Mesa and the Imagination GPU on the Star64), then Mesa does all of that on the CPU instead (called software rendering).
A better example would be moving a window:
Framebuffer apps: recalculate the pixel data for where the window was and where it's going, and send it to the display. Some apps are smart and only do that for the pixels that changed; others recalculate the entire display. They have to do that every time the window moves 1 pixel in any direction, at the refresh rate of your monitor (so 60 times a second on a 60Hz monitor).
Xorg/Wayland: "move this window to this location" - Xorg translates that to OpenGL/GLESv2/v3 and sends it to Mesa, Mesa sends the move command to the GPU, and the GPU recalculates all the changed pixels and sends them directly to the display (not using the CPU at all).
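Some rough arithmetic shows why dragging a window hurts on the CPU-only path. The window size and the "redraw window plus uncovered strip" strategy below are hypothetical, just to put a number on it:

```python
# Back-of-envelope cost of dragging a window with a framebuffer app
# (hypothetical numbers; a naive app redrawing the whole screen is worse).

WIN_W, WIN_H = 800, 600
REFRESH_HZ = 60

# A "smart" app redraws the window at its new position plus the strip
# it uncovered; moving 1 px sideways uncovers a 1-px-wide column.
redrawn_per_frame = WIN_W * WIN_H + 1 * WIN_H

pixels_per_second = redrawn_per_frame * REFRESH_HZ
print(pixels_per_second)  # roughly 28.8 million pixel writes per second, all CPU
```

The GPU path sends one "move" command per frame instead; the per-pixel work happens in hardware.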
How about transparency:
Framebuffer app: to make a window 50% transparent, the app has to look at the background image, do the maths to blend the foreground image with it, and send the resulting pixel data to the display.
Xorg/Wayland: "make this window 50% transparent" -> Xorg sends the window to Mesa, Mesa translates it for the GPU, and the GPU says "gotcha", does the maths without touching the CPU, and sends the result to the display.
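The "maths" in both cases is the same per-channel alpha blend, out = alpha * foreground + (1 - alpha) * background; the only difference is whether the CPU or the GPU runs it for every pixel. A small sketch (the `blend` helper is mine, for illustration):

```python
# Per-pixel alpha blending: what "50% transparent" actually computes.
# A framebuffer app runs this on the CPU for every pixel of the window;
# on the GPU path the hardware applies the same formula.

def blend(fg, bg, alpha):
    """Blend two (r, g, b) pixels; alpha=0.5 means 50% transparent."""
    return tuple(round(alpha * f + (1 - alpha) * b) for f, b in zip(fg, bg))

print(blend((255, 0, 0), (0, 0, 255), 0.5))  # red over blue -> purple
```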
So why do we need Mesa patches? Because every GPU has a different "instruction set" - think of Arm versus x86 versus RISC-V. Mesa translates the OpenGL/GLES etc. commands into the instruction set that your particular GPU understands.
And just in case you're not sure - a framebuffer app does not use the GPU at all. It's writing pixel data directly to the display (HDMI or whatever port).
To finally address your statement that the framebuffer is fast but X windows is slow - yep. Without a supported GPU, X windows is doing a lot of work on the CPU, translating window commands to pixel data instead of handing off the actual drawing to the GPU. Even a very minimal X windows desktop uses a lot of features a normal GPU would provide, but has to do them all in software.
If you want practical proof of the GPU working, do this:
Load my desktop image (or kernel on Armbian etc.) and run in a terminal window
"watch cat /sys/kernel/debug/pvr/status", then move some windows around or play a video:
"Every 2.0s: cat /sys/kernel/debug/pvr/status star64: Fri Jun 9 04:49:23 2023
Driver Status: OK
Device ID: 0:128
Firmware Status: OK
Server Errors: 0
HWR Event Count: 0
CRR Event Count: 0
SLR Event Count: 0
WGP Error Count: 0
TRP Error Count: 0
FWF Event Count: 0
APM Event Count: 243
GPU Utilisation: 93%
"
Now do the same with your framebuffer app, and note that the GPU utilisation does not go above 0%.