Last two test runs: http://openbenchmarking.org/result/16031...603116GA70
Please be aware that not all of these results can be compared directly. "Pine64+ take 2" and "Pine64+ take4" are from Michael Larabel and are irrelevant here since he used old thermal/throttling settings and we know nothing about the throttling behavior in his setup.
You can use the last two results as a relative comparison of how well a heatsink vs. a fan limits throttling, and compare them with the "Pine64+ ARMv8 -O3" results (same code optimisation level, but the "Pine64+ ARMv8 -O3" run used 1344 MHz as scaling_max_freq together with a small heatsink and a fan, showing that you can mostly prevent throttling even at higher clockspeeds).
The results labeled "Pine64+ in enclosure", "Pine64+ enclosure+heatsink" and "Pine64+ encl/heatsink/cpufreq" can also be compared directly with each other (same/no code optimisations). They show clearly that mounting a heatsink helps performance when you jail the Pine64+ in a small enclosure, and that the little software tweak allowing a few more cpufreq steps also improved performance a lot, simply by establishing better throttling behavior that lets the A64 stay at higher clockspeeds more often.
In the meantime, when trying out the last test in the results above, my Pine64+ always powered off for no apparent reason. I thought that jumping between different cpufreq operating points might overburden the PSU/DC-IN and therefore chose to power the board through the Euler connector.
But to no avail. Looking at the graphs I noticed that the board always died after heavy switching between frequencies/voltages, so I thought adding a heatsink to the AXP803 PMIC might help (I know from the A20's companion AXP209 that it can get quite hot and implements overtemperature protection the hard way -- maybe it's the same with the AXP803). At least after adding the heatsink I could run the last test without problems (and the very same board had already survived the tests running at 1344 MHz, but with less throttling).
Pine64+ just with a fan (directly on top):
And now with only a heatsink (the heatsinks on DRAM and PMIC aren't relevant for performance; the latter might matter for stability -- to be confirmed):
What can we learn from that?
1) Thermal/throttling settings largely determine sustained performance (true for every modern SoC -- and ignored by many/most benchmarks, especially the more popular ones). Keep in mind that single-threaded workloads most often won't show throttling effects.
2) Adding a few more dvfs (dynamic voltage and frequency scaling) operating points allows the throttle driver to adjust clockspeeds in finer steps, which improves performance a lot.
3) To push the envelope you have to improve heat dissipation and take thermal conditions into account (benchmarking in the morning, when ambient temperature is a few degrees lower, might result in 10% better scores -- keep that in mind).
4) When it comes to choosing between fan and heatsink, the choice is obvious: the heatsink wins. A fan only helps when combined with a heatsink. And when the fan just blows somewhere in the vicinity it's only annoying and doesn't help at all (if only enclosure makers would notice!)
5) If you plan to run heavy stuff on your board be prepared to switch from the Micro USB connector to the Euler connector for DC-IN. You can feed 5V through Euler pins 2 and 4 and can connect GND to Euler pins 6, 9 and 14.
6) In case you experience sudden power-offs (green LED also goes off immediately), think about adding a heatsink to the PMIC as well (unconfirmed at the moment and more of a guess than a recommendation).
7) Keep in mind that benchmark scores differing by less than 20% should be treated as identical for normal use cases (if you want to do number crunching it's a different story, but then you chose the wrong device anyway).
8) We should also keep in mind that benchmarking irrelevant stuff is just that: irrelevant. If you want to use your device to watch videos, then it matters more whether HW acceleration for the video codecs you're interested in is available than how slowly/fast the CPU can calculate prime numbers (even worse with unoptimised code, as happens all the time).
9) Take benchmark results that do not account for throttling with a grain of salt since they are misleading.
10) Take benchmark results that do not use optimised code with a grain of salt since they are even more misleading (you got that ARMv8 thingie because you wanted to benefit from faster software, right? A benchmark that disables code optimisations, PTS' Smallpt being a prominent example, is rather useless since it shows irrelevant performance scores).
11) Take every benchmark result for the Pine64+ published in the next few weeks with a grain of salt since the settings aren't ready yet.
What does "settings" mean? We might improve the throttling strategy for single-threaded workloads in the near future. If we do, real-world performance and single-threaded benchmark scores will automagically improve by 10%-30%.
Regarding this dvfs stuff: The higher the so-called VDD_CPUX voltage (the core voltage the CPU cores are fed with) is set, the hotter the SoC gets, and when CPU/GPU cores are busy, throttling will kick in earlier. So you always define dvfs operating points with voltages reduced to a reasonable minimum. This process has not even started yet. We currently rely on Allwinner's defaults and no one has looked into how low you can go (again: if we're able to reduce all voltages a bit while keeping some safety headroom, the SoC will stay cooler, throttling will happen later and performance increases automagically -- but this whole process is time consuming and needs a lot of boards/users to join in).
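To get a feeling for why lowering VDD_CPUX helps, here is a rough sketch using the usual first-order approximation that dynamic CPU power scales with voltage squared times frequency. This is not based on measurements on the A64: the function name is made up for illustration, the voltages in the usage note are hypothetical, and leakage power is ignored completely.

```python
def relative_dynamic_power(v_new, v_old, f_new=1.0, f_old=1.0):
    """Ratio of dynamic power after/before a change, using P ~ C * V^2 * f.

    First-order approximation only: the capacitance C cancels out and
    leakage (static) power is not modelled at all.
    """
    if v_old <= 0 or f_old <= 0:
        raise ValueError("old voltage and frequency must be positive")
    return (v_new / v_old) ** 2 * (f_new / f_old)
```

For example, relative_dynamic_power(1.25, 1.30) is about 0.92, so a hypothetical drop from 1.30 V to 1.25 V at the same clockspeed would cut dynamic power by roughly 8%. Whether a given board stays stable at the lower voltage is exactly what would have to be tested on many boards.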
And regarding settings I have to add that the most important one for stupid 'fire and forget' benchmarks, Phoronix style, is one we already changed: the default Allwinner behaviour. They ship their BSP with rather strange settings where CPU cores get killed instead of letting throttling do the job. We recently changed that to sane values. With the settings from last week you might therefore end up with a Pine64+ running on only 1 or 2 cores, and it should be obvious how this influences benchmarks (that's the main reason Orange Pi PC/Plus are listed as that slow on Phoronix: the same thing happened back then and the tester didn't notice).
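A quick way to check whether cores got killed is to look at which CPUs the kernel reports as online before and after a benchmark run. The sketch below assumes the standard Linux sysfs file /sys/devices/system/cpu/online is present on your image; the helper names are mine and purely illustrative.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3' or '0,2-3' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or stray commas
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def online_cpus(path="/sys/devices/system/cpu/online"):
    """Return the CPU numbers currently online according to sysfs."""
    return parse_cpu_list(Path(path).read_text())
```

If len(online_cpus()) is lower than 4 after a run on the A64, the scores from that run say more about the thermal settings than about the CPU.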
BTW: It should also be obvious that you can't do proper benchmarking without monitoring the system your tests run on (see the graphs above). To make that easier I wrote a simple script that installs RPi-Monitor with A64 adjustments on Debian-based distros like longsleep's Ubuntu OS image. It also contains a function to apply our latest adjustments to the cpufreq/dvfs settings, so if you answer yes to the question "Do you want to adjust throttling settings (requires overwriting u-boot/dtb)?" your Pine64+ will perform better, at least in multithreaded benchmarks (since these settings improve throttling a lot):
http://kaiser-edv.de/tmp/4U4tkD/install-...for-a64.sh
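If you don't want to install anything, a bare-bones alternative is to sample clockspeed and temperature from sysfs yourself while the benchmark runs. The following is only a sketch and not what the script above does. The two paths are the common Linux defaults and are assumptions for your particular kernel, the temperature unit differs between kernels (so it is kept as a raw value), and all function names are made up for illustration.

```python
import time
from pathlib import Path

# Assumed default locations; adjust them if your kernel exposes other nodes.
FREQ_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"  # kHz
TEMP_PATH = "/sys/class/thermal/thermal_zone0/temp"  # unit depends on kernel


def read_int(path):
    """Read a single integer from a sysfs-style text file."""
    return int(Path(path).read_text().strip())


def sample(freq_path=FREQ_PATH, temp_path=TEMP_PATH):
    """Take one timestamped reading of clockspeed and raw temperature."""
    return {
        "time": time.time(),
        "freq_khz": read_int(freq_path),
        "temp_raw": read_int(temp_path),
    }


def throttled_fraction(samples, max_khz):
    """Fraction of samples in which the clockspeed was below max_khz."""
    if not samples:
        return 0.0
    below = sum(1 for s in samples if s["freq_khz"] < max_khz)
    return below / len(samples)
```

Call sample() every few seconds during the run, collect the results in a list and pass it to throttled_fraction() together with your maximum cpufreq. Note that a lower clockspeed only indicates throttling while the CPU is fully loaded; when idle, the cpufreq governor will clock down on its own.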