Wednesday, May 4, 2016

Freedreno (not so) periodic update

Since I seem to be not so good at finding time for blog updates recently, this update probably covers a greater timespan than it should, and some of this is already old news ;-)

Already quite some time ago, but in case you didn't already notice: with the mesa 11.1 release, freedreno now supports up to (desktop) gl3.1 on both a3xx and a4xx (in addition to gles3).  Which is high enough to show up on the front page at glxinfo.  (Which, btw, is a useful tool to see exactly which gl/gles extensions are supported by which version of mesa on various different hw.)

A couple months back, I spent a bit of time starting to look at performance.  On master now (so will be in 11.3), we have timestamp and time-elapsed query support for a4xx, and I may expose a few more performance counters (mostly for the benefit of gallium HUD).  I still need to add support for a3xx, but already this is useful to help profile.  In addition, I've cobbled together a simple fdperf cmdline tool:

I also got around to (finally) implementing hw binning support for a4xx, which for *some* games can have a pretty big perf boost:
  • glmark2 'refract' bench (an extreme example): 31fps -> 124fps
  • xonotic (med): 44.4fps -> 50.3fps
  • supertuxkart (new render engine): 15fps -> 19fps
More recently I've started to run the dEQP gles3 tests against freedreno.  Initially the results where not too good, but since then I've fixed a couple thousand test cases.. fortunately it was just a few bugs and a couple missing workaround for hw bug/limitations (depending on how you look at it) which counted for the bulk of the fails.  Now we are at 98.9% pass (or 99.5% if you don't count the 'skips' against the pass ratio).  These fixes have also helped piglit, where we are now up to 98.3% pass.  These figures are a4xx, but most of the fixes apply to a3xx as well.

I've also made some improvements in ir3 (shader compiler for a3xx and later) so the code it generates is starting to be pretty decent.  The immediate->const lowering that I just pushed helps reduce register pressure in a lot of cases.  We still need support for spilling, but at least now shadertoy (which is some sort of cruel joke against shader compiler writers) isn't a complete horror show:

In other cool news, in case you had not already seen: Rob Herring and John Stultz from linaro have been doing some cool work, with Rob getting android running on an upstream kernel plus mesa running on db410c and qemu (with freedreno and virtgl), and John taking all that, and getting it all running on a nexus7 tablet.  (And more recently, getting wifi working as well.)  I had the opportunity to see this in person when I was at Linaro Connect in March.  It might not seem impressive if you are unfamiliar with the extent to which android device kernels diverge from upstream, but to see an upstream kernel running on an actual device with only ~50patches is quite a feat:

The UI was actually reasonably fast, despite not yet using overlays to bypass GPU for composition.  But as ongoing work in drm/kms for explicit fencing, and mesa EGL_ANDROID_native_fence_sync land, we should be able to get hw composition working.