Saturday, August 15, 2015

freedreno - mesa 11.0 progress update, OpenGLES3 and more

So the big news for the upcoming mesa 11.0 release is gl4.x support for radeon and nouveau.  Which has been in the works for a long time, and a pretty tremendous milestone (and the reason that the next mesa release is 11.0 rather than 10.7).  But on the freedreno side of things, we haven't been sitting still either.  In fact, with the transform-feedback support I landed a couple weeks ago (for a3xx+a4xx), plus MRT+z32s8 support for a4xx (Ilia landed the a3xx parts of those a while back), we now support OpenGLES 3.0[1] on both adreno 3xx and 4xx!!

In addition, with the TBO support that landed a few days ago, plus handful of other fixes in the last few days, we have the new antarctica gl3.1 render engine for supertuxkart working!

Note that you need to use MESA_GL_VERSION_OVERRIDE=3.1 and MESA_GLSL_VERSION_OVERRIDE=140, since while we support everything that stk needs, we don't yet support everything needed to advertise gl3.1.  (But hey, according to qualcomm, adreno 3xx doesn't even support higher than gles3.0.. I guess we'll have to show them ;-))

The nice thing to see about this working, is that it is utilizing pretty much all of the recent freedreno features (transform feedback, MRT, UBO's, TBO's, etc).

Of course, the new render engine is considerably more heavyweight compared to older versions of stk.  But I think there is some low hanging fruit on the stk engine side of things to reclaim some of those lost fps.

update: oh, and the first time around, I completely forgot to mention that qualcomm has recently published *some* gpu docs, for a3xx, for the dragonboard 410c. Not quite as extensive as what broadcom has published for vc4, but it gives us all the a3xx registers, which is quite a lot more than any other SoC vendor has done to date :-)

[1] minus MSAA.. There is a bigger task, which is on the TODO list, to teach mesa st about some extensions to support MSAA resolve on tile->mem.. such as EXT_multisampled_render_to_texture, plus perhaps driconf option to enable it for apps that are not aware, which would make MSAA much more useful on a tiling gpu.  Until then, mesa doesn't check MSAA for gles3, and if it did we could advertise PIPE_CAP_FAKE_SW_MSAA.  Plus, who really cares about MSAA on a 5" 4k screen?

Saturday, July 4, 2015

happy (gpu) independence day

So, I realized it has been a while since posting about freedreno progress, so in honor of US independence day I figured it was as good an excuse as any for an update about independence from gpu blob driver for snapdragon/adreno..

Back in end of March 2015 at ELC, I gave a freedreno update presentation at ELC, listing the following major tasks left for gles3 support:
  • Uniform Buffer Objects (UBO)
  • Transform Feedback (TF)
  • Multi-Render-Target (MRT)
  • advanced flow control in shader compiler
 and additionally for gl3:
  • Multisample anti-aliasing (MSAA)
  • NV_conditional_render
  • 32b depth (z32 and z32_s8) (which I forgot to mention in the presentation)
EDIT: Ilia pointed out that 32b depth is needed for gles3 too, and gl3 additionally needs clipdist/etc (which we'll have to emulate, but hopefully can do in a generic nir pass) and rgtc (which will need sw decompression hopefully in mesa core so other drivers for gles class hw can reuse).  Original list was based on what mesa's compute_version() code was checking quite some time back.
Since then, we've gained support for UBO's (a3xx by Ilia Mirkin, and a4xx), MRT (for a3xx and core, again thanks to Ilia.. still needs to be wired up for a4xx), 32b depth (a3xx and core, again thanks to Ilia), and I've finished up shader compiler for loops/flow-control for ir3 (a3xx/a4xx).  The shader compiler work was a somewhat larger task than I expected (and I did expect it to be a lot of work), but it also involved moving over to NIR, in addition to re-writing the scheduler and register allocation passes, as well as a lot of re-org to ir3 in order to support multiple basic blocks.  The move to NIR was not strictly required, but it brings a lot of benefits in the form of shared support for conversion to SSA, scalarizing, CSE, DCE, constant folding, and algebraic optimizations.  And I figured it was less work in the long run to move to NIR first and drop the TGSI frontend, before doing all the refactoring needed to support loops and non-lowerable flow-control.  Incidentally, the compiler work should make the shader-compiler part of TF easier (since we need to generate a conditional write to TF buffer iff not overwriting past the end of the TF buffer).

In the mean time, freedreno and drm/msm have also gained support for the a306 gpu found in the new dragonboard 410c.  This board is a nice new low cost ($75) snapdragon community board based on the 64bit snapdragon 410.  And thanks to a lot of work by linaro and qualcomm, the upstream kernel situation for this board is looking pretty good.  It is shipping initially with a 4.0 based kernel (with patches on top for stuff that hadn't yet been merged for 4.0, including a lot of stuff backported from 4.1 and 4.2), including gpu/display/audio/video-codec/etc.  I believe that the 4.1 kernel was the first version where a vanilla kernel could boot on db410c with basic stuff (like serial console) working.  The kernel support for the gpu and display, other than the adv7533 hdmi bridge chip) landed in 4.2.  There is still more work to get *everything* (including audio, vidc, etc) merged upstream, but work continues in that direction, making this quite an exciting board.
Also, we have a GSoC student, Varad, working on freedreno support for android.  It is still in early stages, with some debugging still to do, but he has made a lot of progress and things are starting to work.
And since no blog post is complete without some nice screenshots...  the other day someone pointed me at a post in the dolphin forums about how dolphin was running on a420 (same device as in the ifc6540).  We all had a good laugh about the rendering issues with the blob driver.  But, since dolphin was the first gl3 game that worked with freedreno, I was curious how freedreno would do.. so I fired up the ifc6540 and replayed some dolphin fifo logs that would let me render approximately the same scenes:

Yoshi looks to be rendering pretty well.. digimon has a bit of corruption, but no where near as bad as the blob driver.  I suspect the issue with digimon is an instruction scheduling issue in the shader compiler (well, no rest for the gpu driver writers), but nice to see that it is already in pretty good shape.

Now we just need steam store or some unigine demos for arm linux :-P

Sunday, February 22, 2015

a4xx shaping up

So, I finally figured out the bug that was causing some incorrect rendering in xonotic (and, it turns out, to be the same bug plaguing a lot of other games/webgl/etc).  The fix is pushed to upstream mesa master (and I guess I should probably push it to the 10.5 stable branch too).  Now that xonotic renders correctly, I think I can finally call freedreno a4xx support usable:

Also, for fun, a little comparison between the ifc6540 board (snapdragon 805, aka apq8084), and my laptop (i5-4310U).  Both have 1920x1080 resolution, both running gnome-shell and firefox (with identical settings).  Laptop is fedora f21 while ifc6540 is rawhide), but it is quite close to an apples-to-apples comparision:

Obviously not a rigorous benchmark, so please don't read too much into the results.  The intel is still faster overall (as it should be at it's size/price/power budget), but amazing that the gap is becoming so small between something that can be squeezed into a cell phone and dedicated laptop class chips.