URL: https://lists.freedesktop.org/archives/mesa-dev/2016-May/116152.html

[Mesa-dev] [PATCH 00/14] radeonsi: Offchip tessellation

Bas Nieuwenhuizen bas at basnieuwenhuizen.nl
Tue May 10 10:52:51 UTC 2016
This patchset implements offchip tessellation, after which we can finally
process more than one patch per wave without decreasing TessMark scores.

For TessMark this improves performance by ~20% for the x32 case and ~80% for the
x64 case; x8 and x16 perform roughly the same as before. Unigine Heaven gets
43 fps compared to 28 before (roughly +50%); amdgpu-pro gets 44 fps in Heaven.
For Shadow of Mordor, performance goes from 28 fps to 40 fps (roughly +40%).
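
For reference, the rounded percentages above can be reproduced from the quoted
frame rates. This is only a small sketch of the arithmetic; the helper name
relative_gain is made up for illustration and is not part of the series.

```python
def relative_gain(before_fps, after_fps):
    """Return the improvement as a percentage of the old frame rate."""
    return (after_fps - before_fps) / before_fps * 100.0


# Frame rates quoted in this cover letter (before -> after the series).
heaven = relative_gain(28, 43)
mordor = relative_gain(28, 40)

print(f"Unigine Heaven:   +{heaven:.1f}%")
print(f"Shadow of Mordor: +{mordor:.1f}%")
```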

Remaining ideas for improvement are:

 - Don't store TCS outputs to LDS and don't unnecessarily allocate LDS. This
 has pretty much no measurable effect in the games I tried.

 - Only store TCS outputs to memory when the tess factors exceed a threshold. I
 haven't been able to get the LDS case working with dynamic HS enabled, but
 the decompiled amdgpu-pro shaders give a very strong hint that this is
 possible. However, amdgpu-pro sets the threshold to -1, so it pretty much
 always stores to memory too, as far as I can see. Maybe it does not work on
 VI, or there is some interaction with the VI-only distribution modes and
 these were considered more profitable.

 - Hardware swizzled buffers. The swizzling by hand I use results in extra VALU
 instructions and it would be nice if we did not need to have them. However,
 my attempts have not resulted in a performance improvement yet.
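
To illustrate the threshold idea from the second item, here is a rough sketch
of the decision it describes. Everything in it is an assumption made for
illustration: the function name, the use of the largest tess factor, and the
strict comparison are hypothetical and are not taken from the patches or from
the amdgpu-pro shaders.

```python
def stores_outputs_to_memory(tess_factors, threshold):
    """Hypothetical predicate: True when the TCS outputs of a patch would be
    written to the offchip buffer instead of staying in LDS.

    Assumption: the largest tess factor of the patch is compared against the
    threshold; the real comparison may well differ.
    """
    return max(tess_factors) > threshold


# With a threshold of -1, any patch with a non-negative tess factor takes the
# memory path, which matches the "pretty much always" observation above.
print(stores_outputs_to_memory([1.0, 1.0, 1.0, 1.0], -1))

# With a hypothetical positive threshold, low-factor patches could stay in LDS.
print(stores_outputs_to_memory([1.0, 1.0, 1.0, 1.0], 4.0))
print(stores_outputs_to_memory([8.0, 8.0, 8.0, 8.0], 4.0))
```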

I have run the piglit gpu suite and found no regressions on a Tonga card.

Bas Nieuwenhuizen (14):
 radeonsi: Add buffer for offchip storage between TCS and TES.
 radeonsi: Add offchip tessellation parameters.
 radeonsi: Define build_tbuffer_store_dwords earlier to support new
 users.
 radeonsi: Add buffer load functions.
 radeonsi: Use correct parameter index for LS_OUT_LAYOUT.
 radeonsi: Add user SGPR for the layout of the offchip buffer.
 radeonsi: Add offchip buffer address calculation.
 radeonsi: Store inputs to memory when not using a TCS.
 radeonsi: Use buffer loads and stores for passing data from TCS to
 TES.
 radeonsi: Remove LDS layout user SGPR's from TES.
 radeonsi: Enable dynamic HS.
 radeonsi: Use barrier instructions for TCS barriers.
 radeonsi: Process multiple patches per threadgroup.
 radeonsi: Allow TES distribution between shader engines.

 src/gallium/drivers/radeonsi/si_pipe.c | 1 +
 src/gallium/drivers/radeonsi/si_pipe.h | 1 +
 src/gallium/drivers/radeonsi/si_shader.c | 567 ++++++++++++++++++------
 src/gallium/drivers/radeonsi/si_shader.h | 32 +-
 src/gallium/drivers/radeonsi/si_state.c | 5 +
 src/gallium/drivers/radeonsi/si_state.h | 1 +
 src/gallium/drivers/radeonsi/si_state_draw.c | 59 ++-
 src/gallium/drivers/radeonsi/si_state_shaders.c | 67 ++-
 8 files changed, 560 insertions(+), 173 deletions(-)

-- 
2.8.2


