Monday, December 10, 2012

CART cache implementation.

I've created a Google Code project to share my implementation of the CART caching strategy. It has better characteristics than LRU and is fully thread-safe - kind of a 'memcached' inside your C++ application. As an example, I'm using it with memory-mapped files to map only a limited number of 16MB chunks, avoiding enormous virtual memory consumption on very large files (10GB+). In this case the cache decides which pages to evict from memory, keeping the frequently used ones.
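The implementation itself is in the linked project; the sketch below only illustrates the usage idea with hypothetical names, and substitutes a plain single-threaded LRU policy for CART to keep it short (the real code is thread-safe and uses CART):

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Illustrative sketch only: a fixed-budget chunk cache over a memory-mapped
// file split into 16MB chunks. A plain LRU policy stands in for CART here;
// the point is that the cache owns the map/unmap decisions.
class ChunkCache {
public:
    static const size_t kChunkSize = 16u * 1024u * 1024u;

    explicit ChunkCache(size_t maxChunks) : maxChunks_(maxChunks) {}

    // Returns the (stand-in) base address of the chunk covering 'offset',
    // mapping it on a miss and evicting the least recently used chunk when
    // the budget is exceeded.
    const void* Acquire(uint64_t offset) {
        uint64_t chunk = offset / kChunkSize;
        std::unordered_map<uint64_t, Entry>::iterator it = map_.find(chunk);
        if (it != map_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second.pos); // mark as recent
            return it->second.base;
        }
        if (map_.size() >= maxChunks_) {            // budget full: evict LRU
            uint64_t victim = lru_.back();
            UnmapChunk(map_[victim].base);
            map_.erase(victim);
            lru_.pop_back();
        }
        lru_.push_front(chunk);
        Entry e;
        e.base = MapChunk(chunk * kChunkSize);
        e.pos = lru_.begin();
        map_[chunk] = e;
        return e.base;
    }

    size_t ResidentChunks() const { return map_.size(); }

private:
    struct Entry {
        const void* base;
        std::list<uint64_t>::iterator pos;
    };
    // Stand-ins for the platform map/unmap of a single file chunk.
    const void* MapChunk(uint64_t fileOffset) {
        return reinterpret_cast<const void*>(static_cast<uintptr_t>(fileOffset + 1));
    }
    void UnmapChunk(const void*) {}

    size_t maxChunks_;
    std::list<uint64_t> lru_;                 // front = most recently used
    std::unordered_map<uint64_t, Entry> map_;
};
```

With a budget of two chunks, touching a third chunk evicts the oldest one, so the resident count never exceeds the budget.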

You can find the implementation here:

Thursday, January 12, 2012

Derivative maps.

Another interesting read about derivative maps:

The author solved the original problem with close-ups by doubling the bump map storage requirements. Looks pretty cool to me.

Monday, June 14, 2010

Morphological Anti-Aliasing.

I've ported a recent implementation of Morphological Anti-Aliasing to HLSL. Anyone interested can compare it to the classic edge-blurring technique with this build.

Edit: added a build that does not require PhysX System Software to run.

To play with AA, open the Editor, select any game module (Materials shows it best), then go to Appearance->Edit Profile->Anti-Aliasing and change the anti-aliasing type there. For a stable FPS, keep the mouse over the rendering window so UI updates don't interfere with it.

To fly, hold the right mouse button and use WSAD. Use Shift and Ctrl to control the flying speed.

The ported version of the shader is in the archive as well: GEMData\Shaders\D3D9\AntiAlias.shader.


Thursday, June 3, 2010

Bump-mapping VS Normal-mapping.

Morten S. Mikkelsen recently posted a technical paper describing a method of perturbing the surface normal on the GPU with the bump-mapping technique widely used in the offline rendering world. The main advantage over normal mapping, the classic real-time technique, is that it needs one channel instead of three to store the perturbation info. You also don't have to carry all the tangent/binormal data around in vertex buffers and shaders. It looked like a silver bullet to me.

Of course I copy-pasted the code from Listing 1 to give it a try. After a few experiments and a suggestion from Morten in this thread, I got it working by scaling the bump input value (a coefficient of 0.007 worked for me). The only thing that prevented me from merging this to trunk was moiré - it polluted the image pretty heavily. And again (thanks, Morten!) the author suggested a solution. The fix was simply to use sampler fetches instead of gradient instructions:
// Screen-space derivatives of the texture coordinate.
float2 TexDx = ddx(IN.stcoord.xy);
float2 TexDy = ddy(IN.stcoord.xy);

// Height at the current pixel.
float Hll = height_tex.Sample(samLinear, IN.stcoord).x;

// Forward differences via extra samples instead of gradient instructions.
float dBs = (height_tex.Sample(samLinear, IN.stcoord + TexDx).x - Hll);
float dBt = (height_tex.Sample(samLinear, IN.stcoord + TexDy).x - Hll);
Guess what? The moiré is gone! And I was able to compare the two methods, which look pretty much the same visually. The bump-mapping technique looks a bit more crisp, though:

The end of the normal-mapping era?

P.S. New method works a bit faster for me.

P.P.S. Added another view:

P.P.P.S. Added two builds to compare.

Wednesday, May 26, 2010

Move/Rotate/Scale manipulator.

Some time ago I had an idea for an all-in-one manipulator that performs move/rotate/scale transformations without constantly jumping to mode buttons or using shortcuts.

The idea is pretty simple - it's just two cubes:
  • The exterior cube has flipped normals and controls rotation. Each side allows rotation around a single axis. 
  • The interior cube controls movement and scale. Movement is automatically restricted to an axis or a plane when you hover over different areas. In GEM I don't have non-uniform scaling (because of the compact quaternion/translate/scale representation with 8 floats), so the corners simply control uniform scale. But you can adjust this to your needs.
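As a sketch of how the hover classification on the interior cube might work (this is hypothetical code, not the GEM implementation): take the hit point in the cube's local space, with faces at ±1, and bin it by how many coordinates are near a face. The 0.7 threshold is an arbitrary tuning value.

```cpp
#include <cmath>

// Hypothetical sketch: classify where the cursor hit the interior cube.
// Corners give uniform scale, edges restrict movement to one axis, and the
// middle of a face restricts movement to that face's plane.
enum HitKind { HIT_FACE_PLANE, HIT_EDGE_AXIS, HIT_CORNER_SCALE };

HitKind ClassifyInnerCubeHit(float x, float y, float z, float t = 0.7f) {
    int extreme = 0;                     // coordinates close to +/-1
    if (std::fabs(x) > t) ++extreme;
    if (std::fabs(y) > t) ++extreme;
    if (std::fabs(z) > t) ++extreme;
    if (extreme >= 3) return HIT_CORNER_SCALE; // near a corner
    if (extreme == 2) return HIT_EDGE_AXIS;    // near an edge
    return HIT_FACE_PLANE;                     // middle of a face
}
```

Since any hit point on the cube surface already has one coordinate at ±1, a face-center hit has exactly one extreme coordinate, an edge hit two, and a corner hit three.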
It feels pretty natural and obvious even to someone who sees it for the first time. You can see it in action below (sorry for the mouse cursor position bug - it's a capturing software issue, the editor shows everything in place):

Here is a screenshot:

Feel free to ask at "max.dyachenko [at]" if you want to get the mesh/texture/manipulator code.

Sunday, April 25, 2010

Occlusion culling, part 1.

I want to share some results from my recent experiments with GPU-based occlusion culling. I decided not to use hardware occlusion queries, as they have lag and scalability problems. You should be able to use them in production with some clever optimizations and tricks, but I wanted a fast implementation.

So I chose rasterization and a hierarchical Z-buffer. It's easy to implement if you already have a Z-fill pass, and easy to test and debug because the concept is very simple. I was also a bit inspired by this presentation from the Ubisoft Montreal studio. I've seen a more detailed internal presentation from them, and it was pretty cool. I'll give some high-level ideas here so you don't have to read the whole thing.

Basically, they render good-occluder geometry to a small viewport and build a Z-buffer from it. The Z-buffer is then converted into a hierarchical representation (higher levels containing only the largest Z depth). All tested objects are then converted into a point-sprite vertex stream containing post-projected bounding box information. The pixel shader fetches the hierarchical Z and performs simple Z tests, outputting the visibility query results to its render target. The render target is then read back by the CPU, and the visibility results are used in scene processing. They keep pixel shader performance under control by testing the whole bounding box against only one Z sample, selecting the best-fit level.

While on the X360 this scheme can perform an enormous number of tests in a very small time frame, it has one major drawback: you need to sync with the GPU to read back the results, and the GPU will be idle while you are building job lists for it. This is not a big deal for the X360, though.

I prototyped my implementation on the PC, so read-back quickly became a problem. In a test scene with pretty heavy shaders and a lot of simple objects (~20K), the CPU<->GPU synchronization spent about 2.5 - 5 ms in the GetRenderTargetData() call - the time the GPU needed to complete its task. I expected that rotating lockable render targets would solve the issue, but for me the AMD driver still syncs with the GPU on LockRect(), even for surfaces rendered a few frames earlier. So the bottleneck seems unavoidable and will degrade performance for simple scenes.
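For reference, the rotation I tried boils down to a small ring of readback surfaces: frame f writes its query results into one slot and locks the slot written a couple of frames earlier, trading visibility latency for a (theoretically) stall-free LockRect(). A device-independent sketch of just the bookkeeping (the actual D3D9 surfaces and GetRenderTargetData()/LockRect() calls are omitted):

```cpp
#include <cstdint>

// Hypothetical helper: frame f writes into slot f % kSlots and reads back the
// slot written kSlots-1 frames earlier, i.e. two frames of visibility latency
// with three slots. In practice the driver may still sync on the lock.
struct ReadbackRing {
    static const int kSlots = 3;   // 2 frames of latency

    int WriteSlot(uint64_t frame) const {
        return static_cast<int>(frame % kSlots);
    }

    // Returns -1 until enough frames have been submitted to read safely.
    int ReadSlot(uint64_t frame) const {
        if (frame + 1 < kSlots) return -1;
        return static_cast<int>((frame + 1) % kSlots);
    }
};
```

So at frame 2 you lock slot 0 (written at frame 0) while the GPU is still filling slot 2.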

For this implementation I used the GPU only for Z-buffer construction, and parallelized CPU loops (TBB) for the Z-hierarchy and the post-projected bounding box tests.
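The CPU side can be sketched roughly like this (illustrative code, not my actual implementation, single-threaded where the real version uses TBB): build a max-depth pyramid over the GPU-rendered Z-buffer, then conservatively test a post-projected box against one coarse level.

```cpp
#include <algorithm>
#include <vector>

// Illustrative sketch: a CPU hierarchical Z-buffer. levels[0] is the full-res
// Z, each higher level stores the FARTHEST depth of a 2x2 block, so a box is
// occluded only if its nearest depth is behind the stored value everywhere.
struct HiZ {
    std::vector<std::vector<float> > levels;
    int size;                                 // full-res side, power of two

    void Build(const std::vector<float>& z, int n) {
        size = n;
        levels.clear();
        levels.push_back(z);
        for (int s = n / 2; s >= 1; s /= 2) {
            const std::vector<float>& prev = levels.back();
            int ps = s * 2;
            std::vector<float> cur(s * s);
            for (int y = 0; y < s; ++y)
                for (int x = 0; x < s; ++x)
                    cur[y * s + x] = std::max(        // keep the farthest depth
                        std::max(prev[(2 * y) * ps + 2 * x],
                                 prev[(2 * y) * ps + 2 * x + 1]),
                        std::max(prev[(2 * y + 1) * ps + 2 * x],
                                 prev[(2 * y + 1) * ps + 2 * x + 1]));
            levels.push_back(cur);
        }
    }

    // Conservative test of a screen-space box [x0,x1]x[y0,y1] in full-res
    // texels, with nearest (post-projected) depth minZ: pick the level where
    // the box spans about one texel, then require minZ to be behind the
    // stored depth in every covered texel.
    bool IsOccluded(int x0, int y0, int x1, int y1, float minZ) const {
        int span = std::max(x1 - x0, y1 - y0);
        int lvl = 0, s = size;
        while ((1 << lvl) < span && s > 1) { ++lvl; s /= 2; }
        for (int y = y0 >> lvl; y <= (y1 >> lvl); ++y)
            for (int x = x0 >> lvl; x <= (x1 >> lvl); ++x)
                if (minZ < levels[lvl][y * s + x])
                    return false;            // box may poke out in front
        return true;
    }
};
```

Uncovered pixels keep the far-plane depth after the clear, so objects over empty background are correctly reported as visible.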

You can see the final result here.

In the next implementation I will add a software rasterizer for comparison. Stay tuned.

Thursday, April 8, 2010

Instant pipeline.

Crytek showed their Sandbox for CryEngine 3, and you can see it here:

Iterations are really fast, and that's really cool. More iterations for the same budget! =)

I'm not really happy with their approach to building the sandbox functionality - it's programmer-driven, and the workflow suffers in some places because of this. Still, the tool as a whole is great!

I'll later share some interesting thoughts on interface and workflow design for such tools, as implemented in my pet engine.