Monday, June 14, 2010

Morphological Anti-Aliasing.

I've ported a recent implementation of Morphological Anti-Aliasing to HLSL. Anyone interested can compare it to the classic edge-blurring technique with this build.

Edit: added a build that does not require PhysX System Software to run.

To play with AA, open the Editor, select any game module (Materials shows it best), then go to Appearance->Edit Profile->Anti-Aliasing and change the anti-aliasing type there. For stable FPS, keep the cursor hovering over the rendering window so UI updates don't interfere with it.

To fly, press right-mouse plus WSAD. Use Shift and Ctrl to control flying speed.

The ported version of the shader is in the archive as well: GEMData\Shaders\D3D9\AntiAlias.shader.

Enjoy.

Thursday, June 3, 2010

Bump-mapping VS Normal-mapping.

Morten S. Mikkelsen recently posted a technical paper describing a method of perturbing the surface normal on the GPU with the bump-mapping technique widely used in the offline rendering world. The main advantage over the normal-mapping technique classic for real-time rendering is that it needs one channel instead of three to store the perturbation info. You also don't need to carry all that tangent/binormal data around in vertex buffers and shaders. It looked like a silver bullet to me.

Of course I copy-pasted the code from Listing 1 to give it a try. After a few experiments and a suggestion from Morten in this thread, I got it working by scaling the bump input value (a coefficient of 0.007 worked for me). The only thing that prevented me from merging this to trunk was moiré - it polluted the result pretty heavily. And again (thanks, Morten!) the author suggested a solution. It was simply using the sampler instead of gradient instructions:
float2 TexDx = ddx(IN.stcoord.xy); // texture-space footprint of one pixel
float2 TexDy = ddy(IN.stcoord.xy);

float Hll = height_tex.Sample(samLinear, IN.stcoord).x; // height at the current texel

// Forward differences taken with extra filtered samples instead of gradient
// instructions on the height itself - this is what removes the moiré.
float dBs = (height_tex.Sample(samLinear, IN.stcoord+TexDx).x - Hll);
float dBt = (height_tex.Sample(samLinear, IN.stcoord+TexDy).x - Hll);
Guess what? The moiré is gone! And I was able to compare the two methods, which look pretty much the same visually. The bump-mapping technique looks a bit crisper, though:



The end of the normal-mapping era?

P.S. The new method works a bit faster for me.

P.P.S. Added another view:

P.P.P.S. Added two builds to compare.

Wednesday, May 26, 2010

Move/Rotate/Scale manipulator.

Some time ago I got the idea of an all-in-one manipulator that performs move/rotate/scale transformations without constantly jumping to mode buttons or using shortcuts.

The idea is pretty simple - it's just two cubes:
  • The exterior cube has flipped normals and controls rotation. Each side allows rotating around a single axis.
  • The interior cube controls movement and scale. Movement is automatically restricted to an axis or a plane depending on which area you hover over. In GEM I don't have non-uniform scaling (because of the compact quaternion/translate/scale representation with 8 floats), so the corners simply control uniform scale. But you can adjust this for your needs (see the sketch after this list).
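To illustrate one plausible way of mapping the picked part of the two-cube manipulator to a manipulation mode - the enum, the hit structure and the exact face/edge/corner assignment below are my assumptions, not the GEM code:

enum class ManipMode
{
    RotateX, RotateY, RotateZ,             // faces of the exterior (flipped) cube
    MoveAxisX, MoveAxisY, MoveAxisZ,       // edge regions of the interior cube
    MovePlaneXY, MovePlaneXZ, MovePlaneYZ, // face regions of the interior cube
    UniformScale                           // corner regions of the interior cube
};

struct ManipHit
{
    bool exteriorCube; // true if the pick ray hit the outer, rotation cube
    int  face;         // 0..5 face index (pairs of opposite faces share an axis)
    bool corner;       // interior cube only: corner region -> uniform scale
    bool edge;         // interior cube only: edge region -> single-axis move
    int  axis;         // 0=X, 1=Y, 2=Z for edge hits
};

ManipMode ResolveMode(const ManipHit& hit)
{
    if (hit.exteriorCube)
    {
        // Each pair of opposite exterior faces rotates around one axis.
        static const ManipMode rot[3] = { ManipMode::RotateX, ManipMode::RotateY, ManipMode::RotateZ };
        return rot[hit.face / 2];
    }
    if (hit.corner) return ManipMode::UniformScale;
    if (hit.edge)
    {
        static const ManipMode move[3] = { ManipMode::MoveAxisX, ManipMode::MoveAxisY, ManipMode::MoveAxisZ };
        return move[hit.axis];
    }
    // A face of the interior cube restricts movement to the plane it lies in.
    static const ManipMode plane[3] = { ManipMode::MovePlaneYZ, ManipMode::MovePlaneXZ, ManipMode::MovePlaneXY };
    return plane[hit.face / 2];
}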
It feels pretty natural and obvious to someone who sees it for the first time. You can see it in action below (sorry for the mouse cursor position bug - it's a capturing software issue, the editor shows everything in place):


Here is a screenshot:

Feel free to write to "max.dyachenko [at] gmail.com" if you want to get the mesh/texture/manipulator code.

Sunday, April 25, 2010

Occlusion culling, part 1.

I want to share some results from my recent experiments with GPU-based occlusion culling. I decided not to use occlusion visibility queries, as they have lag and scalability problems. You should be able to use them in production with some clever optimizations and tricks, but I wanted a fast implementation.

So I chose rasterization and hierarchical Z. It's easy to implement if you already have a Z-fill path, and easy to test and debug because the concept is very simple. I was also a bit inspired by this presentation from the Ubisoft Montreal studio. I've seen a more detailed internal presentation from them, and it was pretty cool. I'll give some high-level ideas here so you don't have to read the whole thing.

Basically, they render good-occluder geometry into a small viewport and build a Z-buffer from it. The Z-buffer is then converted into a hierarchical representation (higher levels contain only the largest Z depth). All tested objects are then converted into a point-sprite vertex stream containing post-projection bounding box information. The pixel shader fetches the hierarchical Z and performs simple Z tests, outputting visibility query results to its render target. The render target is then read back by the CPU so the visibility results can be used in scene processing. They keep pixel shader cost under control by testing a whole bounding box with only one Z sample, selecting the best-fitting hierarchy level.

While on the X360 this scheme can perform an enormous number of tests in a very small time frame, it has one major drawback: you need to sync with the GPU to read back the results, and the GPU sits idle while you are building job lists for it. This is not a big deal on the X360, though.

I prototyped my implementation on the PC, so read-back quickly became a problem. In a test scene with pretty heavy shaders and a lot of simple objects (~20K), CPU<->GPU synchronization spent about 2.5-5 ms in the GetRenderTargetData() call - the time needed for the GPU to complete its work. I expected that rotating lockable render targets would solve the issue, but for me the AMD driver still syncs with the GPU on LockRect(), even for surfaces rendered a few frames earlier. So the bottleneck seems unavoidable and will degrade performance for simple scenes.

For this implementation I used the GPU only for Z-buffer construction, and parallelized CPU loops (TBB) for the Z-hierarchy and the post-projection bounding box tests.
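To make the CPU side concrete, here is a minimal sketch of building one coarser hierarchy level and testing a post-projection bounding rectangle against it, assuming the Z-buffer has already been read back from the GPU and a standard 0 = near, 1 = far depth range (names and layout are illustrative, not the GEM code):

#include <vector>
#include <algorithm>

struct HiZLevel { int width, height; std::vector<float> depth; };

// Build the next coarser level: each texel keeps the MAX (farthest) depth of
// its 2x2 block, so one coarse sample stays conservative for occlusion tests.
HiZLevel Downsample(const HiZLevel& src)
{
    HiZLevel dst;
    dst.width  = std::max(1, src.width  / 2);
    dst.height = std::max(1, src.height / 2);
    dst.depth.resize(size_t(dst.width) * dst.height);
    for (int y = 0; y < dst.height; ++y)
        for (int x = 0; x < dst.width; ++x)
        {
            const int x0 = x * 2, y0 = y * 2;
            const int x1 = std::min(x0 + 1, src.width  - 1);
            const int y1 = std::min(y0 + 1, src.height - 1);
            float d =       src.depth[size_t(y0) * src.width + x0];
            d = std::max(d, src.depth[size_t(y0) * src.width + x1]);
            d = std::max(d, src.depth[size_t(y1) * src.width + x0]);
            d = std::max(d, src.depth[size_t(y1) * src.width + x1]);
            dst.depth[size_t(y) * dst.width + x] = d;
        }
    return dst;
}

// The object is definitely hidden if its nearest depth is behind the farthest
// occluder depth over the screen rectangle of its bounding box.
bool IsOccluded(const HiZLevel& level, int x0, int y0, int x1, int y1, float minDepth)
{
    float maxOccluder = 0.0f;
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x)
            maxOccluder = std::max(maxOccluder, level.depth[size_t(y) * level.width + x]);
    return minDepth >= maxOccluder;
}

The hierarchy level is picked so that the tested rectangle covers only a handful of texels, which keeps the per-object cost roughly constant - the same trick the Ubisoft scheme uses with its single-sample test.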

You can see the final result here.

In the next implementation I will add a software rasterizer for comparison. Stay tuned.

Thursday, April 8, 2010

Instant pipeline.

CryTek showed their Sandbox for CryEngine 3, and you can see it here:
http://nvidia.fullviewmedia.com/gdc2010/14-sean-tracy.html

Iterations are really fast and this is really cool. More iterations for the same budget! =)

I'm not really happy with their method of creating the sandbox functionality - it's programmer-driven, and because of this the workflow is not good in some places. Still, the tool as a whole is great!

I'll share some interesting thoughts later on interface and workflow design for such tools, as implemented in my pet engine.

Thursday, March 25, 2010

Collision detection trick.

Imagine you need to collide large groups of soldiers that move all the time. Using spatial structures in this case becomes a pain because of update times. To overcome this we developed a simple solution (I believe it is widely used in modern physics engines for the broad phase).

Consider the picture below (we are looking at the 2D version, but the algorithm works in 3D as well). We want to collide two circles:

A simple radius check is good enough in this case. But if you want to collide 1K circles, it just becomes impractical. So, what about those colored lines in the picture?

The red lines represent the left bounds of each object on each axis and the green lines represent the right bounds. To detect possible collisions between all objects on the grid, we can do the following.

Push all bounds into two arrays, one per axis. Sort each array by bound position. Then iterate through the arrays with the following logic:
  • For each left bound, consider the object open.
  • For each right bound, consider the object closed.
  • Whenever a bound is met ->
    • for each currently open object ->
      • put a potential-collision bit for the pair into the bit table.
  • Repeat for the next axis.
If both tables now show a potential collision between two objects (both bits are set), perform the radius check and report a collision, if any.
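Here is a minimal sketch of one axis of that sweep, assuming circles with a center and radius; names and the flat bit table are illustrative, not our original code:

#include <vector>
#include <algorithm>
#include <cstdint>

struct Circle { float x, y, r; };
struct Bound  { float pos; int object; bool isLeft; };

// Fill a bit table of potential collisions along one axis.
void SweepAxis(const std::vector<Circle>& circles, bool xAxis,
               std::vector<uint8_t>& pairBits) // n*n entries, upper triangle used
{
    const int n = int(circles.size());
    std::vector<Bound> bounds;
    bounds.reserve(size_t(n) * 2);
    for (int i = 0; i < n; ++i)
    {
        const float c = xAxis ? circles[i].x : circles[i].y;
        bounds.push_back({ c - circles[i].r, i, true  });
        bounds.push_back({ c + circles[i].r, i, false });
    }
    std::sort(bounds.begin(), bounds.end(),
              [](const Bound& a, const Bound& b) { return a.pos < b.pos; });

    std::vector<int> open; // objects whose left bound was met but not the right one yet
    for (const Bound& b : bounds)
    {
        if (b.isLeft)
        {
            // Every currently open object overlaps this one on this axis.
            for (int other : open)
                pairBits[size_t(std::min(b.object, other)) * n + std::max(b.object, other)] = 1;
            open.push_back(b.object);
        }
        else
        {
            open.erase(std::find(open.begin(), open.end(), b.object));
        }
    }
}

Run it once per axis with its own bit table; pairs whose bit is set in both tables get the exact radius check, as described above.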

To extend this method, presort the bounds of groups of objects (for example, units of soldiers). To perform collision between any groups you can then do a cheap merge of the presorted arrays, avoiding sorting of very large arrays.

Draw a forest.

While working on our first title at MindLink Studio ("Telladar Chronicles: Decline"), we set one very ambitious goal for those times: being able to show something like 20K soldiers clashing on battlefields, with a full 3D representation of everything. Back in 2002 we hadn't seen anything like this yet.

I had the obvious idea of using sprites for the lowest LODs to be able to draw such a large world. After a few years of development and experimentation I ended up with the following solution.

When you need something to be drawn at a distance, you ask the sprite rendering subsystem to provide you with a sprite representation of it. If it is already in the cache - you're done.

Otherwise the subsystem considers the size of the thing and selects a place for it in the sprite cache. These textures are divided into rows of predefined heights, and the subsystem picks the row with the smallest height that still fits. It injects the sprite into the row, modifying the row's linked list. Row contents are managed by the subsystem like in any common memory manager: to wipe a row that has become too fragmented, just copy its contents to another row, updating the sprite vertices.
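A minimal sketch of that row-based cache, under assumed names - each cache texture is split into fixed-height rows, and a sprite goes into the smallest row that fits:

#include <list>
#include <vector>

struct SpriteSlot { int x, width, spriteId; };

struct Row
{
    int y, height, texWidth;
    std::list<SpriteSlot> slots; // packed left to right in this sketch

    int UsedWidth() const
    {
        int w = 0;
        for (const SpriteSlot& s : slots) w += s.width;
        return w;
    }
};

struct SpriteCache
{
    std::vector<Row> rows; // rows of one cache texture

    // Pick the smallest row tall enough for the sprite, then append at its free end.
    // Returns false if nothing fits (a real version would wipe a fragmented row
    // or evict stale sprites here).
    bool Insert(int spriteId, int width, int height, int& outX, int& outY)
    {
        Row* best = nullptr;
        for (Row& r : rows)
            if (height <= r.height && r.UsedWidth() + width <= r.texWidth)
                if (!best || r.height < best->height)
                    best = &r;
        if (!best)
            return false;
        outX = best->UsedWidth();
        outY = best->y;
        best->slots.push_back({ outX, width, spriteId });
        return true;
    }
};

A real version would track gaps inside each row's linked list and defragment by copying a row's contents to another row, as described above.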

After getting the sprite representation, the LOD system injects four vertices into dynamic storage. The great thing about this approach is that you can fire-and-forget your sprites: as long as a sprite is alive, it will be rendered in a batch with the others. And you can simply skip updates for distant objects for any number of frames (nobody will ever notice at great distances).

To erase a sprite, just skip it when constructing your next ping-pong index buffer. To reuse space in the vertex buffer, make sure enough frames have passed since the moment it was last used.

In this way we were able to draw forests with hundreds of thousands of trees in real time, while still providing nice 3D trees at close distances.

Soldiers were a bit trickier, as you need to animate them to look believable. After months of visual tests I came to the conclusion that only 2-3 frames of walking animation are needed for them to "shimmer" convincingly at a distance.

Forest in action:

The lighting is a bit outdated, but there are something like 30K trees in the viewport at 30 FPS on SM1 hardware.

Friday, March 19, 2010

Toolset.

A quick thought after visiting one of the numerous Ubi development studios around the globe:
The only way to create exceptional game is to have exceptional tools.

[We thank you, Cap!]
[Applauds]

Seriously. No good tools? No good AAA game for you, sorry.

And the reason behind it is pretty obvious: less time per iteration == more iterations for the same budget. More iterations mean greater quality. Greater quality means increased value to your customer and increased sales. Profit.

But there is one trick involved: interfaces and processes for these tools should be built by UI usability experts, not by programmers or artists or anybody else. And the good news is that you can find plenty of them on the IT market. I wonder why most studios ignore this fact.

Saturday, January 16, 2010

Shaders development.

Shaders are a really important part of any graphics engine, and usually a lot of work is invested in them. But the shader creation pipelines proposed by NVidia and ATI lack one core ability - they have nothing in common with real engine shaders. You can prototype a shader to some extent using FXComposer or Render Monkey, but when it comes to engine integration you still modify it by hand and blindly run it, hoping for the best.

If your engine has a run-time editor with "What You See Is What You Play" ability, you already have a good platform for shader development. The simplest form of assistance is viewing render target contents (which can be achieved using NVPerfHUD, for example). But to make it more interesting than a simple overlay, I've tried the following:

So basically I've attached the texture exploration interface (you already need it for texture compression) to the renderer outputs. In a drop-down list you can select any of the render targets that the renderer exposes through its interface. This interface might look like:
virtual uint Debug_GetNumRenderOutputs() const;
virtual const wchar_t* Debug_GetRenderOutputName(uint index) const;
virtual const ITexture* Debug_RegisterForRenderOutput(const wchar_t* renderOutput);
virtual void Debug_UnregisterForRenderOutput(const wchar_t* renderOutput);
When someone registers for a render output, the renderer begins to capture the render target contents. Internally it looks like:
void RegisterRenderOutput(const wchar_t* name, ITexture* pTexture);
void RegisterRenderOutput(const wchar_t* name, IDirect3DSurface9* pSurface);
These methods are called in various places throughout the renderer, which checks whether the render output is registered for capturing. If it is, the renderer copies the RT contents into the already created texture returned from Debug_RegisterForRenderOutput(). This texture is then displayed by the texture exploration UI.
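For illustration, one possible body for the internal RegisterRenderOutput() shown above, assuming D3D9 and that the debug texture was created as a render target in the default pool; the member names here are assumptions, not GEM code:

#include <d3d9.h>
#include <map>
#include <string>

struct CaptureEntry
{
    IDirect3DTexture9* pTexture = nullptr; // texture handed out to the texture exploration UI
};

class Renderer
{
    IDirect3DDevice9* m_pDevice = nullptr;
    std::map<std::wstring, CaptureEntry> m_captures; // only registered outputs live here

public:
    // Called at the points in the frame where a render target is worth showing.
    void RegisterRenderOutput(const wchar_t* name, IDirect3DSurface9* pSurface)
    {
        auto it = m_captures.find(name);
        if (it == m_captures.end() || !it->second.pTexture)
            return; // nobody is watching this output - skip the copy entirely

        IDirect3DSurface9* pDst = nullptr;
        if (SUCCEEDED(it->second.pTexture->GetSurfaceLevel(0, &pDst)))
        {
            // GPU-side copy into the debug texture; no CPU read-back is needed.
            m_pDevice->StretchRect(pSurface, nullptr, pDst, nullptr, D3DTEXF_NONE);
            pDst->Release();
        }
    }
};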

This way a developer can access any render target exposed by the renderer, and it gets updated in real time. Combined with the ability to reload shaders, this creates a simple and powerful shader development pipeline.

Thursday, January 14, 2010

Color correction.

The easiest way to control the mood of the picture is color correction. No serious team will ship without this feature in their beloved engine, and there are a lot of cases you can't live without it. Every time your lead artist wants to change something globally, color correction is the first thing he will consider.

From a technical point of view, color remapping is really easy to implement. But the most obvious solution - using a volume texture to simply remap colors - suffers from banding issues until the resolution of the volume gets close to 8-bit precision (like 256x256x256). This is prohibitively costly in terms of memory and texture cache performance, as we need to touch every screen pixel.

Fortunately, this is easily solved by using a signed addition instead of a remap. So to transform red (1,0,0) to green (0,1,0) we store the value (-1,1,0). This way we can use a volume texture of a size like 16x16x16 without any banding artifacts. Let's call this the color correction matrix.

But a technical solution is not enough - we need an artist-friendly way of controlling it. After some consideration and communication with artists I chose the following solution. The artist creates an arbitrary number of screenshots of the level or area he wants to color correct. Then he uses any color correction tool of his choice (we used Photoshop for prototyping) to produce color-corrected versions of those screenshots. This way we don't restrict him in any way.

After creating the two sets of screenshots, the artist feeds them into the engine tool that builds the color correction matrix from them. The processor just incrementally accumulates the color differences into a table and compresses it into the correction matrix after all images have been processed. Afterwards we can add some effects on top of this matrix.
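A minimal sketch of such a build step, assuming the "compression" is a simple per-cell average of the differences - the names and the averaging are my assumptions, not the exact GEM tool:

#include <vector>
#include <cstdint>

struct Image { int width, height; std::vector<float> rgb; }; // 3 floats per pixel, 0..1

static const int CORR_SIZE = 16;

// Flattened 16x16x16 table of signed RGB offsets (corrected minus source).
struct CorrectionMatrix { float offset[CORR_SIZE * CORR_SIZE * CORR_SIZE * 3]; };

CorrectionMatrix BuildCorrectionMatrix(const std::vector<Image>& source,
                                       const std::vector<Image>& corrected)
{
    const int N = CORR_SIZE;
    std::vector<double>   sum(size_t(N) * N * N * 3, 0.0);
    std::vector<uint32_t> cnt(size_t(N) * N * N, 0);

    for (size_t i = 0; i < source.size(); ++i)
    {
        const Image& a = source[i];
        const Image& b = corrected[i];
        for (int p = 0; p < a.width * a.height; ++p)
        {
            const float* s = &a.rgb[size_t(p) * 3];
            const float* c = &b.rgb[size_t(p) * 3];
            // The source color selects the cell; the color difference is accumulated.
            const int r  = int(s[0] * (N - 1) + 0.5f);
            const int g  = int(s[1] * (N - 1) + 0.5f);
            const int bl = int(s[2] * (N - 1) + 0.5f);
            const int idx = (r * N + g) * N + bl;
            for (int k = 0; k < 3; ++k)
                sum[size_t(idx) * 3 + k] += double(c[k]) - double(s[k]);
            ++cnt[idx];
        }
    }

    CorrectionMatrix m = {};
    for (int idx = 0; idx < N * N * N; ++idx)
        if (cnt[idx])
            for (int k = 0; k < 3; ++k)
                m.offset[size_t(idx) * 3 + k] = float(sum[size_t(idx) * 3 + k] / cnt[idx]);
    return m; // cells never touched by a screenshot stay at zero offset in this sketch
}

If the volume is stored in an unsigned 8-bit format, remap the offsets with *0.5+0.5 on upload - which is what the *2.0f - 1.0f in the shader snippet below undoes. A real tool would also fill or interpolate the cells that no screenshot covers.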

Implementation note: remember that you need to correct image before HDR!

The shader code for applying the correction is trivial:
float4 source = tex2Dlod(samplerSource, uv);
float4 corr = tex3Dlod(samplerColorCorrection, float4(source.xyz, 0));
return source + (corr * 2.0f - 1.0f); // if you use an unsigned format
This is how it might look in reality:
There is some obvious difference between the corrected image and the one generated at run time. The major source of this is the 16x16x16 correction matrix; HDR contributes as well by adjusting luminosity. But it doesn't look that bad in real situations.

You might notice an unfamiliar parameter, "Color shift". It is used for shifting dark colors below a threshold towards some arbitrary color - this way you can make shadows bluish, for example. I saw this at CryTek some time ago and I don't know who should be credited for the idea. Perhaps the CryTek guys. ;)

That's pretty much everything. The only valuable addition to this method might be the engine's ability to place zones (volumes) with arbitrary color correction on the level for special effects and fake global illumination. Use your imagination.

Sunday, January 10, 2010

Editor with "instant game" ability.

Previously I discussed an engine module organization that has one nice side effect: with a proper implementation, the engine editor can run game logic inside the editor window (like in the CryEngines). This is invaluable from a game design perspective, as it really cuts down iteration times. To be honest, every serious engine should have this ability (and we know some really serious ones without it ;).

Do not underestimate the power of instant iterations. Imagine a level designer who tries to set up cover on a level with very specific properties. It may take 10-15 tries to set up one specific place. Multiply that by 1.5-2.0 minutes of loading time and you will get an idea of how important it can be for the speed of development and the quality of the final product.
Another good example is a weapons designer who tries to balance a dozen weapons. It can take hundreds of iterations to recheck everything. Of course it can be done with the legwork of the QA team, but shouldn't they concentrate on other things?

As a proof of concept, find some CryEngine videos showing the process of editing a level with instant jumps into the action to see how it looks and plays. You'll be impressed.

And this is something that is pretty easy to implement properly if you think about it from day 0. You still need a flag to distinguish editor and standalone mode, as game code needs to behave differently in some rare situations. But in most cases the code just runs, without any changes.

Use comments below if you have specific questions about implementation.

Saturday, January 9, 2010

Engine debugging.

At some point in an engine's lifetime, full debug builds become unplayable. This leads to solutions like fast debug builds and such, but they limit your ability to hunt down bugs. In some situations a full debug build is a must to do the job.

Fortunately, there is a simple solution.

Dividing your engine into modules allows you to run separate modules in different modes. So you can run everything in release mode and one single module in full debug mode. With 10-12 modules, the slowdown from a single module (except maybe the renderer) will go unnoticed. Bingo!

Some details

Divide your engine into modules like Rendering, Physics, Animation, Sound, Scripts, UI, Networking, Resources, Scene Management and one central module I call Main. You can add more modules freely, but don't divide things too much - your brand new flash player can be added to UI instead of becoming a new module.

Compile them all as DLLs that have different names for different levels of debugging. I have four basic configurations:
  1. Full Debug (let's call it Debug);
  2. Release with profiling code and Omit Frame Pointers turned off (let's call it Profile);
  3. Full-speed release build (let's call it Release);
  4. Full-speed release compiled as a static library for DRM-protected final builds.
Ignore configuration number 4 at this point, as it has limited use. So the UI module will compile into 3 versions: UI_Debug.dll, UI_Profile.dll and UI_Release.dll.

The trick here is to give the Main module the ability to load different versions of the same module at the user's request. In the case of GEM I just show the following window when Shift is pressed at start of the Main module:


This dialog is shown only in the debug version of the Main module. Simple as that.
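A minimal sketch of how Main could resolve and load a module in the requested configuration - the DLL suffixes follow the naming above, but the CreateModule factory name is an assumption, not the actual GEM entry point:

#include <windows.h>
#include <string>

class IModule; // the pure interface every engine module implements

enum class BuildConfig { Debug, Profile, Release };

typedef IModule* (*CreateModuleFn)();

IModule* LoadEngineModule(const std::wstring& baseName, BuildConfig cfg)
{
    const wchar_t* suffix =
        cfg == BuildConfig::Debug   ? L"_Debug.dll"   :
        cfg == BuildConfig::Profile ? L"_Profile.dll" :
                                      L"_Release.dll";

    HMODULE hDll = LoadLibraryW((baseName + suffix).c_str()); // e.g. UI_Profile.dll
    if (!hDll)
        return nullptr;

    // Each module DLL is assumed to export a factory with a known name.
    CreateModuleFn create =
        reinterpret_cast<CreateModuleFn>(GetProcAddress(hDll, "CreateModule"));
    return create ? create() : nullptr;
}

Main would call something like this once per module, using whatever configuration the Shift dialog selected.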

For games and demos that use GEM I create two additional projects. The Launcher (exe) links with the appropriate version of the Main module. The Game module contains all game logic and implements a specific game interface. This way the Game module can be loaded by GEM and we get a very nice bonus: the ability to play the game inside the editor without any additional code (more on this later).


This approach has 3 major drawbacks:
  • You need to define a lot of pure interface files, as modules interconnect only through them. But if you ask me, this organization leads to a much cleaner engine structure from an external point of view, and you can get DX9/DX11/OpenGL/NULL renderers without much effort (yikes!). It can be frustrating to press "Go to definition" and land in an interface file instead of the implementation, though;
  • As modules can run in different modes, you can't pass between them classes that behave differently in debug and release builds (like STL). For example, an std::string in a release-mode module can try to reallocate memory allocated in a debug-mode module, which will lead to a crash (see the sketch after this list). In practice this is not a big deal if interfaces are defined by only a limited set of people;
  • If a user changes something in, say, the renderer module that is used in the release configuration, but the current configuration in VS is set to debug, the renderer module will be rebuilt only for the debug configuration, and this can lead to undefined behavior. It's a rare issue in practice, though.
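A minimal sketch of keeping module interfaces free of types whose layout and allocator differ between debug and release builds; the interface below is illustrative, not an actual GEM header:

// BAD: std::string may be allocated in one module and freed or resized in
// another that was built with a different runtime or debug level.
//
//   virtual void SetWindowTitle(const std::string& title) = 0;

// GOOD: only plain data and caller-owned pointers cross the module boundary.
class IUISystem
{
public:
    virtual void SetWindowTitle(const wchar_t* title) = 0;

    // When the callee must return a string, let the caller own the buffer.
    virtual unsigned int GetWindowTitle(wchar_t* buffer, unsigned int maxChars) const = 0;

    virtual ~IUISystem() {}
};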
A good point of this organization for large-scale engines is the tech team's ability to publish the engine as an SDK. It means you compile everything into DLLs and LIBs, add the interface files and publish this to the game production teams. In most cases they simply have nothing to mess with inside the engine code.

Thursday, January 7, 2010

Instant pipeline in action.

After setting up FBX exporting scripts for 3DSMAX and Maya, I prototyped the whole pipeline. How does it look? You double-click the MAX file in the resources explorer and get this:

 
After adding the desired sub-meshes to the list of exported ones, we just press Export and voilà - the editor creates a meta-file that contains the user's choices. Next time it has enough information to re-export without user interaction. This has a few strong points compared to classic exporting pipelines:
  • precise selection of the list of needed meshes frees the artist to use any helpers he needs. No more "clean version" files or "save selection" hell to keep exported assets free of helpers;
  • version control systems can contain the original asset files, with the build machine generating specialized asset formats for all platforms. No intermediate formats needed;
  • the editor can detect that the original asset file has changed (the meta-file contains date and content CRCs) and automatically re-export and reload the asset into the engine;
  • importing new assets is very straightforward.
I've tried the same principle on textures through NVidia Texture Tools. This allows artists to use the original texture files in the original scenes. The conversion is likewise based on the meta-file concept.


A few implementation details

For prototyping needs I took the following steps (a sketch of the editor side follows this list):
  • created Max and Maya scripts that monitor a temporary folder for files describing which scene should be exported;
  • when the editor needs to export something, it starts Max or Maya in no-interface mode (like "-silent -mip -U MAXScript %s") and then places the description files in the temporary folder monitored by the script;
  • the script loads the files one by one and exports them. When it finds a special quit file it closes the modeling package;
  • the editor loads the exported files and performs the necessary steps.
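A minimal sketch of the editor side of this prototype flow: drop a description file into the watched folder and launch 3ds Max in silent mode with the export script. The paths, the description file format and the script name are assumptions for illustration; only the command-line form comes from the steps above:

#include <windows.h>
#include <fstream>
#include <string>

void RequestExport(const std::wstring& maxScenePath,
                   const std::wstring& fbxOutputPath)
{
    // 1. Write the job description where the MAXScript watcher will find it.
    std::wofstream job("C:\\Temp\\GEMExport\\job_0001.txt");
    job << L"scene=" << maxScenePath << L"\n"
        << L"output=" << fbxOutputPath << L"\n";
    job.close();

    // 2. Start Max headless with the watcher script (command line follows the
    //    "-silent -mip -U MAXScript" form mentioned above). A production
    //    version would keep this instance alive and reuse it.
    std::wstring cmd =
        L"\"C:\\Program Files\\Autodesk\\3ds Max\\3dsmax.exe\" "
        L"-silent -mip -U MAXScript \"C:\\Tools\\export_watcher.ms\"";

    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi = {};
    if (CreateProcessW(nullptr, &cmd[0], nullptr, nullptr, FALSE,
                       0, nullptr, nullptr, &si, &pi))
    {
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
    }
}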
This works well for prototyping needs, but it has a few caveats. The editor needs to run the modeling package each time it exports something; I ended up keeping the package open with an empty scene after the first use. Also, loading big files can be time consuming, and this ruins the instant feedback we want to achieve. So for a production environment we should modify the scheme a bit:
  • we create a small plugin for our favorite packages that monitors the scene-saving event and communicates with the editor instance to decide whether the file should be exported as FBX;
  • the plugin silently runs the export routine (so there is no extra overhead of loading the scene in another modeling package instance) and signals the editor;
  • the editor loads the exported file and performs the necessary steps.