Thursday, 2 July 2015

Subtle problems linking DLLs built with different compilers

You should generally try to link DLLs and EXEs built with the same compiler. I recently hit an issue where I redirected C++ std::cout and std::cerr in my EXE, but got no apparent output from these streams in my linked DLL.

It turned out that the EXE, being new, was built with the Visual Studio 2013 (v120) toolset, while my DLLs, for backward compatibility, were using v110_xp - the Visual Studio 2012 Windows XP-compatible toolset. And because they used the /MD dynamically linked runtimes, they were silently, and happily, loading two different versions of the runtime libraries.

Therefore, my redirects (std::cout.rdbuf() etc.) were applied to the EXE's copy of the runtime, and of course not to the DLL's entirely different runtime library.

Monday, 29 June 2015

The Windows 8 SDK version of DirectX compute shaders in HLSL seems to be broken:

void ComputeShader(uint3 sub_pos: SV_DispatchThreadID)

results in sub_pos=(0,0,0) every time.

Therefore, replace the input with:

void ComputeShader(uint3 g:SV_GroupID,uint3 t:SV_GroupThreadID)

and then:

uint3 sub_pos = g * BLOCK_SIZE + t;

Wednesday, 10 June 2015

Compiling HLSL effect files for OpenGL: Part 2 - translating HLSL syntax to GLSL

I'll talk here about some of the approaches used to enable HLSL syntax parsing for GLSL. The best resource I know for Flex/Bison parser generators is "Flex and Bison" by John Levine.

After the initial implementation of the effect parser for GLSL, I had a set of about 20 effect files (I use the .sfx extension) which use another 40-ish include files (.sl files) for shared definitions. At the top of each .sfx we had:

#include "shader_platform.sl"

- and "shader_platform.sl" was different for each API, containing #defines that allowed the same code to be compiled for HLSL and GLSL. For read-writeable texture load and store (as opposed to sampling) we had for GL:

 #define IMAGE_LOAD(tex,pos) imageLoad(tex,int2(pos))
 #define IMAGE_LOAD_3D(tex,pos) imageLoad(tex,int3(pos))
 #define IMAGE_STORE(tex,pos,value) imageStore(tex,int2(pos),value)
 #define IMAGE_STORE_3D(tex,pos,value) imageStore(tex,int3(pos),value)


whereas for HLSL, we had:

 #define IMAGE_LOAD(tex,pos) tex[pos]
 #define IMAGE_LOAD_3D(tex,pos) tex[pos]
 #define IMAGE_STORE(tex,pos,value) tex[pos]=value;
 #define IMAGE_STORE_3D(tex,pos,value) tex[pos]=value;


I didn't want to force users to include a special file to get standard HLSL shaders to work, and I didn't want to force them to use these ugly macros where the HLSL syntax is usually more elegant. So - delving into the Bison .ypp file where I'd added the C grammar, I put something like this:

postfix_exp '[' expression ']'
{
 $$.varName=$1.text;
 $$.index=$3.text;
 // Is it actually a texture we're indexing?
 GLTextureType glTextureType=GetTextureType(buildFunction,$$.varName);
 if(glTextureType==0)
 {
  $$.text=$$.varName+"["+$$.index+"]";
 }
 else
 {
  ostringstream str;
  bool rw=IsRWTexture(glTextureType);
  if(rw)
   str<<"imageLoad(";
  else
   str<<"texelFetch(";
  str<<$$.varName<<",ivec"<<GetTextureDimension(glTextureType)<<"("<<$$.index<<")"<<(rw?"":",0")<<")";
  $$.text=str.str();
 }
}
This looks at any expression of the form A[B] and tries to see if A is a texture or image (as GLSL calls read-write textures), and replaces the expression with an imageLoad or a texelFetch. But postfix_exp '[' expression ']' also matches A[B] when it's on the left-hand side of an assignment. So,

outputTexture[pos]=result

would then become:

imageLoad(outputTexture,ivec2(pos))=result

- which is wrong. But that's fine as long as further up the line we match:

unary_exp '=' assignment_exp
   
- where unary_exp matches the A[B] expression, and assignment_exp matches the RHS. In this case the imageLoad is replaced with an imageStore, and the whole thing becomes:

imageStore(outputTexture,ivec2(pos),result)

Which is correct GLSL. Note: we wrap pos in ivec2() to convert it - just in case it's an unsigned int vector (a uvec2). Many GLSL compilers will complain about implicit conversion, whereas HLSL just steams through - so adding the explicit conversion gives us HLSL-style behaviour. If it's already an ivec, the conversion should optimize out at the GLSL-compile stage.

Texture Dimensions

For obtaining texture sizes in GL, we had:

#define GET_DIMENSIONS(tex,X,Y) {ivec2 iv=textureSize(tex,0); X=iv.x;Y=iv.y;}
#define GET_DIMENSIONS_3D(tex,X,Y,Z) {ivec3 iv=textureSize(tex,0); X=iv.x;Y=iv.y;Z=iv.z;}



While for HLSL, we had definitions like this:

#define GET_DIMENSIONS(tex,x,y) tex.GetDimensions(x,y)
#define GET_DIMENSIONS_3D(tex,x,y,z) tex.GetDimensions(x,y,z)



You can see that we've been forced to use two separate macros for 2D and 3D, in order to know whether we're using an ivec2 or an ivec3 in GLSL. In this case, the bare GLSL textureSize() functions are more succinct than HLSL's oddball approach of multiple individual output parameters. But the goal here (for now) is to support the HLSL syntax, and that is what forces the complex macros you see above.

We do this by matching GetDimensions as a token in the lexer:

<in_shader>"GetDimensions" {          stdReturn(GET_DIMS); }

and in the parser:

get_dims_exp: postfix_exp '.' GET_DIMS
{
 string texture=$1.text;
 string command=$3.text;
}



So when we match:

 get_dims_exp '(' assignment_exp ',' assignment_exp ')'


...we have sufficient information to construct an expression of the form:

{ivec2 iv=textureSize(tex,0); X=iv.x;Y=iv.y;}


where in this case X and Y are the two assignment_exps.

The first part of this series is here; the code is at github.

Thursday, 4 June 2015

Compiling HLSL effect files for OpenGL: Part 1



The images above were rendered in different APIs - the left image is DirectX 11, the right OpenGL. They were rendered with the same C++ code - a wrapper class translates my high-level rendering API to DX or GL interchangeably; this is not so unusual.

What's interesting here is that as well as the same C++ code, these two images were rendered with the same shaders. And that those shaders are pretty much standard HLSL .fx files. For example, here's the file to render text:

//  Copyright (c) 2015 Simul Software Ltd. All rights reserved.
#include "shader_platform.sl"
#include "../../CrossPlatform/SL/common.sl"
#include "../../CrossPlatform/SL/render_states.sl"
#include "../../CrossPlatform/SL/states.sl"
#include "../../CrossPlatform/SL/text_constants.sl"
uniform Texture2D fontTexture;

shader posTexVertexOutput FontVertexShader(idOnly IN)
{
 posTexVertexOutput OUT =VS_ScreenQuad(IN,rect);
 OUT.texCoords  =vec4(texc.xy+texc.zw*OUT.texCoords.xy,0.0,1.0).xy;
 return OUT;
}

shader vec4 FontPixelShader(posTexVertexOutput IN) : SV_TARGET
{
 vec2 tc  =IN.texCoords;
 tc.y  =1.0-tc.y;
 vec4 lookup =fontTexture.SampleLevel(samplerStateNearest,tc,0);
 lookup.a =lookup.r;
 lookup  *=colour;
 return lookup;
}

VertexShader vs = CompileShader(vs_4_0, FontVertexShader());

technique text
{
    pass p0
    {
 SetRasterizerState( RenderNoCull );
 SetDepthStencilState( DisableDepth, 0 );
 SetBlendState(AlphaBlend,vec4( 0.0, 0.0, 0.0, 0.0), 0xFFFFFFFF );
        SetGeometryShader(NULL);
 SetVertexShader(vs);
 SetPixelShader(CompileShader(ps_4_0,FontPixelShader()));
    }
}

A few things to notice: we include "shader_platform.sl" at the top; that's different for each API. For GL, for example, it defines float2 as vec2, float3 as vec3, and so on. For DX, it defines "uniform" as empty, so it's ignored. Many aspects of the shader language differences can be papered over with #defines. But when we get to something like:

           fontTexture.SampleLevel(samplerStateNearest,tc,0);

That's quite different syntax from GLSL, which doesn't support separate texture and sampler objects.

We use "shader" to declare functions that will be used as shaders to distinguish them from utility functions. The DX11-style "CompileShader" commands create a compiled shader object for later use in technique definitions.

In techniques, we define passes, and in passes we can set render state: culling, depth test, blending - all that can be taken out of C++, leading to much cleaner-looking rendercode.

Quite aside from the cross-platform advantage, this is much easier to work with than standard GLSL, which would require separate files for the vertex and fragment (pixel) shaders, doesn't support #include, and has no concept of "Effect files" - which contain techniques, passes and state information: we would have to do all that setup in C++.

How's it done? I started off with glfx, an open-source project to allow Effect files for GLSL. Glfx passes through most of the file, stopping to parse the parts that plain GLSL compilers don't understand. Glfx converts a source effect file into a set of individual shader texts (not files - it doesn't need to save them) that can then be compiled with your local OpenGL.

 GLint effect=glfxGenEffect();
 glfxParseEffectFromFile(effect, "filename.glfx");

Having branched glfx to our own repo, it became apparent that it might actually be possible to adapt it to something that would, rather than adding Effect functionality to GLSL, simply understand DirectX 11-class HLSL fx files. To do this, rather than passing over the function/shader contents, it would be necessary to parse them completely. Glfx uses Flex and Bison, the GNU parser-generation tools. As GLSL and HLSL are C-based languages, I took the Bison-style Backus-Naur Form of C and added it to Glfx, so that

           fontTexture.SampleLevel(samplerStateNearest,tc,0);

can be automatically rewritten, not as

           textureLod(fontTexture,tc,0);

but as:

           textureLod(fontTexture_samplerStateNearest,tc,0);

.. which in turn requires changing the sampler and texture definitions: we do all this automagically. Essentially, we've built separate samplers and textures as a feature into GLSL where it didn't exist before - we can set samplers like so:

glfxSetEffectSamplerState(effect,"samplerStateNearest", (GLuint)sampler);

or just define them in the effect files or headers, just like in HLSL:

SamplerState samplerStateNearest :register(s11)
{
 Filter = MIN_MAG_MIP_POINT;
 AddressU = Clamp;
 AddressV = Clamp;
 AddressW = Clamp;
};

It's conceivable that what we now have should really not be called glfx any more - it's quite different to the original project and I'm mainly concerned with cross-API compatibility, rather than specifically adding functionality to GLSL. The new library supports Pixel, Vertex, Geometry and Compute shaders, constant buffers and structured buffers, technique groups, multiple render targets, passing structures as shader input/output - most of the nice things that HLSL provides.

But more than any of that, it means I only need to write my shaders once.

In Part 2, I'll delve into how the translation is implemented using Flex and Bison parser generators.

Thursday, 14 May 2015

Gfx.waitForPresent in Unity

Gfx.waitForPresent taking a lot of CPU time in Unity? That means the CPU is waiting for the GPU - so ignore the CPU times and profile the GPU.
http://forum.unity3d.com/threads/gfx-waitforpresent.211166/

Sunday, 10 May 2015

GLSL constant buffers - alignment problems

Uniform buffers in GLSL - arrays of matrices seem to destroy the alignment of the values that follow them.

layout(std140) uniform ConstantB
{
 uniform mat4 views[6];

 uniform float texCoord;  // <-- this doesn't work.
 uniform float gamma;
 uniform float interp;
 uniform float brightness;
};


Monday, 20 April 2015

So I'm working hard on making OpenGL work like DirectX 11, and have come to the problem of depth.

GL expects you to use a projection matrix that transforms near z to -1.0, and far z to 1.0, which is a horrible use of numerical precision. Behind the scenes, GL will then map from [-1.0,1.0] to [0.0,1.0] to give you a nonzero depth in your buffer.

DirectX expects you to use a projection matrix that transforms near to 0 and far to 1. It doesn't need to remap to [0.0,1.0], because you've already given it the range it's expecting. You can also reverse this - far to 0 and near to 1 - which is actually much better for numerical precision reasons.

So if you use the same projection matrix, in the "same" vertex shader, in GL and DX, you'll get different depths. What comes out as z=0.0 in DirectX, will come out as z=0.5 in OpenGL.

glDepthRange to the rescue! This function controls the mapping from the values you actually send to GL to the values it puts in the depth buffer. If you leave it as the default, or pass glDepthRange(0.0,1.0), it will map [-1,1] to [0,1]. So if we want to leave the numbers untouched, we just pass glDepthRange(-1.0,1.0), right?

Wrong! Because as the GL spec states,
depth values are treated as though they range from 0 through 1 (like color components). Thus, the values accepted by glDepthRange are both clamped to this range before they are accepted.
Yes, another boneheaded OpenGL spec snafu means that however much you may want to map [0,1] to [0,1], OpenGL won't let you. If you pass glDepthRange(-1.0,1.0), GL will clamp the -1.0 back up to zero, and the mapping will still go [-1,1] -> [0,1].

The solution? For NVIDIA hardware, you can use their extension glDepthRangedNV, which is not nerfed like the standard function. For AMD.... ?