Info about the craziest personal project I’ve ever worked on.
Preface
In January 2024, I submitted a project for the Develop at Ubisoft Toronto competition. It was a simple isometric, local multiplayer game that I had crunched over a couple of days to complete. I was not overly proud of it. It was a lazy attempt just to get something done.
Though I was entirely confident of my skills as a programmer and felt as though I nailed the interview, this project was a mess. Coming in second place was not a massive surprise; I certainly hoped for a different outcome, but I was aware that I needed to push myself further.
On the train ride back to Ottawa, I sat down with a notepad, and like a mad detective working the case that will define their career, frantically wrote every thought down with nearly indecipherable connections being drawn page-to-page. This would become my next project for next year’s competition, oxygen.
Start
The API that Ubisoft provides for the challenge has been largely unchanged for the past 8 years. It is extremely limited and even comes with its share of bugs that remain unfixed as I am writing this. Here’s what you get:
- A solid colour line rendering method
- A solid colour text rendering method, with only a select few fonts and sizes to choose from
- Audio playing methods (with, surprisingly, no way to change the volume)
- Keyboard / Mouse input methods
- Controller input class (with no way to check if a controller is connected)
- And finally, a sprite class
The guidelines of the competition explicitly forbid modifying any of the API code or using libraries like OpenGL to render.
In my 2024 project, I had pulled a ridiculous trick to check if a controller was plugged in.
The member of CController, m_bConnected, is marked as protected:
class CController
{
public:
// ...
protected:
XINPUT_STATE m_state;
WORD m_lastButtons = 0;
WORD m_debouncedButtons = 0;
bool m_bConnected = false;
};
There are several public methods to check the input state of the controller, but not one to check if it is connected. But by making a child class of CController, it is possible to copy the value from the parent class instance and create a new public method to return it.
auto IsControllerConnected(const CController& controller) -> bool
{
struct ControllerChild : public CController
{
auto IsConnected() const -> bool
{
return m_bConnected;
}
};
// To guarantee this causes no side effects:
static_assert(std::is_trivially_constructible_v<ControllerChild, const CController&>, "please make a getter for m_bConnected");
static_assert(std::is_trivially_destructible_v<ControllerChild>, "please make a getter for m_bConnected");
ControllerChild child{ controller };
return child.IsConnected();
}
This is the type of ridiculous thinking you must do when you are tasked with making something from a poorly designed API.
Privacy
Sitting on the train, I thought more about this insane trick. I wondered if it could be used in other places of the API. Thinking I had just stumbled upon an amazing idea, I jumped to check the CSimpleSprite class on my phone. But oh no, these members are private!
class CSimpleSprite
{
public:
// ...
private:
// ...
GLuint m_texture;
float m_xpos = 0.0f;
float m_ypos = 0.0f;
float m_width = 0.0f;
float m_height = 0.0f;
int m_texWidth = 0;
int m_texHeight = 0;
float m_angle = 0.0f;
float m_scale = 1.0f;
float m_points[8];
float m_uvcoords[8];
unsigned int m_frame;
unsigned int m_nColumns;
unsigned int m_nRows;
float m_red = 1.0f;
float m_green = 1.0f;
float m_blue = 1.0f;
// ...
};
Sadly, no amount of deriving would allow me to access these members. But I knew there had to be a way.
I knew from my days of game-hacking that it is possible to just modify these members with some simple pointer arithmetic, but that is completely undefined behaviour. Sure, it’d work, but guaranteeing that it would work on every different release of MSVC? Impossible.
Another option? #define private public your problems away. However, this is also terrible: as far as I can tell, it breaks the standard both by redefining a keyword (a big no-no) and by violating the one-definition rule (ODR).
So, I sat in defeat. Until I remembered this bizarre rule in the standard: explicit template instantiations have access checks disabled.
Why?
Imagine you have a class that itself contains the definition of another class. However, this nested class is marked as private:
struct Foo
{
private:
struct Bar{};
};
How would you explicitly instantiate a template with this type ‘Bar’? Do you put it inside the definition of ‘Foo’? Well, no, you cannot explicitly instantiate inside of a class; it has to be done outside of any definitions.
template struct SomeTemplate<Foo::Bar>;
This is why access checks are disabled: otherwise it would be impossible to explicitly instantiate a template with a private type like this.
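To make the rule concrete, here is a minimal sketch that should compile as written. SomeTemplate's body and the BarSize helper are my own additions for illustration (the helper only exists so the result can be observed from outside Foo); the commented-out line shows the ordinary use that the access check still rejects.

```cpp
#include <cstddef>

template <typename T>
struct SomeTemplate
{
    static constexpr std::size_t Size() { return sizeof(T); }
};

struct Foo
{
    // Hypothetical public helper, only here so outside code can see the result.
    static constexpr std::size_t BarSize();

private:
    struct Bar { int value; };
};

// Compiles: access checks do not apply to names used in an explicit instantiation.
template struct SomeTemplate<Foo::Bar>;

// Would NOT compile outside of Foo, because Foo::Bar is private:
// SomeTemplate<Foo::Bar> g_instance;

// Inside a member of Foo, naming Bar is allowed as usual.
constexpr std::size_t Foo::BarSize() { return SomeTemplate<Bar>::Size(); }
```

On its own the explicit instantiation does not leak anything useful. The trick in the rest of this post comes from what the instantiated template does with the argument it was handed.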
How is this useful?
Doing more research into this, I discovered this blog post: https://web.archive.org/web/20120401132446/http://bloglitb.blogspot.com/2011/12/access-to-private-members-safer.html
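#include <iostream>
// from the same post: the friend function defined inside Rob is what hands out M
template<typename Tag, typename Tag::type M>
struct Rob {
friend typename Tag::type get(Tag) {
return M;
}
};
// use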
struct A {
A(int a):a(a) { }
private:
int a;
};
// tag used to access A::a
struct A_f {
typedef int A::*type;
friend type get(A_f);
};
template struct Rob<A_f, &A::a>;
int main() {
A a(42);
std::cout << "proof: " << a.*get(A_f()) << std::endl;
}
This is just the shenanigans I was looking for. An absurd way to access private members of a class, completely defined by the C++ standard.
Playing with this, with a more modern version of C++, I came up with this:
// When explicitly instantiated, this assigns a
// non-type-template-parameter value to a reference during global
// initialization
template <auto& Where, auto What>
requires std::convertible_to<decltype(Where), decltype(What)>
struct NTTPAssigner
{
static inline decltype(auto) s_assignmentReturnResult{Where = What};
};
int PrivateClass::*g_privateClassBPointer{};
template struct NTTPAssigner<g_privateClassBPointer, &PrivateClass::b>;
// Effectively, this is roughly equivalent to:
// int PrivateClass::*g_privateClassBPointer = &PrivateClass::b;
So there we go! A way to access private members of the API.
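For anyone who wants to try it, here is a self-contained sketch of the same idea. PrivateClass is a hypothetical stand-in (its definition is not shown above), ReadB and WriteB are illustrative helpers, and I have swapped the requires-clause for a static_assert so the sketch also builds as C++17. That swap is an assumption of this example, not part of the snippet above.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a class whose private member we want to reach.
class PrivateClass
{
public:
    explicit PrivateClass(int value) : b(value) {}

private:
    int b;
};

template <auto& Where, auto What>
struct NTTPAssigner
{
    // C++17-friendly replacement for the requires-clause (assumption of this sketch).
    static_assert(std::is_assignable_v<decltype(Where), decltype(What)>,
                  "What must be assignable to Where");

    // Initializing this static member performs the assignment.
    static inline decltype(auto) s_assignmentReturnResult{Where = What};
};

int PrivateClass::*g_privateClassBPointer{};

// &PrivateClass::b is only nameable here because this is an explicit instantiation.
template struct NTTPAssigner<g_privateClassBPointer, &PrivateClass::b>;

int ReadB(const PrivateClass& object)
{
    assert(g_privateClassBPointer != nullptr); // set during dynamic initialization
    return object.*g_privateClassBPointer;
}

void WriteB(PrivateClass& object, int value)
{
    assert(g_privateClassBPointer != nullptr);
    object.*g_privateClassBPointer = value;
}
```

One caveat: the assignment runs during dynamic initialization, and the initialization order of static members of template specializations is not guaranteed relative to other globals. Code that runs from main onwards sees the pointer set, but other global initializers should not rely on it.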
Putting it into action
The first two members of the API that immediately caught my attention were m_points and m_uvcoords. Using this ridiculous syntax (seriously, why is this the syntax for a pointer to an array member?):
// float m_points[8];
static inline float (CSimpleSprite::*g_CSimpleSpriteMemberPointerMPoints)[8]{};
template struct NTTPAssigner<g_CSimpleSpriteMemberPointerMPoints, &CSimpleSprite::m_points>;
// float m_uvcoords[8];
static inline float (CSimpleSprite::*g_CSimpleSpriteMemberPointerMUVCoords)[8]{};
template struct NTTPAssigner<g_CSimpleSpriteMemberPointerMUVCoords, &CSimpleSprite::m_uvcoords>;
I was able to move the points and texture coordinates of the sprite anywhere I wanted. This allowed me to render a fully skewed polygon.
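As a rough illustration of what using such a pointer looks like, here is a sketch against a hypothetical FakeSprite class, since the real CSimpleSprite is not reproduced here. SetPoints and GetPoint are made-up helper names, the requires-clause is omitted so the sketch also builds as C++17, and which array slot maps to which corner of the real sprite is not something this example claims to know.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for CSimpleSprite: only the private array matters here.
class FakeSprite
{
private:
    float m_points[8]{};
};

template <auto& Where, auto What>
struct NTTPAssigner // requires-clause omitted in this sketch
{
    static inline decltype(auto) s_assignmentReturnResult{Where = What};
};

// float m_points[8];
static inline float (FakeSprite::*g_fakeSpritePoints)[8]{};
template struct NTTPAssigner<g_fakeSpritePoints, &FakeSprite::m_points>;

// Overwrites all eight floats (four x/y pairs) of the sprite's quad.
void SetPoints(FakeSprite& sprite, const std::array<float, 8>& xy)
{
    assert(g_fakeSpritePoints != nullptr);
    float (&points)[8] = sprite.*g_fakeSpritePoints; // reference to the private array
    std::copy(xy.begin(), xy.end(), points);
}

float GetPoint(const FakeSprite& sprite, std::size_t index)
{
    assert(index < 8);
    return (sprite.*g_fakeSpritePoints)[index];
}
```

The parentheses in (sprite.*pointer)[index] are required, because the subscript would otherwise bind to the member pointer itself.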
Rendering
There’s a problem: we are still stuck in 2D. The Draw method of the sprite class only passes in 2D coordinates, with no Z value, and no perspective correction.
But that doesn’t mean 3D is impossible. These are the same limitations the Sony PlayStation and Sega Saturn had, yet we all remember those consoles for their 3D graphics.
How do you do 3D without a Z-buffer?
The simplest technique is the painter’s algorithm: you draw the back-most polygons first. But unless you clip every polygon against every other polygon whose plane it crosses, a correct back-to-front order does not always exist. Clipping polygons is expensive, and my early N^2 attempt proved to be a complete failure.
So how did the developers of the 90s do it? They didn’t. They took the average Z value of every polygon and sorted them by that. If you go back and play these early 3D games, you will quickly notice polygons being drawn in the wrong order.
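The average-Z approach fits in a few lines. This is a generic sketch, not code from any particular game: the Vertex and Triangle types and the id field are made up for the example, and it assumes larger z means farther from the camera.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

struct Vertex { float x, y, z; };

struct Triangle
{
    std::array<Vertex, 3> vertices;
    int id; // hypothetical label so the resulting order can be inspected
};

float AverageZ(const Triangle& triangle)
{
    return (triangle.vertices[0].z + triangle.vertices[1].z + triangle.vertices[2].z) / 3.0f;
}

// Orders triangles so the farthest (largest average z) is drawn first.
void SortBackToFront(std::vector<Triangle>& triangles)
{
    const auto fartherFirst = [](const Triangle& a, const Triangle& b)
    {
        return AverageZ(a) > AverageZ(b);
    };
    std::stable_sort(triangles.begin(), triangles.end(), fartherFirst);
    assert(std::is_sorted(triangles.begin(), triangles.end(), fartherFirst)); // debug check
}
```

A single average per triangle cannot describe a long triangle that is partly in front of and partly behind its neighbour, which is exactly where the wrong-order glitches come from.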
But one game had a genius solution: Quake. In a pre-computation stage when a level is built, all polygons are clipped and a binary-space-partition (BSP) tree is created.
The BSP tree gives polygon order in whichever way you want. Back-to-front? Front-to-back? It can handle it. This is a perfect solution.
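The ordering falls out of a simple recursive walk. The sketch below uses a generic textbook node layout with polygon ids stored on each splitting plane. It is an assumption for illustration and does not mirror Quake's actual file structures.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Vec3 { float x, y, z; };

// All points p with dot(normal, p) == distance lie on the plane.
struct Plane { Vec3 normal; float distance; };

struct BspNode
{
    Plane splitter{};
    std::vector<int> polygonIds;     // polygons lying on the splitter plane
    std::unique_ptr<BspNode> front;  // subtree on the side the normal points to
    std::unique_ptr<BspNode> back;   // subtree on the opposite side
};

float SignedDistance(const Plane& plane, const Vec3& point)
{
    return plane.normal.x * point.x + plane.normal.y * point.y +
           plane.normal.z * point.z - plane.distance;
}

// Appends polygon ids to 'out', farthest from the camera first.
void CollectBackToFront(const BspNode* node, const Vec3& camera, std::vector<int>& out)
{
    if (node == nullptr) return;
    const Vec3& n = node->splitter.normal;
    assert(n.x != 0.0f || n.y != 0.0f || n.z != 0.0f); // a zero normal cannot classify

    const bool cameraInFront = SignedDistance(node->splitter, camera) >= 0.0f;
    const BspNode* nearSide = cameraInFront ? node->front.get() : node->back.get();
    const BspNode* farSide = cameraInFront ? node->back.get() : node->front.get();

    CollectBackToFront(farSide, camera, out);  // everything beyond the plane first
    out.insert(out.end(), node->polygonIds.begin(), node->polygonIds.end());
    CollectBackToFront(nearSide, camera, out); // then the camera's own side
}
```

Swapping the two recursive calls yields front-to-back order instead, with no sorting at runtime.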
The source code for Quake is freely available online, and although the code is a little bit of a nightmare to decipher, I was able to parse the BSP file format in my codebase.
The perspective issue
There is no perspective texture correction. This is fine when looking from a far distance at small polygons. But as soon as polygons are close to the camera or, god forbid, steeply angled away from it, they skew massively.
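A tiny worked example shows how large the error can get. The two functions below are illustrative only (the names and the use of w as the per-vertex depth term are my assumptions): one interpolates the texture coordinate linearly in screen space, as an affine renderer does, and the other interpolates u/w and 1/w and divides, which is the perspective-correct result.

```cpp
#include <cassert>

// Texture coordinate at screen-space fraction t along an edge, ignoring depth.
float AffineU(float u0, float u1, float t)
{
    return u0 + (u1 - u0) * t;
}

// Perspective-correct version: u/w and 1/w are linear in screen space, u is not.
float PerspectiveU(float u0, float w0, float u1, float w1, float t)
{
    assert(w0 > 0.0f && w1 > 0.0f); // both endpoints must be in front of the camera
    const float uOverW = (u0 / w0) + ((u1 / w1) - (u0 / w0)) * t;
    const float oneOverW = (1.0f / w0) + ((1.0f / w1) - (1.0f / w0)) * t;
    return uOverW / oneOverW;
}
```

For an edge running from u = 0 at depth 1 to u = 1 at depth 4, the pixel halfway across on screen should sample u = 0.2, but affine interpolation samples u = 0.5. When both ends are at the same depth the two functions agree, which is why small or distant polygons look fine.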
I was able to slightly remedy this by clipping against the view frustum; however, this increases the number of polygons we need to draw.
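Clipping against one frustum plane can be sketched with a single pass of the Sutherland-Hodgman approach. This example handles only the near plane and assumes a camera looking down +z in camera space. Both choices are assumptions made for the illustration, and the other planes would follow the same pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Keeps the part of a convex polygon with z >= nearZ.
std::vector<Vec3> ClipAgainstNearPlane(const std::vector<Vec3>& polygon, float nearZ)
{
    assert(nearZ > 0.0f);
    std::vector<Vec3> result;
    for (std::size_t i = 0; i < polygon.size(); ++i)
    {
        const Vec3& current = polygon[i];
        const Vec3& next = polygon[(i + 1) % polygon.size()];
        const bool currentInside = current.z >= nearZ;
        const bool nextInside = next.z >= nearZ;

        if (currentInside) result.push_back(current);

        if (currentInside != nextInside)
        {
            // The edge crosses the plane: emit the intersection point.
            const float t = (nearZ - current.z) / (next.z - current.z);
            result.push_back({current.x + (next.x - current.x) * t,
                              current.y + (next.y - current.y) * t,
                              nearZ});
        }
    }
    return result;
}
```

A triangle with one vertex behind the plane comes back as a quad, which has to be drawn as two triangles. That is where the extra polygons come from.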
The solution? Don’t render textures where this effect is obvious. I settled on mostly flat colours in the final project, which perspective warping had little impact on.
Dynamic objects
Sure, we can render a static scene. That’s great! But how do we move things?
We could insert them into the BSP tree, clip them, etc. But this would be very slow (and complex).
How did Quake do this? It had a Z-buffer.
Ugh.
Writing a rasterizer
I had previously written a software rasterizer. It’s rather simple: you calculate the bounding box (min/max) of the triangle on the screen, iterate over every pixel, calculate the barycentric coordinates, test if they are inside the triangle, and there you go!
I knew this would be slow as hell. But through months of tweaking, I was able to get it to a point where I was comfortable with it.
The trick Quake used to speed it up? You already have sorted polygons for the level geometry, so you don’t need to perform any depth testing on them. All you need to do is write the Z value.
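Here is a deliberately naive sketch of that kind of rasterizer, with a switch for the write-only shortcut. It is not the project's tuned code: the type names, the DepthMode enum, sampling at pixel centres, interpolating z linearly in screen space, and "smaller z is closer" are all assumptions made for the example. It only fills a depth buffer and returns how many pixels it wrote.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct ScreenVertex { float x, y, z; }; // x, y in pixels

enum class DepthMode { WriteOnly, TestAndWrite };

// Twice the signed area of the triangle (a, b, (px, py)).
float Edge(const ScreenVertex& a, const ScreenVertex& b, float px, float py)
{
    return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
}

int RasterizeDepth(std::vector<float>& depth, int width, int height,
                   const ScreenVertex& v0, const ScreenVertex& v1, const ScreenVertex& v2,
                   DepthMode mode)
{
    assert(width > 0 && height > 0);
    assert(depth.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    const float area = Edge(v0, v1, v2.x, v2.y);
    if (area == 0.0f) return 0; // degenerate triangle

    // Bounding box of the triangle, clamped to the screen.
    const int minX = std::max(0, static_cast<int>(std::floor(std::min({v0.x, v1.x, v2.x}))));
    const int maxX = std::min(width - 1, static_cast<int>(std::ceil(std::max({v0.x, v1.x, v2.x}))));
    const int minY = std::max(0, static_cast<int>(std::floor(std::min({v0.y, v1.y, v2.y}))));
    const int maxY = std::min(height - 1, static_cast<int>(std::ceil(std::max({v0.y, v1.y, v2.y}))));

    int written = 0;
    for (int y = minY; y <= maxY; ++y)
    {
        for (int x = minX; x <= maxX; ++x)
        {
            const float px = static_cast<float>(x) + 0.5f;
            const float py = static_cast<float>(y) + 0.5f;

            // Barycentric weights; dividing by the area handles either winding.
            const float w0 = Edge(v1, v2, px, py) / area;
            const float w1 = Edge(v2, v0, px, py) / area;
            const float w2 = Edge(v0, v1, px, py) / area;
            if (w0 < 0.0f || w1 < 0.0f || w2 < 0.0f) continue; // outside the triangle

            const float z = w0 * v0.z + w1 * v1.z + w2 * v2.z;
            float& stored = depth[static_cast<std::size_t>(y) * width + x];

            // Pre-sorted level geometry skips the comparison entirely.
            if (mode == DepthMode::TestAndWrite && z >= stored) continue;

            stored = z;
            ++written;
        }
    }
    return written;
}
```

In this sketch, level polygons would go through WriteOnly in their sorted order, and dynamic objects would then use TestAndWrite against the depth values the level left behind.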
Multicore
Back in ye olde days, you got one thread and that was it. Now we have parallel execution, which can massively speed things up.
After quickly wrapping my rasterizing code in a std::for_each with std::execution::par_unseq, my frame time went from 90ms to 4ms. I was flabbergasted at how easy it was to speed up my code by more than 20x. I love modern hardware.
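The pattern looks roughly like the sketch below. How the work was actually split in the project is not described here, so dividing by screen row is my assumption. ForEachRowParallel and FillRowIndexSum are made-up names, and the per-pixel work is a placeholder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <execution>
#include <numeric>
#include <vector>

// Runs processRow(y) for every row, letting the library spread rows across cores.
template <typename RowFunction>
void ForEachRowParallel(int height, RowFunction processRow)
{
    assert(height >= 0);
    std::vector<int> rows(static_cast<std::size_t>(height));
    std::iota(rows.begin(), rows.end(), 0); // 0, 1, ..., height - 1

    std::for_each(std::execution::par_unseq, rows.begin(), rows.end(),
                  [&processRow](int y) { processRow(y); });
}

// Placeholder workload: every pixel gets x + y.
void FillRowIndexSum(std::vector<float>& buffer, int width, int height)
{
    assert(buffer.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    ForEachRowParallel(height, [&buffer, width](int y)
    {
        for (int x = 0; x < width; ++x)
        {
            buffer[static_cast<std::size_t>(y) * width + x] = static_cast<float>(x + y);
        }
    });
}
```

Each row writes only to its own pixels, so no synchronization is needed, and that matters because code run under par_unseq must not take locks. Depending on the standard library, the parallel policies may need an extra backend to be linked (GCC's implementation typically relies on TBB).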
So we have a Z-buffer, how do we draw?
Typically you have two buffers: Z and colour. This design could work: draw a quad for every pixel using its colour. But that is a lot of quads. On top of this, it means you must sample the texture on the CPU, which is even more crippling to performance.
Instead of a colour buffer, I settled on a “triangle-buffer”, writing the index of the triangle into that buffer alongside the Z depth. After we have finished rasterizing, we can iterate over this triangle-buffer and create horizontal spans. These spans can be one pixel wide or the whole width of the screen, cutting down on the number of polygons we send to the GPU.
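The span-building pass can be sketched as a run-length scan over each row. The Span layout, the kNoTriangle sentinel for empty pixels, and the BuildSpans name are assumptions for this example. The real buffer format is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A horizontal run of pixels on row y covering [xBegin, xEnd), owned by one triangle.
struct Span
{
    int y;
    int xBegin;
    int xEnd;
    std::int32_t triangle;
};

constexpr std::int32_t kNoTriangle = -1; // hypothetical marker for "nothing drawn here"

std::vector<Span> BuildSpans(const std::vector<std::int32_t>& triangleBuffer, int width, int height)
{
    assert(width > 0 && height > 0);
    assert(triangleBuffer.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    std::vector<Span> spans;
    for (int y = 0; y < height; ++y)
    {
        const std::size_t row = static_cast<std::size_t>(y) * width;
        int x = 0;
        while (x < width)
        {
            // Extend the run while the neighbouring pixel has the same triangle.
            const std::int32_t id = triangleBuffer[row + x];
            int end = x + 1;
            while (end < width && triangleBuffer[row + end] == id) ++end;

            if (id != kNoTriangle) spans.push_back({y, x, end, id});
            x = end;
        }
    }
    return spans;
}
```

Each span can then be submitted as one thin quad, so a row covered entirely by one triangle costs a single primitive instead of one per pixel.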