Overview
I’ve worked on several game and engine projects over the years, and have often been constrained to not using external libraries (self-imposed or not).
I find myself rewriting a simple math library for almost every such project, usually consisting of Vectors, sometimes Quaternions, and occasionally Matrices. Over the years, I’ve honed my personal approach, making small improvements with each iteration.
A pain point of mine has been rewriting similar code for my Vectors, based on type (usually floats and some kind of int), and dimension (2D to 4D most of the time).
While working on my current engine project, I wanted to do things differently, using a template-based approach.
Why shouldn’t I just use a library?
You probably should.
I recommend and love the popular glm (for C++) and nalgebra for Rust.
The only reasons I can think of where you’d want to do it yourself are:
- You’ve been given a constraint - academic, professional, some funky new hardware or software environment
- You want to learn how things work :P
For this project, I found myself in both camps, so allons-y!
Constraints
For this project, the following constraints exist:
- No external libraries (anything in the standard library is fair game)
- Target C++17, must be cross-platform (Windows and macOS)
- Strong type-safety (vector sizes and types should be known at compile time)
To set expectations early, here’s what this article is NOT:
- A production-ready implementation that can be used as-is (you should really use a library if you can)
- A SIMD-focused implementation (although a significant portion was auto-vectorized in my limited analysis on Compiler Explorer)
- A step-by-step guide on Templates, Unions, and C++ building blocks
Design
I wanted to approach this from a top-down perspective, focusing on the API I wanted, worrying about the implementation later.
Vec2 a = {2.3f, 4.1f};
Vec2 b = Vec2::UnitX(); // Vec2(1.0f, 0.0f);
a.x // 2.3f
b.y // 0.0f
a.z // ERROR: Vec2 does not have a z-component
Vec2 c = a + b; // Vec2(3.3f, 4.1f);
a += b; // a = Vec2(3.3f, 4.1f);
float d = a.Dot(b); // 3.3f
Vec3 e = {1, 2, 3};
e.z // 3.0f (Fine, e has a z-component)
e += b; // ERROR: You can't add Vec2s and Vec3s directly
Vec2Int f = Vec2Int::UnitY(); // Vec2Int(0, 1);
What do these (Vec2, Vec3, Vec2Int) all look like then???
If you don’t mind spoilers, they’re all just aliases, as so:
// Typedefs
using Vec2 = Vector<float, 2>;
using Vec3 = Vector<float, 3>;
using Vec4 = Vector<float, 4>;
using Vec2Int = Vector<int32_t, 2>;
using Vec3Int = Vector<int32_t, 3>;
using Vec4Int = Vector<int32_t, 4>;
I was first exposed to this idea five-or-so years ago, while working on my first sorta-serious game engine project, moon engine.
I was using the nalgebra library, and was surprised to learn that all their vectors (and matrices) were built on top of a single Matrix struct.
Okay… So what’s Vector<???, ???> like?
While we’ll have to modify this signature as we go forward, the basic definition is simply:
template <typename T, size_t N>
struct Vector;
Two things to note here briefly are:
- this would technically allow us to create vectors of any type
- we can have n-dimensional vectors, where n is a non-negative integer (this would include zero)
While these are definitely fascinating concepts if you’re mathematically-inclined, we’re focusing on a simple game-specific library where concepts such as a Vec3 of std::string and zero-dimensional vectors aren’t too useful, so we’ll fix this eventually.
On constexpr
Inspired by a rewatch of this talk, another goal for me on this project was maximizing constexpr use where appropriate.
As such, most of the code is constexpr, including all constructors, operators, and most additional operations.
Basic Implementation
Helpers
A small helper to store the size of a Vector called size:
constexpr static size_t size = N;
Storage (a.k.a. I still want access via .x, .y, etc.)
This is one of the bigger gotchas when it comes to switching to a single templated struct to store our vectors. We lose the freedom of having simple element-based access by default.
While separately you could do the following:
struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };
There’s no easy way to do the same with our shiny new Vector<T, N> struct. Well, kinda.
The trick is to use even more templates!
Instead of handing storage on our Vector directly, we’ll use another templated struct, VecData.
What is VecData?
VecData is a simple templated struct like this:
template <typename T, size_t N>
struct VecData {
T data[N];
constexpr VecData(): data{} {}
explicit constexpr VecData(const T value) : data{} {
for (size_t i = 0; i < N; i++) {
data[i] = value;
}
}
};
We then update our Vector definition as such:
template <typename T, size_t N>
struct Vector : VecData<T, N> {
using storage_type = VecData<T, N>; // another helper
// magic for now...
};
But that didn’t help…?
Although we now have somewhere to store our data, we aren’t really any better off in terms of dimension-specific fields.
However, we can make use of Template Specializations!
With the constraint that we’ll provide better support for 2-to-4 dimensional vectors, which are what we’ll mostly be working with anyway, we can do the following:
// For Vec2, Vec2Int, etc.
template <typename T>
struct VecData<T, 2> {
union {
T data[2];
struct { T x, y; };
};
constexpr VecData(): data{static_cast<T>(0), static_cast<T>(0)} {}
explicit constexpr VecData(const T value): data{value, value} {}
}
// For Vec3, Vec3Int, etc.
template <typename T>
struct VecData<T, 3> {
union {
T data[3];
struct { T x, y, z; };
};
constexpr VecData(): data{static_cast<T>(0), static_cast<T>(0), static_cast<T>(0)} {}
explicit constexpr VecData(const T value): data{value, value, value} {}
}
// For Vec4, Vec4Int, etc.
template <typename T>
struct VecData<T, 4> {
union {
T data[4];
struct { T x, y, z, w; };
};
constexpr VecData(): data{static_cast<T>(0), static_cast<T>(0), static_cast<T>(0), static_cast<T>(0)} {}
explicit constexpr VecData(const T value): data{value, value, value, value} {}
}
Our actual storage occurs via this anonymous union (my beloved <3) of a fixed-size array and nameless struct containing our individual, named values.
NOTE: This is technically UB land (in C++, it’s been well-defined in C for ages) but this is extremely common in real-world usage (because of the benefits) and is an extension provided by most of the common compilers. I’ve tested this on MSVC and clang personally, which fits my needs. Yours may vary.
What about defining all members specifically?
Can’t let anything past you, can I? We’ll get to this a little later but supporting the following is essential:
Vec2 = {2, 3.2f}; // Exactly two parameters that convert to float
Vec3Int = {4, 12, 32}; // Exactly three parameters that convert to int32_t
It does get into slightly more complicated territory though, so I’m going to hold off on the explanation for now.
Constructors
Our two primary constructs are fairly straightforward, as you’d expect:
struct Vector : VecData<T, N> {
...
constexpr Vector() : storage_type() {}
explicit constexpr Vector(T value) : storage_type(value) {}
...
}
I use our storage_type helper to make things a little easier to read.
We’ll get into our third, secret constructor later. Hope you’re excited!
Element-wise Accessors
Accessors via the [] operator, both const and not provide index-based access to the underlying data.
constexpr T& operator[](size_t i) { return this->data[i]; }
constexpr const T& operator[](size_t i) const { return this->data[i];}
To keep things simple and fast, there are no checks here.
Arithmetic Operations
At a minimum, I want to support the following operations (where A, B, and C are vectors):
C = A + B; // Addition
C = A - B; // Subtraction
C = A * B; // Element-wise Multiplication
C = A / B; // Element-wise Division
C = A * scalar; // Multiply with a scalar
C = A / scalar; // Division by a scalar
As well as their their in-place cousins (+=, /=, etc.)
This is fairly straightforward, as a simple loop that iterates and operates over each element:
// Add two Vectors
constexpr Vector& operator+=(const Vector& v) {
for (size_t i = 0; i < size; i++) {
this->data[i] += v.data[i]; // Similar for -, *, /
}
return *this;
}
// Multiply by a scalar of T
constexpr Vector operator*(const T value) const {
auto result = *this;
result *= value;
return result;
}
Dot Product <3
Like many others, I love the Dot product — it is my favourite vector operation of all time (along with the 2D Wedge product).
Part of its charm lies in how simple it really is:
- multiply each corresponding component of two vectors
- sum the results
- profit
In our library, dot products are performed with a simple:
auto scalar = A.Dot(B);
Implementation isn’t much more complicated than the previous operations:
// Inside Vector:
constexpr T Dot(const Vector& v) const {
auto result = static_cast<T>(0);
for (size_t i = 0; i < size; i++) {
result += this->data[i] * v.data[i];
}
return result;
}
Cross Product
Unlike the wonderful dot product — that works for all n-dimensional pairs of vectors (or sequences of numbers) — the Cross-Product is only defined for 3D ones (and 7D??). Since I can only personally visualize up to six dimensions, implementation of a 7D cross-product is left to the reader.
// OK: 3D Cross Product
using V3 = Vector<T, 3>; // for any type T
V3 A;
V3 B;
V3 C = A.Cross(B);
// NOT OK: 4D Cross Product?
using V4 = Vector<T, 4>; // for any type T
V4 X;
V4 Y;
auto Z = X.Cross(Y); // ERROR: Cross() isn't defined because N != 3
As such, this acts as our first major roadblock. We have a templated Vector that could be of any size, but we only want the Cross Product to be available on Vectors where N (our dimensions) is 3 (and 7 if you’re up for it).
template <size_t M = N, typename = std::enable_if_t<M == 3>>
constexpr Vector<T, M> Cross(const Vector<T, M>& other) const {
static_assert(M <= N); // We allow Vec4.Cross<3>() but NOT Vec2.Cross<3>()
return {
this->data[1] * other.data[2] - this->data[2] * other.data[1],
this->data[2] * other.data[0] - this->data[0] * other.data[2],
this->data[0] * other.data[1] - this->data[1] * other.data[0]
};
}
M?? enable_if_t?? static_assert?? What’s all this then?
If you’re new to the land of template metaprogramming, this can be a bit much at first, but it’s fairly straightforward once you understand the individual parts.
First, we add an additional template parameter M that defaults to our own dimensions (N). We then make sure that this is equal to 3.
This takes advantage of SFINAE as such:
auto r1 = vec2.Cross(another_vec2); // NOT allowed, M = N = 2, 2 != 3
auto r2 = vec3.Cross(another_vec3); // Allowed, M = N = 3, 3 == 3
auto r3 = vec4.Cross<3>(another_vec3); // Allowed, M = 3, 3 == 3
// Here the final component of the Vec4 is ignored, acting as if it were a Vec3.
// This is safe because for performance reasons, or otherwise, we may want to use a 4D Vector as if it were a 3D one, ignoring the w
// However, not that the following is not permitted:
auto r3 = vec2.Cross<3>(another_vec3); // NOT allowed, although M = 3, 3 == 3, static_assert fails
Four (or more) component vectors may still use the Cross Product, provided they explicitly override M, ensuring any such usage is intentional. The static_assert simply makes sure that we’re not trying to do cross products on a Vector with fewer than three components.
2D Wedge Product
Similarly, I find myself using the 2D Wedge product (a.k.a. perpendicular dot product) frequently, such as in my racetrack sector representation article.
The math for this is quite beautiful:
a.x * b.y - b.x * a.y
The implementation is as such, with similar tricks to the Cross Product:
template <size_t M = N, typename = std::enable_if_t<M == 2>>
constexpr T Wedge(const Vector<T, M>& other) const {
static_assert(M <= N);
return this->data[0] * other.data[1] - this->data[1] * other.data[0];
}
Length and Normalizing
Finally, the other common operations we need to handle have to do with size and normalizing.
Things get a little challenging here, for a few key reasons.
Firstly, the idea of normalizing (scaling a vector by the inverse of its length), ensuring that all elements fall within the -1 to 1 range only really makes sense if we can represent such values. As such, the idea of normalizing Vec2Ints and such needs some addressing.
Normalized Types
There’s rarely a perfect solution and my choice was to convert Vectors to a type that can represent these values when normalizing.
As such it means the following:
| original type | normalized type |
|---|---|
| float | float |
| double | double |
| anything else | float |
Hence, a Vec3Int would transform into a Vec3 when normalized.
So, we need a way to represent this programmatically. Taking advantage of another type trait, we can use the following as another helper in our Vector:
using normalized_t_type = std::conditional_t<std::is_floating_point_v<T>, T, float>;
using normalized_type = Vector<normalized_t_type, N>;
Here, if T is a floating-point value (such as float or double), it keeps the same type, defaulting to float if not for all other types.
normalized_t_type and normalized_type are simple aliases that make expressing this a lot easier.
Measuring Length
Since the length of a vector is simply the square root of the sum of it’s squared elements (which is analogous to a dot product with itself), we can separate the two steps, in a sense:
- LengthSquared() that performs the Dot product with itself and retains its type
- Length() that performs the square root and returns a
normalized_t_type
LengthSquared()
For this library, I chose to not handle overflow, to keep things simple, but you may need to. Since our beloved Dot product already handles the logic, we can simply use:
constexpr T LengthSquared() const {
return Dot(*this);
}
The squared length is often useful in its own right, since square roots can be expensive. It’s often used in circle-to-circle collision detection, for example, where squared distances are sufficient.
Length()
Now, all we need is a Length function that takes the square root right?
Well, yeah, but we run into a particularly frustrating problem here: std::sqrt is not constexpr. Not in C++17, and not until C++26?!?!
This trickles up, so normalizing and everything else that requires the length can no longer be constexpr, unless we write our own square root function that is. I chose not to for this specific project, largely to ensure the implementation was technically sound and identical to std::sqrt for all valid cases (float, double, and long double?).
With a solemn disposition, here’s our function. The disgraceful, non-constexpr member of the family:
normalized_t_type Length() const {
// NOTE: std::sqrt is NOT constexpr in C++17
return std::sqrt(static_cast<normalized_t_type>(this->LengthSquared()));
}
Normalize, Finally
We have all the tools to normalize vectors now:
normalized_type Normalize() const {
normalized_type result = *this;
normalized_t_type length = Length();
if (length == static_cast<normalized_t_type>(0)) {
return result;
}
return result / length;
}
Note that because our Length() isn’t constexpr, this won’t be either. The zero-check is simply to avoid pesky division-by-zero errors.
With that we’re done with the core of our vector implementation!
Practical Improvements
Gatekeeping Vectors
As mentioned previously, we don’t actually want anything to be a Vector. We can achieve this protection by using a similar trick to our earlier approach with Cross products.
First, we add the following helper:
template <typename T, size_t N>
using is_valid_vec_t = std::enable_if_t<std::is_arithmetic_v<T> && (N > 0)>;
We then update our structs as follows:
// Declaration
template <typename T, size_t N, typename = is_valid_vec_t<T, N>>
struct VecData;
// Definition
template <typename T, size_t N, typename>
struct VecData { ... }
// Specializations for 2D, 3D, and 4D Vectors
template <typename T>
struct VecData<T, 2, is_valid_vec_t<T, 2>> // similarly for 3-and-4D Vectors
{...}
// Vector Implementation
template <typename T, size_t N, typename = is_valid_vec_t<T, N>>
struct Vector : VecData<T, N> { ... }
So no more Vector<std::string, 0>.
Secret Constructor
It’s finally here!!
Keep in mind that we’ll still have to do the actual work in VecData, and we have a helper structure called Initializer to set things up. We’ll go over it after we look at the constructor on our Vector.
It can look fairly complicated, so I’ll break it down once you’ve had a minute to take all of it in:
template <typename... Args, typename = std::enable_if_t<sizeof...(Args) == N
&& (std::is_convertible_v<Args, T> && ...)
&& (std::is_floating_point_v<T> || (!std::is_floating_point_v<Args> && ...))>>
constexpr Vector(Args... args)
: storage_type(typename storage_type::Initializer({static_cast<T>(args)...})) {}
That’s a lot of dot dot dots
Unlike in Mamma Mia!, the ...’s here serve a very different purpose.
Here, they’re used in a few key ways, depending on context:
typename... Args
This is a variadic template parameter that allows multiple arguments to be packed into Args.
This allows us to pass in as many arguments as we’d like. However, this isn’t quite what we want.
We want to ensure the following as well:
- We accept exactly as many parameters as elements we can hold (that is count(args) == N)
- We only accept values that are either
Tor can be converted toT(like ints can be converted to floats, for example). - We don’t allow a double to be passed in for int (but we allow the converse).
sizeof...(Args)
The first piece in our std::enable_if_t ensure that we have exactly N parameters, like we wanted.
The sizeof... operator, like its cousin sizeof (that returns sizes of types in bytes) gives us the count of the variadic template argument pack, Args.
(std::is_convertible_v<Args, T> && ...)
This is a fold expression, available since C++17 that makes all of this a lot easier to work with.
We check each parameter in Args to make sure it is convertible to T, and logically AND the results.
(std::is_floating_point_v<T> || (!std::is_floating_point_v<Args> && ...))
Similarly, if we’re not a floating point type, we require all arguments to also not be floating point values.
With these pieces in place, we can now accept the right number and kind of arguments in our constructor.
static_cast<T>(args)...
Here we use another fold expression to cast all of our arguments to T.
Well what’s storage_type::Initializer really?
Okay, I guess it’s time to raise the curtains. The final piece of our secret constructor magic lies in our VecData structs.
The primary template contains the following:
struct Initializer { T data[N]; };
explicit constexpr VecData(Initializer init) : data{} {
for (size_t i = 0; i < N; i++) {
this->data[i] = init.data[i];
}
}
Fairly straightforward, an array of N elements that we accept as a parameter. Similarly, the specializations are updated as follows:
// VecData<T, 2, is_valid_vec<T, 2>>
struct Initializer { T x, y; };
explicit constexpr VecData(Initializer init) : data{init.x, init.y} {}
// VecData<T, 3, is_valid_vec<T, 3>>
struct Initializer { T x, y, z; };
explicit constexpr VecData(Initializer init) : data{init.x, init.y, init.z} {}
// VecData<T, 4, is_valid_vec<T, 4>>
struct Initializer { T x, y, z, w; };
explicit constexpr VecData(Initializer init) : data{init.x, init.y, init.z, init.w} {}
We forward our Args, each casted to T, into our VecData<T, N>::Initializer ensuring we use the correct constructor.
Constants
If you recall our initial API, you might recall the use of Vec2::UnitX() and Vec2Int::UnitY() and such.
I find these particularly useful to express the following: 0. Zero Vector (all elements are zero)
- One Vector (all elements are one)
- Unit Vectors for each available Axis
- Direction Vectors (Up, Down, Right, Forward, etc.)
The Zero and One Vectors are straightforward and simply:
static constexpr Vector Zero() { return Vector(); }
static constexpr Vector One() { return Vector(static_cast<T>(1)); }
The direction vectors prove challenging due to their reliance on dimensionality, yet again:
Vec2::UnitX() // Vec2(1, 0);
Vec2::UnitZ() // ERROR: No z-component
While we could’ve technically used more conditional template magic here, I wanted a simple, clean approach that didn’t muddy our beloved Vector struct with a host of conditional logic. I also knew that I wanted these to be static and constexpr.
Hence the following:
template <typename T, size_t N, typename Vec>
struct VecDirections {};
template <typename T, typename Vec>
struct VecDirections<T, 2, Vec> {
static constexpr Vec UnitX() { return Vec{static_cast<T>(1), static_cast<T>(0)}; }
static constexpr Vec UnitY() { return Vec{static_cast<T>(0), static_cast<T>(1)}; }
static constexpr Vec Right() { return UnitX(); }
static constexpr Vec Left() { return -UnitX(); }
static constexpr Vec Up() { return UnitY(); }
static constexpr Vec Down() { return -UnitY(); }
};
template <typename T, typename Vec>
struct VecDirections<T, 3, Vec> {
static constexpr Vec UnitX() { return Vec{static_cast<T>(1), static_cast<T>(0), static_cast<T>(0)}; }
static constexpr Vec UnitY() { return Vec{static_cast<T>(0), static_cast<T>(1), static_cast<T>(0)}; }
static constexpr Vec UnitZ() { return Vec{static_cast<T>(0), static_cast<T>(0), static_cast<T>(1)}; }
static constexpr Vec Right() { return UnitX(); }
static constexpr Vec Left() { return -UnitX(); }
static constexpr Vec Up() { return UnitY(); }
static constexpr Vec Down() { return -UnitY(); }
static constexpr Vec Forward() { return UnitZ(); }
static constexpr Vec Back() { return -UnitZ(); }
};
So we can now do VecDirections<float, 3, Vector<float, 3>>::Forward() and it works!!
Well, that’s annoying to type out and not quite the lovely Vec3::Forward() we wanted though…
We can finally update our Vector struct definition (for the last time, I promise) as such:
template <typename T, size_t N, typename = is_valid_vec_t<T, N>>
struct Vector : VecData<T, N>, VecDirections<T, N, Vector<T, N>>
The observant few of you may notice that we’re passing in itself(?) to its parent VecDirections. This is a case of Curiously Recurring Template Pattern (CRTP), a name I only recently discovered, despite using the pattern in my work for several years now. It lets us use the templated Vector in VecDirections without creating a circular dependency.
Converting, Truncating, Projecting, and Expanding
For our final stretch, I wanted to go over a few common helper methods I added to my Vector.
Converting Types
A common use case might be to convert a Vec3 to a Vec3Int or vice-versa, and it helps to have an easy way to do so.
We use some more template magic and checking to do so:
template <typename U, typename = std::enable_if_t<std::is_convertible_v<T, U>>>
explicit constexpr operator Vector<U, N>() const {
Vector<U, N> result;
for (size_t i = 0; i < N; i++) {
result[i] = static_cast<U>(this->data[i]);
}
return result;
}
Truncating Vectors
Another common operation is creating a 3D Vector from a 4D one. This is useful if we are multiplying 3D vectors via 4x4 homogeneous transformation matrices, for example.
Here’s what that looks like:
template <size_t M = N - 1, typename = std::enable_if_t<(M > 1)>>
constexpr Vector<T, M> Truncate() const {
static_assert(M <= N); // Ensure we're not going out-of-range
Vector<T, M> result;
for (size_t i = 0; i < M; i++) {
result[i] = this->data[i];
}
return result;
}
Defaulting to one dimension less, we allow truncating a larger vector to a positive integer M, such that 0 < M <= N.
Projecting Vectors
Similarly common is transforming a 4D vector into a 3D one by taking its last component (the w) and dividing each of the other elements with it.
It’s implemented similarly to truncating:
template <size_t M = N - 1, typename = std::enable_if_t<(M > 0)>>
constexpr Vector<T, M> Project() const {
static_assert(M < N); // Ensure we're smaller
Vector<T, M> result;
const auto w = this->data[size - 1];
const auto inv_w = w == static_cast<T>(0) ? static_cast<T>(0) : static_cast<T>(1) / w;
for (size_t i = 0; i < M; i++) {
result[i] = this->data[i] * inv_w;
}
return result;
}
We similarly allow projecting to a smaller vector, of size M, such that 0 < M < N. Note that I could have defined this purely for 4D vectors (like we did the Cross product), but chose to keep this one generalizable.
Expanding Vectors
Similar to truncating, I find myself needing to expand a smaller dimensional vector to a larger one frequently. This time I chose to make it a single element larger, always.
Here’s the code:
constexpr Vector<T, N + 1> Expand(const T value = static_cast<T>(0)) const {
Vector<T, N + 1> result;
for (size_t i = 0; i < size; i++) {
result[i] = this->data[i];
}
result[size] = value;
return result;
}
Closing Remarks
That was quite the journey, and I thank you for reading if you made it all the way. Reach out if you find any errors, want to suggest improvements or suggestions (I’m always looking for more to dive into), or just to liven my inbox.
I may get around to making this a bit of a series, with articles such as Quaternions, Matrices, and more.
Till we meet again,