Hi, so whilst I was working on my two plugins for Unreal Engine 5.7+ for cinematic rendering purposes, I stumbled onto the Carbon Github and have some feedback.
I didn’t clearly see any way to post this anywhere else, so if an ISD wishes to move this, go right ahead - but here are my findings.
Firstly, “Transform(const Vector3&, const Matrix&)” computes the homogeneous w incorrectly. In “Matrix_inline.h” (line 599), the returned w is transform._14 + transform._24 + transform._34 + transform._44, which is independent of the input point. For a Vector3 treated as (x, y, z, 1), that term should be x*_14 + y*_24 + z*_34 + _44. If any caller uses this overload for projection or perspective-correct work, I think the result would be wrong?
“GetAxisAngle” does not appear to recover the axis from a quaternion. In “Quaternion_inline.h” (line 270), it returns Vector3(q.x, q.y, q.z) and 2*acos(q.w). That vector is the quaternion’s imaginary part, not the rotation axis. The axis should be divided by sin(angle/2) for non-degenerate rotations, plus special handling near zero rotation. I think the function name promises more than it delivers and might silently mislead downstream code, creating drift in the future?
Some of angle/extraction helpers give “acos” unclamped values, which makes them numerically ‘insecure’ or perhaps ‘brittle’. In “Quaternion_inline.h” (line 257), (line 272), and “Vector3_inline.h” (line 271), values derived from dot products or q.w go straight into “acos”. Surely in animation tiny floating-point drift routinely producing 1.0000001 or -1.0000001; which turns into NaN unless clamped to [-1, 1] ?
The Quaternion normalization/inversion lack zero-length guards are…strange. In “Quaternion_inline.h” (line 177), (line 184), both Normalize and Inverse divide by length-derived values without checking for zero? That might be fine for well-formed inputs, but…questionable for import pipelines where partially initialized or degenerate rotations do show up, perhaps?
“Vector3::Normalize” rescales first to avoid overflow before squaring, that’s cool and fairly robust. Nice touch…
Slerp handles the negative-dot case correctly to follow the shortest arc in “Quaternion_inline.h” (line 244). I thought that was slick.
“Decompose” is…conventional: extract basis lengths as scale, translation from row 4, normalize the 3x3, then convert to a quaternion in “Matrix.cpp” (line 225).
Here is (I think) some basic math code that would solve 80% of transform bugs and is the direction the industry has been going for a while now, including Unreal Engine and Unity.
#pragma once
#include <cmath>
#include <algorithm>
namespace gm
{
constexpr float kEps = 1e-6f;
constexpr float kPi = 3.14159265358979323846f;
inline float clamp(float x, float lo, float hi)
{
return std::max(lo, std::min(x, hi));
}
inline float safe_sqrt(float x)
{
return std::sqrt(std::max(0.0f, x));
}
inline float safe_acos(float x)
{
return std::acos(clamp(x, -1.0f, 1.0f));
}
struct Vec3
{
float x{}, y{}, z{};
Vec3() = default;
Vec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
Vec3 operator+(const Vec3& b) const { return {x + b.x, y + b.y, z + b.z}; }
Vec3 operator-(const Vec3& b) const { return {x - b.x, y - b.y, z - b.z}; }
Vec3 operator*(float s) const { return {x * s, y * s, z * s}; }
Vec3 operator/(float s) const { return {x / s, y / s, z / s}; }
};
inline float dot(const Vec3& a, const Vec3& b)
{
return a.x * b.x + a.y * b.y + a.z * b.z;
}
inline Vec3 cross(const Vec3& a, const Vec3& b)
{
return {
a.y * b.z - a.z * b.y,
a.z * b.x - a.x * b.z,
a.x * b.y - a.y * b.x
};
}
inline float length_sq(const Vec3& v)
{
return dot(v, v);
}
inline float length(const Vec3& v)
{
return std::sqrt(length_sq(v));
}
inline Vec3 normalize(const Vec3& v)
{
float m = std::max({std::fabs(v.x), std::fabs(v.y), std::fabs(v.z)});
if (m < kEps) return {0, 0, 0};
Vec3 r = v / m;
float len = length(r);
return (len < kEps) ? Vec3{} : r / len;
}
}
That being said, I think I’d propose a proper transform instead of passing matrices around like strippers at a bachelors party.
#pragma once
#include <cmath>
#include <cstdint>
#include <algorithm>
namespace km
{
constexpr float kEpsilon = 1e-8f;
inline float clamp(float x, float lo, float hi)
{
return std::max(lo, std::min(x, hi));
}
inline float rsqrt_safe(float x)
{
return (x > kEpsilon) ? (1.0f / std::sqrt(x)) : 0.0f;
}
struct alignas(16) Vec4
{
float x, y, z, w;
constexpr Vec4() : x(0), y(0), z(0), w(0) {}
constexpr Vec4(float x_, float y_, float z_, float w_ = 0.0f)
: x(x_), y(y_), z(z_), w(w_) {}
};
struct alignas(16) Mat4
{
// Row-major, row-vector convention:
// p' = p * M
float m[4][4];
constexpr Mat4()
: m{
{1, 0, 0, 0},
{0, 1, 0, 0},
{0, 0, 1, 0},
{0, 0, 0, 1}
}
{
}
static constexpr Mat4 identity()
{
return Mat4{};
}
};
inline Vec4 make_point(float x, float y, float z)
{
return Vec4(x, y, z, 1.0f);
}
inline Vec4 make_vector(float x, float y, float z)
{
return Vec4(x, y, z, 0.0f);
}
inline Vec4 add3(const Vec4& a, const Vec4& b)
{
return Vec4(a.x + b.x, a.y + b.y, a.z + b.z, 0.0f);
}
inline Vec4 sub3(const Vec4& a, const Vec4& b)
{
return Vec4(a.x - b.x, a.y - b.y, a.z - b.z, 0.0f);
}
inline Vec4 mul3(const Vec4& a, const Vec4& b)
{
return Vec4(a.x * b.x, a.y * b.y, a.z * b.z, 0.0f);
}
inline Vec4 scale3(const Vec4& v, float s)
{
return Vec4(v.x * s, v.y * s, v.z * s, 0.0f);
}
inline float dot3(const Vec4& a, const Vec4& b)
{
return a.x * b.x + a.y * b.y + a.z * b.z;
}
inline float dot4(const Vec4& a, const Vec4& b)
{
return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}
inline Vec4 cross3(const Vec4& a, const Vec4& b)
{
return Vec4(
a.y * b.z - a.z * b.y,
a.z * b.x - a.x * b.z,
a.x * b.y - a.y * b.x,
0.0f
);
}
inline float length_sq3(const Vec4& v)
{
return dot3(v, v);
}
inline float length3(const Vec4& v)
{
return std::sqrt(length_sq3(v));
}
inline Vec4 normalize3(const Vec4& v)
{
float inv_len = rsqrt_safe(length_sq3(v));
if (inv_len == 0.0f)
{
return Vec4(0, 0, 0, 0);
}
return Vec4(v.x * inv_len, v.y * inv_len, v.z * inv_len, 0.0f);
}
struct alignas(16) Quat
{
// xyzw, unit quaternion expected
Vec4 v;
constexpr Quat() : v(0, 0, 0, 1) {}
constexpr Quat(float x, float y, float z, float w) : v(x, y, z, w) {}
static constexpr Quat identity()
{
return Quat(0, 0, 0, 1);
}
};
inline Quat quat_conjugate(const Quat& q)
{
return Quat(-q.v.x, -q.v.y, -q.v.z, q.v.w);
}
inline Quat quat_normalize(const Quat& q)
{
float inv_len = rsqrt_safe(dot4(q.v, q.v));
if (inv_len == 0.0f)
{
return Quat::identity();
}
return Quat(
q.v.x * inv_len,
q.v.y * inv_len,
q.v.z * inv_len,
q.v.w * inv_len
);
}
inline Quat quat_inverse(const Quat& q)
{
float lsq = dot4(q.v, q.v);
if (lsq <= kEpsilon)
{
return Quat::identity();
}
float inv = 1.0f / lsq;
return Quat(
-q.v.x * inv,
-q.v.y * inv,
-q.v.z * inv,
q.v.w * inv
);
}
inline Quat quat_mul(const Quat& a, const Quat& b)
{
return Quat(
a.v.w * b.v.x + a.v.x * b.v.w + a.v.y * b.v.z - a.v.z * b.v.y,
a.v.w * b.v.y - a.v.x * b.v.z + a.v.y * b.v.w + a.v.z * b.v.x,
a.v.w * b.v.z + a.v.x * b.v.y - a.v.y * b.v.x + a.v.z * b.v.w,
a.v.w * b.v.w - a.v.x * b.v.x - a.v.y * b.v.y - a.v.z * b.v.z
);
}
inline Quat quat_from_axis_angle(const Vec4& axis, float radians)
{
Vec4 n = normalize3(axis);
float half = 0.5f * radians;
float s = std::sin(half);
float c = std::cos(half);
return quat_normalize(Quat(
n.x * s,
n.y * s,
n.z * s,
c
));
}
inline Quat quat_nlerp(const Quat& a, const Quat& b, float t)
{
float d = dot4(a.v, b.v);
float sign = (d < 0.0f) ? -1.0f : 1.0f;
Quat out(
a.v.x + (b.v.x * sign - a.v.x) * t,
a.v.y + (b.v.y * sign - a.v.y) * t,
a.v.z + (b.v.z * sign - a.v.z) * t,
a.v.w + (b.v.w * sign - a.v.w) * t
);
return quat_normalize(out);
}
inline Quat quat_slerp(const Quat& a_in, const Quat& b_in, float t)
{
Quat a = quat_normalize(a_in);
Quat b = quat_normalize(b_in);
float d = dot4(a.v, b.v);
if (d < 0.0f)
{
b = Quat(-b.v.x, -b.v.y, -b.v.z, -b.v.w);
d = -d;
}
d = clamp(d, -1.0f, 1.0f);
if (d > 0.9995f)
{
return quat_nlerp(a, b, t);
}
float theta = std::acos(d);
float sin_theta = std::sin(theta);
if (std::fabs(sin_theta) <= kEpsilon)
{
return a;
}
float wa = std::sin((1.0f - t) * theta) / sin_theta;
float wb = std::sin(t * theta) / sin_theta;
return Quat(
a.v.x * wa + b.v.x * wb,
a.v.y * wa + b.v.y * wb,
a.v.z * wa + b.v.z * wb,
a.v.w * wa + b.v.w * wb
);
}
inline Vec4 rotate_vec3(const Quat& q_in, const Vec4& v)
{
Quat q = quat_normalize(q_in);
Vec4 qv(q.v.x, q.v.y, q.v.z, 0.0f);
Vec4 t = scale3(cross3(qv, v), 2.0f);
Vec4 r = add3(v, add3(scale3(t, q.v.w), cross3(qv, t)));
return Vec4(r.x, r.y, r.z, 0.0f);
}
struct alignas(16) Transform
{
// xyz used, w ignored
Vec4 t;
// unit quaternion
Quat r;
// xyz used, w ignored
Vec4 s;
constexpr Transform()
: t(0, 0, 0, 0)
, r(Quat::identity())
, s(1, 1, 1, 0)
{
}
constexpr Transform(const Vec4& translation, const Quat& rotation, const Vec4& scale)
: t(translation.x, translation.y, translation.z, 0.0f)
, r(rotation)
, s(scale.x, scale.y, scale.z, 0.0f)
{
}
static constexpr Transform identity()
{
return Transform{};
}
static Transform from_trs(const Vec4& translation, const Quat& rotation, const Vec4& scale)
{
return Transform(translation, quat_normalize(rotation), scale);
}
};
inline Vec4 transform_vector(const Transform& xform, const Vec4& v)
{
Vec4 scaled = mul3(v, xform.s);
return rotate_vec3(xform.r, scaled);
}
inline Vec4 transform_point(const Transform& xform, const Vec4& p)
{
Vec4 rotated_scaled = transform_vector(xform, p);
Vec4 out = add3(rotated_scaled, xform.t);
return Vec4(out.x, out.y, out.z, 1.0f);
}
inline Transform combine(const Transform& parent, const Transform& child)
{
// Row-vector semantic equivalent:
// world = child then parent, expressed in TRS form.
//
// Translation path:
// child.t is first scaled by parent.s, then rotated by parent.r, then offset by parent.t.
//
// This is exact for uniform scale and standard rigid hierarchies.
// For arbitrary non-uniform scale mixed with rotation, TRS composition is not equivalent
// to full matrix composition with shear preservation. This version intentionally keeps
// Transform as TRS-only, which is the correct tradeoff for most engine/runtime code.
Vec4 scaled_child_t = mul3(child.t, parent.s);
Vec4 rotated_child_t = rotate_vec3(parent.r, scaled_child_t);
Vec4 out_t = add3(parent.t, rotated_child_t);
Quat out_r = quat_normalize(quat_mul(parent.r, child.r));
Vec4 out_s = mul3(parent.s, child.s);
return Transform::from_trs(out_t, out_r, out_s);
}
inline Transform inverse(const Transform& xform)
{
// TRS inverse:
// inv(T * R * S) = inv(S) * inv(R) * inv(T)
//
// Stored back in semantic TRS form.
Vec4 inv_s(
std::fabs(xform.s.x) > kEpsilon ? 1.0f / xform.s.x : 0.0f,
std::fabs(xform.s.y) > kEpsilon ? 1.0f / xform.s.y : 0.0f,
std::fabs(xform.s.z) > kEpsilon ? 1.0f / xform.s.z : 0.0f,
0.0f
);
Quat inv_r = quat_inverse(xform.r);
Vec4 neg_t(-xform.t.x, -xform.t.y, -xform.t.z, 0.0f);
Vec4 rot_t = rotate_vec3(inv_r, neg_t);
Vec4 inv_t = mul3(rot_t, inv_s);
return Transform::from_trs(inv_t, inv_r, inv_s);
}
inline Mat4 to_mat4(const Transform& xform)
{
Quat q = quat_normalize(xform.r);
float xx = q.v.x * q.v.x;
float yy = q.v.y * q.v.y;
float zz = q.v.z * q.v.z;
float xy = q.v.x * q.v.y;
float xz = q.v.x * q.v.z;
float yz = q.v.y * q.v.z;
float wx = q.v.w * q.v.x;
float wy = q.v.w * q.v.y;
float wz = q.v.w * q.v.z;
float sx = xform.s.x;
float sy = xform.s.y;
float sz = xform.s.z;
Mat4 out;
// Row-major, row-vector form.
out.m[0][0] = (1.0f - 2.0f * (yy + zz)) * sx;
out.m[0][1] = (2.0f * (xy + wz)) * sx;
out.m[0][2] = (2.0f * (xz - wy)) * sx;
out.m[0][3] = 0.0f;
out.m[1][0] = (2.0f * (xy - wz)) * sy;
out.m[1][1] = (1.0f - 2.0f * (xx + zz)) * sy;
out.m[1][2] = (2.0f * (yz + wx)) * sy;
out.m[1][3] = 0.0f;
out.m[2][0] = (2.0f * (xz + wy)) * sz;
out.m[2][1] = (2.0f * (yz - wx)) * sz;
out.m[2][2] = (1.0f - 2.0f * (xx + yy)) * sz;
out.m[2][3] = 0.0f;
out.m[3][0] = xform.t.x;
out.m[3][1] = xform.t.y;
out.m[3][2] = xform.t.z;
out.m[3][3] = 1.0f;
return out;
}
inline Vec4 mul_point(const Vec4& p, const Mat4& m)
{
return Vec4(
p.x * m.m[0][0] + p.y * m.m[1][0] + p.z * m.m[2][0] + p.w * m.m[3][0],
p.x * m.m[0][1] + p.y * m.m[1][1] + p.z * m.m[2][1] + p.w * m.m[3][1],
p.x * m.m[0][2] + p.y * m.m[1][2] + p.z * m.m[2][2] + p.w * m.m[3][2],
p.x * m.m[0][3] + p.y * m.m[1][3] + p.z * m.m[2][3] + p.w * m.m[3][3]
);
}
}
I think combine() is the correct runtime design for a TRS transform system? I know every TRS system it is not a full affine algebraic, but if you compose rotated non-uniform scales across a hierarchy, true matrix composition have been shown to produce shear, if that is where you were going.
If you want exact affine preservation, use Mat4 in that subsystem and convert to Transform only when you are sure the data is really TRS.
If you want SSE/NEON later, this layout can be expanded. If you want full SIMD immediately, I would change Vec4 internals but as a CPU can do same math operations on multiple values at once, people prefer Vec4 as a storage primitive. So, the door is left open for SSE and NEON accelerration later.
Just because you got SIMD, does not give the programmer the licence to wreak the API.
I think you tried to build a robust baseline graphics-math toolkit, but perhaps missed the step where a math library stops being “a bag of formulas” and becomes “a system that protects engine code from predictable failure modes”.
I am unsure whether this may lead to an API that is competent but still placeholder in design.
Again - I do not wish to upset anyone if this is in the wrong place, but I certainly did not wish to pull a Github request if I have no idea which direction this code is going in. There are so many bits to it, and some of those bits are good, but the overall structure is kinda outdated in terms of C++ math for game engines…sort of 2010 era?
Anyway, back to work…
