Avoiding Interspacial Transformations in 3D
By Neil Edelman, 06/14/00

A point is usually defined in 3D by three Cartesian coordinates, which represent distances along each axis. By convention, the order of the coordinates is x (increasing to the right), then y (increasing upwards), then z (increasing away into the screen); occasionally, the directions to which the axes are mapped are different. In the diagram, the x, y, and z distances are marked in red. They denote a vector, in green, which points to the blue point.

A 3D engine incurs significant overhead each frame converting points in a 3D world to points on a 2D screen. This is the process of projection. Ordinarily, it is accomplished by brute force: the entire universe (minus any clipped points) is moved from its absolute location (referred to as world space) to a coordinate system with the viewer at its center (referred to as camera space, view space, eye space, or something similar), and then the horizontal and vertical components of each camera-space coordinate are divided by its depth to produce a 2D coordinate, giving the illusion of diminishing size with distance.

Personally, I've been playing with various attempts at skipping the entire process of inverse translation and rotation using vanishing points, look-up tables, ray-tracing, and some more peculiar devices. I've found one method that yields good results (the points end up in the same spot as with the normal scheme) with a simple implementation. Instead of a distance along the x axis, the vector's (green) x component (red) can instead be thought of as the distance from the point (blue) to the plane containing the y and z axes. Likewise, the y component can be thought of as the distance from the point to the plane containing the x and z axes, and the z component as the distance from the point to the plane containing the x and y axes.

The most intuitive way to envision the system is as three planes, each containing a different pair of axes. These planes can be calculated for any arbitrary set of axes - in this case, the camera's. The distance from a point to one of the planes is the point's component, with respect to those axes, along the axis not contained in the plane. I shouldn't expect that babble to make any sense whatsoever, so hopefully the diagrams give a better idea of my meaning.

Imagine a flat horizontal prairie (like Saskatchewan) with an infinite measuring stick sticking straight up into the sky. The measuring stick is the y axis. An albatross' height is defined as how far up the measuring stick it reaches. Using the ground surface, which is the xz plane, this height can equally be defined as the distance off the ground. That probably sounds like an obvious and stupid thing to say, but imagine now that we want to find the albatross' position along a similar measuring stick pointing in an arbitrary direction. Ordinarily, the bird would have to be transformed inversely to the position and orientation of the new measuring stick, so that the entire universe (in this case, composed of one bird) moves. Essentially, the new measuring stick is fetched and brought to a point where it is aligned with the original, dragging the whole scene with it; the bird's height is then measured off the original measuring stick to give its height as seen from the position of the arbitrary one.

The idea of using axis-pair distances is that the same measurement can be obtained without disturbing the bird. One might think it an easy matter to simply read the distance off the arbitrary measuring stick directly (which can be accomplished using dot products with the axes' normals), but there is a better way. This is how: construct another "ground" which passes through the zero mark on the new measuring stick (just as the solid, original ground passes through zero on the original measuring stick) and which is perpendicular to this measuring stick (i.e. the stick points straight up from it), likewise similar to the original. Then measure the height of the bird from this imaginary ground (well, the whole thing is imaginary, but this one is extra imaginary).

That's nice, but how can it be put into practice? This is where some familiarity with vectors and planes is required. A vector can be defined in three dimensions as a set of three numbers, x, y, and z:

```
typedef struct {
	float x, y, z;
} vec3_ft;
```

A plane can be defined by the equation Ax + By + Cz + D = 0. {A, B, C} forms a vector representing the plane's normal, which points in the direction the front of the plane faces and is one unit long. D is the negative of the plane's signed distance from the origin, measured along the normal. One structure that could represent a plane is as such:

```
typedef struct {
	vec3_ft normal;
	float d;
} plane_t;
```

A set of axes has three planes, one for each pair of axes.

```
typedef struct {
	plane_t yz;
	plane_t xz;
	plane_t xy;
} axes_t;
```

The D in the plane equation can be solved for if we know the normal to the plane and any point that lies on the plane. This is simply a matter of solving the equation for D and substituting in the known values. This is an utterly essential equation for this mathematical mechanization of constructing arbitrary planes.

```
#define SETPLANEDFORVEC(Plane, Point) \
	((Plane).d = -((Plane).normal.x * (Point).x \
	             + (Plane).normal.y * (Point).y \
	             + (Plane).normal.z * (Point).z))
```

To define a plane, first the normal must be found. The normal of each plane is a unit vector pointing along the axis not contained in the plane; for example, the normal of the xy plane points along the z axis. Calculating the normal to each plane is therefore a matter of calculating a unit vector pointing along each axis. How this is done depends on how orientation is defined in the application - Euler angles, quaternions, etcetera. Below is my pseudo-code for one way of doing this, which defines the z axis according to pitch and yaw, with the x and y axes rotated about it by the angle of roll. This simplified code assumes that the z axis points away into the screen. It also decides which rotation directions are positive, and I'm not certain that they're all consistent; a few negations here and there would change this. In any case, the calculation of the axes would have to change depending on the implementation.

After the normals are calculated, the d portion of each plane definition is calculated. The known point on each plane is the position of the camera: the camera's axes converge at its position, so any plane through any of the axes contains the camera's position. Note that the values of pitch, yaw, and roll should be compatible with the subsequent references to sin and cos, however these are implemented (hopefully not in floating-point measurements of radians with raw library trigonometric functions as below).

```
axes_t Axes;
vec3_ft CameraPos; /* must be known */
float pitch, yaw, roll; /* must be known */
float sx, cx, sy, cy, sz, cz;
float cy_sz, sy_sz, cy_cz, sy_cz;

/* precalculate repeated values for speed and readability */
sx = sin(pitch);
cx = cos(pitch);
sy = sin(yaw);
cy = cos(yaw);
sz = sin(roll);
cz = cos(roll);
cy_sz = cy * sz;
sy_sz = sy * sz;
cy_cz = cy * cz;
sy_cz = sy * cz;

/* the normal of the yz plane points along the x axis */
Axes.yz.normal.x = cy_cz + sx * sy_sz;
Axes.yz.normal.y = cx * sz;
Axes.yz.normal.z = sy_cz - sx * cy_sz;
/* plug in a known point, CameraPos, and solve for d */
SETPLANEDFORVEC(Axes.yz, CameraPos);

/* the normal of the xz plane points along the y axis */
Axes.xz.normal.x = sx * sy_cz - cy_sz;
Axes.xz.normal.y = cx * cz;
Axes.xz.normal.z = -sy_sz - sx * cy_cz;
/* plug in a known point, CameraPos, and solve for d */
SETPLANEDFORVEC(Axes.xz, CameraPos);

/* the normal of the xy plane points along the z axis */
Axes.xy.normal.x = -(cx * sy);
Axes.xy.normal.y = sx;
Axes.xy.normal.z = cx * cy;
/* plug in a known point, CameraPos, and solve for d */
SETPLANEDFORVEC(Axes.xy, CameraPos);
```

If you understand what is going on, you'll see that the above pseudo-code must be called every time the camera moves, which is probably once per frame in a game where the viewer moves around. If the camera doesn't move, it only needs to be calculated once. If the camera still points in the same direction, but has only changed position, the plane normals will remain the same, but the d variables will change. These axes are not only useful for graphics, but if you save them, you can use them to move the camera around independent of attitude (i.e. so that up is always "up" even if you're inverted), amongst other things.

Finally comes the part where the actual camera-space coordinates are calculated from the world-space coordinates. The expression Ax + By + Cz + D, where A, B, C, and D are the known plane variables, returns the distance of a point from the plane. That is why the plane equation is Ax + By + Cz + D = 0: it is the locus of all points at zero distance from the plane (viz. on the plane). Simply plugging a point into the plane equation for the yz plane gives its distance therefrom, which is the x value of the point relative to the camera's axes (in camera space). Putting the point into the equation for the xz plane gives the y value, and substituting it into the xy plane's equation gives the z value. In general, substituting a point into the plane equation of a plane containing two axes yields the distance along the axis not contained in the plane, relative to those axes. This means that to find, for example, the x value of a point, it now suffices to compute: CameraSpacePoint.x = WorldSpacePoint · YZAxesPlane.normal + YZAxesPlane.d. This replaces the entire mess of translation and multiple rotations usually used to transform the point into camera space. In expanded form in pseudo-code (overloading the dot product operator may be desirable):

```
axes_t Axes; /* must be known */
vec3_ft ws, cs; /* world space and camera space; ws must be known */

cs.x = ws.x * Axes.yz.normal.x + ws.y * Axes.yz.normal.y + ws.z * Axes.yz.normal.z + Axes.yz.d;
cs.y = ws.x * Axes.xz.normal.x + ws.y * Axes.xz.normal.y + ws.z * Axes.xz.normal.z + Axes.xz.d;
cs.z = ws.x * Axes.xy.normal.x + ws.y * Axes.xy.normal.y + ws.z * Axes.xy.normal.z + Axes.xy.d;
```

That's all. Three runs through the plane equation to calculate the camera space coordinates. There are also some really cool things about this that probably don’t make any difference most of the time. One thing is that the x, y, and z values are calculated independently of each other. This might be used to calculate z values and skip calculating the x and y values if z doesn't fall between clipping distances (possibly useful with bounding spheres). Normally, the translations are semi-recursive so you can't get z without finding x and y with it. Another thing is that it's really easy to make totally eldritch messed-up spaces. The axes vectors themselves determine the camera space's actual axes. Moving and stretching them can produce some odd results (I got to see some while I was trying to figure out the equations that would give the correct results). Perhaps squishing the view could be used as a falling-damage effect. Maybe different axes would allow the view to squeeze into a non-square screen. A math wizard could try modifying the plane equations to add curvature (fish-eye view), bumps, ripples (for underwater), or some other neat stuff.

Finally, projection works as normal . . . For now:

```
vec3_ft cs; /* camera space coordinates, must be known */
int halfScreenSizeX, halfScreenSizeY; /* depends on screen settings; must be known */
int viewDist; /* distance behind viewing plane (affects fov); must be known */
int x, y; /* the end result of all this work: x and y screen coordinates */

x = (cs.x * viewDist) / cs.z + halfScreenSizeX;
y = -(cs.y * viewDist) / cs.z + halfScreenSizeY; /* inverted b/c screen's y is so */
```

These methods are all totally experimental for me. If you use them, improve on them, think they're worthless, have any related thoughts, want a better explanation, are looking for source, etcetera, I'd care to hear about it. My address is dreaded.neil@phreaker.net.

It has been pointed out to me that this approach to moving points into different spatial references is not new at all. The idea of using planes is simply a different way of visualizing orthogonal coordinate systems: a type of affine transformation combining a 3x3 orthogonal matrix with a translation vector. Apparently, 4x4 matrices generalize this even further. Understanding the concepts here has helped me visualize those other methods, and hopefully it provides an interesting alternative view of how the same solution is approached. This idea is essentially a reinvention, at a less general level, of the way matrices are used.
