| The 3D Geometry Pipeline | Sean Baxter |
| August 30, 1998 |
Introduction |
Like most things in life, 3D computer
graphics is an illusion. Programming real-time 3D imagery is no easy task, so
the programmer needs all the help he can get. Fortunately, maths have been developed
from linear algebra techniques to fool your cones and rods into believing the world you
create. The fundamental object in 3D graphics is the triangle. Triangles, of course, are defined by three vertices, which in our 3D world space take the form (x, y, z). The obvious difficulty with this system is that today's unsophisticated computer monitors have no appreciation of depth - you only see what's on their surface. The 3D geometry pipeline will produce 2D screen coordinates given 3D world coordinates, and preserve the perspective projection (objects seem to move towards a vanishing point as they travel away from the observer) that humans are accustomed to. Not only does the geometry pipeline resolve screen coordinates from world coordinates, it supports z and w buffering by producing depth values clamped between 0 and 1. The pipeline's functionality doesn't stop there however; the abstract concept of a camera is easily expressed mathematically. This gives the programmer the freedom to place objects in world space and explore his environment through the lens of a moving camera. Impressed with the power of the geometry pipeline?
Hold that thought; there's more! The pipeline relates relative coordinate
systems. So far, we know that coordinates in world space can be resolved into
coordinates in camera space, and coordinates in camera space subsequently converted to the
final screen space coordinates. Unfortunately, world space is a rather impersonal
place. Do you know your address? Relative to the Earth you don't (unless you
have a global positioning satellite receiver lying around), but you do know your position
relative to your city. For most daily tasks, dealing with street addresses is more
convenient than working with latitude and longitude. The same is true in 3D graphics
- for most operations, you work with objects in their native object space. The
transformation pipeline provides easy to use mechanisms to resolve world coordinates from
object coordinates.
|
Mathematically speaking... |
The 3D geometry pipeline levies
the ungodly power of the matrix to solve the problems discussed above. The first
section of the pipeline we will cover is the object space to world space transform.
There are some fundamental operations we associate with vectors in 3D space: rotation
about all three axes, scaling, and translation. Each of these operations transforms
a vector by vector-matrix multiplication. All vectors are assumed to be row vectors
(the advantage behind this will be revealed shortly). Suppose you have modeled a house centered at (0, 0, 0) with the front door in the direction of the positive z axis. The apex of the roof is directly above center at (0, 1, 0). You have exported this model to file, and have imported it for your Direct3D project. You wish to incorporate this house into your scene. However in your scene, it's a big house. A mansion practically. We need to scale our model by a factor of three. Additionally, the front door of the house should be facing the right side of the screen, so we will need to rotate every vertex of the house 90 degrees about the y axis. In addition, this house is falling off a cliff! (it's expensive Malibu property) The rear of the house is 30 degrees lower than the front of the house - this requires a rotation about the z axis of -30 degrees. And by the way, the house isn't the center of your scene. The observer is standing far from harm's way. Relative to the observer, the house is located at position (-10, 0, 40) (far back and to the left). This will require us to translate every vertex of the house along the vector (-10, 0, 40). The perceptive reader would've counted four transformations to move the house from object space to world space. So read it again and verify this yourself. Choose an arbitrary vertex from the house to transform. Let's call it x = (x, y, z). We also define the transformation matrices: A is the scale matrix, B is the rotation about the Y axis, C is the rotation about the Z axis, and D is our translation matrix. We can transform x simply by multiplying it by each of the translation matrices: x' = x·A·B·C·D. This series of multiplication will be performed once per vertex. The perceptive reader would have noticed a brilliant optimization: instead of having to multiply out A·B·C·D for every vertex, we can pre-mulitply this term and save it as matrix E:
This is an optimization that holds true throughout the geometry pipeline. In fact, all transformations can be condensed into one concatenated matrix. Multiplying a vertex in object space by this concatenated matrix will produce a vertex in screen space. Imagine if we used column vectors rather than row vectors for a moment:
Due to the order of matrix-column vector multiplication,
the concatenated matrix E would need to be computed by multiplying the transformation
matrices in reverse order. Clearly, the use of row vectors is most intuitive
here. |
The Transformations |
The first transformation matrix
to be computed is our uniform scale of the house. This is the simplest of the
transformations: For our uniform scale of magnitude 3, we set sx, sy,
and sz to 3 and multiply the vector in question by this matrix. This has the
same effect as multiplying each component of the vector by 3. While the latter
method may seem more efficient and simpler, remember that we need a matrix representation
to construct our one concatenated super matrix. The next two transformations are rotations about the primary
axes. For a discussion on how they are derived, see (Foley, Van Damm, et. al).
Finally, we wish to construct a translation matrix. But we can't. The scale and rotation matrices construct the elements of the result vector as linear combinations of the elements in the operand vector. You cannot translate a vector using linear combinations. Multiplying a vector by a translation matrix would in effect produce a new vector x' = (x.x + Tx, x.y + Ty, x.z + Tz). If we can express the translation factors Tx, Ty, and Tz as linear combinations, then we can indeed construct a translation matrix. The popular solution is to move into homogenous coordinates. In this new homogenous system, each vector has four components: the familiar x, y, and z, plus the homogenous w. This homogenous coordinate is initially set to one. We can now easily translate the x, y, and z elements using linear combinations:
Construction of a 4x4 translation matrix is now trivial:
We also define the operation homogenize here. A
homogenous vector can be resolved into a three component vector by dividing x.x, x.y, and x.z by x.w.
|
The Perspective Transform |
The perspective transform is
geometry pipeline magic. When you multiply a vector in world space by this matrix,
the vector is projected from 3D onto your 2D screen. The American Heritage even
defines "perspective" as "The technique of representing three-dimensional
objects and depth relationships on a two-dimensional surface." Thanks to the
miracle of numbers, this seemingly complicated transformation is really very simple.
Assuming that our vanishing point is the center of the screen (it is in humans, assuming
you weren't slapped with a case of congenital cross-eyes), the perspective is computed as
follows:
This assumes that the depth of the particle is at least one. While this is a very simple computation, we give it a collective frown for two reasons: it can't be concatenated because it isn't a 4x4 matrix, and it doesn't clamp the depth to the range [0...1] required by the zbuffer. Because the zbuffer has finite precision, this [0...1] range of zbuffer values must correspond with a range of depth values in world space. We therefore define two clipping planes perpendicular to the Z axis: Zfar and Znear. The closer the two planes, the greater the zbuffer precision over that range is. For this perspective transform, Znear is at least one. There is only a practical limit on the distance of the far plane. At this stage in the geometry pipeline, we try to clamp our depth to [0...Zf]. At the end of the pipeline, this depth will be divided by the homogenous w to ensure the correct [0...1] depth. Similarly, the x and y components will also be divided by the homogenous w to produce x'.x and x'.y (see the equation above). Summing up the chores of the perspective transform: copy the depth to homogenous w, then clamp depths between the two clip planes [Zn...Zf] to [0...Zf]. We can attack this problem in increments.
The result is in the range [0...Zf-Zn].
This is in the range [0...Zf(Zf - Zn)].
This result is in our desired range [0...Zf]. Now because a matrix is a set of rules describing a set of linear combinations, we need to express Zf((x.z - Zn))/(Zf - Zn) as linear combinations of x.z and x.w (x.w will remain one after any combination of the geometric transformations from the above section. The camera transformation in the next section will also leave x.w unaffected).
We can finally construct our projection matrix!
The projection matrix copies the depth into the homogenous
w and then clamps the depth into the range [0...Zf]. Using a far Zn
with a close Zf increases zbuffer precision, but culls out more geometry.
The perspective is applied at the end of the pipeline during homogenization. This
perspective transform defines a viewing frustum. The frustum is the volume that
contains all the world space vectors that can be seen in screen space. There also
exists a projection matrix that uses an orthographic transform: its view volume is a box,
where all edges are parallel or perpendicular.
|
And onto the Screen! |
You've completed your world
transform. You've completed your projection transform. Now the only thing that
separates you from victory is the viewport transform. All your friends who rely on
Direct3D's D3DVIEWPORT2 structure to create their viewport matrix... they're
wussies! The first thing you need to do in order to create a viewport matrix is
understand the viewport. The entries in D3DVIEWPORT2 provide all the information we
need to construct our own transform matrix. The following graphic relates the
necessary entries with their locations on the screen. For best results, the viewport should maintain the screen's aspect
ratio. That is, dvClipWidth / dvClipHeight should equal dwWidth / dwHeight.
For fullscreen modes, this aspect ratio is usually 4:3 (512x384, 640x480, 800x600,
etc). After applying our projection transformation, the homogenized resulting
vectors that are visible will have a depth between 0 and 1. If dvClipWidth is 2,
dvClipHeight is 2, and the viewport is centered at the origin, the visible vectors that
the projection matrix produces (when homogenized) fall within the cube with corners at
(-1, 1, 0) and (1, -1, 1). The first thing we want to do is align this clipping
volume with our viewport. To do that we create a translation matrix (-dvClipX,
-dvClipY, 0). We've now covered enough of the geometry pipeline to deserve a recap. 3D models are located in their own coordinate system: object space. Each model should be oriented in object space as to make their transformation into world space more intuitive. For example, an airplane would likely be centered in its object space with the nose pointing down the positive Z axis. The Y axis shoots out the top of the airplane, and the wings are aligned on the X axis. This simple configuration lets the programmer easily control all six degrees of the plane's freedom: roll requires a RotateZ, pitch requires a RotateX, and yaw requires a RotateY transformation. The airplane can then be positioned anywhere in world space with a single translation. The world matrix is the set of instructions for positioning an object in world space. Each mesh may have its own world matrix, which it saves from frame to frame. Suppose we want to take our plane on a test flight. The first task of course is to create our world matrix. We decide to construct this matrix as follows:
We can create our concatenated transformation matrix, T, as well:
We can transform any vector in the plane's object space directly into screen space by multiplying it by our concatenated transformation matrix, T, and them homogenizing:
Care must be taken before the homogenization. If the vector being transformed is at the world space origin, homogenous w will be zero - homogenizing will cause a divide by zero and your program will crash. Therefore, it is often a good idea to only homogenize vertices between the two clip planes, that is, where Zn <= xt.w <= Zf. Notice that we take the inverse of xt.w for xs.w. Direct3D's D3DTLVERTEX structure requires the programmer to provide RHW when sending the structure to the rasterizer: Reciprocal of Homogenous W. RHW is used for perspective correct texture mapping. Suppose all of the plane's vertices have been transformed and rendered. We flip the backbuffer and prepare to render the next frame. During this short period of time, the plane has flown along the vector (.5, 0, 1.5). Instead of having to create a whole new world transform by creating a scale, rotate about Z, and two translation matrices, we can just create a Translate(.5, 0, 1.5) matrix and post multiply it by the existing world matrix:
We can treat W' as our world matrix for this second frame of animation.
|
The Camera |
I've been using the term
"world space" pretty loosely during this article. Perhaps you endorse the
idea of moving the world around an observer who never changes position. Then again
you might not. The geometry pipeline allows the programmer to define a camera
transformation. After a vector is transformed into world space, you can transform
the result by the camera transformation (Direct3D calls this the 'view' transformation,
but that is often confused with the viewport transformation. To suppress ambiguity,
I will use the term 'camera transformation'). Once the vector is transformed into
camera space, the geometry pipeline can be resumed at the projection matrix. Passing
world space vectors to the projection matrix is in effect the same as using the default
camera - a camera centered at the origin looking down the positive X axis, with the top of
the camera the positive Y axis. The camera transformation is a perfect example of a change of basis matrix. One coordinate system is mapped to another coordinate system using linear combinations. Every point in the world space will resolve to exactly one point in the camera space. While this behavior is shared by all of the geometry pipeline transformations, the camera transform is constructed much differently than its siblings. The camera is defined by a set of orthogonal vectors and a
position. These three orthogonal vectors provide the unit vectors for the camera
coordinate space. They are defined relative to world space. The first thing we
need to do is specify our translation. If our camera is located at (0, 10, 12) in
world space, we need to translate every vector (0, -10, -12). This transformation
will in effect re-center the world space at the camera's position. If our camera
position vector is p, our translation matrix is:
With further inspection of these three elements, you will
notice that each is the negative dot product of the position vector and one of the
orthogonal axes vectors. We use this dot product representation in our camera
matrix: While this camera transformation is easily defined, it is often hard to build. Finding three orthogonal vectors from a position and a direction of view requires a bit of tact. The parameters you would likely pass to a camera matrix construction function would likely be the location of the camera, and its direction. Let p continue to be the position vector. We name the direction vector d. Our first task is to normalize d. This gives us zc, as the direction of sight is along the Z axis. Now we try to find the yc vector. Because we are expressing the yc vector in world space, we can incorporate the world space Y vector (0, 1, 0) in our computation. The dot product operation finds the magnitude of the shadow when one vector is cast onto another. For example, when the vectors are at right angles, no shadow is cast, so their dot product is zero. Also, if the angle between two vectors is greater than 90 degrees, their dot product is negative. When dealing with normalized vectors (vectors of length one), the largest dot product is one, and the smallest is negative one. These occur when the two operands are the same, and when one operand points opposite from the other. Finding the dot product of the world Y axis unit vector (called world up) and our camera space Z axis vector (which is coincidentally the normalized direction vector - we denote it "to") is given the name "shadow". A shadow with small magnitude (magnitude being the absolute value of shadow) means that our camera space Y axis vector is similar to the world space Y axis vector, that is (0, 1, 0). The method for determining the camera space Y axis vector is:
A two dimensional case is illustrated below. We now have two of our coordinate axes - only xc remains undetermined. Remembering that these three axes are orthogonal, we decide that the cross product is the perfect tool to use. The cross product of two orthogonal vectors will produce the third orthogonal vector. Therefore we can find our final axis vector as
In addition, we can specify a roll for our camera to allow complete position freedom. Transforming the camera matrix by a rotation about axis Z will let the programmer position the top of the camera at any angle. The geometry pipeline camera transformation is a great
convenience for the programmer. No longer does your program need to move the world
around a fixed observer; it can now build a relatively fixed scene and leave it unchanged
from frame to frame, yet change the direction of rendering as simply as creating a new
camera transform. We've now discussed the most important concepts of the 3D
geometry pipeline, so you should wield the required knowledge to perform your own
transformations. You can cast away the shackles of the Direct3D SetTransform
methods, and use D3DTLVERTices. For your benefit, I've included my own matrix and
matrix transformation code below. Have fun. |
| The Matrix Class |
| const float pi = 3.14159265f; class Vector; inline Matrix& operator=(const Matrix&); inline Matrix& Zero(); inline float operator()(const int, const int) const; inline Matrix& RotateX(const float); Matrix operator*(const Matrix&) const; Matrix Inverse(); friend inline Vector operator*(const Vector&, const
Matrix&); Matrix::Matrix() { Matrix::Matrix(const D3DMATRIX& m) { Matrix::Matrix(const Matrix& m) { Matrix& Matrix::operator=(const D3DMATRIX& m) { Matrix& Matrix::operator=(const Matrix& m) { Matrix& Matrix::Zero() { Matrix& Matrix::Identity() { float Matrix::operator()(const int row, const int column) const { float& Matrix::operator()(const int row, const int column) { Matrix& Matrix::RotateX(const float ang) { Matrix& Matrix::RotateY(const float ang) { Matrix& Matrix::RotateZ(const float ang) { Matrix& Matrix::Translate(float x, float y, float z) { Matrix& Matrix::operator*=(const Matrix& m) { Matrix Matrix::operator*(const Matrix& b) const { Matrix c; c._21 = a._21 * b._11 + a._22 * b._21 + a._23 * b._31 + a._24 *
b._41; c._31 = a._31 * b._11 + a._32 * b._21 + a._33 * b._31 + a._34 *
b._41; c._41 = a._41 * b._11 + a._42 * b._21 + a._43 * b._31 + a._44 *
b._41; Matrix Matrix::Inverse() {
memcpy(temp, &i._m[s][0], 16); i._m[c][0] /= div; for(int k(0); k < 4; k++) {
i._m[k][0] -= fac * i._m[c][0]; |
| The Vector Class |
| class Vector { float _x, _y, _z, _w; public: Vector() {} inline Vector(const float x, const float y, const float z, const float w = 1.0f) : _x(x), _y(y), _z(z), _w(w) {} float& x() { return _x; } float& y() { return _y; } float& z() { return _z; } float& w() { return _w; } float x() const { return _x; } float y() const { return _y; } float z() const { return _z; } float w() const { return _w; } inline Vector& operator*=(const Matrix&); inline Vector& homogenize(); friend inline Vector operator*(const Vector&, const Matrix&); }; Vector& Vector::operator*=(const Matrix& m) { return *this = *this * m; } Vector& Vector::homogenize() { Vector operator*(const Vector& v, const Matrix& m) { |
| The Transformation Classes |
| class Projection : public Matrix { public: inline Projection() : Matrix() {} inline Projection(float, float); }; Projection::Projection(float nearp, float farp) : Matrix() { class Viewport : public Matrix { Viewport::Viewport(int w, int h, float x, float y, float cw, float ch) : Matrix() {
class Camera : public Matrix { Camera::Camera(D3DVECTOR from, D3DVECTOR to, float roll) { RotateZ(-roll); class Transformation { Matrix _world; float _nearplane; inline int transform(const Vector&, Vector*); void Transformation::world(const Matrix& m) { void Transformation::camera(D3DVECTOR a, D3DVECTOR b, float c) { bool Transformation::projection(float n, float f) { void Transformation::viewport(int a, int b, float c, float d, float e, float f) {
int Transformation::transform(const Vector& source, Vector* target) { int Transformation::transform(Vector* v) { |