Why is the normal transformation an inverse transpose? (9)


I think it is not exactly correct to say that the normal transformation is an inverse transpose. A normal vector is not a column vector, so the transformation is a left inverse, as Equation 5 shows. The transpose comes from the implementation of the vector class. In a rigorous formulation, this should be a left inverse. I am not saying that references [2,3,6] are incorrect, but I prefer references [4,5] since they distinguish these two kinds of vectors.

However, references [2,3,6] have explanations that are easier to understand. In particular, they show intuitively why transforming a normal by simply applying A doesn't work. I like these explanations more. Here I would like to highlight the difference between a usual vector and a normal vector: one is a column vector and the other is a row vector. By the way, a contravariant vector itself does not change under a coordinate transformation; only its representation changes. It seems I should explain how the difference between these vectors is related to the inner product. I would like to study this more, and hopefully I can explain it better one day.

I find it an interesting theme to ask what is not changed by a coordinate transformation. For example, Euclidean distance doesn't change even if we translate the coordinate system. That is related to the inner product. Another example is the magnetic field in Maxwell's equations. Einstein got a hint for the theory of relativity because a coordinate transformation should not change the divergence of the magnetic field (no magnetic monopoles). I hope I can write something about this in the future.


[1] Yoshihiko Futamura, http://en.wikipedia.org/wiki/Partial_evaluation

[2] Matt Pharr, Greg Humphreys, ``Physically Based Rendering, Second Edition: From Theory To Implementation'', Morgan Kaufmann, 2010

[3] Philip Schneider, David H. Eberly, ``Geometric Tools for Computer Graphics'', Morgan Kaufmann, 2002

[4] Gilbert Strang, ``Introduction to Linear Algebra, 4th Edition'', Wellesley-Cambridge Press, 2009

[5] Koukichi Sugihara, ``Guraphics no suuri (Mathematical theory of graphics)'', Kyouritu shuppan, 1995 (杉原厚吉, グラフィックスの数理, 共立出版, 1995)

[6] Tomas Akenine-Moller, Eric Haines, Naty Hoffman, ``Real-Time Rendering (2nd Edition)'', A K Peters/CRC Press, 2002


Why is the normal transformation an inverse transpose? (8)

Why is it the transpose of the inverse matrix?

In the last section, we found the transformation matrix for normals. I will write it here again.
But this is not the transpose of an inverse matrix. It is just a left inverse matrix. Because a normal vector is a row vector, this is the correct notation. However, in a graphics library such as OpenGL, we usually don't distinguish row vectors from column vectors in computer memory. Moreover, we don't distinguish points from vectors in memory either; they are usually arrays of length three (or four). A usual vector is a column vector, and since we don't distinguish normal vectors from usual vectors, they are all treated as column vectors. But they are actually different, and this is the point where you can see the difference. To handle a normal vector in the same way as a usual vector, we need to transpose it to make it a column vector. The transpose of Equation 5 is:
Now you see why most books say that the normal transformation is the transpose of the inverse matrix.
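
As a small check of this statement, the sketch below assumes that Equation 5 has the row-vector form $n' = n A^{-1}$, so that its transpose is $(n')^{T} = (A^{-1})^{T} n^{T}$. The matrix A, its inverse, and all helper names are illustrative assumptions, not values from the original equations. Both forms give the same three numbers; only the bookkeeping of row versus column differs.

```python
# Assumed example matrix A (scale x by 2 plus a shear) and its hand-computed inverse.
A = [[2.0, 1.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
A_inv = [[0.5, -0.5, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]


def mat_mul(a, b):
    """Product of two 3x3 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def row_times_mat(row, m):
    """Row vector times matrix: the n A^{-1} form."""
    return [sum(row[i] * m[i][j] for i in range(3)) for j in range(3)]


def mat_times_col(m, col):
    """Matrix times column vector: the (A^{-1})^T n^T form."""
    return [sum(m[i][j] * col[j] for j in range(3)) for i in range(3)]


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
assert mat_mul(A, A_inv) == identity  # sanity check of the assumed inverse

n = [1.0, 1.0, 0.0]
row_form = row_times_mat(n, A_inv)             # n A^{-1}
col_form = mat_times_col(transpose(A_inv), n)  # (A^{-1})^T n^T
print(row_form, col_form)
```

A library that stores every vector as a plain array of three numbers can only use the second form, which may be why the inverse transpose is the version that appears in most books.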

I also have a slightly simpler formulation, although the meaning is the same. This formulation starts with the equation of a normal.
When v is transformed by a matrix A, the corresponding transformation for the normal should be
Again, we transpose the result to make the normal vector a column vector.
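
The steps of this simpler formulation can be sketched as follows. This is a reconstruction under assumptions, not the original equations: the normal n is taken to be a row vector, v a tangent (column) vector, and the "normal's equation" is taken to be $n v = 0$.

```latex
\begin{align*}
  n\,v &= 0, \qquad v' = A\,v
      && \text{(normal's equation; } v \text{ transformed by } A\text{)} \\
  n'\,v' &= n' A\, v = 0
      && \text{(the transformed normal must stay perpendicular)} \\
  n' &= n A^{-1}
      && \text{(then } n' A\, v = n A^{-1} A\, v = n\,v = 0\text{)} \\
  (n')^{T} &= (n A^{-1})^{T} = (A^{-1})^{T} n^{T}
      && \text{(transpose to get a column vector)}
\end{align*}
```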


Why is the normal transformation an inverse transpose? (7)

Let O be the origin of the coordinate system \Sigma and O' be the origin of the coordinate system \Sigma'. Then the coordinates of O'O in the coordinate system \Sigma' are represented as:
I would like to comment on why this matters. In some graphics systems, e.g., OpenGL, we can move or distort objects by applying a transformation matrix. This looks as if the object is moved or distorted. That interpretation is possible, but here we did not move the objects; we changed the coordinate system. This is a matter of how we interpret the result. I prefer this interpretation since we can think about changing the coordinate system without any objects. You can still think of applying a transformation matrix as applying an operator, but an object is just one thing to operate on. I would rather think about the operator itself, that is, the transformation matrix itself.

In Figure 3, there is a point P. This point doesn't actually move in space, but its coordinates are changed. Let me use a city map as an example. We can make any point or landmark the origin of a city map. For instance, we can take Zoologischer Garten as the origin of a map of Berlin (in Germany). We can also have a map whose origin is Alexanderplatz station. The coordinates of the other landmarks are not the same in these two maps, but Zoologischer Garten never moved. It is just in a different coordinate system.

We can define a coordinate system arbitrarily, so which coordinate system to use is our choice. The important thing here is that we can convert one coordinate system to another. Then there is no problem in choosing your favorite coordinate system. This is the motivation for explaining the transformation of coordinate systems. A (linear) transformation of a coordinate system is generally done by a transformation matrix.
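
The city map example can be sketched in a few lines. The coordinates below are hypothetical values in kilometers chosen only for illustration, and `change_origin` is a made-up helper name. The landmarks stay where they are; only the numbers that describe them change with the choice of origin.

```python
# Hypothetical landmark coordinates (km) on a map whose origin is Zoologischer Garten.
map_zoo = {
    "Zoologischer Garten": (0.0, 0.0),
    "Alexanderplatz": (7.0, 1.0),
}


def change_origin(coords, new_origin):
    """Re-express every landmark relative to the landmark named new_origin."""
    ox, oy = coords[new_origin]
    return {name: (x - ox, y - oy) for name, (x, y) in coords.items()}


# Same city, same landmarks, different coordinate system.
map_alex = change_origin(map_zoo, "Alexanderplatz")
print(map_alex["Zoologischer Garten"])  # coordinates changed, the place did not

# Converting back recovers the first map, so either choice of origin is fine.
back = change_origin(map_alex, "Zoologischer Garten")
print(back == map_zoo)
```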

The transformation of the coordinate system follows the next equation:
If A is regular (invertible), there exists an inverse matrix of A,
I think this is clear when you look at Figure 3. The movement of the origin is just a translation, and its inverse is a movement in the opposite direction. Therefore, we can rewrite the equation as:

On the other hand, when a normal vector n is defined on a plane that contains a point P, it is defined in the coordinate system \Sigma as:

When this normal is transformed into the coordinate system \Sigma', it is defined as:
Let us plug Equation (1) into Equation (2).
Let's compare Equation (3) and Equation (4):
Equation (5) shows how the normal vector is transformed. Vectors that are transformed like this are called ``covariant vectors.'' Usual vectors are called ``contravariant vectors.'' The components of a co (together) variant (changing) vector change in the same way as the basis of the coordinate system. The components of a contra (against) variant (changing) vector change in the opposite way. In both cases only the representation changes, not the vector itself. It is like the point P: its coordinates change, but the point itself never moves.
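
The following is a minimal numerical sketch of this transformation rule, not the original derivation. It assumes the coordinate change has the form x' = A x + b and the plane is written as n x = d with n a row vector; the values of A, b, n, d and all names are illustrative. Under these assumptions the covariant rule n' = n A^{-1}, together with the shifted offset d' = d + n' b, describes the same plane in the new coordinates.

```python
# Assumed change of coordinates x' = A x + b (illustrative values).
A = [[2.0, 1.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
A_inv = [[0.5, -0.5, 0.0],   # hand-computed inverse of A
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
b = [3.0, -1.0, 2.0]


def to_new_coords(x):
    """Coordinates of the same point in the new system: x' = A x + b."""
    return [sum(A[i][j] * x[j] for j in range(3)) + b[i] for i in range(3)]


def dot(a, c):
    return sum(p * q for p, q in zip(a, c))


# Plane n x = d in the old system, and three points that lie on it.
n = [1.0, 1.0, 0.0]
d = 2.0
points = [[1.0, 1.0, 0.0], [2.0, 0.0, 5.0], [0.0, 2.0, -1.0]]

# Covariant rule for the normal: n' = n A^{-1} (row vector times matrix).
n_new = [sum(n[i] * A_inv[i][j] for i in range(3)) for j in range(3)]
# The offset changes as well because the origin moved: d' = d + n' b.
d_new = d + dot(n_new, b)

# Every point of the plane still satisfies n' x' = d' in the new coordinates.
residuals = [dot(n_new, to_new_coords(p)) - d_new for p in points]
print(n_new, d_new, residuals)
```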


Why is the normal transformation an inverse transpose? (6)

Normal vector as a covariant vector

This third explanation includes how to transform normal vectors. It is based on one of my favorite books[5], by Sugihara. It is a bit more formal and less intuitive compared to the first and second explanations. If you are not interested in a formal explanation, you can skip this section.

First I would like to introduce the affine transformation.

An affine transformation is a linear transformation combined with a translation. It is used quite often in computer graphics. An affine transformation maps a line to a line and preserves ratios of distances along a line. If we include degenerate cases, a triangle is always transformed into a triangle. Assume that the linear part of a three-dimensional affine transformation is represented by a 3x3 matrix. This transformation is a transformation between two coordinate systems. Therefore, I think the deformation of an object by the transformation is a secondary effect. As a result, we can deform an object. However, since this is a transformation of coordinate systems, we cannot deform an object arbitrarily. In general, this transformation is a combination of scaling, rotation, translation, and shearing. This means that an affine transformation cannot transform a triangle into a circle.
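
The ratio-preserving property can be checked with a small sketch. The matrix A, the translation t, and the helper names `affine` and `lerp` are assumptions made for illustration. A point one quarter of the way from P to Q should land one quarter of the way between the transformed endpoints.

```python
# Assumed affine map p -> A p + t (scaling in x, a shear, and a translation).
A = [[2.0, 1.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [3.0, -1.0, 2.0]


def affine(p):
    """Apply the affine map to a point given as a list of three numbers."""
    return [sum(A[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def lerp(p, q, s):
    """Point at fraction s of the way from p to q."""
    return [p[i] + s * (q[i] - p[i]) for i in range(3)]


P = [0.0, 0.0, 0.0]
Q = [4.0, 8.0, 0.0]
R = lerp(P, Q, 0.25)  # a point on the segment PQ

transformed_point = affine(R)
point_on_transformed_line = lerp(affine(P), affine(Q), 0.25)
print(transformed_point, point_on_transformed_line)
```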

If we consider an affine transformation as a coordinate transformation, our interest is how the coordinates of a point P are represented in the two different coordinate systems \Sigma and \Sigma'. Note that P itself doesn't move, as shown in Figure 3; only its coordinates change depending on the coordinate system. These coordinate systems may not have perpendicular basis vectors. Their unit lengths may also differ depending on the axis direction. For example, the x direction may be magnified two times relative to the y direction.
Figure 3. A matrix A transforms the coordinate system \Sigma to \Sigma'. Note: the point P doesn't move, but the representation of its coordinates may differ depending on the coordinate system.

It's a bit cumbersome, but I will write down how the coordinates of P are represented in the coordinate systems \Sigma and \Sigma'.

A point P is represented in the coordinate system \Sigma,
and is represented in the coordinate system \Sigma',
A representation means that you can give concrete coordinates, e.g., (1,1,0)^T. On the other hand, if you just have a point P, you don't need to know its exact coordinates. You don't even need to know whether this point P is in a 2-dimensional or a 3-dimensional space. The relationship between a point P and its representation is similar to that between a linear operator T and its representation matrix M. I can also think of the relationship between an interface and its implementation in a programming language. (I think the next step of this analogy is related to Futamura projections[1], but that is beyond this article, and my understanding is not yet enough to explain it.)


Why is the normal transformation an inverse transpose? (5)

Normal defined by inner product

Let's think about a normal defined by an inner product.
As you see, the normal is actually in a row space instead of a column space.
A coordinate transformation matrix M transforms a column vector.
Therefore, the following equation is not defined.
This is my second explanation of the difference between a normal vector and a usual vector.
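
The shapes involved can be sketched as follows. This is an assumption about what the missing equations express, not a copy of them: n is a 1x3 row vector, v is a 3x1 column vector, and the undefined product is taken to be M applied to n from the left.

```latex
\begin{align*}
  \underbrace{n}_{1 \times 3}\,\underbrace{v}_{3 \times 1} &= 0
      && \text{(inner product: row times column gives a scalar)} \\
  \underbrace{M}_{3 \times 3}\,\underbrace{v}_{3 \times 1} &
      && \text{(defined: a } 3 \times 1 \text{ column vector)} \\
  \underbrace{M}_{3 \times 3}\,\underbrace{n}_{1 \times 3} &
      && \text{(not defined: inner dimensions } 3 \text{ and } 1 \text{ do not match)} \\
  \underbrace{n}_{1 \times 3}\,\underbrace{M}_{3 \times 3} &
      && \text{(defined: a row vector is multiplied from the right)}
\end{align*}
```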


Why is the normal transformation an inverse transpose? (4)

Normal vector as a perpendicular vector of the surface tangent vectors

A normal vector has the same direction as the cross product of two tangent vectors of a surface.  Figure 2 shows that the tangent vectors are correctly transformed by the matrix M, which magnifies only the x direction. However, the cross product of the transformed tangent vectors is not necessarily the same as the normal transformed by the matrix M.

Figure 2. The normal vector n is the cross product of the tangent vectors u and v. The tangent vectors transform linearly by M, but the normal vector does not.
In short, the tangent vectors u and v can be transformed by M, but their cross product cannot. In general,
Are you convinced that this is the reason for distinguishing a usual vector from a normal vector? Consider the x component of the cross product, $u_y v_z - u_z v_y$. It is a product of components of u and v, so it is not linear. Therefore, a linear transformation cannot transform it. This is my first explanation.
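
A minimal numerical sketch of this first explanation follows. It assumes M = diag(2, 1, 1) as the matrix that magnifies only the x direction, and the tangent vectors u and v are illustrative choices. The last lines use the standard identity (Mu) x (Mv) = det(M) (M^{-1})^T (u x v), which already hints at the inverse transpose.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def mat_vec(m, v):
    """3x3 matrix (list of rows) times a column vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


# Assumed matrix: magnifies only the x direction by two.
M = [[2.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

# Illustrative tangent vectors of a surface and their cross product.
u = [1.0, -1.0, 0.0]
v = [0.0, 0.0, 1.0]
n = cross(u, v)

cross_of_transformed = cross(mat_vec(M, u), mat_vec(M, v))  # (Mu) x (Mv)
transformed_cross = mat_vec(M, n)                           # M (u x v)
print(cross_of_transformed, transformed_cross)              # these differ

# Identity check: (Mu) x (Mv) = det(M) * (M^{-1})^T (u x v).
M_inv_T = [[0.5, 0.0, 0.0],   # M is diagonal, so (M^{-1})^T = M^{-1}
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
det_M = 2.0
from_identity = [det_M * c for c in mat_vec(M_inv_T, n)]
print(from_identity)
```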


Why is the normal transformation an inverse transpose? (3)

The difference between normal vectors and usual vectors

Actually, normal vectors and usual vectors are different types of vectors. A usual vector by itself defines a direction or a position. Let's recall how a normal vector is defined. A normal vector is not defined by itself. First a surface is defined using usual vectors, and then the normal vector is defined according to this surface. A normal vector is defined as a unit vector that is perpendicular to the surface; in other words, the inner product of any surface tangent vector and the normal vector is 0. We should treat these as different kinds of vectors when they undergo a transformation. As we saw in Figure 1, when we transform a normal vector like a usual vector, the meaning of the normal is lost. If we consider this vector a usual vector, an x component that is twice as large is still fine; it is just not a normal anymore. In this sense, a normal vector has a special meaning: it should always be perpendicular to the surface.

In the following three sections, we will look at this problem more deeply. All three explanations are mathematically the same, but I think each has a slightly different intuition, so I will write down all three.

Why is the normal transformation an inverse transpose? (2)

What is the problem? What is the transformation matrix of a normal?

We use transformation matrices every day when we move objects in a computer. Current state-of-the-art DCC (digital content creation) software usually represents objects with triangles or polygons.  Each vertex of the triangles or polygons usually has its own coordinates.  When we rotate or move the vertices, we apply a transformation matrix to each vertex. A vertex is usually a three-dimensional vector in computer graphics.

We can define a normal vector for each triangle. A normal vector indicates the direction in which a triangle face is oriented. This normal vector is also a three-dimensional vector. In a 3D computer graphics system, normal vectors are important since we need them to compute how bright the surfaces are. Because a usual vector can be transformed by a matrix, it seems straightforward to use the same matrix to transform a normal vector. However, this fails. But why? This article is all about this ``why?''

Why does a usual transformation matrix fail on a normal vector?

According to [3], an explanation by Eric Haines is quite good. The book [6] has the same explanation, and I think it is a great one. A similar explanation can also be found in [2]. Figure 1 illustrates a similar explanation.

Figure 1. Scaling a normal breaks the normal.
Figure 1 shows a three-dimensional plane standing upright (along the z direction), viewed from the top (the view direction is the negative z direction). Simply put, we are looking down at a wall from above. This wall has the normal vector (1,1,0). Let's apply the following transformation matrix M. This matrix M magnifies the x direction by a factor of two and leaves the other directions unchanged.
Now you see that the wall is doubled in size in the x direction. But if we apply this matrix to the normal, the result is no longer a normal vector of this wall. In the left part of Figure 1, the normal vector is perpendicular to the wall, but in the right part, the transformed normal vector is no longer perpendicular to the wall. This is the problem.
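
The situation in Figure 1 can be reproduced with a short sketch. It assumes M = diag(2, 1, 1) for the matrix described above and picks (1,-1,0) as a tangent vector of the wall seen from the top; the helper names are illustrative. The naive result M n is not perpendicular to the transformed wall, while $(M^{-1})^{T}$ applied to n is.

```python
def mat_vec(m, v):
    """3x3 matrix (list of rows) times a column vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


# Assumed form of M: magnify x by two, leave y and z unchanged.
M = [[2.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
M_inv = [[0.5, 0.0, 0.0],   # inverse of the diagonal matrix above
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]

normal = [1.0, 1.0, 0.0]    # normal of the wall, as in Figure 1
tangent = [1.0, -1.0, 0.0]  # a direction along the wall, seen from the top

tangent_t = mat_vec(M, tangent)                     # the wall after scaling
naive_normal = mat_vec(M, normal)                   # normal treated as a usual vector
correct_normal = mat_vec(transpose(M_inv), normal)  # inverse transpose applied

print(dot(normal, tangent))            # perpendicular before the transformation
print(dot(naive_normal, tangent_t))    # nonzero: no longer perpendicular
print(dot(correct_normal, tangent_t))  # zero: still perpendicular
```

The corrected vector is generally not of unit length, so a renderer would usually normalize it afterwards.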

Why can't we transform the normal vector in the same way as usual vectors?

Why is the normal transformation an inverse transpose? (1)


Several books[2,6] explain that the transformation matrix for normal vectors is $(M^{-1})^{T}$. I always forget this formula. This time I understood it a bit, in three different ways, so I will write them down here.


Assume that a coordinate transformation matrix M is applied to a vector. For example, M could be a translation, rotation, scaling, and so forth. To transform a position vector, we can just multiply it by this matrix M. However, we may fail when we transform a normal vector by just multiplying it by the matrix M. Several books mention that the normal transformation matrix should be $(M^{-1})^{T}$[2,6].

In this article, I would like to discuss the following three issues:

  • What is the problem? What is the transformation matrix of a normal?
  • Why may multiplying by the matrix M fail?
  • Why is it the inverse transpose of the matrix M?
(The references are listed at the end of this series.)

Next time, I would like to talk about the problem.