Saturday, 30 July 2016

Units for Physically Based Rendering

Physically-based rendering (PBR) should really use physical units, though many PBR engines don't.

 

Sunlight is typically described as being in watts per square metre. But that represents the energy across the entire spectrum, and has no concept of colour.

So for PBR, the units for directional light such as sunlight are watts per square metre per nanometre: spectral irradiance.

Sunlight comes from so far away that the direction of the light is essentially parallel. But a local light source like a light bulb emits its energy in all directions.

So the irradiance due to a light bulb varies with distance from the bulb. Say the total spectral power of the bulb, P (the spectral flux), is measured in watts per nanometre (for any given colour on the spectrum). Suppose that the bulb has a radius of r metres, and thus a surface area of $4 \pi r^2$ square metres. Then the irradiance at the surface will be $I = P / (4 \pi r^2)$, measured in $\mathrm{W/m^2/nm}$, the same units as sunlight.

But away from the bulb, at distance R, the same power passes through a larger surface area. So again, $I = P / (4 \pi R^2)$, where P is the same total spectral power as before.

Thus $I(R) = I(r) (\frac{r}{R})^2$ - the irradiance follows an inverse-square power law.
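
As a minimal sketch of the relationship above, assuming an idealised bulb that radiates equally in all directions, the irradiance at any distance can be computed directly from the spectral flux. The function name `irradiance` and the flux value are illustrative only.

```python
import math


def irradiance(spectral_flux, distance):
    """Spectral irradiance (W/m^2/nm) at `distance` metres from an
    isotropic source emitting `spectral_flux` watts per nanometre."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    # The same power is spread over a sphere of area 4*pi*distance^2.
    return spectral_flux / (4.0 * math.pi * distance ** 2)


P = 1.0  # hypothetical spectral flux, W/nm
near = irradiance(P, 1.0)
far = irradiance(P, 2.0)
ratio = far / near  # doubling the distance quarters the irradiance
```

Because only the ratio of distances matters, the same function gives $I(R) = I(r) (r/R)^2$ for any pair of radii.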

So we don't use irradiance to measure point lights. If the light is uniformly distributed by direction, we can use spectral flux P, watts/nm.

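The spectral intensity in any given direction is then measured in watts per steradian per nm. There are $4\pi$ steradians across the entire sphere, so a uniform source of spectral flux P has a spectral intensity of $P/(4\pi)$.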

Rendering

The challenge with rendering is to recreate (on an LCD monitor, cinema screen, or in print) the radiance that you would perceive if you were really looking at the thing the image represents.
Imagine you want to create the experience of flying at ten thousand feet. It looks something like this:
photo at 10,000 feet altitude
trueSKY render at 10,000 feet altitude

But in rendering we don't (necessarily) want to recreate what the photo or video of something would look like: that's actually a more complex problem. The fundamental challenge of rendering is to see if we can recreate the real thing. Consider one point in the image, perhaps part of the sky. The light from that point that the eye would perceive is defined by a spectrum, like this:
and the way the human eye would perceive it is defined by the response of its three types of cone (ignoring rods for now - those are for night-vision).

The three cone types correspond only roughly to red, green and blue, and their responses overlap considerably. For colour matching, the CIE has mapped three standard functions, called X, Y and Z. These functions don't describe a precise physiological response in the eye, but they do allow us to match perceived colours from different sources as the eye sees them. Having these functions, we can calculate three responses:

$X= \int_{\lambda} f_x(\lambda) E(\lambda) d\lambda $
$Y= \int f_y E$
$Z= \int f_z E$
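
In practice the spectrum and the matching functions are available only as samples at discrete wavelengths, so the integrals become sums. The following is a rough sketch using the trapezoidal rule. The helper names `integrate` and `tristimulus` are made up for illustration, and the flat input curves are toy values chosen so the result is easy to check: they are not real CIE data.

```python
def integrate(wavelengths, values):
    """Trapezoidal rule over samples taken at the given wavelengths (nm)."""
    total = 0.0
    for i in range(len(wavelengths) - 1):
        width = wavelengths[i + 1] - wavelengths[i]
        total += 0.5 * (values[i] + values[i + 1]) * width
    return total


def tristimulus(wavelengths, f_x, f_y, f_z, E):
    """Return (X, Y, Z) for a sampled spectrum E and sampled matching functions."""
    return tuple(
        integrate(wavelengths, [f * e for f, e in zip(curve, E)])
        for curve in (f_x, f_y, f_z)
    )


# Toy inputs, NOT real CIE data: flat curves over 400-700 nm.
wavelengths = [400.0, 500.0, 600.0, 700.0]
flat = [1.0, 1.0, 1.0, 1.0]
half = [0.5, 0.5, 0.5, 0.5]
zero = [0.0, 0.0, 0.0, 0.0]
E = [2.0, 2.0, 2.0, 2.0]

X, Y, Z = tristimulus(wavelengths, flat, half, zero, E)
```

With real data, `f_x`, `f_y` and `f_z` would be the tabulated CIE functions resampled onto the same wavelengths as `E`.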

X, Y and Z are the three numbers we want to reproduce.
Now monitors have red, green, and blue elements to each pixel, and the wavelengths of these are pretty sharply concentrated. They're not single-frequency spikes like lasers, but they're clearly separated.
So in the "true" image we had one continuous spectrum, but the monitor gives out three distinct colours. You can calibrate your monitor to get its exact curves (see e.g. here) although the calibration will only be valid until you adjust the monitor's settings.

We would like to reproduce the same three X, Y, and Z values as above: that will make the colour and brightness of the point/pixel look the same as reality to the eye.

Of course, the eye response curves are meant to represent the typical human eye. If your eyes don't match the standard curves - for example, if you're colour blind - that assumption is invalid, and the two radiances won't look the same to you.

$X= \int f_x E_m$
$Y= \int f_y E_m$
$Z= \int f_z E_m$

where $E_m$ is the monitor spectral radiance, which is a combination of what the red, green, and blue elements are putting out:

$E_m(\lambda)=R_m(\lambda)+G_m(\lambda)+B_m(\lambda)$

At any given $\lambda$, we expect at most one of those values to be significant.

So for a known spectral radiance distribution, we solve for Rm, Gm and Bm:

$\int f_x E = \int f_x (R_m+G_m+B_m)$
$\int f_y E = \int f_y (R_m+G_m+B_m)$
$\int f_z E = \int f_z (R_m+G_m+B_m)$

Note we can't simply say

$\int f_x E = \int f_x R_m$

etc. X, Y and Z are not exactly red, green and blue. The X function peaks in the orange-red but has a second, smaller lobe in the violet-blue; Y peaks in the yellowy-green; and Z sits in the blue. Our eyes know how to interpret the infinite combinations of cone responses into colour and brightness.

Let's assume that the spectral profile of our monitor is known, and that generally:

$R_m(\lambda)=R \times m_R(\lambda)$

where R is the brightness, from zero to one, of the red part of the pixel, and $m_R$ is a known function for the monitor.

\begin{align}
\int f_x E &= \int f_x (R m_R+G m_G+B m_B) \\
 &=R \int f_x m_R + G \int f_x m_G + B \int f_x m_B \\
 \end{align}

So knowing the eye functions $f_x$ etc., and the monitor functions $m_R$ etc, the right-hand-side integrations can be precalculated, leaving us with:

\begin{align}
\int f_x E &= R I_{xR} + G I_{xG} + B I_{xB} \\
\int f_y E &= R I_{yR} + G I_{yG} + B I_{yB} \\
\int f_z E &= R I_{zR} + G I_{zG} + B I_{zB}
\end{align}

This is a 3x3 matrix equation: linear algebra.

\begin{align}
c &= M c_m
\end{align}

where c is the vector of three "ground truth" integrals, $c_m$ is the vector of three monitor rgb values (assuming a linear monitor - more on this later), and $M$ is the monitor-eye matrix, which is constant for a given monitor and viewer.
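
Written out in full, with $I_{xR} = \int f_x m_R$ and so on for the other eight precalculated integrals, the matrix equation is:

\begin{align}
\begin{pmatrix} \int f_x E \\ \int f_y E \\ \int f_z E \end{pmatrix}
&=
\begin{pmatrix}
I_{xR} & I_{xG} & I_{xB} \\
I_{yR} & I_{yG} & I_{yB} \\
I_{zR} & I_{zG} & I_{zB}
\end{pmatrix}
\begin{pmatrix} R \\ G \\ B \end{pmatrix}
\end{align}

Each column of $M$ is the eye's three responses to one monitor primary at full brightness.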

While X is kind-of red, and Z is kind-of blue, we can't really regard any of the integral constants that make up M as being close enough to zero to be negligible, except for maybe $I_{zR}$. We'll leave it in for now. We must get the matrix inverse of M to solve for $c_m$. If we knew the shape of $E(\lambda)$ through the whole spectrum, and assuming we have all the monitor data and a viewer with typical human eyes, we'd be able to calculate the exact $c_m$: the exact RGB values to send to the monitor that will reproduce the ground truth view. We would have to hope, as well, that when we've found $c_m$, none of its members are greater than 1.0 (or less than zero, which happens for colours outside the monitor's gamut). Because monitors have low dynamic range, for now, we can't represent many of the brightness values that we encounter every day in real life.
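
The inversion step can be sketched in a few lines. This is only an illustration: the matrix below is the widely published linear sRGB (D65) to XYZ matrix, used here as a stand-in for a measured monitor-eye matrix, and a real monitor would need its own measured values. The helper names `inverse3` and `mat_vec` are made up for this example.

```python
def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate (transposed cofactors)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(row[k] * v[k] for k in range(3)) for row in m]


# Stand-in for M (assumption): rows are X, Y, Z; columns are R, G, B.
M = [
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
]

# Build a "ground truth" c from known RGB values, then recover them.
c = mat_vec(M, [0.8, 0.5, 0.25])
c_m = mat_vec(inverse3(M), c)

# Check that the monitor can actually display the result.
in_range = all(0.0 <= v <= 1.0 for v in c_m)
```

Since $M$ is constant for a given monitor and viewer, its inverse only needs to be computed once, not per pixel.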

Much of the above must be taken on trust, or worked around. But what do we know about c? We probably haven't calculated the entire spectral radiance curve for the visual spectrum, for each pixel onscreen. We've probably calculated three values. Again, red, green, and blue. And we must make an assumption about how those three numbers approximate the full spectrum.

Suppose we assume that each of our three calculated values represents a range of the spectrum over which the spectral radiance is constant:

We can refine this later with a better shape. But our three columns roughly approximate the full spectrum, and they allow us to calculate $c$ as follows:


\begin{align}
c_x &= \int f_x E_d \\
c_y &= \int f_y E_d \\
c_z &= \int f_z E_d \\
 \end{align}

where $E_d$ is our rendered spectral radiance curve, which is:

\begin{align}
E_d(\lambda) &=
\begin{cases}
R_d, & r_0 \le \lambda < r_1 \\
G_d, & g_0 \le \lambda < g_1 \\
B_d, & b_0 \le \lambda < b_1 \\
0, & \text{otherwise}
\end{cases}
\end{align}

where $R_d$ etc are the rendered spectral radiances. So we can now calculate $c$, or at least $c_d$, the rendered approximation to the ground truth. And finally:

$c_m = M^{-1} c_d$
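
As a worked step, assuming the three bands do not overlap, the box-shaped $E_d$ lets each integral split into three pieces with the rendered values pulled outside:

\begin{align}
c_x &= R_d \int_{r_0}^{r_1} f_x \, d\lambda + G_d \int_{g_0}^{g_1} f_x \, d\lambda + B_d \int_{b_0}^{b_1} f_x \, d\lambda
\end{align}

and likewise for $c_y$ and $c_z$. So $c_d$ is itself a constant 3x3 matrix applied to the rendered values. Calling that matrix $D$ (a name introduced here for illustration, with entries such as $D_{xR} = \int_{r_0}^{r_1} f_x \, d\lambda$):

\begin{align}
c_d &= D \begin{pmatrix} R_d \\ G_d \\ B_d \end{pmatrix} \\
c_m &= M^{-1} D \begin{pmatrix} R_d \\ G_d \\ B_d \end{pmatrix}
\end{align}

Under these assumptions the whole conversion from rendered values to monitor values collapses to a single precalculated matrix, $M^{-1} D$.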