More Bits per Color

We need more bits per color component in our 3D accelerators.

I have been pushing for a couple more bits of range for several years now,
but I now extend that to wanting full 16 bit floating point colors throughout
the graphics pipeline. A sign bit, ten bits of mantissa, and five bits of
exponent (possibly trading a bit or two between the mantissa and exponent).
Even that isn’t all you could want, but it is the rational step.

It is turning out that I need a destination alpha channel for a lot of the
new rendering algorithms, so intermediate solutions like 10/12/10 RGB
formats aren’t a good idea. Higher internal precision with dithering to 32
bit pixels would have some benefit, but dithered intermediate results can
easily start piling up the errors when passed over many times, as we have
seen with 5/6/5 rendering.

Eight bits of precision isn’t enough even for full range static image
display. Images with a wide range usually come out fine, but restricted
range images can easily show banding on a 24-bit display. Digital television
specifies 10 bits of precision, and many printing operations are performed
with 12 bits of precision.

The situation becomes much worse when you consider the losses after multiple
operations. As a trivial case, consider having multiple lights on a wall,
with their contribution to a pixel determined by a texture lookup. A single
light will fall off towards 0 some distance away, and if it covers a large
area, it will have visible bands as the light adds one unit, two units, etc.
Each additional light from the same relative distance stacks its contribution
on top of the earlier ones, which magnifies the amount of the step between
bands: instead of going 0,1,2, it goes 0,2,4, etc. Pile a few lights up like
this and look towards the dimmer area of the falloff, and you can believe you
are back in 256-color land.

There are other more subtle issues, like the loss of potential result values
from repeated squarings of input values, and clamping issues when you sum up
multiple incident lights before modulating down by a material.

Range is even more clear cut. There are some values that have intrinsic
ranges of 0.0 to 1.0, like factors of reflection and filtering. Normalized
vectors have a range of -1.0 to 1.0. However, the most central quantity in
rendering, light, is completely unbounded. We want a LOT more than a 0.0 to
1.0 range. Q3 hacks the gamma tables to sacrifice a bit of precision to get
a 0.0 to 2.0 range, but I wanted more than that for even primitive rendering
techniques. To accurately model the full human sensable range of light
values, you would need more than even a five bit exponent.

This wasn’t much of an issue even a year ago, when we were happy to just
cover the screen a couple times at a high framerate, but realtime graphics
is moving away from just “putting up wallpaper” to calculating complex
illumination equations at each pixel. It is not at all unreasonable to
consider having twenty textures contribute to the final value of a pixel.
Range and precision matter.

A few common responses to this pitch:

“64 bits per pixel??? Are you crazy???” Remember, it is exactly the same
relative step as we made from 16 bit to 32 bit, which didn’t take all
that long.

Yes, it will be slower. That’s ok. This is an important point: we can’t
continue to usefully use vastly greater fill rate without an increase in
precision. You can always crank the resolution and multisample anti-alaising
up higher, but that starts to have diminishing returns well before you use of
the couple gigatexels of fill rate we are expected to have next year. The
cool and interesting things to do with all that fill rate involves many
passes composited into less pixels, making precision important.

“Can we just put it in the texture combiners and leave the framebuffer at 32
bits?” No. There are always going to be shade trees that overflow a given
number of texture units, and they are going to be the ones that need the
extra precision. Scales and biases between the framebuffer and the higher
precision internal calculations can get you some mileage (assuming you can
bring the blend color into your combiners, which current cards can’t), but
its still not what you want. There are also passes which fundamentally
aren’t part of a single surface, but still combine to the same pixels, as
with all forms of translucency, and many atmospheric effects.

“Do we need it in textures as well?” Not for most image textures, but it
still needs to be supported for textures that are used as function look
up tables.

“Do we need it in the front buffer?” Probably not. Going to a 64 bit front
buffer would probably play hell with all sorts of other parts of the system.
It is probably reasonable to stay with 32 bit front buffers with a blit from
the 64 bit back buffer performing a lookup or scale and bias operation before
dithering down to 32 bit. Dynamic light adaptation can also be done during
this copy. Dithering can work quite well as long as you are only performing
a single pass.

I used to be pitching this in an abstract “you probably should be doing this”
form, but two significant things have happened that have moved this up my hit
list to something that I am fairly positive about.

Mark Peercy of SGI has shown, quite surprisingly, that all Renderman surface
shaders can be decomposed into multi-pass graphics operations if two
extensions are provided over basic OpenGL: the existing pixel texture
extension, which allows dependent texture lookups (matrox already supports a
form of this, and most vendors will over the next year), and signed, floating
point colors through the graphics pipeline. It also makes heavy use of the
existing, but rarely optimized, copyTexSubImage2D functionality for
temporaries.

This is a truly striking result. In retrospect, it seems obvious that with
adds, multiplies, table lookups, and stencil tests that you can perform any
computation, but most people were working under the assumption that there
were fundamentally different limitations for “realtime” renderers vs offline
renderers. It may take hundreds or thousands of passes, but it clearly
defines an approach with no fundamental limits. This is very important.
I am looking forward to his Siggraph paper this year.

Once I set down and started writing new renderers targeted at GeForce level
performance, the precision issue has started to bite me personally. There
are quite a few times where I have gotten visible banding after a set of
passes, or have had to worry about ordering operations to avoid clamping.
There is nothing like actually dealing with problems that were mostly
theoretical before…

64 bit pixels. It is The Right Thing to do. Hardware vendors: don’t you be
the company that is the last to make the transition.

Leave a Reply