On the importance of choosing a convenient basis

The benefits of Caltech’s proximity to Hollywood don’t usually trickle down to measly grad students like myself, except in the rare occasions when we befriend the industry’s technical contingent. One of my friends is a computer animator for Disney, which means that she designs algorithms enabling luxuriously flowing hair or trees with realistic lighting or feathers that have gorgeous texture, for movies like Wreck-it Ralph. Empowering computers to efficiently render scenes with these complicated details is trickier than you’d think and it requires sophisticated new mathematics. Fascinating conversations are one of the perks of having friends like this. But so are free trips to Disneyland! A couple nights ago, while standing in line for The Tower of Terror, I asked her what’s she’s currently working on. She’s very smart, as can be evidenced by her BS/MS in Computer Science/Mathematics from MIT, but she asked me if I “know about spherical harmonics.” Asking this to an aspiring quantum mechanic is like asking an auto mechanic if they know how to use a monkey wrench. She didn’t know what she was getting herself into!

me, LIGO, Disney

IQIM, LIGO, Disney

Along with this spherical harmonics conversation, I had a few other incidents last week that hammered home the importance of choosing a convenient basis when solving a scientific problem. First, my girlfriend works on LIGO and she’s currently writing her thesis. LIGO is a huge collaboration involving hundreds of scientists, and naturally, nobody there knows the detailed inner-workings of every subsystem. However, when it comes to writing the overview section of ones thesis, you need to at least make a good faith attempt to understand the whole behemoth. Anyways, my girlfriend recently asked if I know how the wavelet transform works. This is another example of a convenient basis, one that is particularly suited for analyzing abrupt changes, such as detecting the gravitational waves that would be emitted during the final few seconds of two black holes merging (ring-down). Finally, for the past couple weeks, I’ve been trying to understand entanglement entropy in quantum field theories. Most of the calculations that can be carried out explicitly are for the special subclass of quantum field theories called “conformal field theories,” which in two dimensions have a very convenient ‘basis’, the Virasoro algebra.

So why does a Disney animator care about spherical harmonics? It turns out that every frame that goes into one of Disney’s movies needs to be digitally rendered using a powerful computing cluster. The animated film industry has traded the painstaking process of hand-animators drawing every single frame, for the almost equally time-consuming process of computer clusters generating every frame. It doesn’t look like strong AI will be available in our immediate future, and in the meantime, humans are still much better than computers at detecting patterns and making intuitive judgements about the ‘physical correctness of an image.’ One of the primary advantages of computer animation is that an animator shouldn’t need to shade in every pixel of every frame — some of this burden should fall on computers. Let’s imagine a thought experiment. An animator wants to get the lighting correct for a nighttime indoor shot. They should be able to simply place the moon somewhere out of the shot, so that its glow can penetrate through the windows. They should also be able to choose from a drop down menu and tell the computer that a hand drawn lightbulb is a ‘light source.’ The computer should then figure out how to make all of the shadows and brightness appear physically correct. Another example of a hard problem is that an animator should be able to draw a character, then tell the computer that the hair they drew is ‘hair’, so that as the character moves through scenes, the physics of the hair makes sense. Programming computers do these things autonomously is harder than it sounds.

In the lighting example, imagine you want to get the lighting correct in a forest shot with complicated pine trees and leaf structures. The computer would need to do the ray-tracing for all of the photons emanating from the different light sources, and then the second-order effects as these photons reflect, and then third-order effects, etc. It’s a tall order to make the scene look accurate to the human eyeball/brain. Instead of doing all of this ray-tracing, it’s helpful to choose a convenient basis in order to dramatically speed up the processing. Instead of the complicated forest example, let’s imagine you are working with a tree from Super Mario Bros. Imagine drawing a sphere somewhere in the middle of this and then defining a ‘height function’, which outputs the ‘elevation’ of the tree foliage over each point on the sphere. I tried to use suggestive language, so that you’d draw an analogy to thinking of Earth’s ‘height function’ as the elevation of mountains and the depths of trenches over the sphere, with sea-level as a baseline. An example of how you could digitize this problem for a tree or for the earth is by breaking up the sphere into a certain number of pixels, maybe one per square meter for the earth (5*10^14 square meters gives approximately 2^49 pixels), and then associating an integer height value between [-2^15,2^15] to each pixel. This would effectively digitize the height map of the earth. In this case, keeping track of the elevation to approximately the meter level. But this leaves us with a huge amount of information that we need to store, and then process. We’d have to keep track of the height value for each pixel, giving us approximately 2^49*2^16=2^65 bits=4 exabytes that we’d have to keep track of. And this is for an easy static problem with only meter resolution! We can store this information much more efficiently using spherical harmonics.


There are many ways to think about spherical harmonics. Basically, they’re functions which map points on the sphere to real numbers Y_l^m: (\theta,\phi) \mapsto Y_l^m(\theta,\phi)\in\mathbb{R}, such that they satisfy a few special properties. They are orthogonal, meaning that if you multiply two different spherical harmonics together and then integrate over the sphere, then you get zero. If you square one of the functions and then integrate over the sphere, you get a finite, nonzero value. This means that they are orthogonal functions. They also span the space of all height functions that one could define over the sphere. This means that for a planet with an arbitrarily complicated topography, you would be able to find some weighted combination of different spherical harmonics which perfectly describes that planet’s topography. These are the key properties which make a set of functions a basis: they span and are orthogonal (this is only a heuristic). There is also a natural way to think about the light that hits the tree. We can use the same sphere and simply calculate the light rays as they would hit the ideal sphere. With these two different ‘height functions’, it’s easy to calculate the shadows and brightness inside the tree. You simply convolve the two functions, which is a fast operation on a computer. It also means that if the breeze slightly changes the shape of the tree, or if the sun moves a little bit, then it’s very easy to update the shading. Implicit in what I just said, using spherical harmonics allows us to efficiently store this height map. I haven’t calculated this on a computer, but it doesn’t seem totally crazy to think that we’d be able to store the topography of the earth to a reasonable accuracy, with 100 nonzero coefficients of the spherical harmonics to 64 bits of precision, 2^7*2^6= 2^13 << 2^65. Where does this cost savings come from? It comes from the fact that the spherical harmonics are a convenient basis, which naturally encode the types of correlations we see in Earth’s topography — if you’re standing at an elevation of 2000m, the area within ten meters is probably at a similar elevation. Cliffs are what break this basis — but are what the wavelet basis was designed to handle.

I’ve only described a couple bases in this post and I’ve neglected to mention some of the most famous examples! This includes the Fourier basis, which was designed to encode periodic signals, such as music and radio waves. I also have not gone into any detail about the Virasoro algebra, which I mentioned at the beginning of this post, and I’ve been using it heavily for the past few weeks. For the sake of diversity, I’ll spend a few sentences whetting your apetite. Complex analysis is primarily the study of analytic functions. In two dimensions, these analytic functions “preserve angles.” This means that if you have two curves which intersect at a point with angle \theta, then after using an analytic function to map these curves to their image, also in the complex plane, then the angle between the curves will still be \theta. An especially convenient basis for the analytic functions in two-dimensions (\{f: \mathbb{C} \to \mathbb{C}\}, where f(z) = \sum_{n=0}^{\infty} a_nz^n) is given by the set of functions \{l_n = -z^{n+1}\partial_z\}. As always, I’m not being exactly precise, but this is a ‘basis’ because we can encode all the information describing an infinitesimal two-dimensional angle-preserving map using these elements. It turns out to have incredibly special properties, including that its quantum cousin yields something called the “central charge” which has deep ramifications in physics, such as being related to the c-theorem. Conformal field theories are fascinating because they describe the physics of phase transitions. Having a convenient basis in two-dimensions is a large part of why we’ve been able to make progress in our understanding of two-dimensional phase transitions (more important is that the 2d conformal symmetry group is infinite-dimensional, but that’s outside the scope of this post.) Convenient bases are also important for detecting gravitational waves, making incredible movies and striking up nerdy conversations in long lines at Disneyland!


9 thoughts on “On the importance of choosing a convenient basis

  1. I feel like you’re being a little inconsistent with your precision values here–giving 2^16 values for the heights versus 2^7 for the Spherical Harmonics. There’s still a profound savings in information, though.

    How does $z^2$ preserve angles? Wouldn’t it double an angle?

    • You’re absolutely right that I was imprecise in this calculation and subsequent explanation. In order to go deeper would require much more work. Guilty.

      You hit upon the key loophole in this definition of conformal map. In 2d, the conformal maps are the analytic functions *at points where their derivative is nonzero!* f(z) = z^2 doesn’t preserve angles at the origin, but infinitesimally, it does at other points.

  2. The spherical harmonics are special cases of the rotation matrices “D” (whose properties are enumerated in Kurt Gottfried’s Quantum Mechanics, for example). But one computationally advantageous property that is *not* in Gottfried, is that these rotation matrices (for arbitrary spin-j representations) take a purely algebraic form in the complex values (za,zb) = (Q1+iQ2,Q3+iQ4), where the four Q* are real-valued quaternionic coordinates. Particularly for large-j systems, the computational speed-up in transformations (relative to Euler-angle variables) is many orders of magnitude.

  3. In geometry, another nice example of a convenient basis are the directions of extreme bending on a (hyper)surface, i.e., the principal curvature directions. In fact, since God was in a good mood that day, these directions are always orthogonal (and can be expressed as the eigenvectors of a self-adjoint operator).

    I just saw a nice talk by Ingrid Daubechies about sparsity in signal analysis and in computation. The theme there (as you might expect) is that many natural signals are extremely sparse, as long as you can find a convenient basis. One question raised in the talk was: is this notion of “sparsity” too vague to be meaningful? My personal take is that this question of picking a convenient basis is kind of like Kolmogorov complexity. In compressed sensing your signal gets more and more compressible as you specialize the dictionary (consider a basis which includes your signal as one of the basis vectors!). With Kolmogorov complexity, your signal gets more and more compressible as you specialize the language (my favorite example is 64 kilobyte computer graphics demos, which have become way more impressive ever since they’ve been written in the “language” of the graphics API, rather than a general-purpose programming language). In short: choosing a convenient basis is important! 🙂

    Nice post, Shaun.

    • Keenan, did you get a pingback because of my link under “sophisticated new mathematics?” Anyways, your comment is close to home. The first paper John recommended to me was Charles Bennett’s “Logical Depth and Physical Complexity”, which is intimately related to your points about Kolmogorov complexity. Thanks for the comment, but I wish you were still at Caltech to discuss these things in person!

      • A great many of Joseph Landsberg’s recent articles, and especially many of the topics covered in his new textbook Tensors: Geometry and Applications, can be appreciated as meditations upon the general topic of finding efficient/accurate/compact/low-rank representations of transport systems—with “transport” understood broadly as encompassing both dynamical flow and communication channels. Moreover, the (mostly algebraic) ideas of Landsberg’s text gain further power when they are blended with the (mostly geometric) ideas of texts like Mikio Nakahara’s Geometry, Topology and Physics.

        (Needless to say, there are many more great textbooks on algebraic geometry and dynamics than just the above two …)

        For younger students, the point is that the 20th century’s great quantum textbooks, from Dirac’s Principles of Quantum Mechanics (1930) and Subrahmanyan Chandrasekhar’s An Introduction to the Study of Stellar Structure (which has excellent descriptions of classical and quantum transport theory), extending through seven dacades to Nielsen and Chuang’s Quantum Computation and Quantum Information can be systematically reconsolidated—often to great practical advantage—by transcribing the main results of 20th century physics into the natural mathematical language of the 21st century’s emerging algebraic/symplectic/dynamic mathematical synthesis.

        It’s a great time to be a young researcher! 🙂

  4. Pingback: Effects and Benefits of Computerization in Different Spheres

  5. Pingback: Hacking nature: loopholes in the laws of physics | Quantum Frontiers

Your thoughts here.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s