Reading the sub(linear) text

Physicists are not known for finesse. “Even if it cost us our funding,” I’ve heard a physicist declare, “we’d tell you what we think.” Little wonder I irked the porter who directed me toward central Cambridge.

The University of Cambridge consists of colleges as the US consists of states. Each college has a porter’s lodge, where visitors check in and students beg for help after locking their keys in their rooms. And where physicists ask for directions.

Last March, I ducked inside a porter’s lodge that bustled with deliveries. The woman behind the high wooden desk volunteered to help me, but I asked too many questions. By my fifth, her pointing at a map had devolved to jabbing.

Read the subtext, I told myself. Leave.

Or so I would have told myself, if not for that afternoon.

That afternoon, I’d visited Cambridge’s CMS, which merits every letter in “Centre for Mathematical Sciences.” Home to Isaac Newton’s intellectual offspring, the CMS consists of eight soaring, glass-walled, blue-topped pavilions. Their majesty walloped me as I turned off the road toward the gatehouse. So did the congratulatory letter from Queen Elizabeth II that decorated the route to the restroom.


I visited Nilanjana Datta, an affiliated lecturer of Cambridge’s Faculty of Mathematics, and her student, Felix Leditzky. Nilanjana and Felix specialize in entropies and one-shot information theory. Entropies quantify uncertainties and efficiencies. Imagine compressing many copies of a message into the smallest possible number of bits (units of memory). How few bits can you use per copy? That number, we call the optimal compression rate. It shrinks as the number of copies compressed grows. As the number of copies approaches infinity, that compression rate drops toward a number called the message’s Shannon entropy. If the message is quantum, the compression rate approaches the von Neumann entropy.

Good luck squeezing infinitely many copies of a message onto a hard drive. How efficiently can we compress fewer copies? According to one-shot information theory, the answer involves entropies other than Shannon’s and von Neumann’s. In addition to describing data compression, entropies describe the charging of batteries, the concentration of entanglement, the encrypting of messages, and other information-processing tasks.

Speaking of compressing messages: Suppose one-shot information theory posted status updates on Facebook. Suppose that that panel on your Facebook page’s right-hand side showed news weightier than celebrity marriages. The news feed might read, “TRENDING: One-shot information theory: Second-order asymptotics.”

Second-order asymptotics, I learned at the CMS, concerns how the optimal compression rate decays as the number of copies compressed grows. Imagine compressing a billion copies of a quantum message ρ. The number of bits needed about equals a billion times the von Neumann entropy H_vN(ρ). Since a billion is less than infinity, 1,000,000,000 H_vN(ρ) bits won’t suffice. Can we estimate the compression rate more precisely?

The question reminds me of gas stations’ hidden pennies. The last time I passed a station’s billboard, some number like $3.65 caught my eye. Each gallon cost about $3.65, just as each copy of ρ costs about H_vN(ρ) bits. But a 9/10, writ small, followed the $3.65. If I’d budgeted $3.65 per gallon, I couldn’t have filled my tank. If you budget H_vN(ρ) bits per copy of ρ, you can’t compress all your copies.

Suppose some station’s owner hatches a plan to promote business. If you buy one gallon, you pay $3.654. The more you purchase, the more the final digit drops from four. By cataloguing receipts, you calculate how a tank’s cost varies with the number of gallons, n. The cost equals $3.65 × n to a first approximation. To a second approximation, the cost might equal $3.65 × n + a√n, wherein a represents some number of cents. Compute a, and you’ll have computed the gas’s second-order asymptotics.
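For readers who like to see the bookkeeping, here is a minimal sketch of how one might extract a from a pile of receipts. It assumes the correction term scales as a√n, and the receipts are made-up numbers generated for the example (with a chosen so that one gallon costs $3.654), not data from any real station. The function name fit_second_order is hypothetical.

```python
import math

FIRST_ORDER_PRICE = 3.65  # dollars per gallon, the first-order rate from the text


def fit_second_order(receipts):
    """Least-squares estimate of a in: cost = 3.65 * n + a * sqrt(n).

    receipts: list of (gallons, total_cost) pairs (hypothetical data).
    """
    # Remove the first-order (linear) term, then fit what is left.
    # Minimizing sum (r_i - a * sqrt(n_i))^2 gives a = sum(r_i * sqrt(n_i)) / sum(n_i).
    numerator = sum((cost - FIRST_ORDER_PRICE * n) * math.sqrt(n) for n, cost in receipts)
    denominator = sum(n for n, _ in receipts)
    return numerator / denominator


# Made-up receipts generated with a = 0.004 dollars, purely for illustration.
receipts = [(n, FIRST_ORDER_PRICE * n + 0.004 * math.sqrt(n)) for n in (1, 4, 9, 16, 25)]
a_estimate = fit_second_order(receipts)
print(f"estimated a = {a_estimate:.4f} dollars")
```

The fit recovers the a that was used to generate the receipts. With real receipts, the estimate would carry noise, and one would want to check that the leftover really grows like √n.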

Nilanjana and Felix computed a’s associated with data compression and other quantum tasks. Second-order asymptotics met information theory when Strassen combined them in nonquantum problems. These problems developed under attention from Hayashi, Han, Polyanskiy, Poor, Verdú, and others. Tomamichel and Hayashi, as well as Li, introduced quantumness.

In the total-cost expression, $3.65 × n depends on n directly, or “linearly.” The second term depends on √n. As the number of gallons grows, so does √n, but √n grows more slowly than n. The second term is called “sublinear.”

Which is the word that rose to mind in the porter’s lodge. I told myself, Read the sublinear text.

Little wonder I irked the porter. At least—thanks to quantum information, my mistake, and facial expressions’ contagiousness—she smiled.



With thanks to Nilanjana Datta and Felix Leditzky for explanations and references; to Nilanjana, Felix, and Cambridge’s Centre for Mathematical Sciences for their hospitality; and to porters everywhere for providing directions.

“Feveral kinds of hairy mouldy fpots”

The book had a sheepskin cover, and mold was growing on the sheepskin. Robert Hooke, a pioneering microbiologist, slid the cover under one of the world’s first microscopes. Mold, he discovered, consists of “nothing elfe but feveral kinds of fmall and varioufly figur’d Mufhroms.” He described the Mufhroms in his treatise Micrographia, a 1665 copy of which I found in “Beautiful Science.” An exhibition at San Marino’s Huntington Library, “Beautiful Science” showcases the physics of rainbows, the stars that enthralled Galileo, and the world visible through microscopes.


Beautiful science of yesterday: An illustration, from Hooke’s Micrographia, of the mold.

“[T]hrough a good Microfcope,” Hooke wrote, the sheepskin’s spots appeared “to be a very pretty fhap’d Vegetative body.”

How like a scientist, to think mold pretty. How like quantum noise, I thought, Hooke’s mold sounds.

Quantum noise hampers systems that transmit and detect light. To phone a friend or send an email (“Happy birthday, Sarah!” or “Quantum Frontiers has released an article”), we encode our message in light. The light traverses a fiber, buried in the ground, then hits a detector. The detector channels the light’s energy into a current, a stream of electrons that flows down a wire. The variations in the current’s strength are translated into Sarah’s birthday wish.

If noise doesn’t corrupt the signal. From encoding “Happy birthday,” the light and electrons might come to encode “Hsappi birthdeay.” Quantum noise arises because light consists of packets of energy, called “photons.” The sender can’t control how many photons hit the detector.

To send the letter H, we send about 10^8 photons.* Imagine sending fifty H’s. When we send the first, our signal might contain 10^8 – 153 photons; when we send the second, 10^8 + 2,083; when we send the third, 10^8 – 6; and so on. Receiving different numbers of photons, the detector generates different amounts of current. Different amounts of current can translate into different symbols. From H, our message can morph into G.

This spring, I studied quantum noise under the guidance of IQIM faculty member Kerry Vahala. I learned to model quantum noise, to quantify it, when to worry about it, and when not. From quantum noise, we branched into Johnson noise (caused by interactions between the wire and its hot environment); amplified-spontaneous-emission, or ASE, noise (caused by photons belched by ions in the fiber); beat noise (ASE noise breeds with the light we sent, spawning new noise); and excess noise (the “miscellaneous” folder in the filing cabinet of noise types).


Beautiful science of today: A microresonator (a tiny pendulum-like device) studied by the Vahala group.

Noise, I learned, has structure. It exhibits patterns. It has personalities. I relished studying those patterns as I relish sending birthday greetings while battling noise. Noise types, I see as a string of pearls unearthed in a junkyard. I see them as “pretty fhap[es]” in Hooke’s treatise. I see them—to pay a greater compliment—as “hairy mouldy fpots.”


*Optical-communications ballpark estimates:

  • Optical power: 1 mW = 10^-3 J/s
  • Photon frequency: 200 THz = 2 × 10^14 Hz
  • Photon energy: hν = (6.626 × 10^-34 J·s)(2 × 10^14 Hz) ≈ 10^-19 J
  • Bit rate: 1 Gb/s = 10^9 bits/s
  • Number of bits per H: 10
  • Number of photons per H: (1 photon / 10^-19 J)(10^-3 J/s)(1 s / 10^9 bits)(10 bits / 1 H) = 10^8
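
The ballpark above can be checked in a few lines. This is a minimal sketch that plugs in the same round numbers (1 mW, 200 THz, 10^9 bits/s, 10 bits per letter); the function name photons_per_symbol and its parameter names are made up for the illustration.

```python
PLANCK_CONSTANT = 6.626e-34  # J*s


def photons_per_symbol(power_w=1e-3, frequency_hz=2e14, bit_rate=1e9, bits_per_symbol=10):
    """Order-of-magnitude photon count per transmitted symbol."""
    photon_energy = PLANCK_CONSTANT * frequency_hz   # joules per photon
    photons_per_second = power_w / photon_energy     # photons arriving each second
    photons_per_bit = photons_per_second / bit_rate  # photons available for each bit
    return photons_per_bit * bits_per_symbol


n_photons = photons_per_symbol()
print(f"photons per H: {n_photons:.2e}")
```

Without rounding the photon energy to 10^-19 J, the count comes out near 7.5 × 10^7, the same order of magnitude as the 10^8 quoted in the estimate.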


An excerpt from this post was published today on Verso, the blog of the Huntington Library, Art Collection, and Botanical Gardens.

With thanks to Bassam Helou, Dan Lewis, Matt Stevens, and Kerry Vahala for feedback. With thanks to the Huntington Library (including Catherine Wehrey) and the Vahala group for the Micrographia image and the microresonator image, respectively.

The theory of everything: Help wanted

When Scientific American writes that physicists are working on a theory of everything, does it sound ambitious enough to you? Do you lie awake at night thinking that a theory of everything should be able to explain, well, everything? What if that theory is founded on quantum mechanics and finds a way to explain gravitation through the microscopic laws of the quantum realm? Would that be a grand unified theory of everything?

The answer is no, for two different but equally important reasons. First, there is the inherent assumption that quantum systems change in time according to Schrödinger’s equation: i \hbar \partial_t \psi(t) = H \psi(t). Why? Where does that equation come from? Is it a fundamental law of nature, or is it an emergent relationship between different states of the universe? What if the parameter t, which we call time, as well as the linear, self-adjoint operator H, which we call the Hamiltonian, are both emergent from a more fundamental and highly typical phenomenon: the large amount of entanglement that is generically found when one decomposes the state space of a single, static quantum wavefunction into two (different in size) subsystems: a clock and a space of configurations (on which our degrees of freedom live)? So many questions, so few answers.

The static multiverse

The perceptive reader may have noticed that I italicized the word ‘static’ above, when referring to the quantum wavefunction of the multiverse. The emphasis on static is on purpose. I want to make clear from the beginning that a theory of everything can only be based on axioms that are truly fundamental, in the sense that they cannot be derived from more general principles as special cases. How would you know that your fundamental principles are irreducible? You start with set theory and go from there. If that assumes too much already, then you work on your set theory axioms. On the other hand, if you can exhibit a more general principle from which your original concept derives, then you are on the right path towards more fundamentalness.

In that sense, time and space as we understand them, are not fundamental concepts. We can imagine an object that can only be in one state, like a switch that is stuck at the OFF position, never changing or evolving in any way, and we can certainly consider a complete graph of interactions between subsystems (the equivalent of a black hole in what we think of as space) with no local geometry in our space of configurations. So what would be more fundamental than time and space? Let’s start with time: The notion of an unordered set of numbers, such as \{4,2,5,1,3,6,8,7,12,9,11,10\}, is a generalization of a clock, since we are only keeping the labels, but not their ordering. If we can show that a particular ordering emerges from a more fundamental assumption about the very existence of a theory of everything, then we have an understanding of time as a set of ordered labels, where each label corresponds to a particular configuration in the mathematical space containing our degrees of freedom. In that sense, the existence of the labels in the first place corresponds to a fundamental notion of potential for change, which is a prerequisite for the concept of time, which itself corresponds to constrained (ordered in some way) change from one label to the next. Our task is first to figure out where the labels of the clock come from, then where the illusion of evolution comes from in a static universe (Heisenberg evolution), and finally, where the arrow of time comes from in a macroscopic world (the illusion of irreversible evolution).

The axioms we ultimately choose must satisfy the following conditions simultaneously: 1. the implications stemming from these assumptions are not contradicted by observations, 2. replacing any one of these assumptions by its negation would lead to observable contradictions, and 3. the assumptions contain enough power to specify non-trivial structures in our theory. In short, as Immanuel Kant put it in his accessible bedtime story The Critique of Pure Reason, we are looking for synthetic a priori knowledge that can explain space and time, which ironically were Kant’s answer to that same question.

The fundamental ingredients of the ultimate theory

Before someone decides to delve into the math behind the emergence of unitarity (Heisenberg evolution) and the nature of time, there is another reason why the grand unified theory of everything has to do more than just give a complete theory of how the most elementary subsystems in our universe interact and evolve. What is missing is the fact that quantity has a quality all its own. In other words, patterns emerge from seemingly complex data when we zoom out enough. This “zooming out” procedure manifests itself in two ways in physics: as coarse-graining of the data and as truncation and renormalization. These simple ideas allow us to reduce the computational complexity of evaluating the next state of a complex system: If most of the complexity of the system is hidden at a level you cannot even observe (think pre retina-display era), then all you have to keep track of is information at the macroscopic, coarse-grained level. On top of that, you can use truncation and renormalization to zero in on the most likely/highest-weight configurations your coarse-grained data can be in – you can safely throw away a billion configurations, if their combined weight is less than 0.1% of the total, because your super-compressed data will still give you the right answer with a fidelity of 99.9%. This is how you get to reduce a 9 GB raw video file down to a 300 MB YouTube video that streams over your WiFi connection without losing too much of the video quality.
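
To make the truncation step concrete, here is a minimal sketch of throwing away a light tail of configurations and renormalizing what is left. The weights are invented for the example, and truncate_and_renormalize is a hypothetical helper, not a routine from any physics library.

```python
def truncate_and_renormalize(weights, keep_fraction=0.999):
    """Keep the heaviest configurations until they carry keep_fraction of the total weight.

    weights: dict mapping a configuration label to its non-negative weight.
    Returns a new dict over the kept configurations, rescaled to sum to 1.
    """
    total = sum(weights.values())
    kept = {}
    running = 0.0
    # Visit configurations from heaviest to lightest.
    for config, weight in sorted(weights.items(), key=lambda item: item[1], reverse=True):
        if running >= keep_fraction * total:
            break  # everything that remains is the discarded tail
        kept[config] = weight
        running += weight
    # Renormalize so the surviving weights form a probability distribution again.
    return {config: weight / running for config, weight in kept.items()}


# Hypothetical data: two heavy configurations plus 1000 light ones
# whose combined weight is 0.05% of the total.
weights = {"A": 0.6, "B": 0.3995}
weights.update({f"tail_{i}": 0.0005 / 1000 for i in range(1000)})

compressed = truncate_and_renormalize(weights)
print(len(weights), "->", len(compressed))  # 1002 -> 2
```

Here 1002 configurations shrink to 2 while retaining more than 99.9% of the original weight, which is the trade the paragraph above describes.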

I will not focus on the second requirement for the “theory of everything”, the dynamics of apparent complexity. I think that this fundamental task is the purview of other sciences, such as chemistry, biology, anthropology and sociology, which look at the “laws” of physics from higher and higher vantage points (increasingly coarse-graining the topology of the space of possible configurations). Here, I would like to argue that the foundation on which a theory of everything rests, at the basement level if such a thing exists, consists of four ingredients: Math, Hilbert spaces with tensor decompositions into subsystems, stability and compressibility. Now, you know about math (though maybe not of Zermelo-Fraenkel set theory), you may have heard of Hilbert spaces if you majored in math and/or physics, but you don’t know what stability, or compressibility mean in this context. So let me motivate the last two with a question and then explain in more detail below: What are the most fundamental assumptions that we sweep under the rug whenever we set out to create a theory of anything that can fit in a book – or ten thousand books – and still have predictive power? Stability and compressibility.

Math and Hilbert spaces are fundamental in the following sense: A theory needs a Language in order to encode the data one can extract from that theory through synthesis and analysis. The data will be statistical in the most general case (with every configuration/state we attach a probability/weight of that state conditional on an ambient configuration space, which will often be a subset of the total configuration space), since any observer creating a theory of the universe around them only has access to a subset of the total degrees of freedom. The remaining degrees of freedom, what quantum physicists group as the Environment, affect our own observations through entanglement with our own degrees of freedom. To capture this richness of correlations between seemingly uncorrelated degrees of freedom, the mathematical space encoding our data requires more than just a metric (i.e. an ability to measure distances between objects in that space) – it requires an inner-product: a way to measure angles between different objects, or equivalently, the ability to measure the amount of overlap between an input configuration and an output configuration, thus quantifying the notion of incremental change. Such mathematical spaces are precisely the Hilbert spaces mentioned above and contain states (with wavefunctions being a special case of such states) and operators acting on the states (with measurements, rotations and general observables being special cases of such operators). But, let’s get back to stability and compressibility, since these two concepts are not standard in physics.


Stability is that quality that says that if the theory makes a prediction about something observable, then we can test our theory by making observations on the state of the world and, more importantly, new observations do not contradict our theory. How can a theory fall apart if it is unstable? One simple way is to make predictions that are untestable, since they are metaphysical in nature (think of religious tenets). Another way is to make predictions that work for one level of coarse-grained observations and fail for a lower level of finer coarse-graining (think of Newtonian Mechanics). A more extreme case involves quantum mechanics assumed to be the true underlying theory of physics, which could still fail to produce a stable theory of how the world works from our point of view. For example, say that your measurement apparatus here on earth is strongly entangled with the current state of a star that happens to go supernova 100 light-years from Earth during the time of your experiment. If there is no bound on the propagation speed of the information between these two subsystems, then your apparatus is engulfed in flames for no apparent reason and you get random data, where you expected to get the same “reproducible” statistics as last week. With no bound on the speed with which information can travel between subsystems of the universe, our ability to explain and/or predict certain observations goes out the window, since our data on these subsystems will look like white noise, an illusion of randomness stemming from the influence of inaccessible degrees of freedom acting on our measurement device. But stability has another dimension; that of continuity. We take for granted our ability to extrapolate the curve that fits 1000 data points on a plot. If we don’t assume continuity (and maybe even a certain level of smoothness) of the data, then all bets are off until we make more measurements and gather additional data points. 
But even then, we can never gather an infinite (let alone, uncountable) number of data points – we must extrapolate from what we have and assume that the full distribution of the data is close in norm to our current dataset (a norm is a measure of distance between states in the Hilbert space).

The emergence of the speed of light

The assumption of stability may seem trivial, but it holds within it an anthropic-style explanation for the bound on the speed of light. If there is no finite speed of propagation for the information between subsystems that are “far apart”, from our point of view, then we will most likely see randomness where there is order. A theory needs order. So, what does it mean to be “far apart” if we have made no assumption for the existence of an underlying geometry, or spacetime for that matter? There is a very important concept in mathematical physics that generalizes the concept of the speed of light for non-relativistic quantum systems whose subsystems live on a graph (i.e. where there may be no spatial locality or apparent geometry): the Lieb-Robinson velocity. Those of us working at the intersection of mathematical physics and quantum many-body physics, have seen first-hand the powerful results one can get from the existence of such an effective and emergent finite speed of propagation of information between quantum subsystems that, in principle, can signal to each other instantaneously through the action of a non-local unitary operator (rotation of the full system under Heisenberg evolution). It turns out that under certain natural assumptions on the graph of interactions between the different subsystems of a many-body quantum system, such a finite speed of light emerges naturally. The main requirement on the graph comes from the following intuitive picture: If each node in your graph is connected to only a few other nodes and the number of paths between any two nodes is bounded above in some nice way (say, polynomially in the distance between the nodes), then communication between two distant nodes will take time proportional to the distance between the nodes (in graph distance units, the smallest number of nodes among all paths connecting the two nodes). Why? 
Because at each time step you can only communicate with your neighbors and in the next time step they will communicate with theirs and so on, until one (and then another, and another) of these communication cascades reaches the other node. Since you have a bound on how many of these cascades will eventually reach the target node, the intensity of the communication wave is bounded by the effective action of a single messenger traveling along a typical path with a bounded speed towards the destination. There should be generalizations to weighted graphs, but this area of mathematical physics is still really active and new results on bounds on the Lieb-Robinson velocity gather attention very quickly.
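
The neighbor-to-neighbor picture can be played with directly. The sketch below is a classical toy, not a derivation of the Lieb-Robinson bound (which concerns quantum operators and allows exponentially small leakage outside the light cone): it simply counts how many synchronous steps a message needs to get from one node to another. The helpers arrival_time, ring and complete are hypothetical names chosen for the example.

```python
def arrival_time(neighbors, source, target):
    """Number of synchronous steps until a signal from source first reaches target.

    neighbors: dict mapping each node to an iterable of adjacent nodes.
    Returns None if target cannot be reached.
    """
    reached = {source}
    frontier = {source}
    steps = 0
    while frontier:
        if target in reached:
            return steps
        # In one time step, every node on the frontier talks to its neighbors.
        frontier = {nb for node in frontier for nb in neighbors[node]} - reached
        reached |= frontier
        steps += 1
    return None


def ring(n):
    """Low-degree graph: each node has exactly two neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def complete(n):
    """Complete graph: every node is adjacent to every other node."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


print(arrival_time(ring(20), 0, 10))      # 10
print(arrival_time(complete(20), 0, 10))  # 1
```

On the ring, the arrival time equals the graph distance, so an effective speed limit appears. On the complete graph, every node is one step from every other and the notion of "far apart" disappears.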

Escaping black holes

If this idea holds any water, then black holes are indeed nearly complete graphs, where the notion of space and time breaks down, since there is no effective bound on the speed with which information propagates from one node to another. The only way to escape is to find yourself at the boundary of the complete graph, where the nodes of the black hole’s apparent horizon are connected to low-degree nodes outside. Once you get to a low-degree node, you need to keep moving towards other low-degree nodes in order to escape the “gravitational pull” of the black hole’s super-connectivity. In other words, gravitation in this picture is an entropic force: we gravitate towards massive objects for the same reason that we “gravitate” towards the direction of the arrow of time: we tend towards higher entropy configurations – the probability of reaching the neighborhood of a set of highly connected nodes is much, much higher than hanging out for long near a set of low-degree nodes in the same connected component of the graph. If a graph has disconnected components, then there is no way to communicate between the corresponding spacetimes – their states are in a tensor product with each other. One has to carefully define entanglement between components of a graph, before giving a unified picture of how spatial geometry arises from entanglement. Somebody get to it.

Erik Verlinde has introduced the idea of gravity as an entropic force and Fotini Markopoulou, et al. have introduced the notion of quantum graphity (gravity emerging from graph models). I think these approaches must be taken seriously, if only because they work with more fundamental principles than the ones found in Quantum Field Theory and General Relativity. After all, this type of blue sky thinking has led to other beautiful connections, such as ER=EPR (the idea that whenever two systems are entangled, they are connected by a wormhole). Even if we were to disagree with these ideas for some technical reason, we must admit that they are at least trying to figure out the fundamental principles that guide the things we take for granted. Of course, one may disagree with certain attempts at identifying unifying principles simply because the attempts lack the technical gravitas that allows for testing and calculations. Which is why a technical blog post on the emergence of time from entanglement is in the works.


So, what about that last assumption we seem to take for granted? How can you have a theory you can fit in a book about a sequence of events, or snapshots of the state of the observable universe, if these snapshots look like the static noise on a TV screen with no transmission signal? Well, you can’t! The fundamental concept here is Kolmogorov complexity and its connection to randomness/predictability. A sequence of data bits like:


has higher complexity (and hence looks more random/less predictable) than the sequence:


because there is a small computer program that can output each successive bit of the latter sequence (even if it had a million bits), but (most likely) not of the former. In particular, to get the second sequence with one million bits one can write the following short program:

s = "10"
# n = 0, 1, ..., 499,999 gives 500,000 iterations, i.e. one million bits
for n in range(499_999 + 1):
    print(s, end="")

As the number of bits grows, one may wonder if the number of iterations (given above by 499,999), can be further compressed to make the program even smaller. The answer is yes: The number 499,999 in binary requires \log_2 499,999 bits, but that binary number is a string of 0s and 1s, so it has its own Kolmogorov complexity, which may be smaller than \log_2 499,999. So, compressibility has a strong element of recursion, something that in physics we associate with scale invariance and fractals.

You may be wondering whether there are truly complex sequences of 0,1 bits, or if one can always find a really clever computer program to compress any N bit string down to, say, N/100 bits. The answer is interesting: There is no computer program that can compute the Kolmogorov complexity of an arbitrary string (the argument has roots in Berry’s Paradox), but there are strings of arbitrarily large Kolmogorov complexity (that is, no matter what program we use and what language we write it in, the smallest program (in bits) that outputs the N-bit string will be at least N bits long). In other words, there really are streams of data (in the form of bits) that are completely incompressible. In fact, a typical string of 0s and 1s will be almost completely incompressible!
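
Since Kolmogorov complexity cannot be computed, the best one can do in practice is bound it from above with an actual compressor. The sketch below uses Python's zlib as such a stand-in, which is an assumption of this example rather than anything canonical. One caveat is worth stating loudly: the "noisy" string here comes from a seeded pseudo-random generator, so its true Kolmogorov complexity is small (a short program produces it); zlib simply fails to find that program.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data: an upper-bound proxy for complexity."""
    return len(zlib.compress(data, 9))


n_bytes = 125_000  # one million bits

# The pattern 10101010... packed eight bits per byte.
periodic = bytes([0b10101010]) * n_bytes

# Pseudo-random bytes from a fixed seed, so the example is reproducible.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(n_bytes))

print("periodic:", compressed_size(periodic), "bytes")
print("noisy:   ", compressed_size(noisy), "bytes")
```

The periodic string shrinks to a tiny fraction of its length, while the noisy one does not shrink at all.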

Stability, compressibility and the arrow of time

So, what does compressibility have to do with the theory of everything? It has everything to do with it. Because, if we ever succeed in writing down such a theory in a physics textbook, we will have effectively produced a computer program that, given enough time, should be able to compute the next bit in the string that represents the data encoding the coarse-grained information we hope to extract from the state of the universe. In other words, the only reason the universe makes sense to us is because the data we gather about its state is highly compressible. This seems to imply that this universe is really, really special and completely atypical. Or is it the other way around? What if the laws of physics were non-existent? Would there be any consistent gravitational pull between matter to form galaxies and stars and planets? Would there be any predictability in the motion of the planets around suns? Forget about life, let alone intelligent life and the anthropic principle. Would the Earth, or Jupiter even know where to go next if it had no sense that it was part of a non-random plot in the movie that is spacetime? Would there be any notion of spacetime to begin with? Or an arrow of time? When you are given one thousand frames from one thousand different movies, there is no way to make a single coherent plot. Even the frames of a single movie would make little sense upon reshuffling.

What if the arrow of time emerged from the notions of stability and compressibility, through coarse-graining that acts as a compression algorithm for data that is inherently highly-complex and, hence, highly typical as the next move to make? If two strings of data look equally complex upon coarse-graining, but one of them has a billion more ways of appearing from the underlying raw data, then which one will be more likely to appear in the theory-of-everything book of our coarse-grained universe? Note that we need both high compressibility after coarse-graining in order to write down the theory, as well as large entropy before coarse-graining (from a large number of raw strings that all map to one string after coarse-graining), in order to have an arrow of time. It seems that we need highly-typical, highly complex strings that become easy to write down once we coarse grain the data in some clever way. Doesn’t that seem like a contradiction? How can a bunch of incompressible data become easily compressible upon coarse-graining? Here is one way: Take an N-bit string and define its 1-bit coarse-graining as the boolean AND of its digits. All but one of the 2^N strings will default to 0. The all 1s string will default to 1. Equally compressible, but the probability of seeing the 1 after coarse-graining is 2^{-N}. With only 300 bits, finding the coarse-grained 1 is harder than looking for a specific atom in the observable universe. In other words, if the coarse-graining rule at time t is the one given above, then you can be pretty sure you will be seeing a 0 come up next in your data. Notice that before coarse-graining, all 2^N strings are equally likely, so there is no arrow of time, since there is no preferred string from a probabilistic point of view.
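
The AND coarse-graining is small enough to check by brute force. The sketch below enumerates every 10-bit string and counts how many land on each coarse-grained value, then compares 2^300 against 10^80, a commonly quoted order of magnitude for the number of atoms in the observable universe (taken here as an assumption). The function names are hypothetical.

```python
from itertools import product


def coarse_grain(bits):
    """1-bit coarse-graining: the boolean AND of all digits."""
    return int(all(bits))


def coarse_grained_counts(n):
    """Count how many of the 2^n raw strings map to 0 and to 1."""
    counts = {0: 0, 1: 0}
    for bits in product((0, 1), repeat=n):
        counts[coarse_grain(bits)] += 1
    return counts


counts = coarse_grained_counts(10)
print(counts)  # {0: 1023, 1: 1}

# Rough estimate of the number of atoms in the observable universe (an assumption).
ATOMS_IN_OBSERVABLE_UNIVERSE = 10**80
print(2**300 > ATOMS_IN_OBSERVABLE_UNIVERSE)  # True
```

Only one raw string out of 2^N maps to 1, so the coarse-grained 0 carries essentially all of the entropy even though both outcomes are equally easy to write down.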

Conclusion, for now

When we think about the world around us, we go to our intuitions first as a starting point for any theory describing the multitude of possible experiences (observable states of the world). If we are to really get to the bottom of this process, it seems fruitful to ask “why do I assume this?” and “is that truly fundamental or can I derive it from something else that I already assumed was an independent axiom?” One of the postulates of quantum mechanics is the axiom corresponding to the evolution of states under Schrödinger’s equation. We will attempt to derive that equation from the other postulates in an upcoming post. Until then, your help is wanted with the march towards more fundamental principles that explain our seemingly self-evident truths. Question everything, especially when you think you have really figured things out. Start with this post. After all, a theory of everything should be able to explain itself.

UP NEXT: Entanglement, Schmidt decomposition, concentration measure bounds and the emergence of discrete time and unitary evolution.

Top 10 questions for your potential PhD adviser/group

Everyone in grad school has taken on the task of picking the perfect research group at some point. Then some among us had the dubious distinction of choosing the perfect research group twice. Luckily for me, a year of grad research taught me a lot and I found myself asking group members and PIs (principal investigators) very different questions. And luckily for you, I wrote these questions down to share with future generations. My background as an experimental applied physicist showed through initially, so I got Shaun Maguire and Spiros Michalakis to help make the list applicable for theorists too, and most of the questions should be useful outside physics as well.

Questions to break that silence when your potential advisor asks “So, do you have any questions for me?”

1. Are you taking new students?
– 2a. if yes: How many are you looking to take?
– 2b. if no: Ask them about the department or other professors.  They’ve been there long enough to have opinions.  Alternatively, ask what kinds of questions they would suggest you ask other PIs.
3. What is the procedure for joining the group?
4. (experimental) Would you have me TA?  (This is the nicest way I thought of to ask if a PI can fund you with a research assistantship (RA), though sometimes they just like you to TA their class.)
4. (theory) Funding routes will often be covered by question 3 since TAs are the dominant funding method for theory students, unlike for experimentalists. If relevant, you can follow up with: How does funding for your students normally work? Do you have funding for me?
5. Do new students work for/report to other grad students, post docs, or you directly?
6. How do you like students to arrange time to meet with you?
7. How often do you have group meetings?
8. How much would you like students to prepare for them?
9. Would you suggest I take any specific classes?
10. What makes someone a good fit for this group?

And then for the high bandwidth information transfer.  Grill the group members themselves, and try to ask more than one group member if you can.

1. How much do you prepare for meetings with the PI?
2. How long until people lead their own project? – Equivalently, who’s working on what projects.
3. How much do people on different projects communicate? (only group meeting or every day)
4. Is the PI hands on (how often PI wants to meet with you)?
5. Is the PI accessible (how easily can you meet with the PI if you want to)?
6. What is the average time to graduation? (if it’s important to you personally)
7. Does the group/subgroup have any bonding activities?
8. Do you think I should join this group?
9. What are people’s backgrounds?
10. What makes someone a good fit for this group?

Hope that helps.  If you have any other suggested questions, be sure to leave them in the comments.

Clocking in at a Cambridge conference

Science evolves on Facebook.

On Facebook last fall, I posted about statistical mechanics. Statistical mechanics is the physics of hordes of particles. Hordes of molecules, for example, form the stench seeping from a clogged toilet. Hordes change in certain ways but not in the reverse ways, suggesting time points in a direction. Once a stink diffuses into the hall, it won’t regroup in the bathroom. The molecules’ locations distinguish past from future.

The post attracted a comment by Ian Durham, associate professor of physics at St. Anselm College. Minutes later, we were instant-messaging about infinitely long evolutions.*

The next day, I sent Ian a paper draft. His reply made me jump more than a whiff of a toilet would. Would I discuss the paper at a conference he was co-organizing?

I almost replied, Are you sure?

Then I almost replied, Yes, please!

The conference, “Eddington and Wheeler: Information and Interaction,” unfolded this March at the University of Cambridge. Cambridge employed Sir Arthur Eddington, the astronomer whose 1919 observation of starlight during an eclipse catapulted Einstein’s general relativity to fame. Decades later, John Wheeler laid groundwork for quantum information.

Though aware of Eddington’s observation, I hadn’t known he’d researched stat mech. I hadn’t known his opinions about time. Time owns a high-rise in my heart; see the fussiness with which I catalogue “last fall,” “minutes later,” and “the next day.” Conference-goers shared news about time in the Old Combination Room at Cambridge’s Trinity College. Against the room’s wig-filled portraits, our projector resembled a souvenir misplaced by a time traveler.


Trinity College, Cambridge.

Presenter one, Huw Price, argued that time has no arrow. It appears to in our universe: We remember the past and anticipate the future. Once a stench diffuses, it doesn’t regroup. The stench illustrates the Second Law of Thermodynamics, the assumption that entropy increases.

If “entropy” doesn’t ring a bell, never mind; we’ll dissect it in future articles. Suffice it to say that (1) thermodynamics is a branch of physics related to stat mech; (2) according to the Second Law of Thermodynamics, something called “entropy” increases; (3) entropy’s rise distinguishes the past from the future by associating the former with a low entropy and the latter with a large entropy; and (4) a stench’s diffusion illustrates the Second Law and time’s flow.

In as many universes in which entropy increases (time flows in one direction), in so many universes does entropy decrease (does time flow oppositely). So, said Huw Price, postulated the 19th-century stat-mech founder Ludwig Boltzmann. Why would universes pair up? For the reason why, driving across a pothole, you not only fall, but also rise. Each fluctuation from equilibrium (from a flat road) involves an upward path and a downward. The upward path resembles a universe in which entropy increases; the downward, a universe in which entropy decreases. Every down pairs with an up. Averaged over universes, time has no arrow.

Friedel Weinert, presenter five, argued the opposite. Time has an arrow, he said, and not because of entropy.

Ariel Caticha discussed an impersonator of time. Using a cousin of MaxEnt, he derived an equation identical to Schrödinger’s. MaxEnt, short for “the Maximum Entropy Principle,” is a tool used in stat mech. Schrödinger’s Equation describes how quantum systems evolve. To draw from Schrödinger’s Equation predictions about electrons and atoms, physicists assume that features of reality resemble certain bits of math. We assume, for example, that the t in Schrödinger’s Equation represents time.

A t appeared in Ariel’s twin of Schrödinger’s Equation. But Ariel didn’t assume what physicists usually assume. MaxEnt motivated his assumptions. Interpreting Ariel’s equation poses a challenge. If a variable acts like time and smells like time, does it represent time?**

A presenter uses the anachronistic projector. The head between screen and camera belongs to David Finkelstein, who helped develop the theory of general relativity checked by Eddington.

Like Ariel, Bill Wootters questioned time’s role in arguments. The co-creator of quantum teleportation wondered why one tenet of quantum physics has the form it has. Using quantum mechanics, we can’t predict certain experiments’ outcomes. We can predict probabilities—the chance that some experiment will yield Possible Outcome 1, the chance that the experiment will yield Possible Outcome 2, and so on. To calculate these probabilities, we square numbers. Why square? Why don’t the probabilities depend on cubes?

To explore this question, Bill told a story. Suppose some experimenter runs these experiments on Monday and those on Tuesday. When evaluating his story, Bill pointed out a hole: Replacing “Monday” and “Tuesday” with “eight o’clock” and “nine” wouldn’t change his conclusion. Which replacements wouldn’t change it, and which would? To what can we generalize those days?

We couldn’t answer his questions on the Sunday he asked them.

Little of presentation twelve concerned time. Rüdiger Schack introduced QBism, an interpretation of quantum mechanics that sounds like “cubism.” Casting quantum physics in terms of experimenters’ actions, Rüdiger mentioned time. By the time of the mention, I couldn’t tell what anyone meant by “time.” Raising a hand, I asked for clarification.

“You are young,” Rüdiger said. “But you will grow old and die.”

The comment clanged like the slam of a door. It echoed when I followed Ian into Ascension Parish Burial Ground. On Cambridge’s outskirts, conference-goers visited Eddington’s headstone. We found Wittgenstein’s near an uneven footpath; near tangles of undergrowth, Nobel laureates’. After debating about time, we marked its footprints. Paths of glory lead but to the grave.


Here lies one whose name was writ in a conference title: Sir Arthur Eddington’s grave.

Paths touched by little glory, I learned, have perks. As Rüdiger noted, I was the greenest participant. As he had the manners not to note, I was the least distinguished and the most ignorant. Studenthood freed me to raise my hand, to request clarification, to lack opinions about time. Perhaps I’ll evolve opinions at some t, some Monday down the road. That Monday feels infinitely far off. These days, I’ll stick to evolving science—using that other boon of youth, Facebook.


* You know you’re a theoretical physicist (or a physicist-in-training) when you debate about processes that last till kingdom come.

** As long as the variable doesn’t smell like a clogged toilet.


For videos of the presentations—including the public lecture by best-selling author Neal Stephenson—stay tuned to

With gratitude to Ian Durham and Dean Rickles for organizing “Information and Interaction” and for the opportunity to participate. With thanks to the other participants for sharing their ideas and time.

Inflation on the back of an envelope

Last Monday was an exciting day!

After following the BICEP2 announcement via Twitter, I had to board a transcontinental flight, so I had 5 uninterrupted hours to think about what it all meant. Without Internet access or references, and having not thought seriously about inflation for decades, I wanted to reconstruct a few scraps of knowledge needed to interpret the implications of r ~ 0.2.

I did what any physicist would have done … I derived the basic equations without worrying about niceties such as factors of 3 or 2 \pi. None of what I derived was at all original —  the theory has been known for 30 years — but I’ve decided to turn my in-flight notes into a blog post. Experts may cringe at the crude approximations and overlooked conceptual nuances, not to mention the missing references. But some mathematically literate readers who are curious about the implications of the BICEP2 findings may find these notes helpful. I should emphasize that I am not an expert on this stuff (anymore), and if there are serious errors I hope better informed readers will point them out.

By tradition, careless estimates like these are called “back-of-the-envelope” calculations. There have been times when I have made notes on the back of an envelope, or a napkin or place mat. But in this case I had the presence of mind to bring a notepad with me.

Notes from a plane ride

According to inflation theory, a nearly homogeneous scalar field called the inflaton (denoted by \phi)  filled the very early universe. The value of \phi varied with time, as determined by a potential function V(\phi). The inflaton rolled slowly for a while, while the dark energy stored in V(\phi) caused the universe to expand exponentially. This rapid cosmic inflation lasted long enough that previously existing inhomogeneities in our currently visible universe were nearly smoothed out. What inhomogeneities remained arose from quantum fluctuations in the inflaton and the spacetime geometry occurring during the inflationary period.

Gradually, the rolling inflaton picked up speed. When its kinetic energy became comparable to its potential energy, inflation ended, and the universe “reheated” — the energy previously stored in the potential V(\phi) was converted to hot radiation, instigating a “hot big bang”. As the universe continued to expand, the radiation cooled. Eventually, the energy density in the universe came to be dominated by cold matter, and the relic fluctuations of the inflaton became perturbations in the matter density. Regions that were more dense than average grew even more dense due to their gravitational pull, eventually collapsing into the galaxies and clusters of galaxies that fill the universe today. Relic fluctuations in the geometry became gravitational waves, which BICEP2 seems to have detected.

Both the density perturbations and the gravitational waves have been detected via their influence on the inhomogeneities in the cosmic microwave background. The 2.726 K photons left over from the big bang have a nearly uniform temperature as we scan across the sky, but there are small deviations from perfect uniformity that have been precisely measured. We won’t worry about the details of how the size of the perturbations is inferred from the data. Our goal is to achieve a crude understanding of how the density perturbations and gravitational waves are related, which is what the BICEP2 results are telling us about. We also won’t worry about the details of the shape of the potential function V(\phi), though it’s very interesting that we might learn a lot about that from the data.

Exponential expansion

Einstein’s field equations tell us how the rate at which the universe expands during inflation is related to energy density stored in the scalar field potential. If a(t) is the “scale factor” which describes how lengths grow with time, then roughly

\left(\frac{\dot a}{a}\right)^2 \sim \frac{V}{m_P^2}.

Here \dot a means the time derivative of the scale factor, and m_P = 1/\sqrt{8 \pi G} \approx 2.4 \times 10^{18} GeV is the Planck scale associated with quantum gravity. (G is Newton’s gravitational constant.) I’ve left out a factor of 3 on purpose, and I used the symbol ~ rather than = to emphasize that we are just trying to get a feel for the order of magnitude of things. I’m using units in which Planck’s constant \hbar and the speed of light c are set to one, so mass, energy, and inverse length (or inverse time) all have the same dimensions. 1 GeV means one billion electron volts, about the mass of a proton.
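
As a sanity check on the quoted Planck scale, the following sketch restores \hbar and c, so that m_P = \sqrt{\hbar c / (8 \pi G)}, and converts the result to GeV. The SI constants are standard approximate values that I am assuming here, and the function name is illustrative.

```python
import math

# Approximate SI values, assumed for this check.
HBAR = 1.054571817e-34            # J s
C = 2.99792458e8                  # m / s
G = 6.67430e-11                   # m^3 kg^-1 s^-2
JOULES_PER_GEV = 1.602176634e-10  # J per GeV


def reduced_planck_mass_gev():
    """m_P = sqrt(hbar c / (8 pi G)), expressed as a rest energy in GeV."""
    mass_kg = math.sqrt(HBAR * C / (8 * math.pi * G))
    return mass_kg * C ** 2 / JOULES_PER_GEV


if __name__ == "__main__":
    # Should land close to the 2.4e18 GeV quoted in the text.
    print(f"m_P ~ {reduced_planck_mass_gev():.2e} GeV")
```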

(To persuade yourself that this is at least roughly the right equation, you should note that a similar equation applies to an expanding spherical ball of radius a(t) with uniform mass density V. But in the case of the ball, the mass density would decrease as the ball expands. The universe is different — it can expand without diluting its mass density, so the rate of expansion \dot a / a does not slow down as the expansion proceeds.)

During inflation, the scalar field \phi and therefore the potential energy V(\phi) were changing slowly; it’s a good approximation to assume V is constant. Then the solution is

a(t) \sim a(0) e^{Ht},

where H, the Hubble constant during inflation, is

H \sim \frac{\sqrt{V}}{m_P}.

To explain the smoothness of the observed universe, we require at least 50 “e-foldings” of inflation before the universe reheated — that is, inflation should have lasted for a time at least 50 H^{-1}.
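
To get a feel for the numbers, here is a rough sketch of the two formulas above. It keeps the same level of sloppiness (no factor of 3), the function names are made up for illustration, and the input V^{1/4} = 2 \times 10^{16} GeV is chosen only because it is the energy scale that comes up later in this post.

```python
import math

M_P = 2.4e18  # Planck scale in GeV, as quoted in the text


def hubble_rate_gev(v_quarter_gev):
    """Crude H ~ sqrt(V) / m_P in GeV, given V^{1/4} in GeV (O(1) factors dropped)."""
    return v_quarter_gev ** 2 / M_P


def expansion_factor(n_efolds):
    """Growth a(t) / a(0) = e^{H t} after n_efolds = H t e-foldings."""
    return math.exp(n_efolds)


if __name__ == "__main__":
    print(f"H ~ {hubble_rate_gev(2e16):.1e} GeV")
    print(f"50 e-foldings stretch lengths by ~{expansion_factor(50):.1e}")
```

Fifty e-foldings already stretch lengths by more than twenty orders of magnitude, which is why even a brief period of inflation smooths things out so effectively.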

Slow rolling

During inflation the inflaton \phi rolls slowly, so slowly that friction dominates inertia — this friction results from the cosmic expansion. The speed of rolling \dot \phi is determined by

H \dot \phi \sim -V'(\phi).

Here V'(\phi) is the slope of the potential, so the right-hand side is the force exerted by the potential, which matches the frictional force on the left-hand side. The coefficient of \dot \phi has to be H on dimensional grounds. (Here I have blown another factor of 3, but let’s not worry about that.)

Density perturbations

The trickiest thing we need to understand is how inflation produced the density perturbations which later seeded the formation of galaxies. There are several steps to the argument.

Quantum fluctuations of the inflaton

As the universe inflates, the inflaton field is subject to quantum fluctuations, where the size of the fluctuation depends on its wavelength. Due to inflation, the wavelength increases rapidly, like e^{Ht}, and once the wavelength gets large compared to H^{-1}, there isn’t enough time for the fluctuation to wiggle — it gets “frozen in.” Much later, long after the reheating of the universe, the oscillation period of the wave becomes comparable to the age of the universe, and then it can wiggle again. (We say that the fluctuations “cross the horizon” at that stage.) Observations of the anisotropy of the microwave background have determined how big the fluctuations are at the time of horizon crossing. What does inflation theory say about that?

Well, first of all, how big are the fluctuations when they leave the horizon during inflation? Then the wavelength is H^{-1} and the universe is expanding at the rate H, so H is the only thing the magnitude of the fluctuations could depend on. Since the field \phi has the same dimensions as H, we conclude that fluctuations have magnitude

\delta \phi \sim H.

From inflaton fluctuations to density perturbations

Reheating occurs abruptly when the inflaton field reaches a particular value. Because of the quantum fluctuations, some horizon volumes have larger than average values of \phi and some have smaller than average values; hence different regions reheat at slightly different times. The energy density in regions that reheat earlier starts to be reduced by expansion (“red shifted”) earlier, so these regions have a smaller than average energy density. Likewise, regions that reheat later start to red shift later, and wind up having larger than average density.

When we compare different regions of comparable size, we can find the typical (root-mean-square) fluctuations \delta t in the reheating time, knowing the fluctuations in \phi and the rolling speed \dot \phi:

\delta t \sim \frac{\delta \phi}{\dot \phi} \sim \frac{H}{\dot\phi}.

Small fractional fluctuations in the scale factor a right after reheating produce comparable small fractional fluctuations in the energy density \rho. The expansion rate right after reheating roughly matches the expansion rate H right before reheating, and so we find that the characteristic size of the density perturbations is

\delta_S\equiv\left(\frac{\delta \rho}{\rho}\right)_{hor} \sim \frac{\delta a}{a} \sim \frac{\dot a}{a} \delta t\sim \frac{H^2}{\dot \phi}.

The subscript hor serves to remind us that this is the size of density perturbations as they cross the horizon, before they get a chance to grow due to gravitational instabilities. We have found our first important conclusion: The density perturbations have a size determined by the Hubble constant H and the rolling speed \dot \phi of the inflaton, up to a factor of order one which we have not tried to keep track of. Insofar as the Hubble constant and rolling speed change slowly during inflation, these density perturbations have a strength which is nearly independent of the length scale of the perturbation. From here on we will denote this dimensionless scale of the fluctuations by \delta_S, where the subscript S stands for “scalar”.

Perturbations in terms of the potential

Putting together \dot \phi \sim -V' / H and H^2 \sim V/{m_P}^2 with our expression for \delta_S, we find

\delta_S^2 \sim \frac{H^4}{\dot\phi^2}\sim \frac{H^6}{V'^2} \sim \frac{1}{{m_P}^6}\frac{V^3}{V'^2}.

The observed density perturbations are telling us something interesting about the scalar field potential during inflation.

Gravitational waves and the meaning of r

The gravitational field as well as the inflaton field is subject to quantum fluctuations during inflation. We call these tensor fluctuations to distinguish them from the scalar fluctuations in the energy density. The tensor fluctuations have an effect on the microwave anisotropy which can be distinguished in principle from the scalar fluctuations. We’ll just take that for granted here, without worrying about the details of how it’s done.

While a scalar field fluctuation with wavelength \lambda and strength \delta \phi carries energy density \sim \delta\phi^2 / \lambda^2, a fluctuation of the dimensionless gravitational field h with wavelength \lambda and strength \delta h carries energy density \sim m_P^2 \delta h^2 / \lambda^2. Applying the same dimensional analysis we used to estimate \delta \phi at horizon crossing to the rescaled field h/m_P, we estimate the strength \delta_T of the tensor fluctuations as

\delta_T^2 \sim \frac{H^2}{m_P^2}\sim \frac{V}{m_P^4}.

From observations of the CMB anisotropy we know that \delta_S\sim 10^{-5}, and now BICEP2 claims that the ratio

r = \frac{\delta_T^2}{\delta_S^2}

is about r\sim 0.2 at an angular scale on the sky of about one degree. The conclusion (being a little more careful about the O(1) factors this time) is

V^{1/4} \sim 2 \times 10^{16}~GeV \left(\frac{r}{0.2}\right)^{1/4}.

This is our second important conclusion: The energy density during inflation defines a mass scale, which turns out to be 2 \times 10^{16}~GeV for the observed value of r. This is a very interesting finding because this mass scale is not so far below the Planck scale, where quantum gravity kicks in, and is in fact pretty close to theoretical estimates of the unification scale in supersymmetric grand unified theories. If this mass scale were a factor of 2 smaller, then r would be smaller by a factor of 16, and hence much harder to detect.
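
The O(1) factors behind that number are not spelled out here, so the sketch below fills them in under assumptions of mine: the standard slow-roll normalization in which the tensor power is 2H^2/(\pi^2 m_P^2), the relation 3H^2 = V/m_P^2, and a measured scalar power amplitude of about 2.2 \times 10^{-9} (consistent in order of magnitude with \delta_S \sim 10^{-5}). None of these conventions is stated in the post, and the function name is illustrative.

```python
import math

M_P = 2.4e18  # Planck scale in GeV, as quoted in the text
A_S = 2.2e-9  # assumed scalar power amplitude (not given in the text)


def inflation_scale_gev(r):
    """V^{1/4} in GeV for a tensor-to-scalar ratio r.

    Assumes tensor power 2 H^2 / (pi^2 m_P^2) and 3 H^2 = V / m_P^2,
    so that V / m_P^4 = (3 pi^2 / 2) * A_S * r.
    """
    v_over_mp4 = 1.5 * math.pi ** 2 * A_S * r
    return M_P * v_over_mp4 ** 0.25


if __name__ == "__main__":
    print(f"V^(1/4) ~ {inflation_scale_gev(0.2):.1e} GeV for r = 0.2")
    # Lowering r by a factor of 16 halves the mass scale.
    print(inflation_scale_gev(0.2) / inflation_scale_gev(0.2 / 16))
```

Because V scales like the fourth power of the mass scale, the weak (r/0.2)^{1/4} dependence cuts both ways: the inferred scale is insensitive to the exact value of r, but r itself drops quickly if the scale is even slightly lower.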

Rolling, rolling, rolling, …

Using \delta_S^2 \sim H^4/\dot\phi^2, we can express r as

r = \frac{\delta_T^2}{\delta_S^2}\sim \frac{\dot\phi^2}{m_P^2 H^2}.

It is convenient to measure time in units of the number N = H t of e-foldings of inflation, in terms of which we find

\frac{1}{m_P^2} \left(\frac{d\phi}{dN}\right)^2\sim r.

Now, we know that for inflation to explain the smoothness of the universe we need N larger than 50, and if we assume that the inflaton rolls at a roughly constant rate during N e-foldings, we conclude that, while rolling, the change in the inflaton field is

\frac{\Delta \phi}{m_P} \sim N \sqrt{r}.

This is our third important conclusion: the inflaton field had to roll a long, long way during inflation, changing by much more than the Planck scale! Putting in the O(1) factors we have left out reduces the required amount of rolling by about a factor of 3, but we still conclude that the rolling was super-Planckian if r\sim 0.2. That’s curious, because when the scalar field strength is super-Planckian, we expect the kind of effective field theory we have been implicitly using to be a poor approximation because quantum gravity corrections are large. One possible way out is that the inflaton might have rolled round and round in a circle instead of in a straight line, so the field strength stayed sub-Planckian even though the distance traveled was super-Planckian.
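
Here is a quick numerical version of that estimate. The crude function is the formula above; the second one restores the O(1) factors under the assumption that r = 16 \epsilon (quoted in the added notes at the end of this post) together with (d\phi/dN)^2 = 2 \epsilon m_P^2, which follows from the slow-roll equations once the factors of 3 are kept. The function names are illustrative.

```python
import math


def excursion_crude(n_efolds, r):
    """Delta phi / m_P ~ N sqrt(r), the order-of-magnitude estimate."""
    return n_efolds * math.sqrt(r)


def excursion_with_factors(n_efolds, r):
    """Delta phi / m_P = N sqrt(r / 8), assuming r = 16 epsilon and constant rolling."""
    return n_efolds * math.sqrt(r / 8.0)


if __name__ == "__main__":
    n, r = 50, 0.2
    crude = excursion_crude(n, r)
    careful = excursion_with_factors(n, r)
    print(f"crude:   {crude:.1f} Planck units")
    print(f"careful: {careful:.1f} Planck units")
    print(f"ratio:   {crude / careful:.2f}")  # sqrt(8), i.e. about a factor of 3
```

Either way the excursion comes out well above one in Planck units for r \sim 0.2 and N = 50.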

Spectral tilt

As the inflaton rolls, the potential energy, and hence also the Hubble constant H, change during inflation. That means that both the scalar and tensor fluctuations have a strength which is not quite independent of length scale. We can parametrize the scale dependence in terms of how the fluctuations change per e-folding of inflation, which is equivalent to the change per logarithmic length scale and is called the “spectral tilt.”

To keep things simple, let’s suppose that the rate of rolling is constant during inflation, at least over the length scales for which we have data. Using \delta_S^2 \sim H^4/\dot\phi^2, and assuming \dot\phi is constant, we estimate the scalar spectral tilt as

-\frac{1}{\delta_S^2}\frac{d\delta_S^2}{d N} \sim - \frac{4 \dot H}{H^2}.

Using \delta_T^2 \sim H^2/m_P^2, we conclude that the tensor spectral tilt is half as big.

From H^2 \sim V/m_P^2, we find

\dot H \sim \frac{1}{2} \dot \phi \frac{V'}{V} H,

and using \dot \phi \sim -V'/H we find

-\frac{1}{\delta_S^2}\frac{d\delta_S^2}{d N} \sim \frac{V'^2}{H^2V}\sim m_P^2\left(\frac{V'}{V}\right)^2\sim \left(\frac{V}{m_P^4}\right)\left(\frac{m_P^6 V'^2}{V^3}\right)\sim \delta_T^2 \delta_S^{-2}\sim r.

Putting in the numbers more carefully we find a scalar spectral tilt of r/4 and a tensor spectral tilt of r/8.

This is our last important conclusion: A relatively large value of r means a significant spectral tilt. In fact, even before the BICEP2 results, the CMB anisotropy data already supported a scalar spectral tilt of about .04, which suggested something like r \sim .16. The BICEP2 detection of the tensor fluctuations (if correct) has confirmed that suspicion.

Summing up

If you have stuck with me this far, and you haven’t seen this stuff before, I hope you’re impressed. Of course, everything I’ve described can be done much more carefully. I’ve tried to convey, though, that the emerging story seems to hold together pretty well. Compared to last week, we have stronger evidence now that inflation occurred, that the mass scale of inflation is high, and that the scalar and tensor fluctuations produced during inflation have been detected. One prediction is that the tensor fluctuations, like the scalar ones, should have a notable spectral tilt, though a lot more data will be needed to pin that down.

I apologize to the experts again, for the sloppiness of these arguments. I hope that I have at least faithfully conveyed some of the spirit of inflation theory in a way that seems somewhat accessible to the uninitiated. And I’m sorry there are no references, but I wasn’t sure which ones to include (and I was too lazy to track them down).

It should also be clear that much can be done to sharpen the confrontation between theory and experiment. A whole lot of fun lies ahead.

Added notes (3/25/2014):

Okay, here’s a good reference, a useful review article by Baumann. (I found out about it on Twitter!)

From Baumann’s lectures I learned a convenient notation. The rolling of the inflaton can be characterized by two “potential slow-roll parameters” defined by

\epsilon = \frac{m_P^2}{2}\left(\frac{V'}{V}\right)^2,\quad \eta = m_P^2\left(\frac{V''}{V}\right).

Both parameters are small during slow rolling, but the relationship between them depends on the shape of the potential. My crude approximation (\epsilon = \eta) would hold for a quadratic potential.

We can express the spectral tilt (as I defined it) in terms of these parameters, finding 2\epsilon for the tensor tilt, and 6 \epsilon - 2\eta for the scalar tilt. To derive these formulas it suffices to know that \delta_S^2 is proportional to V^3/V'^2, and that \delta_T^2 is proportional to H^2; we also use

3H\dot \phi = -V', \quad 3H^2 = V/m_P^2,

keeping factors of 3 that I left out before. (As a homework exercise, check these formulas for the tensor and scalar tilt.)
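
If you would rather check the formulas numerically before doing the algebra, here is a minimal sketch. It differentiates \ln \delta_S^2 and \ln \delta_T^2 with respect to N by finite differences, using d\phi/dN = -m_P^2 V'/V (which follows from the two equations above). The quartic test potential and all function names are my own choices for illustration; the quartic is used because it has \epsilon \neq \eta.

```python
import math

M_P = 1.0  # work in units where the Planck scale m_P is 1


def slow_roll_params(v, dv, d2v, phi):
    """Potential slow-roll parameters (epsilon, eta) at field value phi."""
    eps = 0.5 * M_P ** 2 * (dv(phi) / v(phi)) ** 2
    eta = M_P ** 2 * d2v(phi) / v(phi)
    return eps, eta


def tilts_numeric(v, dv, phi, h=1e-5):
    """Scalar and tensor tilts, -d ln(delta^2)/dN, via central differences.

    Uses delta_S^2 proportional to V^3 / V'^2, delta_T^2 proportional to V,
    and dphi/dN = -m_P^2 V'/V.
    """
    def ln_scalar(p):
        return 3 * math.log(v(p)) - 2 * math.log(abs(dv(p)))

    def ln_tensor(p):
        return math.log(v(p))

    dphi_dn = -M_P ** 2 * dv(phi) / v(phi)
    d_ln_s = (ln_scalar(phi + h) - ln_scalar(phi - h)) / (2 * h)
    d_ln_t = (ln_tensor(phi + h) - ln_tensor(phi - h)) / (2 * h)
    return -d_ln_s * dphi_dn, -d_ln_t * dphi_dn


# Hypothetical test potential V = phi^4 and its derivatives.
def quartic(p):
    return p ** 4


def quartic_d1(p):
    return 4 * p ** 3


def quartic_d2(p):
    return 12 * p ** 2


if __name__ == "__main__":
    phi0 = 20.0
    eps, eta = slow_roll_params(quartic, quartic_d1, quartic_d2, phi0)
    scalar, tensor = tilts_numeric(quartic, quartic_d1, phi0)
    print(f"scalar tilt {scalar:.6f} vs 6 eps - 2 eta = {6 * eps - 2 * eta:.6f}")
    print(f"tensor tilt {tensor:.6f} vs 2 eps = {2 * eps:.6f}")
```

A numerical match for one potential is of course not a proof, so the homework still stands.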

It is also easy to see that r is proportional to \epsilon; it turns out that r = 16 \epsilon. To get that factor of 16 we need more detailed information about the relative size of the tensor and scalar fluctuations than I explained in the post; I can’t think of a handwaving way to derive it.

We see, though, that the conclusion that the tensor tilt is r/8 does not depend on the details of the potential, while the relation between the scalar tilt and r does depend on the details. Nevertheless, it seems fair to claim (as I did) that, already before we knew the BICEP2 results, the measured nonzero scalar spectral tilt indicated a reasonably large value of r.

Once again, we’re lucky. On the one hand, it’s good to have a robust prediction (for the tensor tilt). On the other hand, it’s good to have a handle (the scalar tilt) for distinguishing among different inflationary models.

One last point is worth mentioning. We have set Planck’s constant \hbar equal to one so far, but it is easy to put the powers of \hbar back in using dimensional analysis (we’ll continue to assume the speed of light c is one). Since Newton’s constant G has the dimensions of length/energy, and the potential V has the dimensions of energy/volume, while \hbar has the dimensions of energy times length, we see that

\delta_T^2 \sim \hbar G^2V.

Thus the production of gravitational waves during inflation is a quantum effect, which would disappear in the limit \hbar \to 0. Likewise, the scalar fluctuation strength \delta_S^2 is also O(\hbar), and hence also a quantum effect.

Therefore the detection of primordial gravitational waves by BICEP2, if correct, confirms that gravity is quantized just like the other fundamental forces. That shouldn’t be a surprise, but it’s nice to know.

Oh, the Places You’ll Do Theoretical Physics!

I won’t run lab tests in a box.
I won’t run lab tests with a fox.
But I’ll prove theorems here or there.
Yes, I’ll prove theorems anywhere…

Physicists occupy two camps. Some—theorists—model the world using math. We try to predict experiments’ outcomes and to explain natural phenomena. Others—experimentalists—gather data using supermagnets, superconductors, the world’s coldest atoms, and other instruments deserving of superlatives. Experimentalists confirm that our theories deserve trashing or—for this we pray—might not model the world inaccurately.

Theorists, people say, can work anywhere. We need no million-dollar freezers. We need no multi-pound magnets.* We need paper, pencils, computers, and coffee. Though I would add “quiet,” colleagues would add “iPods.”

Theorists’ mobility reminds me of the book Green Eggs and Ham. Sam-I-am, the antagonist, drags the protagonist to spots as outlandish as our workplaces. Today marks the author’s birthday. Since Theodor Geisel stimulated imaginations, and since imagination drives physics, Quantum Frontiers is paying its respects. In honor of Oh, the Places You’ll Go!, I’m spotlighting places you can do theoretical physics. You judge whose appetite for exotica exceeds whose: Dr. Seuss’s or theorists’.

I’ve most looked out-of-place doing physics by a dirt road between sheep-populated meadows outside Lancaster, UK. Lancaster, the Wars of the Roses victor, is a city in northern England. The year after graduating from college, I worked in Lancaster University as a research assistant. I studied a crystal that resembles graphene, a material whose superlatives include “superstrong,” “supercapacitor,” and “superconductor.” From morning to evening, I’d submerse in math till it poured out my ears. Then I’d trek from “uni,” as Brits say, to the “city centre,” as they write.

The trek wound between trees; fields; and, because I was in England, puddles. Many evenings, a rose or a sunset would arrest me. Other evenings, physics would. I’d realize how to solve an equation, or that I should quit banging my head against one. Stepping off the road, I’d fish out a notebook and write. Amidst the puddles and lambs. Cyclists must have thought me the queerest sight since a cloudless sky.

A colleague loves doing theory in the sky. On planes, he explained, hardly anyone interrupts his calculations. And who minds interruptions by pretzels and coffee?

“A mathematician is a device for turning coffee into theorems,” some have said, and theoretical physicists live down the block from mathematicians in the neighborhood of science. Turn a Pasadena café upside-down and shake it, and out will fall theorists. Since Hemingway’s day, the romanticism has faded from the penning of novels in cafés. But many a theorist trumpets about an equation derived on a napkin.

Trumpeting filled my workplace in Oxford. One of Clarendon Lab’s few theorists, I neighbored lasers, circuits, and signs that read “DANGER! RADIATION.” Though radiation didn’t leak through our walls (I hope), what did contributed more to that office’s eccentricity than radiation would. As early as 9:10 AM, the experimentalists next door blasted “Born to Be Wild” and Animal House tunes. If you can concentrate over there, you can concentrate anywhere.

One paper I concentrated on had a Crumple-Horn Web-Footed Green-Bearded Schlottz of an acknowledgements section. In a physics paper’s last paragraph, one thanks funding agencies and colleagues for support and advice. “The authors would like to thank So-and-So for insightful comments,” papers read. This paper referenced a workplace: “[One coauthor] is grateful to the Half Moon Pub.” Colleagues of the coauthor confirmed the acknowledgement’s aptness.

Though I’ve dwelled on theorists’ physical locations, our minds roost elsewhere. Some loiter in atoms; others, in black holes; some, on four-dimensional surfaces; others, in hypothetical universes. I hobnob with particles in boxes. As Dr. Seuss whisks us to a Bazzim populated by Nazzim, theorists tell of function spaces populated by Rényi entropies.

The next time you see someone standing in a puddle, or in a ditch, or outside Buckingham Palace, scribbling equations, feel free to laugh. You might be seeing a theoretical physicist. You might be seeing me. To me, physics has relevance everywhere. Scribbling there and here should raise eyebrows no more than any setting in a Dr. Seuss book.

The author would like to thank this emporium of Seussoria. And Java & Co.

*We need for them to confirm that our theories deserve trashing, but we don’t need them with us. Just as, when considering quitting school to break into the movie business, you need for your mother to ask, “Are you sure that’s a good idea, dear?” but you don’t need for her to hang on your elbow. Except experimentalists don’t say “dear” when crushing theorists’ dreams.