Bell’s inequality 50 years later

This is a jubilee year.* In November 1964, John Bell submitted a paper to the obscure (and now defunct) journal Physics. That paper, entitled “On the Einstein Podolsky Rosen Paradox,” changed how we think about quantum physics.

The paper was about quantum entanglement, the characteristic correlations among parts of a quantum system that are profoundly different than correlations in classical systems. Quantum entanglement had first been explicitly discussed in a 1935 paper by Einstein, Podolsky, and Rosen (hence Bell’s title). Later that same year, the essence of entanglement was nicely and succinctly captured by Schrödinger, who said, “the best possible knowledge of a whole does not necessarily include the best possible knowledge of its parts.” Schrödinger meant that even if we have the most complete knowledge Nature will allow about the state of a highly entangled quantum system, we are still powerless to predict what we’ll see if we look at a small part of the full system. Classical systems aren’t like that — if we know everything about the whole system then we know everything about all the parts as well. I think Schrödinger’s statement is still the best way to explain quantum entanglement in a single vigorous sentence.

To Einstein, quantum entanglement was unsettling, indicating that something is missing from our understanding of the quantum world. Bell proposed thinking about quantum entanglement in a different way, not just as something weird and counter-intuitive, but as a resource that might be employed to perform useful tasks. Bell described a game that can be played by two parties, Alice and Bob. It is a cooperative game, meaning that Alice and Bob are both on the same side, trying to help one another win. In the game, Alice and Bob receive inputs from a referee, and they send outputs to the referee, winning if their outputs are correlated in a particular way which depends on the inputs they receive.

But under the rules of the game, Alice and Bob are not allowed to communicate with one another between when they receive their inputs and when they send their outputs, though they are allowed to use correlated classical bits which might have been distributed to them before the game began. For a particular version of Bell’s game, if Alice and Bob play their best possible strategy then they can win the game with a probability of success no higher than 75%, averaged uniformly over the inputs they could receive. This upper bound on the success probability is Bell’s famous inequality.**

Classical and quantum versions of Bell’s game. If Alice and Bob share entangled qubits rather than classical bits, then they can win the game with a higher success probability.

There is also a quantum version of the game, in which the rules are the same except that Alice and Bob are now permitted to use entangled quantum bits (“qubits”) which were distributed before the game began. By exploiting their shared entanglement, they can play a better quantum strategy and win the game with a higher success probability, better than 85%. Thus quantum entanglement is a useful resource, enabling Alice and Bob to play the game better than if they shared only classical correlations instead of quantum correlations.
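For readers who want to check the numbers, here is a minimal Python sketch (it uses the CHSH formulation mentioned in the second footnote, not Bell's original notation, and the strategy encoding is my own). It enumerates every deterministic classical strategy, confirming the 75% bound, and prints the quantum (Tsirelson) value cos²(π/8) ≈ 85.4% for comparison:

from itertools import product
import math

# CHSH game: the referee sends bits x to Alice and y to Bob; they output
# bits a and b, and win iff a XOR b equals x AND y.
def win(a, b, x, y):
    return (a ^ b) == (x & y)

# A deterministic classical strategy is a pair of functions f, g from one
# bit to one bit: a = f(x), b = g(y). Encode f as the tuple (f(0), f(1)).
strategies = list(product([0, 1], repeat=2))
best = max(
    sum(win(f[x], g[y], x, y) for x, y in product([0, 1], repeat=2)) / 4
    for f, g in product(strategies, strategies)
)
print("best classical success probability:", best)                        # 0.75
print("quantum strategy (Tsirelson bound):", math.cos(math.pi / 8) ** 2)  # ~0.854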

And experimental physicists have been playing the game for decades, winning with a success probability that violates Bell’s inequality. The experiments indicate that quantum correlations really are fundamentally different than, and stronger than, classical correlations.

Why is that such a big deal? Bell showed that a quantum system is more than just a probabilistic classical system, which eventually led to the realization (now widely believed though still not rigorously proven) that accurately predicting the behavior of highly entangled quantum systems is beyond the capacity of ordinary digital computers. Therefore physicists are now striving to scale up the weirdness of the microscopic world to larger and larger scales, eagerly seeking new phenomena and unprecedented technological capabilities.

1964 was a good year. Higgs and others described the Higgs mechanism, Gell-Mann and Zweig proposed the quark model, Penzias and Wilson discovered the cosmic microwave background, and I saw the Beatles on the Ed Sullivan show. Those developments continue to reverberate 50 years later. We’re still looking for evidence of new particle physics beyond the standard model, we’re still trying to unravel the large scale structure of the universe, and I still like listening to the Beatles.

Bell’s legacy is that quantum entanglement is becoming an increasingly pervasive theme of contemporary physics, important not just as the source of a quantum computer’s awesome power, but also as a crucial feature of exotic quantum phases of matter, and even as a vital element of the quantum structure of spacetime itself. 21st century physics will advance not only by probing the short-distance frontier of particle physics and the long-distance frontier of cosmology, but also by exploring the entanglement frontier, by elucidating and exploiting the properties of increasingly complex quantum states.

Sometimes I wonder how the history of physics might have been different if there had been no John Bell. Without Higgs, Brout and Englert and others would have elucidated the spontaneous breakdown of gauge symmetry in 1964. Without Gell-Mann, Zweig could have formulated the quark model. Without Penzias and Wilson, Dicke and collaborators would have discovered the primordial black-body radiation at around the same time.

But it’s not obvious which contemporary of Bell, if any, would have discovered his inequality in Bell’s absence. Not so many good physicists were thinking about quantum entanglement and hidden variables at the time (though David Bohm may have been one notable exception, and his work deeply influenced Bell). Without Bell, the broader significance of quantum entanglement would have unfolded quite differently and perhaps not until much later. We really owe Bell a great debt.

*I’m stealing the title and opening sentence of this post from Sidney Coleman’s great 1981 lectures on “The magnetic monopole 50 years later.” (I’ve waited a long time for the right opportunity.)

**I’m abusing history somewhat. Bell did not use the language of games, and this particular version of the inequality, which has since been extensively tested in experiments, was derived by Clauser, Horne, Shimony, and Holt in 1969.

I spy with my little eye…something algebraic.

Look at this picture.

Peter 1

Does any part of it surprise you? Look more closely.

Peter 2

Now? Try crossing your eyes.

Peter 3

Do you see a boy’s name?

I spell “Peter” with two e’s, but “Piotr” and “Pyotr” appear as authors’ names in papers’ headers. Finding “Petr” in a paper shouldn’t have startled me. But how often does “Gretchen” or “Amadeus” materialize in an equation?

When I was little, my reading list included I Spy, Where’s Waldo?, and Puzzle Castle. The books teach children to pay attention, notice details, and evaluate ambiguities.

That’s what physicists do. The first time I saw the picture above, I saw a variation on “Peter.” I was reading (when do I not?) about the intersection of quantum information and thermodynamics. The authors were discussing heat and algebra, not saints or boys who picked pecks of pickled peppers. So I looked more closely.

Each letter resolved into part of a story about a physical system. The P represents a projector. A projector is a mathematical object that narrows one’s focus to a particular space, as blinders on a horse do. The E tells us which space to focus on: a space associated with an amount E of energy, like a country associated with a GDP of $500 billion.

Some of the energy E belongs to a heat reservoir. We know so because “reservoir” begins with r, and R appears in the picture. A heat reservoir is a system, like a colossal bathtub, whose temperature remains constant. The Greek letter \tau, pronounced “tau,” represents the reservoir’s state. The reservoir occupies an equilibrium state: The bath’s large-scale properties—its average energy, volume, etc.—remain constant. Never mind about jacuzzis.

Piecing together the letters, we interpret the picture as follows: Imagine a vast, constant-temperature bathtub (R). Suppose we shut the tap long enough ago that the water in the tub has calmed (\tau). Suppose the tub neighbors a smaller system—say, a glass of Perrier.* Imagine measuring how much energy the bath-and-Perrier composite contains (P). Our measurement device reports the number E.
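Putting the letters into one expression (my own reconstruction from the description above and the footnote below; the paper’s exact sub- and superscripts may differ), the term reads roughly

P^{E} \left( \rho_S \otimes \tau_R \right),

a projector onto the subspace of total energy E, applied to the joint state of the smaller system (\rho_S, the Perrier) and the reservoir’s equilibrium state (\tau_R, the bath).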

Quite a story to pack into five letters. Didn’t Peter deserve a second glance?

The equation’s right-hand side forms another story. I haven’t seen Peters on that side, nor Poseidons nor Galahads. But look closely, and you will find a story.

 

The images above appear in “Fundamental limitations for quantum and nanoscale thermodynamics,” published by Michał Horodecki and Jonathan Oppenheim in Nature Communications in 2013.

 

*Experts: The \rho_S that appears in the first two images represents the smaller system. The tensor product represents the reservoir-and-smaller-system composite.

Generally speaking

My high-school calculus teacher had a mustache like a walrus’s and shoulders like a rower’s. At 8:05 AM, he would demand my class’s questions about our homework. Students would yawn, and someone’s hand would drift into the air.

“I have a general question,” the hand’s owner would begin.

“Only private questions from you,” my teacher would snap. “You’ll be a general someday, but you’re not a colonel, or even a captain, yet.”

Then his eyes would twinkle; his voice would soften; and, after the student asked the question, his answer would epitomize why I’ve chosen a life in which I use calculus more often than laundry detergent.


Though I witnessed the “general” trap many times, I fell into it once. Little wonder: I relish generalization as other people relish hiking or painting or Michelin-worthy relish. When inferring general principles from examples, I abstract away details as though they’re tomato stains. My veneration of generalization led me to quantum information (QI) theory. One abstract theory can model many physical systems: electrons, superconductors, ion traps, etc.

Little wonder that generalizing a QI model swallowed my summer.

QI has shed light on statistical mechanics and thermodynamics, which describe energy, information, and efficiency. Models called resource theories describe small systems’ energies, information, and efficiencies. Resource theories help us calculate a quantum system’s value—what you can and can’t create from a quantum system—if you can manipulate systems in only certain ways.

Suppose you can perform only operations that preserve energy. According to the Second Law of Thermodynamics, systems evolve toward equilibrium. Equilibrium amounts roughly to stasis: Averages of properties like energy remain constant.

Out-of-equilibrium systems have value because you can suck energy from them to power laundry machines. How much energy can you draw, on average, from a system in a constant-temperature environment? Technically: How much “work” can you draw? We denote this average work by ⟨W⟩. According to thermodynamics, ⟨W⟩ equals the change ∆F in the system’s Helmholtz free energy. The Helmholtz free energy is a thermodynamic property similar to the energy stored in a coiled spring.
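For reference, the standard definition (not spelled out in the post) combines average energy and entropy at the bath temperature T:

F = ⟨E⟩ − TS.

Lowering F, whether by shedding energy or by gaining entropy, is what equilibrating systems do, which is why the change ∆F tracks the work you can extract along the way.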


One reason to study thermodynamics?

Suppose you want to calculate more than the average extractable work. How much work will you probably extract during some particular trial? Though statistical physics offers no answer, resource theories do. One answer derived from resource theories resembles ∆F mathematically but involves one-shot information theory, which I’ve discussed elsewhere.

If you average this one-shot extractable work, you recover ⟨W⟩ = ∆F. “Helmholtz” resource theories recapitulate statistical-physics results while offering new insights about single trials.

Helmholtz resource theories sit atop a silver-tasseled pillow in my heart. Why not, I thought, spread the joy to the rest of statistical physics? Why not generalize thermodynamic resource theories?

The average extractable work ⟨W⟩ equals ∆F if heat can leak into your system. If heat and particles can leak, ⟨W⟩ equals the change in your system’s grand potential. The grand potential, like the Helmholtz free energy, is a free energy that resembles the energy in a coiled spring. The grand potential characterizes Bose-Einstein condensates, low-energy quantum systems that may have applications to metrology and quantum computation. If your system responds to a magnetic field, or has mass and occupies a gravitational field, or has other properties, ⟨W⟩ equals the change in another free energy.
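For reference, the grand potential extends the Helmholtz free energy with a particle-exchange term (a standard textbook definition, not notation taken from the papers discussed here):

Φ = ⟨E⟩ − TS − µ⟨N⟩,

where µ is the chemical potential set by the particle reservoir and ⟨N⟩ is the average particle number.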

A collaborator and I designed resource theories that describe heat-and-particle exchanges. In our paper “Beyond heat baths: Generalized resource theories for small-scale thermodynamics,” we propose that different thermodynamic resource theories correspond to different interactions, environments, and free energies. I detailed the proposal in “Beyond heat baths II: Framework for generalized thermodynamic resource theories.”

“II” generalizes enough to satisfy my craving for patterns and universals. “II” generalizes enough to merit a hand-slap of a pun from my calculus teacher. We can test abstract theories only by applying them to specific systems. If thermodynamic resource theories describe situations as diverse as heat-and-particle exchanges, magnetic fields, and polymers, some specific system should shed light on resource theories’ accuracy.

If you find such a system, let me know. Much as generalization pleases aesthetically, the detergent is in the details.

Reading the sub(linear) text

Physicists are not known for finesse. “Even if it cost us our funding,” I’ve heard a physicist declare, “we’d tell you what we think.” Little wonder I irked the porter who directed me toward central Cambridge.

The University of Cambridge consists of colleges as the US consists of states. Each college has a porter’s lodge, where visitors check in and students beg for help after locking their keys in their rooms. And where physicists ask for directions.

Last March, I ducked inside a porter’s lodge that bustled with deliveries. The woman behind the high wooden desk volunteered to help me, but I asked too many questions. By my fifth, her pointing at a map had devolved to jabbing.

Read the subtext, I told myself. Leave.

Or so I would have told myself, if not for that afternoon.

That afternoon, I’d visited Cambridge’s CMS, which merits every letter in “Centre for Mathematical Sciences.” Home to Isaac Newton’s intellectual offspring, the CMS consists of eight soaring, glass-walled, blue-topped pavilions. Their majesty walloped me as I turned off the road toward the gatehouse. So did the congratulatory letter from Queen Elizabeth II that decorated the route to the restroom.


I visited Nilanjana Datta, an affiliated lecturer of Cambridge’s Faculty of Mathematics, and her student, Felix Leditzky. Nilanjana and Felix specialize in entropies and one-shot information theory. Entropies quantify uncertainties and efficiencies. Imagine compressing many copies of a message into the smallest possible number of bits (units of memory). How few bits can you use per copy? That number, we call the optimal compression rate. It shrinks as the number of copies compressed grows. As the number of copies approaches infinity, that compression rate drops toward a number called the message’s Shannon entropy. If the message is quantum, the compression rate approaches the von Neumann entropy.

Good luck squeezing infinitely many copies of a message onto a hard drive. How efficiently can we compress fewer copies? According to one-shot information theory, the answer involves entropies other than Shannon’s and von Neumann’s. In addition to describing data compression, entropies describe the charging of batteries, the concentration of entanglement, the encrypting of messages, and other information-processing tasks.
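To make “Shannon entropy” concrete, here is a minimal Python sketch (the biased-bit source is my own toy example, not one from the post):

import math

# Shannon entropy: the asymptotic number of bits needed per copy when
# compressing many independent copies of a message drawn from this source.
def shannon_entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Toy source: a biased bit with P(0) = 0.9, P(1) = 0.1.
print(shannon_entropy([0.9, 0.1]))  # ~0.47 bits per copy, well below 1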

Speaking of compressing messages: Suppose one-shot information theory posted status updates on Facebook. Suppose the panel on your Facebook page’s right-hand side showed news weightier than celebrity marriages. The news feed might read, “TRENDING: One-shot information theory: Second-order asymptotics.”

Second-order asymptotics, I learned at the CMS, concerns how the optimal compression rate decays as the number of copies compressed grows. Imagine compressing a billion copies of a quantum message ρ. The number of bits needed about equals a billion times the von Neumann entropy H_{vN}(ρ). Since a billion is less than infinity, 1,000,000,000 H_{vN}(ρ) bits won’t suffice. Can we estimate the compression rate more precisely?

The question reminds me of gas stations’ hidden pennies. The last time I passed a station’s billboard, some number like $3.65 caught my eye. Each gallon cost about $3.65, just as each copy of ρ costs about H_{vN}(ρ) bits. But a 9/10, writ small, followed the $3.65. If I’d budgeted $3.65 per gallon, I couldn’t have filled my tank. If you budget H_{vN}(ρ) bits per copy of ρ, you can’t compress all your copies.

Suppose some station’s owner hatches a plan to promote business. If you buy one gallon, you pay $3.654. The more you purchase, the more the final digit drops from four. By cataloguing receipts, you calculate how a tank’s cost varies with the number of gallons, n. The cost equals $3.65 × n to a first approximation. To a second approximation, the cost might equal $3.65 × n + a√n, wherein a represents some number of cents. Compute a, and you’ll have computed the gas’s second-order asymptotics.

Nilanjana and Felix computed a’s associated with data compression and other quantum tasks. Second-order asymptotics met information theory when Strassen combined them in nonquantum problems. These problems developed under attention from Hayashi, Han, Polyanski, Poor, Verdu, and others. Tomamichel and Hayashi, as well as Li, introduced quantumness.
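Schematically, these second-order expansions share a common shape (a standard form, hedged because the exact statement depends on the task and on the error tolerance ε one allows): the optimal number of bits for n copies behaves as

nH + √(nV) Φ^{-1}(ε) + O(log n),

where H is the relevant Shannon or von Neumann entropy, V is an entropy-variance quantity, and Φ^{-1} is the inverse of the Gaussian cumulative distribution function. The coefficient of √n plays the role of the a in the gas-station story above.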

In the total-cost expression, $3.65 × n depends on n directly, or “linearly.” The second term depends on √n. As the number of gallons grows, so does √n, but √n grows more slowly than n. The second term is called “sublinear.”

Which is the word that rose to mind in the porter’s lodge. I told myself, Read the sublinear text.

Little wonder I irked the porter. At least—thanks to quantum information, my mistake, and facial expressions’ contagiousness—she smiled.

 

 

With thanks to Nilanjana Datta and Felix Leditzky for explanations and references; to Nilanjana, Felix, and Cambridge’s Centre for Mathematical Sciences for their hospitality; and to porters everywhere for providing directions.

“Feveral kinds of hairy mouldy fpots”

The book had a sheepskin cover, and mold was growing on the sheepskin. Robert Hooke, a pioneer of microscopy, slid the cover under one of the world’s first microscopes. Mold, he discovered, consists of “nothing elfe but feveral kinds of fmall and varioufly figur’d Mufhroms.” He described the Mufhroms in his treatise Micrographia, a 1665 copy of which I found in “Beautiful Science.” An exhibition at San Marino’s Huntington Library, “Beautiful Science” showcases the physics of rainbows, the stars that enthralled Galileo, and the world visible through microscopes.


Beautiful science of yesterday: An illustration, from Hooke’s Micrographia, of the mold.

“[T]hrough a good Microfcope,” Hooke wrote, the sheepskin’s spots appeared “to be a very pretty fhap’d Vegetative body.”

How like a scientist, to think mold pretty. How like quantum noise, I thought, Hooke’s mold sounds.

Quantum noise hampers systems that transmit and detect light. To phone a friend or send an email—“Happy birthday, Sarah!” or “Quantum Frontiers has released an article”—we encode our message in light. The light traverses a fiber, buried in the ground, then hits a detector. The detector channels the light’s energy into a current, a stream of electrons that flows down a wire. The variations in the current’s strength are translated into Sarah’s birthday wish.

If noise doesn’t corrupt the signal. From encoding “Happy birthday,” the light and electrons might come to encode “Hsappi birthdeay.” Quantum noise arises because light consists of packets of energy, called “photons.” The sender can’t control how many photons hit the detector.

To send the letter H, we send about 10^8 photons.* Imagine sending fifty H’s. When we send the first, our signal might contain 10^8 − 153 photons; when we send the second, 10^8 + 2,083; when we send the third, 10^8 − 6; and so on. Receiving different numbers of photons, the detector generates different amounts of current. Different amounts of current can translate into different symbols. From H, our message can morph into G.

This spring, I studied quantum noise under the guidance of IQIM faculty member Kerry Vahala. I learned to model quantum noise, to quantify it, when to worry about it, and when not. From quantum noise, we branched into Johnson noise (caused by interactions between the wire and its hot environment); amplified-spontaneous-emission, or ASE, noise (caused by photons belched by ions in the fiber); beat noise (ASE noise breeds with the light we sent, spawning new noise); and excess noise (the “miscellaneous” folder in the filing cabinet of noise types).


Beautiful science of today: A microresonator—a tiny pendulum-like device—studied by the Vahala group.

Noise, I learned, has structure. It exhibits patterns. It has personalities. I relished studying those patterns as I relish sending birthday greetings while battling noise. Noise types, I see as a string of pearls unearthed in a junkyard. I see them as “pretty fhap[es]” in Hooke’s treatise. I see them—to pay a greater compliment—as “hairy mouldy fpots.”


*Optical-communications ballpark estimates (sanity-checked in the short sketch below):

  • Optical power: 1 mW = 10^-3 J/s
  • Photon frequency: 200 THz = 2 × 10^14 Hz
  • Photon energy: hν = (6.626 × 10^-34 J·s)(2 × 10^14 Hz) ≈ 10^-19 J
  • Bit rate: 1 Gbit/s = 10^9 bits/s
  • Number of bits per H: 10
  • Number of photons per H: (1 photon / 10^-19 J)(10^-3 J/s)(1 s / 10^9 bits)(10 bits / 1 H) = 10^8
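A quick numerical check of that last line, in Python (a sketch that only re-multiplies the figures quoted above):

# Order-of-magnitude check of the photons-per-H estimate.
h = 6.626e-34        # Planck's constant, J*s
nu = 2e14            # optical frequency, Hz
power = 1e-3         # optical power, J/s
bit_rate = 1e9       # bits/s
bits_per_H = 10

photon_energy = h * nu                        # ~1.3e-19 J
photons_per_bit = (power / photon_energy) / bit_rate
print(photons_per_bit * bits_per_H)           # ~7.5e7, i.e. about 10^8 photons per H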

 

An excerpt from this post was published today on Verso, the blog of the Huntington Library, Art Collection, and Botanical Gardens.

With thanks to Bassam Helou, Dan Lewis, Matt Stevens, and Kerry Vahala for feedback. With thanks to the Huntington Library (including Catherine Wehrey) and the Vahala group for the Micrographia image and the microresonator image, respectively.

The theory of everything: Help wanted

When Scientific American writes that physicists are working on a theory of everything, does it sound ambitious enough to you? Do you lie awake at night thinking that a theory of everything should be able to explain, well, everything? What if that theory is founded on quantum mechanics and finds a way to explain gravitation through the microscopic laws of the quantum realm? Would that be a grand unified theory of everything?

The answer is no, for two different but equally important reasons. First, there is the inherent assumption that quantum systems change in time according to Schrödinger’s evolution: i \hbar \partial_t \psi(t) = H \psi(t). Why? Where does that equation come from? Is it a fundamental law of nature, or is it an emergent relationship between different states of the universe? What if the parameter t, which we call time, as well as the linear, self-adjoint operator H, which we call the Hamiltonian, are both emergent from a more fundamental, and highly typical, phenomenon: the large amount of entanglement that is generically found when one decomposes the state space of a single, static quantum wavefunction into two subsystems of different sizes: a clock and a space of configurations (on which our degrees of freedom live)? So many questions, so few answers.

The static multiverse

The perceptive reader may have noticed that I italicized the word ‘static’ above, when referring to the quantum wavefunction of the multiverse. The emphasis on static is on purpose. I want to make clear from the beginning that a theory of everything can only be based on axioms that are truly fundamental, in the sense that they cannot be derived from more general principles as special cases. How would you know that your fundamental principles are irreducible? You start with set theory and go from there. If that assumes too much already, then you work on your set theory axioms. On the other hand, if you can exhibit a more general principle from which your original concept derives, then you are on the right path towards more fundamentalness.

In that sense, time and space as we understand them, are not fundamental concepts. We can imagine an object that can only be in one state, like a switch that is stuck at the OFF position, never changing or evolving in any way, and we can certainly consider a complete graph of interactions between subsystems (the equivalent of a black hole in what we think of as space) with no local geometry in our space of configurations. So what would be more fundamental than time and space? Let’s start with time: The notion of an unordered set of numbers, such as \{4,2,5,1,3,6,8,7,12,9,11,10\}, is a generalization of a clock, since we are only keeping the labels, but not their ordering. If we can show that a particular ordering emerges from a more fundamental assumption about the very existence of a theory of everything, then we have an understanding of time as a set of ordered labels, where each label corresponds to a particular configuration in the mathematical space containing our degrees of freedom. In that sense, the existence of the labels in the first place corresponds to a fundamental notion of potential for change, which is a prerequisite for the concept of time, which itself corresponds to constrained (ordered in some way) change from one label to the next. Our task is first to figure out where the labels of the clock come from, then where the illusion of evolution comes from in a static universe (Heisenberg evolution), and finally, where the arrow of time comes from in a macroscopic world (the illusion of irreversible evolution).

The axioms we ultimately choose must satisfy the following conditions simultaneously: 1. the implications stemming from these assumptions are not contradicted by observations, 2. replacing any one of these assumptions by its negation would lead to observable contradictions, and 3. the assumptions contain enough power to specify non-trivial structures in our theory. In short, as Immanuel Kant put it in his accessible bedtime story the Critique of Pure Reason, we are looking for synthetic a priori knowledge that can explain space and time, which ironically were Kant’s answer to that same question.

The fundamental ingredients of the ultimate theory

Before someone decides to delve into the math behind the emergence of unitarity (Heisenberg evolution) and the nature of time, there is another reason why the grand unified theory of everything has to do more than just give a complete theory of how the most elementary subsystems in our universe interact and evolve. What is missing is the fact that quantity has a quality all its own. In other words, patterns emerge from seemingly complex data when we zoom out enough. This “zooming out” procedure manifests itself in two ways in physics: as coarse-graining of the data and as truncation and renormalization. These simple ideas allow us to reduce the computational complexity of evaluating the next state of a complex system: If most of the complexity of the system is hidden at a level you cannot even observe (think pre-Retina-display era), then all you have to keep track of is information at the macroscopic, coarse-grained level. On top of that, you can use truncation and renormalization to zero in on the most-likely/highest-weight configurations your coarse-grained data can be in – you can safely throw away a billion configurations if their combined weight is less than 0.1% of the total, because your super-compressed data will still give you the right answer with a fidelity of 99.9%. This is how you get to reduce a 9 GB raw video file down to a 300 MB YouTube video that streams over your WiFi connection without losing too much of the video quality.
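As a cartoon of that truncation step, here is a minimal Python sketch (the exponentially decaying weights are a made-up toy distribution, chosen only to mimic the rapidly decaying spectra one often meets in practice):

import numpy as np

# Toy weight distribution over a million configurations, decaying fast.
weights = np.exp(-np.arange(1_000_000) / 1_000.0)
weights /= weights.sum()

# Keep only the highest-weight configurations carrying 99.9% of the total.
cumulative = np.cumsum(weights)
keep = int(np.searchsorted(cumulative, 0.999)) + 1
print(f"kept {keep:,} of {weights.size:,} configurations "
      f"(fidelity ~ {cumulative[keep - 1]:.4f})")   # ~7,000 of 1,000,000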

I will not focus on the second requirement for the “theory of everything”, the dynamics of apparent complexity. I think that this fundamental task is the purview of other sciences, such as chemistry, biology, anthropology and sociology, which look at the “laws” of physics from higher and higher vantage points (increasingly coarse-graining the topology of the space of possible configurations). Here, I would like to argue that the foundation on which a theory of everything rests, at the basement level if such a thing exists, consists of four ingredients: Math, Hilbert spaces with tensor decompositions into subsystems, stability and compressibility. Now, you know about math (though maybe not of Zermelo-Fraenkel set theory), you may have heard of Hilbert spaces if you majored in math and/or physics, but you don’t know what stability, or compressibility mean in this context. So let me motivate the last two with a question and then explain in more detail below: What are the most fundamental assumptions that we sweep under the rug whenever we set out to create a theory of anything that can fit in a book – or ten thousand books – and still have predictive power? Stability and compressibility.

Math and Hilbert spaces are fundamental in the following sense: A theory needs a Language in order to encode the data one can extract from that theory through synthesis and analysis. The data will be statistical in the most general case (with every configuration/state we attach a probability/weight of that state conditional on an ambient configuration space, which will often be a subset of the total configuration space), since any observer creating a theory of the universe around them only has access to a subset of the total degrees of freedom. The remaining degrees of freedom, what quantum physicists group as the Environment, affect our own observations through entanglement with our own degrees of freedom. To capture this richness of correlations between seemingly uncorrelated degrees of freedom, the mathematical space encoding our data requires more than just a metric (i.e. an ability to measure distances between objects in that space) – it requires an inner-product: a way to measure angles between different objects, or equivalently, the ability to measure the amount of overlap between an input configuration and an output configuration, thus quantifying the notion of incremental change. Such mathematical spaces are precisely the Hilbert spaces mentioned above and contain states (with wavefunctions being a special case of such states) and operators acting on the states (with measurements, rotations and general observables being special cases of such operators). But, let’s get back to stability and compressibility, since these two concepts are not standard in physics.

Stability

Stability is that quality that says that if the theory makes a prediction about something observable, then we can test our theory by making observations on the state of the world and, more importantly, new observations do not contradict our theory. How can a theory fall apart if it is unstable? One simple way is to make predictions that are untestable, since they are metaphysical in nature (think of religious tenets). Another way is to make predictions that work for one level of coarse-grained observations and fail for a lower level of finer coarse-graining (think of Newtonian Mechanics). A more extreme case involves quantum mechanics assumed to be the true underlying theory of physics, which could still fail to produce a stable theory of how the world works from our point of view. For example, say that your measurement apparatus here on earth is strongly entangled with the current state of a star that happens to go supernova 100 light-years from Earth during the time of your experiment. If there is no bound on the propagation speed of the information between these two subsystems, then your apparatus is engulfed in flames for no apparent reason and you get random data, where you expected to get the same “reproducible” statistics as last week. With no bound on the speed with which information can travel between subsystems of the universe, our ability to explain and/or predict certain observations goes out the window, since our data on these subsystems will look like white noise, an illusion of randomness stemming from the influence of inaccessible degrees of freedom acting on our measurement device. But stability has another dimension; that of continuity. We take for granted our ability to extrapolate the curve that fits 1000 data points on a plot. If we don’t assume continuity (and maybe even a certain level of smoothness) of the data, then all bets are off until we make more measurements and gather additional data points. But even then, we can never gather an infinite (let alone, uncountable) number of data points – we must extrapolate from what we have and assume that the full distribution of the data is close in norm to our current dataset (a norm is a measure of distance between states in the Hilbert space).

The emergence of the speed of light

The assumption of stability may seem trivial, but it holds within it an anthropic-style explanation for the bound on the speed of light. If there is no finite speed of propagation for the information between subsystems that are “far apart”, from our point of view, then we will most likely see randomness where there is order. A theory needs order. So, what does it mean to be “far apart” if we have made no assumption for the existence of an underlying geometry, or spacetime for that matter? There is a very important concept in mathematical physics that generalizes the concept of the speed of light for non-relativistic quantum systems whose subsystems live on a graph (i.e. where there may be no spatial locality or apparent geometry): the Lieb-Robinson velocity. Those of us working at the intersection of mathematical physics and quantum many-body physics have seen first-hand the powerful results one can get from the existence of such an effective and emergent finite speed of propagation of information between quantum subsystems that, in principle, can signal to each other instantaneously through the action of a non-local unitary operator (rotation of the full system under Heisenberg evolution). It turns out that under certain natural assumptions on the graph of interactions between the different subsystems of a many-body quantum system, such a finite speed of light emerges naturally. The main requirement on the graph comes from the following intuitive picture: If each node in your graph is connected to only a few other nodes and the number of paths between any two nodes is bounded above in some nice way (say, polynomially in the distance between the nodes), then communication between two distant nodes will take time proportional to the distance between the nodes (in graph-distance units, the smallest number of edges among all paths connecting the two nodes). Why? Because at each time step you can only communicate with your neighbors and in the next time step they will communicate with theirs and so on, until one (and then another, and another) of these communication cascades reaches the other node. Since you have a bound on how many of these cascades will eventually reach the target node, the intensity of the communication wave is bounded by the effective action of a single messenger traveling along a typical path with a bounded speed towards the destination. There should be generalizations to weighted graphs, but this area of mathematical physics is still really active and new results on bounds on the Lieb-Robinson velocity gather attention very quickly.
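For the technically inclined, the bound usually takes a schematic form like the following (the constants depend on the interaction graph and the interaction strengths; this is the standard textbook shape rather than a statement from any particular paper):

\| [A_X(t), B_Y] \| \le C \, \|A_X\| \|B_Y\| \, e^{-\mu (d(X,Y) - v|t|)},

where A_X(t) is an observable supported on the set of nodes X and evolved for a time t, B_Y is an observable supported on Y, d(X,Y) is the graph distance between the two supports, and v is the Lieb-Robinson velocity. The commutator, and hence any signal sent from X to Y, stays exponentially small until t \gtrsim d(X,Y)/v.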

Escaping black holes

If this idea holds any water, then black holes are indeed nearly complete graphs, where the notion of space and time breaks down, since there is no effective bound on the speed with which information propagates from one node to another. The only way to escape is to find yourself at the boundary of the complete graph, where the nodes of the black hole’s apparent horizon are connected to low-degree nodes outside. Once you get to a low-degree node, you need to keep moving towards other low-degree nodes in order to escape the “gravitational pull” of the black hole’s super-connectivity. In other words, gravitation in this picture is an entropic force: we gravitate towards massive objects for the same reason that we “gravitate” towards the direction of the arrow of time: we tend towards higher entropy configurations – the probability of reaching the neighborhood of a set of highly connected nodes is much, much higher than hanging out for long near a set of low-degree nodes in the same connected component of the graph. If a graph has disconnected components, then there is no way to communicate between the corresponding spacetimes – their states are in a tensor product with each other. One has to carefully define entanglement between components of a graph before giving a unified picture of how spatial geometry arises from entanglement. Somebody get to it.

Erik Verlinde has introduced the idea of gravity as an entropic force and Fotini Markopoulou, et al. have introduced the notion of quantum graphity (gravity emerging from graph models). I think these approaches must be taken seriously, if only because they work with more fundamental principles than the ones found in Quantum Field Theory and General Relativity. After all, this type of blue sky thinking has led to other beautiful connections, such as ER=EPR (the idea that whenever two systems are entangled, they are connected by a wormhole). Even if we were to disagree with these ideas for some technical reason, we must admit that they are at least trying to figure out the fundamental principles that guide the things we take for granted. Of course, one may disagree with certain attempts at identifying unifying principles simply because the attempts lack the technical gravitas that allows for testing and calculations. Which is why a technical blog post on the emergence of time from entanglement is in the works.

Compressibility

So, what about that last assumption we seem to take for granted? How can you have a theory you can fit in a book about a sequence of events, or snapshots of the state of the observable universe, if these snapshots look like the static noise on a TV screen with no transmission signal? Well, you can’t! The fundamental concept here is Kolmogorov complexity and its connection to randomness/predictability. A sequence of data bits like:

10011010101101001110100001011010011101010111010100011010110111011110

has higher complexity (and hence looks more random/less predictable) than the sequence:

10101010101010101010101010101010101010101010101010101010101010101010

because there is a small computer program that can output each successive bit of the latter sequence (even if it had a million bits), but (most likely) not of the former. In particular, to get the second sequence with one million bits one can write the following short program:

s = '10'
for n in range(499_999):   # append 499,999 more copies of '10', for 10^6 bits total
    s += '10'
print(s)

As the number of bits grows, one may wonder if the number of iterations (given above by 499,999), can be further compressed to make the program even smaller. The answer is yes: The number 499,999 in binary requires \log_2 499,999 bits, but that binary number is a string of 0s and 1s, so it has its own Kolmogorov complexity, which may be smaller than \log_2 499,999. So, compressibility has a strong element of recursion, something that in physics we associate with scale invariance and fractals.
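As a quick check of that count (a Python one-liner; bin() prints the binary expansion with a '0b' prefix):

print(len(bin(499_999)) - 2)   # 19: bits needed to write 499,999 in binary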

You may be wondering whether there are truly complex sequences of 0,1 bits, or if one can always find a really clever computer program to compress any N-bit string down to, say, N/100 bits. The answer is interesting: There is no computer program that can compute the Kolmogorov complexity of an arbitrary string (the argument has roots in Berry’s Paradox), but there are strings of arbitrarily large Kolmogorov complexity (that is, no matter what program we use and what language we write it in, the smallest program (in bits) that outputs the N-bit string will be at least N bits long). A simple counting argument shows why such strings exist: there are 2^N strings of N bits, but fewer than 2^N programs shorter than N bits, so some N-bit string has no shorter description. In other words, there really are streams of data (in the form of bits) that are completely incompressible. In fact, a typical string of 0s and 1s will be almost completely incompressible!

Stability, compressibility and the arrow of time

So, what does compressibility have to do with the theory of everything? It has everything to do with it. Because, if we ever succeed in writing down such a theory in a physics textbook, we will have effectively produced a computer program that, given enough time, should be able to compute the next bit in the string that represents the data encoding the coarse-grained information we hope to extract from the state of the universe. In other words, the only reason the universe makes sense to us is because the data we gather about its state is highly compressible. This seems to imply that this universe is really, really special and completely atypical. Or is it the other way around? What if the laws of physics were non-existent? Would there be any consistent gravitational pull between matter to form galaxies and stars and planets? Would there be any predictability in the motion of the planets around suns? Forget about life, let alone intelligent life and the anthropic principle. Would the Earth, or Jupiter even know where to go next if it had no sense that it was part of a non-random plot in the movie that is spacetime? Would there be any notion of spacetime to begin with? Or an arrow of time? When you are given one thousand frames from one thousand different movies, there is no way to make a single coherent plot. Even the frames of a single movie would make little sense upon reshuffling.

What if the arrow of time emerged from the notions of stability and compressibility, through coarse-graining that acts as a compression algorithm for data that is inherently highly-complex and, hence, highly typical as the next move to make? If two strings of data look equally complex upon coarse-graining, but one of them has a billion more ways of appearing from the underlying raw data, then which one will be more likely to appear in the theory-of-everything book of our coarse-grained universe? Note that we need both high compressibility after coarse-graining in order to write down the theory, as well as large entropy before coarse-graining (from a large number of raw strings that all map to one string after coarse-graining), in order to have an arrow of time. It seems that we need highly-typical, highly complex strings that become easy to write down once we coarse-grain the data in some clever way. Doesn’t that seem like a contradiction? How can a bunch of incompressible data become easily compressible upon coarse-graining? Here is one way: Take an N-bit string and define its 1-bit coarse-graining as the boolean AND of its digits. All but one string will default to 0; the all-1s string will default to 1. Equally compressible, but the probability of seeing the 1 after coarse-graining is 2^{-N}. With only 300 bits, finding the coarse-grained 1 is harder than looking for a specific atom in the observable universe. In other words, if the coarse-graining rule at time t is the one given above, then you can be pretty sure you will be seeing a 0 come up next in your data. Notice that before coarse-graining, all 2^N strings are equally likely, so there is no arrow of time, since there is no preferred string from a probabilistic point of view.
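In code, the toy rule looks like this (a minimal sketch of the example above):

from functools import reduce
from operator import and_

# Coarse-grain an N-bit string down to a single bit via boolean AND.
def coarse_grain(bits):
    return reduce(and_, bits)

N = 300
print(coarse_grain([1] * N))              # 1: only the all-1s string maps to 1
print(coarse_grain([1] * (N - 1) + [0]))  # 0: every other string maps to 0
print(f"P(1 after coarse-graining) = 2^-{N} ~ {2.0 ** -N:.1e}")   # ~4.9e-91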

Conclusion, for now

When we think about the world around us, we go to our intuitions first as a starting point for any theory describing the multitude of possible experiences (observable states of the world). If we are to really get to the bottom of this process, it seems fruitful to ask “why do I assume this?” and “is that truly fundamental or can I derive it from something else that I already assumed was an independent axiom?” One of the postulates of quantum mechanics is the axiom corresponding to the evolution of states under Schrödinger’s equation. We will attempt to derive that equation from the other postulates in an upcoming post. Until then, your help is wanted with the march towards more fundamental principles that explain our seemingly self-evident truths. Question everything, especially when you think you really figured things out. Start with this post. After all, a theory of everything should be able to explain itself.

UP NEXT: Entanglement, Schmidt decomposition, concentration measure bounds and the emergence of discrete time and unitary evolution.

Top 10 questions for your potential PhD adviser/group

Everyone in grad school has taken on the task of picking the perfect research group at some point.  Then some among us had the dubious distinction of choosing the perfect research group twice.  Luckily for me, a year of grad research taught me a lot and I found myself asking group members and PIs (principal investigators) very different questions.  And luckily for you, I wrote these questions down to share with future generations.  My background as an experimental applied physicist showed through initially, so I got Shaun Maguire and Spiros Michalakis to help make it applicable for theorists too, and most of them should be useful outside physics as well.

Questions to break that silence when your potential advisor asks “So, do you have any questions for me?”

1. Are you taking new students?
– 2a. if yes: How many are you looking to take?
– 2b. if no: Ask them about the department or other professors.  They’ve been there long enough to have opinions.  Alternatively, ask what kinds of questions they would suggest you ask other PIs.
3. What is the procedure for joining the group?
4. (experimental) Would you have me TA?  (This is the nicest way I thought of to ask if a PI can fund you with a research assistantship (RA), though sometimes they just like you to TA their class.)
4. (theory) Funding routes will often be covered by question 3 since TAs are the dominant funding method for theory students, unlike for experimentalists. If relevant, you can follow up with: How does funding for your students normally work? Do you have funding for me?
5. Do new students work for/report to other grad students, post docs, or you directly?
6. How do you like students to arrange time to meet with you?
7. How often do you have group meetings?
8. How much would you like students to prepare for them?
9. Would you suggest I take any specific classes?
10. What makes someone a good fit for this group?

And then for the high bandwidth information transfer.  Grill the group members themselves, and try to ask more than one group member if you can.

1. How much do you prepare for meetings with PI?
2. How long until people lead their own project? – Equivalently, who’s working on what projects.
3. How much do people on different projects communicate? (only group meeting or every day)
4. Is the PI hands on (how often PI wants to meet with you)?
5. Is the PI accessible (how easily can you meet with the PI if you want to)?
6. What is the average time to graduation? (if it’s important to you personally)
7. Does the group/subgroup have any bonding activities?
8. Do you think I should join this group?
9. What are people’s backgrounds?
10. What makes someone a good fit for this group?

Hope that helps.  If you have any other suggested questions, be sure to leave them in the comments.