
Explaining away the “fixity” of the past (4 of 5ish)

Intuitively, we think of the future as open and the past as fixed. Meaning that the future is up to us, dependent on our actions, while the past is not; it’s independent of our actions. This way of thinking is very natural and goes deep. We think that being in the past is what makes those events fixed. But that’s wrong: it’s an oversimplification. It’s the fact that those events (the ones we are thinking of) represent a lower-entropy state that makes them fixed. And talk of a lower-entropy state only makes sense where a large number of microscopic states all count as the same state at some coarse-grained level, such as “the pressure of the air in this tire.”

Let us count the Ways

If all you know about “entropy” is that it’s related to “disorder” (true in a limited range of cases), the fact that entropy is only defined statistically may come as a surprise. But the classic definition of entropy given by Ludwig Boltzmann is S = k ln W. S is entropy, k is the Boltzmann constant, and W is the count of the Ways that the macroscopic state can be realized by various microscopic arrangements; a macrostate’s probability is proportional to this count. Because the numbers of microscopic states in question are enormous (18 grams of water contains 6 x 10^23 molecules, for example), the probabilities quickly become overwhelming for macroscopic systems. Ultimately, the increase of entropy is “merely” probabilistic. But those probabilities can come damn close to certainty.
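To get a feel for how the count of Ways behaves, here is a toy calculation in Python (a sketch only: a made-up 100-particle gas in a two-sided box, with Boltzmann’s constant set to 1):

```python
import math

def ways(n_particles: int, n_left: int) -> int:
    """W: the number of microstates in which n_left of the particles
    sit in the left half of a box (a crude two-valued macrostate)."""
    return math.comb(n_particles, n_left)

def entropy(n_particles: int, n_left: int) -> float:
    """Boltzmann's S = k ln W, with k set to 1 for illustration."""
    return math.log(ways(n_particles, n_left))

n = 100
# "All particles on the left" is realized in exactly 1 way;
# the even split is realized in about 10^29 ways.
print(ways(n, 0))       # 1
print(ways(n, n // 2))  # 100891344545564193334812497256
print(entropy(n, n // 2) > entropy(n, 10))  # True: the even split has higher entropy
```

Even for a mere 100 particles, the even-split macrostate outnumbers the all-on-one-side macrostate by 29 orders of magnitude; for 10^23 particles the disproportion is beyond astronomical.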

Why are so many processes irreversible? By reversing a process, we mean: removing a present condition, so as to give the future a condition like the one that held in the past. For example, suppose I dropped an egg on the kitchen floor, making a mess. Why can’t I undo that? The molecules of egg shell and yolk are still there on the floor (and a few in the air), and they traced in-principle reversible paths (just looking at the micro-physics of molecular motion) to get there. So why can’t I make an intact egg from this?

The answer is entropy, and therefore the count of the Ways. There are many ways to get from a broken egg to a more-broken egg. There are many orders of magnitude fewer ways to get from a broken egg to a whole egg. One would have much better odds guessing the winning lottery number than finding a manipulation that makes the egg whole. There is some extremely narrow range of velocities of yolk and shell-bits such that, if one launched the bits with just those velocities, the molecules would in the immediate future bond to form a whole egg-shell with yolk inside – but finding those conditions, even aside from implementing them, is impossible in practice. Because the more-broken-egg states so vastly outnumber the whole-egg states, our attempts to reverse the mess have vanishing probability of success.
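A toy simulation makes the odds concrete. Here one particular arrangement of ten fragments counts as the whole egg and every other permutation counts as broken; this is a made-up miniature, since the real state counts are astronomically larger:

```python
import math
import random

# Toy model: the sorted arrangement is the one "whole egg" state; all other
# permutations count as "broken". A random manipulation (a shuffle) almost
# never lands on the single whole-egg state.
random.seed(1)
n = 10
whole = list(range(n))
trials = 100_000
hits = sum(1 for _ in range(trials)
           if random.sample(whole, n) == whole)

print(f"whole-egg states: 1 of {math.factorial(n):,} arrangements")
print(f"shuffles that restored the egg: {hits} of {trials:,}")
```

With ten fragments there are already 3,628,800 arrangements, so even a hundred thousand random tries will almost certainly restore the egg zero times. A real egg has on the order of 10^23 molecules, and the disproportion becomes unimaginable.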

On a local level, some macroscopic processes are reversible. I accidentally knock a book off a table; I pick it up and put it back. The room is unchanged, on a suitably coarse-grained analysis — but I have changed. I used up some glucose to do that mechanical work. I could eat some more food to get it back, but the growth of the relevant plants ultimately depends on thermodynamically irreversible processes in the sun. On a global analysis, even the restoration of the book to its place is an irreversible process.

The familiar part of the past is fixed …

Entropy thus explains why we can’t arrange the future to look just like the past.  The different problem of trying to affect the past faces similar obstacles.  The “immutability of the past” arises because the events we humans care about are human-sized, naturally enough, i.e. macroscopic.  Macroscopic changes in practice always involve entropy increases, and always leave myriad microphysical traces such as emitted sounds and reflected and radiated light and heat.  These go on to interact with large systems of particles, typically causing macroscopic consequences.  While phonons (quanta of sound) and photons follow CPT-reversible paths, that does not mean we can collect those microscopic energies and their macroscopic consequences in all the right places and arrange to have the past events that we want.  As in the broken egg case, even if we had the engineering skills to direct the energies, we face insurmountable information deficits.  We know neither where to put the bits, nor with what energy to launch them.

In addition to the time-asymmetry of control over macroscopic events, we have time-asymmetric knowledge, for closely related reasons.  Stephen Hawking connected the “psychological arrow of time”, based on memory, to the “entropic arrow of time”, which orients such that lower-entropy times count as past, and higher as future.  Mlodinow and Brun argue that if a memory system is capable of remembering more than one thing, and exists in an environment where entropy increases in one time-direction, then the recording of a memory happens at a lower-entropy time than its recall.  Our knowledge of the past is better than our knowledge of the future because we have memories of the past, which are records, and the creation of records requires increasing entropy.

Consider an example adapted from David Albert.  Suppose we now, at t1, observe the aftermath of an avalanche and want to know the position of a particular rock (call it r) an hour ago, at t0, the start of the avalanche.  We can attempt to retrodict it, using the present positions and shapes of r and all other nearby rocks, the shape of the remnant of the slope they fell down, the force of gravity, our best estimates of recent wind speeds, etc.  In this practically impossible endeavor, we would be trying to reconstruct the complete history of r between t0 and t1.  Or we might be lucky enough to have a photograph of r from t0, which has been kept safe and separate from the avalanche.  In that case our knowledge about r at t0 is independent of what happened to r after t0, although it does depend on some knowledge of the fate of the photograph.  As Albert writes [p. 57], “the fact that our experience of the world offers us such vivid and plentiful examples of this epistemic independence [of earlier events from later ones] very naturally brings with it the feeling of a causal and counterfactual independence as well.”

Contrast our knowledge of the future position of r an hour from now.  Here there are no records to consult, and prediction is our only option.  Almost any feature of r’s environment could be relevant to its future position, from further avalanches to freak weather events to meddling human beings.   The plenitude of causal handles on future events is what makes them so manipulable.

Note that it is not that our knowledge of the macroscopic past puts it beyond our control: we cannot keep past eggs from breaking even if we did not know about them.  Nor is it our ignorance of the future that gives us control over future macroscopic states (nor the illusion of control).  Rather, it is the increase of entropy over time, and the related fact that macroscopic changes typically leave macroscopic records at entropically-future times but not past times, that explains both the time-asymmetry of control and of memory.  A memory is a record of the past.  And a future macroscopic event (for example, a footprint) that we influence by a present act (walking in the mud) is a record of that act. If we could refer to a set of microphysical past events that did not pose insurmountable information deficits preventing us from seeing their relation to present events, might they become up to us? 

…But not the whole of the past is fixed

Yes, some microphysical arrangements, under a peculiar description, are up to us. We’ve been here before, in Betting on The Past, in the previous post in this series. There, you could guarantee that the past state of the world was such as to correspond, according to laws of nature, to your action to take Bet 2. You could do so just by taking Bet 2. Or you could guarantee that the microphysical states in question were those corresponding to your later action to take Bet 1. When you’re drawing a self-referential pie chart, you can fill it in however you like. Dealing with events specified in terms of their relation to you now is dealing in self-reference, regardless of whether those events are past, present, or future. Of course, you have no idea which microscopic events, described in microscopic terms, will have been different depending on your choice. But who cares? You have no need to know that in order to get what you want.

We’re used to the idea of asymmetric dependence relations between events, such as one causing another. And we’re used to the idea of independent events that have no link whatsoever. We’re not used to the idea of events and processes that are bidirectionally linked, with neither being master and neither being slave. But these bidirectional links are ubiquitous at the microscopic level. It is only by using our macroscopic concepts, and lumping together event-classes of various probabilities (various counts of microscopic ways to constitute the macroscopic properties), that we can find a unidirectional order in history.

There’s nothing wrong with attributing asymmetric causality to macroscopic processes – entropy and causality are reasonably well-defined there. But if we overgeneralize and attribute the asymmetry to all processes extending through time, we make a mistake. Indeed, following Hawking and Carroll [2010] and others, we can define “the arrow of time” as the direction in which entropy increases.

This gets really interesting when we consider cosmological theories which allow for times further from our time than the Big Bang, but at which entropy is higher than at the Big Bang. Don Page has a model like this for our universe. Sean Carroll and Jennifer Chen [2004] have a multiverse model with a similar feature, pictured below:

Carroll and Chen [2004] multiverse

The figure shows a parent universe spawning various baby universes. One of the ((great-(etc))grand)babies is ours. The parent universe has a timeline infinite in both directions, with a lowest (but not necessarily low!) entropy state in the middle. Observers in baby universes at the top of the diagram will think of the bottom of the diagram, including any baby universes and their occupants, as being in their past. And any observers in the babies at the bottom will return the favor. Each set of observers is equally entitled to their view. At the central time-slice, where entropy is approximately steady, there is no arrow of time. As one traverses the diagram from top to bottom, the arrow of time falters, then flips. Where the arrow of time points depends on where you sit. The direction of time and the flow of cause and effect are very different in modern physics than they are in our intuitions.

Another route to the same conclusion

So far we’ve effectively equated causation with entropy-increasing processes, where the cause is the lower-entropy state and the effect is the corresponding higher-entropy state. But there’s another way to approach causality, one which finds its roots in the way science and engineering investigations actually proceed. On Judea Pearl’s approach in his book Causality, an investigation starts with the delineation of the system being investigated. Then we construct directed acyclic graphs to try to model the system. For example, a slippery sidewalk may be thought to be the result of the weather and/or people watering their grass, as shown in the tentative causal model below, side (a):

Causal modeling example from Pearl

Certain events and properties are considered endogenous, i.e. parts of the system (season, rain, …), and other variables are considered exogenous (civil engineers investigating pedestrian safety, …). To test the model, and determine causal relations within the system, we Do(X=x), where X is some system variable and x one of its particular states. This Do(X=x), called an “intervention”, need not involve human action, despite the name. But it does need to involve an exogenous variable setting the value of X in a way that breaks any tendencies of other endogenous variables to raise or lower the probabilities of values of X. In side (b) of the diagram this shows up as the disappearance of the arrow from X1, season, to X3, sprinkler use. The usual effect of season, in which dry (wet) weather inspires sprinkler use (disuse), has been preempted by the engineer turning on a sprinkler to investigate pedestrian safety.
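The Do( ) operation can be illustrated with a small simulation. The variable names follow Pearl’s diagram (X1, season, through X5, slippery), but the probabilities here are invented for illustration:

```python
import random

def sample(do_sprinkler=None):
    """One run of a toy sprinkler model; probabilities are made up."""
    season = random.choice(["dry", "wet"])                      # X1 (exogenous)
    rain = random.random() < (0.1 if season == "dry" else 0.6)  # X2
    if do_sprinkler is None:
        sprinkler = random.random() < (0.7 if season == "dry" else 0.1)  # X3
    else:
        sprinkler = do_sprinkler  # intervention: the arrow from season is cut
    wet = rain or sprinkler                                     # X4
    slippery = wet and random.random() < 0.9                    # X5
    return season, sprinkler, slippery

def p_sprinkler_given_dry(runs):
    dry = [s for s in runs if s[0] == "dry"]
    return sum(s[1] for s in dry) / len(dry)

random.seed(0)
n = 50_000
obs = [sample() for _ in range(n)]                  # passive observation
do = [sample(do_sprinkler=True) for _ in range(n)]  # Do(sprinkler = on)

p_obs = p_sprinkler_given_dry(obs)  # ~0.7: season predicts sprinkler use
p_do = p_sprinkler_given_dry(do)    # 1.0: the intervention breaks the dependence
print(p_obs, p_do)
```

Under passive observation the season raises and lowers the probability of sprinkler use; under the intervention, the sprinkler’s state is set from outside the system and the season no longer matters, just as the deleted arrow in side (b) depicts.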

As Pearl writes,

If you wish to include the entire universe in the model, causality disappears because interventions disappear—the manipulator and the manipulated [lose] their distinction. … The scientist carves a piece from the universe and proclaims that piece in – namely, the focus of the investigation. The rest of the universe is then considered out. …This choice of ins and outs creates asymmetry in the way we look at things and it is this asymmetry that permits us to talk about ‘outside intervention’ and hence about causality and cause-effect directionality.

Judea Pearl, Causality (2nd ed.): 419-420

It’s only by turning variables on and off from outside the system that we can put arrow-heads on the lines connecting one variable to another. In the universe as a whole, there is no “outside the system”, and we are left with undirected links.

In Judea Pearl’s exposition of the scientific investigation of causality, causality disappears at the whole-universe level. In the entropy-based definition of causality, causality doesn’t apply between fully (microscopically) specified descriptions of different times because irreversibility only applies where the number of ways of making up the “effect” state is far greater than the number of ways of making up the “cause” state – but the number of ways to make a fully-specified state is 1.

The bottom line

Laws of nature / Causality / Determinism can be:

(A) Universal, applying to everything

(B) Unidirectional, making for controllers and the controlled

(C) Scientific.

Choose not more than two.


Albert, David Z. After Physics. Cambridge: Harvard College, 2015.

Carroll, Sean M. From Eternity to Here: The Quest for the Ultimate Theory of Time. New York: Penguin, 2010.

Carroll, Sean M. and Jennifer Chen 2004. Spontaneous Inflation and the Origin of the Arrow of Time, URL = <https://arxiv.org/abs/hep-th/0410270v1/>

Hawking, Stephen. A Brief History of Time. New York: Bantam Books, 1988.

Mlodinow, Leonard and Todd A. Brun. Relation between the psychological and thermodynamic arrows of time, Phys. Rev. E 89: 052102, 2014.

Page, Don 2009. Symmetric Bounce Quantum State of the Universe. URL = <https://arxiv.org/abs/0907.1893v4/>.

Pearl, Judea. Causality: Models, Reasoning, and Inference. New York: Cambridge University Press, 2000 (2nd edition 2009).

Free will, part 2 of (5-ish)

As we said last time “alpha is unavoidable for you” means “For every action A you could take, if you did A, alpha would (still) happen / be true”. A key concept here is a would statement, which logicians call a “counterfactual conditional”. For example, if I had written that logicians call a would statement “the cat’s meow”, you would think I was joking. The term “counterfactual” is a bit misleading because there can be would-counterfactuals with factual antecedents. For example if I had written that the Consequence Argument was formulated by Peter van Inwagen, you would have read the name Peter van Inwagen. And I did; and you did. What the would statement adds, beyond the simple statement that I did so write and you did so read, is the idea that there’s a robust connection between those things. A counterfactual is a bigger claim than the “if” from propositional logic (which logicians often symbolize with “⊃” — so A ⊃ B simply means that it’s not the case both that A is true and B is false.) Note that counterfactual antecedents and consequents (the “if…” and “would…” parts, respectively) range over processes, events (including boring events like a particular state obtaining at a particular time), and actions.
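The weakness of the propositional-logic “if” is easy to display exhaustively (a quick sketch; note that A ⊃ B comes out true whenever A is false, one reason the counterfactual is the bigger claim):

```python
# Material conditional: A ⊃ B is just "not (A and not B)".
def implies(a: bool, b: bool) -> bool:
    return not (a and not b)

for a in (True, False):
    for b in (True, False):
        print(f"{a!s:5} ⊃ {b!s:5} = {implies(a, b)}")
# The conditional is false only in the single row where A is true and B is false;
# it says nothing about any robust connection between A and B.
```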

The premises of the Consequence Argument that we’ll question are (and here let’s spell out the counterfactuals contained in the shorter versions):

(2) The distant past state of the universe is such that, for every action A you could take, if you did A, that past state would still obtain.

(3) The laws of nature are such that, for every action A you could take, if you did A, those laws would still obtain.

And we won’t question premise (1) from the last post, nor the formulation of scientific determinism from which it follows, but we will take a good hard look at that formulation. It turns out not to say some of the things that we might on first glance think it says.

Laws of nature

As I mentioned last time, premise (3) is controversial among philosophers. There’s a dispute over what kind of thing the “laws of nature” are. On the view called the Best Systems Analysis, laws of nature are just a sort of summary of facts. This view is held by the most zealous fans of Occam’s Razor — or at least, that’s what I think they would claim — because it avoids treating laws of nature as a deeply separate category of facts. (Occam’s Razor says “Entities should not be multiplied without necessity.”) Terrance Tomkow has a good explanation of the Best Systems Analysis at: https://tomkow.typepad.com/tomkowcom/2013/09/the-computational-theory-of-natural-laws.html and https://tomkow.typepad.com/tomkowcom/2014/02/computation-laws-and-supervenience.html

The Best Systems Analysis has a model in Algorithmic Information Theory; a very crude stand-in for that is a .zip file, such as you might create from a .txt document on a computer. The .zip file contains compression rules plus compressed data (and the program that reads and writes zip files, say 7zip, contains additional compression rules). The compression rules (including those in the zip-making program) are like the laws of nature, such as the mass and charge of an electron; the remaining data is like the remaining facts, such as the locations of electrons at particular times. Different programs, say 7zip vs the Windows file compressor, might make somewhat different decisions about how to divide up the totality of information in a text file into “compression rules” vs “raw data”. And of course, if you put a different text file into the zip program, you typically get output containing both different “rules” and different “raw data”. For example, if I have a text file where most lines consist of a lot of spaces, the compressor program might make a rule where one special character represents 9 consecutive spaces and another represents 3 consecutive spaces. Then a line consisting of 13 spaces can be abbreviated with 3 characters. But if the text file instead contains a lot of consecutive z’s and no consecutive spaces, the rules part of the compressed file will contain rules representing z’s instead.
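The analogy can be tried directly, with Python’s zlib compressor standing in for 7zip (the texts below are invented for illustration):

```python
import random
import zlib

# Highly regular "facts": short summaries exist, but different rules suit each.
spaces_text = ("word" + " " * 13) * 1000
zs_text = ("word" + "z" * 13) * 1000

# Patternless "facts": no short summary exists.
random.seed(42)
random_text = "".join(chr(random.randint(33, 122)) for _ in range(17000))

spaces_zip = zlib.compress(spaces_text.encode())
zs_zip = zlib.compress(zs_text.encode())
random_zip = zlib.compress(random_text.encode())

print(len(spaces_zip), len(zs_zip))  # tiny: the regularities act as "laws"
print(len(random_zip))               # nearly as big as the original: no "laws" to find
print(spaces_zip == zs_zip)          # False: different facts, different best summary
```

Change the underlying “facts” and the best summary, rules included, changes with them, which is the point the Best Systems Analysis turns on.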

It’s this last point that casts doubt on the immunity of the laws of nature to human action. If human actions were distributed differently, the totality of physical facts of the universe would be different, so different “compression rules” might most efficiently summarize the total physical information. If laws of nature are nothing but efficient summary rules for physical information, then they depend on all that information, the human-related bits included every bit as much as the rest. The individual ground-level facts, including what people are doing, are fundamental on the Best Systems Analysis. The laws are consequences of those facts, not governors of them.

Now, it would be nice if I could say whether the Best Systems Analysis is correct. But all I can do is register my hazy suspicion that it’s not. (My thoughts on that aren’t even worth setting down.) So it seems we are stuck on this part. But wait! What’s that smell? Yes, it’s the sweet smell of unnecessary work! (Hat tip: Dilbert comic.) There’s a chance we don’t have to decide about Premise 3 of the Consequence Argument, and thus we don’t have to decide about the Best Systems Analysis of laws. We don’t need to evaluate Premise 3 if we can undermine Premise 2 of the Consequence Argument. And we can.

The Past

Let’s look again at Premise 1 of the Consequence Argument, or better yet, at our definition of scientific determinism (abbreviated SD), from which Premise 1 followed:

(SD) Determinism requires a world that (a) has a well-defined state or description, at any given time, and (b) laws of nature that are true at all places and times. If we have all these, then if (a) and (b) together logically entail the state of the world at all other times (or, at least, all times later than that given in (a)), the world is deterministic.

Stanford Encyclopedia of Philosophy entry on “causal determinism”

SD says that the laws plus a complete description can logically entail the state of the world, either symmetrically both into the past and future, or asymmetrically just into the future. But the actual laws of physics that science has given us to this point are time-symmetric in exactly this sense, at least when they are deterministic. Conservation of information in quantum mechanics is a case in point. (There are deterministic interpretations of quantum mechanics, such as the Everett Interpretation, which interpret quantum probabilities as statements of rational expectation in the face of partial ignorance.) Because of conservation of information, the final state of a quantum system plus the environment, after an interaction, must contain the information that the system had beforehand. In other words, from the later state plus the laws of quantum mechanics, the earlier state is logically implied. Scientific determinism is a two-way street.
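A deterministic, time-symmetric law can be mimicked with a toy invertible map on pairs of integers (made-up discrete dynamics, not real physics). The later state plus the law logically entails the earlier state, just as SD allows:

```python
# A deterministic, invertible "law": a discrete cat-map-style dynamics on
# pairs of integers mod N. Running the inverse law retrodicts earlier states
# exactly, so determinism here is a two-way street.
N = 101

def step(state):
    x, y = state
    return ((2 * x + y) % N, (x + y) % N)

def step_back(state):
    """Exact inverse of step (the matrix [[2,1],[1,1]] has determinant 1)."""
    x, y = state
    return ((x - y) % N, (2 * y - x) % N)

past = (12, 34)
present = past
for _ in range(1000):
    present = step(present)

retrodicted = present
for _ in range(1000):
    retrodicted = step_back(retrodicted)

print(retrodicted == past)  # True: the later state plus the law fixes the earlier state
```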

But now notice: “causality” is supposed to be a one-way street. A cause is not supposed to be itself caused by the very thing it supposedly caused. Let’s make this part of the definition of “cause”: causation is asymmetric, so that “A causes B” and “B causes A” are contraries. It immediately follows that

(Determinism ≠ Causality) The existence of laws of nature that logically entail state B at one time given state A at another, does not suffice to show that A causes B.

Causation is not interdependence, but one-way dependence. For example, a room-temperature cup of water could be caused by leaving a cup of steaming hot water in the room until the temperature equilibrated with the room. Or it could be caused by leaving a cup of ice water in the room for a long time, instead. The effect is guaranteed by the cause, but no particular cause is guaranteed by the later state. Without this kind of asymmetry in the relationship, causality is lacking, as physicist Sean Carroll explains in this 3-minute video. “There’s just a pattern that particles follow,” he says. “Kind of like how the integer after 42 is 43, and the integer before it is 41, but 42 doesn’t cause 41 or 43, there’s just a pattern traced out by those numbers.”

So where does causality come from? I’ll give two answers reflecting different interpretations of “causality” – both of them useful in different ways. On one reasonable interpretation (used by Sean Carroll), causality comes from entropy. On another reasonable interpretation, given by Judea Pearl in his book Causality, causality comes from our division of the world into a system of interest vs exogenous variables.

Now let’s look again at the Consequence Argument’s premise

(2) The distant past state of the universe is such that, for every action A you could take, if you did A, that past state would still obtain.

Is it true? If the “distant past state” is given macroscopically, describing such things as glasses of water and their temperature, then (2) is true, but not adequate for the argument, because (as we’ll see later) the present state of the universe doesn’t follow from the past macroscopic state. But if the “distant past state” is given in microscopic detail, then it is not independent of the present state including what we are doing now.

There is no reason to believe (2), where the states in question are described in microscopic detail. Or rather, there is a reason, but it evaporates once you realize that there’s another explanation for why we never observe the past depending on the present or future. We don’t need to posit a magical “flow of time”, or a universal master/slave relationship between past and future. The idea that the past has power over the future but not vice versa is an overgeneralization from our experience of the macroscopic world: our experience of states and processes large and complex enough for entropy to be well-defined and increasing in only one temporal direction.

All this has gone by way too fast, and there are many points that need further justification and explanation. Along the way, we’ll use modern science to deeply challenge our intuitive conceptions of time and causality, then show how those wrong intuitions about how our universe works have affected our views of the free will “problem”. Scientific determinism isn’t the problem – our misconceptions of it are. The traditional free will problem doesn’t hinge on the definition of “free will”, but on the definitions of “determinism” and “causality”.