Probabilistic Systems: An Alternative View of Statistical Mechanics

1. Abstract (or why I am writing this and why you should read it)
Thermodynamics is cool (pun intended). Its main characters are heat, the equipartition theorem, and the Boltzmann distribution. If you aren't familiar with these figures, you will be soon. But its star is that most elusive of concepts, a mainstay in physics, probability, and computer science. The purveyor of inevitable order and universal doom: entropy. Wanting to understand entropy on a deeper level, while avoiding the messiness of the second law and Maxwell's demon, and understanding its application to other fields, is the driving force of this work. Other thermodynamic theorems were in close reach while investigating entropy. This work is the casual presentation of that overarching investigation, trying to be more Feynman lecture than physics paper. Within this piece, you will find alternative derivations for many significant theorems of thermodynamics. The necessary mathematics isn't exactly a walk in the park. Still, it shouldn't be out of reach if you know a little about calculus, probability and linear algebra. If it is, let me know. And especially let me know where I made one of many inevitable mistakes. The work isn't even that physical. The theorems are all derived with the minimum possible ingredients, making them applicable to a broad class of probabilistic systems rather than the specific class where the laws of physics apply.

2. Probabilistic Systems (or, how to answer silly questions)
Two people are throwing a ball at each other. Where is the ball? This may seem like a silly question, and it sort of is. We have been given painfully little information! If we knew nothing else, we could only say the ball is at *some* point between the two, or x ∈ [0, L]. As we have no further information, we'll supply a uniform distribution, ρ(x) = 1/L for x ∈ [0, L]. This says that some specific claim about where the ball is, such as it being at point x = 0.2, is no more reasonable than saying x = 0.7. Our probability distribution would be unfair to assign a greater probability to either one. Therefore, it is uniform from 0 through to L. If we were given more information, we could change this distribution. For example, if it is given that the ball is closer to person A than person B, we would have ρ(x) = 2/L for x ∈ [0, L/2]. We keep the situations where the ball is closer to A, and discard those where it is closer to B. This is called conditioning, and it is how we incorporate new information into a model. If the ball were moving with velocity v to the right, a state where it is in position x would, dt seconds later, be a state where it is in position x + v·dt. We could then update our model to ρ(x) = 2/L for x ∈ [v·dt, v·dt + L/2]. Our best model of the ball's position is now a uniform distribution between those two new bounds. Good job, we have just built a model to handle states in a probabilistic system! (A short numerical sketch of this model follows below.)

In physics, we are often dealing with equally silly questions and, indeed, far sillier ones. Where are the gas molecules? All trillions of them. Trillions of trillions of them. Or what we innocently call a "mole of molecules" (Avogadro's number, the number of molecules in a mole, is N_A = 6.022×10²³ ≈ (10¹²)²). Thankfully, silly questions that don't give much information can be satisfied with silly answers that give equally little information. I don't have to say exactly how molecule X is going to bounce around, if a general pattern is all that can reasonably be asked for.
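Before moving on, here is the ball model above as a minimal numerical sketch (an illustration only, not part of the original setup; the values L = 1, v = 0.3 and dt = 0.1 are arbitrary). The model is represented by samples: conditioning keeps only the states consistent with the new information, and evolution shifts every surviving state.

```python
import numpy as np

rng = np.random.default_rng(0)
L, v, dt = 1.0, 0.3, 0.1          # arbitrary illustrative values

# Start with the uniform model: the ball is somewhere in [0, L].
samples = rng.uniform(0.0, L, size=100_000)

# Conditioning: keep only states where the ball is closer to person A (x < L/2).
samples = samples[samples < L / 2]

# Evolution: every state moves right by v*dt, so the model shifts with it.
samples = samples + v * dt

print(samples.min(), samples.max())   # ≈ [v·dt, v·dt + L/2], density ≈ 2/L inside
```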
Complex systems with unfathomably many variables are boiled down into simple models which fit our simple observations. Thus we get emergence, a most wonderful process where a compound system adopts properties and patterns not found in its constituents.

3. States and Models
Let's try to mathematically work with these sorts of systems. We'll need to set up some core mechanics for our models. So let's represent states, physical or not, as continuous variables in ℝⁿ that evolve deterministically. These individual states have predictable dynamics, updating according to some evolution map τ: ℝⁿ → ℝⁿ. This can be thought of as offering a naturally evolving path for states,
d/dt s = τ(s)
In practice, however, we will work with uncertain systems where one of many states may be the case. A model Φ gives the likelihood of states by some density function ρ(s). This is the proportion of states in some small volume around s, divided by that volume. As the volume gets really, really small, halving it will halve the proportion of states within it. As it's really small, the probability is pretty much the same throughout it. So ever smaller volumes give a converged ρ(s) density value, assuming necessary smoothness and all that. Models also have bounds, such that states are contained within the bounds with probability 1. A calculus trick involving surface integrals will make bounds very important, and allow various theorems to be derived. Bounds, after all, ensure models are not completely erratic.

3.1. Change in Probability
If we only know how states s(t) evolve, how do we find how models ρ(s, t) evolve? Consider a model before and after an evolution, and how the density ρ changes. We will have evolutions occur infinitesimally and continuously. Basically, without jumps. Things don't usually teleport, so we should be ok there. For the maths to make sense, we need to be able to think of a sort of map of different states, called a phase space. The probability of states then becomes akin to asking: where is everyone? We've already established that we know where everyone is going, by τ, so if we know where people are, we can find out where they will be! Let's draw out an area Z of our map, and see how the number of people there changes over time. This is the same as seeing how many people enter, and how many leave. We go along the boundary, see how many people are nearby with ρ(s), and see whether their direction of movement τ(s) means they are entering, exiting, or just travelling along the border dn. Mathematically,
d/dt P(Z) = −∮ ρ(s)·τ(s)·dn(s)
with ρ(s) giving the moveable density and τ(s)·dn(s) the motion against the surface of Z.
Now we'll be using an incredibly useful mathematical trick. Suppose I wanted to find the difference between this year's earnings for some company and its earnings 10 years ago. I could either just, well, take the difference, or add up the incremental differences in earnings from year to year.
Δ=f(10)-f(0)
=(f(10)-f(9))+(f(9)-f(8))+...+(f(1)-f(0))
This same principle can be applied to our surface integral over the boundary! Instead of taking the value at the surface of the volume, we add up all the differences inside the volume, with ∇ (called nabla) adding up the changes along each dimension. ∇ = Σ_i ∂_i; this will be explained in my upcoming linear algebra post. But basically, it acts very similarly to a bunch of derivatives in each dimension added together because, well, it is. We get,
d/dt P(Z) = −∫_Z ∇·(ρ(s)·τ(s))·ds
Now we make Z very very small, and compensate for the change in volume. This leaves us with,
d/dt ρ(s) = −∫_Z ∇·(ρ(s)·τ(s))·ds / ∫_Z ds
= −∇·(ρ(s)·τ(s))
And we can expand this, using the product rule for derivatives.
d/dt ρ(s) = −∇ρ(s)·τ(s) − ρ(s)·∇·τ(s)
where the first term on the right is the content of Liouville's theorem.¹
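If you'd like to check that expansion symbolically, here is a small sketch using sympy, with an arbitrary two-dimensional ρ and τ invented purely for the check:

```python
import sympy as sp

x, y = sp.symbols('x y')
rho = sp.exp(-x**2 - y**2)          # an arbitrary smooth density
tau = sp.Matrix([sp.sin(y), x*y])   # an arbitrary evolution map

div = lambda F: sp.diff(F[0], x) + sp.diff(F[1], y)
grad = lambda f: sp.Matrix([sp.diff(f, x), sp.diff(f, y)])

lhs = div(rho * tau)                             # ∇·(ρτ)
rhs = grad(rho).dot(tau) + rho * div(tau)        # ∇ρ·τ + ρ∇·τ
print(sp.simplify(lhs - rhs))                    # 0: the two sides agree
```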
3.2. Convergence
A converged model ceases to change over time. This occurs when the underlying probability distribution which defines the model is static, and therefore we have;
d/dt ρ(s) = 0
Using (2) we get,
−∇·(ρ(s)·τ(s)) = 0
And with the expanded expression (3),
−∇ρ(s)·τ(s) = ρ(s)·∇·τ(s)
This also implies that when ∇·τ(s) = 0, convergence is achieved by ∇ρ(s) = 0: in other words, a uniform distribution. We will often deal with systems which we know little about other than that they have converged. When dealing with systems in such an equilibrium, these expressions will be the starting block for figuring out ρ(s) values, and will prove useful.

4. Entropy
A model's entropy S_Φ measures its self-uncertainty. In statistics, there is the concept of likelihood, which is the probability of a model producing a set of observations. The closer the set of observations approaches a typical sample from the model, the higher the likelihood. The likelihood of a model over samples drawn from itself is a measure of its inherent randomness. A low self-likelihood means samples drawn appear erratic even when the distribution is known. A high self-likelihood means we have a reasonable idea of what a typical sample looks like. Entropy is two operations away from likelihood: take a logarithm, then negate. So high entropy means a chaotic system, while low entropy means a predictable one. For a continuous model Φ, the expression is,
S_Φ = −∫ ρ_Φ(s)·ln ρ_Φ(s)·ds
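As a quick numerical illustration (a sketch; the grid and the value L = 4 are arbitrary): a uniform model on [0, L] should give S = ln L, and a more concentrated model should give less.

```python
import numpy as np

def entropy(rho, ds):
    """Numerically integrate -∫ ρ ln ρ ds on a grid, skipping zero-density cells."""
    p = rho[rho > 0]
    return -np.sum(p * np.log(p)) * ds

s = np.linspace(0, 10, 100_001)
ds = s[1] - s[0]

L = 4.0
uniform = np.where(s < L, 1.0 / L, 0.0)       # uniform on [0, L]
narrow  = np.where(s < L / 2, 2.0 / L, 0.0)   # uniform on [0, L/2]

print(entropy(uniform, ds), np.log(L))        # both ≈ 1.386
print(entropy(narrow, ds))                    # ≈ 0.693: more concentrated, lower entropy
```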
This is closely related to ideas from information theory, the mathematics of messiness and lossiness. Ideas from information theory will be used to better understand some of the strange behaviours that entropy exhibits.

4.1. Entropy of Models
4.1.1. Quick Note on Angle Brackets
Angle brackets, ⟨·⟩, give a shorthand for averages according to the current model. In general, we have,
∫ f(s)·ρ(s)·ds = ⟨f(s)⟩
Meaning we can express entropy as S_Φ = ⟨l(s)⟩, where l(s) = −ln ρ(s).

4.1.2. Change in Entropy
Let's try to figure out how entropy changes over time. The relationship between entropy and time is very, very important. It forms the basis of the second law of thermodynamics, which states entropy can never decrease. Fortunately, we can use a few mathematical tricks to get a relatively clean expression for the change of entropy over time. Starting with,
d/dt S_Φ = −d/dt ∫ ρ(s)·ln ρ(s)·ds
= −∫ (d/dt (ρ(s)·ln ρ(s)))·ds
= −∫ [(d/dt ρ(s))·ln ρ(s) + ρ(s)·(d/dt ln ρ(s))]·ds
= −∫ [(dρ(s)/dt)·ln ρ(s) + ρ(s)·(dρ(s)/dt)·(1/ρ(s))]·ds     (chain rule)
= ∫ (−dρ(s)/dt)·(ln ρ(s) + 1)·ds
And substituting (2),
d/dt S_Φ = ∫ ∇·(ρ(s)·τ(s))·(ln ρ(s) + 1)·ds
From the product rule, we have d(uv) = u·dv + v·du, and therefore du·v = d(uv) − u·dv. This same idea is applied to the multidimensional derivative ∇. You can think of it like a sum of derivatives over multiple dimensions, because that's exactly what it is.
d/dt S_Φ = ∫ [∇·(ρ(s)·τ(s)·(ln ρ(s) + 1)) − (ρ(s)·τ(s))·∇(ln ρ(s) + 1)]·ds
Here we do the surface-volume integral trick from (1) in reverse. The same way a sum of small changes is equal to the total change, a sum of small changes across a region is the same as the difference in values around the edges. This gives us a surface integral around the bounds,
d/dt S_Φ = ∮ (ln ρ(s) + 1)·ρ(s)·τ(s)·dn(s) − ∫ ρ(s)·τ(s)·∇ln ρ(s)·ds
where the surface term is 0 at the bounds.
The bounds are defined as the volume containing all states, so movement through the bounds has to be zero. If it weren't, they wouldn't be the bounds! This leaves us with,
d/dt S_Φ = −∫ ρ(s)·τ(s)·∇ln ρ(s)·ds
= ⟨τ(s)·∇l(s)⟩
This is one way of viewing the change in entropy: how aligned the evolutions τ(s) are with the gradients of l(s). This variable, l(s) = −ln ρ(s), related to the information-theoretic symbol length, has a very important physical analogue. Spoiler: it's a scalar multiple of energy and/or other conserved variables. The interpretation that energy represents limited resources capable of expression is somewhat justifiable. However, now is the time for maths, not philosophy, so starting at (5) we look at an alternative expression,
d/dt S_Φ = −∫ ρ(s)·τ(s)·(∇ρ(s)/ρ(s))·ds
= −∫ τ(s)·∇ρ(s)·ds
Again we use the product rule trick, realising that u.dv=d(uv)-v.du,
d/dt S_Φ = −∫ [∇·(τ(s)·ρ(s)) − ρ(s)·∇·τ(s)]·ds
= −∮ ρ(s)·τ(s)·dn(s) + ∫ ρ(s)·∇·τ(s)·ds     (the surface term is again 0 at the bounds)
= ⟨∇·τ(s)⟩
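A quick numerical sanity check of this result (a sketch, not part of the derivation; a and σ₀ are arbitrary): take the one-dimensional flow τ(s) = a·s, which has constant feedback ∇·τ = a. A Gaussian model evolved under this flow stays Gaussian with σ(t) = σ₀·e^(a·t), so its entropy ½·ln(2πeσ²) should grow at rate a.

```python
import numpy as np

a, sigma0 = 0.3, 1.0                 # arbitrary illustrative values
rng = np.random.default_rng(1)
samples = rng.normal(0.0, sigma0, size=200_000)

dt, steps = 0.01, 200
entropies = []
for _ in range(steps):
    samples = samples + a * samples * dt          # evolve each state by τ(s) = a·s
    sigma = samples.std()
    entropies.append(0.5 * np.log(2 * np.pi * np.e * sigma**2))

rate = (entropies[-1] - entropies[0]) / ((steps - 1) * dt)
print(rate, a)    # both ≈ 0.3, i.e. dS/dt ≈ ⟨∇·τ⟩
```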
So, the change in entropy is equivalent to the average of ∇·τ(s). We will call ∇·τ(s) the feedback, a property of states, and it happens to be zero in Hamiltonian physics. This implies zero natural change in entropy for a perfectly evolving model. We can refine how various axioms are linked to each other to show an equivalence between the second law (entropy of physical models is non-decreasing) and feedback being zero everywhere.

4.2. Equivalence of the Second Law and ⟨∇·τ(s)⟩ = 0
This section of the work will quickly reveal how much of a logic nerd I am. I think it's better to sometimes get the raw symbology out of the way, as paper is more efficient at storing working memory than a carbon-based neural network. It also has the added benefit of achieving one of the central goals of this work: showing that core thermodynamic theorems are equivalent to other constructs, revealing what makes them tick.

First let's assume the second law, d/dt S_Φ ≥ 0, holds for all models. This implies that ⟨∇·τ(s)⟩ ≥ 0 for all models, which naturally includes models tightly localised around any single point, so we get ∇·τ(s) ≥ 0 for all possible states. Let us also assume that a converged model which includes all states inside its bounds is possible. In that case, we require that ⟨∇·τ(s)⟩ = 0 holds for that model, as convergence implies no property of the model changes. As no state can have negative feedback, all states within the converged model need to have ∇·τ(s) = 0. Therefore, we get,
∀Φ: d/dt S_Φ ≥ 0, and a converged model containing all states is possible
⟹ ∀s: ∇·τ(s) = 0
On the other hand, we can show that,
∀s: ∇·τ(s) = 0 ⟹ ⟨∇·τ(s)⟩ = 0 ⟹ d/dt S_Φ = 0 ⟹ d/dt S_Φ ≥ 0
We showed in (4) that ∇·τ(s) = 0 and a uniform distribution of states within a bound imply that convergence has occurred. A uniform distribution of states is possible if some sort of bounds exists, so we get,
∀s: ∇·τ(s) = 0, and bounds exist
⟹ ∀Φ: d/dt S_Φ ≥ 0, and a converged model containing all states is possible
The pattern A⟹B and B⟹A is known by another name: equivalence! A⟺B is the same as saying the two statements are the same.
∀s: ∇·τ(s) = 0, and bounds exist
⟺ ∀Φ: d/dt S_Φ ≥ 0, and a converged model containing all states is possible
A powerful implication of this is that if ∇·τ(s) = 0 and bounds exist, then d/dt S_Φ ≥ 0, in addition to a few other key properties. A situation where any of the statements on the bottom row are false would imply that at least one of the statements on the top row is as well. Remember, equivalence! We can say that,
∃Φ: d/dt S_Φ < 0
⟹ ∃s: ∇·τ(s) < 0, or bounds do not exist
And in another way,
∃Φ: d/dt S_Φ < 0, and bounds exist
⟹ ∃s: ∇·τ(s) < 0
This solves for Maxwell's demon, and similar paradoxes that try to grapple with the second law. If someone claims that entropy has decreased for some model, they are implying that ∇·τ(s) < 0 holds for some specific state. If the model is bounded, which can be achieved by effectively placing it in a very large closed box, the situation would have to be breaking a physical law somewhere, somehow, by requiring negative feedback. Now consider the case of Maxwell's demon. Here, a little devil opens and closes a tiny door so that oxygen molecules end up on one side of a barrier and nitrogen molecules on the other, decreasing the entropy of the air. If the starting assumption is uniformly distributed, well-mixed molecules, the final system would have to be, on average, equally or more entropic. Otherwise, negative feedback would have had to occur for some state.

5. Lossiness and Information Theory
Back in (7) we jumped past d/dt S_Φ = 0 straight to d/dt S_Φ ≥ 0 in order to imitate the typical formulation of the second law. If we do not make this jump, we can show that in a perfectly evolving physical system,
∀s: ∇·τ(s) = 0, and bounds exist
⟺ ∀Φ: d/dt S_Φ = 0, a converged model is possible, and bounds exist.
d/dt S_Φ ≥ 0 therefore comes from somewhere, but the requirement from physics that ∀s: ∇·τ(s) = 0 isn't it. In fact, we haven't actually used much physics so far. Yes, we've been describing physical systems, but with enough distance that a broad range of systems can have the above logic applied. The theorems so far haven't required knowledge of, say, Hamiltonian mechanics other than ∇·τ(s) = 0, or conservation laws.

A model Φ gives us a distribution of probabilities over states, ρ(0, s). Over time, the states and the model evolve, giving some future ρ(t, s). Different states in a model evolve in parallel, so if ρ(0, s) = (ρ₀(0, s) + ρ₁(0, s))/2 for distributions ρ₀, ρ₁, then we will have ρ(t, s) = (ρ₀(t, s) + ρ₁(t, s))/2. The evolution of a system is the same as the sum of the evolutions of its constituent parts. This linear nature makes it a Markov chain, giving ρ(t, s) = ∫ T(t, u, s)·ρ(0, u)·du, or ρ(t) = T(t)·ρ(0). Ideally, our matrices T(t) are perfectly invertible, just like Platonic ideal evolutions. Given some final state ρ(t) = T(t)·ρ(0), we could deduce ρ(0) = T⁻¹(t)·ρ(t). In practice, this is not the case. Each applied time evolution, other than for analytically reducible systems, has to incorporate some lossiness. This lossiness means an initial probability distribution does not evolve perfectly but rather picks up uncertainty over time. This can arise from computational errors, such as rounding and imperfect increments. Or, it can arise from imperfect analytical projections, such as assuming a gas stays uniformly distributed after being expanded.

5.1. Rounding Error
Let's consider a specific error: rounding. Suppose we let rounding collapse multiple continuous states into a single rounded state. In that case, we will find a reduction in entropy, as the space of outputs is smaller than the space of inputs. We have effectively imported an assumption, that the system collapses into discrete values, for which we do not exactly have a justification. Instead, let's think of rounding as adding a random aspect to the model. It is arbitrary to run a model on x ∈ [0, L] instead of y = x + δ, y ∈ [δ, L + δ]. The δ gives a random seed for how rounding will impact future outcomes. Now, instead of the model evolving simply by ρ(dt) = T(dt)·ρ(0), it evolves by ρ(dt) = R·T(dt)·ρ(0), where R gives a rounding error. As physical models should evolve independently of the shift δ, our computer simulation incorporates an unacknowledged assumption into the model. Another way to think about it is that running a simulation only once, with one choice of a shifting δ, is not physically robust. We would get a more accurate idea by running 10 models with δ = {0, 0.1, 0.2, ..., 0.9}. These 10 simulations, though divergent, acknowledge that rounding randomness has been included, and the chaotic fingerprints of the second law arise. This rounding error extends to any invariant, from a shift in positions as shown to a rotation in any dimension. In the case of gases, lifting a barrier or the like will technically result in a state which, if played in reverse, will eventually return all the molecules behind the now hypothetical barrier. However, this information is typically discarded in analysing such systems. As the information is lost, the gases are assumed to follow the converged model under their new limitations, and hence have a higher entropy.
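A toy illustration of the difference (a sketch; the four-state model and the mixing strength are arbitrary): a permutation matrix is a perfectly invertible T and leaves entropy alone, while a doubly stochastic T that smears probability, standing in for rounding or other lossiness, pushes the model toward uniform and can only raise its entropy.

```python
import numpy as np

def shannon_entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

p0 = np.array([0.7, 0.2, 0.1, 0.0])          # an arbitrary initial model over 4 states

perm = np.eye(4)[[1, 2, 3, 0]]               # invertible evolution: a pure relabelling
noisy = 0.9 * perm + 0.025 * np.ones((4, 4)) # doubly stochastic: evolution plus smearing

p_perm, p_noisy = p0.copy(), p0.copy()
for _ in range(50):
    p_perm = perm @ p_perm
    p_noisy = noisy @ p_noisy

print(shannon_entropy(p0))        # ≈ 0.802
print(shannon_entropy(p_perm))    # ≈ 0.802: nothing lost
print(shannon_entropy(p_noisy))   # → ln 4 ≈ 1.386: lossiness pushes toward uniform
```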
6. Conserved Properties, Stochastic Transfer and the Boltzmann Distribution
Suppose all states in a model have some conserved quantity Q(s), and that states have effectively random evolutions. This is the case for the most common application of thermodynamics: gases with incalculable collisions. The key to modelling these sorts of systems is to impose conservation on micro-transfers between variables, such that any overall sum of micro-transfers also maintains conservation. The randomness becomes irrelevant, and even in some way useful. Knowing that effectively anything can happen limits us to considering the sorts of rules which have to hold for everything. For some state s ∈ ℝⁿ, we will say variables indexed by k and l are connected if micro-transfers between them are possible, which is to say one can randomly alter the other by some amount. Let K be a set of tuples of all connected pairs, and w_kl(s) be a weighting for δ_kl(s). Letting Q_l, = ∂Q/∂s_l, we have δ_kl(s) be a micro-transfer equal to,
δ_kl(s) = x̂_k·Q_l,(s) − x̂_l·Q_k,(s)
We can rewrite any evolution map 𝜏(s) as,
τ(s) = Σ_{(k,l)∈K} w_kl(s)·δ_kl(s)
The significance of δ_kl(s) is that it conserves Q on a local level, and thus any sum of δ_kl(s) terms also maintains Q as constant, even if the weightings are effectively random. Mathematically, we can show this through,
Q(s + dt·Σ_{(k,l)∈K} w_kl(s)·(x̂_k·Q_l, − x̂_l·Q_k,)) = Q(s) + dt·Σ_{(k,l)∈K} w_kl(s)·(Q_k,·Q_l, − Q_l,·Q_k,)
= Q(s) + dt·Σ_{(k,l)∈K} w_kl(s)·(0)
= Q(s)
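Here is a small symbolic check of this conservation property (a sketch; the particular Q and weightings are arbitrary): the directional derivative of Q along a single micro-transfer, and along a weighted sum of micro-transfers, both vanish.

```python
import sympy as sp

s1, s2, s3 = sp.symbols('s1 s2 s3')
s = [s1, s2, s3]
Q = s1**2 / 2 + s2**2 / 2 + s3**4          # an arbitrary conserved quantity

def delta(k, l):
    """Micro-transfer between variables k and l: x̂_k·Q_l, − x̂_l·Q_k,."""
    d = [0, 0, 0]
    d[k], d[l] = sp.diff(Q, s[l]), -sp.diff(Q, s[k])
    return sp.Matrix(d)

# An arbitrary weighted sum of micro-transfers over connected pairs.
tau = 3 * delta(0, 1) + sp.sin(s1) * delta(1, 2)

gradQ = sp.Matrix([sp.diff(Q, v) for v in s])
print(sp.simplify(tau.dot(gradQ)))          # 0: Q is conserved along τ
```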
This means our micro-transfers expression captures the possible continuous evolutions.

6.1. Change in Expectation
At convergence, models stop changing. This naturally implies expectations don't change. This gives us,
d/dt ⟨f⟩ = ∫ f(s)·(dρ(s)/dt)·ds
0 = ∫ f(s)·∇·(ρ(s)·τ(s))·ds     (using (2); the overall minus sign is dropped, since the whole expression equals zero)
= ∫ [∇·(f(s)·ρ(s)·τ(s)) − ∇f(s)·(ρ(s)·τ(s))]·ds
0 = ∮ f(s)·ρ(s)·τ(s)·dn(s) − ∫ ∇f(s)·τ(s)·ρ(s)·ds
Now, we can analyse this with our micro-transfers method. To ensure the above holds for the effectively random τ(s) we see in practice, we will analyse it for arbitrary δ_kl(s) values. So we get,
0 = ∮ f(s)·ρ(s)·δ_kl(s)·dn(s) − ∫ ∇f(s)·δ_kl(s)·ρ(s)·ds
6.2. Unbounded k, l and the Generating Function
First, let's assume s_k and s_l are unbounded, or close enough that the probability that either is at its bounds is essentially zero, ρ(s) → 0 as s → ∞. This could describe the energy of a single particle out of millions or a single consumer in an economy. The small chance that a single particle has all the energy, or that one consumer spends all the money, is negligible. The below can be thought of as holding for some sufficiently large number of particles N. Looking at (8), we get,
0 = ∫ ∇f(s)·δ_kl(s)·ρ(s)·ds
= ⟨∇f·δ_kl⟩
= ⟨f_k,·Q_l, − f_l,·Q_k,⟩
⟨f_k,·Q_l,⟩ = ⟨f_l,·Q_k,⟩
This then becomes the generating function. Replacing f with some specific value automatically generates different theorems regarding converged models.

6.3. The Equipartition Theorem
First, we substitute f(s) = s_k·s_l to get;
⟨∂/∂s_k (s_k·s_l)·Q_l,⟩ = ⟨∂/∂s_l (s_k·s_l)·Q_k,⟩
⟨s_l·Q_l,⟩ = ⟨s_k·Q_k,⟩
T_Ql = T_Qk
This equilibrium T_Ql is shared across k and l if they are connected for the conserved variable Q. Next, substituting f(s) = (1/2)·(s_k² + s_l²) we get,
⟨s_k·Q_l,⟩ = ⟨s_l·Q_k,⟩
We now also assume that s_k and s_l are independently distributed across the model. In addition, we assert that Q_l, does not depend on s_k, and Q_k, does not depend on s_l. This is an assumption of overall independence within the system; in a sufficiently chaotic one, two random particles should behave in such a way.
<sk><Ql,>=<sl><Qk,>
If any of the above averages are zero, for any of the connected components, we get,
<sk><Ql,>=<sl><Qk,>=0
This cascades across connected components. If (9) holds for any l and k, it also has to hold for any variable connected to either one. Finally, we see that we can express the general case; overall, for connected components i, j, we can state,
⟨s_i·Q_j,⟩ = T_Q for i = j, and ⟨s_i·Q_j,⟩ = 0 for i ≠ j
This is the equipartition theorem in its typical form.
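A Monte Carlo check of this statement (a sketch, with an arbitrary quadratic Q and T_Q = 1, sampling from the exponential model that Section 6.5 derives): the matrix of ⟨s_i·Q_j,⟩ values should come out as T_Q on the diagonal and 0 elsewhere.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1.0                                  # the shared T_Q value (arbitrary)
n, N = 3, 1_000_000                      # 3 connected variables, 10^6 samples

# For Q(s) = Σ s_i²/2 the converged model ρ ∝ exp(-Q/T_Q) is a Gaussian with variance T.
s = rng.normal(0.0, np.sqrt(T), size=(N, n))

Q_grad = s                               # Q_j, = ∂Q/∂s_j = s_j for this choice of Q
estimate = s.T @ Q_grad / N              # matrix of ⟨s_i·Q_j,⟩ estimates
print(np.round(estimate, 3))             # ≈ T on the diagonal, ≈ 0 elsewhere
```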
6.4. Gravity in a Box
For those familiar with thermodynamics, why we are going beyond the typical formulation of the equipartition theorem (10) can be explained by an example: a gas in a box. The distribution of gases in the atmosphere is given by the baric equation, or;
ρ(h) ∝ exp(−E(h)/(k_B·T)) = exp(−m·g·h/(k_B·T))
Using Wolfram Mathematica, we can calculate the expected value of <h.Eh,>=<h.mg> in an infinite atmosphere as,
⟨h·mg⟩ = ∫₀^∞ h·m·g·exp(−m·g·h/(k_B·T))·dh / ∫₀^∞ exp(−m·g·h/(k_B·T))·dh
= (k_B²·T²/(g·m))·(k_B·T/(g·m))⁻¹ = k_B·T
Which gives the equipartition theorem: the movement of particles between 0 and ∞ adds a degree of freedom. Heating an atmosphere is about one degree of freedom more difficult than heating a closed box. However, in a closed box, we do not quite get the equipartition theorem.
⟨h·mg⟩ = ∫₀^β h·m·g·exp(−m·g·h/(k_B·T))·dh / ∫₀^β exp(−m·g·h/(k_B·T))·dh
= (k_B·T·(k_B·T − e^(−g·m·β/(k_B·T))·(k_B·T + g·m·β))/(g·m)) · ((1 − e^(−g·m·β/(k_B·T)))·k_B·T/(g·m))⁻¹
= k_B·T − g·m·β/(e^(g·m·β/(k_B·T)) − 1)
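A quick numerical check of the boxed result (a sketch, with arbitrary values m = g = k_B·T = 1 and box height β = 2):

```python
import numpy as np
from scipy.integrate import quad

m = g = kT = 1.0
beta = 2.0                                   # height of the box (arbitrary)

w = lambda h: np.exp(-m * g * h / kT)        # Boltzmann weight from the baric formula
num, _ = quad(lambda h: h * m * g * w(h), 0, beta)
den, _ = quad(w, 0, beta)

closed_form = kT - g * m * beta / (np.exp(g * m * beta / kT) - 1)
print(num / den, closed_form)                # both ≈ 0.687, i.e. below k_B·T
```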
So the presence of bounds seems to change the nature of the equipartition theorem. To understand the theorem mathematically, we need to be able to understand this case. It will take us on a bit of a journey but will ultimately show that the Boltzmann distribution can be derived from the stochastic transfer assumption.

6.5. Bounded k and the Boltzmann Distribution
From (8) we got,
0 = ∫ f(s)·(d/dt ρ(s))·ds
= ∮ f(s)·ρ(s)·δ_kl(s)·dn(s) − ∫ ∇f(s)·δ_kl(s)·ρ(s)·ds
We will assume s_k is bounded between β_k and β_k′, and that s_l is unbounded. We will also be assuming that their distributions are independent. This reduces (12) down to,
0 = ∮ f(s)·ρ(s)·δ_kl(s)·dn(s) − ∫ ∇f(s)·δ_kl(s)·ρ(s)·ds
We want to separate out ⟨s_l·Q_l,⟩, as we know it to be equal to T_Qk by the equipartition theorem. We do this by substituting f(s) = s_l·g(s_k), where g(s_k) is some continuous function.

6.5.1. Breaking down the bounds
At the boundaries of s_k, the first part of (13) breaks down into,
∮ f(s)·ρ(s)·δ_kl(s)·dn(s) = ∮ s_l·g(s_k)·ρ(s)·Q_l,(s)·ds_\k
δ_kl(s)·dn(s) at the bounds is the movement of δ_kl(s) against the bounds, which gives Q_l,(s), the scale of movement in the bounded k direction, by the "area" of a volume unit with k removed. We will be generously using the independence of s_k and s_l, which allows us to expand E[φ(s_k)·ψ(s_l)] = E[φ(s_k)]·E[ψ(s_l)] for any functions φ and ψ.
∮ s_l·Q_l,(s)·g(s_k)·ρ(s)·ds_\k     (splitting the expectations, using the independence of s_k and s_l)
= ⟨s_l·Q_l,(s) | s_k = β_k or s_k = β_k′⟩ · ∮ g(s_k)·ρ(s)·ds_\k     (the surface integral runs over both extremes of s_k)
= ⟨s_l·Q_l,(s)⟩·[g(s_k)·ρ_sk(s_k)]_{β_k}^{β_k′}     (by independence the condition on s_k can be dropped)
= T_Qk·[g(s_k)·ρ_sk(s_k)]_{β_k}^{β_k′}
6.5.2. Breaking down the interior
Looking at the second part of (13), and treating s_k and s_l as independent variables in the model, we get;
∫ ∇(s_l·g(s_k))·δ_kl(s)·ρ(s)·ds = ∫ (s_l·g′(s_k), g(s_k))·(Q_l,(s), −Q_k,(s))·ρ(s)·ds
= ∫ [g′(s_k)·s_l·Q_l,(s)·ρ(s) − g(s_k)·Q_k,(s)·ρ(s)]·ds
= ⟨g′(s_k)⟩·⟨s_l·Q_l,⟩ − ⟨g(s_k)·Q_k,(s)⟩
= ⟨g′(s_k)⟩·T_Qk − ⟨g(s_k)·Q_k,(s)⟩
6.5.3. Bringing it together
Combining (14) and (15) we get,
T_Qk·[g(s_k)·ρ_sk(s_k)]_{β_k}^{β_k′} = ⟨g′(s_k)⟩·T_Qk − ⟨g(s_k)·Q_k,(s)⟩
⟨g(s_k)·Q_k,(s)⟩ = T_Qk·(⟨g′(s_k)⟩ − [g(s_k)·ρ_sk(s_k)]_{β_k}^{β_k′})
Letting g(x)=x, we get,
⟨s_k·Q_k,(s)⟩ = T_Qk·(1 − [s_k·ρ_sk(s_k)]_{β_k}^{β_k′})
Which explains (11)! However, we can do a slight expansion to get Boltzmann's distribution. One way is to let g(x) = δ(x − β), the Dirac delta. However, we can use a perhaps less gimmicky method with the fundamental lemma of the calculus of variations (what a long, and hence terrible, name). I'll show you how it works in a second. Looking at (16) we have,
0 = T_Qk·∫ [(d/ds_k g(s_k))·ρ_sk(s_k) − d/ds_k (g(s_k)·ρ_sk(s_k))]·ds_k − ∫ g(s_k)·Q_k,(s)·ρ_sk(s_k)·ds_k
(the second term inside the first integral coming from [g(s_k)·ρ_sk(s_k)]_{β_k}^{β_k′})
= −T_Qk·∫ g(s_k)·ρ′_sk(s_k)·ds_k − ∫ g(s_k)·Q_k,(s)·ρ_sk(s_k)·ds_k
0 = ∫ g(s_k)·(T_Qk·ρ′_sk(s_k) + Q_k,(s)·ρ_sk(s_k))·ds_k
Now let's apply math's worst-named theorem. As the integral is zero independently of the choice of g(s_k), we need to have,
0 = T_Qk·ρ′_sk(s_k) + Q_k,(s)·ρ_sk(s_k)
This is the only way to ensure the expression is zero no matter what we choose for g(s_k). Rearranging this expression we can derive,
0 = T_Qk·ρ′_sk(s_k) + Q_k,(s)·ρ_sk(s_k)
ρ′_sk(s_k) = −(Q_k,(s)/T_Qk)·ρ_sk(s_k)
This suggests to us that ρ_sk(s_k) has some sort of exponential form with regards to Q(s). We will define q_k(s) as the contribution of k to Q(s), such that Q_k,(s) = ∂q_k(s)/∂s_k. Typically, this q_k(s) value should only be a function of s_k.
ρ_sk(s_k) = A₀·exp(−q_k(s)/T_Qk), where q_k(s) is the contribution of k to Q(s)
l(s_k) = C₀ + q_k(s)/T_Qk
Giving the Boltzmann distribution.
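A quick simulation in the spirit of the stochastic-transfer assumption (a sketch, not part of the derivation; the number of variables and transfers are arbitrary): start every variable with the same share of Q, repeatedly apply random Q-conserving pairwise transfers, and the single-variable distribution relaxes towards this exponential form.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10_000
q = np.ones(N)                      # everyone starts with the same amount of Q

for _ in range(200_000):
    i, j = rng.integers(0, N, size=2)
    if i == j:
        continue
    pool = q[i] + q[j]
    u = rng.random()
    q[i], q[j] = u * pool, (1 - u) * pool   # a random, Q-conserving micro-transfer

print(q.mean())                     # ≈ 1: the total Q is conserved
# Fraction above a few thresholds vs the exponential prediction exp(-q/T), T = ⟨q⟩ = 1
for x in (1, 2, 3):
    print((q > x).mean(), np.exp(-x))       # roughly 0.37, 0.14, 0.05 in both columns
```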
7. Changing Bounds and Entropy
So far, we have only considered internal or endogenous evolution maps which conserve properties. Often we want to look at what happens when bounds change, or when an external force is applied to a system. Let some evolution map on states be dictated by ν, which is distinct from τ, and which is free to change the conserved properties Q. Letting the path of a state under an exogenous change be dictated by the map d/dv s(v) = ν(s), we have S_Φ(v) evolve similarly to (5) and (6), giving,
d/dv S_Φ(v) = ⟨ν·∇l⟩ = ⟨∇·ν⟩
Rearranging (18), we can get ∇l = ∇Q(s)/T_Q. If all states are connected, so that there is only one T_Q value, we have,
d/dv S_Φ(v) = ⟨ν·∇Q(s)/T_Q⟩ = ⟨∇·ν⟩
The change in Q(s) for some specific state from applying ν is given by ν·∇Q(s): the direction of change times how much Q(s) changes in that direction. We will assume this is constant for all states in the model, so that we only have to deal with one Q(s) value.
d/dv Q(s) = ν·∇Q(s)
(1/T_Q)·d/dv Q(s) = ⟨ν·∇Q(s)/T_Q⟩
d/dv S_Φ(v) = (1/T_Q)·d/dv Q(s) = ⟨∇·ν⟩
If multiple conserved variables exist, each one contributes to the change in entropy. We can express this as,
dS_Φ = Σ_i dQ_i/T_Qi
7.1. A Note on Reversibility
(21) refers to a natural evolution, where the system evolves according to deterministic laws applied at scale. However, entropy can also change because of the factors explained in Section 5. Changes due to lost information, such as rounding error, are irreversible and can cause entropy to change without carefully exchanging quantities between conserved variables. (21) refers to entropy changes under a managed situation, and this caveat applies to all expressions derived from it. More accurately, we could write (21) as,
dS_Φ = dS_lost + Σ_i dQ_i/T_Qi
7.2. Volume as a Conserved Quantity
Volume and heat are the two conserved variables of particular interest for this work. To perform an expansion of volume, we will map every particle along some dimension from x_i to (1 + dV/V)·x_i. This gives an evolution ν = (dV/V)·x_i and feedback,
⟨∇·ν(s)⟩ = Σ_{i=1}^{N} ∂/∂x_i ((dV/V)·x_i)     (summing over the N particles)
= Σ_{i=1}^{N} (dV/V) = N·dV/V
dS_Φ = N·dV/V
Alternatively, we can derive volume using density. To do this, we will assign to each particle an occupancy w_i, the volume of space for which it is the closest particle, such that the sum of occupancies gives V. The relevant s_k variables are the occupancies of different particles, and V_k,(w) = ∂V/∂w_k = 1. The occupancies are effectively unbounded as V and N become large.
⟨s_k·Q_k,(s)⟩ = ⟨w_k·1⟩
⟨w_k⟩ = T_V
Σ_{k=1}^{N} ⟨w_k⟩ = N·T_V
V/N = T_V
Using (21), we can more directly get (23).
∂S_Φ/∂V = 1/T_V = N/V
Combining the energy and volume expressions we get,
dS_Φ = dU(s)/T_U + N·dV/V
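As a quick worked example (not from the original text): hold U fixed and let the volume double. Integrating the volume term of this expression gives ΔS_Φ = ∫_V^{2V} N·dV′/V′ = N·ln 2, the familiar entropy gain of a free expansion.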
7.3. The Ideal Gas Law and the First Law
Rearranging (26), we get a different expression for dU(s). This gives us the first law of thermodynamics, stated as;
dU(s) = T_U·dS_Φ − (T_U·N/V)·dV
with the first term playing the role of "heat" and the second of "work":
"= dQ − p·dV"
We will let T_U·N/V = p, for "pressure". We can show that,
p·V = T_U·N
Which is the ideal gas law. Setting dS_Φ = 0 to analyse a zero-entropy natural expansion, we see that pressure gives the marginal energy of a naturally changing volume of gas.
dU(s) = T_U·0 − (T_U·N/V)·dV
dU(s)/dV = −T_U·N/V = −p
The significance of these derivations is not in what we have done but rather in what we haven't. There has been no mention of Hamiltonian mechanics, other than ⟨∇·τ⟩ = 0, or of F = ma, or arguments involving particles hitting the sides of chambers. An implication of this is that the ideal gas law is independent of F = ma, such as in the case of relativistic physics. Both the ideal gas law and the first law of thermodynamics can be derived with little physics. The only necessary assumptions so far are that large chaotic arrangements can be modelled with stochastic transfers with conserved U and V, that endogenous feedback ∇·τ is zero, and that external changes lead to a change in U and V which is constant for all states. This imposes a certain relationship between energy and volume.

7.4. Generalising Pressure
The above derivation is only for a particular case. It is common in thermodynamics for Van der Waals forces and other forms of pressure to arise. This requires an explanation and a more general way of viewing pressure. Pressure is perhaps best thought of as the volumetric density of energy, the sort of latent expansive power from altering the volume of a substance by some amount. So dU/dV will be our starting point. This allows us to think of pressure for conserved variables other than volume. Volume is ultimately just a conserved variable of a system, but others might arise in future investigations, and we will define pressure for them as dU/dQ_i. A natural evolution with zero entropy change can be taken from (21),
dS_Φ = Σ_i dQ_i/T_Qi
If we set this to zero and consider only two conserved variables, U and Q_α, we get,
dQ_α/T_Qα = −dU/T_U
And from this we get the pressure ratio,
dU/dQ_α = −T_U/T_Qα = −p_Qα
For volumetric pressure we write,
−dU/dV = p_V = T_U/T_V
This is pretty neat! Especially if we consider that T_Qi values have not been obviously intuitive for things other than energy. We can rewrite some previous values, including T_Qi, in terms of their natural pressures p_Qi.
T_Qi = T_U/p_Qi = T_U·dQ_i/dU
When dealing with a large number of conserved variables, we'd often start dealing with partial derivatives and their associated... difficulties. From (21), we write,
∂S/∂Q_i = 1/T_Qi = p_Qi/T_U
This also allows us to rewrite (22) into what many would consider a more recognisable and standard form,
dS_Φ = dS_loss + dU/T_U + Σ_{i≠U} (p_Qi/T_U)·dQ_i
7.5. Van der Waals Force
The Van der Waals force is an attempt to have a more precise form of pressure than the ideal gas law for certain applications. In many cases, it has nicer experimental validity than the standard ideal gas law. Fortunately, it can be readily integrated into the system we have built up so far. To do this we will look at a form of the Van der Waals equation of state and then see how we can use our previous equations to mathematically examine it. The ideal gas law states that,
N.T=p.V
However, the attractive force between particles means a small factor reduces pressure as more are added. A convenient way to represent this is to reduce pressure by a quadratic factor a·(N/V)²,
p → p′ − a·(N/V)²
So we have to compensate our expression for pressure in the ideal gas law by writing,
N·T = (p + a·(N/V)²)·V
Furthermore, V is limited by a sort of minimal amount that each particle occupies. Crowding out naturally leaves less free volume for particles to take up, so we instead use the free volume V − N·b in the ideal gas law,
N·T = (p + a·(N/V)²)·(V − N·b)
This form of the ideal gas law is rather complex, but we're going to try to find a way to incorporate it into our current model! We sort of know what pressure means, so if we can isolate it, we can link the Van der Waals equation of state to the rest of our equations.
N·T = (p + a·(N/V)²)·(V − N·b)
N·T/(V − N·b) = p + a·(N/V)²
p = N·T/(V − N·b) − a·(N/V)²
The equation for entropy from (28) links changes in entropy to pressure,
dS_Φ = dU/T + p·dV/T
Substituting in the Van der Waals pressure equation (30),
dS_Φ = dU/T + (N·T/(V − N·b))·dV/T − a·(N/V)²·dV/T
= dU/T − a·(N/V)²·dV/T + (N/(V − N·b))·dV
First, let's try to reinterpret what volume means in this case. We can integrate the volume term here to get an expression for entropy from volume;
S_V = N·ln(V − N·b)
The entropy per particle, or how unknown its position is, is given by,
S_V/N = ln(V − N·b)
Which is the same entropy as we'd get from a particle being uniformly distributed in a volume of space V-Nb. Basically, particles' minimum occupancy reduces the flexibility of their position, and this is represented in our equations. We can do a quick sanity check for (17). It would be rather embarrassing if this version of things, where we employ a minimum occupancy of b, somehow breaks our previous equations!
⟨s_k·Q_k,(s)⟩ = T_Qk·(1 − [s_k·ρ_sk(s_k)]_{β_k}^{β_k′})
⟨w_k·∂V/∂w_k⟩ = T_V·(1 − [w_k·ρ_wk(w_k)]_b^∞)
⟨w_k·1⟩ = ∫_b^∞ x·(e^(−x/T_V)/(T_V·e^(−b/T_V)))·dx = T_V·(1 + b·e^(−b/T_V)/(T_V·e^(−b/T_V)))     (the left hand side is the average occupancy)
= T_V·(1 + b/T_V)
T_V + b = T_V + b
Which shockingly... works! The left hand side, being the average occupancy, is equal to V/N, the average volume which is closest to any particle. This gives,
T_V = V/N − b
Or, the free volume per particle. The first two components of (31) lead to another interesting idea. If we look at the following,
dU/T − a·(N/V)²·dV/T
We see a similar factor of 1/T, indicating these two might be related. We know that the attraction which the Van der Waals force represents comes from a type of energy, so this isn't too surprising. However, the dU and dV being distinct presents some difficulties. To overcome this, we introduce a contribution to energy from volume, so that U\V now represents energy contributions independent of volume, and A(V) is an attraction energy. This means we can rewrite,
dS_Φ = d(U\V + A(V))/T + (N/(V − N·b))·dV
As,
dS_Φ = dU\V/T + (dA/dV)·dV/T + (N/(V − N·b))·dV
Looking at (31),
dS_Φ = dU\V/T − a·(N/V)²·dV/T + (N/(V − N·b))·dV
We deduce that we have,
dA/dV = −a·(N/V)²
Integrating to derive A we can get,
A = a·N²/V
A/N = a·(N/V)
where A/N is the attraction energy per particle and N/V is the density of particles.
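A one-line symbolic check of that integration (a sketch using sympy):

```python
import sympy as sp

V, N, a = sp.symbols('V N a', positive=True)
dA_dV = -a * (N / V)**2
A = sp.integrate(dA_dV, V)
print(A)                      # → a·N²/V (up to a constant), matching the result above
print(sp.simplify(A / N))     # → a·N/V, the attraction energy per particle
```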
Which gives an expression for the energy of attraction! A beneficial result is that any equation of net attraction for a body of gas (or perhaps other substances) can be integrated into our model. Experimental analysis of pressure responses can be used to derive the total force of attraction for gases, and perhaps even liquids and solids. Whether this gives a measure of latent energy would be an interesting experiment to run.

8. Capacity
How does a conserved variable Q relate to its T_Q value? We will start with the standard case. In (17) we showed that,
⟨s_k·Q_k,(s)⟩ = T_Qk·(1 − [s_k·ρ_sk(s_k)]_{β_k}^{β_k′})
where the bracketed term acts as an adjustment factor. The standard case has Q(s) composed of independent functions, each of which is simply a scalar ω_i times a variable s_i to a power n_i. Additionally, the bounds are infinite in this standard case, so the adjustment factor is simply 1. This gives us,
q_i(s) = ω_i·s_i^n_i
Q(s) = Σ_i q_i(s) = Σ_i ω_i·s_i^n_i
We plug in the equipartition theorem to relate TQ to Q. For an individual variable we have,
s_k·Q_k,(s) = n_k·ω_k·s_k^n_k = n_k·q_k(s)
This gives a nice expression for the average energy contribution of any dimension by,
⟨s_k·Q_k,(s)⟩ = T_Q = n_k·⟨q_k(s)⟩
T_Q·(1/n_k) = ⟨q_k(s)⟩
And finally, we relate this to the total energy offering,
T_Q·Σ_i (1/n_i) = Σ_i ⟨q_i(s)⟩ = Q(s)
where each term 1/n_i can be labelled γ_Qi.
We let Σ_i n_i⁻¹ = Γ_Q, the capacity for Q(s) in the standard case. We can plug Γ_Q = Q(s)/T_Q into the equation for changing entropy (22). This lets us relate capacity, entropy, and the conserved quantity. Letting Q_i refer to the i-th conserved quantity;
dS_Φ = dS_loss + Σ_i dQ_i/T_Qi
∂S_Φ/∂Q_i = 1/T_Qi = Γ_Qi/Q_i(s)
ΔS_Φ = Γ_Qi·Δln Q_i(s)
We can also get,
dS_Φ = dS_loss + Σ_i (1/T_Qi)·d(T_Qi·Γ_Qi)
= dS_loss + Σ_i (Γ_Qi/T_Qi)·dT_Qi
ΔS_Φ = Γ_Qi·Δln T_Qi
These are interesting results! If the ratio between a conserved quantity Q and its T_Q value is constant, changes in entropy are closely related to the capacity of that conserved variable. For gases, we often have some reasonable estimate of the Γ_Qi ratio for both energy and volume. Though as shown in Section 7.5., this can vary depending on the current properties of the model in use. Deriving these Γ_Qi values for gases, we get energy as a sum over energetic degrees of freedom per particle, so we have,
Σ_i p_i²/(2m) = U
n_i = 2
Γ_U = Σ_{degrees of freedom} 1/2 = F/2, where F is the total number of degrees of freedom
= N·f/2, where f is the average number of degrees of freedom per particle.
For volume, we again treat occupancy as the constituent variable of the model (24).
Σ_i w_i = V
n_i = 1
Γ_V = Σ_{particles} 1 = N
Therefore, over a range where ΓU and ΓV are roughly constant, we get,
dS_Φ = N·((f/2)·dln U + dln V)
ΔS_Φ = N·((f/2)·Δln T + Δln(V/N))
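As a quick worked example (not from the original text): for a monatomic gas, f = 3, so doubling the temperature at fixed V and N gives ΔS_Φ = (3/2)·N·ln 2 ≈ 1.04·N, while doubling the volume per particle at fixed T adds a further N·ln 2.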
9. Conclusion
This work offered a look into thermodynamics that emphasises a statistical view over a physical one. This gives alternative derivations of ultimately the same theorems, and shows which axioms the theorems ultimately rely on. States and models are defined distinctly, with models giving a probability distribution over states, which are specific, deterministically evolving situations. This provides a robust form of the second law, efficiently overcoming problems like Maxwell's demon. Furthermore, it shows what I believe is a fascinating link between the second law and the non-divergence of time evolution in mechanics. There is further work to be done in this direction. For instance, bringing in the broken typewriter example from information theory to better explain entropy increases due to lossiness. The current unitless presentation effectively assumes k_B = 1, which hides how all the equations are related to SI units. The difference between canonical, microcanonical, and grand canonical systems can be elaborated without simply repeating that "N is large". Finally, as with all projects, I would have written a shorter paper if I had more time. The derivation of Boltzmann's distribution, especially, could be shortened. I believe an entropy maximisation approach akin to that used for canonical functions in statistics would work well.
¹ The first half of the expansion is related to Liouville's theorem, a known result in thermodynamics. The second half of the expansion is not seen in the typical formulation of Liouville's theorem, showing how that theorem is dependent on the assumption that ∇·τ(s) = 0.