@androcat @Ntropic 0 days since having a discussion on the meaning of #entropy 🙂
And a discussion that compares the Boltzmann and von Neumann/Shannon interpretations would keep me here for hours (and after days of both scientific and philosophical thought exercises over their implications, I still can’t come up with a definition that satisfies both the “thermodynamic” and the “information” interpretation).
Let me clarify what I mean by “low/high entropy of a closed system” in this context with a better (but also abused) analogy: the coffee and milk case.
It’s a classic example taught to kids from an early age: when you mix coffee and milk, you start with a “low entropy” system (coffee and milk as separate states with their own boundaries) and end up with a “high entropy” system (coffee and milk mixed together).
The underlying concept in this definition of entropy is the “reversibility” of a process, not necessarily the amount of information or independent variables required to accurately model the system. I like this definition better because the principle of least action and the arrow of time emerge naturally as corollaries of it.
It’s basically much easier to start with a system where the two components are separate and end up with a system where they are mixed than the other way around. This is a definition that even a kid can understand without delving into the logarithmic growth of information states. As @androcat intuitively but eloquently put it:
When a recursive system eats its own shit, it’s not leaking information to some other system, it’s just overwriting it through averaging.
This is exactly what I mean by “high entropy” state in this context.
You can’t get the original coffee grains out of your latte unless you apply some serious molecular wizardry (which btw would just transfer entropy from your cup to whatever machinery you’re using for your molecular distillation, which is where the concept of “what are the boundaries of your system, and where do you need to put your entropic probes?” becomes important).
Similarly, you can’t get the original girlfriend meme picture back from the final “three asexual and featureless abuelas on a black background” picture unless you consume all the energy resources of the planet to do a backwards search through all the possible permutations of stable diffusion that may have led to that outcome (and maybe add a huge additional adversarial network on top to tell you how “realistic” each prediction is).
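To put a number on the “much easier one way than the other” intuition, here’s a minimal sketch (my own toy model, nothing from the thread) that counts microstates for a tiny one-dimensional cup: the Boltzmann entropy S = k_B ln W of the “anything goes” macrostate dwarfs that of the “coffee on the left, milk on the right” one, which is exactly why the backwards search is hopeless.

```python
# Toy Boltzmann-style counting for a 1D "cup" of N cells, half coffee and half milk.
# "Separated" = all coffee on the left, all milk on the right (a single arrangement);
# "mixed" = any of the C(N, N/2) possible arrangements.
from math import comb, log

N = 60  # number of cells in the toy cup

W_mixed = comb(N, N // 2)   # microstates compatible with "some arrangement or other"
W_separated = 1             # microstates compatible with "coffee left, milk right"

# Boltzmann entropy up to the constant k_B: S = ln(W)
print(f"ln W (separated) = {log(W_separated):.1f}")
print(f"ln W (mixed)     = {log(W_mixed):.1f}")
print(f"a random shuffle lands back in the separated state roughly 1 time in {W_mixed:,}")
```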
According to a strictly “informational” definition, however, the entropy of the final system in both cases (latte and three asexual abuelas) may actually be low. After all, we don’t need much information to model a low-variance system that converged onto a local minimum: it’s just the average value.
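For what it’s worth, here is that point in code: a minimal sketch (with made-up probabilities) showing that the Shannon entropy of a collapsed output distribution really is low, which is the sense in which the “informational” definition would call the latte, or the abuelas, low-entropy.

```python
# Shannon entropy H = -sum(p * log2(p)) of a diverse vs a collapsed output distribution.
# The probabilities below are invented purely for illustration.
from math import log2

def shannon_entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

diverse_outputs   = [0.25, 0.25, 0.25, 0.25]   # four equally likely pictures
collapsed_outputs = [0.97, 0.01, 0.01, 0.01]   # almost always the same featureless abuelas

print(f"H(diverse)   = {shannon_entropy(diverse_outputs):.2f} bits")   # 2.00 bits
print(f"H(collapsed) = {shannon_entropy(collapsed_outputs):.2f} bits") # ~0.24 bits
```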
But the arrow of time of the events that led to that state matters here: was the system in that state because the latte or the three abuelas had been in that averaged-out state since the moment of the Big Bang? Or is it the output of a system that we can reasonably approximate as a closed one (your coffee cup, or a specific snapshot of weights saved on an OpenAI server) which recursively ate back its own shit through a tight feedback loop and simply ended up with a very low-variance normal distribution?
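A minimal sketch of that feedback loop (again my own toy, not a claim about how any actual model is trained): each generation is built by averaging samples of the previous one, so no information leaks out of the system, it just gets overwritten, and the spread collapses toward the mean within a handful of iterations.

```python
# Toy "eats its own shit" loop: every item in the next generation is the average
# of a few items from the previous one, so variety is overwritten through averaging.
import random
import statistics

random.seed(42)
population = [random.gauss(0.0, 1.0) for _ in range(1000)]  # diverse "generation 0"
print(f"generation 0: stdev = {statistics.stdev(population):.4f}")

for generation in range(1, 6):
    # averaging 4 parents cuts the variance by roughly a factor of 4 per generation
    population = [statistics.mean(random.sample(population, 4)) for _ in range(1000)]
    print(f"generation {generation}: stdev = {statistics.stdev(population):.4f}")
```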
This is the conflict point I see between the two dominant interpretations. And I see it mainly as a problem with the “information” interpretation, because by focusing on “how many independent variables do I need to accurately describe the state of the system?” it settles for a stateless definition that misses the dimension of time (and reversibility).