Earlier posts (in Russian):
- Shannon entropy and data compression. Part 1 (theory)
- Shannon etropy and data compression. Part 2 (example)
- Kolmoorov-Levin structures (theory)
- The notion of reproducible code (theory)
- On specifications and cryptographic hashing. Part 1 (theory)
- On specification and cryptographic hashing. Part 2 (example)
Discussion
From Table 3 in the previous post the following can be seen:
- Machine code is algorithmically simple (with a relatively low Kolmogorov complexity K) because the program length is a fixed value, independent of the data size (the size of an image file supplied as input to the program can be very large). When estimating the length of the shortest program implementing the SHA-256 algorithm, the compiler adds a term Ccomp, independent of the data; this is the computational overhead for optimizing the code executed by the operating system. Kolmogorov complexity of the program code is low, although the program code itself as a sequence of symbols is statistically complex, as evidenced by the relatively high value of its Shannon entropy.
- Note also that the entropy of machine code is higher than that of C code. This is expected, since machine code should have less structural redundancy than human-readable programming languages
- It is also worth noting that the entropy of program code, regardless of the implementation language, never reaches its maximum: a language invariably contains redundancy and ambiguity. Any linguist will tell you this. It was not our goal to prove this in general. Here we can only confirm it based on the results of our study.
- Even a relatively simple algorithm for computing a hash for a data file has both a relatively high Shannon entropy and a relatively low Kolmogorov complexity, and thus passes the design test.
- At the same time, our data isn't classified as designed. This is also expected, as the data in both files represents the results of random processes: paint drips on paper and black-and-white image noise.
Randomness is constantly manifested in nature. Specificity (regularity) is also constantly manifested. However, in naturally generated configurations of matter, randomness and specificity represent the two extremes of a spectrum of the effects of unintelligent causation (see Table 1):
- Descriptions of random weakly compressible configurations of matter are characterized by high complexity and low specificity;
- Regular structures are characterized by low complexity (they are highly compressible) and high specificity;
In contrast, the signature of specified complexity:
complexity (relatively high Shannon entropy) + specificity (relatively low Kolmogorov complexity)
is a reliable practical classification rule for design.
Both randomness and natural regularity are inert to pragmatics (function). Natural processes are undirected and do not have a pragmatic purpose. Natural selection, which our opponents often refer to as a counter-example to the design detection rule we are discussing here, operates on already existing functional phenotypes. Evolution does not select for a future function. From our study, it is clear that the Shannon information model is not an adequate means to highlight the specificity of biological functions. From the examples we have considered, it is clear that both natural regularity and randomness are killers of functional information. Indeed, it is impossible to encode a complex function using only random and regular strings: the former will exhibit high complexity and low specificity, while the latter are massively redundant, which corresponds to low complexity and high specificity. Functional strings, on the other hand, exhibit high Shannon entropy and high specificity (low Kolmogorov complexity) at the same time. Such structures in practice reliably point to design.
| Configurations of matter | Description Metric | Complexity | Specificity | Class | |
| Shannon entropy |
Kolmogorov complexity |
||||
| Liquids, gases | High | High | High | Low | Not a design |
| Graphical, audio, text noise | |||||
| Crystalls | Low | Low | Low | High | |
| Interference patterns, convection pattenrs, regular strings | |||||
| Literature, technical prose | Medium | Medium | Design | ||
| Byte code (executable files) | High | High | |||
| Protein coding parts of DNA molecules | |||||
| Functional parts of primary protein structures | |||||
Table 1. Characteristics of configurations of matter
( Read more... )