Size distribution of human DNA encoding via information theory
Statistical Mechanics, Information Entropy, DNA.
We analyze the coding sequences of Homo sapiens with a model that naturally incorporates correlations among the bases in DNA sequences of living organisms. The model is based on the optimization of the Shannon entropy, which is at the core of all statistical arguments. As a result, we propose a double-exponential distribution function for the length of DNA, measured in base pairs (bp). The results show that the short-range correlations (SRC) always present in coding DNA sequences are appropriately captured by the double-exponential distribution, which adequately describes the cumulative length distribution of DNA bases. Based on this model, we use an empirical cumulative distribution function and the database of proteins compiled by the Ensembl Project to show consistency with the data.
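The empirical cumulative distribution function mentioned above can be illustrated with a minimal sketch. The helper name `ecdf` and the sample lengths are hypothetical and chosen only for illustration; they are not taken from the Ensembl data, and the sketch does not attempt to reproduce the double-exponential fit itself.

```python
from bisect import bisect_right


def ecdf(lengths):
    """Return a function F where F(x) is the fraction of samples <= x."""
    ordered = sorted(lengths)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one length")

    def F(x):
        # bisect_right returns how many sorted samples are <= x
        return bisect_right(ordered, x) / n

    return F


# Hypothetical sequence lengths in bp (illustrative only, not Ensembl data)
sample_lengths_bp = [300, 450, 450, 900, 1200, 1500, 2400, 3600]
F = ecdf(sample_lengths_bp)
```

Evaluating `F` on a grid of lengths gives the step curve against which a candidate cumulative distribution, such as the double-exponential law proposed here, could be compared.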