Methods Inf Med 2006; 45(02): 173-179
DOI: 10.1055/s-0038-1634063
Original Article
Schattauer GmbH

On Genetic Information, Diversity and Distance

J. Zvárová, I. Vajda
European Center for Medical Informatics, Statistics and Epidemiology, Institute of Computer Science AS CR, Prague, Czech Republic

Publication Date: 06 February 2018 (online)

Summary

Objectives: General information-theoretic concepts such as f-divergence, f-information and f-entropy are applied to genetic models in which genes are characterized by randomly distributed alleles. The paper thus presents an information-theoretic background for measuring three quantities: genetic distances between populations, the genetic information that observations on individuals carry about their alleles, and genetic diversities within populations.

Methods: Genetic distances were derived as divergences between frequencies of alleles representing a gene in two different populations. Genetic information was derived as a measure of statistical association between the observations taken on individuals and the alleles of these individuals. Genetic diversities were derived from divergences and information.
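As an illustration of a divergence between allele frequencies, the following minimal Python sketch uses the standard Csiszár form of the f-divergence, D_f(p, q) = Σ q_i f(p_i / q_i) with f convex and f(1) = 0. The function names, the three generator functions and the allele frequencies are assumptions chosen for illustration; they are not taken from the paper, whose exact definitions and examples may differ.

```python
import math


def f_divergence(p, q, f):
    """Csiszar f-divergence D_f(p, q) = sum_i q_i * f(p_i / q_i).

    Assumes p and q are allele-frequency vectors of equal length
    and that every q_i is strictly positive.
    """
    if len(p) != len(q):
        raise ValueError("frequency vectors must have the same length")
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t):
    # f(t) = t log t, with the usual convention 0 log 0 = 0
    return t * math.log(t) if t > 0 else 0.0


def total_variation_generator(t):
    # f(t) = |t - 1| / 2
    return abs(t - 1.0) / 2.0


def squared_hellinger_generator(t):
    # f(t) = (sqrt(t) - 1)^2
    return (math.sqrt(t) - 1.0) ** 2


# Hypothetical allele frequencies of one gene in two populations.
pop_a = [0.5, 0.3, 0.2]
pop_b = [0.4, 0.4, 0.2]

for name, f in [
    ("Kullback-Leibler", kl_generator),
    ("total variation", total_variation_generator),
    ("squared Hellinger", squared_hellinger_generator),
]:
    print(name, f_divergence(pop_a, pop_b, f))
```

Different choices of the convex function f give different distance measures from the same formula, which is the sense in which one general class can cover several specific genetic distances.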

Results: The concept of genetic f-information introduced in the paper appears to be new. We show that the measures of genetic distance and diversity used in the previous literature are special cases of the genetic f-divergence and f-diversity introduced in the paper and illustrated by examples. We also display close connections between the genetic f-information and the genetic f-divergence on the one hand and the genetic f-diversity on the other. The examples also illustrate practical computations and applications of the concepts of quantitative genetics introduced in the paper.
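The link between information, divergence and diversity can be sketched in Python under one common convention, which is an assumption here and not necessarily the paper's definition: the f-information of a pair (observation, allele) is taken as the f-divergence between their joint distribution and the product of its marginals. For the Kullback-Leibler generator this is the usual mutual information, and when the observation reveals the allele exactly it equals the Shannon entropy of the allele frequencies, a familiar diversity measure. All names and numbers below are hypothetical.

```python
import math


def f_information(joint, f):
    """f-divergence between a joint distribution and the product of its marginals.

    `joint[i][j]` is the probability of observation i together with allele j.
    Cells whose product of marginals is zero are skipped.
    """
    row_marginals = [sum(row) for row in joint]
    col_marginals = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p_ij in enumerate(row):
            product = row_marginals[i] * col_marginals[j]
            if product > 0:
                total += product * f(p_ij / product)
    return total


def kl_generator(t):
    # f(t) = t log t, with the usual convention 0 log 0 = 0
    return t * math.log(t) if t > 0 else 0.0


def shannon_entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


# Hypothetical allele frequencies of one gene.
alleles = [0.5, 0.3, 0.2]

# Observation independent of the allele: it carries no genetic information.
independent = [[0.5 * a for a in alleles], [0.5 * a for a in alleles]]

# Observation identical to the allele: it carries maximal information.
exact = [
    [alleles[i] if i == j else 0.0 for j in range(len(alleles))]
    for i in range(len(alleles))
]

print(f_information(independent, kl_generator))  # close to zero
print(f_information(exact, kl_generator), shannon_entropy(alleles))  # equal up to rounding
```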

Conclusions: We have discussed a general class of f-divergences that are suitable measures of genetic distance between populations characterized by concrete allele frequencies. We have shown that a wide class of measures of genetic information, called f-information, can be obtained from f-divergences, and that a wide class of measures of genetic diversity, called f-diversities, can be obtained from the f-divergences and f-information.

 