Comparison of compression-based measures with application to the evolution of primate genomes

Diogo Pratas*, Raquel M. Silva, Armando J. Pinho

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

An efficient DNA compressor furnishes an approximation to measure and compare information quantities present in, between and across DNA sequences, regardless of the characteristics of the sources. In this paper, we compare directly two information measures, the Normalized Compression Distance (NCD) and the Normalized Relative Compression (NRC). These measures answer different questions: the NCD measures how similar both strings are (in terms of information content), and the NRC (which, in general, is nonsymmetric) indicates the fraction of one of them that cannot be constructed using information from the other one. This leads to the problem of finding out which measure (or question) is more suitable for the answer we need. For computing both, we use a state-of-the-art DNA sequence compressor that we benchmark against some top compressors in different compression modes. Then, we apply the compressor to DNA sequences with different scales and natures, first using synthetic sequences and then real DNA sequences. The latter include mitochondrial DNA (mtDNA), messenger RNA (mRNA) and genomic DNA (gDNA) of seven primates. We provide several insights into evolutionary acceleration rates at different scales, namely, the observation and confirmation across the whole genomes of a higher variation rate of the mtDNA relative to the gDNA. We also show the importance of relative compression for localizing similar information regions using mtDNA.
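
To make the difference between the two measures concrete, the following is a minimal sketch, not the authors' implementation. The abstract gives no formulas, so it assumes the commonly used definitions NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)) and NRC(x‖y) = C(x‖y) / (|x| log2 |A|), where C(x‖y) is the number of bits needed to encode x using only information from y. As stand-ins for the specialized DNA compressor, it uses zlib for C(·) and a frozen order-k context model trained on y for C(x‖y); the function names, the order `k` and the smoothing value `alpha` are illustrative choices.

```python
import math
import zlib
from collections import Counter, defaultdict


def compressed_bits(data: bytes) -> int:
    """C(data) in bits; zlib is only a stand-in for a DNA-specific compressor."""
    return 8 * len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance (assumed standard definition)."""
    cx, cy, cxy = compressed_bits(x), compressed_bits(y), compressed_bits(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def relative_bits(x: str, y: str, k: int = 3,
                  alphabet: str = "ACGT", alpha: float = 1 / 16) -> float:
    """Approximate C(x||y): bits to encode x with order-k counts taken from y only.

    The model is built from y and then frozen, so nothing learned from x
    itself is used while encoding x.
    """
    counts = defaultdict(Counter)
    for i in range(k, len(y)):
        counts[y[i - k:i]][y[i]] += 1

    # The first k symbols have no full context; charge them at the uniform rate.
    bits = min(k, len(x)) * math.log2(len(alphabet))
    for i in range(k, len(x)):
        ctx = counts.get(x[i - k:i], Counter())
        total = sum(ctx.values())
        # Additive smoothing keeps the probability of unseen symbols above zero.
        p = (ctx[x[i]] + alpha) / (total + alpha * len(alphabet))
        bits -= math.log2(p)
    return bits


def nrc(x: str, y: str, k: int = 3, alphabet: str = "ACGT") -> float:
    """Normalized Relative Compression of x given y (assumed definition)."""
    return relative_bits(x, y, k, alphabet) / (len(x) * math.log2(len(alphabet)))


if __name__ == "__main__":
    ref = "ACGTTGCA" * 50
    print("NCD(ref, ref):", ncd(ref.encode(), ref.encode()))
    print("NRC(ref || ref):", nrc(ref, ref))
    print("NRC(AAAA... || ref):", nrc("A" * 400, ref))
```

In this sketch `ncd` is symmetric up to compressor imperfections, whereas `nrc(x, y)` and `nrc(y, x)` generally differ, because each one asks how much of its first argument is unexplained by the second. With this simple frozen model the NRC can exceed 1 when y is actively misleading about x.
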
Original language: English
Article number: 393
Journal: Entropy
Volume: 20
Issue number: 6
Publication status: Published - 1 Jun 2018
Externally published: Yes

Keywords

  • Data compression
  • DNA sequences
  • NCD
  • NRC
  • Primate evolution
