Encrypted audio dataset based on the Collatz conjecture

Diego Renza; Sebastian Mendoza; Dora M Ballesteros L

doi:10.1016/j.dib.2019.104537

Data Brief. 2019 Sep 17:26:104537. doi: 10.1016/j.dib.2019.104537. eCollection 2019 Oct.

Authors

Diego Renza¹, Sebastian Mendoza¹, Dora M Ballesteros L¹

Affiliation

¹ Universidad Militar Nueva Granada, Colombia.

Abstract

In information security, one way to keep content secret is encryption. The objective is to alter the content so that it is unintelligible and only the intended user can recover it. To provide examples of encrypted audio data, we applied a novel encryption method based on the Collatz conjecture to five hundred speech recordings (50 speakers, 10 different messages), obtaining five hundred encrypted audio files. The main characteristics of the encrypted recordings are as follows: the spectrogram is quasi-uniform, the histograms have a repetitive pattern, the sample mean is around -0.4, the standard deviation is around 0.55, and the Shannon entropy is around 7.5 (for 8 bits per sample). The novelty of the results lies in this behavior being completely different from that of natural speech recordings, which show a spectrogram with higher energy at low frequencies, a histogram with Gaussian behavior, a sample mean around 0, a standard deviation around 0.11, and an entropy around 5.5. A more comprehensive analysis of our encrypted signals may be obtained from the article "High-uncertainty audio signal encryption based on the Collatz conjecture" in the Journal of Information Security and Applications.
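The summary statistics quoted in the abstract (sample mean, standard deviation, and Shannon entropy at 8 bits per sample) can be reproduced on any recording with a few lines of code. The following is a minimal sketch, not the authors' analysis code. It assumes samples are floats normalized to [-1.0, 1.0] and uses a hypothetical 256-level uniform quantizer (`quantize_8bit`). The two signals at the end are synthetic stand-ins (a narrow Gaussian and a uniform signal), not files from the dataset, and the encryption method itself is not reproduced here.

```python
import math
import random
from collections import Counter


def quantize_8bit(samples):
    """Map float samples in [-1.0, 1.0] to integer levels 0..255.

    Assumption: a plain uniform quantizer; the dataset's own sample
    format may differ.
    """
    levels = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip out-of-range values
        levels.append(min(255, int((x + 1.0) * 128)))
    return levels


def shannon_entropy_bits(levels):
    """Shannon entropy, in bits per sample, of a sequence of discrete levels."""
    counts = Counter(levels)
    n = len(levels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def summarize(samples):
    """Return mean, population standard deviation, and 8-bit entropy."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "entropy_bits": shannon_entropy_bits(quantize_8bit(samples)),
    }


# Synthetic stand-ins only (assumptions, not dataset files):
rng = random.Random(0)
# Narrow Gaussian, loosely mimicking the amplitude histogram of natural speech.
speech_like = [max(-1.0, min(1.0, rng.gauss(0.0, 0.11))) for _ in range(50000)]
# Uniform amplitudes, a crude proxy for a high-uncertainty signal.
uniform_like = [rng.uniform(-1.0, 1.0) for _ in range(50000)]

if __name__ == "__main__":
    print("speech-like :", summarize(speech_like))
    print("uniform-like:", summarize(uniform_like))
```

With 8 bits per sample the entropy is bounded above by 8, so a value around 7.5 indicates that nearly all quantization levels are used with similar frequency, whereas a Gaussian-shaped histogram concentrated near zero yields a noticeably lower value.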

Keywords: Audio; Collatz conjecture; Cryptanalysis; Dataset; Encryption; Privacy; Security.