Musical excerpt: Slash - Anastasia, Released: 2012, Album: Apocalyptic Love
# Imports
import torch
import torchaudio
import IPython.display as ipd
# Load audio files
audio_wav, sr_wav = torchaudio.load('audio_original.wav')
audio_mp3, sr_mp3 = torchaudio.load('audio_mp3_128k.wav')
audio_quantized, sr_quantized = torchaudio.load('audio_quantized.wav')
# Playback
print('Example Signal Original (PCM 16-bit 44.1kHz)')
display(ipd.Audio(audio_wav, rate=sr_wav))
print('Example Signal MP3 128k')
display(ipd.Audio(audio_mp3, rate=sr_mp3))
print('Example Signal Quantized (chosen quantization factor)')
display(ipd.Audio(audio_quantized, rate=sr_quantized))
Example Signal Original (PCM 16-bit 44.1kHz)
Example Signal MP3 128k
Example Signal Quantized (chosen quantization factor)
# MSE Loss
loss_mse = torch.nn.MSELoss()
mse_mp3_original = loss_mse(audio_mp3,audio_wav)
print('MSE Loss (mp3 and original):', mse_mp3_original*100)
mse_quant_original = loss_mse(audio_quantized,audio_wav)
print('MSE Loss (quantized and original):', mse_quant_original*100)
MSE Loss (mp3 and original): tensor(5.8766)
MSE Loss (quantized and original): tensor(1.3066)
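The MSE above is the mean of the squared sample-wise differences, so it only makes sense when both tensors have the same sample rate and shape. The following is a minimal sketch of a guarded version; the helper name `aligned_mse` and the trimming strategy are assumptions for illustration and are not part of the original notebook.

```python
import torch


def aligned_mse(estimate, reference, sr_estimate, sr_reference):
    """Mean squared error after basic sanity checks (hypothetical helper)."""
    # Comparing signals sampled at different rates sample by sample is meaningless.
    if sr_estimate != sr_reference:
        raise ValueError(f"sample rates differ: {sr_estimate} vs {sr_reference}")
    # Assumption: trimming both signals to the shorter length is acceptable.
    # This does NOT compensate for any delay a codec may have introduced.
    n = min(estimate.shape[-1], reference.shape[-1])
    diff = estimate[..., :n] - reference[..., :n]
    # Same quantity as torch.nn.MSELoss() with the default 'mean' reduction.
    return torch.mean(diff ** 2)
```

With the tensors loaded above, a call might look like `aligned_mse(audio_mp3, audio_wav, sr_mp3, sr_wav)`. If a decoded MP3 is offset by a few samples relative to the original, a sample-wise error can grow even when the two signals sound alike, so it may be worth checking the alignment before interpreting the number.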
Observe:
Reference:
Schuller, G. (2020). Filter Banks and Audio Coding. Springer International Publishing. https://doi.org/10.1007/978-3-030-51249-1
# Load pre-filtered audio files
audio_wav_pref, sr_wav = torchaudio.load('audio_originalpref.wav')
audio_mp3_pref, sr_mp3 = torchaudio.load('audio_mp3_128kpref.wav')
audio_quantized_pref, sr_wav = torchaudio.load('audio_quantizedpref.wav')
# Pre-Filtering + MSE Loss
loss_mse = torch.nn.MSELoss()
mse_mp3_original = loss_mse(audio_mp3_pref[0,:],audio_wav_pref[0,:])
print('Pre-Filtering + MSE Loss mp3:', mse_mp3_original.numpy()*10000)
mse_quant_original = loss_mse(audio_quantized_pref,audio_wav_pref)
print('Pre-Filtering + MSE Loss Quantized:', mse_quant_original.numpy()*10000)
Pre-Filtering + MSE Loss mp3: 1.00347948318813
Pre-Filtering + MSE Loss Quantized: 1.4562977594323456
Observe:
Reference:
Rabiner, L., & Juang, B. (1993). Fundamentals of Speech Recognition. PTR Prentice Hall.
from lsd_loss import LSDLoss
loss_lsd = LSDLoss()
lsd_mp3_original = loss_lsd(audio_mp3[0,:],audio_wav[0,:])
print('LSD Loss mp3:', lsd_mp3_original)
lsd_quant_original = loss_lsd(audio_quantized[0,:],audio_wav[0,:])
print('LSD Loss Quantized:', lsd_quant_original)
LSD Loss mp3: tensor(0.9744)
LSD Loss Quantized: tensor(1.9903)
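`LSDLoss` comes from a local `lsd_loss` module whose source is not shown here. As a rough illustration of what a log-spectral distance typically measures, here is a minimal sketch: it compares the log power spectra of two signals, takes the root-mean-square difference over frequency for each frame, and averages over frames. The function name, the STFT parameters (`n_fft`, `hop_length`), the Hann window, and the use of `log10` are all assumptions, so the values it returns need not match those printed above.

```python
import torch


def log_spectral_distance(estimate, reference, n_fft=2048, hop_length=512, eps=1e-8):
    """Illustrative log-spectral distance for 1-D signals of equal length."""
    window = torch.hann_window(n_fft)

    def log_power_spectrum(x):
        # Complex STFT with shape (n_fft // 2 + 1, n_frames).
        spec = torch.stft(x, n_fft=n_fft, hop_length=hop_length,
                          window=window, return_complex=True)
        # eps keeps the logarithm finite for silent bins.
        return torch.log10(spec.abs() ** 2 + eps)

    diff = log_power_spectrum(estimate) - log_power_spectrum(reference)
    # RMS over frequency bins (dim 0), then mean over time frames.
    return torch.sqrt(torch.mean(diff ** 2, dim=0)).mean()
```

A call mirroring the cell above would be `log_spectral_distance(audio_mp3[0, :], audio_wav[0, :])`. Because the comparison happens on a logarithmic magnitude scale and ignores phase, this kind of measure tends to react differently to coding artifacts than a sample-wise MSE does.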
Observe:
Reference:
Engel, J., Hantrakul, L., Gu, C., & Roberts, A. (2020). DDSP: Differentiable Digital Signal Processing. In International Conference on Learning Representations.
from asteroid.losses import SingleSrcMultiScaleSpectral
loss_multiScaleSpectral = SingleSrcMultiScaleSpectral()
multiScale_mp3_original = loss_multiScaleSpectral(audio_mp3_pref,audio_wav_pref)
print('Multi Scale Spectral Loss mp3:', multiScale_mp3_original.numpy()/1000000)
multiScale_quant_original = loss_multiScaleSpectral(audio_quantized_pref,audio_wav_pref)
print('Multi Scale Spectral Loss Quantized:', multiScale_quant_original.numpy()/1000000)
Multi Scale Spectral Loss mp3: [1.14763687]
Multi Scale Spectral Loss Quantized: [2.28411725]
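To make the idea behind a multi-scale spectral loss concrete, here is a minimal sketch in the spirit of the DDSP paper: for several FFT sizes it sums the L1 distance between magnitude spectrograms and an `alpha`-weighted L1 distance between their logarithms. This is not Asteroid's implementation; the function name, the list of FFT sizes, the hop of a quarter window, the Hann window, and `eps` are assumptions chosen for illustration, so its values will not necessarily match the output above.

```python
import torch


def multi_scale_spectral_loss(estimate, reference,
                              fft_sizes=(2048, 1024, 512, 256, 128, 64),
                              alpha=1.0, eps=1e-8):
    """Illustrative multi-scale spectral loss for 1-D signals of equal length."""
    total = torch.zeros(())
    for n_fft in fft_sizes:
        window = torch.hann_window(n_fft)

        def magnitude(x):
            # Magnitude spectrogram at this resolution (75% overlap, an assumption).
            return torch.stft(x, n_fft=n_fft, hop_length=n_fft // 4,
                              window=window, return_complex=True).abs()

        mag_est, mag_ref = magnitude(estimate), magnitude(reference)
        # L1 distance on linear magnitudes.
        linear_term = (mag_est - mag_ref).abs().sum()
        # L1 distance on log magnitudes; eps avoids log(0).
        log_term = (torch.log(mag_est + eps) - torch.log(mag_ref + eps)).abs().sum()
        total = total + linear_term + alpha * log_term
    return total
```

Large FFT sizes resolve frequency finely and small ones resolve time finely, so summing over several sizes penalizes errors at both scales. Since this sketch sums over all bins and frames rather than averaging, its raw values grow with signal length.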
Observe: