PixInWav: Residual steganography for hiding pixels in audio
Steganography comprises the mechanics of hiding data in a host medium that may be publicly available. While previous works focused on unimodal setups (e.g., hiding images in images, or hiding audio in audio), PixInWav targets the multimodal case of hiding images in audio. To this end, we propose a novel residual architecture operating on top of short-time discrete cosine transform (STDCT) audio spectrograms. Among our results, we find that the proposed residual steganography setup allows the hidden image to be encoded independently of the host audio without compromising quality. Accordingly, while previous works need both the host and the hidden signal at encoding time, PixInWav can encode images offline; the encoded images can later be hidden, in a residual fashion, in any audio signal.

Work partially supported by the European Union through the Erasmus+ student mobility program, Science Foundation Ireland (SFI) under grant numbers SFI/15/SIRG/3283 and SFI/12/RC/2289 P2, and the Spanish Research Agency (AEI) under project PID2020117142GB-I00 of the call MCIN/AEI/10.13039/501100011033.

Peer Reviewed. Postprint (author's final draft).
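
The residual idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the STDCT uses non-overlapping, unwindowed frames, the learned encoder is replaced by a hypothetical `encode_image` that merely scales the image, and the helper names (`dct_matrix`, `stdct`, `istdct`, `hide`) are invented here. The point it shows is that the residual is computed from the image alone and can then be added to the spectrogram of any host audio.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

def stdct(audio, frame_len):
    """Simplified STDCT: non-overlapping frames, one DCT per frame.
    Returns an array of shape (num_frames, frame_len)."""
    num_frames = len(audio) // frame_len
    frames = audio[: num_frames * frame_len].reshape(num_frames, frame_len)
    return frames @ dct_matrix(frame_len).T

def istdct(spec):
    """Inverse of stdct; valid because the DCT matrix is orthonormal."""
    return (spec @ dct_matrix(spec.shape[1])).reshape(-1)

def encode_image(image, alpha=0.01):
    """Hypothetical stand-in for a learned encoder.
    It depends on the image only, never on the host audio."""
    return alpha * image

def hide(audio, residual):
    """Add a precomputed residual to the host spectrogram and
    return the container waveform."""
    spec = stdct(audio, residual.shape[1]).copy()
    spec[: residual.shape[0]] += residual
    return istdct(spec)

# Encode once, offline, then hide the same residual in two different hosts.
rng = np.random.default_rng(0)
image = rng.random((16, 64))            # toy grayscale "image"
residual = encode_image(image)
host_a = rng.standard_normal(16 * 64)   # toy host audio signals
host_b = rng.standard_normal(16 * 64)
container_a = hide(host_a, residual)
container_b = hide(host_b, residual)
```

In this toy setting the residual can be checked by subtracting the host spectrogram from the container spectrogram. That check requires the host and is used here only to verify the sketch; the abstract does not describe how the hidden image is revealed.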