Robust Blind Watermarking Framework for Hybrid Networks Combining CNN and Transformer
Proceedings of the 15th Asian Conference on Machine Learning, PMLR 222:1417-1432, 2024.
Abstract
As an essential means of copyright protection, deep learning-based robust watermarking has been studied extensively. Its framework consists of three main parts: the encoder, the noise layer, and the decoder. However, practically all existing schemes focus on the encoder rather than the decoder, and the whole network relies on shallow Convolutional Neural Networks (CNNs) for primary feature extraction; CNNs capture local information well but do not model non-local information in watermarked images effectively. To address this problem, we employ Transformer networks with a spatial self-attention mechanism. We propose a novel decoder network that combines Transformers and CNNs, which not only enriches local feature information but also enhances the ability to explore global representations. Meanwhile, to embed secret messages more faithfully, we design a multi-scale attentional feature fusion module that efficiently aggregates cover image features and secret message features, producing encoded images with rich hybrid features. In addition, a perceptual loss is introduced to better evaluate the visual quality of the watermarked images. Extensive experimental results show that our proposed method outperforms existing State-Of-The-Art (SOTA) methods in terms of imperceptibility and robustness.
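To make the hybrid decoder idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the class name HybridDecoder, the layer sizes, and the message length are all hypothetical. A shallow CNN extracts local features, a Transformer encoder applies self-attention over the flattened spatial tokens to capture global context, and a linear head recovers the message bits.

```python
import torch
import torch.nn as nn

class HybridDecoder(nn.Module):
    """Hypothetical CNN + Transformer decoder sketch (not the paper's code).

    Local features come from a shallow CNN; non-local dependencies are
    modelled by self-attention over the spatial token sequence.
    """

    def __init__(self, msg_len=30, channels=64):
        super().__init__()
        # Local feature extraction: three stride-2 convolutions downsample 8x.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Global modelling: Transformer encoder over flattened feature tokens.
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=4, dim_feedforward=256, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(channels, msg_len)

    def forward(self, watermarked):
        feat = self.cnn(watermarked)              # (B, C, H/8, W/8)
        tokens = feat.flatten(2).transpose(1, 2)  # (B, HW/64, C) token sequence
        tokens = self.transformer(tokens)         # self-attention: global context
        pooled = tokens.mean(dim=1)               # average-pool over tokens
        return self.head(pooled)                  # message logits

# Usage: recover a 30-bit message from a batch of 128x128 watermarked images.
decoder = HybridDecoder(msg_len=30)
logits = decoder(torch.rand(4, 3, 128, 128))      # shape (4, 30)
```

The design choice this illustrates is the division of labour described in the abstract: convolutions supply cheap local feature extraction, while the attention layers let every spatial position attend to every other, which pure shallow CNNs cannot do.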