Image harmonization, a crucial task in image composition and editing, aims to adjust the foreground so that it appears harmonious with the background. Because there is no clear reference standard for adjusting the foreground appearance, image harmonization is a highly ill-posed problem. Recent works have shown that the appearance of content shared between the background and foreground can provide reliable guidance that reduces this ill-posedness. However, when the background contains no content similar to the foreground object, these methods often fail to achieve satisfactory results. To address this problem, we propose a Reference-Aware Image Harmonization Network (RANet). First, we introduce a Style-aware Reference Module that employs a diffusion model as an estimator to generate reference style features, thereby achieving global style alignment. Second, we design a Region-aware Reference Module that adaptively identifies and exploits relevant reference regions between the foreground and background by predicting soft masks; this avoids interference from irrelevant background areas while leveraging useful background reference information for better local visual consistency. Experiments on the iHarmony4 dataset and a newly collected dataset captured with mobile phones validate the effectiveness of our method. The code and the new dataset are available at https://github.com/thenotshyshyshy/Reference-Aware-Image-Harmonization.
Keywords: Appearance translation; Diffusion model; Image editing; Image harmonization.
Copyright © 2025 Elsevier Ltd. All rights reserved.