Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing

Zhuoying Li*, Zhu Xu*, Yuxin Peng, Yang Liu
Peking University
ICML 2025

*Indicates Equal Contribution

Abstract

Instruction-based image editing, which aims to modify an image faithfully according to an instruction while leaving irrelevant content unchanged, has made significant progress. However, a comprehensive metric for assessing editing quality is still lacking. Existing metrics either rely on costly human evaluation, which hinders large-scale evaluation, or are adapted from other tasks and lose task-specific concerns. As a result, they fail to comprehensively evaluate both the modification requested by the instruction and the preservation of irrelevant regions, which leads to biased evaluation. To address this, we introduce a novel metric, Balancing Preservation Modification (BPM), tailored for instruction-based image editing, which explicitly disentangles the image into editing-relevant and irrelevant regions for separate consideration. We first identify and locate editing-relevant regions, followed by a two-tier process to assess editing quality: the Region-Aware Judge evaluates whether the position and size of the edited region align with the instruction, and the Semantic-Aware Judge further assesses instruction compliance within editing-relevant regions as well as content preservation within irrelevant regions, yielding a comprehensive and interpretable quality assessment. Moreover, the editing-relevant region localization in BPM can be integrated into image editing approaches to improve editing quality, demonstrating its wide applicability. We verify the effectiveness of BPM on comprehensive instruction-editing data, and the results show that it achieves the highest alignment with human evaluation compared to existing metrics.

Method

Method overview diagram

The core idea of BPM is to explicitly divide the image into an editing area and a non-editing area, measuring conformity to the editing instruction in the former and preservation of the original image in the latter. BPM consists of three parts: (a) Edited area localization: an LLM parses the instruction, and Grounded SAM then localizes the edited area; (b) Region-Aware Judge: evaluates edit quality in terms of object size and position; (c) Semantic-Aware Judge: evaluates the edit from a semantic perspective, split into a modification score and a preservation score.
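To make the region split concrete, the following is a minimal, dependency-free sketch of two quantities that such a pipeline could compute once a binary edit mask is available. It is not the authors' implementation: the function names `region_stats` and `preservation_score` are hypothetical, the mask is assumed to be given (in BPM it would come from the LLM and Grounded SAM localization step), and the mean absolute pixel difference is only a stand-in for whatever measure the Semantic-Aware Judge actually uses for preservation.

```python
def region_stats(mask):
    """Return (area_fraction, (cx, cy)) for the nonzero pixels of a binary mask.

    The center is normalized to [0, 1] in both axes. These are the kind of
    size and position cues a region-aware check could compare against the
    instruction. Hypothetical helper, not part of BPM itself.
    """
    height, width = len(mask), len(mask[0])
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return 0.0, None
    area_fraction = len(points) / (height * width)
    cx = sum(x + 0.5 for x, _ in points) / len(points) / width
    cy = sum(y + 0.5 for _, y in points) / len(points) / height
    return area_fraction, (cx, cy)


def preservation_score(source, edited, mask):
    """Score in [0, 1] for how well pixels OUTSIDE the edit mask are preserved.

    Assumes single-channel images with values in [0, 1]. Uses
    1 - mean absolute difference as a simple stand-in measure.
    """
    diffs = [
        abs(s - e)
        for row_s, row_e, row_m in zip(source, edited, mask)
        for s, e, m in zip(row_s, row_e, row_m)
        if not m  # only editing-irrelevant pixels count
    ]
    if not diffs:
        return 1.0  # nothing outside the mask to preserve
    return 1.0 - sum(diffs) / len(diffs)


# Toy 2x2 example: only the top-right pixel is edited.
source = [[0.0, 0.5], [1.0, 0.2]]
edited = [[0.0, 0.9], [1.0, 0.2]]
mask = [[0, 1], [0, 0]]

area, center = region_stats(mask)
score = preservation_score(source, edited, mask)
```

In this toy case the change falls entirely inside the mask, so the preservation score stays at its maximum, while an edit that leaked into the unmasked pixels would lower it.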

Experiment


Human alignment test on BPM and existing image editing metrics for local edits.


Human alignment test on BPM and existing image editing metrics for global edits.

Visualization of Our BPM

Application for Editing Quality Enhancement

The mask generated by the pipeline that locates the editing area in BPM can be applied to enhance editing quality by simply modifying the classifier-free guidance. Here are some visualized examples.
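As an illustration of how a mask might enter classifier-free guidance, the sketch below applies a different guidance scale inside and outside the edit mask. This is one plausible reading, not the authors' formulation: the function `masked_cfg`, its parameters, and the choice of per-pixel scales are all assumptions, and the noise predictions are reduced to single-channel 2D lists to keep the example dependency-free. It builds on the standard classifier-free guidance combination `eps_uncond + scale * (eps_cond - eps_uncond)`.

```python
def masked_cfg(eps_uncond, eps_cond, mask, scale_inside, scale_outside):
    """Combine unconditional and conditional noise predictions per pixel.

    Standard classifier-free guidance uses a single scale everywhere. Here the
    scale is chosen per pixel from a binary edit mask (hypothetical variant):
    `scale_inside` where mask is nonzero, `scale_outside` elsewhere.
    """
    guided = []
    for row_u, row_c, row_m in zip(eps_uncond, eps_cond, mask):
        guided_row = []
        for u, c, m in zip(row_u, row_c, row_m):
            scale = scale_inside if m else scale_outside
            guided_row.append(u + scale * (c - u))
        guided.append(guided_row)
    return guided


# Toy 1x2 example: left pixel is inside the edit region, right pixel is not.
eps_uncond = [[0.0, 0.0]]
eps_cond = [[1.0, 1.0]]
mask = [[1, 0]]

guided = masked_cfg(eps_uncond, eps_cond, mask, scale_inside=7.5, scale_outside=1.0)
```

With these toy values the instruction signal is amplified only at the masked pixel, which matches the intent of pushing the edit into the relevant region while leaving the rest closer to the unamplified prediction.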

