PMID- 38005673
OWN - NLM
STAT- PubMed-not-MEDLINE
LR  - 20231127
IS  - 1424-8220 (Electronic)
IS  - 1424-8220 (Linking)
VI  - 23
IP  - 22
DP  - 2023 Nov 20
TI  - Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities.
LID - 10.3390/s23229287 [doi]
LID - 9287
AB  - At present, text-guided image manipulation is a notable subject of study in the vision and language field. Given an image and text as inputs, these methods aim to manipulate the image according to the text, while preserving text-irrelevant regions. Although there has been extensive research to improve the versatility and performance of text-guided image manipulation, research on its performance evaluation is inadequate. This study proposes Manipulation Direction (MD), a logical and robust metric, which evaluates the performance of text-guided image manipulation by focusing on changes between image and text modalities. Specifically, we define MD as the consistency of changes between images and texts occurring before and after manipulation. By using MD to evaluate the performance of text-guided image manipulation, we can comprehensively evaluate how an image has changed before and after the image manipulation and whether this change agrees with the text. Extensive experiments on Multi-Modal-CelebA-HQ and Caltech-UCSD Birds confirmed that there was an impressive correlation between our calculated MD scores and subjective scores for the manipulated images compared to the existing metrics.
FAU - Watanabe, Yuto
AU  - Watanabe Y
AUID- ORCID: 0000-0002-9841-4089
AD  - Graduate School of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan.
FAU - Togo, Ren
AU  - Togo R
AUID- ORCID: 0000-0002-4474-3995
AD  - Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan.
FAU - Maeda, Keisuke
AU  - Maeda K
AUID- ORCID: 0000-0001-8039-3462
AD  - Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan.
FAU - Ogawa, Takahiro
AU  - Ogawa T
AUID- ORCID: 0000-0001-5332-8112
AD  - Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan.
FAU - Haseyama, Miki
AU  - Haseyama M
AUID- ORCID: 0000-0003-1496-1761
AD  - Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan.
LA  - eng
GR  - JP21H03456/Japan Society for the Promotion of Science/
GR  - JP23K11141/Japan Society for the Promotion of Science/
GR  - JP23K11211/Japan Society for the Promotion of Science/
PT  - Journal Article
DEP - 20231120
PL  - Switzerland
TA  - Sensors (Basel)
JT  - Sensors (Basel, Switzerland)
JID - 101204366
SB  - IM
PMC - PMC10675000
OTO - NOTNLM
OT  - evaluation metric
OT  - generative adversarial network
OT  - manipulation direction
OT  - text-guided image manipulation
COIS- The authors declare no conflict of interest.
EDAT- 2023/11/25 12:44
MHDA- 2023/11/25 12:45
PMCR- 2023/11/20
CRDT- 2023/11/25 01:29
PHST- 2023/10/06 00:00 [received]
PHST- 2023/10/26 00:00 [revised]
PHST- 2023/11/15 00:00 [accepted]
PHST- 2023/11/25 12:45 [medline]
PHST- 2023/11/25 12:44 [pubmed]
PHST- 2023/11/25 01:29 [entrez]
PHST- 2023/11/20 00:00 [pmc-release]
AID - s23229287 [pii]
AID - sensors-23-09287 [pii]
AID - 10.3390/s23229287 [doi]
PST - epublish
SO  - Sensors (Basel). 2023 Nov 20;23(22):9287. doi: 10.3390/s23229287.
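
The abstract defines MD only in words, as the consistency between the change in the image and the change in the text before and after manipulation; this record does not give the formula. The sketch below is one possible reading of that sentence, not the authors' implementation. It assumes that images and texts have already been embedded into a shared vector space by some joint image-text encoder (an assumption not stated in the record), and it scores consistency as the cosine similarity of the two change vectors. The function names `difference`, `cosine_similarity`, and `manipulation_direction_sketch` are hypothetical.

```python
import math


def difference(after, before):
    """Element-wise change vector: after - before."""
    return [a - b for a, b in zip(after, before)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def manipulation_direction_sketch(img_before, img_after, txt_before, txt_after):
    """Assumed reading of MD: agreement between image change and text change.

    All four arguments are embedding vectors in one shared space
    (an assumption; the record does not name an encoder or a formula).
    """
    image_change = difference(img_after, img_before)
    text_change = difference(txt_after, txt_before)
    return cosine_similarity(image_change, text_change)
```

Under this reading, a score near 1 would mean the image moved in the same direction as the text edit, a score near 0 would mean the image change is unrelated to the text change, and a negative score would mean the image moved against it.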