Metrics
How to implement a new metric
This guide explains how to implement a new evaluation metric. For more examples, refer to the wibench.metrics module.
Create a your_metric.py file in the user_plugins directory.
A metric should return a string, int, or float value.
Post embed metrics
These metrics should inherit from the PostEmbedMetric class and implement the __call__ method. __call__ should take three arguments:
object data from dataset,
marked object,
watermark_data
Post attack metrics
These metrics should inherit from the PostEmbedMetric class and implement the __call__ method. __call__ should take three arguments:
marked object,
attacked object,
watermark_data
For example, for image-based metrics:
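from typing import Any

from wibench.metrics.base import PostEmbedMetric  # assumed import path
from wibench.typing import TorchImg


class MyMetric(PostEmbedMetric):
    def __call__(
        self,
        img1: TorchImg,  # marked object
        img2: TorchImg,  # attacked object
        watermark_data: Any,
    ):
        # Compute the metric value (str, int, or float) here.
        ...
        return metric_res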
Post extract metrics
These metrics should inherit from the PostExtractMetric class and implement the __call__ method. __call__ should take four arguments:
object data from dataset,
marked object,
watermark_data,
extraction_result returned by the extract method of the algorithm wrapper
For example, for image-based metrics:
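from typing import Any

from wibench.metrics.base import PostExtractMetric  # assumed import path
from wibench.typing import TorchImg


class MyMetric(PostExtractMetric):
    def __call__(
        self,
        img1: TorchImg,  # object data from the dataset
        img2: TorchImg,  # marked object
        watermark_data: Any,
        extraction_result: Any,
    ):
        # Compute the metric value (str, int, or float) here.
        ...
        return metric_res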
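As a more concrete illustration, a post extract metric that reports the fraction of mismatched bits might look like the sketch below. It makes several assumptions: `watermark_data` and `extraction_result` are taken to be plain bit sequences of equal length (the real structures are defined by the algorithm wrapper), the class name `BitMismatchRate` is hypothetical, and `PostExtractMetric` is a local stand-in so the snippet runs without wibench installed. In a real plugin you would inherit from the wibench class instead.

```python
from typing import Any, Sequence


class PostExtractMetric:
    """Local stand-in for the wibench base class (assumption, for illustration only)."""


class BitMismatchRate(PostExtractMetric):
    """Hypothetical metric: fraction of extracted bits that differ from the embedded bits."""

    def __call__(
        self,
        obj: Any,                             # object data from the dataset (unused here)
        marked_obj: Any,                      # marked object (unused here)
        watermark_data: Sequence[int],        # assumed: embedded bits
        extraction_result: Sequence[int],     # assumed: extracted bits
    ) -> float:
        if len(watermark_data) != len(extraction_result):
            raise ValueError("bit sequences must have the same length")
        # Count positions where the extracted bit differs from the embedded one.
        mismatched = sum(a != b for a, b in zip(watermark_data, extraction_result))
        return mismatched / len(watermark_data)


metric = BitMismatchRate()
rate = metric(None, None, [1, 0, 1, 1], [1, 1, 1, 0])  # two of four bits differ
```

The metric returns a float, which satisfies the requirement that a metric return a string, int, or float value.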
Implemented metrics
PSNR
SSIM
BER
TPRxFPR
- class wibench.metrics.base.TPRxFPR(fpr_rate: float)
True Positive Rate at a fixed False Positive Rate threshold.
Robustness metric for watermark detection systems.
Parameters
- fpr_rate: float
Target false positive rate (e.g., 0.01 for 1% FPR)
Notes
Uses binomial distribution for threshold calculation
Caches thresholds for efficiency
Binary classification metric
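The notes above can be illustrated with a small standard-library sketch. It is not the wibench implementation: it assumes that detection is decided by counting matching bits, that matches on non-watermarked content follow a Binomial(n, 0.5) distribution, and the function names `match_threshold` and `tpr_at_fpr` are hypothetical.

```python
from functools import lru_cache
from math import comb
from typing import Sequence


@lru_cache(maxsize=None)  # cache thresholds, since (num_bits, fpr_rate) pairs repeat
def match_threshold(num_bits: int, fpr_rate: float) -> int:
    """Smallest k such that P(X >= k) <= fpr_rate for X ~ Binomial(num_bits, 0.5)."""
    total = 2 ** num_bits
    tail = 0  # number of bit patterns with at least k matching bits
    for k in range(num_bits, -1, -1):
        tail += comb(num_bits, k)
        if tail / total > fpr_rate:
            # Including k exceeds the allowed FPR, so the threshold is k + 1.
            return k + 1
    return 0


def tpr_at_fpr(matched_bits: Sequence[int], num_bits: int, fpr_rate: float) -> float:
    """Fraction of watermarked samples whose match count reaches the threshold."""
    threshold = match_threshold(num_bits, fpr_rate)
    detected = sum(1 for m in matched_bits if m >= threshold)
    return detected / len(matched_bits)


# Four watermarked samples with 10-bit messages and a 1% target FPR.
tpr = tpr_at_fpr([10, 9, 10, 7], num_bits=10, fpr_rate=0.01)
```

With 10 bits and a 1% target, only a perfect match stays under the allowed false positive rate, so two of the four samples count as detected.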
P-value
- class wibench.metrics.base.PValue
P-value of the extraction result. The p-value is the probability of observing such a result when extracting from a non-watermarked object.
Notes
For zero-bit methods, the extraction function is assumed to return the p-value itself.
For multi-bit methods, the p-value is the probability of getting the observed number of mismatched bits or fewer for a random message with uniform i.i.d. bit values.
A lower p-value indicates a more confident “content is watermarked” decision.
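For the multi-bit case, the calculation described in the notes amounts to a binomial tail probability. The sketch below is a minimal standard-library version under that reading, not the wibench code; the function name `multibit_p_value` is hypothetical.

```python
from math import comb


def multibit_p_value(num_bits: int, mismatched_bits: int) -> float:
    """P(X <= mismatched_bits) for X ~ Binomial(num_bits, 0.5).

    X models the number of mismatched bits when the extracted message is
    compared against a random message with uniform i.i.d. bits.
    """
    if not 0 <= mismatched_bits <= num_bits:
        raise ValueError("mismatched_bits must be between 0 and num_bits")
    # Count the bit patterns with at most `mismatched_bits` mismatches.
    favourable = sum(comb(num_bits, k) for k in range(mismatched_bits + 1))
    return favourable / 2 ** num_bits


p_perfect = multibit_p_value(8, 0)  # all 8 bits match
p_random = multibit_p_value(8, 8)   # any outcome qualifies
```

A perfect 8-bit match gives 1/256, while allowing all 8 bits to mismatch gives 1.0, consistent with lower p-values indicating more confident detection.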
LPIPS
- class wibench.metrics.lpips.lpips.LPIPS(net: str = 'alex', device: str = 'cpu')
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric [paper].
The implementation is taken from the GitHub repository.
Initialization Parameters
- net: str
Type of network architecture (default 'alex')
- device: str
Device to run the model on ('cuda', 'cpu')
Call Parameters
- img1: TorchImg
Input image tensor in (C, H, W) format
- img2: TorchImg
Input image tensor in (C, H, W) format
- watermark_data: Any
Not used, can be anything
Notes
The watermark_data field is required for the pipeline to work correctly
DreamSim
- class wibench.metrics.dreamsim.dreamsim.DreamSim(*args, **kwargs)
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data.
The implementation is taken from the GitHub repository.
Initialization Parameters
- device: str
Device to run the model on ('cuda', 'cpu')
Call Parameters
- img1: TorchImg
Input image tensor in (C, H, W) format
- img2: TorchImg
Input image tensor in (C, H, W) format
- watermark_data: Any
Not used, can be anything
Notes
The watermark_data field is required for the pipeline to work correctly
Aesthetic
- class wibench.metrics.aesthetic.aesthetic.Aesthetic(*args, **kwargs)
Aesthetic score predictor based on a simple neural net that takes CLIP embeddings as inputs.
The implementation is taken from the GitHub repository. Based on the improved-aesthetic-predictor code base.
Initialization Parameters
- device: str
Device to run the model on ('cuda', 'cpu')
Call Parameters
- img1: TorchImg
Input image tensor in (C, H, W) format
- img2: TorchImg
Input image tensor in (C, H, W) format
- watermark_data: Any
Not used, can be anything
Notes
The watermark_data field is required for the pipeline to work correctly
BLIP
- class wibench.metrics.blip.blip.BLIP(device: str = 'cpu')
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation.
The implementation is taken from the GitHub repository. Based on the BLIP code base.
Initialization Parameters
- device: str
Device to run the model on ('cuda', 'cpu')
Call Parameters
- prompt: str
Text prompt for comparison
- img2: TorchImg
Input image tensor in (C, H, W) format
- watermark_data: Any
Not used, can be anything
Notes
The watermark_data field is required for the pipeline to work correctly
CLIPScore
- class wibench.metrics.clip.clip.CLIPScore(*args, **kwargs)
CLIPScore: A Reference-free Evaluation Metric for Image Captioning.
The implementation is taken from the GitHub repository. Based on the CLIP code base.
Initialization Parameters
- device: str
Device to run the model on ('cuda', 'cpu')
Call Parameters
- prompt: str
Text prompt for comparison
- img2: TorchImg
Input image tensor in (C, H, W) format
- watermark_data: Any
Not used, can be anything
Notes
The watermark_data field is required for the pipeline to work correctly
CLIP_IQA
- class wibench.metrics.clip_iqa.clip_iqa.CLIP_IQA(prompts: Tuple[Union[str, Tuple[str]]] = ('quality',), device: str = 'cpu')
Exploring CLIP for Assessing the Look and Feel of Images [paper].
The implementation is taken from the repository.
Initialization Parameters
- prompts: Tuple[Union[str, Tuple[str]]]
Text prompts for assessing the visual quality of an image (default ('quality',))
- device: str
Device to run the model on ('cuda', 'cpu')
Call Parameters
- img1: TorchImg
Input image tensor in (C, H, W) format
- img2: TorchImg
Input image tensor in (C, H, W) format
- watermark_data: Any
Not used, can be anything
Notes
The watermark_data field is required for the pipeline to work correctly
ImageReward
- class wibench.metrics.image_reward.image_reward.ImageReward(device: str = 'cpu')
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation.
The implementation is taken from the GitHub repository.
Initialization Parameters
- device: str
Device to run the model on ('cuda', 'cpu')
Call Parameters
- prompt: str
Text prompt for comparison
- img2: TorchImg
Input image tensor in (C, H, W) format
- watermark_data: Any
Not used, can be anything
Notes
The watermark_data field is required for the pipeline to work correctly
FID
- class wibench.metrics.fid.fid.FID(dataset_type: Optional[str] = None, dataset_args: Dict[str, Any] = {'cache_dir': None, 'sample_range': None, 'split': 'val'}, device: str = 'cpu', feature: int = 2048, normalize: bool = True)
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium [paper].
The implementation is taken from the repository.
Initialization Parameters
- dataset_type: Optional[str]
Dataset of images to use as the real ones. If not specified, the actual images are added during the pipeline (default None)
- dataset_args: Dict[str, Any]
Arguments for the dataset_type dataset (default {'cache_dir': None, 'sample_range': None, 'split': 'val'})
- device: str
Device to run the model on ('cuda', 'cpu')
- feature: int
Integer selecting the InceptionV3 feature layer to use. Can be one of the following: 64, 192, 768, 2048 (default 2048)
- normalize: bool
Argument for controlling the input image dtype normalization (default True)
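For reference, the quantity that FID estimates can be written in its standard form from the cited paper. Here $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ denote the mean and covariance of the InceptionV3 features for the real images and the evaluated images, respectively. The symbol names are chosen for illustration and do not come from the wibench code.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values mean the two feature distributions are closer. The value is zero when the means and covariances coincide.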