.. _pipeline-link:

Pipeline
========

The system architecture consists of a sequence of configurable processing stages.

.. _configuration-link:

Configuration file
------------------

The YAML configuration file defines all the components of the benchmarking pipeline. Key sections are:

.. code-block:: yaml

    algorithms:
      ...
    datasets:
      ...
    attacks:
      ...
    post_embed_metrics:
      ...
    post_attack_metrics:
      ...
    post_extract_metrics:
      ...
    post_pipeline_embed_metrics:
      ...
    post_pipeline_attack_metrics:
      ...
    pipeline:
      ...

The configuration file supports Jinja2 inclusion syntax, e.g.:

.. code-block:: yaml+jinja

    datasets:
      {% include 'common/diffusiondb.yml' %}

This will expand to (note: indentation remains the same as in the `common/diffusiondb.yml` configuration file):

.. code-block:: yaml

    datasets:
      - DiffusionDB:
          cache_dir: ./datasets/diffusiondb
          subset: 2m_first_5k
          skip_nsfw: true

The following sections describe the key components of the configuration in detail.

Algorithms
~~~~~~~~~~

Provides parameters of the watermarking algorithm wrapper (an instance of a class inherited from `BaseAlgorithmWrapper`) to test. This may be a single wrapper or a list of wrappers (all wrappers in the list will be tested with the same configuration). For example, you may test the same watermarking algorithm with different parameters.

You can also redefine `report_name` so that different configurations are aggregated under different `method` fields. All other fields are passed to the class constructor as parameters for the watermarking algorithm. For example, the `__init__` method of `DCTMarkerWrapper` takes a single argument `params` of type `Dict`. The YAML configuration supports lists, dictionaries, strings, integers, floating-point numbers, and any combination of them.

.. code-block:: yaml

    algorithms:
      - dct_marker:
          report_name: dct_256 # name of the algorithm recorded in the aggregation destination
          params: # dictionary with parameters of the algorithm (passed to __init__ of `DCTMarkerWrapper`)
            width: 256
            height: 256
            wm_length: 100
            block_size: 256
            ampl1: 0.01
            ampl_ratio: 0.7
            lambda_h: 4
      - dct_marker:
          report_name: dct_512
          params:
            width: 512
            height: 512
            wm_length: 100
            block_size: 256
            ampl1: 0.01
            ampl_ratio: 0.7
            lambda_h: 4

Datasets
~~~~~~~~

Provides parameters of the dataset (an instance of a class inherited from `BaseDataset`) to test on. This may be a single dataset or a list of datasets. As with **Algorithms**, you can redefine `report_name`. All other fields are passed to the dataset constructor.

.. code-block:: yaml

    datasets:
      - DiffusionDB:
          report_name: diffusion_db # if you want to redefine the default diffusiondb value
          cache_dir: ./datasets/diffusiondb # Parameters
          subset: 2m_first_5k # provided to __init__ method
          skip_nsfw: true # of the DiffusionDB class constructor

Attacks
~~~~~~~

Provides parameters of attacks (instances of classes inherited from `BaseAttack`) applied to objects with embedded watermarks (for example, marked images). This may be a list of attacks; `report_name` may be redefined. Below is an example of a JPEG compression attack applied to images with different quality factors:

.. code-block:: yaml

    attacks:
      - JPEG:
          report_name: jpeg_80
          quality: 80
      - JPEG:
          report_name: jpeg_50
          quality: 50
      - JPEG:
          report_name: jpeg_20
          quality: 20

Note: applying the same attack with different parameters requires redefining `report_name`, as shown in the example above (otherwise, values will be overwritten).

Metrics
~~~~~~~

Post-embed metrics
^^^^^^^^^^^^^^^^^^

Metrics evaluated after embedding the watermark. Each metric must inherit from the `PostEmbedMetric` class. These metrics are generally responsible for assessing the perceptual quality of the watermarked object. Redefining `report_name` is supported. All parameters are passed to the metric class constructor.

.. code-block:: yaml

    post_embed_metrics:
      - PSNR
      - SSIM
      - LPIPS:
          net: alex

Post-attack metrics
^^^^^^^^^^^^^^^^^^^

Metrics evaluated after applying attacks. These metrics are generally responsible for assessing the perceptual quality after the attack. You may use the same metrics as in **post_embed_metrics**.

.. code-block:: yaml

    post_attack_metrics:
      - PSNR
      - SSIM
      - LPIPS:
          net: alex

Post-extract metrics
^^^^^^^^^^^^^^^^^^^^

Metrics evaluated after extracting the watermark. Each metric must inherit from the `PostExtractMetric` class. These metrics are generally responsible for assessing the robustness of the watermark extraction algorithm against the applied attacks. Redefining `report_name` is supported. All parameters are passed to the metric class constructor.

.. code-block:: yaml

    post_extract_metrics:
      - ExtWm
      - BER
      - TPR@xFPR:
          report_name: tpr@0.001fpr
          fpr_rate: 0.001

Post-pipeline embed metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Metrics evaluated after pipeline execution. They can be used to compare the set of original objects (e.g., unmarked images) with the set of marked objects. Refer to ``configs/trustmark_fid_demo.yml`` for an example.

.. code-block:: yaml

    post_pipeline_embed_metrics:
      - FID:
          report_name: fid_mscoco
          device: cuda
          dataset_type: mscoco # You can calculate FID in comparison with any supported dataset of images
          dataset_args:
            cache_dir: ./datasets/mscoco
            split: val
      - FID: # Comparison of marked images and not marked original images
          device: cuda

Post-pipeline attack metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You may use the same metrics as in **post_pipeline_embed_metrics** to compare the set of marked objects with the set of attacked objects.

Pipeline
~~~~~~~~

Parameters for the pipeline, including multiprocessing and results aggregation:

.. code-block:: yaml

    pipeline:
      result_path: ./result_path
      aggregators:
        - CSV:
            table_name: table
        - ClickHouse:
            db_config: ./db_configs/dct_wm.ini
      min_batch_size: 100
      seed: 42
      dump_type: serialized
      workers: 2
      cuda_visible_devices: 2,3

Description of parameters:

* `result_path`: path to save intermediate results
* `aggregators`: list of result aggregators

  * `CSV`: aggregates results into a table in a CSV file

    * `table_name`: name of the table to save results in; creates two tables:

      * `result_path/metrics_table_name.csv`: for metric results
      * `result_path/params_table_name.csv`: for algorithm parameters

  * `ClickHouse`: aggregates results into a ClickHouse database

    * `db_config`: path to the `.ini` file with database configuration

* `min_batch_size`: minimum number of records to aggregate at once
* `seed`: fixed random seed for experiment reproducibility. If not provided, results may differ with each run
* `dump_type`: type of intermediate result dumping; supports two values:

  * `serialized`: serialized save (for example, all images saved as `.png` files)
  * `pickle`: intermediate results saved as a single pickle file for each object

* `workers`: number of processes for parallel execution
* `cuda_visible_devices`: if running the pipeline on a cluster with multiple GPUs, you may list GPU IDs here as comma-separated numbers. It is recommended to use the same number of GPU devices as workers.
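
To show how the sections fit together, the sketch below assembles a single configuration file using only values that appear in the fragments above. It is an illustration, not a tested configuration: it assumes that the post-pipeline metric sections, the `ClickHouse` aggregator, and `cuda_visible_devices` may be omitted, which this page does not state explicitly.

.. code-block:: yaml

    # Illustrative only: every key and value is copied from the examples above.
    # Assumption: sections left out here (post-pipeline metrics, ClickHouse,
    # cuda_visible_devices) are optional.
    algorithms:
      - dct_marker:
          report_name: dct_256
          params:
            width: 256
            height: 256
            wm_length: 100
            block_size: 256
            ampl1: 0.01
            ampl_ratio: 0.7
            lambda_h: 4

    datasets:
      - DiffusionDB:
          cache_dir: ./datasets/diffusiondb
          subset: 2m_first_5k
          skip_nsfw: true

    attacks:
      # Distinct report_name values keep the two JPEG runs from overwriting each other.
      - JPEG:
          report_name: jpeg_80
          quality: 80
      - JPEG:
          report_name: jpeg_50
          quality: 50

    post_embed_metrics:
      - PSNR
      - SSIM

    post_attack_metrics:
      - PSNR
      - SSIM

    post_extract_metrics:
      - BER
      - TPR@xFPR:
          report_name: tpr@0.001fpr
          fpr_rate: 0.001

    pipeline:
      result_path: ./result_path
      aggregators:
        - CSV:
            table_name: table
      min_batch_size: 100
      seed: 42
      dump_type: serialized
      workers: 2

Going by the parameter descriptions above, a run with this file would be expected to write its metric results and algorithm parameters to two CSV tables under `./result_path`.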