
AntiFake - Using Adversarial Audio to Prevent Unauthorized Speech Synthesis

This repository hosts the source code for the paper "AntiFake: Using Adversarial Audio to Prevent Unauthorized Speech Synthesis", accepted at the 30th ACM Conference on Computer and Communications Security (CCS), 26-30 November 2023.

AntiFake protects an arbitrary audio sample against unauthorized speech synthesis by attackers by adding an adversarial perturbation. It selects an optimal target speaker and optimizes the source audio sample toward that target. Target selection combines embedding comparisons over a sample database of speakers with human auditory judgments.

Dependencies and Setup

AntiFake is implemented and tested with Python 3.7 on Ubuntu. We recommend running AntiFake in a virtual environment (e.g., conda) with at least 16 GB of memory.

The dependencies are included and can be directly installed via conda:

conda env create -f antifake.yml 

Alternatively, manually install all dependencies by running the following:

conda create --name antifake python=3.7
conda activate antifake
pip install torch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 --no-cache
pip install -r ./TTS/requirements.txt
pip install -r requirements.txt
sudo apt install ffmpeg
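
To confirm the environment is set up, one can optionally check that the pinned PyTorch packages import correctly:

python -c "import torch, torchaudio; print(torch.__version__, torchaudio.__version__, torch.cuda.is_available())"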

Due to file size restrictions, some files are hosted as supplementary materials on Drive. The bundle contains the following files and folders:

autoregressive.pth and diffusion_decoder.pth: required to run the AntiFake system; place them under AntiFake/tortoise/.

synthesizer.pt: place it under saved_models/default/.

antifake_synthesizer_bundle: the speech synthesizers used to validate the efficacy of protected samples; their usage is documented in antifake_synthesizer_bundle/README_synthesizers.md.

add_random_noise.py: a script for examining the efficacy of the optimized perturbations; for details, see Additional Experiments.
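After placing the files, the relevant paths (relative to the repository root) look like this:

AntiFake/
├── tortoise/
│   ├── autoregressive.pth
│   └── diffusion_decoder.pth
└── saved_models/
    └── default/
        └── synthesizer.pt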

Execution Command Explained

In run.py, under the "options" block, users can set the output directory OUTPUT_DIR and choose which encoders to include in the ensemble, which is useful when VRAM is limited. The attack runs in target-based mode by default; users can also enable THRESHOLD_BASE to use a threshold-based attack instead. For more details, please refer to our paper.
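For illustration, the "options" block might look like the sketch below. OUTPUT_DIR and THRESHOLD_BASE are named above; the individual encoder flag names and default values are assumptions:

# Sketch of the "options" block in run.py
OUTPUT_DIR = "./output"    # directory for the protected output audio
THRESHOLD_BASE = False     # False: target-based attack (default); True: threshold-based

# Individual encoder toggles for the ensemble (hypothetical names);
# disable some encoders if VRAM is limited
USE_ENCODER_A = True
USE_ENCODER_B = True
USE_ENCODER_C = True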

To convert an audio file to a format AntiFake accepts, use the following:

ffmpeg -i <source_audio_path> -acodec pcm_s16le -ac 1 -ar 16000 -ab 256k output_audio.wav

AntiFake takes two arguments: the source audio file to be protected (.wav) and the path for the protected output audio file (e.g., "./output/completed.wav"):

python run.py <source_wav_path> <output_wav_path> 

Two examples of running the system are:

python run.py "./samples/libri_2/source/source.wav" "./samples/libri_2/protected/protected.wav"

python run.py "./samples/human_1/source/source.wav" "./samples/human_1/protected/protected.wav"

As shown in ./samples, the source and protected .wav files for the two sample speakers are located in ./samples/[speaker]/source and ./samples/[speaker]/protected, respectively. The synthesized results from the voice cloning engines are also included:

*_rtvc.wav: synthesized using Real Time Voice Cloning (also known as SV2TTS)

*_avc.wav: synthesized using Adaptive Voice Conversion

*_coqui.wav: synthesized using Coqui TTS

*_tortoise.wav: synthesized using Tortoise TTS

Additional Experiments

Please note that, to limit computational resource requirements, we currently configure AntiFake to use three of the encoders. Different combinations of encoders can lead to variations in protection strength. To change which encoders AntiFake uses, set the corresponding encoder "options" in run.py to True.

It might appear straightforward to simply add random noise throughout the entire audio to sufficiently deviate the speaker embedding, without any perturbation optimization. To validate the efficacy of our strategy, we extracted the optimized perturbations by referencing the original speech audio and constructed random perturbations that share the same distribution. Applying these to the original audio samples, we found them less effective than the optimized perturbations. The script for this comparison experiment is add_random_noise.py. The command to use it is:

python add_random_noise.py <protected_wav> <source_wav> <output_wav>
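
For intuition, the comparison can be sketched as follows. This is a minimal illustration of the idea, not the actual add_random_noise.py; the soundfile/NumPy usage and the Gaussian fit of the noise distribution are assumptions:

import sys
import numpy as np
import soundfile as sf

# Usage: python sketch_random_noise.py <protected_wav> <source_wav> <output_wav>
protected_path, source_path, output_path = sys.argv[1:4]

protected, sr = sf.read(protected_path)
source, source_sr = sf.read(source_path)
assert sr == source_sr, "sample rates must match"

# The optimized perturbation is the difference between protected and source audio
n = min(len(protected), len(source))
perturbation = protected[:n] - source[:n]

# Draw random noise matching the perturbation's empirical mean and std
rng = np.random.default_rng(seed=0)
noise = rng.normal(perturbation.mean(), perturbation.std(), size=n)

# Apply the matched random noise to the source and save the result
sf.write(output_path, np.clip(source[:n] + noise, -1.0, 1.0), sr)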

Citation

If you find the platform useful, please cite our work with the following reference:

@inproceedings{yu2023antifake,
  title={AntiFake: Using Adversarial Audio to Prevent Unauthorized Speech Synthesis},
  author={Yu, Zhiyuan and Zhai, Shixuan and Zhang, Ning},
  booktitle={Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security},
  pages={460--474},
  year={2023}
}