Setting up with SoundID VoiceAI plugin

Getting started with SoundID VoiceAI: this step-by-step guide covers the entire process, from downloading the plugin and setting up a trial to loading it in your DAW and exploring its features.

In this article:

Download and install
Load and trial/activate the plugin in your DAW
Capture audio
- Important to know when capturing audio
Select your preset and apply AI processing
Input/output audio quality and properties
Tokens and minutes

SoundID VoiceAI

SoundID VoiceAI is an AI voice and instrument transformation plugin for DAWs. It uses AI technology to change a recorded singing voice into that of another human being or an instrument:

Voice model library: transform your vocal track into a realistic singing voice from a studio-grade AI library of 23 voice models
Instrument model library: transform your melodic humming or beatboxing to sound like drums, guitar, violin, or other instruments from a studio-grade AI library of 21 instrument models

Transform singing voice tracks, generate backing vocals from a single voice track, transform speaking voice tracks, mimic instruments with your voice to quickly transfer melodic ideas into your DAW or generate creative sounds, turn beatboxing into drums, and more.

Learn more about the use cases and advantages here: What is SoundID VoiceAI?

Download and install

The SoundID VoiceAI voice and instrument transformation plugin runs in a DAW (e.g. Cubase, Logic Pro X, Pro Tools), while the AI audio processing itself is cloud-based. Here are the basic system requirements for using SoundID VoiceAI:

macOS 11 Big Sur, 12 Monterey, 13 Ventura, 14 Sonoma
Windows 10, 11
DAW or other plugin host app that supports AU, AAX, or VST3 plugin formats
SoundID VoiceAI processing tokens available in your Sonarworks Account
Stable internet connection, as cloud processing is used (offline use not supported)

The SoundID VoiceAI installer can be downloaded here and will install the plugins in the default plugin directories on macOS and Windows:

macOS:
Macintosh HD/Library/Audio/Plug-Ins/Components/SoundIDVoiceAI.component
Macintosh HD/Library/Application Support/Avid/Audio/Plug-Ins/SoundIDVoiceAI.aaxplugin
Macintosh HD/Library/Audio/Plug-Ins/VST3/SoundIDVoiceAI.vst3

Windows:
C:\Program Files\Common Files\VST3\Sonarworks\SoundIDVoiceAI\SoundIDVoiceAI.vst3
C:\Program Files\Common Files\Avid\Audio\Plug-Ins\SoundIDVoiceAI.aaxplugin\Contents\x64\SoundIDVoiceAI.aaxplugin
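
If the plugin does not show up in your DAW, it can help to confirm that the files actually exist in the locations listed above. The following is a minimal Python sketch, not an official Sonarworks tool. The function names are hypothetical, and mapping "Macintosh HD" to the filesystem root "/" is an assumption based on the default macOS boot volume name.

```python
import platform
from pathlib import Path
from typing import Dict, List, Optional

# Install locations copied from the article. Treating "Macintosh HD" as "/"
# is an assumption (it is the default name of the macOS boot volume).
MACOS_PATHS = [
    "/Library/Audio/Plug-Ins/Components/SoundIDVoiceAI.component",
    "/Library/Application Support/Avid/Audio/Plug-Ins/SoundIDVoiceAI.aaxplugin",
    "/Library/Audio/Plug-Ins/VST3/SoundIDVoiceAI.vst3",
]
WINDOWS_PATHS = [
    r"C:\Program Files\Common Files\VST3\Sonarworks\SoundIDVoiceAI\SoundIDVoiceAI.vst3",
    r"C:\Program Files\Common Files\Avid\Audio\Plug-Ins\SoundIDVoiceAI.aaxplugin\Contents\x64\SoundIDVoiceAI.aaxplugin",
]


def expected_plugin_paths(system: str) -> List[str]:
    """Return the documented install paths for 'Darwin' (macOS) or 'Windows'."""
    if system == "Darwin":
        return list(MACOS_PATHS)
    if system == "Windows":
        return list(WINDOWS_PATHS)
    return []  # no paths are documented for other systems


def check_install(system: Optional[str] = None) -> Dict[str, bool]:
    """Map each documented path to whether it exists on this machine."""
    system = system or platform.system()
    return {p: Path(p).exists() for p in expected_plugin_paths(system)}


if __name__ == "__main__":
    for path, found in check_install().items():
        print(("found    " if found else "missing  ") + path)
```

A "missing" line for a format you do not use (for example AAX without Pro Tools) is not necessarily a problem; the check only reports what is on disk.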

Load and trial/activate the plugin in your DAW

To start working with SoundID VoiceAI, load the plugin on any voice or instrument track in your DAW project:

Download and install the SoundID VoiceAI plugin
Launch your DAW and load the SoundID VoiceAI plugin on an audio track
Log in to your Sonarworks Account, or Sign up to create a new account
- Click on Start trial to start a 7-day trial with 9000 free processing tokens
- Click on Activate if you already have unused tokens available in your account
On the next page, click on Activate
On the following page, click on Activate on this device to allow the browser to launch the licensing service
Return to your DAW - the plugin will now be activated

Capture audio

Before the target voice or instrument model AI processing can be applied, the input audio of the DAW project track must be captured:

Click on Capture to Arm the plugin
Select your DAW playback position and start playback
Click on Stop to complete the capture
Click on Remove to delete the last capture and start over

Once the capture is Stopped, the exact audio capture duration and region timestamps will be displayed.

Important to know when capturing audio

The audio capture mechanics depend on smooth continuous playback. Don't change the playback position while an audio capture is in progress.
The positioning of the AI replacement audio will depend on the captured audio region timestamps. Don't change the audio content position on the track after capturing.
The plugin supports a single audio capture per plugin instance only. If two fragments on the same track need to be captured and processed, there are two solutions:
- Use two plugin instances on the same track
- Capture a single (longer) clip with both fragments
If loop mode is enabled in your DAW, the capture might become corrupted when the playhead reaches the loop point and jumps back, in which case you may need to capture again.
There is no "Undo" functionality in the plugin. Any Removed captures that have already been processed with AI replacement audio can only be recovered using the raw audio files generated and stored in the cache folder.

Learn more about the plugin mechanics here: How to use SoundID VoiceAI in your DAW

Select your preset and apply AI processing

Click on Voices or Creative to select the target voice or instrument preset
Click on '▶' ("play") to preview how the preset sounds at its best vocal range
If your source pitch is similar to the preset preview, proceed to Start processing
If the results sound too high or low, use Transpose to adjust the output pitch by semitones, and process again
Use the AI voice button to Enable/Disable the transformation on the track

Before committing to process the entire track, it's a good idea to highlight and process a smaller section of the track first and ensure the results sound good. Processing takes approximately 2.5 times the duration of the captured audio. It is possible to Reprocess the results for free (limited to 10 times per hour) to minimize excessive artifacts.

1 minute of audio processing costs 600 tokens. The token amount needed for processing will always be displayed on the Start processing button. You can check your balance in the plugin, or in your Sonarworks Account. Learn more about tokens below.

Note: Learn more about optimal preset selection and Transpose use below.

Important to know when processing

Processing requires a minimum of 70 tokens (7 seconds); beyond that, tokens are charged in increments of 10 tokens (1 second).
Tokens will still be deducted if processing is Canceled while in progress.
Repeated AI processing of the same audio source will not produce identical results. Due to the creative nature of the AI models in SoundID VoiceAI, results will be slightly different each time.
It is possible to Reprocess the results for free (limited to 10 times per hour) to minimize excessive artifacts.
The positioning of the AI replacement audio relies on the captured audio region timestamps. Don't change the audio content position on the track after capturing.
The plugin supports a single audio capture and processing per plugin instance only. If two fragments on the same track need to be captured and processed, there are two solutions:
- Use two plugin instances on the same track
- Capture a single (longer) clip with both fragments
If loop mode is enabled in your DAW, the capture might become corrupted when the playhead reaches the loop point and jumps back, in which case you may need to capture again.
There is no "Undo" functionality in the plugin. Any Removed captures that have already been processed with AI replacement audio can only be recovered using the raw audio files generated and stored in the Cache folder.
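
The two solutions for handling multiple fragments can differ in token cost, because a single longer clip also pays for the gap between the fragments. The sketch below compares them using the rates stated in this article (600 tokens per minute, a 70-token minimum, 10-token increments). Rounding partial seconds up is an assumption, and the function names are hypothetical; the plugin's Start processing button remains the authoritative figure.

```python
import math

TOKENS_PER_SECOND = 10  # 600 tokens per minute, per the article
MIN_TOKENS = 70         # minimum charge (7 seconds), per the article


def tokens_for(seconds: float) -> int:
    """Estimated token cost of processing one capture."""
    # Assumption: partial seconds are rounded up to the next 10-token step.
    return max(MIN_TOKENS, math.ceil(seconds) * TOKENS_PER_SECOND)


def compare_options(fragments):
    """fragments: list of (start_s, end_s) positions on the track, in seconds."""
    # Option 1: one plugin instance (and one capture) per fragment.
    separate = sum(tokens_for(end - start) for start, end in fragments)
    # Option 2: one capture spanning from the first start to the last end.
    span = max(end for _, end in fragments) - min(start for start, _ in fragments)
    return {"separate_instances": separate, "single_clip": tokens_for(span)}


print(compare_options([(10, 20), (40, 50)]))
# {'separate_instances': 200, 'single_clip': 400}
```

Under these assumptions, widely spaced fragments are cheaper with separate instances, while very short fragments that sit close together can be cheaper as one clip because each separate capture pays the 70-token minimum. For example, `compare_options([(10, 14), (15, 19)])` gives 140 tokens for separate instances and 90 for a single clip.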

Reprocessing

It is possible to Reprocess the AI processing results for free up to 10 times per hour to minimize excessive artifacts (additional reprocessing will deduct tokens, see below). Free Reprocessing is only available with the same Preset and Transpose settings. After the captured audio has been processed, clicking on the Reprocess button starts processing again with the same source, preset, and Transpose combination. The previous processing result will be overwritten.

Note: If the Reprocessing limit is reached, you will see a message indicating that free Reprocessing is unavailable. If you choose Use tokens, tokens are deducted from your token balance for processing.

Optimal preset selection and Transpose use for voice transformation

Transpose

The primary use case for SoundID VoiceAI is transforming a singing voice into a realistic singing voice of another human being. Ideally, the original input should match the best input pitch - see the preset descriptions for what recorded audio pitch will generate the best results. If the natural vocal range difference is significant between the input audio and the applied preset, pitch adjustments can be made with the Transpose feature.

Transpose allows pitch adjustments by semitones (half steps) for the generated audio. 12 steps of the Transpose parameter value correspond to an octave. Transpose can be adjusted by up to +/- 4 octaves (48 steps up or down). If the Transpose value is left unaltered, the pitch will remain the same.
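
To get a feel for what a given Transpose value means in terms of pitch, the standard equal-temperament relationship can be used: each semitone multiplies frequency by 2^(1/12), so 12 steps double it. The sketch below illustrates that general music-theory formula only; it assumes 12-tone equal temperament and says nothing about how the plugin implements Transpose internally. The function names are hypothetical.

```python
MAX_STEPS = 48  # +/- 4 octaves, the Transpose range stated in the article


def transpose_ratio(semitones: int) -> float:
    """Frequency ratio for a shift of `semitones` in 12-tone equal temperament."""
    if not -MAX_STEPS <= semitones <= MAX_STEPS:
        raise ValueError("Transpose must be between -48 and +48 semitones")
    return 2.0 ** (semitones / 12)


def transposed_hz(freq_hz: float, semitones: int) -> float:
    """Frequency of a note after shifting it by `semitones`."""
    return freq_hz * transpose_ratio(semitones)


print(transpose_ratio(12))         # 2.0 (one octave up doubles the frequency)
print(transposed_hz(440.0, -12))   # 220.0 (A4 moved one octave down)
```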

Achieving optimal results becomes more straightforward and efficient when certain parameters are considered, particularly when a project is fixed to a specific key. Before processing a vocal track, we recommend taking the following steps:

Preview the preset by clicking on "▶" (play button).
Evaluate the best input pitch to find a suitable preset without Transposing the output pitch.
Use Transpose according to the preset model's vocal range:
- If the target preset sings in a higher pitch than your input voice track, increase the value of the Transpose parameter.
- If the target preset sings in a lower pitch than your input voice track, decrease the value of the Transpose parameter.
Process a small section and evaluate the results before committing to process the entire track.

Note: Transpose values below -12 or above +12 might produce unexpected results. Using Transpose with Drums will have a small impact on the overall sound and is not advised.

Auto-transpose

By default, an additional Auto-transpose feature is enabled. When it is active, the Transpose knob is unavailable for adjustments, and the plugin automatically detects and applies the optimal Transpose value for the combination of the captured audio and the applied preset.

For Voice presets, the auto-transpose values can be -12, 0, or +12
For Creative (instrument) presets, the auto-transpose values can be -24, -12, 0, +12, or +24

To switch back to manual Transpose adjustments, disable the 'Auto' checkbox - manual adjustments will become available again (by default, the last set value of Auto-transpose will be retained).
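
The article does not describe how Auto-transpose detects the optimal value, so the following is purely an illustration of the idea of choosing among whole-octave shifts, not the plugin's actual algorithm. It assumes you already know a typical pitch of your input and the preset's best input pitch as MIDI note numbers (where 60 is middle C); both inputs and the function name are hypothetical.

```python
VOICE_SHIFTS = (-12, 0, 12)               # allowed values for Voice presets
CREATIVE_SHIFTS = (-24, -12, 0, 12, 24)   # allowed values for Creative presets


def pick_octave_shift(input_midi: float, preset_midi: float, allowed=VOICE_SHIFTS) -> int:
    """Pick the allowed shift that brings the input closest to the preset pitch.

    Ties are broken in favor of the smaller shift. This is an illustrative
    heuristic, not the detection method used by SoundID VoiceAI.
    """
    return min(
        allowed,
        key=lambda shift: (abs(input_midi + shift - preset_midi), abs(shift)),
    )


print(pick_octave_shift(48, 62))                    # 12
print(pick_octave_shift(60, 62))                    # 0
print(pick_octave_shift(36, 62, CREATIVE_SHIFTS))   # 24
```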

Creative (instrument) transformation

With the Creative presets you can transform humming and beatboxing into tracks that sound like instruments, discover new ways of generating sounds and melodies, and create demo songs quickly. Here are some ideas to consider:

Mimic instruments with your voice and transform vocal inputs into realistic instruments to quickly transfer melodic ideas into your DAW or generate creative sounds.
Turn beatboxing into drums. Record a few bars of beatboxing to create a drum track.
Transform existing instrument tracks. Convert your guitar solo into a saxophone solo, use your guitar to create a realistic bass guitar track, use a trumpet track to harmonize and create an entire brass section of various instruments, and much more.
Use virtual instruments for creative AI processing.

Input/output audio quality and properties

SoundID VoiceAI plugin can cater to a relatively wide range of recording quality for the input track. Regular phone microphone recordings in a random space with reverb are perfectly okay to use - after processing, the output results will have the properties of studio-quality audio captured with a great microphone.

This applies only to a certain degree; there are some limits to take into consideration:

Repeated AI processing on the same audio capture will not produce identical results. Due to the creative nature of the AI models in SoundID VoiceAI, results will be slightly different each time.
Excessive reverb on the input audio can lead to melodic artifacts in the output.
It is possible to Reprocess the results for free (limited to 10 times per hour) to minimize excessive artifacts.
When applied to non-English singing, some amount of English accent might bleed over into the processed voice depending on the preset applied.
The AI models can sometimes introduce artifacts, such as clipped "s" sounds, into the processing results. This is typically resolved by reprocessing or by adjusting the Transpose setting to a value closer to the input track pitch.
The AI models also work well for normal spoken voice tracks. However, artifacts are possible when they are applied to extreme emotional states of speech, such as whispering or shouting.
The intonation of the input voice audio is a key aspect of the AI models. Raspiness in the voice (rough, raspy, strained, or breathy properties) can lead to artifacts in the processing results.

Tokens and minutes

SoundID VoiceAI uses a pay-as-you-go model: you pay only for the token packs needed for audio processing. There are no subscription fees or other hidden charges involved, and the SoundID VoiceAI plugin itself is free to download and install. Here is what you need to know:

Processing cost: 600 tokens per 1 minute of audio processing.
A minimum charge of 70 tokens (7 seconds) applies for each processing instance, followed by increments of 10 tokens (1 second).
Transpose adjustments to an already processed audio capture require reprocessing.

Here's a realistic example of tokens spent in a specific scenario - vocal replacement for a full song:

Capturing a 12-second audio sample of a voice track
Processing the sample with 5 voice presets and trying 3 different Transpose settings on each preset to find the best fit: 12 x 5 x 3 = 180 seconds (3 minutes) = 1800 tokens
Processing the entire vocal track of 2.5 minutes = 1500 tokens
Total processing time and token cost: 5.5 minutes = 3300 tokens
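
The same arithmetic can be reproduced in a few lines of Python to plan a session with your own numbers. This is a rough planning sketch with hypothetical names: it uses the 600 tokens per minute rate, the 9000-token trial balance, and the approximate 2.5x processing time mentioned in this article, and it assumes every run is at least 7 seconds long so the minimum charge does not come into play.

```python
TOKENS_PER_SECOND = 10         # 600 tokens per minute, per the article
TRIAL_TOKENS = 9000            # trial balance, per the article
PROCESSING_TIME_FACTOR = 2.5   # approximate processing time multiplier


def song_budget(sample_s: int, presets: int, transposes: int, full_track_s: int) -> dict:
    """Estimate tokens and waiting time for auditioning presets, then a full take."""
    audition_s = sample_s * presets * transposes   # every preset/Transpose combination
    total_s = audition_s + full_track_s
    total_tokens = total_s * TOKENS_PER_SECOND
    return {
        "audition_tokens": audition_s * TOKENS_PER_SECOND,
        "full_track_tokens": full_track_s * TOKENS_PER_SECOND,
        "total_tokens": total_tokens,
        "trial_tokens_left": TRIAL_TOKENS - total_tokens,
        "approx_wait_min": total_s * PROCESSING_TIME_FACTOR / 60,
    }


# The scenario above: 12-second sample, 5 presets, 3 Transpose settings, 2.5-minute track.
print(song_budget(sample_s=12, presets=5, transposes=3, full_track_s=150))
# {'audition_tokens': 1800, 'full_track_tokens': 1500, 'total_tokens': 3300,
#  'trial_tokens_left': 5700, 'approx_wait_min': 13.75}
```

Free Reprocessing does not reduce the audition cost here, since it only applies when the Preset and Transpose settings stay the same.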

SoundID VoiceAI token packs can be purchased from your Sonarworks Account:

Small token pack: 72,000 tokens (120 minutes of audio processing) - 19.99 EUR/USD
Medium token pack: 180,000 tokens (300 minutes of audio processing) - 39.99 EUR/USD
Large token pack: 360,000 tokens (600 minutes of audio processing) - 69.99 EUR/USD
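
To compare the packs, it can help to work out the effective price per minute of processing. The sketch below uses only the pack sizes and prices listed above plus the 600 tokens per minute rate; the names are hypothetical, and prices may of course change.

```python
TOKENS_PER_MINUTE = 600  # processing cost, per the article

# name: (tokens, price in EUR/USD), as listed above
PACKS = {
    "Small": (72_000, 19.99),
    "Medium": (180_000, 39.99),
    "Large": (360_000, 69.99),
}


def pack_summary(tokens: int, price: float):
    """Return (minutes of processing, price per minute) for a token pack."""
    minutes = tokens / TOKENS_PER_MINUTE
    return minutes, price / minutes


for name, (tokens, price) in PACKS.items():
    minutes, per_minute = pack_summary(tokens, price)
    print(f"{name}: {minutes:.0f} minutes, about {per_minute:.3f} per minute")
```

The per-minute price drops from roughly 0.17 for the Small pack to roughly 0.12 for the Large pack.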

A 7-day trial with 9000 free tokens is available in your Sonarworks Account. If you haven't created a Sonarworks Account in the past, sign up here.

Note: The trial tokens will expire once the 7-day trial runs out, or once a token purchase is made.

Learn more about the token system here: Tokens and minutes

3 comments

sonarworks

May 08, 2024 07:37

Hi - I'm really appreciating the DAW integration, it is a great feature for previewing and helps workflow. I'm not sure if right here is the best place to make feature requests so I apologise if this is just wrong to ask: “can we train our own voice models?” and “can the roughly ½ second latency when playing back the processed vocal be removed?” Thanks, Tony.

Zane

May 08, 2024 09:41

Hi sonarworks, thanks for reaching out!

At the moment, it is not possible to train your own voice models, so you would have to choose from the Voice and Creative preset options available in the plugin. Thanks for letting us know you would be interested in training your own voice.

As for the latency - the processed audio should blend in without any delay. I will open up a support request on your behalf on this though, so we can check it closer.

If you have any other feature requests, feel free to submit them in our Community page here: Feature Requests

Rene Dwight

July 11, 2024 12:32

Hi there, having problems with the installation for cubase… My installation path is C:\Program Files\Vstplugins but when cubase searches that path for plugins nothing shows?