How to use SoundID VoiceAI in your DAW

How to load the plugin in your DAW, understand how the plugin works, and use the product for the best possible results.




Get started with the SoundID VoiceAI plugin

SoundID VoiceAI is a standalone product from Sonarworks that uses AI to transform voices and instruments. It runs as a DAW plugin in the AU, AAX, and VST3 formats. To review the specifications, get access, or install the product, follow this support guide: Setting up with SoundID VoiceAI


Applying the plugin in DAW

SoundID VoiceAI works as an independent plugin, so it can be used in various DAWs. This article uses Ableton Live, one of the popular DAWs, as the example. The process of loading (inserting) the plugin may differ slightly depending on the DAW in use, but the way the plugin is applied and its mechanics are always the same.


Loading the SoundID VoiceAI plugin

Create or load a session in your DAW. The SoundID VoiceAI plugin is primarily designed to transform a singing voice into a realistic singing voice of another human. Locate the audio track where the vocal audio is recorded and insert the plugin onto that track.

  1. Select the lead or backing vocal track
  2. Click on Plug-Ins > Sonarworks > SoundID VoiceAI
  3. Load (insert) the plugin by double-clicking it or by dragging and dropping it onto Audio Effects


Tip: The plugin can be loaded on multiple tracks (multiple instances) to apply different changes. For example, the lead vocal track can be processed with a voice change, while the backing tracks are changed to instruments instead of vocals.




Arming the plugin for audio capture

Before the target voice or instrument model AI processing can be applied, the input audio of the DAW project track must be captured:

  1. Click on Capture to Arm the plugin
  2. Select your DAW playback position and start playback
  3. Click on Stop to complete the capture
  4. Click on Remove to delete the last capture and start over


Once the capture is Stopped, the exact audio capture duration and region timestamps will be displayed.


Audio capture - VoiceAI.png


Captured audio - SoundID VoiceAI.png


Important to know when capturing audio

  • The audio capture mechanics depend on smooth continuous playback. Don't change the playback position while an audio capture is in progress.
  • The positioning of the AI replacement audio will depend on the captured audio region timestamps. Don't change the audio content position on the track after capturing.
  • The plugin supports a single audio capture per plugin instance only. If two fragments on the same track need to be captured and processed, there are two solutions:
    • Use two plugin instances on the same track
    • Capture a single (longer) clip with both fragments
  • If loop mode is enabled in the DAW, the capture might become corrupted when the playhead reaches the loop point and jumps back, and re-capturing might be needed.
  • There is no "Undo" functionality in the plugin. Any Removed captures that have already been processed with AI replacement audio can only be recovered using the raw audio files generated and stored in the cache folder.
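
As a rough illustration of the trade-off between the two solutions listed above (two plugin instances versus one longer capture), the sketch below compares how much audio each approach covers. The fragment positions are hypothetical and the helper function is not part of the plugin; it only does the timeline arithmetic.

```python
def enclosing_capture(fragments):
    """Return (start, end) in seconds of a single capture that covers every fragment."""
    if not fragments:
        raise ValueError("at least one fragment is required")
    for start, end in fragments:
        if end <= start:
            raise ValueError("each fragment must end after it starts")
    return min(s for s, _ in fragments), max(e for _, e in fragments)


# Hypothetical timeline positions (in seconds) of two vocal fragments on one track.
fragments = [(12.0, 20.0), (45.0, 58.0)]

start, end = enclosing_capture(fragments)
single_capture_seconds = end - start                      # one long capture, gap included
separate_seconds = sum(e - s for s, e in fragments)       # two instances, one capture each

print(f"Single capture: {single_capture_seconds:.1f} s")
print(f"Two separate captures: {separate_seconds:.1f} s in total")
```

A single longer capture also includes the gap between the fragments, so it covers more audio than two separate captures would.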




Preset selection

The SoundID VoiceAI plugin offers 23 voice models and 21 creative (instrument) models. A preset can be previewed by clicking the '▶' button or saved as a favorite by clicking the '♥' button.


Important: Logic Pro users, start playback or enable input monitoring to hear the preset previews in SoundID VoiceAI. This is a requirement of the Logic Pro architecture; otherwise, the presets won't be heard.


For the best results, we recommend selecting the preset that sounds as close as possible to the original vocals. If needed, the Transpose option can be used to make pitch adjustments. For more details on the preset selection, see this article: Choosing your target voice/instrument model.




Audio processing

  1. Click on Voices or Creative to select the target voice or instrument preset
  2. Click on '▶' ("play") to preview how the preset sounds at its best vocal range
  3. If your source pitch is similar to the preset preview, proceed to Start processing
  4. If the results sound too high or low, use Transpose to adjust the output pitch by semitones, and process again
  5. Use the AI voice button to Enable/Disable the transformation on the track
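
When choosing a starting Transpose value, it can help to know how far apart two pitches are. The sketch below uses standard equal-temperament arithmetic, where 12 steps make an octave (a doubling of frequency). It does not describe how the plugin implements Transpose, and the function name and example frequencies are illustrative assumptions.

```python
import math


def semitones_between(source_hz: float, target_hz: float) -> int:
    """Nearest whole number of equal-temperament steps from source_hz to target_hz.

    Positive values mean the target is higher than the source, negative values lower.
    """
    if source_hz <= 0 or target_hz <= 0:
        raise ValueError("frequencies must be positive")
    return round(12 * math.log2(target_hz / source_hz))


# Hypothetical example: a source around A3 (220 Hz) and a target around A4 (440 Hz).
print(semitones_between(220.0, 440.0))
```

In practice the value still needs to be tuned by ear, as described in the steps above.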


Before committing to process the entire track, it's a good idea to highlight and process a smaller section of the track first and ensure the results sound good. Processing takes approximately 2.5x the time of the captured audio duration. It is possible to Reprocess the results for free (limited to 10 times per hour) to minimize excessive artifacts.


1 minute of audio processing costs 600 tokens. The token amount needed for processing will always be displayed on the Start processing button. You can check your balance in the plugin, or in your Sonarworks Account. Learn more about tokens below.
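
The figures above (600 tokens per minute of audio, and processing time of roughly 2.5x the capture duration) can be turned into a quick estimate before starting. This is a minimal sketch that assumes the cost scales linearly with duration; how the plugin rounds the amount is not stated here, so the value shown on the Start processing button is the authoritative one. The function name is hypothetical.

```python
TOKENS_PER_MINUTE = 600        # from the article: 1 minute of audio costs 600 tokens
PROCESSING_TIME_FACTOR = 2.5   # from the article: processing takes about 2.5x the capture


def estimate_processing(capture_seconds: float) -> tuple[float, float]:
    """Return (estimated tokens, estimated processing seconds) for a capture.

    Assumes linear scaling with duration; the plugin's rounding is not modeled.
    """
    if capture_seconds < 0:
        raise ValueError("capture duration cannot be negative")
    tokens = capture_seconds / 60 * TOKENS_PER_MINUTE
    processing_seconds = capture_seconds * PROCESSING_TIME_FACTOR
    return tokens, processing_seconds


tokens, seconds = estimate_processing(90)  # a 1.5-minute capture
print(f"About {tokens:.0f} tokens and {seconds:.0f} s of processing")
```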


Note: Learn more about optimal preset selection and Transpose use below.




Input/output audio quality and properties

SoundID VoiceAI plugin can cater to a relatively wide range of recording quality for the input track. Regular phone microphone recordings in a random space with reverb are perfectly okay to use - after processing, the output results will have the properties of studio-quality audio captured with a great microphone.


This applies only to a certain degree. There are some limits to take into consideration:

  • Repeated AI processing on the same audio capture will not produce identical results. Due to the creative nature of the AI models in SoundID VoiceAI, results will be slightly different each time.
  • Excessive reverb on the input audio can lead to melodic artifacts in the output.
  • It is possible to Reprocess the results for free (limited to 10 times per hour) to minimize excessive artifacts.
  • When applied to non-English singing, some amount of English accent might bleed over into the processed voice depending on the preset applied.
  • The AI models can sometimes introduce artifacts, such as clipped "s" sounds, into the processing results. This is typically resolved by re-processing or by adjusting the Transpose setting to a value closer to the input track pitch.
  • The AI models work great for normal spoken voice tracks too. However, artifacts are possible when they are applied to speech in extreme emotional states, such as whispering or shouting.
  • The intonation of the input voice audio is a key aspect of the AI models. Raspiness in the voice (rough, raspy, strained, or breathy properties) can lead to artifacts in the processing results.

Additional documentation is available on the audio properties and the AI models used in SoundID VoiceAI (AI model training data, data protection of the processed audio, etc.). Learn more here: Input/output audio quality and properties.


Is it possible to restore the audio track after the processing?

Once the change is applied, actions such as undo will not revert the processing. If the original input track needs to be recovered, see this support article: Where can I find the raw audio files generated?

