Uses Meta’s HTDemucs model to split audio into individual stems. The vocal stem feeds directly into any other Augent tool for dramatically cleaner results on noisy recordings.
Requires: pip install augent[separator] (included in augent[all])
Example
Request:
{
  "audio_path": "/Users/you/Downloads/podcast-with-intro-music.mp3",
  "vocals_only": true
}
Response:
{
  "stems": {
    "vocals": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/vocals.wav",
    "no_vocals": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/no_vocals.wav"
  },
  "vocals_path": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/vocals.wav",
  "model": "htdemucs",
  "source_file": "/Users/you/Downloads/podcast-with-intro-music.mp3",
  "cached": false,
  "output_dir": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals",
  "hint": "Use the vocals_path as the audio_path in transcribe_audio, search_audio, deep_search, or any other tool for clean results."
}
Then transcribe the clean vocals:
{
  "audio_path": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/vocals.wav",
  "model_size": "tiny"
}
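The hand-off from the separation response to the transcription request can be scripted. The following is a minimal sketch, not part of Augent itself: `build_transcribe_request` is a hypothetical helper name, and it only relies on the `vocals_path` field and the `audio_path` / `model_size` arguments shown in the examples above.

```python
import json


def build_transcribe_request(separation_response, model_size="tiny"):
    """Hypothetical helper: turn a separation response into transcribe_audio arguments."""
    vocals_path = separation_response.get("vocals_path")
    if not vocals_path:
        raise ValueError("separation response has no vocals_path")
    # Field names mirror the request shown in this page's example.
    return {"audio_path": vocals_path, "model_size": model_size}


# Trimmed copy of the example response above.
separation_response = {
    "vocals_path": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/vocals.wav",
    "model": "htdemucs",
    "cached": False,
}

transcribe_request = build_transcribe_request(separation_response)
print(json.dumps(transcribe_request, indent=2))
```

Reading `vocals_path` rather than `stems["vocals"]` keeps the helper independent of whether the separation ran in two-stem or four-stem mode, since both example responses include that field.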
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| audio_path | Yes | | Path to the audio/video file |
| vocals_only | No | true | If true, separates into vocals + no_vocals (faster). If false, separates into all 4 stems: vocals, drums, bass, other. |
| model | No | htdemucs | Demucs model. htdemucs is fast with great quality. htdemucs_ft is fine-tuned for best quality but slower. |
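As an illustration of how these parameters and defaults fit together, here is a small sketch that assembles the request arguments. `build_separation_request` and `DOCUMENTED_MODELS` are hypothetical names, and rejecting any model other than the two listed in the table is an assumption of this sketch; the tool itself may accept other values.

```python
import json

# Models named in the parameter table. Whether other values are accepted
# is not stated, so treating these as the only valid ones is an assumption.
DOCUMENTED_MODELS = ("htdemucs", "htdemucs_ft")


def build_separation_request(audio_path, vocals_only=True, model="htdemucs"):
    """Hypothetical helper: assemble the arguments using the documented defaults."""
    if not audio_path:
        raise ValueError("audio_path is required")
    if model not in DOCUMENTED_MODELS:
        raise ValueError(f"unexpected model {model!r}; documented: {DOCUMENTED_MODELS}")
    return {"audio_path": audio_path, "vocals_only": bool(vocals_only), "model": model}


# Full 4-stem separation with the fine-tuned model.
request = build_separation_request(
    "/Users/you/Downloads/song.mp3", vocals_only=False, model="htdemucs_ft"
)
print(json.dumps(request, indent=2))
```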
Full 4-Stem Separation
Set vocals_only to false to get all four stems:
{
  "audio_path": "/Users/you/Downloads/song.mp3",
  "vocals_only": false
}
Response:
{
  "stems": {
    "vocals": "/path/to/vocals.wav",
    "drums": "/path/to/drums.wav",
    "bass": "/path/to/bass.wav",
    "other": "/path/to/other.wav"
  },
  "vocals_path": "/path/to/vocals.wav",
  "model": "htdemucs",
  "cached": false
}
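When consuming the response in a script, it can help to confirm that the expected stems are present before using them. A minimal sketch, assuming the `stems` keys are exactly those shown in the two example responses; `select_stems` is a hypothetical helper, not an Augent function.

```python
# Stem names taken from the two example responses on this page.
TWO_STEM = {"vocals", "no_vocals"}
FOUR_STEM = {"vocals", "drums", "bass", "other"}


def select_stems(response, vocals_only):
    """Hypothetical helper: return the expected stem paths or fail loudly."""
    expected = TWO_STEM if vocals_only else FOUR_STEM
    stems = response.get("stems", {})
    missing = expected - set(stems)
    if missing:
        raise KeyError(f"response is missing stems: {sorted(missing)}")
    return {name: stems[name] for name in sorted(expected)}


# Copy of the 4-stem example response above.
four_stem_response = {
    "stems": {
        "vocals": "/path/to/vocals.wav",
        "drums": "/path/to/drums.wav",
        "bass": "/path/to/bass.wav",
        "other": "/path/to/other.wav",
    },
    "vocals_path": "/path/to/vocals.wav",
    "model": "htdemucs",
    "cached": False,
}

for name, path in select_stems(four_stem_response, vocals_only=False).items():
    print(f"{name}: {path}")
```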
Notes
Results are cached by file hash. The first run separates the audio; later runs on the same file return the cached stems without repeating the separation.
Use vocals_only: true (the default) when your goal is transcription. It is faster than full 4-stem separation and produces the same vocal quality.
Separated stems are stored at ~/.augent/separated/. Each file gets its own directory named by hash, so the same file is never processed twice.
The vocals_path from the response can be used as audio_path in any Augent tool: transcribe_audio, search_audio, deep_search, chapters, identify_speakers, batch_search, and more.
For best quality on difficult audio (heavy overlapping voices and music), use model: "htdemucs_ft". It is slower but produces cleaner separation.
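To make the caching idea concrete, here is an illustrative sketch of how a content hash can map a file to a stable output directory. It is not Augent's implementation: the hash function (SHA-256), the 8-character truncation, and the `all` suffix for 4-stem mode are assumptions chosen to resemble the `a1b2c3d4_htdemucs_vocals` directory name in the example, and `cache_dir_for` is a hypothetical name.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_dir_for(audio_path, model="htdemucs", vocals_only=True,
                  root=Path.home() / ".augent" / "separated"):
    """Hypothetical: derive a per-file cache directory from the file's content hash."""
    digest = hashlib.sha256()  # assumption: the real hash function is not documented here
    with open(audio_path, "rb") as f:
        # Read in 1 MiB chunks so large audio files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    mode = "vocals" if vocals_only else "all"  # "all" is a made-up suffix for illustration
    return Path(root) / f"{digest.hexdigest()[:8]}_{model}_{mode}"


# Demo on a throwaway file: the same bytes always map to the same directory.
with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
    tmp.write(b"not real audio, just bytes to hash")
demo_dir = cache_dir_for(tmp.name)
print(demo_dir.name)
print("already separated" if demo_dir.exists() else "would run separation")
```

Because the key in this sketch comes from the file's bytes rather than its path, a renamed or moved copy of the same recording would resolve to the same directory.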