Powered by pyannote-audio, the most widely used speaker diarization toolkit in production. Pre-trained models are bundled with Augent and downloaded automatically during installation: no API keys, no tokens, no accounts required. The pipeline automatically detects the number of speakers and handles overlapping speech.

Models used:
| Model | Role |
|---|---|
| speaker-diarization-3.1 | Main pipeline: detects speakers and assigns segments |
| segmentation-3.0 | Underlying segmentation model used by the pipeline |
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| audio_path | Yes | — | Path to the audio/video file |
| model_size | No | tiny | Whisper model size for transcription |
| num_speakers | No | auto-detect | Number of speakers (omit to auto-detect) |
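As a rough illustration of how these parameters fit together, a request payload might look like the sketch below. The file path and values are made up, and the exact call shape is an assumption; only the parameter names and defaults come from the table above.

```python
# Hypothetical request payload for the diarization tool.
# Only audio_path is required; the other keys are optional.
request = {
    "audio_path": "/recordings/interview.mp3",  # path is an example
    "model_size": "base",    # optional; defaults to "tiny"
    "num_speakers": 2,       # optional; omit to auto-detect
}

# To auto-detect the speaker count, simply leave num_speakers out:
auto_request = {"audio_path": "/recordings/interview.mp3"}
```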
How it works
- Transcribe the audio with faster-whisper (from memory if already transcribed)
- Diarize with pyannote to detect speaker boundaries and count
- Merge transcription segments with speaker turns by timestamp overlap
- Cache the result: the same file with the same speaker count returns instantly on the next call.
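The merge step above can be sketched as follows: each transcription segment is assigned the speaker whose diarization turn overlaps it the most. This is a minimal illustration with assumed tuple shapes; faster-whisper and pyannote return richer objects than plain `(start, end, ...)` tuples.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def merge_by_overlap(segments, turns):
    """Assign each transcription segment its best-matching speaker.

    segments: list of (start, end, text) from transcription
    turns:    list of (start, end, speaker) from diarization
    Returns a list of (start, end, speaker, text).
    """
    merged = []
    for s_start, s_end, text in segments:
        # Pick the speaker turn with the largest timestamp overlap.
        best = max(turns,
                   key=lambda t: overlap(s_start, s_end, t[0], t[1]),
                   default=None)
        if best and overlap(s_start, s_end, best[0], best[1]) > 0:
            speaker = best[2]
        else:
            speaker = "UNKNOWN"  # no turn overlaps this segment
        merged.append((s_start, s_end, speaker, text))
    return merged
```

For example, a segment spanning 0.0–2.5 s overlapping a `SPEAKER_00` turn for 2.4 s and a `SPEAKER_01` turn for 0 s is assigned to `SPEAKER_00`.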
Combine with other tools
Use the diarized output to drive deeper analysis:
- `search_audio` or `deep_search` to find what a specific speaker said about a topic
- `separate_audio` before diarization for cleaner results on noisy recordings
- `chapters` to see which speakers dominate which sections
- `batch_search` to find a speaker's remarks across multiple recordings

