Model Sizes

| Model | Speed | Accuracy |
| --- | --- | --- |
| tiny | Fastest | Excellent (default) |
| base | Fast | Excellent |
| small | Medium | Superior |
| medium | Slow | Outstanding |
| large | Slowest | Maximum |

Use tiny for nearly everything. Only upgrade for heavy accents, poor audio quality, or lyrics.

Example

Request:
{
  "audio_path": "/Users/you/Downloads/podcast.webm",
  "model_size": "tiny"
}
Response:
{
  "text": "Full transcription text...",
  "language": "en",
  "duration": 1076.12,
  "duration_formatted": "17:56",
  "segments": [
    {
      "start": 0.0,
      "end": 4.8,
      "timestamp": "0:00",
      "text": "Welcome back to the show. Today we're diving into..."
    },
    {
      "start": 4.8,
      "end": 9.2,
      "timestamp": "0:04",
      "text": "something I've been thinking about for a long time."
    }
  ],
  "segment_count": 430,
  "cached": false,
  "model_used": "tiny"
}
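
As an illustration of how a client might consume this response, here is a minimal Python sketch that prints each segment with its timestamp. The `format_timestamp` helper is hypothetical: it reproduces the `M:SS` strings in the sample (seconds truncated, not rounded), which is inferred from the values above and is not a documented rule. How durations of an hour or more are formatted is not shown on this page, so the helper stays in minutes.

```python
import json

# Abbreviated copy of the sample response shown above.
response_json = """
{
  "duration": 1076.12,
  "duration_formatted": "17:56",
  "segments": [
    {"start": 0.0, "end": 4.8, "timestamp": "0:00",
     "text": "Welcome back to the show. Today we're diving into..."},
    {"start": 4.8, "end": 9.2, "timestamp": "0:04",
     "text": "something I've been thinking about for a long time."}
  ]
}
"""


def format_timestamp(seconds: float) -> str:
    """Format seconds as M:SS, dropping fractions (inferred from the sample)."""
    total = int(seconds)
    return f"{total // 60}:{total % 60:02d}"


response = json.loads(response_json)

# One readable line per segment, using the timestamp the response provides.
lines = [f"[{seg['timestamp']}] {seg['text']}" for seg in response["segments"]]
for line in lines:
    print(line)
```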

Example: Transcribe a specific section

Use start and duration to transcribe only a portion of the file — no manual ffmpeg trimming needed.
{
  "audio_path": "/Users/you/Downloads/podcast.webm",
  "start": 600,
  "duration": 300
}
This transcribes 5 minutes starting at the 10-minute mark. Timestamps in the response are offset back to the original file position.
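
The arithmetic behind `start` and `duration` can be sketched in Python. Both helpers below (`parse_timestamp`, `section_request`) are hypothetical conveniences for building the request body and are not part of Augent; they only show that a 10:00 to 15:00 range maps to `start: 600` and `duration: 300`.

```python
def parse_timestamp(ts: str) -> int:
    """Convert "M:SS" or "H:MM:SS" into whole seconds."""
    seconds = 0
    for part in ts.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def section_request(audio_path: str, begin: str, end: str) -> dict:
    """Build a request body covering the range from begin to end."""
    start = parse_timestamp(begin)
    duration = parse_timestamp(end) - start
    if duration <= 0:
        raise ValueError("end must be later than begin")
    return {"audio_path": audio_path, "start": start, "duration": duration}


request = section_request("/Users/you/Downloads/podcast.webm", "10:00", "15:00")
print(request)
```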

Example: Export to file

{
  "audio_path": "/Users/you/Downloads/podcast.webm",
  "output": "~/Desktop/transcription.xlsx"
}
When output is provided, the transcription is written to disk and output_path is added to the response. Use .xlsx for styled spreadsheets with bold headers, or .csv for plain data.
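
A client could check the export extension before sending the request. The sketch below is a hypothetical Python helper; it assumes only the two formats named on this page are accepted, and how Augent itself responds to other extensions is not documented here.

```python
from pathlib import PurePosixPath

# The two export formats named on this page.
SUPPORTED_EXPORTS = {".csv", ".xlsx"}


def export_request(audio_path: str, output: str) -> dict:
    """Build a request body that writes the transcription to `output`."""
    suffix = PurePosixPath(output).suffix.lower()
    if suffix not in SUPPORTED_EXPORTS:
        raise ValueError(f"unsupported export format: {suffix or '(none)'}")
    return {"audio_path": audio_path, "output": output}


request = export_request(
    "/Users/you/Downloads/podcast.webm",
    "~/Desktop/transcription.xlsx",
)
print(request)
```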

Parameters

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| audio_path | Yes | | Path to the audio file |
| model_size | No | tiny | Whisper model size |
| start | No | 0 | Start transcription at this many seconds into the audio |
| duration | No | full file | Only transcribe this many seconds of audio |
| output | No | | File path to save transcription (.csv or .xlsx) |
| translated_text | No | | English translation to store alongside the original. Used after translating a non-English transcription. |

Multilingual

Augent transcribes audio in its original language (Chinese, French, Spanish, Japanese, and so on). Translation to English is handled by Claude, which produces far better results than any local translation model.

When the transcription language is not English, the response includes:
{
  "language": "zh",
  "translation_available": true,
  "translation_hint": "This audio is in Chinese. To store an English translation..."
}
Translation workflow:
  1. transcribe_audio returns the original-language transcription with translation_available: true
  2. Claude translates the text
  3. Claude calls transcribe_audio again with the same audio_path and translated_text containing the English translation
  4. A sibling (eng) markdown file is created in memory alongside the original
Both versions appear in the Web UI Memory Explorer and are searchable via search_memory.
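
The workflow above can be sketched as a small Python function. This is an outline under stated assumptions, not Augent's implementation: `call_tool` and `translate` are hypothetical callables standing in for whatever client invokes `transcribe_audio` and for Claude's translation step, and the stub return values exist only so the example runs on its own.

```python
from typing import Callable

ToolCaller = Callable[[str, dict], dict]


def transcribe_with_translation(
    audio_path: str,
    call_tool: ToolCaller,
    translate: Callable[[str], str],
) -> dict:
    """Run the transcribe, translate, store sequence described above."""
    # Step 1: transcribe in the original language.
    first = call_tool("transcribe_audio", {"audio_path": audio_path})
    if not first.get("translation_available"):
        return first  # no translation offered, nothing more to store

    # Step 2: translate the text (done by Claude in the documented workflow).
    english = translate(first["text"])

    # Step 3: call again with the same audio_path plus translated_text.
    return call_tool(
        "transcribe_audio",
        {"audio_path": audio_path, "translated_text": english},
    )


# Stand-ins so the sketch runs without a server; their return values are made up.
calls = []


def fake_call_tool(name: str, arguments: dict) -> dict:
    calls.append((name, arguments))
    if "translated_text" in arguments:
        return {"language": "zh"}
    return {"text": "你好", "language": "zh", "translation_available": True}


def fake_translate(text: str) -> str:
    return "Hello"


transcribe_with_translation(
    "/Users/you/Downloads/clip.webm", fake_call_tool, fake_translate
)
print(calls[1][1])
```

The second call reuses the same `audio_path`, which is what lets the translation be stored alongside the original transcription.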

Memory

  • Transcriptions are stored by file content hash + model size
  • Same file, same model = instant memory hit
  • Same file, different model = new transcription
  • Modified file = new transcription (hash changes)
  • A markdown file is also saved to ~/.augent/memory/transcriptions/
  • Translated transcriptions get a sibling (eng) file (e.g., My Video.md + My Video (eng).md)
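
The lookup rules above follow from keying on file content plus model size. The Python sketch below illustrates the idea with SHA-256; the hash algorithm and the key layout are assumptions chosen for illustration, since this page does not say which ones Augent uses.

```python
import hashlib


def memory_key(content: bytes, model_size: str = "tiny") -> str:
    """Derive a lookup key from file content and model size.

    SHA-256 and the "digest:model" layout are illustrative assumptions.
    """
    digest = hashlib.sha256(content).hexdigest()
    return f"{digest}:{model_size}"


audio = b"example audio bytes"

same_file_same_model = memory_key(audio, "tiny") == memory_key(audio, "tiny")
same_file_other_model = memory_key(audio, "tiny") == memory_key(audio, "small")
modified_file = memory_key(audio, "tiny") == memory_key(audio + b"!", "tiny")

print(same_file_same_model, same_file_other_model, modified_file)
```

Because the key depends on the bytes rather than the path, renaming or moving a file would still hit the same entry under this scheme, while any change to the content produces a new key.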