Agathe Agent: French Speech Therapy Reports on Your Machine with Ollama

Speech therapists in France and Switzerland often live inside Word: session notes, initial assessments, renewal reports, school coordination. The writing is high stakes (tone, structure, traceability) and high friction (hours reformatting the same clinical skeleton). Cloud assistants are a hard sell when files contain identifiable minors and when clinics simply do not want patient text on a vendor server.

Agathe Agent is my latest side project, still work in progress: a local-only assistant that turns therapist notes (and, in renewal mode, prior reports) into structured French reports aligned with orthophonie conventions. It runs against Ollama over HTTP, uses a small or mid-size open model you choose via OLLAMA_MODEL, and ships with a CLI plus a Streamlit UI for quick trials. The design goal is not “replace the therapist.” It is to draft a consistent first version the professional edits, signs, and owns.

This article walks through the architecture as it exists today: DOCX to text, skill-driven prompts, JSON-shaped generation, Pydantic validation, DOCX export, and an optional self-check pass.

Prerequisites

Python 3.11+ and uv (uv sync from the project root).
Ollama installed, ollama serve reachable at http://localhost:11434.
A pulled model, for example ollama pull mistral-nemo:12b, and a root .env with OLLAMA_MODEL=... (see .env.example in the repo).
Patient layout: under each patient folder, notes/ with clinical notes. For renouvellement, prior material under rapport_initial/ or reports/ after extraction.
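
The folder layout in the last prerequisite can be checked before any model call. Here is a minimal sketch, assuming the notes have already been extracted to .txt. The function name collect_sources and the mode strings are hypothetical and are not taken from the Agathe codebase.

```python
from pathlib import Path


def collect_sources(patient_dir: str, mode: str) -> dict[str, list[Path]]:
    """Return the text files a run would read, grouped by role.

    Hypothetical helper: folder names follow the layout described above;
    the function name and the mode strings are assumptions.
    """
    root = Path(patient_dir)
    notes = sorted((root / "notes").glob("*.txt"))
    if not notes:
        raise FileNotFoundError(f"no notes found under {root / 'notes'}")

    prior: list[Path] = []
    if mode == "renouvellement":
        # Renewal reports need earlier material to contrast against.
        for folder in ("rapport_initial", "reports"):
            prior.extend(sorted((root / folder).glob("*.txt")))
        if not prior:
            raise FileNotFoundError("renewal mode needs at least one prior report")

    return {"notes": notes, "prior": prior}
```

Failing here, before the prompt is built, keeps a missing folder from turning into a vague model error later.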

The problem: confidentiality, repetition, and two different report modes

Bilan initial and bilan de renouvellement are not the same document. An initial report anchors history and first observations. A renewal report must contrast earlier functioning with current session evidence, without duplicating boilerplate that law and practice already regulate. Agathe encodes that split in prompt logic plus a Markdown skill file (skills/redaction-rapport-orthophonique.md) with front matter and long-form rules: factual, neutral, written in French, no invented names, and a minimum depth per section when sources exist.
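
To make the "front matter plus rules" shape concrete, here is a minimal sketch of splitting such a skill file. It assumes the front matter is a leading block delimited by --- lines containing flat key: value pairs. The real file may use richer YAML, and split_front_matter is a hypothetical name.

```python
def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a Markdown skill file into (front matter, body).

    Assumption: flat 'key: value' pairs between two '---' lines. Nested
    YAML would need a real parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        return {}, text  # unterminated front matter: treat the whole file as body

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```

The body is what would be injected into the system prompt. The metadata can carry things like a name or version without polluting the rules text.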

The README states the product principle plainly: local-only, no cloud API, no telemetry.

Architecture: not “infinite agent,” a staged pipeline

The heart is run_pipeline in harness/pipeline.py. It is intentionally a single structured generation followed by file writes, not an open-ended tool loop. Stages surface to the UI through on_phase:

lecture → rédaction → validation → sauvegarde → vérification (optional self-check)
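
The phase sequence above can be sketched as a fixed loop that notifies a callback before each stage. This is only an illustration of the pattern: run_stages is a hypothetical name, and the real run_pipeline does actual work in each stage.

```python
from typing import Callable, Optional

PHASES = ("lecture", "rédaction", "validation", "sauvegarde", "vérification")


def run_stages(
    on_phase: Optional[Callable[[str], None]] = None,
    run_self_check: bool = False,
) -> list[str]:
    """Walk the fixed phase list once, notifying the UI before each stage."""
    done: list[str] = []
    for phase in PHASES:
        if phase == "vérification" and not run_self_check:
            continue  # the self-check stage is optional
        if on_phase is not None:
            on_phase(phase)
        # The real work for the stage (read, generate, validate, ...) goes here.
        done.append(phase)
    return done
```

Because the order is fixed and finite, a UI stepper can be driven from on_phase alone, with no need to inspect the pipeline's internals.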

DOCX path: extract_patient_docx converts Word files to plain text. The pipeline itself only reads notes/*.txt, plus prior reports as .txt under the expected directories. The Streamlit app saves uploaded .docx files into a temporary tree that mirrors that layout.
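
For readers curious what the DOCX-to-text step involves, here is a minimal sketch using only the standard library. It is not the project's extract_patient_docx, which may well rely on a dedicated library. The names docx_to_text and extract_notes are hypothetical, and the sketch flattens tables and ignores headers and footers.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

# WordprocessingML namespace used by word/document.xml inside a .docx archive.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_path: str) -> str:
    """Extract non-empty paragraph text from a .docx file."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph is split into runs; join their text nodes back together.
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        if text.strip():
            paragraphs.append(text)
    return "\n".join(paragraphs)


def extract_notes(docx_path: str, patient_dir: str) -> Path:
    """Write the extracted text to notes/<name>.txt under the patient folder."""
    target = Path(patient_dir) / "notes" / (Path(docx_path).stem + ".txt")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(docx_to_text(docx_path), encoding="utf-8")
    return target
```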

The LLM client: httpx to Ollama, low temperature, bounded context

LLMClient in harness/llm.py centralizes host, temperature, timeout, and num_ctx / num_predict defaults (overridable via environment). Temperature is set low for factual clinical prose.

# harness/llm.py
payload: dict[str, Any] = {
    "model": self.model,
    "messages": messages,
    "stream": False,
    "options": {
        "temperature": self.temperature,
        "num_ctx": LLM_NUM_CTX,
        "num_predict": LLM_NUM_PREDICT,
    },
}
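
The environment overrides mentioned above could be loaded along these lines. This is a sketch under assumptions: OLLAMA_MODEL, OLLAMA_NUM_CTX and OLLAMA_NUM_PREDICT are the variable names this article uses, while OLLAMA_TEMPERATURE, the LLMSettings class and every default value are placeholders of mine, not the project's.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:
    model: str
    temperature: float
    num_ctx: int
    num_predict: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> LLMSettings:
    """Read generation settings from the environment with fallbacks.

    Default values and OLLAMA_TEMPERATURE are illustrative assumptions.
    """
    env = os.environ if env is None else env
    return LLMSettings(
        model=env["OLLAMA_MODEL"],  # required: fail loudly rather than guess a model
        temperature=float(env.get("OLLAMA_TEMPERATURE", "0.2")),
        num_ctx=int(env.get("OLLAMA_NUM_CTX", "8192")),
        num_predict=int(env.get("OLLAMA_NUM_PREDICT", "2048")),
    )
```

Passing a plain dict instead of os.environ makes the function easy to exercise in tests without touching the real environment.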

ping() checks reachability and that the configured model appears in ollama list, with a small tolerance on tag suffixes. That is the kind of fail-fast UX you want before burning a long prompt build.
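
A check like this can be sketched in a few lines. The version below uses urllib rather than httpx so it stays dependency-free. It queries Ollama's /api/tags endpoint, which lists installed models. The matching rule (ignore a missing or :latest tag) is my guess at what "a small tolerance on tag suffixes" means, and model_is_installed is a hypothetical helper.

```python
import json
import urllib.request


def model_is_installed(configured: str, installed: list[str]) -> bool:
    """True if the configured model matches an installed name, tolerating a
    missing or ':latest' tag. The tolerance rule is an assumption."""

    def normalize(name: str) -> str:
        return name[: -len(":latest")] if name.endswith(":latest") else name

    wanted = normalize(configured)
    for name in installed:
        candidate = normalize(name)
        if candidate == wanted:
            return True
        # An untagged request matches any tag of the same base model.
        if ":" not in wanted and candidate.split(":", 1)[0] == wanted:
            return True
    return False


def ping(host: str, model: str, timeout: float = 3.0) -> bool:
    """Return True only if the server answers and the model is listed."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            names = [m["name"] for m in json.load(resp).get("models", [])]
    except (OSError, ValueError):
        return False  # unreachable server or unreadable response
    return model_is_installed(model, names)
```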

The pipeline: JSON contract, then Pydantic, then documents

After build_messages(...), the code calls Ollama with format="json", then validates with SpeechTherapyReport.model_validate_json(raw). If validation fails, the user gets a clear error instead of a silently broken DOCX.

# harness/pipeline.py
raw, tokens_in, tokens_out = _stream_or_call(llm, messages, on_token)
report = SpeechTherapyReport.model_validate_json(raw)
result = write_report(report=report, output_dir=output_dir, title=_TITLES.get(mode))
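
To illustrate the validation step, here is a minimal sketch with Pydantic v2. ReportSketch is a hypothetical stand-in for SpeechTherapyReport: the real schema is not shown in this article, so the field names below are assumptions.

```python
from pydantic import BaseModel, Field, ValidationError


class Section(BaseModel):
    titre: str = Field(min_length=1)
    contenu: str = Field(min_length=1)


class ReportSketch(BaseModel):
    """Hypothetical schema; the real SpeechTherapyReport fields may differ."""

    mode: str
    sections: list[Section] = Field(min_length=1)


def parse_report(raw: str) -> ReportSketch:
    """Validate raw model output, turning schema failures into one clear error."""
    try:
        return ReportSketch.model_validate_json(raw)
    except ValidationError as exc:
        # Collect "path: message" pairs so the user sees what is wrong.
        problems = "; ".join(
            f"{'.'.join(str(part) for part in err['loc'])}: {err['msg']}"
            for err in exc.errors()
        )
        raise ValueError(f"model output failed validation: {problems}") from exc
```

Truncated JSON (for example when num_predict is too small) fails at the same point as a missing field, so nothing is written to disk in either case.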

When run_self_check is true, a second chat call sends the rendered Markdown report back with build_self_check_prompt, again requesting JSON, and stores parsed feedback for the CLI to print (conformité, verdict, problem list).
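
The parsed feedback might be handled roughly as follows. The JSON keys conforme, verdict and problemes are assumed names for the three pieces of information listed above. The actual keys depend on build_self_check_prompt, which is not shown here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SelfCheck:
    conforme: bool
    verdict: str
    problemes: list[str] = field(default_factory=list)


def parse_self_check(raw: str) -> SelfCheck:
    """Parse the self-check reply; key names are illustrative assumptions."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("self-check reply must be a JSON object")
    return SelfCheck(
        conforme=bool(data.get("conforme", False)),
        verdict=str(data.get("verdict", "")).strip(),
        problemes=[str(p) for p in (data.get("problemes") or [])],
    )


def format_self_check(check: SelfCheck) -> str:
    """Render the feedback as plain lines suitable for a CLI."""
    head = f"Conformité: {'oui' if check.conforme else 'non'} | Verdict: {check.verdict}"
    return "\n".join([head, *(f"  - {p}" for p in check.problemes)])
```

Defaulting conforme to False when the key is missing means an incomplete reply is reported as non-compliant, which is the safer reading for a clinical document.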

The Streamlit path (app.py) wires the same run_pipeline with on_phase driving the stepper UI and on_token appending to a rolling buffer for a monospace preview.

# app.py
result = run_pipeline(
    llm=llm,
    skill_path=SKILL_PATH,
    mode=report_mode,
    patient_dir=str(patient_dir),
    output_dir=str(out_dir),
    run_self_check=False,
    on_phase=lambda phase: progress.update({"phase": phase}),
    on_token=lambda chunk: progress.update({"stream": progress["stream"] + chunk}),
)
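
An on_token callback like the one above can be fed from a streamed chat response. With "stream": true, Ollama's /api/chat returns newline-delimited JSON objects, each carrying a fragment in message.content. The final object has done set to true and includes prompt_eval_count and eval_count. The sketch below shows one way to consume such lines. consume_stream is a hypothetical name, and the project's _stream_or_call may work differently.

```python
import json
from typing import Callable, Iterable, Optional


def consume_stream(
    lines: Iterable[str],
    on_token: Optional[Callable[[str], None]] = None,
) -> tuple[str, int, int]:
    """Accumulate streamed chat chunks; return (text, tokens_in, tokens_out)."""
    parts: list[str] = []
    tokens_in = tokens_out = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        piece = chunk.get("message", {}).get("content", "")
        if piece:
            parts.append(piece)
            if on_token is not None:
                on_token(piece)
        if chunk.get("done"):
            # Token counts only appear on the final chunk.
            tokens_in = chunk.get("prompt_eval_count", 0)
            tokens_out = chunk.get("eval_count", 0)
            break
    return "".join(parts), tokens_in, tokens_out
```

In a real client, the lines argument would be the line iterator of a streaming HTTP response. Here it is any iterable of strings, which keeps the function easy to test.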

app.py also tries to start ollama serve if the tags endpoint is down, which lowers friction for therapists who are not used to managing daemons.
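
A start-if-down helper could look something like this. It is a sketch, not the code in app.py: ensure_ollama, its parameters and the retry counts are all hypothetical. The health check and the launcher are passed in as callables so the logic can be tested without a real server.

```python
import subprocess
import time
from typing import Callable


def _start_ollama() -> None:
    """Launch 'ollama serve' in the background, discarding its output."""
    subprocess.Popen(
        ["ollama", "serve"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def ensure_ollama(
    is_up: Callable[[], bool],
    start: Callable[[], None] = _start_ollama,
    attempts: int = 10,
    delay: float = 0.5,
) -> bool:
    """Return True once the server responds, starting it at most once."""
    if is_up():
        return True
    try:
        start()
    except OSError:
        return False  # e.g. the ollama binary is not on PATH
    for _ in range(attempts):
        time.sleep(delay)
        if is_up():
            return True
    return False
```

In practice is_up would wrap a short request to the tags endpoint.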

Try it out

CLI:

uv run python -m harness.main --patient patient_example/Robert --mode initial

Expect spinner phases, then paths to rapport.md and rapport.docx in a timestamped folder under outputs/.

Streamlit: run uv run streamlit run app.py, upload one or more notes .docx files, and pick Bilan initial or Bilan de renouvellement. For a renewal, also upload the prior report. Then generate and download the DOCX.

“Lightweight” local models in practice

The README suggests Mistral Nemo 12B as an example. Comments in llm.py mention tuning OLLAMA_NUM_CTX and OLLAMA_NUM_PREDICT when prompts are truncated or the JSON output is cut off. For a work in progress, that is the real engineering surface: swap the model string, watch token counts on the success screen in Streamlit, and decide whether a smaller quantized model is “good enough” for your average note bundle on a clinic laptop. Agathe keeps the interface narrow, so pointing the pipeline at a different local server later (an OpenAI-compatible one, for example) should mean adapting the client rather than rewriting the pipeline.

Conclusion

Agathe Agent is where I am spending time on applied local AI: French clinical prose, strict document structure, on-device inference, and a codebase small enough to reason about during evenings and weekends. The next milestones are mostly product hygiene: keep README and CLI flags in sync, harden evaluation sets with synthetic patients, and document which models pass validation reliably on which hardware.

If you are a therapist or clinic reading this: treat any output as a draft until you have internal governance you trust.