# Practices for Embedding AI Agents in Software
# Sync Facade over Async Core
🎯 The Hook
Choosing between "always sync" and "always async" is a false dilemma. What if your API could return the result immediately when the work is fast, and degrade gracefully to an async job when it is slow?
🔥 The Problem
Agent processing latency follows a bimodal distribution. Cache hits and lightweight tasks return in milliseconds, but complex reasoning or tool chains stretch to tens of seconds. Always-sync leads to timeouts and connection exhaustion. Always-async forces polling even for sub-second responses.
💡 The Pattern
The Sync Facade always processes internally via an async pipeline. The outward-facing API waits up to a configurable threshold: if the job finishes in time, it returns a 200 with the result; if not, it returns a 202 with a job ID for async retrieval. Clients hit a single unified endpoint without worrying about latency bimodality. The threshold is tuned adaptively based on observed P95/P99 latency trends, not hardcoded.
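A minimal sketch of the wait-then-escalate behavior described above, using Python's `asyncio`. The names `handle_request`, `run_agent`, `sync_wait_s`, and the in-memory `JOBS` registry are illustrative assumptions, not part of the pattern's definition; a production system would keep jobs in a durable queue or store rather than a dict.

```python
import asyncio
import uuid
from typing import Any, Callable, Coroutine

# Hypothetical in-memory job registry (assumption for this sketch only).
JOBS: dict[str, asyncio.Task] = {}


async def handle_request(
    run_agent: Callable[[], Coroutine[Any, Any, Any]],
    sync_wait_s: float,
) -> tuple[int, dict]:
    """Always start the job on the async core, then wait up to sync_wait_s."""
    job_id = uuid.uuid4().hex
    task = asyncio.create_task(run_agent())
    JOBS[job_id] = task

    # asyncio.wait does not cancel the task on timeout,
    # so a slow job keeps running after the facade has answered.
    done, _pending = await asyncio.wait({task}, timeout=sync_wait_s)

    if task in done:
        JOBS.pop(job_id, None)
        # task.result() re-raises if the job failed; error mapping is omitted here.
        return 200, {"result": task.result()}
    return 202, {"job_id": job_id}
```

The same endpoint serves both modes: a fast job comes back as `(200, {"result": ...})`, while a slow one comes back as `(202, {"job_id": ...})` and can be looked up later through the registry.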
✅ When to Use
Use when:
- Latency distribution is bimodal (mix of fast and slow completions)
- Existing clients expect a synchronous API contract
- Latency requirements vary per request (chat UI vs. batch)
Don't use when:
- Processing always finishes in a few seconds (use Sync Edge)
- Processing always exceeds 30s (use Durable Async from the start)
⚠️ Pitfalls
- Do not hardcode the sync-wait threshold. Observe P95/P99 latency trends via tracing and adjust the threshold adaptively.
- Client code often fails to handle the 202 case. Explicitly define the 202 response schema in your OpenAPI spec to prevent gaps in generated SDKs.
- Distinguish worker crashes from simple timeouts during the sync wait. On worker failure, escalate to 202 immediately rather than waiting out the rest of the threshold.
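The crash-versus-timeout distinction from the last pitfall could look roughly like this. `WorkerCrashed`, `await_job`, the `requeue` callback, and the `reason` field are hypothetical names chosen for illustration. The sketch assumes the queue layer surfaces a dead worker as an exception on the awaited task, and that a crashed job is re-queued in the background.

```python
import asyncio
from typing import Callable


class WorkerCrashed(Exception):
    """Hypothetical error raised by the queue layer when a worker dies mid-job."""


async def await_job(
    job_id: str,
    task: asyncio.Task,
    sync_wait_s: float,
    requeue: Callable[[str], None],
) -> tuple[int, dict]:
    # Returns as soon as the task finishes (successfully or not),
    # or once the threshold elapses, whichever comes first.
    done, _pending = await asyncio.wait({task}, timeout=sync_wait_s)

    if not done:
        # Plain slow job: still running, hand back a job ID.
        return 202, {"job_id": job_id, "reason": "timeout"}

    # Assumption: a cancelled task is treated the same as a crashed worker.
    crashed = task.cancelled() or isinstance(task.exception(), WorkerCrashed)
    if crashed:
        requeue(job_id)  # retry in the background; the caller is not kept waiting
        return 202, {"job_id": job_id, "reason": "worker_crashed"}

    # Any other failure is re-raised by result(); mapping it to an HTTP error is omitted.
    return 200, {"result": task.result()}
```

Because `asyncio.wait` wakes up when the task fails, a crash that happens after a few milliseconds produces the 202 right away instead of after the full threshold.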
🔧 Implementation Approach
- All requests are internally processed via an async queue. The facade layer awaits the result up to a configurable sync-wait threshold, returning 200 with the result on success or 202 with a job_id on timeout
- Tune the sync-wait threshold adaptively by observing P95/P99 latency trends via tracing. Starting points are roughly 5-10s for web APIs and 30s for internal RPCs
- Use SSE or WebSocket for progress notifications after async escalation, with polling (3-5s interval) as a fallback for clients that cannot maintain persistent connections
- Keep the facade layer as a thin adapter with no business logic. The async core reuses the Durable Async Agent checkpoint and resume machinery directly
- Explicitly define the 202 response schema in the OpenAPI spec so that generated client SDKs correctly handle job ID retrieval and result polling
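One possible way to tune the threshold adaptively, sketched under stated assumptions: latencies are fed in from tracing, the threshold follows the P95 of a rolling window, and it is clamped to a floor and ceiling. The class name `AdaptiveThreshold`, its parameters, the window size, and the minimum sample count are all illustrative choices, not values given by the pattern.

```python
from collections import deque
from statistics import quantiles


class AdaptiveThreshold:
    """Rolling-window P95 of observed latencies, clamped to [floor_s, ceiling_s]."""

    def __init__(
        self,
        initial_s: float = 5.0,
        floor_s: float = 1.0,
        ceiling_s: float = 10.0,
        window: int = 500,
        min_samples: int = 20,
    ) -> None:
        self._initial_s = initial_s
        self._floor_s = floor_s
        self._ceiling_s = ceiling_s
        self._min_samples = min_samples
        self._samples: deque[float] = deque(maxlen=window)

    def record(self, latency_s: float) -> None:
        """Feed one observed job latency, e.g. taken from a trace span."""
        self._samples.append(latency_s)

    def current(self) -> float:
        """Sync-wait threshold to use for the next request."""
        if len(self._samples) < self._min_samples:
            return self._initial_s  # not enough data yet, use the starting point
        # n=20 yields 19 cut points; the last one is the 95th percentile.
        p95 = quantiles(self._samples, n=20)[-1]
        return min(max(p95, self._floor_s), self._ceiling_s)
```

The clamp matters for a bimodal workload: when slow jobs dominate the window, the raw P95 drifts toward tens of seconds, and the ceiling keeps the facade from holding connections that long.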
#AIAgents #SoftwareArchitecture