Two-tier auth for agents is the difference between a workflow that mostly works in demos and one that survives production: retries, partial reruns, token expiry mid-run, and multiple clients sharing the same tool server.
If you’re building Claude Skills (or any MCP/tool-based agent) that touch Google Drive, Notion, Slack, CRMs, or internal APIs, you’ll quickly hit a mismatch:
- The internet’s auth model assumes a single user clicking a consent screen.
- Agentic workflows need to act as an org and as a specific end user—sometimes in the same run.
This post lays out a practical pattern we use and recommend at nNode: two tiers of authentication with explicit auth context on every tool call—plus the operator-focused observability needed to debug failures without leaking secrets.
The real problem: agents don’t fit “one user + one token”
A typical multi-step workflow might:
- Read a folder of docs (broad, org-owned access)
- Draft content (no external auth)
- Ask a human for approval (user-scoped)
- Publish to a CMS (often admin-scoped)
- Notify in Slack (could be bot-scoped or user-scoped)
If you choose only app credentials, you’ll eventually ship something that:
- Overreaches (“the bot can publish anything, anywhere”)
- Can’t attribute actions to a user
- Becomes un-auditable during incidents
If you choose only user OAuth, you’ll end up with:
- Constant breakage (users leave, revoke consent, scopes drift)
- Workflows that stall because the “right user” isn’t online to re-consent
- Operators who can’t reproduce auth bugs
What “two-tier auth for agents” actually means
Two-tier auth for agents splits tool access into two intentionally different credential types:
Tier A: App / org credentials (the “platform identity”)
Use these when the workflow needs stable, org-level access.
Examples:
- Service accounts (e.g., workspace-level access)
- Bot tokens
- Workspace API keys
- A shared “integration user” managed by the org
Tier B: User OAuth (the “end-user identity”)
Use this when actions must be tied to a specific person’s consent and permissions.
Examples:
- Posting “as the user”
- Accessing a user’s private spaces
- Performing actions that require an approver’s authority
The key design point: every tool call is executed with an explicit AuthContext that says which tier is being used, for which tenant, and optionally for which user.
Decision matrix: app auth vs user OAuth (agent edition)
Use this as a starting point. The goal is least privilege plus operational stability.
| Tool action | Recommended tier | Why |
|---|---|---|
| Read org-owned content library (Drive/Docs/Notion) | App/org | Stable access; avoids “user left the company” failures |
| Draft in a shared workspace | App/org | Predictable permissions; easier to debug |
| Publish to production (CMS, GitHub, marketing automation) | App/org + approval gate | Publishing is high-risk; keep a controlled identity |
| Comment/approve (Docs, Notion, PR review) | User OAuth | Action should be attributable to an individual |
| Send “FYI” notifications | App/org (bot) | Low-risk; keeps things simple |
| Send messages that must come from the user | User OAuth | Consent + attribution |
| Access user private folders/spaces | User OAuth | The app shouldn’t see everything |
A useful heuristic:
- “Source of truth reads” → Tier A
- “Human intent writes” → Tier B
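The matrix and heuristic above can be encoded as a small lookup so that tier selection is a reviewed decision rather than something each workflow improvises. This is a minimal sketch: the action categories and the `recommendTier` function are hypothetical names invented for illustration, and a real policy would likely have more categories.

```typescript
// Hypothetical action categories; the names are illustrative only.
type ActionKind =
  | "source_of_truth_read"
  | "human_intent_write"
  | "high_risk_publish";

type AuthTier = "APP" | "USER";

interface TierDecision {
  tier: AuthTier;
  requiresApproval: boolean;
}

// Maps an action category to the tier suggested by the decision matrix.
function recommendTier(kind: ActionKind): TierDecision {
  switch (kind) {
    case "source_of_truth_read":
      // Org-owned reads: stable app/org credentials.
      return { tier: "APP", requiresApproval: false };
    case "human_intent_write":
      // Comments, approvals, "as the user" messages: user OAuth.
      return { tier: "USER", requiresApproval: false };
    case "high_risk_publish":
      // Publishing: controlled app identity behind an approval gate.
      return { tier: "APP", requiresApproval: true };
  }
}
```

Keeping the mapping in one function means a change in policy (say, moving a category behind an approval gate) is a one-line diff that can be reviewed and audited.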
Reference architecture (production-ready)
Here’s a clean way to implement it for agent tools, Claude Skills, or MCP servers.
Components
- Workflow runner: orchestrates steps, retries, and partial reruns
- Tool gateway (or MCP server): the only component allowed to call external APIs
- Credential store: encrypted at rest; per-tenant isolation
- Policy engine: evaluates whether a given AuthContext may perform a given action
- Observability: traces + audit logs with safe auth metadata
Minimal AuthContext spec
// Keep this small, explicit, and present on *every* tool call.
export type AuthTier = "APP" | "USER";

export interface AuthContext {
  tenant_id: string;       // required for multi-tenant isolation
  auth_tier: AuthTier;     // APP or USER
  integration_id: string;  // which connection (e.g., drive-prod, notion-team)

  // USER tier only
  user_id?: string;        // your internal user id
  oauth_subject?: string;  // provider subject (optional)

  // helpful for policy + debug
  scopes?: string[];       // granted scopes (never log raw tokens)
  purpose?: string;        // e.g. "publish_blog_post" or "sync_content"
}
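Because `user_id` is optional in the type, the compiler alone will not catch a USER-tier call that forgot to say which user it is for. A small runtime check at the gateway boundary can close that gap. The sketch below is one possible shape; `validateAuthContext` is a hypothetical helper, and rejecting `user_id` on APP-tier calls is an assumed design choice rather than something the spec above mandates.

```typescript
type AuthTier = "APP" | "USER";

interface AuthContext {
  tenant_id: string;
  auth_tier: AuthTier;
  integration_id: string;
  user_id?: string;
  oauth_subject?: string;
  scopes?: string[];
  purpose?: string;
}

// Returns a list of problems; an empty list means the context is usable.
function validateAuthContext(ctx: AuthContext): string[] {
  const problems: string[] = [];
  if (!ctx.tenant_id) problems.push("tenant_id is required");
  if (!ctx.integration_id) problems.push("integration_id is required");
  if (ctx.auth_tier === "USER" && !ctx.user_id) {
    problems.push("USER tier requires user_id");
  }
  // Assumption: an APP-tier call carrying a user_id is treated as a mistake,
  // so attribution in audit logs stays unambiguous.
  if (ctx.auth_tier === "APP" && ctx.user_id) {
    problems.push("APP tier must not carry user_id");
  }
  return problems;
}
```

Returning all problems at once, instead of throwing on the first, tends to make trace output more useful when a workflow author is fixing a malformed call.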
Tool-call envelope (what your agent actually sends)
{
  "tool": "google_drive.listFiles",
  "auth": {
    "tenant_id": "t_acme",
    "auth_tier": "APP",
    "integration_id": "drive_content_library",
    "scopes": ["drive.readonly"],
    "purpose": "fetch_source_docs"
  },
  "input": {
    "folderId": "1AbC...",
    "pageSize": 50
  },
  "idempotency_key": "run_01924-step_07"
}
The agent doesn’t decide “which token string to use.” It declares intent and context; the gateway resolves credentials.
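To make "the gateway resolves credentials" concrete, here is a minimal sketch of a lookup keyed by tenant, integration, tier, and (for USER tier) user. `credentialKey` and `InMemoryCredentialStore` are hypothetical names; a real store would be encrypted and durable, and the sketch assumes ids never contain `/`.

```typescript
type AuthTier = "APP" | "USER";

interface AuthContext {
  tenant_id: string;
  auth_tier: AuthTier;
  integration_id: string;
  user_id?: string;
}

// Builds the lookup key. tenant_id comes first so one tenant can never
// address another tenant's credentials. Assumes ids do not contain "/".
function credentialKey(ctx: AuthContext): string {
  const base = `${ctx.tenant_id}/${ctx.integration_id}/${ctx.auth_tier}`;
  if (ctx.auth_tier === "USER") {
    if (!ctx.user_id) throw new Error("USER tier requires user_id");
    return `${base}/${ctx.user_id}`;
  }
  return base;
}

// Illustrative in-memory store; not suitable for real secrets.
class InMemoryCredentialStore {
  private readonly secrets = new Map<string, string>();

  put(ctx: AuthContext, secret: string): void {
    this.secrets.set(credentialKey(ctx), secret);
  }

  resolve(ctx: AuthContext): string {
    const key = credentialKey(ctx);
    const secret = this.secrets.get(key);
    // The error names the key, never the secret.
    if (secret === undefined) throw new Error(`no credential for ${key}`);
    return secret;
  }
}
```

The agent only ever supplies the context; the token string is looked up inside the gateway and never travels back through the model.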
Implementation tips that save weeks
1) Treat refresh tokens like production secrets (because they are)
Refresh tokens can bypass many interactive controls (SSO/MFA prompts don’t happen during refresh). So:
- Encrypt tokens at rest
- Separate key management from the database
- Implement revocation and rotation runbooks
2) Prevent refresh-race outages (multi-run concurrency is real)
In agent platforms, it’s normal to have multiple workflows hitting the same integration concurrently. If two workers refresh the same token simultaneously and the provider rotates refresh tokens on use, one worker can overwrite the newest refresh token with a stale one and force re-auth.
Use a per-connection lock (or transactional compare-and-swap) around refresh.
// Assumes tokenStore, oauth, and withLock are provided by your platform.
async function getAccessToken(connectionId: string): Promise<string> {
  // Fast path: no lock needed while the cached token is still valid.
  const token = await tokenStore.get(connectionId);
  if (!token.isExpired()) return token.accessToken;

  return withLock(`oauth-refresh:${connectionId}`, async () => {
    // Re-read inside the lock: another worker may have refreshed already.
    const latest = await tokenStore.get(connectionId);
    if (!latest.isExpired()) return latest.accessToken;

    const refreshed = await oauth.refresh(latest.refreshToken);
    await tokenStore.save(connectionId, refreshed); // atomic update
    return refreshed.accessToken;
  });
}
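The snippet above calls a `withLock` helper without showing it. One possible in-process version is sketched below, serializing callers per key by chaining promises. This is an assumption about what such a helper could look like, and it only protects a single process; with multiple workers you would need a shared lock or the compare-and-swap approach instead.

```typescript
// Tail of the promise chain for each lock key.
const lockTails = new Map<string, Promise<void>>();

// Runs fn exclusively per key: later callers wait for earlier ones.
async function withLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const previous = lockTails.get(key) ?? Promise.resolve();

  // `release` resolves `current`, which lets the next waiter proceed.
  let release!: () => void;
  const current = new Promise<void>((resolve) => {
    release = resolve;
  });

  const tail = previous.then(() => current);
  lockTails.set(key, tail);

  await previous;
  try {
    return await fn();
  } finally {
    release();
    // Drop the entry if no later caller has queued behind us.
    if (lockTails.get(key) === tail) lockTails.delete(key);
  }
}
```

Because the release happens in `finally`, a refresh that throws still unblocks the queue, so one failed refresh does not wedge every later run on that connection.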
3) Make retries + partial reruns safe with idempotency
Agent workflows replay steps—especially if you add checkpointing (we’re building this deeply into nNode). If the tool call is “create record” or “publish page,” retries can duplicate side effects.
- Use idempotency keys for writes
- Prefer “upsert” APIs where available
- Store an external “result pointer” (e.g., created page ID) in your run state
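The "result pointer" idea can be sketched as a thin wrapper that records what a write produced under its idempotency key and returns the recorded pointer on replay. `runOnce` and `resultPointers` are hypothetical names; in a real runner the map would live in durable run state, not process memory.

```typescript
// Maps idempotency key -> result pointer (e.g., the created page ID).
// Assumption: a real runner persists this alongside the run's checkpoints.
const resultPointers = new Map<string, string>();

// Executes sideEffect at most once per key within this process.
async function runOnce(
  idempotencyKey: string,
  sideEffect: () => Promise<string>,
): Promise<string> {
  const existing = resultPointers.get(idempotencyKey);
  if (existing !== undefined) return existing; // replay: skip the side effect

  const pointer = await sideEffect();
  resultPointers.set(idempotencyKey, pointer);
  return pointer;
}
```

This sketch does not cover two concurrent calls with the same key, or a crash between the side effect and the save. Those gaps are why passing the idempotency key through to the provider, or using an upsert API, is still worth doing where the provider supports it.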
4) Codify “integration roles” instead of ad-hoc scopes
Don’t let every workflow request arbitrary scopes.
Define a few roles per integration:
- reader (read-only)
- writer (safe writes)
- publisher (high-risk writes)
Then map roles → scopes in one place (policy + UI), and use that mapping to validate tool calls.
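A minimal sketch of that single mapping follows. The scope strings and tool names are invented for a made-up `cms` integration and are not real provider scopes; the point is that validation reads from one table instead of each workflow asking for scopes directly.

```typescript
type Role = "reader" | "writer" | "publisher";

// Hypothetical scope strings for an illustrative "cms" integration.
const ROLE_SCOPES: Record<Role, readonly string[]> = {
  reader: ["cms.read"],
  writer: ["cms.read", "cms.write"],
  publisher: ["cms.read", "cms.write", "cms.publish"],
};

// Hypothetical tool names mapped to the minimum role each one needs.
const TOOL_ROLE: Record<string, Role> = {
  "cms.getPage": "reader",
  "cms.updateDraft": "writer",
  "cms.publishPage": "publisher",
};

// True only if the granted scopes cover everything the tool's role requires.
function isCallAllowed(tool: string, grantedScopes: readonly string[]): boolean {
  const role = TOOL_ROLE[tool];
  if (role === undefined) return false; // unknown tools are denied by default
  const granted = new Set(grantedScopes);
  return ROLE_SCOPES[role].every((scope) => granted.has(scope));
}
```

Denying unknown tools by default keeps a newly added tool from silently running with whatever scopes the connection happens to hold.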
Observability: debug auth failures without leaking secrets
If you don’t make auth visible in traces, operators end up guessing. If you log too much, you leak secrets. The compromise is structured metadata.
Log fields like:
- tenant_id
- integration_id
- auth_tier (APP vs USER)
- scopes_hash (hash the sorted scopes list)
- token_age_bucket (e.g., 0-5m, 5-30m, 30m+)
- provider_error_code
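Two of these fields need a little computation: the scopes hash and the token age bucket. The sketch below shows one way to derive them. The helper names are hypothetical, and FNV-1a is used only to keep the example dependency-free; it is not a cryptographic hash, so swap in SHA-256 if you need collision resistance.

```typescript
type TokenAgeBucket = "0-5m" | "5-30m" | "30m+";

// FNV-1a (32-bit) over UTF-16 code units. Non-cryptographic; used here
// only to fingerprint a scope set so changes are visible in traces.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Sorts a copy of the scopes so the hash does not depend on grant order.
function scopesHash(scopes: readonly string[]): string {
  return fnv1a(scopes.slice().sort().join(" "));
}

// Coarse buckets reveal "fresh vs stale" without logging exact timestamps.
function tokenAgeBucket(ageMs: number): TokenAgeBucket {
  const minutes = ageMs / 60000;
  if (minutes < 5) return "0-5m";
  if (minutes < 30) return "5-30m";
  return "30m+";
}
```

Comparing `scopes_hash` between a passing run and a failing run is a quick way to spot scope drift without ever printing the scope list or the token.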
A quick runbook for common failures
- 401 Unauthorized: token expired, refresh failed, wrong audience/client, clock skew
- 403 Forbidden: token is valid but missing scope, resource ACL denies access, user removed from workspace
- 429 Too Many Requests: rate limits—apply backoff and consider batching (especially important for agents)
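The runbook above can also be expressed as a small classifier the tool gateway consults before deciding whether to retry. This is a sketch under assumptions: the action names are hypothetical, a 401 is given exactly one refresh-and-retry before asking for re-auth, and the backoff is plain capped exponential with no jitter so the example stays deterministic.

```typescript
type AuthFailureAction =
  | { kind: "refresh_and_retry" }
  | { kind: "needs_reauth" }
  | { kind: "fail_permission" }
  | { kind: "backoff"; delayMs: number }
  | { kind: "unhandled" };

// Capped exponential backoff. Production code would usually add jitter.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// attempt is zero-based: 0 means this is the first failure for the call.
function classifyAuthFailure(status: number, attempt: number): AuthFailureAction {
  switch (status) {
    case 401:
      // Likely an expired token: refresh once, then escalate to re-auth.
      return attempt === 0 ? { kind: "refresh_and_retry" } : { kind: "needs_reauth" };
    case 403:
      // Token is valid but lacks scope or ACL access; retrying will not help.
      return { kind: "fail_permission" };
    case 429:
      return { kind: "backoff", delayMs: backoffDelayMs(attempt) };
    default:
      return { kind: "unhandled" };
  }
}
```

Treating 403 as non-retryable matters for agents in particular, since a loop that keeps retrying a permission error burns rate limit and hides the real fix (a scope or ACL change).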
In nNode-style execution traces, this is where you want “one glance” clarity: which tier, which integration, which step, what changed since last run.
Security checklist (fast, actionable)
- Enforce tenant isolation at the credential-store key level (tenant_id is not optional)
- Default to least privilege (reader/writer/publisher roles)
- Add human approval gates before high-risk actions (publishing, mass updates, deletions)
- Maintain audit logs that attribute writes to APP identity vs USER identity
- Support revocation (per user, per integration, per tenant)
How nNode approaches this (and why it matters)
nNode is built for agentic workflows that have to be operable: you shouldn’t lose a day because a token expired mid-run or because a trace hides the auth context.
Two-tier auth pairs naturally with the things production teams care about:
- Observability (clear traces for “what identity did this step use?”)
- Reliability patterns like retries, idempotency, and checkpointing
- A path to user-managed credentials (“bring your own keys”) without turning your platform into an un-debuggable maze
If you’re building Claude Skills or tool servers and want a workflow runner that’s designed around these realities—multi-tenant auth, debugging, and safe integration execution—take a look at nNode.
If this pattern matches what you’re wrestling with, try implementing the AuthContext contract above in your next tool, and see how much easier debugging gets. When you’re ready to run these workflows with production-grade traces and reliability, nNode can help: visit nnode.ai.