Two-Tier Auth for Agents: App Credentials vs User OAuth (A Production Pattern for Multi-Tenant Tooling)

Two-tier auth for agents is the difference between a workflow that mostly works in demos and one that survives production: retries, partial reruns, token expiry mid-run, and multiple clients sharing the same tool server.

If you’re building Claude Skills (or any MCP/tool-based agent) that touches Google Drive, Notion, Slack, CRMs, or internal APIs, you’ll quickly hit a mismatch:

The internet’s auth model assumes a single user clicking a consent screen.
Agentic workflows need to act as an org and as a specific end user—sometimes in the same run.

This post lays out a practical pattern we use and recommend at nNode: two tiers of authentication with explicit auth context on every tool call—plus the operator-focused observability needed to debug failures without leaking secrets.

The real problem: agents don’t fit “one user + one token”

A typical multi-step workflow might:

Read a folder of docs (broad, org-owned access)
Draft content (no external auth)
Ask a human for approval (user-scoped)
Publish to a CMS (often admin-scoped)
Notify in Slack (could be bot-scoped or user-scoped)

If you choose only app credentials, you’ll eventually ship something that:

Overreaches (“the bot can publish anything, anywhere”)
Can’t attribute actions to a user
Becomes un-auditable during incidents

If you choose only user OAuth, you’ll end up with:

Constant breakage (users leave, revoke consent, scopes drift)
Workflows that stall because the “right user” isn’t online to re-consent
Operators who can’t reproduce auth bugs

What “two-tier auth for agents” actually means

Two-tier auth for agents splits tool access into two intentionally different credential types:

Tier A: App / org credentials (the “platform identity”)

Use these when the workflow needs stable, org-level access.

Examples:

Service accounts (e.g., workspace-level access)
Bot tokens
Workspace API keys
A shared “integration user” managed by the org

Tier B: User OAuth (the “end-user identity”)

Use this when actions must be tied to a specific person’s consent and permissions.

Examples:

Posting “as the user”
Accessing a user’s private spaces
Performing actions that require an approver’s authority

The key design point: every tool call is executed with an explicit AuthContext that says which tier is being used, for which tenant, and optionally for which user.

Decision matrix: app auth vs user OAuth (agent edition)

Use this as a starting point. The goal is least privilege plus operational stability.

Tool action	Recommended tier	Why
Read org-owned content library (Drive/Docs/Notion)	App/org	Stable access; avoids “user left the company” failures
Draft in a shared workspace	App/org	Predictable permissions; easier to debug
Publish to production (CMS, GitHub, marketing automation)	App/org + approval gate	Publishing is high-risk; keep a controlled identity
Comment/approve (Docs, Notion, PR review)	User OAuth	Action should be attributable to an individual
Send “FYI” notifications	App/org (bot)	Low-risk; keeps things simple
Send messages that must come from the user	User OAuth	Consent + attribution
Access user private folders/spaces	User OAuth	The app shouldn’t see everything

A useful heuristic:

“Source of truth reads” → Tier A
“Human intent writes” → Tier B
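One way to keep the matrix from drifting is to encode it as a lookup that is reviewed in one place. The sketch below is illustrative only: the action names and the `recommendTier` helper are hypothetical and simply mirror the rows of the table above.

```typescript
type AuthTier = "APP" | "USER";

// Hypothetical action categories, one per row of the decision matrix.
type ActionKind =
  | "read_org_content"
  | "draft_shared"
  | "publish_production"
  | "comment_or_approve"
  | "notify_fyi"
  | "message_as_user"
  | "read_user_private";

interface TierRecommendation {
  tier: AuthTier;
  requiresApproval: boolean; // true when the matrix adds an approval gate
}

const TIER_MATRIX: Record<ActionKind, TierRecommendation> = {
  read_org_content: { tier: "APP", requiresApproval: false },
  draft_shared: { tier: "APP", requiresApproval: false },
  publish_production: { tier: "APP", requiresApproval: true },
  comment_or_approve: { tier: "USER", requiresApproval: false },
  notify_fyi: { tier: "APP", requiresApproval: false },
  message_as_user: { tier: "USER", requiresApproval: false },
  read_user_private: { tier: "USER", requiresApproval: false },
};

function recommendTier(action: ActionKind): TierRecommendation {
  return TIER_MATRIX[action];
}
```

Because `ActionKind` is a closed union, adding a new kind of action fails to compile until someone decides which tier it belongs to.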

Reference architecture (production-ready)

Here’s a clean way to implement it for agent tools, Claude Skills, or MCP servers.

Components

Workflow runner: orchestrates steps, retries, and partial reruns
Tool gateway (or MCP server): the only component allowed to call external APIs
Credential store: encrypted at rest; per-tenant isolation
Policy engine: evaluates whether a given AuthContext may perform a given action
Observability: traces + audit logs with safe auth metadata

Minimal `AuthContext` spec

// Keep this small, explicit, and present on *every* tool call.
export type AuthTier = "APP" | "USER";

export interface AuthContext {
  tenant_id: string;          // required for multi-tenant isolation
  auth_tier: AuthTier;        // APP or USER
  integration_id: string;     // which connection (e.g., drive-prod, notion-team)

  // USER tier only
  user_id?: string;           // your internal user id
  oauth_subject?: string;     // provider subject (optional)

  // helpful for policy + debug
  scopes?: string[];          // granted scopes (never log raw tokens)
  purpose?: string;           // e.g. "publish_blog_post" or "sync_content"
}
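
A type alone cannot express that `user_id` is required for the USER tier, so a runtime check at the gateway boundary is useful. This is a minimal sketch: `validateAuthContext` is a hypothetical helper, and the rule that APP-tier calls must not carry user identity is an assumption drawn from the "USER tier only" comment in the spec.

```typescript
type AuthTier = "APP" | "USER";

interface AuthContext {
  tenant_id: string;
  auth_tier: AuthTier;
  integration_id: string;
  user_id?: string;
  oauth_subject?: string;
  scopes?: string[];
  purpose?: string;
}

// Returns a list of problems; an empty list means the context is usable.
function validateAuthContext(ctx: AuthContext): string[] {
  const problems: string[] = [];
  if (!ctx.tenant_id) problems.push("tenant_id is required");
  if (!ctx.integration_id) problems.push("integration_id is required");
  if (ctx.auth_tier === "USER" && !ctx.user_id) {
    problems.push("USER tier requires user_id");
  }
  // Assumption: an APP-tier call carrying user identity is a bug, not a feature.
  if (ctx.auth_tier === "APP" && (ctx.user_id || ctx.oauth_subject)) {
    problems.push("APP tier must not carry user identity");
  }
  return problems;
}
```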

Tool-call envelope (what your agent actually sends)

{
  "tool": "google_drive.listFiles",
  "auth": {
    "tenant_id": "t_acme",
    "auth_tier": "APP",
    "integration_id": "drive_content_library",
    "scopes": ["drive.readonly"],
    "purpose": "fetch_source_docs"
  },
  "input": {
    "folderId": "1AbC...",
    "pageSize": 50
  },
  "idempotency_key": "run_01924-step_07"
}

The agent doesn’t decide “which token string to use.” It declares intent and context; the gateway resolves credentials.
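
To make "the gateway resolves credentials" concrete, here is a minimal sketch of a lookup keyed by tenant, integration, tier, and (for USER tier) user. The `CredentialStore` class is hypothetical and keeps secrets in memory purely for illustration; a real store would be encrypted and persistent.

```typescript
type AuthTier = "APP" | "USER";

interface AuthContext {
  tenant_id: string;
  auth_tier: AuthTier;
  integration_id: string;
  user_id?: string;
}

// Hypothetical in-memory store; illustrates key structure only.
class CredentialStore {
  private readonly secrets = new Map<string, string>();

  // tenant_id is always part of the key, so a lookup cannot cross tenants.
  private static keyFor(ctx: AuthContext): string {
    if (ctx.auth_tier === "USER" && !ctx.user_id) {
      throw new Error("USER tier requires user_id");
    }
    const subject = ctx.auth_tier === "USER" ? ctx.user_id : null;
    // JSON encoding avoids ambiguity if an id contains a separator character.
    return JSON.stringify([ctx.tenant_id, ctx.integration_id, ctx.auth_tier, subject]);
  }

  put(ctx: AuthContext, secret: string): void {
    this.secrets.set(CredentialStore.keyFor(ctx), secret);
  }

  resolve(ctx: AuthContext): string {
    const secret = this.secrets.get(CredentialStore.keyFor(ctx));
    if (secret === undefined) {
      throw new Error(
        `no credential for ${ctx.tenant_id}/${ctx.integration_id} (${ctx.auth_tier})`
      );
    }
    return secret;
  }
}
```

The agent only ever supplies the context; the secret never appears in the tool-call envelope or in the model's prompt.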

Implementation tips that save weeks

1) Treat refresh tokens like production secrets (because they are)

Refresh tokens can bypass many interactive controls (SSO/MFA prompts don’t happen during refresh). So:

Encrypt tokens at rest
Separate key management from the database
Implement revocation and rotation runbooks
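
As an illustration of the first two bullets, the sketch below encrypts a token with AES-256-GCM using Node's built-in `crypto` module. The `sealToken` and `openToken` names are hypothetical, and the sketch assumes the 32-byte key is fetched from a separate key-management service rather than stored next to the ciphertext.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface SealedToken {
  iv: string;         // base64, unique per encryption
  tag: string;        // base64 GCM authentication tag
  ciphertext: string; // base64
}

// key must be 32 bytes; assumed to come from a KMS, not from the database.
function sealToken(plaintext: string, key: Buffer): SealedToken {
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    ciphertext: ciphertext.toString("base64"),
  };
}

// Throws if the key is wrong or the stored value was tampered with.
function openToken(sealed: SealedToken, key: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(sealed.iv, "base64"));
  decipher.setAuthTag(Buffer.from(sealed.tag, "base64"));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(sealed.ciphertext, "base64")),
    decipher.final(),
  ]);
  return plaintext.toString("utf8");
}
```

Storing only the `SealedToken` fields in the database means a database dump alone does not expose usable refresh tokens.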

2) Prevent refresh-race outages (multi-run concurrency is real)

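In agent platforms, it’s normal to have multiple workflows hitting the same integration concurrently. If two workers refresh the same token simultaneously and the provider rotates refresh tokens on use, one worker can overwrite the newer refresh token with an already-invalidated one, and the connection is forced back through re-auth.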

Use a per-connection lock (or transactional compare-and-swap) around refresh.

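// Assumes tokenStore, oauth, and withLock are provided by your platform.
async function getAccessToken(connectionId: string): Promise<string> {
  // Fast path: no lock needed while the cached token is still valid.
  const token = await tokenStore.get(connectionId);
  if (!token.isExpired()) return token.accessToken;

  // Slow path: serialize refreshes per connection.
  return withLock(`oauth-refresh:${connectionId}`, async () => {
    // Re-read inside the lock: another worker may have refreshed already.
    const latest = await tokenStore.get(connectionId);
    if (!latest.isExpired()) return latest.accessToken;

    const refreshed = await oauth.refresh(latest.refreshToken);
    await tokenStore.save(connectionId, refreshed); // atomic update
    return refreshed.accessToken;
  });
}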
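
The snippet above leaves `withLock` undefined. For reference, here is one possible single-process implementation that queues callers per key on a promise chain. It is a sketch under a strong assumption: all refreshes happen in one Node process. With multiple workers you would need a distributed lock or a compare-and-swap in the token store instead.

```typescript
// Tail of the queue for each lock key. Tails never reject.
const lockTails = new Map<string, Promise<unknown>>();

async function withLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const previous = lockTails.get(key) ?? Promise.resolve();
  // Start fn only after the previous holder has settled.
  const run = previous.then(() => fn());
  // Swallow errors on the stored tail so one failure does not block the queue.
  const tail = run.catch(() => undefined);
  lockTails.set(key, tail);
  try {
    return await run;
  } finally {
    // Drop the entry if nobody queued behind this call.
    if (lockTails.get(key) === tail) lockTails.delete(key);
  }
}
```

Combined with the re-check inside the lock, concurrent callers for the same connection trigger a single refresh and the rest reuse its result.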

3) Make retries + partial reruns safe with idempotency

Agent workflows replay steps—especially if you add checkpointing (we’re building this deeply into nNode). If the tool call is “create record” or “publish page,” retries can duplicate side effects.

Use idempotency keys for writes
Prefer “upsert” APIs where available
Store an external “result pointer” (e.g., created page ID) in your run state
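
The three bullets above can be combined into a small wrapper: look up the idempotency key, reuse the stored result pointer if one exists, and only otherwise perform the write. The `IdempotentExecutor` class is a hypothetical sketch that uses in-memory maps; in production the results map would live in durable run state.

```typescript
class IdempotentExecutor {
  // idempotency key -> result pointer (e.g., created page ID)
  private readonly results = new Map<string, string>();
  // writes currently in progress, so concurrent retries share one attempt
  private readonly inFlight = new Map<string, Promise<string>>();

  async run(idempotencyKey: string, write: () => Promise<string>): Promise<string> {
    const done = this.results.get(idempotencyKey);
    if (done !== undefined) return done; // replayed step: skip the side effect

    const pending = this.inFlight.get(idempotencyKey);
    if (pending) return pending;

    const attempt = Promise.resolve()
      .then(write)
      .then((pointer) => {
        this.results.set(idempotencyKey, pointer);
        return pointer;
      });
    this.inFlight.set(idempotencyKey, attempt);

    // Failures are not cached, so a later retry performs the write again.
    const clear = () => { this.inFlight.delete(idempotencyKey); };
    attempt.then(clear, clear);
    return attempt;
  }
}
```

A key such as `run_01924-step_07` from the envelope example ties the result to one step of one run, so a partial rerun of that step returns the original page ID instead of publishing twice.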

4) Codify “integration roles” instead of ad-hoc scopes

Don’t let every workflow request arbitrary scopes.

Define a few roles per integration:

reader (read-only)
writer (safe writes)
publisher (high-risk writes)

Then map roles → scopes in one place (policy + UI), and use that mapping to validate tool calls.
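
A minimal sketch of that single mapping might look like the following. The scope strings are placeholders rather than any real provider's scopes, and `missingScopes` and `canAct` are hypothetical helper names.

```typescript
type IntegrationRole = "reader" | "writer" | "publisher";

// Placeholder scope names; substitute the provider's real scopes per integration.
const ROLE_SCOPES: Record<IntegrationRole, readonly string[]> = {
  reader: ["content.read"],
  writer: ["content.read", "content.write"],
  publisher: ["content.read", "content.write", "content.publish"],
};

// Scopes the role needs that the connection has not been granted.
function missingScopes(role: IntegrationRole, granted: readonly string[]): string[] {
  const have = new Set(granted);
  return ROLE_SCOPES[role].filter((scope) => !have.has(scope));
}

function canAct(role: IntegrationRole, granted: readonly string[]): boolean {
  return missingScopes(role, granted).length === 0;
}
```

Returning the missing scopes, not just a boolean, gives operators a precise message to show when a user needs to re-consent.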

Observability: debug auth failures without leaking secrets

If you don’t make auth visible in traces, operators end up guessing. If you log too much, you leak secrets. The compromise is structured metadata.

Log fields like:

tenant_id
integration_id
auth_tier (APP vs USER)
scopes_hash (hash the sorted scopes list)
token_age_bucket (e.g., 0-5m, 5-30m, 30m+)
provider_error_code
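
Two of these fields are derived rather than stored, so here is a sketch of how they could be computed. The function names are hypothetical, and truncating the hash to 16 hex characters is an arbitrary choice for readability in traces.

```typescript
import { createHash } from "node:crypto";

// Order-independent fingerprint of the granted scopes.
function scopesHash(scopes: readonly string[]): string {
  const canonical = [...scopes].sort().join(" ");
  return createHash("sha256").update(canonical).digest("hex").slice(0, 16);
}

type TokenAgeBucket = "0-5m" | "5-30m" | "30m+";

// Coarse age bucket; exact issue times are deliberately not logged.
function tokenAgeBucket(issuedAtMs: number, nowMs: number): TokenAgeBucket {
  const ageMinutes = (nowMs - issuedAtMs) / 60000;
  if (ageMinutes < 5) return "0-5m";
  if (ageMinutes < 30) return "5-30m";
  return "30m+";
}
```

Comparing `scopes_hash` between a passing run and a failing run shows at a glance whether the granted scopes changed, without writing the scope list or any token material to the log.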

A quick runbook for common failures

401 Unauthorized: token expired, refresh failed, wrong audience/client, clock skew
403 Forbidden: token is valid but missing scope, resource ACL denies access, user removed from workspace
429 Too Many Requests: rate limits—apply backoff and consider batching (especially important for agents)
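
The runbook maps naturally onto a small classifier plus a backoff helper. This is a sketch with hypothetical names; the step labels and the default delays are assumptions, and real providers may also send a `Retry-After` header that should take precedence.

```typescript
type NextStep =
  | "refresh_and_retry"   // 401: try one refresh, then re-auth if it still fails
  | "stop_and_fix_access" // 403: retrying will not help; scopes or ACLs must change
  | "backoff_and_retry"   // 429: slow down
  | "surface_error";      // anything else: hand to the operator

function nextStepFor(status: number): NextStep {
  switch (status) {
    case 401:
      return "refresh_and_retry";
    case 403:
      return "stop_and_fix_access";
    case 429:
      return "backoff_and_retry";
    default:
      return "surface_error";
  }
}

// Exponential backoff with full jitter: a random delay in [0, ceiling).
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(Math.random() * ceiling);
}
```

Jitter matters for agents in particular, because many parallel runs that hit the same rate limit would otherwise retry in lockstep.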

In nNode-style execution traces, this is where you want “one glance” clarity: which tier, which integration, which step, what changed since last run.

Security checklist (fast, actionable)

Enforce tenant isolation at the credential-store key level (tenant_id is not optional)
Default to least privilege (reader/writer/publisher roles)
Add human approval gates before high-risk actions (publishing, mass updates, deletions)
Maintain audit logs that attribute writes to APP identity vs USER identity
Support revocation (per user, per integration, per tenant)
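
Two items on this list, approval gates and attributed audit logs, can be enforced at the same point in the gateway. The sketch below is one hypothetical shape for that check; the `HIGH_RISK_ACTIONS` set and the `actor` format are assumptions made for illustration.

```typescript
type AuthTier = "APP" | "USER";

interface AuthContext {
  tenant_id: string;
  auth_tier: AuthTier;
  integration_id: string;
  user_id?: string;
}

interface AuditRecord {
  tenant_id: string;
  actor: string;        // "app:<integration_id>" or "user:<user_id>"
  action: string;
  approved_by?: string; // internal user id of the approver, if any
  at: string;           // ISO 8601 timestamp
}

// Hypothetical list; align it with your own definition of high-risk writes.
const HIGH_RISK_ACTIONS = new Set(["publish", "mass_update", "delete"]);

function buildAuditRecord(
  ctx: AuthContext,
  action: string,
  approvedBy?: string,
  now: Date = new Date()
): AuditRecord {
  if (HIGH_RISK_ACTIONS.has(action) && !approvedBy) {
    throw new Error(`action "${action}" requires human approval`);
  }
  if (ctx.auth_tier === "USER" && !ctx.user_id) {
    throw new Error("USER tier requires user_id");
  }
  const actor =
    ctx.auth_tier === "USER" ? `user:${ctx.user_id}` : `app:${ctx.integration_id}`;
  return {
    tenant_id: ctx.tenant_id,
    actor,
    action,
    ...(approvedBy ? { approved_by: approvedBy } : {}),
    at: now.toISOString(),
  };
}
```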

How nNode approaches this (and why it matters)

nNode is built for agentic workflows that have to be operable: you shouldn’t lose a day because a token expired mid-run or because a trace hides the auth context.

Two-tier auth pairs naturally with the things production teams care about:

Observability (clear traces for “what identity did this step use?”)
Reliability patterns like retries, idempotency, and checkpointing
A path to user-managed credentials (“bring your own keys”) without turning your platform into an un-debuggable maze

If you’re building Claude Skills or tool servers and want a workflow runner that’s designed around these realities—multi-tenant auth, debugging, and safe integration execution—take a look at nNode.

If this pattern matches what you’re wrestling with, try implementing the AuthContext contract above in your next tool, and see how much easier debugging gets. When you’re ready to run these workflows with production-grade traces and reliability, nNode can help: visit nnode.ai.