OAuth scopes in long-running workflows: a production debugging playbook (Google + Notion)

OAuth scopes in long-running workflows break differently than in normal apps

If you’re building long-running workflows (AI agents that run for minutes, retry, and touch multiple SaaS tools), OAuth will fail in ways that feel “random”: it worked in the demo, then a background run hits a 401, or a Notion database suddenly becomes “not found.”

This isn’t because OAuth is “bad.” It’s because agent workflows behave like distributed systems: long-lived state, multiple tool calls, concurrency, and partial failure. If you treat auth like a one-time login step, you’ll ship something that disconnects at the worst possible moment.

This tutorial is a production debugging playbook you can copy into your team wiki. It’s written for teams building hosted, reusable automations (the “white-box workflow” style) where reliability is the product.

Why OAuth fails more in agent workflows than in request/response apps

Traditional web apps:

The user is present.
The session is short.
You can redirect to re-auth quickly.

Agent/workflow apps:

Runs can be long-running (minutes to hours).
Steps can run in the background (no user present).
You have retries (sometimes with exponential backoff).
You have multi-tool chains (Drive → Docs → Notion → Slack).
You may have multiple workflows running concurrently for the same user.

That combination changes your auth requirements:

You need offline access (refresh tokens) for anything that must continue without the user in the browser.
You need durable token storage (so restarts don’t force re-consent).
You need run-context binding (so the right token is used for the right user/workspace/workflow).

The 6 OAuth failure modes you’ll actually see (and how they present)

1) Missing or incorrect scopes

Symptoms

Google: API calls fail with 403 / “insufficient permissions”.
Notion: endpoints fail even though the integration looks “connected.”

Root causes

You requested a scope set that doesn’t include the operation you’re doing.
You changed features over time, but never forced a re-consent.
You’re calling a different API than you think (e.g., Drive vs Docs).

Fix

Maintain a scope-to-action map (per connector, per capability).
Add “scope diffing” to your logs (requested vs granted).
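A minimal sketch of a scope-to-action map with scope diffing, in Python. The action names and the `scope_diff` helper are hypothetical; the scope URLs are Google's published Drive and Docs scopes, and the map is deliberately incomplete. It assumes you keep the space-delimited `scope` string from the token response.

```python
GOOGLE = "https://www.googleapis.com/auth/"

# Hypothetical map: each action lists alternative scopes; any one is sufficient.
REQUIRED_SCOPES = {
    "drive.list_files": {GOOGLE + "drive.metadata.readonly", GOOGLE + "drive"},
    "drive.create_file": {GOOGLE + "drive.file", GOOGLE + "drive"},
    "docs.update_doc": {GOOGLE + "documents"},
}


def parse_scopes(scope_field):
    """Turn a space-delimited scope string into a set."""
    return set(scope_field.split())


def scope_diff(action, requested, granted):
    """Build a log-friendly record of requested vs granted scopes for one action."""
    acceptable = REQUIRED_SCOPES[action]
    return {
        "action": action,
        "satisfied": bool(acceptable & granted),
        "requested_not_granted": sorted(requested - granted),
        "acceptable_scopes": sorted(acceptable),
    }
```

Logging the returned dict next to the failing API call makes it obvious whether a 403 is a scope gap or something else.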

2) Refresh token not issued

Symptoms

Works for ~1 hour, then fails when access token expires.
You never see a refresh token stored.

Root causes (Google)

You didn’t request offline access (access_type=offline).
You did request it, but the refresh token is only returned on the first consent for that client-user combination, so you lost it and can’t “get it back” without re-consent.

Fix

Ensure you request offline access.
Store the refresh token in long-lived storage immediately.
If you lost it, force a re-consent with prompt=consent.
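One way to avoid losing the refresh token is to never let a later token response overwrite it with nothing. The sketch below assumes a credential stored as a plain dict and a token response shaped like a standard OAuth 2.0 token response; `merge_token_response` is a hypothetical helper, not part of any library.

```python
def merge_token_response(stored, response, now):
    """Return an updated credential without ever dropping a stored refresh token.

    stored:   dict with access_token, access_token_expires_at, refresh_token
    response: dict parsed from the token endpoint's JSON body
    now:      current time as epoch seconds
    """
    updated = dict(stored)  # copy so the caller's record is not mutated
    updated["access_token"] = response["access_token"]
    updated["access_token_expires_at"] = now + response.get("expires_in", 0)
    # Refresh responses usually omit refresh_token; keep the stored one then.
    if response.get("refresh_token"):
        updated["refresh_token"] = response["refresh_token"]
    return updated
```

Persist the returned record in the same transaction that handles the OAuth callback, so a crash between "token received" and "token stored" cannot drop it.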

3) Refresh token invalidation / rotation / “invalid_grant”

Symptoms

Token refresh fails with invalid_grant.
A subset of users are affected; repro is inconsistent.

Common root causes

The user revoked access.
You exceeded Google’s refresh token issuance limits (older refresh tokens stop working).
Clock skew on your server breaks token exchange (more common than you’d like).

Fix

Treat invalid_grant as a “re-auth required” state, not a transient retry.
Sync server time (NTP).
Don’t request new refresh tokens repeatedly; reuse the stored one.
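The first fix can be sketched as a small classifier that decides what a failed refresh means. The three outcome labels are assumptions chosen for this example; `invalid_grant` is the standard OAuth 2.0 token-endpoint error code.

```python
def classify_refresh_failure(http_status, error):
    """Map a failed token refresh to a next action.

    http_status: HTTP status code from the token endpoint
    error:       the "error" field of the JSON body, or None if absent
    """
    if error == "invalid_grant":
        # Revoked, expired, or superseded refresh token: retrying will not help.
        return "reauth_required"
    if http_status == 429 or http_status >= 500:
        # Rate limiting or provider-side trouble: safe to retry with backoff.
        return "retry"
    # Anything else (invalid_client, invalid_request, ...) is likely a config bug.
    return "config_error"
```

Only the "retry" outcome should ever reach your backoff logic; the other two should stop the run.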

4) Session cookie vs OAuth token confusion

Symptoms

UI says “connected,” but API calls fail.
Reconnecting in the UI “fixes it” temporarily.

Root causes

Your UI session is alive, but the underlying OAuth credential is missing/expired/revoked.

Fix

In the product, distinguish:
- “Signed into the app” (session)
- “Connector authorized” (OAuth)
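A small sketch of keeping those two states separate when rendering connection status. The credential dict shape and the `connection_state` helper are assumptions for illustration; the point is that the connector flag is derived from the stored credential, never from the session.

```python
def connection_state(session_active, cred):
    """Report app sign-in and connector authorization as independent flags.

    session_active: whether the user's app session is valid
    cred:           stored credential dict, or None if none exists
    """
    authorized = (
        cred is not None
        and cred.get("status") == "active"
        and bool(cred.get("refresh_token"))
    )
    return {"signed_in": session_active, "connector_authorized": authorized}
```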

5) Workspace/admin policy restrictions

Symptoms

Google Workspace users fail; personal Gmail works (or vice versa).
Failures happen after IT changes policies.

Fix

Detect policy errors and surface a clear message (“Your admin must allow this app/scope”).
Log the user’s account type (consumer vs Workspace) and domain.
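For the logging half of this fix, one option is to read the account type from Google ID token claims. This sketch assumes you request the `openid` scope and have already verified the ID token; Google documents the `hd` claim as present only for users who belong to a Workspace or Cloud organization. The `account_info` helper is hypothetical.

```python
def account_info(id_token_claims):
    """Derive loggable account details from verified Google ID token claims."""
    domain = id_token_claims.get("hd")  # hosted domain; absent for consumer accounts
    return {
        "sub": id_token_claims.get("sub"),
        "account_type": "workspace" if domain else "consumer",
        "domain": domain,
    }
```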

6) Wrong principal / wrong tenant (credential mapping bugs)

Symptoms

Files show up in the wrong Drive.
Notion writes to the wrong workspace.
“It works for me” but fails for a teammate.

Root causes

Tokens aren’t strongly bound to: user_id + workspace/domain + connector + workflow/run.

Fix

Implement strict credential isolation and run-context binding (see pattern below).
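As a rough illustration of run-context binding, the sketch below refuses to hand a credential to a run whose context does not match the credential's own binding. `CredentialKey`, `CredentialMismatch`, and `load_for_run` are hypothetical names, and the vault is modeled as a plain dict keyed by credential ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CredentialKey:
    """The identity a credential is bound to."""
    user_id: str
    provider: str
    workspace_id: str


class CredentialMismatch(Exception):
    """Raised when a run asks for a credential bound to a different context."""


def load_for_run(vault, run_key, credential_id):
    """Load a credential by ID and verify it belongs to this run's context."""
    cred = vault[credential_id]
    if cred["key"] != run_key:
        raise CredentialMismatch(
            f"credential {credential_id} is bound to {cred['key']}, not {run_key}"
        )
    return cred
```

Failing loudly here turns a silent "wrote to the wrong workspace" bug into an error you can find in the logs.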

Step-by-step OAuth debugging runbook (copy/paste checklist)

Use this when a workflow breaks in production.

Step 0 — Reproduce with a minimal workflow

Before you touch code, make the failure small.

One connector
One API call
No branching

Example minimal checks:

Google Drive: list 1 file in the target folder
Google Docs: create a doc, then read it back
Notion: list databases or read a known page

This matters because complex workflows hide the first auth failure behind retries and downstream errors.

Step 1 — Confirm identity (account + workspace)

Log and verify:

Google sub (OpenID subject) or user email (if you collect it)
Workspace domain (if Google Workspace)
Notion workspace ID
The connector account label the user selected (if you allow multiple accounts)

If you can’t answer “which account is this credential for?” you will keep chasing ghosts.

Step 2 — Confirm granted scopes vs required scopes

Maintain a small internal table:

Action	Required scopes
Drive: list files	`drive.metadata.readonly` (or broader)
Drive: create file	`drive.file` or `drive`
Docs: create/update doc	Docs scope(s) + Drive if you move/share

Then log:

scopes you requested
scopes you believe were granted
the action you attempted

Step 3 — Inspect token lifecycle events

Track these events with timestamps:

token_issued
refresh_succeeded
refresh_failed
revoked_detected
reauth_required_set

If you don’t track them, you’ll miss patterns like “refresh fails after 55 minutes” or “revoked 3 days after last run.”
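Once those events are recorded, spotting a pattern like "refresh fails after 55 minutes" is a short query. The sketch below assumes events for a single credential are available as `(timestamp, name)` tuples; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def minutes_to_first_failure(events):
    """Minutes from the last good token to the first refresh failure after it.

    events: iterable of (datetime, event_name) tuples for one credential.
    Returns None if no refresh_failed event follows a successful one.
    """
    last_good = None
    for ts, name in sorted(events):
        if name in ("token_issued", "refresh_succeeded"):
            last_good = ts
        elif name == "refresh_failed" and last_good is not None:
            return (ts - last_good).total_seconds() / 60
    return None
```

Aggregating this number across credentials shows quickly whether failures cluster around the access token lifetime or are spread out, which points at revocation instead.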

Step 4 — Verify storage and retrieval are deterministic

Most “random disconnects” are storage bugs:

refresh token stored in the wrong row
credential overwritten by a second consent
encryption key mismatch across environments

You want to prove that:

the refresh token is persisted once
every workflow run loads the same credential record

Step 5 — Add structured logs (with redaction)

Log enough to debug without leaking secrets.

A good rule: log token fingerprints, never tokens.

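// TypeScript sketch (Node.js); `logger` and the logged variables are assumed to exist
import { createHash } from "node:crypto";

function fingerprint(secret: string): string {
  // Short, non-reversible identifier: first 10 hex chars of the SHA-256 digest
  return createHash("sha256").update(secret).digest("hex").slice(0, 10);
}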

logger.info("oauth.refresh_failed", {
  provider: "google",
  user_id,
  credential_id,
  refresh_token_fp: fingerprint(refresh_token),
  error: err.error,
  error_description: err.error_description,
  workflow_run_id,
})

Also log the HTTP status and Google/Notion request IDs if available.

Implementation pattern: auth as a first-class workflow dependency

The core shift is this:

Don’t treat auth as “the setup screen.” Treat auth as a dependency that each workflow run must validate.

Pattern 1 — Preflight auth checks at workflow start

At the beginning of every run:

Load credential record (by user + connector + workspace)
Verify it is not marked reauth_required
Attempt a cheap API call (or a token refresh if near expiry)
Only then proceed

# Python-ish pseudo-code

def preflight_auth(run):
    cred = vault.load(run.user_id, run.connector, run.workspace_id)

    if cred.status == "reauth_required":
        raise ReauthRequired("User must reconnect")

    # If access token is expired (or close), refresh
    if cred.access_token_expires_at < now() + minutes(5):
        try:
            cred = oauth.refresh(cred)
            vault.save(cred)
        except OAuthInvalidGrant:
            vault.mark_reauth_required(cred.id)
            raise ReauthRequired("Token revoked or expired")

    return cred

Pattern 2 — Mid-run reauth gates (only when needed)

Long workflows can cross token boundaries. Instead of refreshing “whenever,” refresh:

when you’re within a safety window (e.g., 5 minutes)
before expensive steps (bulk writes, file creation)
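Both conditions can be folded into one check. This sketch reads "before expensive steps" as "widen the safety window by how long the step is expected to take", which is an assumption made for the example; `should_refresh` and `SAFETY_WINDOW` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

SAFETY_WINDOW = timedelta(minutes=5)


def should_refresh(expires_at, now, expected_step_duration=timedelta(0)):
    """True if the access token may expire before the next step finishes."""
    return expires_at - now <= SAFETY_WINDOW + expected_step_duration
```

Calling it with the estimated duration of a bulk write means the token is refreshed up front instead of expiring halfway through the batch.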

Pattern 3 — Safe failure: halt + ask, don’t partially complete

If a workflow can’t authenticate midway:

stop
mark run as blocked_on_auth
provide a “Reconnect Google / Notion” action
resume from the last idempotent checkpoint

This is where “white-box workflows” shine: users can see exactly where the run stopped and why.
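The halt-and-resume behavior can be sketched with a checkpoint index stored on the run. This is a simplified model: the run is a dict, steps are plain callables assumed to be idempotent, and `run_steps` is a hypothetical runner, not any platform's API.

```python
class ReauthRequired(Exception):
    """Raised by a step when its connector credential is no longer usable."""


def run_steps(run, steps):
    """Execute steps from the run's checkpoint; pause cleanly on auth failure.

    run:   dict with "checkpoint" (index of the next step) and "status"
    steps: list of callables taking the run dict
    """
    for index in range(run["checkpoint"], len(steps)):
        try:
            steps[index](run)
        except ReauthRequired:
            # Stop here; the checkpoint still points at the failed step.
            run["status"] = "blocked_on_auth"
            return run
        run["checkpoint"] = index + 1  # advance only after the step succeeds
    run["status"] = "completed"
    return run
```

After the user reconnects, calling `run_steps` again with the same run record resumes at the step that failed, without repeating completed steps.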

Google-specific quick wins (Drive/Docs)

1) Always request offline access (and store refresh tokens immediately)

Google’s web-server flow supports offline access via access_type=offline. In practice:

you only reliably get a refresh token when you request offline access
you may need to force a re-consent if the user already authorized without offline access

Node.js example:

// google-auth-library style
const authorizationUrl = oauth2Client.generateAuthUrl({
  access_type: "offline",           // required for refresh_token
  include_granted_scopes: true,      // incremental auth
  prompt: "consent",                // force refresh_token if you lost it
  scope: [
    "https://www.googleapis.com/auth/drive.file",
    "https://www.googleapis.com/auth/documents",
  ],
});

Important workflow detail: if you run multiple re-auth flows, you can invalidate older refresh tokens when you hit issuance limits. Make re-auth rare and intentional.

2) Handle `invalid_grant` as a state transition, not a retry

Retrying invalid_grant usually just burns time.

A practical policy:

Refresh fails once with invalid_grant → mark credential reauth_required
Surface a reconnect prompt
Don’t keep the workflow running in a degraded state

3) Prevent “created a .txt instead of a Google Doc” class of bugs

This one is less about OAuth and more about production hardening:

Validate you’re calling the correct endpoint (Drive create vs Docs create)
Validate MIME types before execution
Add a “create → read → update” minimal test to your demo script
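A minimal sketch of the MIME type check. `application/vnd.google-apps.document` is the MIME type Drive uses for Google Docs; the helper names are hypothetical, and the metadata dict mirrors the shape of a Drive `files.create` request body without making any network call.

```python
GOOGLE_DOC_MIME = "application/vnd.google-apps.document"


def build_doc_metadata(name, folder_id):
    """Metadata for creating an empty Google Doc through the Drive API."""
    return {"name": name, "mimeType": GOOGLE_DOC_MIME, "parents": [folder_id]}


def assert_is_google_doc(file_metadata):
    """Fail before execution if the file would not be a native Google Doc."""
    mime = file_metadata.get("mimeType")
    if mime != GOOGLE_DOC_MIME:
        raise ValueError(f"expected a Google Doc, got {mime!r}")
```

Running the same assertion on the metadata returned by the read step of the create, read, update test catches the ".txt instead of a Doc" case immediately.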

Notion-specific quick wins (permissions are the real ‘scope’)

Notion errors often look like auth problems but are actually sharing problems.

The #1 Notion gotcha: the integration must be shared with the page/database

Even with a valid token, Notion will return errors if the target page/database isn’t shared with your integration.

Operationally, add a Notion preflight step:

“Can I read the target database/page?”
If not, show instructions to share it via “Add connections” in the Notion UI

Minimal read preflight (HTTP):

curl -sS https://api.notion.com/v1/users \
  -H "Authorization: Bearer $NOTION_TOKEN" \
  -H "Notion-Version: 2022-06-28" \
  -H "Content-Type: application/json"

If /v1/users works but a database read fails with “not found,” it’s usually sharing.

Detect permission errors early

In workflows, fail fast:

verify the page/database is accessible at the start
don’t wait until step 12 to discover the integration wasn’t added
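To fail fast with a useful message, the preflight can map Notion's error body to a next action. The `code` values below are ones the Notion API documents for its error responses; the action labels and the `classify_notion_error` helper are assumptions for this sketch.

```python
def classify_notion_error(body):
    """Map a parsed Notion error response body to a suggested next action."""
    code = body.get("code")
    if code == "unauthorized":
        # The token itself is invalid: the user needs to reconnect.
        return "reauth_required"
    if code == "object_not_found":
        # Most often the page/database is not shared with the integration.
        return "share_with_integration"
    if code == "restricted_resource":
        # Token is valid but lacks the capability for this operation.
        return "missing_capability"
    return "other"
```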

Token vault + credential isolation (the architecture that stops ‘random’ failures)

If you’re building an agent platform (especially multi-tenant), you want a credential model like:

Credential record is per user + provider + workspace
Workflow runs reference a credential by ID (never “grab latest token for user”)
All secrets are encrypted at rest

Example schema:

-- Pseudo-SQL
CREATE TABLE oauth_credentials (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  provider TEXT NOT NULL,              -- google, notion
  workspace_id TEXT NULL,              -- domain/workspace identifier
  scopes TEXT NOT NULL,                -- stored as space-delimited or JSON array
  access_token_ciphertext TEXT NOT NULL,
  access_token_expires_at TIMESTAMP NOT NULL,
  refresh_token_ciphertext TEXT NULL,
  status TEXT NOT NULL DEFAULT 'active', -- active | reauth_required | revoked
  created_at TIMESTAMP NOT NULL,
  updated_at TIMESTAMP NOT NULL
);

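-- Note: several databases (e.g., PostgreSQL, MySQL, SQLite) treat NULLs as
-- distinct in unique indexes, so rows with a NULL workspace_id are not
-- deduplicated by this index. Use a sentinel value if that matters.
CREATE UNIQUE INDEX oauth_cred_unique
ON oauth_credentials(user_id, provider, workspace_id);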

This is the backbone of “durable workflow authentication.” Without it, concurrency and long runs will eventually mix credentials.

The 10-minute “Fresh Sign-In Demo” checklist (agencies + sales)

If you demo agent workflows, the fastest way to lose trust is: “Hang on, Google disconnected again.”

Run this script before every demo and before onboarding a client.

Google (Drive/Docs)

Disconnect the connector (so you test the real flow)
Reconnect and confirm the account email/domain
Verify required scopes are present
Run a create → read → update loop:
- create a Google Doc in the target folder
- read it back
- append a line
Wait 65 minutes or force-refresh to confirm refresh token works

Notion

Reconnect integration
Confirm workspace
Share the target database/page with the integration
Run read → write:
- list databases or read a page
- create a new row/page

Workflow platform sanity

Confirm only one run uses the credential at a time (or confirm your refresh logic is concurrency-safe)
Confirm logs show a clean preflight and no hidden retries
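For the concurrency item, one way to make refresh safe is a per-credential lock plus a version check, so that only the first of several concurrent runs actually refreshes. This sketch covers threads in a single process only; across processes you would need a database or distributed lock. `RefreshCoordinator` and the `version` field are hypothetical.

```python
import threading


class RefreshCoordinator:
    """Serialize refreshes per credential within one process."""

    def __init__(self, refresh_fn):
        self._refresh_fn = refresh_fn   # performs the real token refresh
        self._locks = {}                # credential_id -> lock
        self._guard = threading.Lock()  # protects the lock table

    def _lock_for(self, credential_id):
        with self._guard:
            return self._locks.setdefault(credential_id, threading.Lock())

    def refresh_if_stale(self, cred, seen_version):
        """Refresh only if nobody else refreshed since the caller read cred."""
        with self._lock_for(cred["id"]):
            if cred["version"] != seen_version:
                return cred  # another run already refreshed; reuse its result
            self._refresh_fn(cred)
            cred["version"] += 1
            return cred
```

Each run records the credential's version when it loads the token and passes that value back in, so late arrivals skip the refresh instead of racing it.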

Where nNode fits (without making OAuth your full-time job)

nNode (Endnode) is built for hosted, reusable, white-box workflows—the kind you deliver to clients or run repeatedly inside your business. In that world, OAuth reliability isn’t a checkbox; it’s a moat.

The patterns in this post map to what strong workflow platforms should give you by default:

Preflight auth steps you can add to any workflow
Clear “reauth required” checkpoints that pause runs safely
Run logs that make scope/permission issues obvious
Connector hygiene (Google Drive/Docs + Notion) that survives long-running execution

If you’re currently stitching together one-off scripts or “black box tasks,” you can adopt the runbook above today. And if you want these reliability primitives baked into a hosted workflow system, nNode is designed for exactly that: Claude Code-style power, but with integrations and durable execution.

Soft CTA

If you’re building agent automations that touch Google Drive/Docs and Notion—and you want them to keep working after the demo—take a look at nnode.ai. You’ll get a workflow-first approach where auth, retries, and long-running runs are treated as first-class concerns, not afterthoughts.
