OAuth scopes in long-running workflows: a production debugging playbook (Google + Notion)

nNode Team · 12 min read

OAuth in long-running workflows breaks differently than in normal apps

If you’re building long-running workflows on top of OAuth scopes (AI agents that run for minutes, retry, and touch multiple SaaS tools), OAuth will fail in ways that feel “random”: it worked in the demo, then a background run hits a 401, or a Notion database suddenly becomes “not found.”

This isn’t because OAuth is “bad.” It’s because agent workflows behave like distributed systems: long-lived state, multiple tool calls, concurrency, and partial failure. If you treat auth like a one-time login step, you’ll ship something that disconnects at the worst possible moment.

This tutorial is a production debugging playbook you can copy into your team wiki. It’s written for teams building hosted, reusable automations (the “white-box workflow” style) where reliability is the product.


Why OAuth fails more in agent workflows than in request/response apps

Traditional web apps:

  • The user is present.
  • The session is short.
  • You can redirect to re-auth quickly.

Agent/workflow apps:

  • Runs can be long-running (minutes to hours).
  • Steps can run in the background (no user present).
  • You have retries (sometimes with exponential backoff).
  • You have multi-tool chains (Drive → Docs → Notion → Slack).
  • You may have multiple workflows running concurrently for the same user.

That combination changes your auth requirements:

  • You need offline access (refresh tokens) for anything that must continue without the user in the browser.
  • You need durable token storage (so restarts don’t force re-consent).
  • You need run-context binding (so the right token is used for the right user/workspace/workflow).

The 6 OAuth failure modes you’ll actually see (and how they present)

1) Missing or incorrect scopes

Symptoms

  • Google: API calls fail with 403 / “insufficient permissions”.
  • Notion: endpoints fail even though the integration looks “connected.”

Root causes

  • You requested a scope set that doesn’t include the operation you’re doing.
  • You changed features over time, but never forced a re-consent.
  • You’re calling a different API than you think (e.g., Drive vs Docs).

Fix

  • Maintain a scope-to-action map (per connector, per capability).
  • Add “scope diffing” to your logs (requested vs granted).

2) Refresh token not issued

Symptoms

  • Works for ~1 hour, then fails when access token expires.
  • You never see a refresh token stored.

Root causes (Google)

  • You didn’t request offline access (access_type=offline).
  • You did request it, but Google returns the refresh token only on the first consent for that client-user combination. If you lost it, you can’t “get it back” without re-consent.

Fix

  • Ensure you request offline access.
  • Store the refresh token in long-lived storage immediately.
  • If you lost it, force a re-consent with prompt=consent.
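
One storage bug worth ruling out here: Google's refresh responses generally do not repeat the refresh token, so a naive "save whatever came back" overwrites the stored one with nothing. Below is a minimal sketch of a merge that keeps the stored refresh token unless a new one arrives. `StoredCredential` and `apply_token_response` are hypothetical names for illustration; the response keys (`access_token`, `expires_in`, `refresh_token`) follow the standard OAuth 2.0 token response.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class StoredCredential:
    """Hypothetical in-memory view of one stored credential."""
    access_token: str
    expires_at: datetime
    refresh_token: Optional[str]


def apply_token_response(stored: StoredCredential, response: dict, now: datetime) -> StoredCredential:
    """Merge a token-endpoint response into the stored credential."""
    return replace(
        stored,
        access_token=response["access_token"],
        expires_at=now + timedelta(seconds=response["expires_in"]),
        # Keep the stored refresh token unless the provider sent a new one.
        refresh_token=response.get("refresh_token") or stored.refresh_token,
    )


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
stored = StoredCredential("old-access", now, "rt-1")
updated = apply_token_response(stored, {"access_token": "new-access", "expires_in": 3600}, now)
print(updated.refresh_token)
```

The design choice is that a missing `refresh_token` key never clears the stored value; only an explicit revoke or re-consent path should do that.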

3) Refresh token invalidation / rotation / “invalid_grant”

Symptoms

  • Token refresh fails with invalid_grant.
  • A subset of users are affected; repro is inconsistent.

Common root causes

  • The user revoked access.
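  • You exceeded Google’s refresh token issuance limit (currently 100 live refresh tokens per Google Account per OAuth client ID); past that, the oldest refresh tokens stop working without warning.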
  • Clock skew on your server breaks token exchange (more common than you’d like).

Fix

  • Treat invalid_grant as a “re-auth required” state, not a transient retry.
  • Sync server time (NTP).
  • Don’t request new refresh tokens repeatedly; reuse the stored one.
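
As a rough sketch of the "state, not retry" rule, the function below sorts a failed refresh into one of three outcomes. The `invalid_grant` error code comes from the OAuth 2.0 token error response; the `RefreshOutcome` names and the choice to retry only on 429 and 5xx are assumptions for illustration, not a provider-mandated policy.

```python
from enum import Enum


class RefreshOutcome(Enum):
    REAUTH_REQUIRED = "reauth_required"  # user must reconnect; do not retry
    RETRY = "retry"                      # transient; back off and try again
    FAIL = "fail"                        # unexpected; stop and investigate


def classify_refresh_failure(http_status: int, body: dict) -> RefreshOutcome:
    """Map a failed token refresh to the next action for the workflow."""
    if body.get("error") == "invalid_grant":
        return RefreshOutcome.REAUTH_REQUIRED
    if http_status == 429 or http_status >= 500:
        return RefreshOutcome.RETRY
    return RefreshOutcome.FAIL


print(classify_refresh_failure(400, {"error": "invalid_grant"}).value)
```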

4) Session cookie vs OAuth token confusion

Symptoms

  • UI says “connected,” but API calls fail.
  • Reconnecting in the UI “fixes it” temporarily.

Root causes

  • Your UI session is alive, but the underlying OAuth credential is missing/expired/revoked.

Fix

  • In the product, distinguish:
    • “Signed into the app” (session)
    • “Connector authorized” (OAuth)

5) Workspace/admin policy restrictions

Symptoms

  • Google Workspace users fail; personal Gmail works (or vice versa).
  • Failures happen after IT changes policies.

Fix

  • Detect policy errors and surface a clear message (“Your admin must allow this app/scope”).
  • Log the user’s account type (consumer vs Workspace) and domain.
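
One way to capture the account type, assuming you request the `openid` scope and have already verified the ID token: Google includes an `hd` (hosted domain) claim for accounts that belong to a Workspace organization, and omits it for consumer accounts. The `account_context` helper and its output keys are hypothetical names for this sketch.

```python
def account_context(id_token_claims: dict) -> dict:
    """Derive loggable account context from verified Google ID token claims."""
    hosted_domain = id_token_claims.get("hd")  # absent for consumer accounts
    return {
        "sub": id_token_claims.get("sub"),
        "account_type": "workspace" if hosted_domain else "consumer",
        "domain": hosted_domain,
    }


print(account_context({"sub": "1234", "hd": "example.com"}))
```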

6) Wrong principal / wrong tenant (credential mapping bugs)

Symptoms

  • Files show up in the wrong Drive.
  • Notion writes to the wrong workspace.
  • “It works for me” but fails for a teammate.

Root causes

  • Tokens aren’t strongly bound to: user_id + workspace/domain + connector + workflow/run.

Fix

  • Implement strict credential isolation and run-context binding (see pattern below).

Step-by-step OAuth debugging runbook (copy/paste checklist)

Use this when a workflow breaks in production.

Step 0 — Reproduce with a minimal workflow

Before you touch code, make the failure small.

  • One connector
  • One API call
  • No branching

Example minimal checks:

  • Google Drive: list 1 file in the target folder
  • Google Docs: create a doc, then read it back
  • Notion: list databases or read a known page

This matters because complex workflows hide the first auth failure behind retries and downstream errors.

Step 1 — Confirm identity (account + workspace)

Log and verify:

  • Google sub (OpenID subject) or user email (if you collect it)
  • Workspace domain (if Google Workspace)
  • Notion workspace ID
  • The connector account label the user selected (if you allow multiple accounts)

If you can’t answer “which account is this credential for?” you will keep chasing ghosts.

Step 2 — Confirm granted scopes vs required scopes

Maintain a small internal table:

Action | Required scopes
Drive: list files | drive.metadata.readonly (or broader)
Drive: create file | drive.file or drive
Docs: create/update doc | Docs scope(s) + Drive if you move/share

Then log:

  • scopes you requested
  • scopes you believe were granted
  • the action you attempted
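
The table and the log fields above can be combined into a small scope diff. This is a minimal sketch: the action names and the `SCOPE_MAP` contents are illustrative and should be checked against the APIs you actually call. It assumes the token response carries a space-delimited `scope` field, as Google's does.

```python
GOOGLE = "https://www.googleapis.com/auth/"

# Each action lists alternative scope sets; any one fully granted set is enough.
SCOPE_MAP: dict[str, list[set[str]]] = {
    "drive.list_files": [
        {GOOGLE + "drive.metadata.readonly"},
        {GOOGLE + "drive.readonly"},
        {GOOGLE + "drive"},
    ],
    "drive.create_file": [{GOOGLE + "drive.file"}, {GOOGLE + "drive"}],
    "docs.update_doc": [{GOOGLE + "documents"}],
}


def granted_from_token_response(response: dict) -> set[str]:
    """Parse the space-delimited `scope` field of a token response."""
    return set(response.get("scope", "").split())


def missing_scopes(action: str, granted: set[str]) -> set[str]:
    """Return an empty set if the action is covered, else the smallest gap."""
    gaps = [option - granted for option in SCOPE_MAP[action]]
    return min(gaps, key=len)


granted = granted_from_token_response(
    {"scope": GOOGLE + "drive.file " + GOOGLE + "documents"}
)
print(missing_scopes("drive.list_files", granted))
```

Logging the result of `missing_scopes` next to the attempted action turns a vague 403 into a concrete "reconnect with this scope" message.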

Step 3 — Inspect token lifecycle events

Track these events with timestamps:

  • token_issued
  • refresh_succeeded
  • refresh_failed
  • revoked_detected
  • reauth_required_set

If you don’t track them, you’ll miss patterns like “refresh fails after 55 minutes” or “revoked 3 days after last run.”
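
To make patterns like "refresh fails after 55 minutes" visible, it helps to query the event log rather than eyeball it. Here is a minimal sketch over an in-memory list; the `TokenEvent` shape and the helper name are assumptions, and in practice the events would come from your log store.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class TokenEvent:
    credential_id: str
    name: str       # e.g. "token_issued", "refresh_failed"
    at: datetime


def minutes_from_issue_to_failure(events: list[TokenEvent], credential_id: str) -> Optional[float]:
    """Minutes between the latest token_issued and the first refresh_failed after it."""
    mine = sorted((e for e in events if e.credential_id == credential_id), key=lambda e: e.at)
    issued = [e for e in mine if e.name == "token_issued"]
    if not issued:
        return None
    last_issue = issued[-1].at
    failures = [e for e in mine if e.name == "refresh_failed" and e.at >= last_issue]
    if not failures:
        return None
    return (failures[0].at - last_issue).total_seconds() / 60


events = [
    TokenEvent("c1", "token_issued", datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)),
    TokenEvent("c1", "refresh_failed", datetime(2024, 1, 1, 10, 55, tzinfo=timezone.utc)),
]
print(minutes_from_issue_to_failure(events, "c1"))
```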

Step 4 — Verify storage and retrieval are deterministic

Most “random disconnects” are storage bugs:

  • refresh token stored in the wrong row
  • credential overwritten by a second consent
  • encryption key mismatch across environments

You want to prove that:

  • the refresh token is persisted once
  • every workflow run loads the same credential record

Step 5 — Add structured logs (with redaction)

Log enough to debug without leaking secrets.

A good rule: log token fingerprints, never tokens.

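// TypeScript (Node.js) sketch; `logger`, `err`, and the IDs come from your own code
import { createHash } from "node:crypto";

function fingerprint(secret: string): string {
  // Short, non-reversible identifier: first 10 hex chars of the SHA-256 digest
  return createHash("sha256").update(secret).digest("hex").slice(0, 10);
}

logger.info("oauth.refresh_failed", {
  provider: "google",
  user_id,
  credential_id,
  refresh_token_fp: fingerprint(refresh_token), // fingerprint only, never the token
  error: err.error,
  error_description: err.error_description,
  workflow_run_id,
});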

Also log the HTTP status and Google/Notion request IDs if available.


Implementation pattern: auth as a first-class workflow dependency

The core shift is this:

Don’t treat auth as “the setup screen.” Treat auth as a dependency that each workflow run must validate.

Pattern 1 — Preflight auth checks at workflow start

At the beginning of every run:

  1. Load credential record (by user + connector + workspace)
  2. Verify it is not marked reauth_required
  3. Attempt a cheap API call (or a token refresh if near expiry)
  4. Only then proceed
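
# Python sketch; `vault` and `oauth` stand in for your own storage and OAuth client
from datetime import datetime, timedelta, timezone


class ReauthRequired(Exception):
    """The user must reconnect the connector before the run can continue."""


class OAuthInvalidGrant(Exception):
    """The provider rejected the refresh token (invalid_grant)."""


REFRESH_WINDOW = timedelta(minutes=5)


def preflight_auth(run, vault, oauth):
    cred = vault.load(run.user_id, run.connector, run.workspace_id)

    if cred.status == "reauth_required":
        raise ReauthRequired("User must reconnect")

    # If the access token is expired (or close to it), refresh before doing any work.
    # Assumes access_token_expires_at is stored as a timezone-aware UTC datetime.
    if cred.access_token_expires_at < datetime.now(timezone.utc) + REFRESH_WINDOW:
        try:
            cred = oauth.refresh(cred)
            vault.save(cred)
        except OAuthInvalidGrant:
            # Not transient: record the state so later runs fail fast.
            vault.mark_reauth_required(cred.id)
            raise ReauthRequired("Token revoked or expired")

    return cred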

Pattern 2 — Mid-run reauth gates (only when needed)

Long workflows can cross token boundaries. Instead of refreshing “whenever,” refresh:

  • when you’re within a safety window (e.g., 5 minutes)
  • before expensive steps (bulk writes, file creation)
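
The two conditions above fit in one small gate function. This is a sketch under assumptions: the 5-minute window comes from the text, while the wider 15-minute window before expensive steps is an illustrative value, not a recommendation from any provider.

```python
from datetime import datetime, timedelta, timezone

SAFETY_WINDOW = timedelta(minutes=5)
EXPENSIVE_STEP_WINDOW = timedelta(minutes=15)  # assumed value; tune to your longest step


def should_refresh(expires_at: datetime, now: datetime, expensive_step: bool = False) -> bool:
    """True when the access token is too close to expiry for the upcoming step."""
    window = EXPENSIVE_STEP_WINDOW if expensive_step else SAFETY_WINDOW
    return expires_at - now <= window


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
expires_at = now + timedelta(minutes=10)
print(should_refresh(expires_at, now), should_refresh(expires_at, now, expensive_step=True))
```

With ten minutes left, a cheap step proceeds on the current token while a bulk write triggers a refresh first.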

Pattern 3 — Safe failure: halt + ask, don’t partially complete

If a workflow can’t authenticate midway:

  • stop
  • mark run as blocked_on_auth
  • provide a “Reconnect Google / Notion” action
  • resume from the last idempotent checkpoint
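
A minimal sketch of the halt-and-resume behavior, assuming every step is idempotent (the step that hit the auth failure is run again on resume). `Run`, `execute`, and the status strings other than blocked_on_auth are hypothetical names.

```python
from dataclasses import dataclass, field


class ReauthRequired(Exception):
    """Raised by a step when the connector needs to be reconnected."""


@dataclass
class Run:
    steps: list                      # list of (name, callable) pairs
    status: str = "pending"
    next_step: int = 0               # index of the last checkpoint
    completed: list = field(default_factory=list)


def execute(run: Run) -> Run:
    """Run steps from the last checkpoint; pause instead of failing on auth errors."""
    run.status = "running"
    while run.next_step < len(run.steps):
        name, step = run.steps[run.next_step]
        try:
            step()
        except ReauthRequired:
            run.status = "blocked_on_auth"  # checkpoint stays at this step
            return run
        run.completed.append(name)
        run.next_step += 1
    run.status = "succeeded"
    return run


connected = {"ok": False}
calls = []


def read_source():
    calls.append("read")


def write_target():
    if not connected["ok"]:
        raise ReauthRequired("reconnect needed")
    calls.append("write")


run = execute(Run(steps=[("read", read_source), ("write", write_target)]))
print(run.status, run.completed)
```

After the user reconnects, calling `execute(run)` again picks up at the blocked step rather than repeating the completed ones.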

This is where “white-box workflows” shine: users can see exactly where the run stopped and why.


Google-specific quick wins (Drive/Docs)

1) Always request offline access (and store refresh tokens immediately)

Google’s web-server flow supports offline access via access_type=offline. In practice:

  • you only reliably get a refresh token when you request offline access
  • you may need to force a re-consent if the user already authorized without offline access

Node.js example:

// google-auth-library style
const authorizationUrl = oauth2Client.generateAuthUrl({
  access_type: "offline",           // required for refresh_token
  include_granted_scopes: true,      // incremental auth
  prompt: "consent",                // force refresh_token if you lost it
  scope: [
    "https://www.googleapis.com/auth/drive.file",
    "https://www.googleapis.com/auth/documents",
  ],
});

Important workflow detail: if you run multiple re-auth flows, you can invalidate older refresh tokens when you hit issuance limits. Make re-auth rare and intentional.

2) Handle invalid_grant as a state transition, not a retry

Retrying invalid_grant usually just burns time.

A practical policy:

  • Refresh fails once with invalid_grant → mark credential reauth_required
  • Surface a reconnect prompt
  • Don’t keep the workflow running in a degraded state

3) Prevent “created a .txt instead of a Google Doc” class of bugs

This one is less about OAuth and more about production hardening:

  • Validate you’re calling the correct endpoint (Drive create vs Docs create)
  • Validate MIME types before execution
  • Add a “create → read → update” minimal test to your demo script

Notion-specific quick wins (permissions are the real ‘scope’)

Notion errors often look like auth problems but are actually sharing problems.

The #1 Notion gotcha: the integration must be shared with the page/database

Even with a valid token, Notion will return errors if the target page/database isn’t shared with your integration.

Operationally, add a Notion preflight step:

  • “Can I read the target database/page?”
  • If not, show instructions to share it via Add connections in Notion UI

Minimal read preflight (HTTP):

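curl -sS https://api.notion.com/v1/users/me \
  -H "Authorization: Bearer $NOTION_TOKEN" \
  -H "Notion-Version: 2022-06-28" \
  -H "Content-Type: application/json"

If /v1/users/me works but a database read fails with “not found,” it’s usually sharing. (Listing all users via /v1/users also works as a token check, but only if the integration has user information capabilities.)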

Detect permission errors early

In workflows, fail fast:

  • verify the page/database is accessible at the start
  • don’t wait until step 12 to discover the integration wasn’t added
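
To fail fast with a useful message, the preflight can translate Notion's error response into a next action. The sketch below assumes the error body shape from Notion's API (`code` values such as `unauthorized`, `object_not_found`, `restricted_resource`); the returned action labels are hypothetical.

```python
def classify_notion_failure(http_status: int, body: dict) -> str:
    """Map a failed Notion call to what the user or operator should do next."""
    code = body.get("code")
    if http_status == 401 or code == "unauthorized":
        return "reauth_required"          # token invalid or revoked
    if http_status == 404 and code == "object_not_found":
        return "share_with_integration"   # usually: page/database not shared
    if http_status == 403 and code == "restricted_resource":
        return "missing_capability"       # integration lacks permission for this call
    return "other"


print(classify_notion_failure(404, {"object": "error", "code": "object_not_found"}))
```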

Token vault + credential isolation (the architecture that stops ‘random’ failures)

If you’re building an agent platform (especially multi-tenant), you want a credential model like:

  • Credential record is per user + provider + workspace
  • Workflow runs reference a credential by ID (never “grab latest token for user”)
  • All secrets are encrypted at rest

Example schema:

-- Pseudo-SQL
CREATE TABLE oauth_credentials (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  provider TEXT NOT NULL,              -- google, notion
  workspace_id TEXT NOT NULL DEFAULT '', -- domain/workspace identifier; '' if none (NULLs would slip past the unique index)
  scopes TEXT NOT NULL,                -- stored as space-delimited or JSON array
  access_token_ciphertext TEXT NOT NULL,
  access_token_expires_at TIMESTAMP NOT NULL,
  refresh_token_ciphertext TEXT NULL,
  status TEXT NOT NULL DEFAULT 'active', -- active | reauth_required | revoked
  created_at TIMESTAMP NOT NULL,
  updated_at TIMESTAMP NOT NULL
);

CREATE UNIQUE INDEX oauth_cred_unique
ON oauth_credentials(user_id, provider, workspace_id);

This is the backbone of “durable workflow authentication.” Without it, concurrency and long runs will eventually mix credentials.
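
On the application side, run-context binding can be enforced with a check at load time. A minimal sketch, assuming an in-memory `store` keyed by credential ID; `Credential`, `RunContext`, and `load_for_run` are illustrative names that mirror the columns in the schema above.

```python
from dataclasses import dataclass


class CredentialBindingError(Exception):
    """The credential referenced by a run does not belong to that run's context."""


@dataclass(frozen=True)
class Credential:
    id: str
    user_id: str
    provider: str
    workspace_id: str


@dataclass(frozen=True)
class RunContext:
    run_id: str
    user_id: str
    provider: str
    workspace_id: str
    credential_id: str


def load_for_run(store: dict, ctx: RunContext) -> Credential:
    """Load a credential by ID and refuse it if it is bound to a different context."""
    cred = store.get(ctx.credential_id)
    if cred is None:
        raise CredentialBindingError(f"run {ctx.run_id}: credential not found")
    expected = (ctx.user_id, ctx.provider, ctx.workspace_id)
    actual = (cred.user_id, cred.provider, cred.workspace_id)
    if expected != actual:
        raise CredentialBindingError(f"run {ctx.run_id}: credential bound to another context")
    return cred


store = {"cred-1": Credential("cred-1", "user-a", "notion", "ws-1")}
ok = load_for_run(store, RunContext("run-1", "user-a", "notion", "ws-1", "cred-1"))
print(ok.id)
```

Failing loudly here is deliberate: a mismatched credential should stop the run before anything is written to the wrong Drive or workspace.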


The 10-minute “Fresh Sign-In Demo” checklist (agencies + sales)

If you demo agent workflows, the fastest way to lose trust is: “Hang on, Google disconnected again.”

Run this script before every demo and before onboarding a client.

Google (Drive/Docs)

  1. Disconnect the connector (so you test the real flow)
  2. Reconnect and confirm the account email/domain
  3. Verify required scopes are present
  4. Run a create → read → update loop:
    • create a Google Doc in the target folder
    • read it back
    • append a line
  5. Wait 65 minutes or force-refresh to confirm refresh token works

Notion

  1. Reconnect integration
  2. Confirm workspace
  3. Share the target database/page with the integration
  4. Run read → write:
    • list databases or read a page
    • create a new row/page

Workflow platform sanity

  • Confirm only one run uses the credential at a time (or confirm your refresh logic is concurrency-safe)
  • Confirm logs show a clean preflight and no hidden retries
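
For the concurrency item above, one approach is a per-credential lock with a re-check inside it, so that several runs arriving at once trigger a single refresh. This sketch only covers threads within one process; across workers you would need a database or distributed lock instead. `RefreshCoordinator` and the dict-based credential are assumptions for illustration.

```python
import threading
from collections import defaultdict
from datetime import datetime, timedelta, timezone


class RefreshCoordinator:
    """Serializes refreshes per credential within a single process."""

    def __init__(self, refresh_fn):
        self._refresh_fn = refresh_fn
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, credential_id):
        with self._guard:
            return self._locks[credential_id]

    def get_fresh(self, store, credential_id, now, window=timedelta(minutes=5)):
        with self._lock_for(credential_id):
            cred = store[credential_id]
            # Re-check inside the lock: another run may have refreshed already.
            if cred["expires_at"] - now > window:
                return cred
            store[credential_id] = self._refresh_fn(cred)
            return store[credential_id]


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
refresh_calls = []


def fake_refresh(cred):
    refresh_calls.append(cred["id"])
    return {**cred, "expires_at": now + timedelta(hours=1)}


store = {"c1": {"id": "c1", "expires_at": now + timedelta(minutes=1)}}
coordinator = RefreshCoordinator(fake_refresh)
threads = [
    threading.Thread(target=coordinator.get_fresh, args=(store, "c1", now))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(refresh_calls))
```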

Where nNode fits (without making OAuth your full-time job)

nNode (Endnode) is built for hosted, reusable, white-box workflows—the kind you deliver to clients or run repeatedly inside your business. In that world, OAuth reliability isn’t a checkbox; it’s a moat.

The patterns in this post map to what strong workflow platforms should give you by default:

  • Preflight auth steps you can add to any workflow
  • Clear “reauth required” checkpoints that pause runs safely
  • Run logs that make scope/permission issues obvious
  • Connector hygiene (Google Drive/Docs + Notion) that survives long-running execution

If you’re currently stitching together one-off scripts or “black box tasks,” you can adopt the runbook above today. And if you want these reliability primitives baked into a hosted workflow system, nNode is designed for exactly that: Claude Code-style power, but with integrations and durable execution.


If you’re building agent automations that touch Google Drive/Docs and Notion—and you want them to keep working after the demo—take a look at nnode.ai. You’ll get a workflow-first approach where auth, retries, and long-running runs are treated as first-class concerns, not afterthoughts.
