OAuth token refresh race condition — detect, diagnose, and fix

Multiple concurrent callers each notice the access token is expired and each fires a POST to the token endpoint with the same refresh token. One wins. The rest get invalid_grant. Here's why, how to spot it, and the complete fix.

Vendor-neutral · copy-paste code · no trackers, no cookies. Last updated 2026-05-31.

TL;DR

The race: N callers each check the token, each sees "expired", and each POSTs the same refresh_token (R0) to the provider. The provider processes caller A first: it issues a new access token and rotates (revokes) R0. Callers B–N arrive with R0, which is now revoked → 400 invalid_grant.

Log signal: a tight cluster of invalid_grant errors (same user, milliseconds apart), one successful refresh, all others failed — OR refresh_token_reused on Okta/Auth0/Salesforce.

Fix: single-flight in-process (one shared Promise) + cross-process lock if the credential is shared on disk + atomic writes + rotation-merge. See the full guide for code.

The race, step by step

OAuth 2.0 refresh-token rotation — now the default at Okta, Auth0, Microsoft, Salesforce, and recommended by the Security BCP for public clients — makes each refresh token single-use. The moment the provider processes a successful refresh it invalidates the token just consumed and issues a new one. This is the security property that makes concurrent refresh dangerous:

t0  access token is expired (or within the skew window)
t1  caller A reads creds, sees "expired", POSTs refresh_token=R0
t2  caller B reads creds, sees "expired", POSTs refresh_token=R0   // same token!
t3  provider processes A → issues access_token A1, rotates R0→R1, REVOKES R0
t4  provider processes B → R0 is revoked → 400 invalid_grant
                                           (on Okta/Auth0/Salesforce: refresh_token_reused
                                           → entire token family revoked; user logged out everywhere)

Both callers followed the OAuth spec. The defect is that they did it concurrently with a single-use token. The symptoms: random invalid_grant errors, surprise re-login prompts, "works locally, fails under load."
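The timeline above can be reproduced without a real provider. The following is a minimal sketch, assuming a hypothetical in-memory MockProvider that rotates on every successful refresh; the class, the token names, and the 5 ms latency are illustrative and do not model any vendor's actual behavior.

```typescript
// Hypothetical in-memory stand-in for a token endpoint with rotation enabled.
type TokenResponse = { access_token: string; refresh_token: string };

class MockProvider {
  private valid = new Set<string>(["R0"]);
  private counter = 0;

  // Simulates POST /token with grant_type=refresh_token.
  async refresh(refreshToken: string): Promise<TokenResponse> {
    await new Promise((resolve) => setTimeout(resolve, 5)); // simulated network latency
    if (!this.valid.has(refreshToken)) {
      throw new Error("invalid_grant");
    }
    this.valid.delete(refreshToken); // rotation: the consumed token is revoked
    this.counter += 1;
    const next = `R${this.counter}`;
    this.valid.add(next);
    return { access_token: `A${this.counter}`, refresh_token: next };
  }
}

// Every caller presents R0 at the same time, as in steps t1 and t2.
async function demoRace(callers: number): Promise<{ ok: number; failed: number }> {
  const provider = new MockProvider();
  const results = await Promise.allSettled(
    Array.from({ length: callers }, () => provider.refresh("R0")),
  );
  const ok = results.filter((r) => r.status === "fulfilled").length;
  return { ok, failed: results.length - ok };
}
```

Calling demoRace(5) against this mock yields one fulfilled refresh and four invalid_grant rejections, which is the same shape the log patterns in the next section describe.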

How to detect it in logs

The race produces distinctive log patterns:

  • Timestamp cluster — multiple invalid_grant or 401 Unauthorized entries within the same 200ms window for the same client_id or user. One success, many failures at the same instant.
  • refresh_token_reused — on Okta, Auth0, or Salesforce, this error code (instead of or alongside invalid_grant) means reuse detection fired. A single concurrent race can trigger this and revoke the entire token family.
  • Transient failures that self-resolve — the failures go away on retry (because another process already wrote a fresh token), which masks the bug until traffic increases.
  • Corrupted or zero-byte token file — if multiple processes write to the same credential file without atomic persistence (temp file + rename), you may see truncated JSON or a lost refresh_token field, producing a different failure mode.

Don't confuse the race with a legitimate invalid_grant: a user revoking access, a password change, or a token idle-expiring also produces invalid_grant, but that failure is permanent (retrying doesn't help) and is not correlated with concurrent requests. The race version is transient and correlated with bursts.
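The timestamp-cluster signal can be checked mechanically once refresh attempts are parsed into records. Here is a rough sketch, assuming a hypothetical RefreshLogEntry shape (your log schema will differ) and the 200 ms window mentioned above. It flags windows with exactly one success and at least one failure for the same client_id.

```typescript
// Hypothetical parsed log record; field names are assumptions for illustration.
interface RefreshLogEntry {
  timestampMs: number;
  clientId: string;
  outcome: "success" | "invalid_grant" | "refresh_token_reused";
}

// Returns groups of entries that look like a refresh race:
// same client, all within windowMs, one success plus one or more failures.
function findRaceClusters(entries: RefreshLogEntry[], windowMs = 200): RefreshLogEntry[][] {
  const byClient = new Map<string, RefreshLogEntry[]>();
  for (const entry of entries) {
    const list = byClient.get(entry.clientId) ?? [];
    list.push(entry);
    byClient.set(entry.clientId, list);
  }

  const clusters: RefreshLogEntry[][] = [];
  byClient.forEach((list) => {
    list.sort((a, b) => a.timestampMs - b.timestampMs);
    let i = 0;
    while (i < list.length) {
      // Extend the window while entries stay within windowMs of entry i.
      let j = i;
      while (j + 1 < list.length && list[j + 1].timestampMs - list[i].timestampMs <= windowMs) {
        j += 1;
      }
      const window = list.slice(i, j + 1);
      const successes = window.filter((e) => e.outcome === "success").length;
      const failures = window.length - successes;
      if (successes === 1 && failures >= 1) clusters.push(window);
      i = j + 1;
    }
  });
  return clusters;
}
```

Isolated invalid_grant entries (a revoked grant, for example) never form a cluster under this rule, which matches the distinction between transient and permanent failures drawn above.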

The fix — three layers

Apply all layers that match your deployment topology:

  1. In-process single-flight — one shared in-flight Promise; every concurrent caller awaits it. Covers the common case: one server, one worker, one process. See the single-flight code on the main guide.
  2. Cross-process lock — if several CLIs, workers, or agents share one credential file: O_CREAT|O_EXCL lock file around the refresh; re-read after acquiring the lock and short-circuit if a sibling already rotated; write atomically (temp file + rename). See cross-process code.
  3. Rotation-merge on persist — if the provider omits refresh_token in the response (Google does this), keep the old one. If it returns a new one (Okta/Auth0/Microsoft), save it. See rotation-merge code.
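Layer 3 is small enough to show inline. This is a minimal sketch, assuming hypothetical StoredCreds and TokenResponse shapes; expires_in is treated as seconds, as in the standard token response, and the nowMs parameter exists only to make the function easy to test.

```typescript
// Hypothetical shapes for what is persisted and what the token endpoint returns.
interface StoredCreds {
  access_token: string;
  refresh_token: string;
  expires_at: number; // epoch milliseconds
}

interface TokenResponse {
  access_token: string;
  expires_in: number; // seconds
  refresh_token?: string; // absent when the provider does not rotate
}

// Merge a refresh response into the stored credentials without ever
// dropping the refresh token.
function mergeRotation(prev: StoredCreds, res: TokenResponse, nowMs = Date.now()): StoredCreds {
  return {
    access_token: res.access_token,
    // Keep the old refresh token when the response omits a new one.
    refresh_token: res.refresh_token ?? prev.refresh_token,
    expires_at: nowMs + res.expires_in * 1000,
  };
}
```

The failure this guards against is overwriting the stored record with the raw response, which silently deletes the refresh token whenever the provider omits it.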

One library that packages the in-process layers

If you'd rather not re-derive the pattern from scratch, refresh-guard is a zero-dependency MIT library that ships in-process single-flight, correct rotation-merge, and atomic file persistence as a single installable primitive. It does not bundle a cross-process lock (that depends on your lock backend — file, Redis, DB); the guide covers that layer separately.

Drop-in for the in-process case

npm i refresh-guard

import { createTokenManager, fileStore } from "refresh-guard";

const tokens = createTokenManager({
  provider: "okta",   // picks the right quirks (rotation + reuse detection)
  store: fileStore("~/.myapp/creds.json"),
  // fetchNewToken is your own function that POSTs the refresh token to the provider
  refresh: async (prev) => fetchNewToken(prev.refresh_token)
});

const access = await tokens.getValidToken();  // one refresh per process, however many concurrent callers

Disclosure: refresh-guard is by the same team that wrote this guide. It solves the in-process race. For the multi-process case, combine it with the file-lock pattern.

FAQ

What is an OAuth token refresh race condition?

It occurs when N concurrent callers each detect that the access token is expired and each POSTs the same refresh token to the provider. Only the first refresh succeeds; the provider revokes the refresh token it just processed. Every other caller then presents a revoked token and gets invalid_grant. The bug appears under any concurrency: a page loading several API calls in parallel, a worker pool, or multiple CLIs sharing one credential file.

How do I detect a token refresh race condition in logs?

Look for a tight timestamp cluster: multiple invalid_grant or refresh_token_reused errors from the same client_id within 200ms, with exactly one preceding successful refresh. On providers with reuse detection (Okta, Auth0, Salesforce) the error may be refresh_token_reused rather than invalid_grant. The failures are transient — a retry fired a moment later may succeed because another caller already wrote a fresh token.

What is single-flight for OAuth token refresh?

The first caller that sees the access token expired starts the refresh and stores the in-flight Promise in a shared variable. Every other caller within the same process awaits that same Promise instead of starting its own refresh. When the one refresh completes, all waiters get the result. The Promise is cleared in a finally block so the next expiry event can trigger a fresh refresh.
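A minimal sketch of that pattern follows, assuming a hypothetical Creds shape, a caller-supplied doRefresh function that performs the actual POST, and a 30-second skew window. None of these names come from a specific library.

```typescript
// Hypothetical credential shape; expires_at is epoch milliseconds.
interface Creds {
  access_token: string;
  refresh_token: string;
  expires_at: number;
}

// Returns a getValidToken function in which at most one refresh is in flight.
function createSingleFlight(
  initial: Creds,
  doRefresh: (prev: Creds) => Promise<Creds>, // assumed: your POST to the token endpoint
  skewMs = 30_000,
): () => Promise<string> {
  let creds = initial;
  let inFlight: Promise<Creds> | null = null;

  return async function getValidToken(): Promise<string> {
    // Fast path: token is still valid outside the skew window.
    if (Date.now() < creds.expires_at - skewMs) return creds.access_token;

    // The first caller starts the refresh; the check and the assignment run
    // synchronously, so no other caller can slip in between them.
    if (!inFlight) {
      inFlight = doRefresh(creds)
        .then((next) => {
          creds = next;
          return next;
        })
        .finally(() => {
          inFlight = null; // allow the next expiry to trigger a new refresh
        });
    }
    // Every concurrent caller awaits the same Promise.
    return (await inFlight).access_token;
  };
}
```

If the refresh rejects, every waiter sees the same rejection and the slot is cleared, so the next call starts a fresh attempt instead of reusing a failed Promise.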

When do I need a cross-process lock instead of single-flight?

Single-flight only works within one process. If multiple OS processes share one credential file — several CLI instances, a process pool, containers on the same host — each process has its own in-memory state and can still race. Add an inter-process lock (O_EXCL lock file or flock), re-read the token after acquiring it (the previous holder may have already rotated), and write atomically (temp file + rename).
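The three steps (lock, re-read, atomic write) might look like this on Node.js. It is a sketch under stated assumptions: the Creds shape, the ".lock" suffix, the 50 ms poll interval, and the 10-second timeout are all illustrative choices, and doRefresh is your own function. The "wx" open flag is Node's equivalent of O_CREAT|O_EXCL.

```typescript
import { promises as fs } from "node:fs";

// Hypothetical credential shape; expires_at is epoch milliseconds.
interface Creds {
  access_token: string;
  refresh_token: string;
  expires_at: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs fn while holding an exclusive lock file; polls until the lock is free.
async function withFileLock<T>(lockPath: string, fn: () => Promise<T>, timeoutMs = 10_000): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      const handle = await fs.open(lockPath, "wx"); // fails with EEXIST if the file exists
      await handle.close();
      break;
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
      if (Date.now() > deadline) throw new Error("timed out waiting for lock");
      await sleep(50);
    }
  }
  try {
    return await fn();
  } finally {
    await fs.unlink(lockPath).catch(() => {}); // release the lock
  }
}

// Write to a temp file, then rename over the target so readers never see partial JSON.
async function writeAtomic(path: string, creds: Creds): Promise<void> {
  const tmp = `${path}.${process.pid}.tmp`;
  await fs.writeFile(tmp, JSON.stringify(creds), { mode: 0o600 });
  await fs.rename(tmp, path);
}

async function refreshShared(
  credPath: string,
  doRefresh: (prev: Creds) => Promise<Creds>,
): Promise<Creds> {
  return withFileLock(`${credPath}.lock`, async () => {
    // Re-read after acquiring the lock: a sibling may already have rotated.
    const current: Creds = JSON.parse(await fs.readFile(credPath, "utf8"));
    if (Date.now() < current.expires_at) return current;
    const next = await doRefresh(current);
    await writeAtomic(credPath, next);
    return next;
  });
}
```

One limitation of this sketch: a process that crashes while holding the lock leaves the lock file behind, and later callers time out. A production version would need some stale-lock policy, such as recording the holder's PID or a timestamp in the lock file.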

Does the race only happen with rotating refresh tokens?

The invalid_grant failure requires rotation (provider revokes the token after use). Without rotation, concurrent refreshes may be wasteful but not error-producing. However, if multiple processes write the same credential file without atomic persistence, you can still get file corruption (truncated JSON, lost refresh token field) even without rotation. Rotation just makes the concurrency bug visibly fatal instead of subtly corrupting.

Can a retry loop fix the race condition?

A retry may mask the race at low concurrency, because by the time the retry fires another caller has usually written a fresh token. At high concurrency, retries amplify the problem, since each retry is a new concurrent caller. The correct approach: on invalid_grant, re-read the stored token first; if another caller already rotated it, use that fresh token. Only re-authenticate the user if the token is genuinely revoked (a permanent failure, not a transient race).
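That recovery path can be sketched as follows. The assumptions are loud ones: doRefresh signals the provider error by throwing an Error whose message is "invalid_grant", readStored reads the shared credential store, and ReauthRequired is a hypothetical error type for the permanent case. Adapt all three to however your HTTP client and store actually behave.

```typescript
// Hypothetical credential shape; expires_at is epoch milliseconds.
interface Creds {
  access_token: string;
  refresh_token: string;
  expires_at: number;
}

// Hypothetical error type meaning "send the user through login again".
class ReauthRequired extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ReauthRequired";
  }
}

async function refreshWithRecovery(
  attempted: Creds,
  doRefresh: (prev: Creds) => Promise<Creds>, // assumed to throw Error("invalid_grant")
  readStored: () => Promise<Creds>,           // assumed to read the shared store
): Promise<Creds> {
  try {
    return await doRefresh(attempted);
  } catch (err) {
    // Anything other than invalid_grant is not this function's concern.
    if (!(err instanceof Error) || err.message !== "invalid_grant") throw err;

    // Re-read instead of retrying: a sibling may have won the race.
    const stored = await readStored();
    const rotatedBySibling = stored.refresh_token !== attempted.refresh_token;
    if (rotatedBySibling && Date.now() < stored.expires_at) return stored;

    // Same refresh token still stored, or nothing fresher: treat as permanent.
    throw new ReauthRequired("refresh token rejected and no fresher token is stored");
  }
}
```

Note that this performs no second POST. The only network call is the original attempt, so it cannot add load during a burst the way a blind retry loop does.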