Keychain-unavailable environments fail with a raw error and never surface the FJ_TOKEN fallback #93

Closed
opened 2026-06-10 21:20:08 +00:00 by stephen · 2 comments
Owner

Observation

When FJ_TOKEN is unset and the OS keychain is unreachable (locked Login keychain over SSH, a headless Linux container with no Secret Service / D-Bus, a copied hosts.toml on a machine that never ran fj auth login), every authenticated command fails with a raw keychain error and no mention of the FJ_TOKEN fallback.

Token resolution tries the env var first, then the keychain (src/client/resolve.rs:27-36):

```rust
AuthKind::Pat => match auth::env_token() {
    Some(token) => token,
    None => auth::load_token(&name)?.ok_or_else(|| {
        anyhow!(
            "no token stored for host '{name}'. Run `fj auth login --host {name}` \
             or set FJ_TOKEN for this process."
        )
    })?,
},
```

The helpful ok_or_else message only fires when load_token returns Ok(None). But load_token returns Ok(None) only for keyring::Error::NoEntry (src/auth/mod.rs:38-45):

```rust
pub fn load_token(host: &str) -> Result<Option<String>> {
    let entry = entry(host)?;                       // (a) Entry::new can fail
    match entry.get_password() {
        Ok(token) => Ok(Some(token)),
        Err(keyring::Error::NoEntry) => Ok(None),
        Err(e) => Err(e).with_context(|| format!("reading token for {host} from keychain")),  // (b)
    }
}
```

Any other keychain failure (service unavailable, no Secret Service, locked keychain, access denied) propagates through the ? operator at path (a) or (b). The user sees "opening keychain entry for rasterhub.com: <platform error>" or "reading token for rasterhub.com from keychain: <platform error>", and the FJ_TOKEN hint at resolve.rs:32-36 is never reached.
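To make the control flow concrete, here is a self-contained reduction of the two functions above. It is a sketch, not the fj source: KeychainError is a hypothetical stand-in for keyring::Error (only NoEntry and one "storage unavailable" variant are modelled), errors are plain Strings instead of anyhow, and the env value and keychain result are passed in rather than read from the process. It shows that only NoEntry reaches the FJ_TOKEN message; every other error leaves resolve at the ? before ok_or_else runs.

```rust
use std::fmt;

// Hypothetical stand-in for keyring::Error, reduced to two variants.
#[derive(Debug)]
enum KeychainError {
    NoEntry,
    NoStorageAccess(String),
}

impl fmt::Display for KeychainError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            KeychainError::NoEntry => write!(f, "no matching entry found"),
            KeychainError::NoStorageAccess(msg) => write!(f, "cannot access storage: {msg}"),
        }
    }
}

// Same shape as load_token: only NoEntry becomes Ok(None).
fn load_token(host: &str, backend: Result<String, KeychainError>) -> Result<Option<String>, String> {
    match backend {
        Ok(token) => Ok(Some(token)),
        Err(KeychainError::NoEntry) => Ok(None),
        Err(e) => Err(format!("reading token for {host} from keychain: {e}")),
    }
}

// Same shape as the resolve.rs arm: env var first, then the keychain.
fn resolve(
    host: &str,
    env: Option<String>,
    backend: Result<String, KeychainError>,
) -> Result<String, String> {
    match env {
        Some(token) => Ok(token),
        // The `?` returns early on any non-NoEntry error, so the
        // FJ_TOKEN guidance in ok_or_else is never built for it.
        None => load_token(host, backend)?.ok_or_else(|| {
            format!(
                "no token stored for host '{host}'. Run `fj auth login --host {host}` \
                 or set FJ_TOKEN for this process."
            )
        }),
    }
}

fn main() {
    let missing = resolve("rasterhub.com", None, Err(KeychainError::NoEntry)).unwrap_err();
    let locked = KeychainError::NoStorageAccess("keychain locked".to_string());
    let locked = resolve("rasterhub.com", None, Err(locked)).unwrap_err();
    println!("{missing}");
    println!("{locked}");
    assert!(missing.contains("FJ_TOKEN"));
    assert!(!locked.contains("FJ_TOKEN"));
}
```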

The FJ_TOKEN doc comment (src/auth/mod.rs:47-49) even states its purpose is to "avoid touching keychain services in headless containers", but nothing tells a stuck user that the lever exists.

Why it matters

CI/CD and remote dev are exactly where paying teams put a forge CLI, and exactly where the keychain is absent. The current failure looks like "fj is broken in containers" rather than "set one env var". A new team evaluating fj in their pipeline hits a dead end whose fix (FJ_TOKEN) is documented in CLAUDE.md and used by this repo's own CI, but is invisible at the point of failure. That is a silent adoption cliff for the headless use case.

Possible directions (sketches)

  • (sketch) Treat "keychain service unavailable / locked / denied" the same as NoEntry inside load_token, returning Ok(None) so the existing actionable resolve.rs message ("...or set FJ_TOKEN for this process.") is what the user sees. Keep the raw error available under --debug.
  • (sketch) Alternatively, catch the keychain error at resolve.rs and append the same "set FJ_TOKEN" guidance to it, preserving the underlying cause for diagnosis.
  • (sketch) A short troubleshooting entry ("running fj in CI / containers: set FJ_TOKEN") cross-linked from the keychain error text.
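The second direction (append the FJ_TOKEN guidance at the resolve.rs layer while keeping the underlying cause) could look roughly like this. It is a minimal std-only sketch: KeychainUnavailable, TokenResolveError and with_fj_token_hint are hypothetical names standing in for keyring::Error and anyhow, and the wording of the hint is illustrative only.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical stand-in for a non-NoEntry keychain failure
// (keyring's NoStorageAccess / PlatformFailure in the real code).
#[derive(Debug)]
struct KeychainUnavailable {
    detail: String,
}

impl fmt::Display for KeychainUnavailable {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "cannot access storage: {}", self.detail)
    }
}

impl Error for KeychainUnavailable {}

// Hypothetical wrapper: the message carries the FJ_TOKEN guidance,
// and the original failure stays reachable via source().
#[derive(Debug)]
struct TokenResolveError {
    host: String,
    cause: KeychainUnavailable,
}

impl fmt::Display for TokenResolveError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "could not read the token for '{}' from the OS keychain; \
             set FJ_TOKEN for this process (e.g. in CI or containers)",
            self.host
        )
    }
}

impl Error for TokenResolveError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.cause)
    }
}

// Wrap only the error path; Ok(Some(_)) and Ok(None) pass through unchanged,
// so the existing "no token stored" message still handles NoEntry.
fn with_fj_token_hint(
    host: &str,
    result: Result<Option<String>, KeychainUnavailable>,
) -> Result<Option<String>, TokenResolveError> {
    result.map_err(|cause| TokenResolveError {
        host: host.to_string(),
        cause,
    })
}

fn main() {
    let failure = KeychainUnavailable {
        detail: "no Secret Service on D-Bus".to_string(),
    };
    let err = with_fj_token_hint("rasterhub.com", Err(failure)).unwrap_err();
    println!("error: {err}");
    // Walk the cause chain, as a --debug mode might.
    let mut source = err.source();
    while let Some(cause) = source {
        println!("caused by: {cause}");
        source = cause.source();
    }
}
```

In fj itself, which already uses anyhow, the same effect would more likely come from a with_context(...) call on the keychain error in resolve.rs, since anyhow keeps the wrapped error as the cause; formatting the result with {:#} prints the whole chain.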

Confidence

High that the code path drops the FJ_TOKEN hint on non-NoEntry keychain errors (verified in resolve.rs + auth/mod.rs). Medium on real-world frequency per environment; a quick repro in a container with no Secret Service (or FJ_TOKEN unset + login keychain locked) would confirm the exact surfaced string and raise this to high.

Author
Owner

Converted (label converted). This opportunity became one backlog item:

  • rasterstate/fj#96 (backlog, p1): Surface the FJ_TOKEN fallback when the OS keychain is unavailable.

Kept open per the product-agent triage convention. Rationale: a silent dead end in CI/headless containers (exactly where teams evaluate a forge CLI, and where the keychain is absent) is a real blocker to adoption, and the verified code path drops the actionable FJ_TOKEN hint on any non-NoEntry keychain error. Sized S; the fix maps keychain-unavailable errors to Ok(None) so that the existing resolve.rs guidance fires, while preserving the raw cause under --debug.
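The adopted direction, mapping keychain-unavailable errors to Ok(None) and keeping the raw cause for --debug, might look like the following. This is a hedged sketch of the approach described in this comment, not the code merged in PR #102: the enum is a simplified, hypothetical mirror of keyring::Error (the real NoStorageAccess and PlatformFailure variants carry boxed platform errors), the debug parameter stands in for however fj wires --debug, and which variants count as "unavailable" is an assumption.

```rust
// Simplified, hypothetical mirror of the keyring::Error variants that matter here.
#[derive(Debug)]
enum KeychainError {
    NoEntry,
    NoStorageAccess(String),
    PlatformFailure(String),
    BadEncoding,
}

// Assumption: storage-access and platform failures mean "no usable keychain
// here", so they are treated like a missing entry.
fn is_unavailable(e: &KeychainError) -> bool {
    matches!(
        e,
        KeychainError::NoStorageAccess(_) | KeychainError::PlatformFailure(_)
    )
}

// Same contract as load_token: Ok(None) lets resolve.rs print its
// "...or set FJ_TOKEN for this process." message.
fn load_token(
    host: &str,
    backend: Result<String, KeychainError>,
    debug: bool,
) -> Result<Option<String>, String> {
    match backend {
        Ok(token) => Ok(Some(token)),
        Err(KeychainError::NoEntry) => Ok(None),
        Err(e) if is_unavailable(&e) => {
            // Keep the raw cause visible for diagnosis, but only when asked.
            if debug {
                eprintln!("debug: keychain unavailable for {host}: {e:?}");
            }
            Ok(None)
        }
        // Anything else (e.g. a corrupted entry) is still a hard error.
        Err(e) => Err(format!("reading token for {host} from keychain: {e:?}")),
    }
}

fn main() {
    let locked = KeychainError::NoStorageAccess("login keychain locked".to_string());
    assert_eq!(load_token("rasterhub.com", Err(locked), true), Ok(None));

    let no_dbus = KeychainError::PlatformFailure("no D-Bus session".to_string());
    assert_eq!(load_token("ci-host", Err(no_dbus), false), Ok(None));

    let corrupt = load_token("rasterhub.com", Err(KeychainError::BadEncoding), false);
    assert!(corrupt.is_err());
}
```

Keeping BadEncoding-style failures as hard errors is a deliberate choice in this sketch: on a working keychain a corrupted entry is a real problem, and falling through to "no token stored" would hide it. The Entry::new failure at path (a) in the original load_token would need the same classification.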

Author
Owner

All derived backlog items are merged: rasterstate/fj#96 closed by PR #102. Closing this opportunity per the issue state machine (operator-approved).

Reference
rasterstate/fj#93