fj auth unusable on headless Linux: keychain errors block login/reads; FJ_TOKEN should fully bypass the keychain #147

Closed
opened 2026-06-11 02:08:34 +00:00 by stephen · 0 comments
Owner

Symptom

On the headless agent host (usw-dev-01, Linux), fj became unusable for all API operations mid-session:

error: opening keychain entry for rasterhub.com
  caused by: No matching entry found in secure storage

This hit fj pr list, fj pr merge, fj pr create, fj issue and even fj auth login (it could not store the pasted token), box-wide and across every agent lane at once. git push/clone were unaffected (separate credential helper), so only the forge-API layer went down.

Root cause

fj's keyring backend on Linux talks to the Secret Service over D-Bus (libsecret / gnome-keyring). On a headless server there is usually no unlocked keyring collection, so every keychain read and write fails. The token had been usable earlier only because a keyring/D-Bus session was transiently unlocked; once it went away, auth died with no recovery path on the box.

The merged keychain fallback (rasterstate/fj#96) maps keychain errors to Ok(None) so the FJ_TOKEN env var can take over for reads, but the binary installed on the box predated it, so the raw keychain error short-circuited before FJ_TOKEN was consulted.
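
The difference between the two binaries can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not fj's actual code: `KeychainError`, `resolve_token_strict` and `resolve_token_lenient` are hypothetical names, the keychain lookup is passed in as a plain `Result` so the example has no dependencies, and the keychain-then-env order is inferred from the description above.

```rust
/// Hypothetical stand-in for the error type returned by the keyring backend.
#[derive(Debug)]
struct KeychainError(String);

/// Behaviour of the stale binary, as described: a keychain error propagates
/// and the environment token is never looked at.
fn resolve_token_strict(
    keychain: Result<Option<String>, KeychainError>,
    env_token: Option<String>,
) -> Result<Option<String>, KeychainError> {
    // `?` returns early on Err, so `env_token` is unreachable on that path.
    let stored = keychain?;
    Ok(stored.or(env_token))
}

/// Behaviour with the read-side fallback, as described: a keychain error is
/// treated like "no entry", so the environment token can take over.
fn resolve_token_lenient(
    keychain: Result<Option<String>, KeychainError>,
    env_token: Option<String>,
) -> Option<String> {
    // Err(_) and Ok(None) both collapse to None here.
    let stored = keychain.ok().flatten();
    stored.or(env_token)
}
```

With a failing keychain and `FJ_TOKEN` set, the strict variant returns the keychain error while the lenient variant returns the env token, which matches the failure seen on the box before the rebuild.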

Impact

High: a transient keyring failure halted the entire agent fleet's forge-API work mid-round until the binary was rebuilt and FJ_TOKEN was wired into the shells that run fj. This is exactly the "fj is broken in containers/CI" adoption cliff #96 was meant to close, and it is the daily environment for the headless buyer.

Workaround applied

  1. Rebuilt fj from main (has #96) and installed it to ~/.local/bin/fj.
  2. Exported FJ_TOKEN in the shells that actually spawn fj: bash for the Claude lanes, and zsh -lc (which reads ~/.zshenv) for the codex lanes. ~/.zshrc is NOT read by either.

Asks

  1. FJ_TOKEN should be a first-class, documented primary auth path for headless/CI, checked before the keychain, so a broken keyring never blocks a process that has the env var set.
  2. fj auth login must not hard-fail when the keychain is unavailable: offer a non-keychain token store (a 0600 file-based store, or print the exact FJ_TOKEN export to use) instead of erroring on store.
  3. Detect no-Secret-Service / headless and emit the actionable FJ_TOKEN guidance rather than a raw opening keychain entry error (extend #96 to the login/store path).
  4. Document FJ_TOKEN as the recommended auth for CI and headless hosts in the README/auth docs.
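
One possible shape for asks 1 to 3 is sketched below. It is an illustration only, not a proposal for fj's real internals: all three function names are hypothetical, `read_keychain` is a stand-in closure for the real keyring lookup, the hint wording is invented, and the file store is Unix-only and leaves the choice of path to the caller.

```rust
use std::fs::{self, OpenOptions, Permissions};
use std::io::{self, Write};
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Ask 1 (sketch): consult FJ_TOKEN before the keychain.
/// `read_keychain` is a hypothetical stand-in for the real keyring lookup.
fn resolve_token_env_first<F>(env_token: Option<String>, read_keychain: F) -> Option<String>
where
    F: FnOnce() -> Result<Option<String>, String>,
{
    // Treat an empty or whitespace-only variable as unset.
    if let Some(token) = env_token.filter(|t| !t.trim().is_empty()) {
        return Some(token); // the keychain is never touched on this path
    }
    // No env token: fall back to the keychain, treating errors as "no token".
    read_keychain().ok().flatten()
}

/// Ask 2 (sketch): store the token in a file readable only by its owner.
fn store_token_file(path: &Path, token: &str) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o600) // applies only when the file is newly created
        .open(path)?;
    // Tighten permissions on a pre-existing file before writing the secret.
    fs::set_permissions(path, Permissions::from_mode(0o600))?;
    file.write_all(token.as_bytes())
}

/// Ask 3 (sketch): actionable guidance instead of the raw keychain error.
fn keychain_unavailable_hint(host: &str, cause: &str) -> String {
    format!(
        "error: no usable keychain for {host} ({cause})\n\
         hint: on headless or CI hosts, set FJ_TOKEN in the environment of the process that runs fj"
    )
}
```

In this sketch a caller would pass `std::env::var("FJ_TOKEN").ok()` as `env_token`. Taking the keychain lookup as a closure means a process with the variable set never opens a D-Bus connection at all, which is the property ask 1 is after.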

Environment

  • Host: usw-dev-01 (headless Linux), no unlocked Secret Service collection.
  • Installed binary predated #96; current main (851b68b) has the read-side fallback but not the login/store fix or the docs.
Reference
rasterstate/fj#147