fj run view: distinguish private-repo log auth failure from a missing run, instead of masking it as "no logs" #91
What
When `fj run view <n> --log`/`--log-failed` (and the plain `fj run view <n>` summary) cannot read a run's logs, distinguish a 404-because-private/unauthenticated response from a genuinely missing run/job, and print an actionable error for the former instead of the misleading "no logs for that run/job/attempt".

Root cause: `fj` reads logs from the Forgejo web frontend route (`POST/GET /{owner}/{repo}/actions/runs/{run}/jobs/{job}/attempt/{attempt}`, built in `web_run_url`, `src/api/workflow_view.rs`), which authenticates with a session cookie + CSRF token, not an API token. On a private repo that route returns 404 to a token-only request to hide the resource. `log_route_error` (`src/api/workflow_view.rs`, ~line 226) maps every 404 to "no logs for that run/job/attempt", so an auth/permission failure is reported as a missing run, including for known-successful runs.

Shippable approach: the 404 alone cannot tell auth-rejection from a real miss, but the token-accessible `/api/v1/repos/{owner}/{repo}/actions/tasks` list can. When the web route 404s, probe that list (already the confirmed-working token surface, see the module header in `src/api/workflow_view.rs`):

- Run is listed in `/actions/tasks` but the web route 404s -> auth/permission case. Print something like: "Cannot read logs for run N: the Forgejo web log route rejected token auth. Log retrieval on private repos needs session auth (cookie/CSRF); an API token is not accepted by this route. See rasterstate/fj#103." Distinguish this from a job-index-out-of-range case where the run exists but `--job` is wrong.
- Run is not listed in `/actions/tasks` (or job index out of range) -> keep the existing "check the run number against `fj run list`, and that `--job` exists" message.

This is the error-quality + detection slice only. Actually retrieving private-repo logs is split into rasterstate/fj#103.
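The decision described above can be sketched as a pure function, kept separate from any HTTP code. This is a minimal illustration, not fj's implementation: `LogMiss`, `classify_log_404`, and `log_miss_message` are hypothetical names, the job index is assumed to be zero-based, and it is an assumption that the tasks list yields a per-run job count at all. The fallback message text is approximated from the two phrases quoted in this issue.

```rust
/// Why a log fetch failed after the web log route returned 404.
/// Hypothetical type for illustration; not taken from fj's source.
#[derive(Debug, PartialEq, Eq)]
pub enum LogMiss {
    /// The token can see the run in the tasks list and the job index is
    /// valid, yet the web route 404s: treat as an auth/permission rejection.
    AuthRejected,
    /// The run is visible, but the requested job index does not exist.
    JobOutOfRange,
    /// The run does not appear in the tasks list at all.
    RunNotFound,
}

/// Classify a web-route 404 from what the tasks-list probe found.
/// `jobs_in_run` is `None` when the run is absent from the tasks list,
/// otherwise the number of jobs reported for it (assumed to be derivable).
/// `job_index` is assumed to be zero-based.
pub fn classify_log_404(jobs_in_run: Option<usize>, job_index: usize) -> LogMiss {
    match jobs_in_run {
        None => LogMiss::RunNotFound,
        Some(count) if job_index >= count => LogMiss::JobOutOfRange,
        Some(_) => LogMiss::AuthRejected,
    }
}

/// Render the user-facing message for each case.
pub fn log_miss_message(run: u64, miss: &LogMiss) -> String {
    match miss {
        LogMiss::AuthRejected => format!(
            "Cannot read logs for run {}: the Forgejo web log route rejected token auth. \
             Log retrieval on private repos needs session auth (cookie/CSRF); \
             an API token is not accepted by this route. See rasterstate/fj#103.",
            run
        ),
        // Both remaining cases keep the existing guidance (wording approximated).
        LogMiss::JobOutOfRange | LogMiss::RunNotFound => String::from(
            "no logs for that run/job/attempt: check the run number against \
             `fj run list`, and that `--job` exists",
        ),
    }
}
```

Keeping the classification free of I/O means the private-repo and missing-run cases can be checked directly, without a mock server.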
Priority
p2. Not a crash and a fallback exists (`/actions/tasks` for pass/fail, web UI for logs), but the misleading "no logs" actively misdiagnoses real CI triage (see rasterstate/fj#92, which sent a triage chasing runner/autoscaler limits for an ordinary step failure whose logs existed).
Why
Reproduced on rasterhub.com against private repos `rasterstate/fjord-ios` (run 395 failed, 384 succeeded) and `rasterstate/flux` (run 171): every form of log retrieval fails identically for green and red runs. `--debug` shows the web-route POST returning 404 under token auth. The "no logs for that run/job/attempt" text implies a bad run/job number, but the numbers are valid (they appear in `fj run list`) and the logs exist (confirmed via `action_task_step.log_length` server-side). The wrong error is worse than no error: it points at a nonexistent cause. Duplicate report with the same root cause: rasterstate/fj#92.
Acceptance
- When the run is listed in `/api/v1/.../actions/tasks`, `fj run view <n> --log`/`--log-failed` prints an actionable auth/permission error (naming session-vs-token auth and linking rasterstate/fj#103), not "no logs for that run/job/attempt".
- A missing run or an out-of-range `--job` still gets the existing "check the run number / job index" message.
- The plain `fj run view <n>` summary path surfaces the same distinction when it hits the same web route.
- Tests in `src/client/integration_tests.rs`: a private-repo case (web route 404 + tasks list 200 that contains the run) asserts the auth-distinguishing message; a missing-run case (tasks list 200 without the run) asserts the existing message.
- `cargo fmt --check`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test --all` pass.
Dependencies
None for this slice; it lands independently. It is the prerequisite for rasterstate/fj#103 (actually retrieving private-repo logs), whose documented-limitation path reuses the actionable error introduced here.
Out of scope
Size
S
While researching adoption blockers I found the auth-masking pattern this issue describes is not limited to `run view --log`; the sibling Actions commands hand-roll the same error construction and share the same blind spot, so it's worth fixing as one class rather than per-command.

In `src/api/workflow_run.rs`, several handlers do `let body = res.text().await.unwrap_or_default();` and then build a message that (a) drops the HTTP status on the empty-body branch and (b) never special-cases 401/403:

- `dispatch` (~:102-117): on any non-success it falls back to "check the workflow file name and ref" when the body is empty, so an auth failure reads as a bad workflow name.
- `list_artifacts` (~:152-161): non-404 errors surface as "could not list artifacts ..." with an empty detail when the body is empty.
- `download_artifact` (~:186-195): emits "(HTTP 403): " with nothing after the colon.
- `post_run_action` (rerun/cancel, ~:276-292): non-404 errors surface as "could not {action} run #N: " with no status/diagnostic.

Same user-facing failure mode as this issue: a private-repo/permission problem looks like "your run/workflow is wrong." The fix in ask #1 here (distinguish unauthenticated/forbidden from genuinely-missing, and always include the status) would cover these too if applied at the shared layer. Filing here rather than as a new issue since it's the same root cause.
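One way the shared-layer fix could look is a single formatter that every handler calls on a non-success response. This is a sketch under assumptions: `describe_http_failure` and its parameters are hypothetical and not part of fj, and the 401/403 wording is illustrative. It always includes the status, never leaves the text after the colon empty, and keeps each command's existing hint for 404 only.

```rust
/// Build a user-facing error for a failed Actions API call.
/// Hypothetical helper for illustration; not taken from fj's source.
///
/// * `action`: what was attempted, e.g. "list artifacts"
/// * `status`: the HTTP status code of the response
/// * `body`: the response body, possibly empty
/// * `not_found_hint`: the command's existing hint, used only for 404
pub fn describe_http_failure(
    action: &str,
    status: u16,
    body: &str,
    not_found_hint: &str,
) -> String {
    let detail = body.trim();
    let reason = match status {
        // Auth failures get their own wording instead of a "wrong name" hint.
        401 => String::from("not authenticated: the token is missing, expired, or not accepted"),
        403 => String::from("forbidden: the token lacks permission for this repository or action"),
        // On a private repo a 404 can also hide a permission problem.
        404 => format!(
            "{} (on a private repo, a 404 can also mean the token cannot see it)",
            not_found_hint
        ),
        // Any other status: fall back to the body, or say that there was none.
        _ if detail.is_empty() => String::from("the server returned no error detail"),
        _ => detail.to_string(),
    };

    // The status is always part of the message.
    let mut msg = format!("could not {} (HTTP {}): {}", action, status, reason);

    // For the special-cased statuses, append the server's detail when present.
    if matches!(status, 401 | 403 | 404) && !detail.is_empty() {
        msg.push_str(&format!(" [server said: {}]", detail));
    }
    msg
}
```

With this shape, `download_artifact` on a 403 with an empty body would report the forbidden reason rather than a bare "(HTTP 403): ", and `dispatch` on a 401 would no longer suggest checking the workflow file name.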
Referenced from: `fj run view --log`/`--log-failed` reports "no logs" for every run even when logs exist #92
Title changed from: fj run view --log returns 404 "no logs" for every run on private repos
Title changed to: fj run view: distinguish private-repo log auth failure from a missing run, instead of masking it as "no logs"