diff --git a/graph/fixes/fix-prevent-duplicate-posts-when-audit-update-fails-after-su-fccd42.md b/graph/fixes/fix-prevent-duplicate-posts-when-audit-update-fails-after-su-fccd42.md
new file mode 100644
index 00000000000..8a05c459b65
--- /dev/null
+++ b/graph/fixes/fix-prevent-duplicate-posts-when-audit-update-fails-after-su-fccd42.md
@@ -0,0 +1,46 @@
+---
+id: fccd4277-a253-4f7f-8065-7a9cffd64354
+type: fix
+title: "Fix: Prevent duplicate POSTs when audit update fails after successful delivery in outbound-event-handler"
+tags: [esb-monorepo, tac, outbound-event-handler, error-handling, fix, cloud-functions, idempotency, duplicate-prevention]
+importance: 0.7
+confidence: 0.8
+created: "2026-02-25T17:47:36.634745+00:00"
+updated: "2026-02-25T17:47:36.634745+00:00"
+---
+
+# Fix: Duplicate POST Prevention on Audit Failure
+
+## Project
+esb-monorepo, TAC `outbound-event-handler` (GCF)
+
+## Problem
+Bitbucket Rovo Dev flagged a code design issue: if `update_processed_events()` failed at Step 5 (after all POSTs to `outbound-object-router` had succeeded), the function returned HTTP 500. Cloud Scheduler treats any 5xx as a failure and retries, causing duplicate POSTs to the downstream router for events that were already delivered.
+
+## Root Cause
+The final audit DB update call (`update_processed_events`) was not wrapped in error handling. A transient DB failure at that step caused the function to surface a 500, even though all delivery work was complete.
+
+## Fix
+Wrapped the final `update_processed_events` call in a `try/except` block:
+- Logs the audit error at ERROR level
+- Still returns **200** because all deliveries succeeded; the audit failure is non-fatal
+- Cloud Scheduler will NOT retry, preventing duplicate POSTs
+
+## Important Distinction
+The partial-progress flush in the POST failure path (~line 165) is intentionally **NOT** wrapped. In that case we DO want 500 returned so Cloud Scheduler retries the failed POST delivery.
+
+```python
+# Step 5: audit update is non-fatal because deliveries are already complete
+try:
+    update_processed_events(processed_ids)
+except Exception as e:
+    logger.error("Audit update failed after successful delivery: %s", e)
+
+return make_response("OK", 200)
+```
+
+## Test Added
+`test_audit_update_failure_after_all_posts_succeed_returns_200`
+
+## Commit
+`320b871` on `dev-object-handler` branch
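
The difference between the two failure paths described in the note can be illustrated with a minimal, self-contained sketch. It is not the handler from this change: `handle_events`, the injected `post_event` callable, and the `(body, status)` tuple return value are hypothetical stand-ins, and `update_processed_events` is passed in as a parameter only so the sketch runs without a database.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger("outbound_event_handler_sketch")


def handle_events(
    event_ids: Iterable[str],
    post_event: Callable[[str], None],               # hypothetical delivery call
    update_processed_events: Callable[[List[str]], None],
) -> Tuple[str, int]:
    """Hypothetical handler: deliver each event, then record the audit trail."""
    processed_ids: List[str] = []

    for event_id in event_ids:
        try:
            post_event(event_id)
        except Exception:
            logger.exception("POST failed for event %s", event_id)
            # Partial-progress flush: deliberately NOT guarded. Any exception
            # raised here propagates to the caller, and the 500 below asks
            # the scheduler to retry the failed delivery.
            update_processed_events(processed_ids)
            return "Delivery failed", 500
        processed_ids.append(event_id)

    # Final audit update: guarded, because every delivery already succeeded
    # and a retry would re-send events the router has already received.
    try:
        update_processed_events(processed_ids)
    except Exception:
        logger.exception("Audit update failed after successful delivery")

    return "OK", 200
```

With this shape, an audit function that raises after every POST succeeds still yields `("OK", 200)`, while a failing POST yields a 500 after flushing only the IDs delivered so far. The trade-off is that a swallowed audit failure leaves the audit table stale until a later run, so the ERROR-level log is the only signal that it happened.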