claude-home/development/2026-03-20.md
Cal Corum cd57645dd0
All checks were successful
Reindex Knowledge Base / reindex (push) Successful in 3s
docs: sync KB — 2026-03-20.md,release-2026.3.20.md
2026-03-20 22:00:43 -05:00

4.0 KiB

title description type domain tags
ESB-56: MDR Object Handler Simplification & Cleanup Code review and cleanup of MDR (Master Data Registry) object handler integration — added client singletons, hardened SQL, cached env vars, and filed follow-up issues. context development
esb-monorepo
python
cloud-functions
mdr
code-review
refactoring
gitea

ESB-56: MDR Object Handler Simplification & Cleanup

Date: 2026-03-20 Branch: feature/ESB-56-mdr-object-handler Repo: esb-monorepo

What Was Done

Ran a three-agent parallel code review (reuse, quality, efficiency) on the MDR-to-object-handler integration, then applied fixes directly.

Fixes Applied (commit 13f78f3)

  1. Pub/Sub client singletonoutput_utils.publish_to_pubsub was creating a new PublisherClient per call. Added module-level singleton matching the pattern already used in mdr-api/event_publisher.py.

  2. Firestore client singletonwrite_to_firestore was creating a new firestore.Client (gRPC channel + auth) per call. Added keyed singleton dict _firestore_clients by (project_id, database).

  3. SQL injection hardeningwrite_to_cloudsql built the Postgres function name via f-string (f"data.fn_upsert_{object_type.lower()}"). While guarded by an allowlist check, the pattern is fragile. Replaced with a pre-built lookup dict _UPSERT_FUNC_MAP so SQL identifiers are never derived by string interpolation.

  4. Removed verbose payload logginglogging.info(f"Silver data: {silver_data}") logged full business object payloads at INFO level. Replaced with identity-only log line.

  5. Dead code removal — Two if not project_id guards in send_to_downstream() could never trigger because os.environ.get("GCP_PROJECT_ID", "amenity-integrat-dev-svc") always returns a non-None default.

  6. Env var caching_PROJECT_ID, _FIRESTORE_ENABLED, _CLOUDSQL_ENABLED moved to module-level constants instead of being read from os.environ on every request.

  7. Double list_sources() callprocess_object_request called registry.list_sources() twice (once for membership check, once for the error response). Assigned to local variable.

  8. Test singleton isolation — Added autouse pytest fixture to reset module-level singletons between tests, fixing 3 test failures caused by cross-test state leakage.

Decision: No Connection Pool for CloudSQL

Discussed whether a SQLAlchemy connection pool made sense. Decided no because:

  • Cloud Run functions default to single concurrent request per instance
  • A pool of size 1 just adds wrapper overhead
  • The Connector singleton (which caches IAM auth) is already the expensive part, and it's cached
  • Pool would only matter with concurrency > 1, which would need thread-safety work first

Follow-Up Issues Created (Gitea)

# Title Priority
#1 Extract shared Pub/Sub publisher to py-esb-integrations High
#2 Extract shared CloudSQL connector to py-esb-integrations High
#3 Reduce duplication in MDR API route handlers Medium
#4 Parallelize independent downstream calls Low

Key Finding: Three-Way Duplication

Both Pub/Sub publishing and CloudSQL connection logic are copy-pasted identically across three functions (object-handler, outbound-event-handler, outbound-object-router). The shared packages/py-esb-integrations package has no GCP utility module yet — that's the natural home for extraction.

Files Changed

  • functions/tac/object-handler/main.py — env var caching, dead code removal, list_sources dedup
  • functions/tac/object-handler/output_utils.py — singletons, SQL hardening, logging cleanup
  • functions/tac/object-handler/tests/test_output_utils.py — singleton reset fixture