4.0 KiB
| title | description | type | domain | tags | |||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| ESB-56: MDR Object Handler Simplification & Cleanup | Code review and cleanup of MDR (Master Data Registry) object handler integration — added client singletons, hardened SQL, cached env vars, and filed follow-up issues. | context | development |
|
ESB-56: MDR Object Handler Simplification & Cleanup
Date: 2026-03-20
Branch: feature/ESB-56-mdr-object-handler
Repo: esb-monorepo
What Was Done
Ran a three-agent parallel code review (reuse, quality, efficiency) on the MDR-to-object-handler integration, then applied fixes directly.
Fixes Applied (commit 13f78f3)
-
Pub/Sub client singleton —
output_utils.publish_to_pubsubwas creating a newPublisherClientper call. Added module-level singleton matching the pattern already used inmdr-api/event_publisher.py. -
Firestore client singleton —
write_to_firestorewas creating a newfirestore.Client(gRPC channel + auth) per call. Added keyed singleton dict_firestore_clientsby(project_id, database). -
SQL injection hardening —
write_to_cloudsqlbuilt the Postgres function name via f-string (f"data.fn_upsert_{object_type.lower()}"). While guarded by an allowlist check, the pattern is fragile. Replaced with a pre-built lookup dict_UPSERT_FUNC_MAPso SQL identifiers are never derived by string interpolation. -
Removed verbose payload logging —
logging.info(f"Silver data: {silver_data}")logged full business object payloads at INFO level. Replaced with identity-only log line. -
Dead code removal — Two
if not project_idguards insend_to_downstream()could never trigger becauseos.environ.get("GCP_PROJECT_ID", "amenity-integrat-dev-svc")always returns a non-None default. -
Env var caching —
_PROJECT_ID,_FIRESTORE_ENABLED,_CLOUDSQL_ENABLEDmoved to module-level constants instead of being read fromos.environon every request. -
Double
list_sources()call —process_object_requestcalledregistry.list_sources()twice (once for membership check, once for the error response). Assigned to local variable. -
Test singleton isolation — Added
autousepytest fixture to reset module-level singletons between tests, fixing 3 test failures caused by cross-test state leakage.
Decision: No Connection Pool for CloudSQL
Discussed whether a SQLAlchemy connection pool made sense. Decided no because:
- Cloud Run functions default to single concurrent request per instance
- A pool of size 1 just adds wrapper overhead
- The
Connectorsingleton (which caches IAM auth) is already the expensive part, and it's cached - Pool would only matter with concurrency > 1, which would need thread-safety work first
Follow-Up Issues Created (Gitea)
| # | Title | Priority |
|---|---|---|
| #1 | Extract shared Pub/Sub publisher to py-esb-integrations | High |
| #2 | Extract shared CloudSQL connector to py-esb-integrations | High |
| #3 | Reduce duplication in MDR API route handlers | Medium |
| #4 | Parallelize independent downstream calls | Low |
Key Finding: Three-Way Duplication
Both Pub/Sub publishing and CloudSQL connection logic are copy-pasted identically across three functions (object-handler, outbound-event-handler, outbound-object-router). The shared packages/py-esb-integrations package has no GCP utility module yet — that's the natural home for extraction.
Files Changed
functions/tac/object-handler/main.py— env var caching, dead code removal, list_sources dedupfunctions/tac/object-handler/output_utils.py— singletons, SQL hardening, logging cleanupfunctions/tac/object-handler/tests/test_output_utils.py— singleton reset fixture