Lock conflicts and deadlocks are frequent culprits behind business disruptions in GBase 8c distributed clusters. Unlike single-node databases, GBase 8c introduces both local locks and cross-node global locks, making diagnosis more challenging. This guide provides a field-tested workflow covering lock mechanisms, common causes, diagnosis steps, and practical solutions.
1. Core Lock Mechanisms in GBase 8c
1.1 Lock Scope
- Local locks: Confined to a single CN or DN node (e.g., concurrent transactions on one DN).
- Global locks: Span multiple nodes, typical in distributed transactions and DDL operations.
1.2 Common Lock Modes and Conflicts
| Mode | Purpose | Conflict Scenarios |
|---|---|---|
| ShareLock (S) | Shared read lock | Conflicts with X locks |
| ExclusiveLock (X) | Exclusive write lock | Conflicts with all other lock modes |
| AccessExclusiveLock (AEL) | DDL operations (e.g., TRUNCATE) | Blocks all reads and writes |
| Intent Locks (IS/IX) | Table-level intent, coordinates with row-level locks | Ensures table/row lock compatibility |
1.3 Deadlock Types
- Local deadlock: Occurs within a single node, automatically detected and resolved by GBase 8c.
- Global deadlock: Cross-node circular wait; may require manual intervention if auto-resolution fails.
2. Five Common Causes of Lock Issues
- Poor transaction design: Long-running transactions, inconsistent multi-table operation order.
- Ineffective SQL: Missing WHERE clause causing full-table lock, index failure expanding lock scope, DDL during peak hours.
-
Stuck two-phase transactions: Distributed transactions stuck in
preparedstate holding locks. -
Improper cluster parameters: e.g.,
lock_wait_timeouttoo short/long, insufficient thread pool. - Node communication anomalies: Network delays or partitions triggering global deadlocks.
3. Diagnostic Workflow (4 Steps)
3.1 Step 1: Locate Blocked Sessions
-- Find all blocked lock requests
SELECT
pg_locks.pid,
pg_class.relname AS table_name,
pg_locks.mode,
pg_locks.granted,
pg_stat_activity.query,
pg_stat_activity.state
FROM pg_locks
LEFT JOIN pg_class ON pg_locks.relation = pg_class.oid
LEFT JOIN pg_stat_activity ON pg_locks.pid = pg_stat_activity.pid
WHERE pg_locks.granted = false;
Also check for stranded two-phase transactions:
SELECT gxid, status, start_time
FROM pg_distributed_transactions
WHERE status = 'prepared';
3.2 Step 2: Analyze the Lock Wait Chain
WITH lock_chain AS (
SELECT
pg_locks.pid AS wait_pid,
pg_locks.relation,
pg_locks.mode AS wait_mode,
pg_stat_activity.query AS wait_query,
(SELECT pid FROM pg_locks l2 WHERE l2.relation = pg_locks.relation AND l2.granted = true AND l2.mode = 'ExclusiveLock') AS hold_pid
FROM pg_locks
LEFT JOIN pg_stat_activity ON pg_locks.pid = pg_stat_activity.pid
WHERE pg_locks.granted = false
)
SELECT
wait_pid AS blocked_pid,
pg_class.relname AS table_name,
wait_mode,
wait_query,
hold_pid AS holding_pid,
(SELECT query FROM pg_stat_activity WHERE pid = hold_pid) AS holding_query
FROM lock_chain
LEFT JOIN pg_class ON lock_chain.relation = pg_class.oid;
3.3 Step 3: Distinguish Local vs. Global Deadlock
-
Local: Check node logs
grep -i "deadlock" /opt/gbase/data/log/postgresql-*.log. -
Global: Check all CN/DN logs, and query
pg_distributed_transactionsforpreparedtransactions.
3.4 Step 4: Verify Root Cause
- Use
EXPLAIN ANALYZEto check index usage. - Identify transactions running longer than 5 minutes.
- Test node connectivity with
pingandtelnet.
4. Solutions
4.1 Optimise Transaction Design
- Break long transactions; avoid external API calls inside transactions.
- Enforce consistent object order across all transactions (e.g., always update
userbeforeorder).
4.2 Refine SQL and Indexes
-- Add index to prevent full-table scan
CREATE INDEX idx_user_phone ON user(phone);
-- Schedule DDL off-peak, check lock status beforehand
4.3 Clean Up Stuck Prepared Transactions
SELECT pg_terminate_backend(gxid)
FROM pg_distributed_transactions
WHERE status = 'prepared' AND start_time < now() - INTERVAL '1 hour';
4.4 Tune Cluster Parameters
ALTER SYSTEM SET lock_wait_timeout = '10s';
ALTER SYSTEM SET thread_pool_attr = '512, 1, (cpubind:0-3)';
ALTER SYSTEM SET thread_pool_stream_attr = '512, 0.2, 1, (cpubind:0-3)';
-- Apply with SELECT pg_reload_conf(); or restart if necessary.
4.5 Ensure Stable Node Communication
Enable heartbeat checks and adjust inter‑node timeout settings.
5. Real‑World Case Study
Symptom: During peak hours, many queries failed with lock wait timeout, and multiple backend threads hung in waiting state.
Diagnosis: Global lock query revealed two UPDATE transactions in a circular wait — PID 1234 held an X lock on user and waited for order, while PID 5678 held an X lock on order and waited for user. Two prepared transactions also existed.
Resolution:
- Manually terminated the stuck prepared transactions.
- Standardised transaction order across all applications (user → order).
- Adjusted
lock_wait_timeoutto 10s and optimised thread pool settings. - Repaired network latency and enabled node heartbeat detection.
Result: Lock blocks cleared, API response times returned to normal, and no new deadlocks occurred.
6. Common Pitfalls and Best Practices
- Never blindly kill sessions; first identify the lock holder.
- Regularly clean up long‑running
preparedtransactions. - Global deadlocks often need manual intervention.
- Execute DDL only during maintenance windows.
- Periodically verify index effectiveness and query plans.
- Tune lock and thread pool parameters based on actual cluster load.
Effective lock troubleshooting in a gbase database demands a systematic approach — pinpoint the root cause, apply targeted fixes, and establish continuous monitoring to prevent recurrence. Use this guide to keep your GBASE clusters stable and responsive.
For further actions, you may consider blocking this person and/or reporting abuse
