Implement a distributed transaction in a microservices software system using the Saga pattern

1. Context and problem

With relational databases, we commonly encounter the concept of a transaction. A transaction is a single unit of work that may contain multiple operations, which are guaranteed either to all occur or to not occur at all.

Transactions should have the ACID properties: atomic, consistent, isolated, and durable. Within a single service, ACID transactions are usually easy to achieve. Across services, however, data consistency requires a cross-service transaction management strategy.

In a distributed system/multi-services architecture:

  • Atomicity means a transaction is a single unit whose operations must all occur or none occur.
  • Consistency means data should only move from one valid state to another expected valid state in all participants.
  • Isolation ensures that concurrent transactions produce the same outcome as if they had run sequentially.
  • Durability makes the committed state of a transaction persistent, so that it survives system failures and power outages.

A microservices architecture usually adopts the database-per-microservice model. It provides many benefits in data isolation, independence, and scalability. However, ensuring data consistency across different databases is a challenge.

Let’s take an example of the order processing logic:

  • Normal flow:

1. Client: Submit checkout order

2. Order API: Save order to database

3. Account API: Create transaction and deduct account balance

4. Payment API: Save payment to database and request external payment

5. Order API: Mark order as completed

  • Failures:

1. If saving the order fails

  • Order API: Order will be marked as failed

2. If creating the transaction or deducting the account balance fails

  • Order API: Order will be marked as failed

3. If saving the payment or requesting the external payment fails

  • Account API: Reverse transaction and increase account balance
  • Order API: Order will be marked as failed

4. If marking the order as completed fails

  • Payment API: Mark payment as failed and request refund payment
  • Account API: Reverse transaction and increase account balance
  • Order API: Order will be marked as failed

The above workflow is quite complex. It requires distributed transaction coordination logic that executes the local transactions and runs compensating actions in case of failures. More steps may also be added over time.
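To make the failure cases above concrete, here is a minimal C# sketch that maps a failed step to the compensating actions listed for it. The class name `OrderFailureHandling`, the method `ActionsFor`, and the idea of returning action descriptions as strings are assumptions made for illustration; a real system would call the respective APIs instead.

```csharp
using System;
using System.Collections.Generic;

public static class OrderFailureHandling
{
    // Compensating action owned by each service step of the normal flow.
    // An empty string means the step has no separate compensation.
    private static readonly string[] Compensations =
    {
        "", // Order API: save order (covered by the final "mark order as failed")
        "Account API: reverse transaction and increase account balance",
        "Payment API: mark payment as failed and request refund payment",
        ""  // Order API: mark order as completed
    };

    // failedStep is 1-based and follows the numbering of the failure list.
    public static List<string> ActionsFor(int failedStep)
    {
        if (failedStep < 1 || failedStep > Compensations.Length)
            throw new ArgumentOutOfRangeException(nameof(failedStep));

        var actions = new List<string>();
        // Undo the steps that completed before the failure, most recent first.
        for (int step = failedStep - 1; step >= 1; step--)
        {
            string compensation = Compensations[step - 1];
            if (compensation.Length > 0) actions.Add(compensation);
        }
        // Every failure case ends with the order being marked as failed.
        actions.Add("Order API: mark order as failed");
        return actions;
    }
}
```

Calling `OrderFailureHandling.ActionsFor(4)` yields the payment refund, the account reversal, and the failed-order update, in that order. Each new step added to the flow extends this table, which is where the coordination effort comes from.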

The two-phase commit (2PC) protocol, which is one technique for implementing a distributed transaction, requires all participants in a transaction to commit or roll back before the transaction is considered complete. However, some participant technologies, e.g., NoSQL databases, message brokers, and file storage, don't support this model.

Distributed transactions also have limitations regarding synchronicity and availability, since all participants must be available. Therefore, we have another candidate: the Saga pattern.

2. Saga - The solution

The Saga pattern is a transaction management strategy built on a sequence of local transactions. A local transaction is the atomic unit of work performed by a saga participant. Each local transaction makes its changes, e.g., updates a database or performs some external action, and publishes a message or event to trigger the next transaction in the saga. If a local transaction fails, the saga executes a series of compensating transactions that reverse the changes made by the preceding local transactions.
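The idea of the paragraph above can be sketched in a few lines of C#. This is only an in-memory illustration under simplifying assumptions: the names `SagaStep` and `SagaRunner` are hypothetical, steps are invoked directly instead of through messages, and compensations are assumed not to fail.

```csharp
using System;
using System.Collections.Generic;

// One saga step: a local transaction plus the compensating transaction that undoes it.
public sealed record SagaStep(string Name, Action Execute, Action Compensate);

public static class SagaRunner
{
    // Runs the steps in order. If a step throws, the steps that already completed
    // are compensated in reverse order and the method returns false.
    public static bool Run(IReadOnlyList<SagaStep> steps, Action<string> log)
    {
        var completed = new Stack<SagaStep>();
        foreach (var step in steps)
        {
            try
            {
                step.Execute();
                completed.Push(step);
                log($"done: {step.Name}");
            }
            catch (Exception ex)
            {
                log($"failed: {step.Name} ({ex.Message})");
                while (completed.Count > 0)
                {
                    var done = completed.Pop();
                    done.Compensate();
                    log($"compensated: {done.Name}");
                }
                return false;
            }
        }
        return true;
    }
}
```

If the third of three steps throws, the runner logs the failure, compensates step 2, then step 1, and returns `false`. The failed step itself is not compensated, because its local transaction never committed.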

Saga overview flow

 

In the Saga pattern:

  • Compensable transactions: can potentially be undone by executing another transaction with the opposite effect.
  • A pivot transaction: is the go/no-go point in a saga. If the pivot transaction commits, the saga runs the remaining transactions until it completes. In other words, once the pivot transaction commits, the saga is bound to succeed. A pivot transaction can be neither compensable nor retryable, or it can be the last compensable transaction or the first retryable transaction in the saga.
  • Retryable transactions: follow the pivot transaction and are guaranteed to succeed. For example, in an ordering application, once everything needed for the order has been prepared (payment, products, delivery staff and vehicle, etc.), the only remaining step is delivering the products to the customer, which is retryable.
  • Transaction key: each transaction should have a key that is unique across services in the whole system. The key is used to retrieve the transaction’s status and information so that coordination logic can be applied to the correct targets. Entity identifiers can be used as the key, or they can be linked to a generated key.
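One way to read the list above is as an ordering rule: compensable transactions come first, then the pivot, then the retryable transactions. The following sketch checks that rule for a saga definition. The types `StepKind`, `SagaStepInfo`, and `SagaShape` are hypothetical, and treating the pivot as an explicit, single step is a simplifying assumption (the text notes it may also coincide with a compensable or retryable step).

```csharp
using System;
using System.Collections.Generic;

// Declared in the order the kinds are expected to appear within a saga.
public enum StepKind { Compensable, Pivot, Retryable }

// TransactionKey identifies the saga instance across all services.
public sealed record SagaStepInfo(Guid TransactionKey, string Name, StepKind Kind);

public static class SagaShape
{
    // Valid order: zero or more compensable steps, at most one pivot,
    // then zero or more retryable steps.
    public static bool IsValidOrder(IEnumerable<SagaStepInfo> steps)
    {
        var previous = StepKind.Compensable;
        bool pivotSeen = false;
        foreach (var step in steps)
        {
            if (step.Kind < previous) return false; // e.g. compensable after the pivot
            if (step.Kind == StepKind.Pivot)
            {
                if (pivotSeen) return false;        // a second pivot
                pivotSeen = true;
            }
            previous = step.Kind;
        }
        return true;
    }
}
```

A definition such as compensable, compensable, pivot, retryable passes the check, while a compensable step placed after a retryable one is rejected, since nothing after the pivot is supposed to need undoing.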

There are two common saga implementation approaches: choreography and orchestration. Each has its own set of challenges and techniques to manage the workflow.

3. Set up demo projects

Introduction to the demo project:

  • Full source code: trung-tran-sts/simple-saga (github.com)
  • Technical: C#, .NET 6
  • Libraries: RxNet
  • Development tools: VS Code, Visual Studio or any IDE with .NET integration
  • Branches:
    • choreography-based: Choreography-based saga implementation 
    • orchestration-based: Orchestration-based saga implementation

Overview context:

1. There are three services with their own local transactions. 

2. The client submits an action that triggers the execution of transaction 1.

3. Subsequent messages/events will be fired on status changes to trigger the other transactions.

4. There is a pivot transaction followed by a retryable transaction.

5. If there are failures, messages/events will be fired to trigger corresponding compensating transactions.

Overview system flow

 

Following is the project structure:

 

  • FirstService/SecondService/ThirdService: placeholders for actual services, which can be distributed across servers.

  • PubSubService: used for pub/sub. The project currently uses RxNet to implement a simple pub/sub pattern within a single process. It can be replaced by a real messaging service such as RabbitMQ, Azure Service Bus, Apache Kafka, etc.

  • Events: fired throughout the transaction execution.
  • Program.cs: entry point of the console program.
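For readers who want to see what an in-process pub/sub component like PubSubService boils down to, here is a minimal sketch. Note that it deliberately does not use RxNet: it is a plain dictionary of handlers, and the class name `InProcessBus` with its `Subscribe` and `Publish` methods is an assumption for illustration, not the demo's actual API.

```csharp
using System;
using System.Collections.Generic;

public sealed class InProcessBus
{
    // Handlers grouped by the exact event type they subscribed to.
    private readonly Dictionary<Type, List<Action<object>>> _handlers = new();

    public void Subscribe<TEvent>(Action<TEvent> handler) where TEvent : notnull
    {
        if (!_handlers.TryGetValue(typeof(TEvent), out var list))
        {
            list = new List<Action<object>>();
            _handlers[typeof(TEvent)] = list;
        }
        list.Add(e => handler((TEvent)e));
    }

    public void Publish<TEvent>(TEvent evt) where TEvent : notnull
    {
        if (!_handlers.TryGetValue(typeof(TEvent), out var list)) return;
        // Iterate over a copy so handlers may subscribe while an event is dispatched.
        foreach (var handler in list.ToArray()) handler(evt);
    }
}
```

Delivery here is synchronous and happens on the publisher's thread, which keeps a console demo easy to follow. A real broker adds asynchronous delivery, persistence, and retries.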

Testing instructions:

 

After starting the program:

  • Part 1 asks us to enter some simulation configuration:
    • Should transaction 2 fail = Y → local transaction 2 throws an exception, which causes Transaction2FailureEvent to be published
    • Should transaction 3 fail = Y → local transaction 3 throws an exception, which causes Transaction3FailureEvent to be published
    • Complete action try count = x → the complete action throws an exception and retries until try x. If x >= 5, it eventually fails and publishes CompletionFailureEvent
  • Part 2 runs us through the transaction execution. If no error simulation is configured, it runs until completion. Otherwise, we see how the saga reacts to exceptions in specific participants/steps.
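The "complete action try count" behavior can be pictured with the following sketch. It is not taken from the demo's source: `CompleteActionSimulation`, its parameters, and the explicit `maxTries` limit are assumptions, and the retry limit used by the demo itself is only inferred from the "x >= 5" rule above.

```csharp
using System;

public static class CompleteActionSimulation
{
    // succeedOnTry: the configured try count x; every earlier attempt throws.
    // maxTries: hypothetical retry limit (an assumption of this sketch).
    // Returns the attempt that succeeded, or 0 after publishing the failure event.
    public static int Run(int succeedOnTry, int maxTries, Action<string> publish)
    {
        for (int attempt = 1; attempt <= maxTries; attempt++)
        {
            try
            {
                if (attempt < succeedOnTry)
                    throw new InvalidOperationException($"Simulated failure on try {attempt}");
                return attempt; // the complete action succeeded
            }
            catch (InvalidOperationException)
            {
                // Treated as a transient error: fall through and retry.
            }
        }
        publish("CompletionFailureEvent");
        return 0;
    }
}
```

With a limit of 4 tries, `Run(3, 4, ...)` succeeds on the third attempt, while `Run(5, 4, ...)` exhausts its retries and publishes the failure event, which is what starts the compensating transactions.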

4. Choreography

Choreography is a way to coordinate sagas where events are exchanged using a message broker or any mechanism without a centralized controller. Each local transaction publishes events that trigger local transactions in other participants. Participants must know each other’s events to be able to handle them.

Choreography Overview

Pros

  • Good for workflows that are simple and involve only a few participants

  • Good for workflows that don’t need coordination logic

  • No additional implementation and maintenance of an orchestrator service is needed

  • No single point of failure at the orchestrator, since responsibilities are shared and distributed across participants

Cons

  • The workflow can become cumbersome when adding new participants and steps, since it’s hard to keep track of the dependencies between services

  • Cyclic dependency between participants can occur because they have to consume each other's commands

  • All services must be running to simulate a transaction, so it’s difficult to do integration testing

Let’s apply to our demo project: 

Choreography-based Saga overview

 

Overview:

1. Client submits action

2. FirstService performs local transaction 1

3. FirstService publishes Transaction1CompletedEvent that triggers SecondService

4. SecondService performs local transaction 2

5. SecondService publishes Transaction2CompletedEvent that triggers ThirdService

6. ThirdService performs local transaction 3 with two steps

7. ThirdService publishes Transaction3CompletedEvent that triggers FirstService to do the complete action transaction

8. FirstService performs complete action transaction

In case of failures:

1. If local transaction 1 fails: the transaction fails

2. If local transaction 2 fails:

  1. SecondService publishes Transaction2FailureEvent that triggers FirstService
  2. FirstService performs compensating transaction 1

3. If local transaction 3 fails:

  1. ThirdService publishes Transaction3FailureEvent that triggers FirstService and SecondService
  2. FirstService performs compensating transaction 1
  3. SecondService performs compensating transaction 2

4. If the complete action transaction (retryable) fails after many tries:

  1. FirstService publishes CompletionFailureEvent that triggers FirstService, SecondService, and ThirdService
  2. FirstService performs compensating transaction 1
  3. SecondService performs compensating transaction 2
  4. ThirdService performs compensating transaction 3 and an external compensating operation if possible
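The first steps of this flow, including the transaction 2 failure case, can be sketched as follows. This is a simplified illustration and not the code from the demo repository: the `Bus` class is a hypothetical in-process stand-in for the broker, the event records carry only a key, and the services are reduced to event handlers that write to a log.

```csharp
using System;
using System.Collections.Generic;

public record Transaction1CompletedEvent(Guid Key);
public record Transaction2CompletedEvent(Guid Key);
public record Transaction2FailureEvent(Guid Key);

// Minimal in-process stand-in for a message broker (an assumption of this sketch).
public sealed class Bus
{
    private readonly List<Action<object>> _subscribers = new();

    public void Subscribe<T>(Action<T> handler) =>
        _subscribers.Add(e => { if (e is T typed) handler(typed); });

    public void Publish(object evt)
    {
        foreach (var subscriber in _subscribers.ToArray()) subscriber(evt);
    }
}

public static class ChoreographyDemo
{
    // Wires the participants to each other's events and returns the execution trace.
    public static List<string> Run(bool failTransaction2)
    {
        var log = new List<string>();
        var bus = new Bus();

        // SecondService reacts to FirstService's event; there is no central controller.
        bus.Subscribe<Transaction1CompletedEvent>(e =>
        {
            if (failTransaction2)
            {
                log.Add("SecondService: transaction 2 failed");
                bus.Publish(new Transaction2FailureEvent(e.Key));
                return;
            }
            log.Add("SecondService: transaction 2 done");
            bus.Publish(new Transaction2CompletedEvent(e.Key));
        });

        // FirstService reacts to SecondService's failure event by compensating.
        bus.Subscribe<Transaction2FailureEvent>(e =>
            log.Add("FirstService: compensated transaction 1"));

        // The client submits the action: FirstService runs transaction 1 and publishes.
        log.Add("FirstService: transaction 1 done");
        bus.Publish(new Transaction1CompletedEvent(Guid.NewGuid()));
        return log;
    }
}
```

Notice that each handler has to know the event types of the other participants. That knowledge is exactly the coupling the cons above warn about as the number of participants grows.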

Examples:

FirstService local transaction 1 (FirstService.cs)

 

SecondService listener for Transaction1CompletedEvent (Program.cs)

 

SecondService listeners for failure events from transaction 3 and complete action (Program.cs)

5. Orchestration

Orchestration coordinates sagas using a centralized controller. The orchestrator is responsible for telling the participants which local transactions to perform, based on events. The orchestrator should also handle failures by specifying the compensating actions to take. It can also track the saga's status for monitoring and management purposes.

Orchestrator Overview

Pros

  • Suitable for complex workflows having many participants or new participants added over time.

  • Suitable when the flow of tasks and activities needs to be controlled in one place.

  • No cyclic dependencies since the orchestrator will know all participants but the participants themselves don’t need to know each other.

  • Separation of concerns: each participant can focus on its own business logic without worrying about the other participants in the saga.

Cons

  • Additional effort to implement and maintain the coordination logic.

  • The orchestrator can be a single point of failure, since it’s the only way for the transaction to be executed.

 

Let’s apply to our demo project: 

 

Orchestrator-based Saga overview

Overview:

1. Client submits action

2. FirstService performs local transaction 1

3. FirstService publishes Transaction1CompletedEvent that triggers Orchestrator

4. Orchestrator sends command to SecondService

5. SecondService performs local transaction 2

6. SecondService publishes Transaction2CompletedEvent that triggers Orchestrator

7. Orchestrator sends command to ThirdService

8. ThirdService performs local transaction 3 with two steps

9. ThirdService publishes Transaction3CompletedEvent that triggers Orchestrator

10. Orchestrator sends command to FirstService to do the complete action transaction

11. FirstService performs complete action transaction

In case of failures:

1. If local transaction 1 fails: the transaction fails

2. If local transaction 2 fails:

  1. SecondService publishes Transaction2FailureEvent that triggers Orchestrator
  2. Orchestrator sends command to FirstService
  3. FirstService performs compensating transaction 1

3. If local transaction 3 fails:

  1. ThirdService publishes Transaction3FailureEvent that triggers Orchestrator
  2. Orchestrator sends command to FirstService and SecondService
  3. FirstService performs compensating transaction 1
  4. SecondService performs compensating transaction 2

4. If the complete action transaction (retryable) fails after many tries:

  1. FirstService publishes CompletionFailureEvent that triggers Orchestrator
  2. Orchestrator sends command to FirstService, SecondService and ThirdService
  3. FirstService performs compensating transaction 1
  4. SecondService performs compensating transaction 2
  5. ThirdService performs compensating transaction 3 and an external compensating operation if possible
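The coordination rules above fit naturally into a single decision function inside the orchestrator. The sketch below shows that idea only; it is not the demo's Orchestrator.cs. The event names and StartTransaction2Command come from this article, while the `Command` record and the other command names (StartTransaction3Command, StartCompleteActionCommand, CompensateTransactionNCommand) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// A command addressed to one participant.
public sealed record Command(string Target, string Name);

public static class Orchestrator
{
    // Hypothetical naming scheme for compensation commands.
    private static Command Compensate(string service, int transaction) =>
        new Command(service, $"CompensateTransaction{transaction}Command");

    // Maps an incoming event to the commands the orchestrator sends next.
    public static IReadOnlyList<Command> Decide(string eventName) => eventName switch
    {
        "Transaction1CompletedEvent" => new[] { new Command("SecondService", "StartTransaction2Command") },
        "Transaction2CompletedEvent" => new[] { new Command("ThirdService", "StartTransaction3Command") },
        "Transaction3CompletedEvent" => new[] { new Command("FirstService", "StartCompleteActionCommand") },
        "Transaction2FailureEvent" => new[] { Compensate("FirstService", 1) },
        "Transaction3FailureEvent" => new[]
        {
            Compensate("FirstService", 1), Compensate("SecondService", 2)
        },
        "CompletionFailureEvent" => new[]
        {
            Compensate("FirstService", 1), Compensate("SecondService", 2), Compensate("ThirdService", 3)
        },
        _ => Array.Empty<Command>() // unknown events trigger nothing
    };
}
```

Because the whole workflow lives in this one place, the participants only need to understand their own commands. Adding a step means adding a case here rather than touching every service.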

Examples:

FirstService local transaction 1 (FirstService.cs)

 

SecondService listener for StartTransaction2Command (Program.cs)

 

Orchestrator coordination logic (Orchestrator.cs)

 

Sample use case in an Order processing system

The orchestration-based saga in order processing system

6. Issues and considerations

Some considerations:

  • Can be challenging in the beginning.

  • Hard to debug.

  • Complexity grows as more participants are added.

  • Consistency is only eventual: each participant commits changes to its local database, so they cannot be rolled back automatically.

  • Transient failure handling and idempotence must be implemented to ensure correctness and reduce side effects. Idempotence means an operation can be repeated multiple times without changing the result beyond the first execution.

  • Observability must be in place to monitor and track the saga execution.
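As an illustration of the idempotence consideration above, a participant can remember which transaction keys it has already processed and ignore redelivered messages. This is a minimal in-memory sketch: the `AccountService` class is hypothetical, and a real service would store the processed keys in its database, in the same local transaction as the balance change.

```csharp
using System;
using System.Collections.Generic;

public sealed class AccountService
{
    // Keys of the transactions that were already applied.
    private readonly HashSet<Guid> _processed = new();

    public decimal Balance { get; private set; }

    public AccountService(decimal openingBalance) => Balance = openingBalance;

    // Deducts at most once per transaction key.
    // Returns true if the deduction was applied, false if it was a duplicate.
    public bool Deduct(Guid transactionKey, decimal amount)
    {
        if (!_processed.Add(transactionKey)) return false; // redelivery: no-op
        Balance -= amount;
        return true;
    }
}
```

If the same deduction message is delivered twice because of a retry, the second call returns `false` and the balance stays unchanged.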

Some issues can happen without proper handling:

  • Lost updates: one saga overwrites data without reading the changes made by another saga.

  • Dirty reads: a participant reads data that another saga has updated but not yet completed. The final values may differ from what was read.

  • Fuzzy/nonrepeatable reads: different steps of a saga read different values, because the data is updated between the reads.

Suggested countermeasures to reduce or prevent anomalies include:

  • Semantic lock: an application-level lock, where a compensable transaction flags a record to indicate that an update is still in progress.

  • Commutative updates: updates that can be executed in any order and still produce the same result.

  • Pessimistic view: reorders the saga so that data updates happen in a retryable transaction. Saga participants then read the original data until the update is applied at the end, which eliminates dirty reads.

  • Reread value: reread the data before updating it, and restart if it has changed.

  • Version file: record operations in the order they arrive and execute them sequentially.

  • By value: low-risk requests may use sagas, while high-risk requests favor distributed transactions, which can be rolled back properly in case of failures.
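Of the countermeasures above, "reread value" is the easiest to show in code. The sketch below uses a version number to detect that the data changed between the read and the write, and restarts the update in that case. The `VersionedBalance` and `RereadValue` types are hypothetical; in a database this is usually done with a version or timestamp column checked in the UPDATE statement.

```csharp
public sealed class VersionedBalance
{
    private readonly object _gate = new();
    private decimal _value;
    private int _version;

    public VersionedBalance(decimal initial) => _value = initial;

    // Returns the current value together with the version it belongs to.
    public (decimal Value, int Version) Read()
    {
        lock (_gate) return (_value, _version);
    }

    // Writes only if nobody changed the value since expectedVersion was read.
    public bool TryWrite(decimal newValue, int expectedVersion)
    {
        lock (_gate)
        {
            if (_version != expectedVersion) return false;
            _value = newValue;
            _version++;
            return true;
        }
    }
}

public static class RereadValue
{
    // Rereads and restarts until the write is based on the latest value.
    // Returns the number of attempts that were needed.
    public static int Deduct(VersionedBalance balance, decimal amount)
    {
        for (int attempt = 1; ; attempt++)
        {
            var (value, version) = balance.Read();
            if (balance.TryWrite(value - amount, version)) return attempt;
        }
    }
}
```

A write that is based on a stale read is rejected instead of silently overwriting the other saga's change, which is how this countermeasure prevents lost updates.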

7. When to use

Use the Saga pattern when you need to:

  • Ensure data consistency in a distributed system without tight coupling.
  • Efficiently execute compensating actions if one of the operations fails in the distributed action sequence.

The Saga pattern is less suitable for:

  • Tightly coupled operations and transactions.
  • Cyclic dependencies.


Content Manager

Thanh (Bruce) Pham CEO of Saigon Technology

A Member of Forbes Technology Council
