There’s been more and more buzz lately around the subject of microservices architectures for business applications, and whether to use choreography or orchestration to manage them. In this article, I’ll cover a bit about microservices architecture as it relates to enterprise application development, and a lot more about what orchestration means and, more specifically, why BPMN orchestration engines are a powerful complement to microservices architectures. At the end of the article I will also introduce some other use cases in which BPMN engines can be used not only to orchestrate microservices but also to coordinate work among services of any kind, humans and robots.
The advantage that microservices offer, of course, is great flexibility and deployability using cloud-based components. Microservices are autonomous, deployable entities that can interact with each other directly as needed. Let’s look at, as an example, the case of customer interactions with MobileBank, a hypothetical mobile banking app offering a promotional gift of credit in exchange for using the app to purchase a product. MobileBank is also on the alert for opportunities to make special offers to encourage customer loyalty.
From the customer’s point of view, there are a number of key steps, from purchasing the product with the mobile app, to receiving credit for a future purchase, to using that credit on another purchase.
From the application point of view, each of these transactions can be handled by a microservice. When it works, it works great! But what happens when there is an error? Where there are a lot of calls and no overall process, it’s difficult to see where the error could be.
What if it is in, say, the purchase interaction? When the online retail system is temporarily unavailable? When there is an interruption in internet service? When you do a network call and it fails, you don’t know why. Did the call fail to reach the service provider? Did the service provider have a failure? Did your system fail to receive the call back? If it is the third case, the customer’s account has been charged, so if they try again to purchase the item, your service may charge the account again. Now a clean up may be needed, to cancel the first transaction and avoid an unhappy customer.
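One common way to make a retry safe in that third case is an idempotency key: the client sends the same key with every retry of a purchase, and the payment service returns the stored result instead of charging again. Here is a minimal sketch of the idea. The `PaymentService` class and its `charge` method are hypothetical names I made up for illustration, not part of any real payment API.

```python
import uuid


class PaymentService:
    """Hypothetical in-memory payment service that deduplicates by key."""

    def __init__(self):
        self.total_charged = 0     # total amount charged so far
        self._results = {}         # idempotency key -> stored result

    def charge(self, idempotency_key, amount):
        # A repeated key means the client is retrying: return the stored
        # result instead of charging the account a second time.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.total_charged += amount
        result = {"status": "charged", "amount": amount}
        self._results[idempotency_key] = result
        return result


service = PaymentService()
key = str(uuid.uuid4())            # generated once per purchase attempt

first = service.charge(key, 50)    # imagine this response is lost in transit
retry = service.charge(key, 50)    # the client retries with the same key
```

With this in place the retry is harmless, but note that the safeguard lives inside one service; nothing yet gives an overall view of where the purchase stands.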
There is a distinct order to the completion of these individual services, and that’s where process is meaningful. The gift credit can’t be issued until the payment is processed, and the credit can’t be used on a new purchase until it is received in the customer’s account. There are ways to manage this process with microservices choreography, that is, coordination directly among the microservices.
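To make choreography concrete, here is a small sketch, assuming a toy in-process event bus in place of a real message broker. The service and event names are hypothetical. Each service reacts to an event and publishes the next one, so the order described above exists only in the subscription wiring.

```python
from collections import defaultdict

subscribers = defaultdict(list)   # event name -> list of handlers
log = []                          # records what happened, in order


def subscribe(event, handler):
    subscribers[event].append(handler)


def publish(event, payload):
    for handler in subscribers[event]:
        handler(payload)


def payment_service(order):
    log.append("payment processed")
    publish("payment_processed", order)


def credit_service(order):
    log.append("gift credit issued")
    publish("credit_issued", order)


def account_service(order):
    log.append("credit available in account")


# The process order exists only in this wiring; no single place describes it.
subscribe("purchase_requested", payment_service)
subscribe("payment_processed", credit_service)
subscribe("credit_issued", account_service)

publish("purchase_requested", {"customer": "alice", "amount": 50})
```

Running it fills `log` with the three steps in order, yet nowhere in the code is the process itself written down as a whole.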
Increasing complexity shows the limits of microservices
Microservices pioneers like Netflix are learning the limitations of choreography. On the company's technology blog, it noted that "with peer to peer task choreography, we found it was harder to scale with growing business needs and complexities."
Some of the issues they were encountering with the choreography approach:
- There was “hidden” process, that is, the actual process flows were “embedded” within the code of the microservices;
- Tight coupling and assumptions around input/output, SLAs, etc. within individual services made it harder to adapt to changing needs;
- There was no overall tracking, so it was difficult to systematically answer “How much are we done with process X?”
Netflix decided to build an open source platform, called Conductor, to provide the overall logic to orchestrate their microservices-based process flows.
Traditionally, some of these processes had been orchestrated in an ad-hoc manner using a combination of pub/sub, making direct REST calls, and using a database to manage the state. However, as the number of microservices grow and the complexity of the processes increases, getting visibility into these distributed workflows becomes difficult without a central orchestrator.
Netflix Conductor: A microservices orchestrator, Netflix Technology blog
Mark Richards also anticipated this and noted it in his book Microservices Antipatterns and Pitfalls (2016):
The other issue with too much service choreography is that it can impact the overall reliability and robustness of your system. The more remote calls you make for a single business request, the better the chances are that one of those remote calls will fail or time out.
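To put rough numbers on this, assume (purely for illustration) that every remote call succeeds independently with the same probability p. A business request that needs n calls then succeeds with probability p^n. The figures below are my own made-up inputs, not measurements.

```python
def chain_success_probability(p, n):
    """Probability that n independent calls all succeed, each with probability p."""
    return p ** n


# Illustrative figures only: 99.9% success per call.
for n in (1, 10, 50):
    print(n, round(chain_success_probability(0.999, n), 4))
```

Under these assumptions, ten chained calls bring the request down to roughly 99% and fifty calls to about 95%, even though each individual call looks very reliable.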
The different flavors of microservices orchestration
Orchestration used to have a bad reputation in the microservices ecosystem, as it was considered an antipattern. Each microservice should be independent of other microservices, so there was no room for a centralized orchestration approach. But there's no rule that says orchestration needs to be centralized. Orchestration can be used instead to extract the business logic from each individual microservice, or even to provide visibility into a sequence of microservices calls. Or a mix of both choreography and orchestration may be most appropriate.
In addition to Conductor, I've seen more and more new orchestration frameworks such as Cadence or Apache Airflow to help developers code the orchestration flow of microservices.
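Stripped of everything a real framework provides (persistence, retries, scaling), coded orchestration boils down to something like the sketch below. The `Orchestrator` class and the step functions are hypothetical and do not reflect the API of Conductor, Cadence or Airflow. The point is that the sequence is declared in one place and progress can be queried.

```python
class Orchestrator:
    """Hypothetical minimal orchestrator: runs steps in order and tracks progress."""

    def __init__(self, steps):
        self.steps = steps          # list of (name, callable) pairs
        self.completed = []         # names of steps finished so far
        self.failed = None          # name of the step that failed, if any

    def run(self, context):
        for name, step in self.steps:
            try:
                step(context)
            except Exception:
                self.failed = name  # remember where the flow stopped
                break
            self.completed.append(name)
        return context

    def progress(self):
        return f"{len(self.completed)}/{len(self.steps)} steps done"


def process_payment(ctx):
    ctx["paid"] = True


def issue_gift_credit(ctx):
    raise RuntimeError("credit service unavailable")  # simulated outage


def apply_credit(ctx):
    ctx["credit_applied"] = True


flow = Orchestrator([
    ("process_payment", process_payment),
    ("issue_gift_credit", issue_gift_credit),
    ("apply_credit", apply_credit),
])
flow.run({"customer": "alice"})
print(flow.progress(), "- failed at:", flow.failed)
```

Unlike the choreographed version, this one can answer "how much are we done?" and say exactly which step failed.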
BPMN orchestration engines offer an alternative approach to those microservices orchestration frameworks, as the process flow can be graphically defined using the Business Process Model and Notation (BPMN) standard. That way, it becomes easier to understand the whole logic and to collaborate during the orchestration flow definition.
Most Workflow and Business Process Management engines today support the BPMN standard, that is, they use graphical notation to define the orchestration logic, to describe the different processes so it becomes easier to understand the whole picture. And because all parts of the process are clearly shown, it’s possible to define all exception handling, set timers and so on visibly in the logic. You can also graphically define logic that allows you to define how to handle errors and manage compensations, for example “Do A -> Do B; if B fails, undo A.”
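For readers who think in code, the “Do A -> Do B; if B fails, undo A” rule can be sketched as follows. This is only an illustration of the compensation idea with hypothetical function names; a BPMN engine expresses the same thing graphically with compensation and error events rather than with this code.

```python
def run_with_compensation(steps):
    """Run (action, undo) pairs in order; on failure, undo the completed
    steps in reverse order and return False."""
    done = []
    for action, undo in steps:
        try:
            action()
        except Exception:
            for compensate in reversed(done):
                compensate()
            return False
        done.append(undo)
    return True


events = []


def do_a():
    events.append("A done")


def undo_a():
    events.append("A undone")


def do_b():
    raise RuntimeError("B failed")   # simulated failure


def undo_b():
    events.append("B undone")


ok = run_with_compensation([(do_a, undo_a), (do_b, undo_b)])
```

Here B fails, so only A is compensated: `events` ends up as "A done" followed by "A undone", and `ok` is `False`.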
To see the differences between those two orchestration approaches, let's take a look at one of the microservices orchestration examples available in Netflix's Conductor documentation and compare it with the equivalent process using a BPMN engine.
The example is a simple workflow that adds Netflix Idents to videos. Netflix Idents are four-second videos that appear at the beginning and end of shows. Here is what that workflow definition looks like in Conductor.
    {
      "name": "add_netflix_identation",
      "description": "Adds Netflix Identation to video files.",
      "version": 2,
      "schemaVersion": 2,
      "tasks": [
        {
          "name": "verify_if_idents_are_added",
          "taskReferenceName": "ident_verification",
          "inputParameters": {
            "contentId": "${workflow.input.contentId}"
          },
          "type": "SIMPLE"
        },
        {
          "name": "decide_task",
          "taskReferenceName": "is_idents_added",
          "inputParameters": {
            "case_value_param": "${ident_verification.output.is_idents_added}"
          },
          "type": "DECISION",
          "caseValueParam": "case_value_param",
          "decisionCases": {
            "false": [
              {
                "name": "add_idents",
                "taskReferenceName": "add_idents_by_type",
                "inputParameters": {
                  "identType": "${workflow.input.identType}",
                  "contentId": "${workflow.input.contentId}"
                },
                "type": "SIMPLE"
              }
            ]
          }
        }
      ]
    }
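To show how such a definition drives execution, here is a toy interpreter for a trimmed copy of the workflow above. It is emphatically not how Conductor itself runs workflows: the `resolve` and `run_tasks` helpers, the stub workers and the sample input values are all my own assumptions, and only the two task types used in this example are handled.

```python
import re

# Trimmed copy of the workflow above (description and versions omitted).
WORKFLOW = {
    "tasks": [
        {
            "name": "verify_if_idents_are_added",
            "taskReferenceName": "ident_verification",
            "inputParameters": {"contentId": "${workflow.input.contentId}"},
            "type": "SIMPLE",
        },
        {
            "name": "decide_task",
            "taskReferenceName": "is_idents_added",
            "inputParameters": {
                "case_value_param": "${ident_verification.output.is_idents_added}"
            },
            "type": "DECISION",
            "caseValueParam": "case_value_param",
            "decisionCases": {
                "false": [
                    {
                        "name": "add_idents",
                        "taskReferenceName": "add_idents_by_type",
                        "inputParameters": {
                            "identType": "${workflow.input.identType}",
                            "contentId": "${workflow.input.contentId}",
                        },
                        "type": "SIMPLE",
                    }
                ]
            },
        },
    ]
}


def resolve(expr, scope):
    """Replace a whole '${a.b.c}' expression with the value found in scope."""
    match = re.fullmatch(r"\$\{(.+)\}", expr)
    if not match:
        return expr
    value = scope
    for part in match.group(1).split("."):
        value = value[part]
    return value


def run_tasks(tasks, scope, workers, trace):
    for task in tasks:
        inputs = {k: resolve(v, scope) for k, v in task["inputParameters"].items()}
        if task["type"] == "SIMPLE":
            # Call the stub worker and store its output under the reference name.
            output = workers[task["name"]](inputs)
            scope[task["taskReferenceName"]] = {"output": output}
            trace.append(task["name"])
        elif task["type"] == "DECISION":
            # Toy rule: match the lowercase string form of the value
            # against the case keys ("false" here).
            case = str(inputs[task["caseValueParam"]]).lower()
            run_tasks(task["decisionCases"].get(case, []), scope, workers, trace)


# Stub workers standing in for the real microservices.
workers = {
    "verify_if_idents_are_added": lambda inputs: {"is_idents_added": False},
    "add_idents": lambda inputs: {"added": inputs["identType"]},
}

trace = []
scope = {"workflow": {"input": {"contentId": "c-42", "identType": "intro"}}}
run_tasks(WORKFLOW["tasks"], scope, workers, trace)
```

Because the stub verification reports that no idents are present, the decision takes the "false" branch and `trace` records the verification task followed by `add_idents`.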
Now, additional behaviour can be added to each task of the process:
1. For the task called verify_if_idents_are_added:
    {
      "name": "verify_if_idents_are_added",
      "retryCount": 3,
      "retryLogic": "FIXED",
      "retryDelaySeconds": 10,
      "timeoutSeconds": 300,
      "timeoutPolicy": "TIME_OUT_WF",
      "responseTimeoutSeconds": 180
    }
2. And for the task called add_idents:
    {
      "name": "add_idents",
      "retryCount": 3,
      "retryLogic": "FIXED",
      "retryDelaySeconds": 10,
      "timeoutSeconds": 300,
      "timeoutPolicy": "TIME_OUT_WF",
      "responseTimeoutSeconds": 180
    }
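As an aside, the retry part of these settings can be pictured with the following sketch. I am assuming that retryCount counts retries after the first attempt and that FIXED means the same delay before every retry. The function is hypothetical, not Conductor code, and the timeout settings are not modeled.

```python
import time


def run_with_fixed_retry(task, retry_count=3, retry_delay_seconds=10, sleep=time.sleep):
    """Call task(); on failure retry up to retry_count times, waiting a
    fixed delay between attempts. Re-raises the last error if all fail."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception:
            if attempts > retry_count:
                raise
            sleep(retry_delay_seconds)


calls = []


def flaky_add_idents():
    # Fails on the first two calls, then succeeds.
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("service temporarily unavailable")
    return "idents added"


# Pass a no-op sleep so the example runs instantly.
result, attempts = run_with_fixed_retry(flaky_add_idents, sleep=lambda s: None)
```

The stub service fails twice and succeeds on the third attempt, which stays within the three retries allowed.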
Here is the same logic, defined in a BPMN model:
Using BPMN means you don’t have to hard-code the logic, and the sequential processes, where they are necessary, are no longer hidden!
For the diagram above, I used the BPMN editor of the Bonita open source platform, but this model is valid for any engine that complies with the BPMN standard. Some of the visible advantages of this approach:
- Managing errors. Errors can be handled automatically exactly where they can happen, using exception paths, timers, and other BPMN elements. If human intervention might be needed, that can be included in the logic as well.
- A BPMN engine stores all workflow executions in a database so they can be analyzed later on.
- Data on cases and processes can be compiled for reporting.
- BPMN allows definition of rules for routing data and data handling - as in the logic of how tasks are assigned to the right services, right people, and so on.
Other orchestration use cases include coordination of systems, humans and robots
BPMN engines offer even more than just microservices orchestration. They allow orchestration - and automation - of any service: operations managed through APIs, integrations with legacy and proprietary specialty systems, integrations with services such as SAP and other ERP platform operations, IoT integrations, and the like. Our MobileBank can use its parent company’s legacy CRS, SAP, and approval processes right alongside its purchasing and crediting microservices.
Then there are those tasks “assigned to people” I mentioned above. BPMN modeling is expressly designed for the orchestration of human and automated IS tasks, creating a mix of what people need to act on in a process, for example, for approvals (let’s run a special waiver for a certain target sector - that needs a supervisor), escalations for issues (needs a division manager?), delegations for absences (when it’s August in France), and the like.
So now we’re at the point where, with a BPMN engine, we can orchestrate microservices, monoliths, and people. Our little mobile bank app is humming along pretty well now.
But wait, there’s more. Enter the robot workforce.
Repetitive, mechanical tasks performed by humans are ideal candidates for delegation to software robots. BPMN engines can integrate smoothly with Robotic Process Automation technologies (RPA). Now we can have MobileBank robots collect, retrieve, and collate customer information that might be useful to make an instant special offer.
Orchestration can handle a much broader mix than just microservices. In future articles, I will illustrate those other orchestration use cases, involving services, humans and robots, with examples.
Note: a shorter version of this article was published in Forbes on October 28th 2019