
Parallel programming: Using the Fork-Join model in Salesforce

September 29, 2023


The Model

Fork-Join is a parallel computing paradigm in which code execution starts sequentially and then splits off (forks) into parallel threads before eventually joining back together to resume sequential execution.

It’s a classic design pattern typically used for divide-and-conquer problems. The “forking” allows multiple operations to execute in parallel, much faster than if each operation were serialized. The “joining” allows those parallel operations to report their status back to a single sequence for consolidated, composite reporting and related actions.

A visual representation of the Fork-Join model

Salesforce, a popular enterprise development platform, offers no direct construct for creating a “fork” or a “join.” Here are some tactics that still allow you to implement the Fork-Join model on the platform.

Parallelism in Salesforce: How to “fork”

Within the Salesforce platform, one way to create a parallel thread of execution (forking) is via Asynchronous Apex, which comes in multiple flavors: Queueables, future methods, Scheduled Apex, and Batch Apex. (Batch Apex does not implement parallelism within itself: while a Batch Apex job is scheduled asynchronously, each invocation on a chunk of records, a batch, runs sequentially.)

As a developer, you are not explicitly creating something parallel. Instead, you are requesting an asynchronous context (e.g., via the System.enqueueJob invocation for a Queueable) that allows for parallelism on the server. The platform schedules the request when resources are available.
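For illustration, here is a minimal sketch of forking via Queueables. The ParallelWorkJob class and the accountId variable are hypothetical names invented for this example, not part of any Salesforce API.

// Each enqueued job runs in its own asynchronous transaction, so the
// platform is free to execute the jobs in parallel.
public class ParallelWorkJob implements Queueable {
    private Id recordId;

    public ParallelWorkJob(Id recordId) {
        this.recordId = recordId;
    }

    public void execute(QueueableContext context) {
        // Process one unit of work here.
    }
}

// In the calling code (for example, a service method or anonymous Apex),
// "fork" by requesting an asynchronous context per unit of work.
// accountId is an illustrative variable holding the parent record's Id.
for (Contact c : [SELECT Id FROM Contact WHERE AccountId = :accountId]) {
    System.enqueueJob(new ParallelWorkJob(c.Id));
}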

Another way to create a parallel thread is via Asynchronous Messaging. The sender of the message continues processing without waiting for the receiver to respond, allowing the sender and receiver to then run in parallel. Examples of contexts for using this method include:

  • Platform events and Change Data Capture events that use the Pub-Sub model on the Salesforce Enterprise Messaging platform 
  • Push Topics that use the Salesforce Streaming API for sending notifications about changes to data
  • An API call from Salesforce to an external service that acknowledges receipt immediately and processes the request afterward (not a full response, just an acknowledgement that the message has been received)
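As an illustration of the Pub-Sub flavor, publishing a platform event from Apex hands the message to the event bus and returns immediately; subscribers process it in their own asynchronous contexts. The Loan_Booking_Requested__e event, its Loan_Id__c field, and the loanRecordId variable below are hypothetical names used only for this sketch.

// Publish-and-continue: the publisher does not wait for subscribers.
Loan_Booking_Requested__e evt = new Loan_Booking_Requested__e(
    Loan_Id__c = loanRecordId  // hypothetical custom field and variable
);
Database.SaveResult sr = EventBus.publish(evt);
if (!sr.isSuccess()) {
    for (Database.Error err : sr.getErrors()) {
        System.debug('Publish failed: ' + err.getMessage());
    }
}
// Execution continues here without waiting for any subscriber to run.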

There are also other forms of parallelism initiated from outside the Salesforce platform (e.g., Bulk API in parallel mode, publishing event messages via API, etc.).

Consolidation in Salesforce: How to “join”

There isn’t a direct “join” operation in Salesforce like there is in Java or C# programming languages. But we could simulate a “join” by deducing the last operation (among parallel threads) to complete and designating it as the one that resumes after “join.” Reliably deducing the last operation would require the use of Salesforce data as a shared resource.

A typical problem in a parallel context is shared resource consistency. A shared resource accessed by a parallel thread should remain consistent for read/write operations. The thread could handle this by acquiring a lock on the shared resource. 

Consistency of Salesforce data is guaranteed via data referential integrity and automatic transaction controls established on query and Data Manipulation Language (DML) operations invoked in an Apex transaction. A Salesforce Object Query Language (SOQL) query, for instance, provides a non-blocking read-lock on the data being fetched; writes on that data can still happen in parallel. A DML operation (insert, update, delete, upsert) provides a blocking write-lock for other threads wanting to write, but does not block parallel reads. Neither is sufficient by itself to deduce a “join”: if we follow a read of the shared data with a separate write to update it, we will run into race conditions.

What we would need is a combined read/write lock that provides an exclusive lock on the data around that operation. This is where the SOQL FOR UPDATE statement comes in. In Apex, this statement locks the queried records while they are being updated, which avoids race conditions and thread-safety issues.
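As a minimal sketch of the locking behavior (using the standard Account object purely for illustration; accountId is an assumed variable):

// FOR UPDATE acquires an exclusive lock on the returned records for the
// remainder of the transaction; other transactions trying to update them
// (or to lock them with their own FOR UPDATE) must wait or time out.
Account acct = [SELECT Id, Name FROM Account WHERE Id = :accountId FOR UPDATE];
// The read and the subsequent write now happen under one lock, so no
// other thread can slip in between them.
acct.Name = acct.Name + ' (reviewed)';
update acct;  // The lock is released when the transaction commits.

Let’s look at a real-world example where, using a combination of Asynchronous Messaging and FOR UPDATE, we built the Fork-Join paradigm in Salesforce.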

Business Scenario

While working on an online loan application portal, we encountered a use case that would benefit from using the Fork-Join model. In this portal, borrowers submitted a single application containing multiple loans. If an application was approved, each loan needed to be submitted for booking in a secondary system asynchronously (the “fork”). Each booking would run in parallel. The portal would then receive updates with the booking status for each loan separately. As it received these booking statuses, the portal needed to aggregate them and update the application booking status (the “join”).

The Fork

We used Asynchronous Messaging to initiate the fork: a REST API callout to the booking system, which acknowledged receipt immediately and continued processing the request afterward.
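One common way to implement this pattern is a Queueable that is allowed to make callouts, posting each loan to the booking service. The sketch below is an assumption for illustration, not the actual integration; the BookLoanCallout class, the Booking_System named credential, and the payload shape are all hypothetical.

// Illustrative fork: one asynchronous callout per loan. The external
// booking service acknowledges receipt right away and books the loan
// asynchronously; its status arrives later via a separate inbound message.
public class BookLoanCallout implements Queueable, Database.AllowsCallouts {
    private Id loanId;

    public BookLoanCallout(Id loanId) {
        this.loanId = loanId;
    }

    public void execute(QueueableContext context) {
        HttpRequest req = new HttpRequest();
        req.setEndpoint('callout:Booking_System/loans');  // hypothetical Named Credential
        req.setMethod('POST');
        req.setHeader('Content-Type', 'application/json');
        req.setBody(JSON.serialize(new Map<String, Object>{ 'loanId' => loanId }));

        HttpResponse res = new Http().send(req);
        if (res.getStatusCode() >= 300) {
            // The booking request was not acknowledged; log or retry as needed.
            System.debug('Booking request failed for ' + loanId + ': ' + res.getStatus());
        }
    }
}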


The Join

The portal tracked the count of loans to be booked on the application record and decreased it whenever a loan booking response was received. When the counter hit zero, we knew that we were receiving the last loan’s booking status and we were ready to “join” the results into an aggregated application booking status. If all loans were successfully booked, we could then mark the application as “Booked”—otherwise, the portal would notify the application owner for manual review.  

The counter here is a shared resource. A SOQL query for the counter (a non-blocking read) followed by an update to decrement it (a blocking write) could cause a race condition between the read and the write: if loan booking responses arrived in quick succession, two threads could read the same counter value, one decrement would be lost, and the counter would never hit zero. A blocking write-lock around both the read and the write is necessary to ensure data consistency. To solve this, we used the SOQL FOR UPDATE statement to query the counter while locking the record.
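A minimal sketch of that join logic follows. The Application__c object and the Pending_Loan_Count__c, Has_Booking_Failure__c, and Booking_Status__c fields are illustrative names, and the method is a simplification rather than the real handler.

public class ApplicationBookingJoin {
    // Called once per inbound loan-booking response (for example, from a
    // platform event trigger or an inbound REST handler).
    public static void recordBookingResult(Id applicationId, Boolean bookingSucceeded) {
        // Exclusive lock: concurrent responses serialize on this record, so
        // the counter cannot be read and decremented out of order.
        Application__c app = [
            SELECT Id, Pending_Loan_Count__c, Has_Booking_Failure__c, Booking_Status__c
            FROM Application__c
            WHERE Id = :applicationId
            FOR UPDATE
        ];

        app.Pending_Loan_Count__c = app.Pending_Loan_Count__c - 1;
        if (!bookingSucceeded) {
            app.Has_Booking_Failure__c = true;
        }

        if (app.Pending_Loan_Count__c == 0) {
            // Last response received: this thread performs the "join".
            app.Booking_Status__c = app.Has_Booking_Failure__c ? 'Needs Review' : 'Booked';
        }
        update app;  // The lock is released when the transaction commits.
    }
}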


A key constraint of the FOR UPDATE statement is that a thread waiting for a lock on the same record (either via a DML statement or another FOR UPDATE query) will time out after 10 seconds. Since the lock boundary is the entire Apex transaction, we ensured that our transactions did not include any time-consuming processing beyond handling the status.
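A record that cannot be locked within that window typically surfaces as a QueryException, so callers should be prepared to retry or flag the response rather than lose it. The sketch below reuses the illustrative names from the earlier join example; the handling strategy is an assumption, not necessarily what the original implementation did.

try {
    Application__c app = [
        SELECT Id, Pending_Loan_Count__c
        FROM Application__c
        WHERE Id = :applicationId
        FOR UPDATE
    ];
    // ... decrement the counter and update the status as shown above ...
} catch (QueryException e) {
    // The lock could not be acquired within the timeout. Re-enqueue or log
    // the response so it can be reprocessed instead of being lost.
    System.debug('Lock wait timed out for ' + applicationId + ': ' + e.getMessage());
}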

Conclusion

The Fork-Join model is a powerful concept that allows enterprise systems to perform complex, time-consuming actions faster using parallelism while collating results in a stable manner.

With a little bit of creativity, this can be accomplished in the Salesforce world even though there is no direct fork or join construct available.

Authored by: Bartee Natarajan & Dakota Lim