What is Batch Processing?
Batch processing is a technique for processing records in batches. Here, a batch is a container that holds a group of records.
Mule Enterprise Edition has the capability to process messages in batches. This is useful for processing a large number of records. It has many components that are very specific to batch processing and can be used to implement business logic.
Within an application, you can initiate a batch job scope, which is a block of code that splits messages into individual records, performs actions upon each record, then reports on the results and pushes the processed output to other systems or queues.
Why do we need Batch Processing?
When we need to process a large amount of data, batch processing helps because the records are divided into batches and the batches are processed asynchronously in the process phase of the batch job. Because several batches can be in progress at the same time, the whole set usually takes less time than processing the records one by one.
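The idea of splitting records into batches and processing the batches concurrently can be sketched in plain Python. This is only a conceptual model, not Mule configuration: the function names, the batch size of 3, and the upper-casing "business logic" are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(records, batch_size):
    """Group records into consecutive batches of at most batch_size items."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def process_batch(batch):
    """Placeholder business logic (assumption): upper-case every record."""
    return [record.upper() for record in batch]


def run(records, batch_size=3, threads=4):
    batches = split_into_batches(records, batch_size)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Batches run concurrently; map() still yields results in input order.
        results = list(pool.map(process_batch, batches))
    # Flatten the processed batches back into a single list of records.
    return [record for batch in results for record in batch]


processed = run(list("abcdefg"))
```

Each worker thread picks up a whole batch rather than a single record, which mirrors the way the batches, not the individual records, are the unit of concurrent work.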
Different Phases of Batch Job
There are 4 phases of a batch job in Mule 3.
- Input Phase
- Load and Dispatch
- Process Phase
- On Complete Phase
Input Phase: This is an optional part of the batch job that can be used to retrieve the source data using any inbound connector. It also allows chaining of multiple message processors to transform the source data before it is ready for processing.
Load and Dispatch: This is an implicit phase and Mule runtime takes care of it. In this phase, the payload generated in the Input phase or provided to the batch from the caller flow is turned into a collection of records. It also creates a job instance for processing records. The collection is then sent through the collection-splitter to queue individual records for processing.
Process: This is the required phase where the actual processing of every record occurs asynchronously.
- Each record from the input queue is processed through the first step and sent back to the queue after processing of the first step completes.
- Records that are processed in the first step are then passed through the second step and sent back to the queue after processing of the second step completes.
- Mule continues this until all records are passed through each step.
On Complete: In this final but optional phase, a summary of batch execution is made available to possibly generate reports or any other statistics. The payload in this phase is available as an instance of a BatchJobResult object. It holds information such as the number of records loaded, processed, failed, succeeded. It can also provide details of exceptions that occurred in steps, if any.
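The flow from loading through the steps to the final summary can be modelled with a short Python sketch. It is a sequential simplification that ignores threading, and it assumes that a record which fails in one step is skipped by later steps while the job keeps running. `JobSummary` and its field names are hypothetical stand-ins and are not the BatchJobResult API.

```python
from dataclasses import dataclass, field


@dataclass
class JobSummary:
    """Hypothetical summary object; not the Mule BatchJobResult class."""
    loaded: int = 0
    successful: int = 0
    failed: int = 0
    errors: dict = field(default_factory=dict)  # step name -> error messages


def run_job(payload, steps):
    # Load and Dispatch: turn the payload into a queue of individual records.
    queue = list(payload)
    summary = JobSummary(loaded=len(queue))

    # Process: each step drains the queue, then the records are queued again
    # for the next step.
    for step_name, step in steps:
        next_queue = []
        for record in queue:
            try:
                next_queue.append(step(record))
            except Exception as exc:
                # Assumption: a failed record is dropped from later steps.
                summary.failed += 1
                summary.errors.setdefault(step_name, []).append(str(exc))
        queue = next_queue

    # On Complete: only the summary is reported.
    summary.successful = len(queue)
    return queue, summary


steps = [("parse", int), ("double", lambda n: n * 2)]
output, summary = run_job(["1", "2", "x", "4"], steps)
```

Here the record "x" fails in the first step, so the summary reports four loaded records, three successful, and one failed, with the error recorded under the "parse" step.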
Configuration of Batch
Threads in batch
- By default, 16 threads are available to a batch job.
- If you want to change the number of threads, you can configure a threading profile.
- Each batch of records is processed by one thread.
Batch step
- A batch step can be used only in the Process phase of a batch job.
- A single batch job can contain multiple batch steps.
- At the step level, you can also specify which records each step should accept. This is configured with the accept-expression in the step definition. Records that satisfy a step's accept-expression are processed by that step; the others skip it and move on to the next eligible step.
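The filtering behaviour of an accept-expression can be illustrated with a small Python model in which each step carries a predicate. This is not Mule syntax: the step names, the `country` field, and the tax value are invented for the example.

```python
def run_steps(records, steps):
    """Run every record through the steps whose accept predicate matches it.

    steps is a list of (name, accept, action) tuples. Returns a mapping from
    record id to the names of the steps that actually processed the record.
    """
    handled = {}
    for record in records:
        trail = []
        for name, accept, action in steps:
            if accept(record):      # plays the role of the accept-expression
                action(record)
                trail.append(name)
            # Otherwise the record skips this step and moves to the next one.
        handled[record["id"]] = trail
    return handled


records = [{"id": 1, "country": "US"}, {"id": 2, "country": "IN"}]
steps = [
    ("tax_us", lambda r: r["country"] == "US", lambda r: r.update(tax=0.07)),
    ("audit", lambda r: True, lambda r: r.update(audited=True)),
]
handled = run_steps(records, steps)
```

Record 1 passes through both steps, while record 2 skips "tax_us" and is handled only by "audit".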
Batch commit
A batch commit is a scope that accumulates records into chunks so they can be sent to an external source or service as bulk upserts. You can add a batch commit inside a batch step in the Process phase, for example by wrapping the Salesforce connector in it, and set the commit size according to your requirements.
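The accumulate-then-flush behaviour can be sketched in Python as follows. The `BatchCommit` class and the `bulk_upsert` callback are hypothetical; in a real job the callback would be whatever bulk operation the wrapped connector performs.

```python
class BatchCommit:
    """Buffers records and hands them to a bulk operation in chunks."""

    def __init__(self, size, bulk_upsert):
        self.size = size                # commit size: records per bulk call
        self.bulk_upsert = bulk_upsert  # hypothetical bulk operation
        self.buffer = []

    def add(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.size:
            self.flush()

    def flush(self):
        # Send whatever is buffered, including a final partial chunk.
        if self.buffer:
            self.bulk_upsert(list(self.buffer))
            self.buffer.clear()


calls = []                              # records each simulated bulk call
commit = BatchCommit(size=3, bulk_upsert=calls.append)
for record in range(7):
    commit.add(record)
commit.flush()                          # flush the leftover records
```

With a commit size of 3 and seven records, the bulk operation is called three times instead of seven, which is the point of committing in chunks.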
Batch execute
Batch execute is used to trigger a batch job. If your batch job does not use a poll scope or any other message source, you can use batch execute to trigger it.
Exception handling in Batch
A batch job gives us no option to declare exception handling directly. So we need to rely on global exception handling, or call flows through a Flow Reference from the batch steps and handle the exceptions inside those flows.
Scope of variables in Batch
Flow and Session variables: In a batch job, flow variables and session variables have the same scope.
If we define a variable in the input phase, it is accessible in the batch steps. But if we change its value in the first batch step, the next batch step does not see the updated value; it still gets the value assigned in the input phase. The On Complete phase also sees that original value.
Record variables: Mule provides flow variables, session variables, and outbound properties to store information at different scope levels. When records are processed through the steps, each record becomes the message payload for the step.
If you want to store information at the individual record level, the existing scopes do not work. That is where a record variable is needed. It is scoped to the Process phase only, and every record gets its own copy of its record variables, which are serialized and carried with the record through all the steps.
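A small Python model may make the per-record scope easier to picture. The `Record` class, the `record_vars` dictionary, and the `is_adult` variable are illustrative names only; they do not correspond to Mule classes.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    payload: dict
    # Each record gets its own dictionary, so values are never shared.
    record_vars: dict = field(default_factory=dict)


def step_one(record):
    # Store a value at the record level for later steps to use.
    record.record_vars["is_adult"] = record.payload["age"] >= 18


def step_two(record):
    # A later step reads the value that travelled with this record.
    record.payload["status"] = "adult" if record.record_vars["is_adult"] else "minor"


records = [Record({"age": 30}), Record({"age": 12})]
for step in (step_one, step_two):
    for record in records:
        step(record)

statuses = [record.payload["status"] for record in records]
```

The value set in the first step is still available in the second step, and each record sees only its own copy.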
Thank you for reading!