Getting Started with Batch Processing: Key Concepts and Fundamentals
Batch processing is a technique that processes data in large groups rather than one element at a time, so a high volume of data can be handled with minimal human interaction. It provides reusable functions for processing large volumes of records.
Spring Batch is a lightweight framework used to develop batch applications for enterprise systems.
Other Functionalities
- It provides supporting features such as logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management that are necessary when processing large volumes of records (a configuration sketch follows below).
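As a rough illustration of how the skip and restart settings are declared, here is a minimal sketch of a fault-tolerant, chunk-oriented step. It assumes Spring Batch 4.x-style Java configuration (a StepBuilderFactory provided by @EnableBatchProcessing) and hypothetical tutorialReader and tutorialWriter beans defined elsewhere; the names and limits are illustrative only.
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileParseException;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class FaultTolerantStepConfig {

    // Hypothetical step wiring: the reader and writer beans are assumed to exist elsewhere.
    @Bean
    public Step tutorialStep(StepBuilderFactory steps,
                             ItemReader<String> tutorialReader,
                             ItemWriter<String> tutorialWriter) {
        return steps.get("tutorialStep")
                .<String, String>chunk(100)              // one transaction per 100 items
                .reader(tutorialReader)
                .writer(tutorialWriter)
                .startLimit(3)                           // the step may be restarted up to 3 times
                .faultTolerant()
                .skip(FlatFileParseException.class)      // skip malformed records...
                .skipLimit(10)                           // ...but fail the step after 10 skips
                .build();
    }
}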
Architectural Flow
Before jumping into the architecture, let us understand the basic terminology involved in it.
- A Job is a single execution unit that represents the entire batch process. It runs from start to finish without interruption, and each job is composed of one or more steps.
- A Step represents a logical unit of work within a job and can also act as a worker in parallel or partitioned execution. Each step belongs to a single job.
- Each job is launched by a Job Launcher, which acts as the interface for running jobs.
- The Job Repository is responsible for persisting batch meta-data (jobs, steps, and their executions) into Spring Batch's meta-data tables.
- A Job Instance represents a logical run of a job; it is created when we run a job.
- Job Execution and Step Execution represent a single attempt to execute a job or a step, respectively.
- An ItemReader reads the input data from a file or database and supplies it one item at a time.
- An ItemProcessor is used when the data needs any processing; it transforms each item and passes it on to the ItemWriter.
- An ItemWriter writes (stores) the processed items to a database or an output file (see the sketch after this list).
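To make these three item-oriented roles concrete, here is a minimal, self-contained sketch of one class per role. It assumes Spring Batch 4.x, where ItemWriter.write receives a List (newer versions pass a Chunk); the class names and the in-memory data are hypothetical.
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

public class ItemContractsExample {

    // ItemReader: returns one item per call and null once the input is exhausted.
    static class InMemoryReader implements ItemReader<String> {
        private final Iterator<String> source = Arrays.asList("alpha", "beta", "gamma").iterator();

        @Override
        public String read() {
            return source.hasNext() ? source.next() : null;
        }
    }

    // ItemProcessor: transforms each item (returning null would filter it out).
    static class UpperCaseProcessor implements ItemProcessor<String, String> {
        @Override
        public String process(String item) {
            return item.toUpperCase();
        }
    }

    // ItemWriter: receives a chunk of processed items and writes them out in one go.
    static class ConsoleWriter implements ItemWriter<String> {
        @Override
        public void write(List<? extends String> items) {
            items.forEach(System.out::println);
        }
    }
}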
Sample XML configuration of a Job in Spring Batch
<job id = "jobid">
   <step id = "step1" next = "step2"/>
   <step id = "step2" next = "step3"/>
   <step id = "step3"/>
</job>
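The same three-step sequential flow can also be expressed in Java configuration. The sketch below assumes Spring Batch 4.x (a JobBuilderFactory provided by @EnableBatchProcessing) and step1, step2, and step3 beans defined elsewhere; it mirrors the "next" attributes in the XML above.
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SequentialJobConfig {

    @Bean
    public Job jobid(JobBuilderFactory jobs, Step step1, Step step2, Step step3) {
        // Runs step1, then step2, then step3, in order.
        return jobs.get("jobid")
                .start(step1)
                .next(step2)
                .next(step3)
                .build();
    }
}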
Main Processing Flow:
The Job Launcher is typically invoked by a job scheduler when a certain trigger condition is met (for example, a time-based schedule). The Job Launcher then executes the job. Each step fetches the input data using the ItemReader, processes it using the ItemProcessor, and then writes the processed data out using the ItemWriter, as sketched below.
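Below is a hedged sketch of how a scheduler could hand a job to the Job Launcher. The cron expression, bean names, and the run.time parameter are illustrative assumptions (and @EnableScheduling is assumed to be configured elsewhere); the core Spring Batch call is JobLauncher.run(Job, JobParameters).
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class NightlyJobScheduler {

    private final JobLauncher jobLauncher;
    private final Job job;

    public NightlyJobScheduler(JobLauncher jobLauncher, Job job) {
        this.jobLauncher = jobLauncher;
        this.job = job;
    }

    // Hypothetical trigger condition: every night at 1 AM.
    @Scheduled(cron = "0 0 1 * * *")
    public void launch() {
        try {
            // A changing parameter makes every run a new Job Instance.
            JobParameters params = new JobParametersBuilder()
                    .addLong("run.time", System.currentTimeMillis())
                    .toJobParameters();
            JobExecution execution = jobLauncher.run(job, params);
            System.out.println("Job finished with status: " + execution.getStatus());
        } catch (Exception e) {
            // In a real application this would be logged and alerted on.
            e.printStackTrace();
        }
    }
}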
For example, if we write a job with a single step that reads data from a MySQL database, processes it, and writes it to a flat file, then our step uses:
- A reader which reads from the MySQL database.
- A custom processor which processes the data as per our wish.
- A writer which writes to a flat file.
The XML below wires these components together by name; a hedged Java sketch of such beans follows it.
<job id = "helloWorldJob">
   <step id = "step1">
      <tasklet>
         <chunk reader = "mysqlReader" writer = "fileWriter"
            processor = "CustomitemProcessor"></chunk>
      </tasklet>
   </step>
</job>
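The XML above only refers to the reader, writer, and processor by name. The sketch below shows one hypothetical way such beans could be defined in Java using Spring Batch's JdbcCursorItemReader and FlatFileItemWriter builders; the table, column, output path, and processing logic are invented for illustration.
import javax.sql.DataSource;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.builder.FlatFileItemWriterBuilder;
import org.springframework.batch.item.file.transform.PassThroughLineAggregator;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class HelloWorldJobBeans {

    // Reader: streams rows from a hypothetical MySQL table via a JDBC cursor.
    @Bean
    public JdbcCursorItemReader<String> mysqlReader(DataSource dataSource) {
        return new JdbcCursorItemReaderBuilder<String>()
                .name("mysqlReader")
                .dataSource(dataSource)
                .sql("SELECT message FROM greetings")
                .rowMapper((rs, rowNum) -> rs.getString("message"))
                .build();
    }

    // Processor: "processes the data as per our wish" -- here, simply upper-cases it.
    @Bean
    public ItemProcessor<String, String> customItemProcessor() {
        return item -> item.toUpperCase();
    }

    // Writer: writes each processed record as a line in a flat file.
    @Bean
    public FlatFileItemWriter<String> fileWriter() {
        return new FlatFileItemWriterBuilder<String>()
                .name("fileWriter")
                .resource(new FileSystemResource("output/greetings.txt"))
                .lineAggregator(new PassThroughLineAggregator<>())
                .build();
    }
}
With this wiring, each chunk of rows read from MySQL passes through the processor and is written to the flat file within one transaction.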
Flow for persisting Job Information:
1. The Job Launcher registers a Job Instance in the database through the Job Repository.
2. The Job Launcher records in the database, through the Job Repository, that the Job Execution has started.
3. Each job step updates information such as I/O record counts and status in the database through the Job Repository (read back in the sketch below).
4. The Job Launcher records in the database, through the Job Repository, that the Job Execution has completed.
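To see what the Job Repository stores, the following sketch reads the meta-data back through Spring Batch's JobExplorer. The job name and the report format are hypothetical; the getters used (status, start and end times, read/write/skip counts) correspond to the information recorded in the steps above.
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobInstance;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.explore.JobExplorer;

public class JobMetadataReport {

    private final JobExplorer jobExplorer;

    public JobMetadataReport(JobExplorer jobExplorer) {
        this.jobExplorer = jobExplorer;
    }

    // Prints the stored meta-data for the most recent instance of the given job.
    public void printLastRun(String jobName) {
        for (JobInstance instance : jobExplorer.getJobInstances(jobName, 0, 1)) {
            for (JobExecution execution : jobExplorer.getJobExecutions(instance)) {
                System.out.println("Job status: " + execution.getStatus()
                        + ", started: " + execution.getStartTime()
                        + ", ended: " + execution.getEndTime());
                // Per-step counts recorded by the steps while they run.
                for (StepExecution step : execution.getStepExecutions()) {
                    System.out.println(step.getStepName()
                            + " read=" + step.getReadCount()
                            + " written=" + step.getWriteCount()
                            + " skipped=" + step.getSkipCount()
                            + " status=" + step.getStatus());
                }
            }
        }
    }
}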
Real-Time Example
One real-time example of batch processing can be seen in the operations of Amazon, a global e-commerce giant. With a massive scale of customers and a high volume of daily transactions, Amazon employs batch processing to effectively manage its operations. By collecting and processing customer orders in predefined batches, Amazon efficiently handles inventory management, optimizes packaging, and streamlines shipping operations. Additionally, batch processing enables Amazon to analyze sales and customer data, generate reports, and make data-driven decisions. This approach ensures scalability, reliability, and efficient data processing, allowing Amazon to successfully serve millions of customers worldwide.
Conclusion
In conclusion, this blog covered the basics of batch processing and its importance in handling large volumes of data efficiently. We discussed the architectural flow, the main processing flow, and a real-time example of Amazon using batch processing for streamlined operations and data-driven decision-making. By leveraging batch processing, organizations can optimize resources, scale operations, and make informed decisions based on comprehensive data analysis.