Spring Batch Architecture

민철킹 2022. 2. 20. 15:11

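JobLauncher: the component that launches a Job
Job: a batch job
JobRepository: stores Job and Step metadata together with their execution history
Step: one stage of a batch job
ItemReader, ItemProcessor, ItemWriter: the components that read, process, and write data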

Application Layer: holds the business logic, that is, the user's own code
Core Layer: the classes essential for launching and controlling a batch job
- Job, Step, JobLauncher
Infrastructure Layer: interacts with external systems
- ItemReader, ItemWriter, RetryTemplate

Job

A domain object that encapsulates the entire batch process
Defines the order of its Steps
Accepts JobParameters
A JobExecution represents one actual run of the Job

@Bean
public Job footballJob() {
    return this.jobBuilderFactory.get("footballJob")
                     .start(playerLoad())
                     .next(gameLoad())
                     .next(playerSummarization())
                     .build();
}

The Steps run in the order playerLoad() -> gameLoad() -> playerSummarization().
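
To make the ordering concrete, here is a minimal plain-Java sketch of a job that runs its steps in the order they were registered. It does not use Spring Batch itself: SimpleStep and SequentialJob are hypothetical names invented for this illustration, and a real Job also deals with failures, restarts, and conditional flows, all of which are left out here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Step: a named unit of work.
interface SimpleStep {
    String name();
    void execute();
}

// Hypothetical stand-in for a Job: runs its steps in registration order.
class SequentialJob {
    private final List<SimpleStep> steps = new ArrayList<>();

    // Mirrors the shape of start(...): registers the first step.
    SequentialJob start(SimpleStep step) {
        steps.add(step);
        return this;
    }

    // Mirrors the shape of next(...): registers a following step.
    SequentialJob next(SimpleStep step) {
        steps.add(step);
        return this;
    }

    // Executes every step in order and returns the names in execution order.
    List<String> run() {
        List<String> executed = new ArrayList<>();
        for (SimpleStep step : steps) {
            step.execute();
            executed.add(step.name());
        }
        return executed;
    }

    // Convenience factory for a step that only logs its own name.
    static SimpleStep step(final String name) {
        return new SimpleStep() {
            public String name() { return name; }
            public void execute() { System.out.println("running " + name); }
        };
    }
}
```

Chaining start(step("playerLoad")), next(step("gameLoad")), and next(step("playerSummarization")) and then calling run() executes the three steps in exactly that order.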

Step

The unit in which work is processed
There are two kinds: chunk-based Steps and Tasklet-based Steps

Chunk-oriented Processing

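First, what is a chunk?

When Spring Batch works through data in blocks, a chunk is the group of rows (items) processed between one commit and the next, and the chunk size is how many rows that is.

In other words, chunk-oriented processing means processing the data one chunk at a time.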

Listed in order:

read() reads an item
The Processor transforms the item that was read
Once the configured number of items has accumulated into a chunk, the chunk is written

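Data is processed chunk by chunk, with one transaction per chunk
Within a transaction, items are read until chunkSize items have accumulated, and then they are written together
- chunkSize: the number of items handled in one transaction
- commitInterval: the same setting under the name used by the XML configuration (commit-interval) and by the pseudocode in this post
- For paging readers, the number of items the reader fetches at once (its page size) is a separate setting; in most cases it is best to make it equal to the chunk size.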

How would this look in code?

// Case 1: reader and writer only
List<Object> items = new ArrayList<>();
for (int i = 0; i < commitInterval; i++) {
    Object item = itemReader.read();
    if (item != null) {
        items.add(item);
    }
}
itemWriter.write(items);

// Case 2: with an ItemProcessor between reading and writing
List<Object> items = new ArrayList<>();
for (int i = 0; i < commitInterval; i++) {
    Object item = itemReader.read();
    if (item != null) {
        items.add(item);
    }
}

List<Object> processedItems = new ArrayList<>();
for (Object item : items) {
    Object processedItem = itemProcessor.process(item);
    // A null result means the item was filtered out and is not written
    if (processedItem != null) {
        processedItems.add(processedItem);
    }
}

itemWriter.write(processedItems);
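
The pseudocode above covers a single chunk. As a rough sketch of how the same loop repeats until the input runs out, the plain-Java class below keeps reading chunks and records what each transaction would have written. ChunkLoopSketch is a hypothetical helper, not a Spring Batch class; the reader is modelled as a Supplier that returns null once the data is exhausted (the same convention as ItemReader.read()), and the write plus commit is modelled by simply appending the chunk to a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of Spring Batch.
class ChunkLoopSketch {

    // Reads, processes, and "writes" chunk by chunk until the reader returns null.
    // Each inner list in the result stands for one write followed by one commit.
    static <I, O> List<List<O>> run(Supplier<I> reader, Function<I, O> processor, int chunkSize) {
        List<List<O>> committedChunks = new ArrayList<>();
        boolean exhausted = false;
        while (!exhausted) {
            // Read up to chunkSize items, stopping early when the reader is exhausted.
            List<I> items = new ArrayList<>();
            for (int i = 0; i < chunkSize; i++) {
                I item = reader.get();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                items.add(item);
            }
            if (items.isEmpty()) {
                break; // nothing was read, so there is nothing left to commit
            }
            // Process each item; a null result means the item is filtered out.
            List<O> processed = new ArrayList<>();
            for (I item : items) {
                O result = processor.apply(item);
                if (result != null) {
                    processed.add(result);
                }
            }
            // Stands in for itemWriter.write(processed) plus the transaction commit.
            committedChunks.add(processed);
        }
        return committedChunks;
    }
}
```

With seven input items and a chunkSize of 3, this sketch records three chunks of sizes 3, 3, and 1, which corresponds to three commits.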

Example

/**
 * Note the JobRepository is typically autowired in and not needed to be explicitly
 * configured
 */
@Bean
public Job sampleJob(JobRepository jobRepository, Step sampleStep) {
    return this.jobBuilderFactory.get("sampleJob")
                .repository(jobRepository)
                .start(sampleStep)
                .build();
}

/**
 * Note the TransactionManager is typically autowired in and not needed to be explicitly
 * configured
 */
@Bean
public Step sampleStep(PlatformTransactionManager transactionManager) {
    return this.stepBuilderFactory.get("sampleStep")
                .transactionManager(transactionManager)
                .<String, String>chunk(10)
                .reader(itemReader())
                .writer(itemWriter())
                .build();
}

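The step is given a transaction manager, and its chunkSize is set.

The ItemReader, ItemProcessor, and ItemWriter implementations also have to be configured.

The ItemProcessor can be omitted
<String, String> means that the step reads items of type String and writes items of type String.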
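
For comparison, here is a sketch of a step that does configure an ItemProcessor, so the input and output types differ. It assumes the same Spring Batch 4 style as the example above (a StepBuilderFactory made available through @EnableBatchProcessing). The class name LengthStepConfig and the beans wordReader, lengthProcessor, and lengthWriter are made up for this illustration, and the transaction manager is left to the default instead of being set explicitly.

```java
import java.util.Arrays;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Hypothetical configuration class; assumes @EnableBatchProcessing is active elsewhere.
@Configuration
public class LengthStepConfig {

    private final StepBuilderFactory stepBuilderFactory;

    public LengthStepConfig(StepBuilderFactory stepBuilderFactory) {
        this.stepBuilderFactory = stepBuilderFactory;
    }

    @Bean
    public Step lengthStep() {
        return stepBuilderFactory.get("lengthStep")
                .<String, Integer>chunk(10)     // reads String items, writes Integer items
                .reader(wordReader())
                .processor(lengthProcessor())   // converts each String into an Integer
                .writer(lengthWriter())
                .build();
    }

    @Bean
    public ItemReader<String> wordReader() {
        // In-memory reader over a fixed list, used here only to keep the sketch small.
        return new ListItemReader<>(Arrays.asList("job", "step", "chunk"));
    }

    @Bean
    public ItemProcessor<String, Integer> lengthProcessor() {
        return String::length;
    }

    @Bean
    public ItemWriter<Integer> lengthWriter() {
        // Prints every item of the chunk it receives.
        return lengths -> lengths.forEach(System.out::println);
    }
}
```

The first type parameter of chunk has to match what the reader produces and the second what the writer accepts; without a processor the two are the same type, as in <String, String>.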

TaskletStep

Processes the data in a single transaction, so it is used for simple work

@Bean
public Step step1() {
    return this.stepBuilderFactory.get("step1")
                .tasklet(myTasklet())
                .build();
}

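You supply a Tasklet implementation that contains all of the simple read, process, and write logic
The Tasklet returns a RepeatStatus (FINISHED or CONTINUABLE) to say whether it should be called again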
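
The role of RepeatStatus can be sketched in plain Java as a loop that keeps calling the tasklet until it reports that it is finished. The types below are hypothetical stand-ins that only mirror the shape of Spring Batch's Tasklet and RepeatStatus; the real Tasklet.execute takes a StepContribution and a ChunkContext and may throw an Exception, which is left out here to keep the sketch self-contained.

```java
// Hypothetical stand-in for RepeatStatus.
enum SketchRepeatStatus { CONTINUABLE, FINISHED }

// Hypothetical stand-in for Tasklet, without the real method's parameters.
interface SketchTasklet {
    SketchRepeatStatus execute();
}

class TaskletLoopSketch {
    // Calls the tasklet until it returns FINISHED and reports how many calls were made.
    static int runUntilFinished(SketchTasklet tasklet) {
        int calls = 0;
        SketchRepeatStatus status;
        do {
            // In Spring Batch each of these calls would be wrapped in a transaction.
            status = tasklet.execute();
            calls++;
        } while (status == SketchRepeatStatus.CONTINUABLE);
        return calls;
    }
}

// Example tasklet that asks to be repeated until its work counter reaches zero.
class CountdownTasklet implements SketchTasklet {
    private int remaining;

    CountdownTasklet(int remaining) {
        this.remaining = remaining;
    }

    public SketchRepeatStatus execute() {
        remaining--; // one unit of work per call
        return remaining > 0 ? SketchRepeatStatus.CONTINUABLE : SketchRepeatStatus.FINISHED;
    }
}
```

A tasklet that does all of its work in one call simply returns FINISHED the first time, which is the usual case for simple jobs.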

Spring Batch Schema

Stores the metadata needed to run and manage batches
StepExecution, JobInstance, JobExecution, JobParameters, and the ExecutionContext of each Step and Job

When Spring Batch runs, it creates and uses instances of these classes.

Because Spring Batch relies on these metadata tables, they have to be set up beforehand, and because they belong to the framework, you only query them and do not modify them directly.
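
To illustrate how some of these metadata entities relate, here is a small plain-Java sketch, assuming the usual Spring Batch rule that a JobInstance is identified by the job name plus its identifying JobParameters, while every run attempt adds a new JobExecution. MetadataSketch is a hypothetical in-memory model, not the real JobRepository, and it ignores details such as the rule that an already completed JobInstance is not run again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of JobInstance / JobExecution bookkeeping.
class MetadataSketch {
    // Job name plus sorted parameters -> JobInstance id.
    private final Map<String, Integer> instanceIds = new HashMap<>();
    // JobInstance id -> number of JobExecutions recorded for it.
    private final Map<Integer, Integer> executionCounts = new HashMap<>();

    // Records one run and returns the id of the JobInstance it belongs to.
    int recordRun(String jobName, Map<String, String> parameters) {
        // Sorting the parameters makes the key independent of insertion order.
        String key = jobName + new TreeMap<String, String>(parameters);
        Integer id = instanceIds.get(key);
        if (id == null) {
            id = instanceIds.size() + 1; // new name/parameter combination, new instance
            instanceIds.put(key, id);
        }
        executionCounts.merge(id, 1, Integer::sum); // every run is a new execution
        return id;
    }

    int instanceCount() {
        return instanceIds.size();
    }

    int executionCount(int instanceId) {
        return executionCounts.getOrDefault(instanceId, 0);
    }
}
```

Recording footballJob twice with the same date parameter and once with a different date yields two instances in this model, the first of which has two executions.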

Official documentation
