To speed up the ETL data pipeline, you should try to run jobs in parallel. Obviously, not all jobs can run at the same time in most cases, since there are dependency constraints between the jobs and limits of the servers capacity (number of processors and/or IO bandwidth). So assuming the server allows you to run n jobs in parallel, often there is the situation that the dependencies give you the option to run any of a set of m different jobs with m > n.

Continue reading

Author's picture

Christian Stade-Schuldt

Data Engineer @ HERE IoT innovation lab| Full-time geek | Cyclist | Learning from data

Data Engineer @ HERE

Berlin, Germany