2017 © Pedro Peláez
 

library datachunker

SQL Chunker into files

image

aecohen/datachunker

SQL Chunker into files

  • Tuesday, November 14, 2017
  • by aecohen
  • Repository
  • 0 Watchers
  • 0 Stars
  • 10 Installations
  • 0 Dependents
  • 0 Suggesters
  • 0 Forks
  • 0 Open issues
  • 4 Versions
  • 0 % Grown

The README.md

Overview

The idea behind this library is to have a framework that allows you to move big chunks of data from one side to another without exhausting the memory allocation of your script., (*1)

It started due to a basic problem presented in many systems we built every day, and it's to dump data from databases into CSV files. This is kind of a common request and in many cases the tables grow out of proportion and the script that is in charge of downloading this data and dumping it into the CSV file suddenly starts having a hard time handling the load and you need to redo the script., (*2)

Usually the solution will be to make the script handle the data in chunks at a time. Well, this is what this library is for, to try to create a very basic platform to perform this task., (*3)

Of course the time this type of solution takes (multiple queries) usually is higher in comparison to what usually takes to do everything in one batch, but remember we are allowing us to process bigger data sets without the fear of exhausting the resources., (*4)

Version

The current version of this project (v1.0) includes the initial interfaces for the Chunkers (the scripts in charge of retrieving the data in chunks) and the Dumpers (which puts the data into the other place you want it)., (*5)

It also includes an implementation of those interfaces for a PDOChunker and a CSVDumper., (*6)

Installing

Just do the basic:, (*7)

composer require aecohen/datachunker

How to use it

Once installed you just need to do something like this:, (*8)

$dumper = new CSVDumper('file_to_write.csv');
$chunker = new PDOChunker($pdo,$dumper);
$result = $chunker->process($query);

where $pdo is the already instantiated PDO handler and $query is the query you want to execute. However in order for it to work you will need to provide the query with an ORDER BY clause (if not how are we going to retrieve the data in chunks?)., (*9)

You will find that there are some options that you can tweak like the chunk size or the fail safe size (which helps you prevent infinite loops within your query if something goes wrong), (*10)

Tests

There is a good set of unit tests for all the implementations that try to cover every scenario., (*11)

Experiments/Benchmarking

So far I've done some very basic benchmarking using a simple SQL table with 4 columns and 100k rows. Not really the heaviest data, however:, (*12)

  • Following the less efficient but widely used fetchAll and then going row by row putting it on a CSV file, the script used some 81M (81903616) during peak consumption.
  • Using the better option fetch to retrieve each row instead and putting it into the file used roughly 27M (27373568)
  • And using the PDOChunker the system managed to do the same job with only 4M (4194304)

So far these numbers look promising. Let see what the future holds!, (*13)

Future features

  • Try to implement a CSVChunker and a PDODumper to cover the reverse scenario (not sure how generic this can be yet. Can it be possible? maybe with callbacks?)
  • Allow the injection of a event interface to help trigger statuses on the progress.

The Versions

14/11 2017

dev-master

9999999-dev https://bitbucket.org/aecohen/datachunker

SQL Chunker into files

  Sources   Download

The Requires

  • php >=7.0.0

 

The Development Requires

by Arturo Enrique Cohen Marsik

14/11 2017

v1.0.2

1.0.2.0 https://bitbucket.org/aecohen/datachunker

SQL Chunker into files

  Sources   Download

The Requires

  • php >=7.0.0

 

The Development Requires

by Arturo Enrique Cohen Marsik

13/11 2017

v1.0.1

1.0.1.0 https://bitbucket.org/aecohen/datachunker

SQL Chunker into files

  Sources   Download

The Requires

  • php >=7.0.0

 

The Development Requires

by Arturo Enrique Cohen Marsik

31/10 2017

v1.0.0

1.0.0.0 https://bitbucket.org/aecohen/datachunker

SQL Chunker into files

  Sources   Download

The Requires

  • php >=7.0.0

 

The Development Requires

by Arturo Enrique Cohen Marsik