2017 © Pedro Peláez
 

library mysql-to-google-bigquery

MySQL to Google BigQuery Sync Tool

image

memeddev/mysql-to-google-bigquery

MySQL to Google BigQuery Sync Tool

  • Friday, July 14, 2017
  • by gabrielrcouto
  • Repository
  • 3 Watchers
  • 31 Stars
  • 359 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 14 Forks
  • 3 Open issues
  • 3 Versions
  • 64 % Grown

The README.md

MySQL to Google BigQuery Logo, (*1)

MySQL to Google BigQuery Sync Tool

Table of Contents

How it works

Steps when no order column has been supplied:, (*2)

  • Count MySQL table rows
  • Count BigQuery table rows
  • MySQL rows > BigQuery rows?
  • Get the rows diff, split in batches of XXXXX rows/batch

Steps when order column has been supplied:, (*3)

  • Get max value for order column from MySQL table
  • Get max value for order column from BigQuery table
  • Max value MySQL > Max value BigQuery?
  • Delete all rows with order column value = max value BigQuery to make sure no duplicate records are being created in BigQuery
  • Get max value for order column from BigQuery table
  • Get the rows diff based on new max value BigQuery, split in batches of XXXXX rows/batch

Final three steps:, (*4)

  • Dump MySQL rows to a JSON
  • Send JSON to BigQuery
  • Repeat until all batches are sent

Tip: Create a cron job for keep syncing the tables using an interval like 15 minutes (respect the Load Jobs quota policy), (*5)

Requirements

The following PHP versions are supported:, (*6)

  • PHP 7
  • HHVM
  • PDO Extension with MySQL driver

Usage

Download the library using composer:, (*7)

$ composer require memeddev/mysql-to-google-bigquery

Now, define some environment variables or create a .env file on the root of the project, replacing the values:, (*8)

BQ_PROJECT_ID=bigquery-project-id
BQ_KEY_FILE=google-service-account-json-key-file.json
BQ_DATASET=bigquery-dataset-name

DB_DATABASE_NAME=mysql-database-name
DB_USERNAME=mysql_username
DB_PASSWORD=mysql_password
DB_HOST=mysql-host
DB_PORT=3306

IGNORE_COLUMNS=password,hidden_column,another_column

PS: To create the Google Service Account JSON Key File, access https://console.cloud.google.com/apis/credentials/serviceaccountkey, (*9)

Run:, (*10)

vendor/bin/console sync table-name

If you want to auto create the table on BigQuery:, (*11)

vendor/bin/console sync table-name --create-table

If you want to delete (and create) the table on BigQuery for a full dump:, (*12)

vendor/bin/console sync table-name --delete-table

Credits

:heart: Memed SA (memed.com.br), (*13)

License

MIT license, see LICENSE, (*14)

The Versions

14/07 2017
25/10 2016