GitHub - bisoftbilgi/mysql-to-parquet

MySQL To Parquet

MySQL to Parquet is a command-line tool that offloads a MySQL table, or part of a table, to a Parquet file.

Fetching all rows at once and converting them to Parquet in memory is inefficient, so the tool reads rows through a streaming cursor instead. Rows are written in chunks to an intermediary CSV file, which is then converted to a Parquet file.

This approach is memory- and CPU-efficient when dealing with large tables.
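A minimal sketch of the streaming-to-CSV step, assuming a DB-API 2.0 cursor. With PyMySQL, the unbuffered `pymysql.cursors.SSCursor` is the kind of cursor that actually streams; a buffered cursor would already hold the whole result set in memory, so `fetchmany` alone would not save anything. The function name `stream_rows_to_csv` and the default chunk size are hypothetical and not taken from main.py; the later CSV-to-Parquet conversion is not shown.

```python
import csv
from typing import Any


def stream_rows_to_csv(cursor: Any, query: str, csv_path: str, chunk_size: int = 10_000) -> int:
    """Hypothetical helper: run `query` and write its result to `csv_path`
    chunk by chunk, holding at most `chunk_size` rows in memory at a time.
    Returns the number of data rows written."""
    cursor.execute(query)
    # Column names are the first field of each DB-API `description` entry.
    header = [col[0] for col in cursor.description]
    written = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        while True:
            rows = cursor.fetchmany(chunk_size)
            if not rows:  # cursor exhausted
                break
            writer.writerows(rows)
            written += len(rows)
    return written
```

Any DB-API driver works with this shape, so it can be tried locally against `sqlite3` as a stand-in for MySQL before pointing it at a real server.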

Installation

git clone https://github.com/bisoftbilgi/mysql-to-parquet.git

cd mysql-to-parquet

pip install -r requirements.txt

Usage

usage: main.py [-h] --table-name TABLE_NAME [--target-directory TARGET_DIRECTORY] [--host HOST] [--user USER] [--password PASSWORD] [--port PORT] [--database DATABASE] [--where WHERE] [--ask-pass] [--analyze-table] [--progress] [--keep-csv]
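For reference, the short and long option names in the usage line above could be expressed with `argparse` roughly as follows. This is a hypothetical reconstruction, not the parser in main.py; in particular the defaults (`.`, `localhost`, `3306`) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the usage line; defaults are assumptions."""
    p = argparse.ArgumentParser(prog="main.py")
    p.add_argument("-t", "--table-name", required=True, help="table to offload")
    p.add_argument("-d", "--target-directory", default=".",
                   help="directory for the CSV and Parquet files")
    p.add_argument("-H", "--host", default="localhost")  # -h is reserved for help
    p.add_argument("-u", "--user")
    p.add_argument("-p", "--password")
    p.add_argument("--port", type=int, default=3306)
    p.add_argument("-D", "--database", help="schema containing the table")
    p.add_argument("-w", "--where", help="filter expression for a partial offload")
    # Boolean switches take no value.
    p.add_argument("--ask-pass", action="store_true")
    p.add_argument("--analyze-table", action="store_true")
    p.add_argument("--progress", action="store_true")
    p.add_argument("--keep-csv", action="store_true")
    return p
```

`argparse` turns `--table-name` into the attribute `table_name`, `--keep-csv` into `keep_csv`, and so on.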

--table-name or -t is the name of the table to offload

--target-directory or -d is the directory to offload data to. Both the CSV and the Parquet files are kept in this directory, so it must have enough free space to hold both.

--host or -H is the hostname or IP address of the MySQL instance

--user or -u is the username to connect to the MySQL instance

--password or -p is the password for the MySQL user

--ask-pass prompts for the password interactively. It takes precedence over --password: if --ask-pass is given, the tool asks for a password and uses it even when --password is also supplied.

--port is the port for the MySQL instance

--database or -D is the schema name in which the table exists

--where or -w is a filter expression for a partial offload, such as createAt < now() - interval 1 year. Do not use table aliases for the target table; refer to columns by name only, alongside any other expressions.

--analyze-table analyzes the table before export to estimate the number of rows in the MySQL table

--progress outputs progress information for the menu.sh progress bar

--keep-csv keeps the intermediary CSV file after it has been converted to Parquet
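The precedence rule for --ask-pass over --password can be sketched like this. The helper name `resolve_password` and the prompt text are hypothetical; `getpass.getpass` is the standard-library way to read a password without echoing it.

```python
import getpass
from typing import Callable, Optional


def resolve_password(password: Optional[str], ask_pass: bool,
                     prompt: Callable[[str], str] = getpass.getpass) -> Optional[str]:
    """Hypothetical helper: --ask-pass wins over --password."""
    if ask_pass:
        # Interactive input replaces any value passed with --password.
        return prompt("MySQL password: ")
    return password
```

The `prompt` parameter exists only so the function can be exercised without a terminal.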
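One plausible way the --where expression ends up in the export query is sketched below, which also shows why table aliases must be avoided: the table is selected without an alias, so the expression can only use bare column names. The helper names and the `SELECT *` shape are assumptions, not code from main.py.

```python
from typing import Optional


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_select(table: str, database: Optional[str] = None,
                 where: Optional[str] = None) -> str:
    """Hypothetical query builder for a full or partial offload."""
    target = quote_ident(table)
    if database:
        target = quote_ident(database) + "." + target
    query = "SELECT * FROM " + target
    if where:
        # Appended verbatim: the table has no alias, so use plain column names.
        query += " WHERE " + where
    return query
```

For example, `build_select("orders", "shop", "createAt < now() - interval 1 year")` yields ``SELECT * FROM `shop`.`orders` WHERE createAt < now() - interval 1 year``. Because the expression is pasted in verbatim, it should only ever come from a trusted operator.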
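A sketch of how --analyze-table and --progress could work together, assuming the row estimate is read from `information_schema.TABLES.TABLE_ROWS` after an optional `ANALYZE TABLE`, using a PyMySQL-style `%s` placeholder. For InnoDB, `TABLE_ROWS` is only an approximation, so the percentage is clamped at 100. The function names are hypothetical, and the exact output format that menu.sh reads is not described in this README.

```python
from typing import Any, Optional

ESTIMATE_SQL = (
    "SELECT TABLE_ROWS FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s"
)


def estimate_rows(cursor: Any, database: str, table: str,
                  analyze: bool = False) -> Optional[int]:
    """Hypothetical helper: approximate row count, optionally refreshing stats first."""
    if analyze:
        quoted = ".".join("`" + n.replace("`", "``") + "`" for n in (database, table))
        cursor.execute("ANALYZE TABLE " + quoted)
        cursor.fetchall()  # ANALYZE TABLE returns a status result set; consume it
    cursor.execute(ESTIMATE_SQL, (database, table))
    row = cursor.fetchone()
    return row[0] if row and row[0] is not None else None


def progress_percent(rows_done: int, rows_estimated: Optional[int]) -> int:
    """Integer percentage for a progress bar; 0 when there is no estimate."""
    if not rows_estimated:
        return 0
    # The estimate can undercount, so never report more than 100.
    return min(100, rows_done * 100 // rows_estimated)
```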
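Because the CSV and Parquet files share the target directory, a free-space check before the export and a cleanup step afterwards fit naturally around --keep-csv. A minimal sketch, assuming the caller supplies an expected size in bytes; `has_room` and `finish` are hypothetical names, while `shutil.disk_usage` and `os.remove` are standard library calls.

```python
import os
import shutil


def has_room(target_directory: str, expected_bytes: int) -> bool:
    """True if the filesystem holding target_directory has expected_bytes free."""
    return shutil.disk_usage(target_directory).free >= expected_bytes


def finish(csv_path: str, keep_csv: bool) -> None:
    """Remove the intermediary CSV after conversion unless --keep-csv was given."""
    if not keep_csv and os.path.exists(csv_path):
        os.remove(csv_path)
```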
