
Fabric BQ (BigQuery) Sync

This project is provided as an accelerator to help synchronize or migrate data from Google BigQuery to Fabric. The primary use cases for this accelerator are:

  • BigQuery customers who wish to continue to leverage their existing investments and data estate while optimizing their Power BI experience and reducing overall analytics TCO
  • BigQuery customers who wish to migrate all or part of their data estate to Microsoft Fabric

Getting Started for New Installs

The accelerator includes an automated installer that can set up your Fabric workspace and install all required dependencies automatically. To use the installer:

  1. Download the current version of the Installer notebook
  2. Import the installer into your Fabric Workspace
  3. Attach the installer to a Lakehouse within the Workspace
  4. Upload your GCP Service Account credential json file to OneLake
  5. Update the configuration parameters (a sample parameter cell is sketched after this list):
    • loader_name – custom name for the sync operation used in dashboards/reports (ex: HR Data Sync, BQ Sales Transaction Mirror)
    • metadata_lakehouse - name of the lakehouse used to drive the BQ Sync process
    • target_lakehouse - name of the lakehouse where the BQ Sync data will be landed
    • gcp_project_id - the GCP billing project id that contains the in-scope dataset
    • gcp_dataset_id - the target BQ dataset name/id
    • gcp_credential_path - the File API Path to your JSON credential file (Example: /lakehouse/default/Files/my-credential-file.json)
  6. Run the installer notebook
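
For reference, the parameter cell from step 5 looks roughly like the following. This is a minimal sketch: the parameter names come from the list above, but every value is a placeholder to replace with your own.

```python
# Hypothetical installer parameter values -- the names match the configuration
# parameters documented above; all values below are placeholders.
loader_name = "BQ Sales Transaction Mirror"  # display name used in dashboards/reports
metadata_lakehouse = "BQSyncMetadata"        # lakehouse that drives the sync process
target_lakehouse = "BQSyncData"              # lakehouse where synced data lands
gcp_project_id = "my-billing-project"        # GCP billing project containing the dataset
gcp_dataset_id = "sales"                     # target BigQuery dataset name/id
gcp_credential_path = "/lakehouse/default/Files/my-credential-file.json"  # File API path
```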

The installer performs the following actions:

  • Creates the required Lakehouses, if they do not exist
  • Creates the metadata tables and required metadata
  • Downloads the correct version of the BigQuery Spark connector based on your configured Spark runtime
  • Downloads the current BQ Sync Python package
  • Creates an initial default user configuration file based on your config parameters
  • Installs a fully configured, ready-to-run BQ Sync notebook into your workspace

Upgrading to Current Version

The accelerator now includes an upgrade utility to simplify the process of upgrading your existing BQ Sync instance to the most current version. The upgrade utility handles major and minor updates. To use the upgrade utility:

  1. Download the current version of the Upgrade notebook
  2. Import the Upgrade notebook into your Fabric Workspace
  3. Attach the Upgrade notebook to your BQ Sync metadata Lakehouse
  4. Update the notebook parameters to point to your current configuration file (a sample parameter cell is sketched after this list)
  5. Run the upgrade process
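
For step 4, the update amounts to pointing the notebook at your existing configuration file. A minimal sketch, assuming a parameter named config_file_path (the actual parameter name may differ; check the Upgrade notebook):

```python
# Hypothetical upgrade notebook parameter -- the name config_file_path is an
# assumption; the value is the path to your existing user configuration file.
config_file_path = "/lakehouse/default/Files/my_sync_config.json"
```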

The upgrade process performs the following actions:

  • Migrates your current configuration file (when necessary). Note that new features/capabilities are turned off by default. Your current configuration file is cloned and is not overwritten.
  • Updates the BQ Sync metastore (when necessary). When schema changes are made to the BQ Sync metastore, the metastore is optimized as part of the upgrade process.
  • Downloads the current version of the BQ Sync package. If you are using environments or have otherwise customized your environment, it may be necessary to manually update your package repository.
  • Downloads the current version of the BigQuery Spark connector (when available/necessary)
  • Installs a new version of the BQ Sync notebook, mapped to the new configuration, Python package, and Spark connector

Project Overview

For many of our customers, the native mirroring capabilities in Fabric are one of the most exciting features of the platform. While Fabric currently supports a growing number of different mirroring sources, BigQuery is not yet supported. This current gap in capabilities is the foundation of this accelerator.

The goal of this accelerator is to simplify synchronizing data from Google BigQuery to Microsoft Fabric, with an emphasis on reliability, performance, and cost optimization. The accelerator is implemented in Spark (PySpark) and draws on many concepts common to ETL frameworks. It is more than an ETL framework, however: it uses BigQuery metadata to determine the most efficient way to synchronize data between the two platforms.
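
To make the metadata-driven idea concrete, the sketch below uses the google-cloud-bigquery client to read partition statistics from BigQuery's INFORMATION_SCHEMA. This is illustrative only, not the accelerator's internal code; the project, dataset, and credential path are placeholder assumptions.

```python
# Illustrative sketch: inspect BigQuery partition metadata of the kind a
# sync process can use to plan efficient loads (placeholders throughout).
from google.cloud import bigquery

client = bigquery.Client.from_service_account_json(
    "/lakehouse/default/Files/my-credential-file.json",
    project="my-billing-project",
)

sql = """
    SELECT table_name, partition_id, total_rows, last_modified_time
    FROM `my-gcp-project.my_dataset.INFORMATION_SCHEMA.PARTITIONS`
    WHERE partition_id IS NOT NULL
    ORDER BY last_modified_time DESC
"""

# Partitions modified since the last sync watermark are candidates for an
# incremental load instead of a full table copy.
for row in client.query(sql).result():
    print(row.table_name, row.partition_id, row.total_rows, row.last_modified_time)
```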

Features & Capabilities

The accelerator offers an ever-growing set of capabilities that either provide feature parity or enhance and optimize the overall synchronization process. Below is an overview of some of the core capabilities:

  • Multi-Project/Multi-Dataset sync support
  • Table pattern-match filters to include/exclude tables during discovery
  • Table & Partition expiration based on BigQuery configuration
  • Syncing support for Views & Materialized Views
  • Support for handling tables with required partition filters
  • BigQuery connector configuration for alternative billing and materialization targets (see the connector sketch after this list)
  • Rename BigQuery tables and map to specific Lakehouse targets
  • Rename or convert data types using table-level column mapping
  • Shape the BigQuery source with an alternate source SQL query and/or source predicate
  • Complex-type (STRUCT/ARRAY) handling/flattening
  • Support for Delta schema evolution for evolving BigQuery table/view schemas
  • Override BigQuery native partitioning with a partitioning schema optimized for the Lakehouse (Delta partitioning)
  • Automatic Lakehouse table maintenance on synced tables
  • Detailed process telemetry that tracks data movement and pairs with native Delta Time Travel capabilities
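
Several of these capabilities map to options on the Google spark-bigquery connector, which the accelerator configures on your behalf. The sketch below is illustrative, not the accelerator's internal code; the table names, datasets, and predicate are placeholder assumptions.

```python
# Illustrative sketch of the spark-bigquery connector options behind a few of
# the capabilities above (alternative billing, view materialization, predicate
# pushdown). All identifiers below are placeholders.
df = (
    spark.read.format("bigquery")
    .option("credentialsFile", "/lakehouse/default/Files/my-credential-file.json")
    .option("parentProject", "my-billing-project")       # alternative billing project
    .option("viewsEnabled", "true")                      # allow reading views
    .option("materializationDataset", "bqsync_staging")  # staging dataset for view results
    .option("filter", "event_date >= '2024-01-01'")      # satisfies a required partition filter
    .load("my-gcp-project.my_dataset.my_view")
)

# Land the result as a Delta table in the target lakehouse.
df.write.format("delta").mode("overwrite").saveAsTable("my_target_table")
```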

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
