This Terraform module automates populating Tamr config variables whose values are generated as outputs of other GCP scale-out modules.
Smallest complete, fully working example. This example may require extra resources to run.
This module creates:
- A template_file data source which renders the contents of a populated Tamr config.
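
A rough sketch of calling the module with only its required inputs follows. It is not taken from the repo's example directory: the `source` path is hypothetical, the module label `tamr_config` is arbitrary, and every value is a placeholder to replace with the outputs of your own Bigtable, Cloud SQL, Dataproc and VM modules.

```hcl
module "tamr_config" {
  # Hypothetical local path; point this at wherever this module lives.
  source = "../path/to/tamr-config-module"

  # Bigtable (min/max nodes are typed as strings by this module)
  tamr_bigtable_instance_id = "tamr-bigtable"
  tamr_bigtable_cluster_id  = "tamr-bigtable-cluster"
  tamr_bigtable_min_nodes   = "1"
  tamr_bigtable_max_nodes   = "10"

  # Cloud SQL: location may be a region or a zone
  tamr_cloud_sql_name     = "tamr-postgres"
  tamr_cloud_sql_location = "us-east1"
  tamr_sql_password       = var.tamr_sql_password

  # Dataproc and GCS buckets
  tamr_dataproc_bucket   = "my-tamr-dataproc-bucket"
  tamr_dataproc_region   = "us-east1"
  tamr_filesystem_bucket = "my-tamr-filesystem-bucket"

  # Tamr VM
  tamr_instance_project         = "my-gcp-project"
  tamr_instance_zone            = "us-east1-b"
  tamr_instance_subnet          = "tamr-subnet"
  tamr_instance_internal_ip     = "10.0.0.10"
  tamr_instance_service_account = "tamr@my-gcp-project.iam.gserviceaccount.com"
}

# Supply the password at plan/apply time instead of hard-coding it.
variable "tamr_sql_password" {
  type = string
}

# The rendered config is built from inputs that include the SQL password,
# so hide it from CLI output.
output "tamr_config" {
  value     = module.tamr_config.tamr_config_file
  sensitive = true
}
```

Optional inputs (project IDs, Dataproc sizing, Elasticsearch and Spark settings) fall back to the defaults listed in the inputs table when omitted.
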
Name | Version |
---|---|
terraform | >= 1.0.0 |
>= 4.6.0 |
No providers.
Name | Description | Type | Default | Required |
---|---|---|---|---|
tamr_bigtable_cluster_id | Bigtable cluster ID | string | n/a | yes |
tamr_bigtable_instance_id | Bigtable instance ID | string | n/a | yes |
tamr_bigtable_max_nodes | Maximum number of nodes to scale up to | string | n/a | yes |
tamr_bigtable_min_nodes | Minimum number of nodes to scale down to | string | n/a | yes |
tamr_cloud_sql_location | Location of the Cloud SQL instance. NOTE: this is either a region or a zone. | string | n/a | yes |
tamr_cloud_sql_name | Name of the Cloud SQL instance | string | n/a | yes |
tamr_dataproc_bucket | GCS bucket to use for the Tamr Dataproc cluster | string | n/a | yes |
tamr_dataproc_region | Region the Dataproc cluster uses | string | n/a | yes |
tamr_filesystem_bucket | GCS bucket to use for the Tamr default file system | string | n/a | yes |
tamr_instance_internal_ip | Internal IP of the Tamr VM | string | n/a | yes |
tamr_instance_project | The project to launch the Tamr VM instance in | string | n/a | yes |
tamr_instance_service_account | Email of the service account to attach to the Tamr instance | string | n/a | yes |
tamr_instance_subnet | Subnetwork to attach the instance to | string | n/a | yes |
tamr_instance_zone | Zone to deploy the Tamr VM in | string | n/a | yes |
tamr_sql_password | Password for the Cloud SQL user | string | n/a | yes |
dataproc_network_tags | List of network tags to attach to the Dataproc nodes | list(string) | [] | no |
tamr_bigtable_project_id | The Google project that the Bigtable instance lives in. If not set, tamr_instance_project is used. | string | "" | no |
tamr_cloud_sql_project | Project containing the Cloud SQL instance. If not set, tamr_instance_project is used. | string | "" | no |
tamr_dataproc_cluster_config | A complete Dataproc cluster configuration to use instead of the default template. It is expected to be a YAML document of a Dataproc cluster config (reference spec: https://cloud.google.com/dataproc/docs/reference/rest/v1/ClusterConfig). Do not left-pad the YAML; the module's template handles indentation. | string | "" | no |
tamr_dataproc_cluster_enable_stackdriver_logging | Enable Stackdriver logging on Dataproc clusters. Only used with the built-in configuration, i.e. when tamr_dataproc_cluster_config is empty. | bool | true | no |
tamr_dataproc_cluster_master_disk_size | Size of the Dataproc master disk. Only used with the built-in configuration, i.e. when tamr_dataproc_cluster_config is empty. | number | 1000 | no |
tamr_dataproc_cluster_master_instance_type | Instance type to use as the Dataproc master. Only used with the built-in configuration, i.e. when tamr_dataproc_cluster_config is empty. | string | "n1-highmem-4" | no |
tamr_dataproc_cluster_service_account | Service account to attach to Dataproc workers. If not set, tamr_instance_service_account is used. Only used with the built-in configuration, i.e. when tamr_dataproc_cluster_config is empty. | string | "" | no |
tamr_dataproc_cluster_subnetwork_uri | Subnetwork URI for Dataproc to use. If not set, tamr_instance_subnet is used. Only used with the built-in configuration, i.e. when tamr_dataproc_cluster_config is empty. | string | "" | no |
tamr_dataproc_cluster_version | Version of Dataproc to use. Only used with the built-in configuration, i.e. when tamr_dataproc_cluster_config is empty. | string | "2.0" | no |
tamr_dataproc_cluster_worker_machine_type | Machine type of the default worker pool. Only used with the built-in configuration, i.e. when tamr_dataproc_cluster_config is empty. | string | "n1-standard-16" | no |
tamr_dataproc_cluster_worker_num_instances | Number of default workers to use. Only used with the built-in configuration, i.e. when tamr_dataproc_cluster_config is empty. | number | 4 | no |
tamr_dataproc_cluster_worker_num_local_ssds | Number of local SSDs to attach to each worker node. Only used with the built-in configuration, i.e. when tamr_dataproc_cluster_config is empty. | number | 2 | no |
tamr_dataproc_cluster_worker_preemptible_machine_type | Machine type of the preemptible worker pool. Only used with the built-in configuration, i.e. when tamr_dataproc_cluster_config is empty. | string | "n1-standard-16" | no |
tamr_dataproc_cluster_worker_preemptible_num_instances | Number of preemptible workers to use. Only used with the built-in configuration, i.e. when tamr_dataproc_cluster_config is empty. | number | 0 | no |
tamr_dataproc_cluster_worker_preemptible_num_local_ssds | Number of local SSDs to attach to each preemptible worker node. Only used with the built-in configuration, i.e. when tamr_dataproc_cluster_config is empty. | number | 2 | no |
tamr_dataproc_cluster_zone | Zone to launch the Dataproc cluster into. If not set, tamr_instance_zone is used. Only used with the built-in configuration, i.e. when tamr_dataproc_cluster_config is empty. | string | "" | no |
tamr_dataproc_image_version | Dataproc image version | string | "2.0" | no |
tamr_dataproc_project_id | Project for the Dataproc cluster. If not set, tamr_instance_project is used. | string | "" | no |
tamr_es_apihost | The hostname and port of the REST API endpoint of the Elasticsearch cluster to use. If unset, `<IP of VM>:9200` is used. | string | "" | no |
tamr_es_enabled | Whether Tamr indexes user data in Elasticsearch. Elasticsearch powers Tamr's interactive data UI, so when this is false Tamr runs "headless", that is, without its core UI capabilities. Disabling Elasticsearch can be useful in production settings where models are trained on a separate instance and the goal is to maximize pipeline throughput. | bool | true | no |
tamr_es_number_of_shards | The number of shards to set when creating the Tamr index in Elasticsearch. Tamr's default is the number of cores on the local host machine, so this should be overridden when using a remote Elasticsearch cluster. Note: this value is only applied when the index is created. | number | 1 | no |
tamr_es_password | Password used to authenticate to Elasticsearch with basic authentication. Not required unless the Elasticsearch cluster has security and authentication enabled. The value passed in may be encrypted. | string | null | no |
tamr_es_socket_timeout | Socket timeout for Elasticsearch clients, in milliseconds: the maximum period of inactivity between two consecutive data packets. A value of zero is interpreted as an infinite timeout; a negative value is interpreted as undefined (system default). The default, 900000, is fifteen minutes. | number | 900000 | no |
tamr_es_ssl_enabled | Whether to connect to Elasticsearch over HTTPS. Default is false (HTTP). | bool | false | no |
tamr_es_user | Username used to authenticate to Elasticsearch. Not required unless the Elasticsearch cluster has security and authentication enabled. | string | "" | no |
tamr_hbase_namespace | HBase namespace to use; for Bigtable this is the table prefix. | string | "ns0" | no |
tamr_json_logging | Toggle JSON formatting for Tamr logs | bool | false | no |
tamr_license_key | Tamr license key | string | "" | no |
tamr_spark_driver_memory | Amount of memory Spark allocates to the Spark driver | string | "12G" | no |
tamr_spark_executor_cores | Number of cores Spark allocates to each Spark executor | number | 5 | no |
tamr_spark_executor_instances | Number of Spark executor instances | number | 12 | no |
tamr_spark_executor_memory | Amount of memory Spark allocates to each Spark executor | string | "13G" | no |
tamr_spark_properties_override | JSON blob of Spark properties to override. If not set, a default set of properties that should work for most use cases is used. | string | "" | no |
tamr_sql_user | Username for the Cloud SQL user | string | "tamr" | no |
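
Two of the optional inputs above replace built-in behaviour rather than tune it: `tamr_dataproc_cluster_config` swaps out the default Dataproc template, and the `tamr_es_*` inputs point Tamr at a remote Elasticsearch cluster. The fragment below sketches both, assuming a hypothetical `dataproc-cluster-config.yaml` next to your root module and a placeholder Elasticsearch host; the required inputs are elided and the `source` path is hypothetical.

```hcl
module "tamr_config" {
  source = "../path/to/tamr-config-module" # hypothetical path

  # ... required inputs omitted for brevity ...

  # Un-indented ClusterConfig YAML; the module's template handles padding.
  # When this is set, the tamr_dataproc_cluster_* inputs are ignored.
  tamr_dataproc_cluster_config = file("${path.module}/dataproc-cluster-config.yaml")

  # Remote Elasticsearch instead of the default <IP of VM>:9200.
  tamr_es_apihost     = "es.example.internal:9200" # placeholder host
  tamr_es_ssl_enabled = true
  tamr_es_user        = "tamr"
  tamr_es_password    = var.tamr_es_password

  # Tamr's own default is the local core count; set explicitly for a
  # remote cluster. Only applied when the index is first created.
  tamr_es_number_of_shards = 5 # placeholder value
}

variable "tamr_es_password" {
  type = string
}
```

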
Name | Description |
---|---|
tamr_config_file | Full Tamr config file |
tmpl_dataproc_config | Dataproc config |
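
One way to consume these outputs is to write them to disk for a later provisioning step. The sketch below assumes the module was instantiated as `module.tamr_config`, that a recent `hashicorp/local` provider (one that includes `local_sensitive_file`) is available, and uses placeholder file names.

```hcl
# The Tamr config is derived from inputs that include passwords, so use the
# sensitive variant to keep its content out of plan output.
resource "local_sensitive_file" "tamr_config" {
  content  = module.tamr_config.tamr_config_file
  filename = "${path.module}/tamr-config.yaml" # placeholder name
}

resource "local_file" "dataproc_config" {
  content  = module.tamr_config.tmpl_dataproc_config
  filename = "${path.module}/dataproc-config.yaml" # placeholder name
}
```

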
This repo is based on:
Run `make terraform/docs` to generate the sections of the docs covering Terraform inputs, outputs, and requirements.
Run `make lint`; this runs `terraform fmt`, in addition to a few other checks that detect whitespace issues.
NOTE: this requires a working Docker installation on the machine running the checks.
- Update the version contained in `VERSION`
- Document changes in `CHANGELOG.md`
- Create a tag in GitHub for the commit associated with the version
Apache 2 Licensed. See LICENSE for full details.