Terraform Module Template

This Terraform module automates populating Tamr config variables from values that other GCP scale-out modules generate as outputs.

Examples

Minimal

The smallest complete, fully working example. Running it may require additional resources.

Minimal: examples/minimal
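
The following is a minimal sketch of calling the module with only its required inputs. It assumes the caller sits in a directory such as examples/minimal, so the module is referenced by the local path `../..`. The module name `tamr_config` and every value shown are hypothetical placeholders. In practice these values usually come from the outputs of other GCP scale-out modules.

```hcl
# Supply the password through TF_VAR_tamr_sql_password or a tfvars file
# rather than hardcoding it.
variable "tamr_sql_password" {
  type = string
}

module "tamr_config" {
  # Assumption: path from examples/minimal back to the module root.
  source = "../.."

  # Bigtable (placeholder values)
  tamr_bigtable_cluster_id  = "tamr-bigtable-cluster"
  tamr_bigtable_instance_id = "tamr-bigtable"
  tamr_bigtable_min_nodes   = "1"
  tamr_bigtable_max_nodes   = "3"

  # Cloud SQL (placeholder values)
  tamr_cloud_sql_location = "us-east1"
  tamr_cloud_sql_name     = "tamr-cloudsql"
  tamr_sql_password       = var.tamr_sql_password

  # Dataproc and GCS (placeholder values)
  tamr_dataproc_bucket   = "my-tamr-dataproc-bucket"
  tamr_dataproc_region   = "us-east1"
  tamr_filesystem_bucket = "my-tamr-filesystem-bucket"

  # Tamr VM (placeholder values)
  tamr_instance_project         = "my-gcp-project"
  tamr_instance_zone            = "us-east1-b"
  tamr_instance_internal_ip     = "10.0.0.10"
  tamr_instance_service_account = "tamr@my-gcp-project.iam.gserviceaccount.com"
  tamr_instance_subnet          = "projects/my-gcp-project/regions/us-east1/subnetworks/tamr-subnet"
}
```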

Resources Created

This module creates:

A `template_file` data source that renders the contents of a populated Tamr config.

Requirements

Name	Version
terraform	>= 1.0.0
google	>= 4.6.0
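
A caller can declare matching constraints in its own `terraform` block. The following sketch assumes the Google provider is pulled from the public `hashicorp/google` registry address:

```hcl
terraform {
  # Matches the module's minimum Terraform version.
  required_version = ">= 1.0.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 4.6.0"
    }
  }
}
```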

Providers

No providers.

Inputs

Name	Description	Type	Default	Required
tamr_bigtable_cluster_id	Bigtable cluster ID	`string`	n/a	yes
tamr_bigtable_instance_id	Bigtable instance ID	`string`	n/a	yes
tamr_bigtable_max_nodes	Max number of nodes to scale up to	`string`	n/a	yes
tamr_bigtable_min_nodes	Min number of nodes to scale down to	`string`	n/a	yes
tamr_cloud_sql_location	Location of the Cloud SQL instance. NOTE: this is either a region or a zone.	`string`	n/a	yes
tamr_cloud_sql_name	Name of the Cloud SQL instance	`string`	n/a	yes
tamr_dataproc_bucket	GCS bucket to use for the Tamr Dataproc cluster	`string`	n/a	yes
tamr_dataproc_region	Region used by the Dataproc cluster	`string`	n/a	yes
tamr_filesystem_bucket	GCS bucket to use for the Tamr default file system	`string`	n/a	yes
tamr_instance_internal_ip	Internal IP of the Tamr VM	`string`	n/a	yes
tamr_instance_project	The project to launch the Tamr VM instance in	`string`	n/a	yes
tamr_instance_service_account	Email of the service account to attach to the Tamr instance	`string`	n/a	yes
tamr_instance_subnet	Subnetwork to attach the instance to	`string`	n/a	yes
tamr_instance_zone	Zone to deploy the Tamr VM in	`string`	n/a	yes
tamr_sql_password	Password for the Cloud SQL user	`string`	n/a	yes
dataproc_network_tags	List of network tags to attach to the Dataproc nodes	`list(string)`	`[]`	no
tamr_bigtable_project_id	The Google project that the Bigtable instance lives in. If not set, tamr_instance_project is used as the default value.	`string`	`""`	no
tamr_cloud_sql_project	Project containing the Cloud SQL instance. If not set, tamr_instance_project is used as the default value.	`string`	`""`	no
tamr_dataproc_cluster_config	If you do not want to use the default Dataproc configuration template, pass a complete Dataproc configuration to this variable. The configuration should not be left-padded; the template handles indentation. It is expected to be a YAML document of a Dataproc cluster config. Reference spec: https://cloud.google.com/dataproc/docs/reference/rest/v1/ClusterConfig	`string`	`""`	no
tamr_dataproc_cluster_enable_stackdriver_logging	Enable Stackdriver logging on Dataproc clusters. Only used with the built-in Dataproc configuration template.	`bool`	`true`	no
tamr_dataproc_cluster_master_disk_size	Size of the Dataproc master disk. Only used with the built-in Dataproc configuration template.	`number`	`1000`	no
tamr_dataproc_cluster_master_instance_type	Instance type to use for the Dataproc master. Only used with the built-in Dataproc configuration template.	`string`	`"n1-highmem-4"`	no
tamr_dataproc_cluster_service_account	Service account to attach to Dataproc workers. If not set, tamr_instance_service_account is used as the default value. Only used with the built-in Dataproc configuration template.	`string`	`""`	no
tamr_dataproc_cluster_subnetwork_uri	Subnetwork URI for Dataproc to use. If not set, tamr_instance_subnet is used as the default value. Only used with the built-in Dataproc configuration template.	`string`	`""`	no
tamr_dataproc_cluster_version	Version of Dataproc to use. Only used with the built-in Dataproc configuration template.	`string`	`"2.0"`	no
tamr_dataproc_cluster_worker_machine_type	Machine type of the default worker pool. Only used with the built-in Dataproc configuration template.	`string`	`"n1-standard-16"`	no
tamr_dataproc_cluster_worker_num_instances	Number of default workers to use. Only used with the built-in Dataproc configuration template.	`number`	`4`	no
tamr_dataproc_cluster_worker_num_local_ssds	Number of local SSDs to attach to each worker node. Only used with the built-in Dataproc configuration template.	`number`	`2`	no
tamr_dataproc_cluster_worker_preemptible_machine_type	Machine type of the preemptible worker pool. Only used with the built-in Dataproc configuration template.	`string`	`"n1-standard-16"`	no
tamr_dataproc_cluster_worker_preemptible_num_instances	Number of preemptible workers to use. Only used with the built-in Dataproc configuration template.	`number`	`0`	no
tamr_dataproc_cluster_worker_preemptible_num_local_ssds	Number of local SSDs to attach to each preemptible worker node. Only used with the built-in Dataproc configuration template.	`number`	`2`	no
tamr_dataproc_cluster_zone	Zone to launch the Dataproc cluster into. If not set, tamr_instance_zone is used as the default value. Only used with the built-in Dataproc configuration template.	`string`	`""`	no
tamr_dataproc_image_version	Dataproc image version	`string`	`"2.0"`	no
tamr_dataproc_project_id	Project for the Dataproc cluster. If not set, tamr_instance_project is used as the default value.	`string`	`""`	no
tamr_es_apihost	The hostname and port of the REST API endpoint of the Elasticsearch cluster to use. If unset, <IP of the VM>:9200 is used.	`string`	`""`	no
tamr_es_enabled	Whether Tamr will index user data in Elasticsearch or not. Elasticsearch is used to power Tamr's interactive data UI, so when this is set to false Tamr will run 'headless,' that is, without its core UI capabilities. It can be useful to disable Elasticsearch in production settings where the models are trained on a separate instance and the goal is to maximize pipeline throughput.	`bool`	`true`	no
tamr_es_number_of_shards	The number of shards to set when creating the Tamr index in Elasticsearch. Default value is the number of cores on the local host machine, so this should be overridden when using a remote Elasticsearch cluster. Note: this value is only applied when the index is created.	`number`	`1`	no
tamr_es_password	Password to use to authenticate to Elasticsearch, using basic authentication. Not required unless the Elasticsearch cluster you're using has security and authentication enabled. The value passed in may be encrypted.	`string`	`null`	no
tamr_es_socket_timeout	Defines the socket timeout for Elasticsearch clients, in milliseconds. This is the timeout for waiting for data or, put differently, a maximum period of inactivity between two consecutive data packets. A timeout value of zero is interpreted as an infinite timeout. A negative value is interpreted as undefined (system default). The default value is 900000, i.e., fifteen minutes.	`number`	`900000`	no
tamr_es_ssl_enabled	Whether to connect to Elasticsearch over https or not. Default is false (http).	`bool`	`false`	no
tamr_es_user	Username to use to authenticate to Elasticsearch. Not required unless the Elasticsearch cluster you're using has security and authentication enabled.	`string`	`""`	no
tamr_hbase_namespace	HBase namespace to use. For Bigtable, this is the table prefix.	`string`	`"ns0"`	no
tamr_json_logging	Toggle JSON formatting for Tamr logs.	`bool`	`false`	no
tamr_license_key	Tamr license key	`string`	`""`	no
tamr_spark_driver_memory	Amount of memory Spark should allocate to the Spark driver	`string`	`"12G"`	no
tamr_spark_executor_cores	Number of cores Spark should allocate to each Spark executor	`number`	`5`	no
tamr_spark_executor_instances	Number of Spark executor instances	`number`	`12`	no
tamr_spark_executor_memory	Amount of memory Spark should allocate to each Spark executor	`string`	`"13G"`	no
tamr_spark_properties_override	JSON blob of Spark properties to override. If not set, a default set of properties that should work for most use cases is used.	`string`	`""`	no
tamr_sql_user	Username for the Cloud SQL user	`string`	`"tamr"`	no
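
Optional inputs are set the same way as required ones. The fragment below is a sketch, not a complete configuration. It assumes a module block named `tamr_config` (hypothetical) whose required inputs are omitted, and a hypothetical file `dataproc_cluster_config.yaml` next to the caller's configuration. Because `file()` returns the file contents unchanged, the YAML should start at column 0, which matches the requirement that the configuration not be left-padded. The Spark values are illustrative only.

```hcl
module "tamr_config" {
  source = "../.." # assumed local path to this module

  # Required inputs from the table above are omitted for brevity.

  # Replace the built-in Dataproc template with a complete ClusterConfig YAML document.
  # Once this is set, the tamr_dataproc_cluster_* inputs that apply only to the
  # built-in template have no effect.
  tamr_dataproc_cluster_config = file("${path.module}/dataproc_cluster_config.yaml")

  # Spark sizing (illustrative values).
  tamr_spark_driver_memory      = "12G"
  tamr_spark_executor_instances = 8
  tamr_spark_executor_cores     = 5
  tamr_spark_executor_memory    = "13G"

  # Run Tamr "headless", without Elasticsearch indexing.
  tamr_es_enabled = false
}
```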

Outputs

Name	Description
tamr_config_file	Full Tamr config file
tmpl_dataproc_config	Dataproc config
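
The following is a sketch of consuming the outputs from the calling configuration. It assumes the module is instantiated as `module "tamr_config"` (hypothetical) and that the `hashicorp/local` provider is available for the `local_file` resource. The output path is also hypothetical. The rendered Tamr config may include credentials such as tamr_sql_password, so handle the written file accordingly.

```hcl
# Write the rendered Tamr config to disk, e.g. for later upload to the Tamr VM.
resource "local_file" "tamr_config" {
  content  = module.tamr_config.tamr_config_file
  filename = "${path.root}/rendered/tamr-config.yaml"
}

# Re-export the rendered Dataproc config so it can be inspected or passed elsewhere.
output "tamr_dataproc_config" {
  value = module.tamr_config.tmpl_dataproc_config
}
```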

References

This repo is based on:

Development

Generating Docs

Run `make terraform/docs` to generate the section of the docs covering Terraform inputs, outputs, and requirements.

Checkstyles

Run `make lint`. This runs `terraform fmt` along with a few other checks that detect whitespace issues. NOTE: this requires a working Docker installation on the machine running the checks.

Releasing new versions

Update the version in VERSION
Document changes in CHANGELOG.md
Create a tag in GitHub for the commit associated with the version

License

Apache 2 Licensed. See LICENSE for full details.
