This article answers frequently asked questions, describes advanced features for customizing and debugging Cromwell on Azure, and explains how to diagnose and work around known issues. We are actively tracking these issues as bugs to be fixed in upcoming releases.
- Setup
  - I am trying to set up Cromwell on Azure for multiple users on the same subscription, what are the ways I can do this?
  - I ran the Cromwell on Azure installer and it failed. How can I fix it?
  - How can I upgrade my Cromwell on Azure instance?
- Analysis
  - I submitted a job and it failed almost immediately. What happened to it?
  - I can only run small workflows but not workflows that require multiple tasks with a large total CPU core requirement. How do I increase workflow capacity?
  - How do I set up my own WDL to run on Cromwell?
  - How can I see how far along my workflow has progressed?
  - My workflow failed at task X. Where should I look to determine why it failed?
  - Which tasks failed?
  - Some tasks are stuck or my workflow is stuck in the "inprogress" directory in the "workflows" container. Were there Azure infrastructure issues?
  - My jobs are taking a long time in the "Preparing" task state, even with "smaller" input files and VMs being used. Why is that?
- Customizing your instance
  - How can I customize my Cromwell on Azure deployment?
  - How can I use a specific Cromwell image version?
  - How do I use input data files for my workflows from a different Azure Storage account that my lab or team is currently using?
  - Can I connect a different batch account with previously increased quotas to run my workflows?
  - How can I use private Docker containers for my workflows?
  - A lot of tasks for my workflows run longer than 24 hours and have been randomly stopped. How can I run all my tasks on dedicated batch VMs?
  - Can I get direct access to Cromwell's REST API?
- Performance & Optimization
  - How can I figure out how much Cromwell on Azure costs me?
  - How much am I paying for my Cromwell on Azure instance?
  - How are batch VMs selected to run tasks in a workflow?
  - Do you have guidance on how to optimize my WDLs?
- Miscellaneous
  - I cannot find my issue in this document and want more information from Cromwell, MySQL, or TES Docker container logs.
  - I am running a large number of workflows and the MySQL storage disk is full
  - How can I run CWL files on Cromwell on Azure?
There is currently a bug (which we are tracking) in a dependency tool we use to get files from Azure Storage to the VM to perform a task. For now, follow these steps as a workaround if you are running into errors getting access to your files using SAS tokens on Cromwell on Azure. If you followed these instructions to create a SAS URL, you’ll get something similar to
https://YourStorageAccount.blob.core.windows.net/inputs?sv=2018-03-28&si=inputs-key&sr=c&sig=somestring
Focus on this part: si=inputs-key&sr=c
Manually change the order of the sr and si fields to get something similar to
https://YourStorageAccount.blob.core.windows.net/inputs?sv=2018-03-28&sr=c&si=inputs-key&sig=somestring
After the change, sr=c&si=inputs-key should be the order in your SAS URL.
Update all the SAS URLs similarly and retry your workflow.
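If you have many SAS URLs to update, the reordering can be scripted. The following is a minimal sketch, not part of Cromwell on Azure; the function name `reorder_sas_url` is made up for illustration. It moves the `sr` field in front of `si` and leaves every other field, including the signature, untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def reorder_sas_url(url: str) -> str:
    """Return the SAS URL with the sr field placed directly before si."""
    parts = urlsplit(url)
    # Split on "&" by hand so the signature keeps its exact encoding.
    fields = parts.query.split("&")
    keys = [field.split("=", 1)[0] for field in fields]
    if "sr" in keys and "si" in keys and keys.index("sr") > keys.index("si"):
        sr_field = fields.pop(keys.index("sr"))
        si_position = [field.split("=", 1)[0] for field in fields].index("si")
        fields.insert(si_position, sr_field)
    return urlunsplit(parts._replace(query="&".join(fields)))


original = (
    "https://YourStorageAccount.blob.core.windows.net/inputs"
    "?sv=2018-03-28&si=inputs-key&sr=c&sig=somestring"
)
print(reorder_sas_url(original))
```

URLs that already have `sr` before `si` are returned unchanged, so the function is safe to run over a whole inputs file.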
All TES tasks for my workflow are done running, but the trigger JSON file is still in the "inprogress" directory in the workflows container
- The root cause is most likely memory pressure on the host Linux VM because blobfuse processes grow to consume all physical memory.
You may see the following Cromwell container logs as a symptom:
Cromwell shutting down because it cannot access the database): Shutting down cromid-5bd1d24 as at least 15 minutes of heartbeat write errors have occurred between 2020-02-18T22:03:01.110Z and 2020-02-18T22:19:01.111Z (16.000016666666667 minutes
To mitigate, please resize your VM in the resource group to a machine with at least 14GB memory/RAM. Any workflows still in progress will not be affected.
- Another possible scenario is that the "mysql" database is in an unusable state, which means Cromwell cannot continue processing workflows.
You may see the following Cromwell container logs as a symptom:
Failed to instantiate Cromwell System. Shutting down Cromwell. liquibase.exception.LockException: Could not acquire change log lock. Currently locked by 012ec19c3285 (172.18.0.4) since 2/19/20 4:10 PM
Note: This has been fixed in Release 2.1. If you use the 2.1 deployer or update to this version, you can skip the mitigation steps below.
For Release 2.0 and below: to mitigate, log on to the host VM, run the following command, and then restart the VM:
sudo docker exec -it cromwellazure_mysqldb_1 bash -c 'mysql -ucromwell -Dcromwell_db -pcromwell -e"SELECT * FROM DATABASECHANGELOGLOCK;UPDATE DATABASECHANGELOGLOCK SET LOCKED=0, LOCKGRANTED=null, LOCKEDBY=null where ID=1;SELECT * FROM DATABASECHANGELOGLOCK;"'
Cromwell on Azure is designed to be flexible for single and multiple user scenarios. Here we have envisioned 4 general scenarios and demonstrated how they relate to your Azure account, Azure Batch service, Subscription ID, and Resource Groups, each depicted below.
1. The Individual User: This is the current standard deployment configuration for Cromwell on Azure. No extra steps beyond the deployment guide are necessary.
2. The Lab: This scenario is envisioned for small lab groups and teams sharing a common Azure resource (i.e., bioinformaticians, data scientists, or computational biologists collaborating on projects from the same lab). Functionally, this setup does not differ from the "Individual User" configuration. We recommend that a single "Cromwell Administrator" perform the initial Cromwell on Azure setup for the group. Ensure that this user has the appropriate role(s) on the Subscription ID as outlined here. Once deployed, this "Cromwell Administrator" can grant "Contributor" access to the created Cromwell storage account via the Azure Portal. This allows the granted users to submit analysis jobs and retrieve results. It also allows them to view any analysis that has been run by the lab. Because Cromwell submits all jobs to Azure Batch as one user, the billing for Cromwell on Azure usage is collective for the entire lab, not broken down by the individual users who submitted the jobs.
3. The Research Group: This scenario is envisioned for larger research groups where a common Azure subscription is shared, but users want or require their own instance of Cromwell on Azure. The initial Cromwell on Azure deployment is done as described in the deployment guide. After the first deployment of Cromwell on Azure is done on the Subscription, subsequent users will need to specify a separate Resource Group AND the name of the existing Azure Batch account that is already being used by the existing deployment(s) of Cromwell on Azure. The Azure Batch account must exist in the same region as defined in the "--RegionName" configuration of the new Cromwell on Azure deployment. You can check all the configuration options here. See the invocation of the Linux deployment script for an example:
./deploy-cromwell-on-azure-linux --SubscriptionId <Your subscription ID> --RegionName <Your region> --MainIdentifierPrefix <Your string> --ResourceGroupName <Your resource group> --BatchAccountName <Your Batch account name>
In this scenario, please note the lack of separation at the Azure Batch account level. While you will be able to track resource usage independently because separate Cromwell users submit analyses to Azure Batch (for your own tracking or internal billing purposes), anyone who has access to Azure Batch as a Contributor or Owner will be able to see everyone's Batch pools, and thus what they are running. For this scenario, we recommend that the Cromwell Administrator(s) be trusted personnel, such as your IT team.
4. The Institution: This is an enterprise-level deployment scenario for a large organization with multiple Subscriptions and independent user groups within an internal hierarchy. In this scenario, due to the independent nature of the work being done and the need to track specific resource usage (for your own internal billing purposes), you will have completely independent deployments of Cromwell on Azure.
To deploy, you'll need to verify whether an Azure Batch account already exists on your Subscription (to run Cromwell on Azure at the Subscription level), or within your Resource Group as described in the deployment guide, with appropriate roles set. If an Azure Batch account is not deployed on your Subscription (or if you have available quota to create a new Batch account - the default for most accounts is 1 Batch account/region), then simply follow the deployment guide. If you are connecting to an existing Azure Batch account within your Subscription, follow the deployment recommendations outlined in scenario 3 (The Research Group), adding the appropriate flags for the deployment script. See the invocation of the Linux deployment script for an example:
./deploy-cromwell-on-azure-linux --SubscriptionId <Your subscription ID> --RegionName <Your region> --MainIdentifierPrefix <Your string> --ResourceGroupName <Your resource group> --BatchAccountName <Your Batch account name>
Please note you can also mix scenarios 1, 2, and 3 within the Azure Enterprise Account in scenario 4.
When the Cromwell on Azure installer is run, if there are errors, the logs are printed in the terminal. Most errors are related to insufficient permissions to create resources in Azure on your behalf, or intermittent Azure failures. In case of an error, we terminate the installation process and begin deleting all the resources in the Resource Group if already created.
Deleting all the resources in the Resource Group may take a while but as soon as you see logs that the batch account was deleted, you may exit the current process using Ctrl+C or Command+C on terminal/command prompt/PowerShell. The deletion of other Azure resources can continue in the background on Azure. Re-run the installer after fixing any user errors like permissions from the previous try.
If you see an issue that is unrelated to your permissions, and re-trying the installer does not fix it, please file a bug on our GitHub issues.
Starting in version 1.x, for convenience, some configuration files are hosted on your Cromwell on Azure storage account in the "configuration" container: containers-to-mount and cromwell-application.conf. You can modify and save these files using the Azure Portal UI "Edit Blob" option, or simply upload a new file to replace the existing one. Follow these steps to upgrade your Cromwell on Azure instance to 2.x.
If a workflow you start has a task that failed immediately and led to workflow failure, be sure to check your input JSON files. Follow the instructions here and check out an example WDL and inputs JSON file here to ensure there are no errors in defining your input files.
For files hosted on an Azure Storage account that is connected to your Cromwell on Azure instance, the input path consists of 3 parts - the storage account name, the blob container name, file path with extension, following this format:
/<storageaccountname>/<containername>/<blobName>
Example file path for an "inputs" container in a storage account "msgenpublicdata" will look like
"/msgenpublicdata/inputs/chr21.read1.fq.gz"
Another possibility is that you are trying to use a storage account that hasn't been mounted to your Cromwell on Azure instance - either by default during setup or by following these steps to mount a different storage account.
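The three-part input path format described above can be assembled and sanity-checked with a small helper before you write it into an inputs JSON file. This is an illustrative sketch only; `cromwell_input_path` is a hypothetical name, not something shipped with Cromwell on Azure.

```python
def cromwell_input_path(storage_account: str, container: str, blob_name: str) -> str:
    """Join the parts into /<storageaccountname>/<containername>/<blobName>."""
    parts = [storage_account.strip("/"), container.strip("/"), blob_name.lstrip("/")]
    if not all(parts):
        raise ValueError("storage account, container, and blob name are all required")
    return "/" + "/".join(parts)


print(cromwell_input_path("msgenpublicdata", "inputs", "chr21.read1.fq.gz"))
```

The helper only formats the path. It does not check that the storage account is actually mounted on your instance.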
Check out these known issues and mitigation for more commonly seen issues caused by bugs we are actively tracking.
If you are running a task in a workflow that requires a large number of CPU cores, check whether your Batch account has enough resource quota. You can request more quota by following these instructions.
For other resource quotas, like active jobs or pools, if there are not enough resources available, Cromwell on Azure keeps the tasks in queue until resources become available. This may lead to longer wait times for workflow completion.
To get started you can view this Hello World sample, an example WDL to convert FASTQ to UBAM or follow these steps to convert an existing public WDL for other clouds to run on Azure.
There are also links to ready-to-try WDLs for common workflows here
Instructions to write a WDL file for a pipeline from scratch are COMING SOON.
Each task in a workflow starts an Azure Batch VM. To see currently active tasks, navigate to your Azure Batch account connected to Cromwell on Azure on Azure Portal. Click on "Jobs" and then search for the Cromwell workflowId to see all tasks associated with a workflow.
Cosmos DB stores information about all tasks in a workflow. For monitoring or debugging any workflow you may choose to query the database.
Navigate to your Cosmos DB instance on Azure Portal. Click on the "Data Explorer" menu item, then click on the "TES" container and select "Items".
You can write a SQL query to get all tasks that have not completed successfully in a workflow using the following query, replacing workflowId with the id returned from Cromwell for your workflow:
SELECT * FROM c where startswith(c.description,"workflowId") AND c.state != "COMPLETE"
OR
SELECT * FROM c where startswith(c.id,"<first 9 characters of the workflowId>") AND c.state != "COMPLETE"
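To avoid hand-editing the query for every workflow, you could generate both variants from a workflow id. The sketch below is only a convenience for building the query text to paste into Data Explorer; `failed_task_queries` is a hypothetical helper, and it assumes the workflow id is a standard UUID as returned by Cromwell.

```python
import uuid
from typing import Tuple


def failed_task_queries(workflow_id: str) -> Tuple[str, str]:
    """Build the two Cosmos DB queries for tasks that are not COMPLETE."""
    # Parsing as a UUID rejects malformed ids before they reach the query text.
    workflow_id = str(uuid.UUID(workflow_id))
    by_description = (
        f'SELECT * FROM c where startswith(c.description,"{workflow_id}") '
        'AND c.state != "COMPLETE"'
    )
    by_id = (
        f'SELECT * FROM c where startswith(c.id,"{workflow_id[:9]}") '
        'AND c.state != "COMPLETE"'
    )
    return by_description, by_id


# Placeholder id for illustration only.
for query in failed_task_queries("5bd1d24a-0000-4000-8000-000000000000"):
    print(query)
```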
When working with Cromwell on Azure, you may run into issues with Azure Batch or Storage accounts, for instance when a file path cannot be found or a WDL workflow fails for an unknown reason. For these scenarios, consider debugging or collecting more information using Application Insights.
Navigate to your Application Insights instance on Azure Portal. Click on the "Logs (Analytics)" menu item under the "Monitoring" section to get all logs from Cromwell on Azure's TES backend.
You can explore exceptions or logs to find the reason for failure, and use time ranges or Kusto Query Language to narrow your search.
Cromwell utilizes Blob storage containers and Blobfuse to allow your data to be accessed and processed. The Blob Storage Access Tier can have a demonstrable effect on your analysis time, particularly on your initial VM preparation. If you experience this, we would recommend setting your access tier to "Hot" instead of "Cool". You can do this under the "Access Tier" settings in the "Configuration" menu on Azure Portal. NOTE: this only affects users utilizing Gen2 Storage Accounts. All Gen 1 "Standard" blobs are access tier "Hot" by default.
To get logs from all the Docker containers or to use the Cromwell REST API endpoints, you may want to connect to the Linux host VM. At installation, a user is created to allow managing the host VM with username "vmadmin". The password is randomly generated and shown during installation. If you need to reset your VM password, you can do this using the Azure Portal or by following these instructions.
To connect to your host VM, you can either:
- Construct your ssh connection string if you have the VM name:
ssh vmadmin@<hostname>
- Or navigate to the Connect button on the Overview blade of your Azure VM instance, then copy the ssh connection string.
Paste the ssh connection string in a command line, PowerShell or terminal application to log in.
Before deploying, you can choose to customize some input parameters to use existing Azure resources. Example:
.\deploy-cromwell-on-azure.exe --SubscriptionId <Your subscription ID> --RegionName <Your region> --MainIdentifierPrefix <Your string> --VmSize "Standard_D2_v2"
Here is the summary of all configuration parameters:
Configuration parameter | Has default | Validated | Used by update | Comment |
---|---|---|---|---|
string SubscriptionId | N | Y | Y | Azure Subscription Id - Always required |
string RegionName | N | Y | N | Azure region name to deploy to - Required for new install |
string MainIdentifierPrefix = "coa" | Y | Y | N | Prefix for all resources to be deployed - Required to deploy but defaults to "coa" |
string VmOsVersion = "18.04-LTS" | Y | N | N | OS Version of the Linux Ubuntu VM to use as the host - Not required and defaults to Ubuntu 18.04 LTS |
string VmSize = "Standard_D3_v2" | Y | N | N | VM size of the Linux Ubuntu VM to use as the host - Not required and defaults to Standard_D3_v2 |
string VmUsername = "vmadmin" | Y | N | Y | Username created on Cromwell on Azure Linux host - Not required and defaults to "vmadmin" |
string VmPassword | Y | N | Y | Required for update |
string VnetResourceGroupName | Y | Y | N | Available starting version 2.1. The resource group name of the specified virtual network to use - Not required, generated automatically if not provided. If specified, VnetName and SubnetName must be provided. |
string VnetName | Y | Y | N | Available starting version 2.1. The name of the specified virtual network to use - Not required, generated automatically if not provided. If specified, VnetResourceGroupName and SubnetName must be provided. |
string SubnetName | Y | Y | N | Available starting version 2.1. The subnet name of the specified virtual network to use - Not required, generated automatically if not provided. If specified, VnetResourceGroupName and VnetName must be provided. |
string ResourceGroupName | Y | Y | Y | Required for update. If provided for new Cromwell on Azure deployment, it must already exist. |
string BatchAccountName | Y | N | N | The name of the Azure Batch Account to use ; must be in the SubscriptionId and RegionName provided - Not required, generated automatically if not provided |
string StorageAccountName | Y | N | N | The name of the Azure Storage Account to use ; must be in the SubscriptionId provided - Not required, generated automatically if not provided |
string NetworkSecurityGroupName | Y | N | N | The name of the Network Security Group to use; must be in the SubscriptionId provided - Not required, generated automatically if not provided |
string CosmosDbAccountName | Y | N | N | The name of the Cosmos Db Account to use; must be in the SubscriptionId provided - Not required, generated automatically if not provided |
string ApplicationInsightsAccountName | Y | N | N | The name of the Application Insights Account to use; must be in the SubscriptionId provided - Not required, generated automatically if not provided |
string VmName | Y | N | Y | Name of the VM host that is part of the Cromwell on Azure deployment to update - Required for update if multiple VMs exist in the resource group |
string CromwellVersion | Y | N | Y | Cromwell version to use |
bool SkipTestWorkflow = false | Y | Y | Y | Set to true to skip running the default test workflow |
bool Update = false | Y | Y | Y | Set to true if you want to update your existing Cromwell on Azure deployment to the latest version. Required for update |
bool PrivateNetworking = false | Y | Y | N | Available starting version 2.2. Set to true to create the host VM without public IP address. If set, VnetResourceGroupName, VnetName and SubnetName must be provided (and already exist). The deployment must be initiated from a machine that has access to that subnet. |
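Some of the parameters in the table depend on one another, so it can help to check a set of options before invoking the deployer. The sketch below is an assumption-laden illustration, not the deployer's own validation: `build_deployer_args` is a hypothetical helper that encodes only the rules stated in the table (SubscriptionId always required, RegionName required for a new install, ResourceGroupName and VmPassword required for update, and the three virtual network parameters provided together).

```python
from typing import Dict, List

VNET_KEYS = ("VnetResourceGroupName", "VnetName", "SubnetName")


def build_deployer_args(options: Dict[str, object]) -> List[str]:
    """Check the documented parameter rules and return a command-line argument list."""
    if not options.get("SubscriptionId"):
        raise ValueError("SubscriptionId is always required")
    if options.get("Update", False):
        for key in ("ResourceGroupName", "VmPassword"):
            if not options.get(key):
                raise ValueError(f"{key} is required for update")
    elif not options.get("RegionName"):
        raise ValueError("RegionName is required for a new install")
    vnet_given = [key for key in VNET_KEYS if options.get(key)]
    if (options.get("PrivateNetworking") or vnet_given) and len(vnet_given) != len(VNET_KEYS):
        raise ValueError("VnetResourceGroupName, VnetName and SubnetName must be provided together")
    args: List[str] = []
    for key, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # the deployer examples use lowercase booleans
        args += [f"--{key}", str(value)]
    return args


print(build_deployer_args({
    "SubscriptionId": "<Your subscription ID>",
    "RegionName": "<Your region>",
    "VmSize": "Standard_D2_v2",
}))
```

The returned list can be appended to the deployer executable name, for example when calling it through `subprocess.run`.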
To choose a specific Cromwell version, you can specify the version as a configuration parameter before deploying Cromwell on Azure. Here is an example:
.\deploy-cromwell-on-azure.exe --SubscriptionId <Your subscription ID> --RegionName <Your region> --MainIdentifierPrefix <Your string> --CromwellVersion 53
This version will persist through future updates until you set it again or revert to the default behavior by specifying --CromwellVersion "". See note below.
After deployment, you can still change the Cromwell docker image version being used.
Cromwell on Azure version 2.x
Run the deployer in update mode and specify the new Cromwell version.
.\deploy-cromwell-on-azure.exe --Update true --SubscriptionId <Your subscription ID> --ResourceGroupName <Your RG> --VmPassword <Your VM password> --CromwellVersion 54
The new version will persist through future updates until you set it again.
To revert to the default Cromwell version that is shipped with each deployer version, specify --CromwellVersion "".
Be aware of compatibility issues if downgrading the version.
The default version is listed here.
Cromwell on Azure version 1.x
Log on to the host VM using the ssh connection string as described in the instructions. Replace the image name with the tag of your choice for the "cromwell" service in the docker-compose.yml file.
cd /data/cromwellazure/
sudo nano docker-compose.yml
# Modify the cromwell service image name and save the file
For these changes to take effect, be sure to restart your Cromwell on Azure VM through the Azure Portal UI or run sudo reboot. You can also restart the docker containers.
- Add the VM identity as a Contributor to the Storage Account via Azure Portal or Azure CLI.
- Navigate to the "configuration" container in the default storage account. Replace the values below with your Storage Account and Container names and add the line to the end of the containers-to-mount file:
/yourstorageaccountname/yourcontainername
- Save the changes and restart the VM.
This is applicable if the VM and storage account are in different Azure tenants, or if you want to use a SAS token anyway for security reasons.
- Add a SAS URL for your desired container to the end of the containers-to-mount file. The SAS token can be at the account or container level and may be read-only or read-write depending on the usage.
https://<yourstorageaccountname>.blob.core.windows.net:443/<yourcontainername>?<sastoken>
- Save the changes and restart the VM.
In both cases, the specified containers will be mounted as /yourstorageaccountname/yourcontainername/ on the Cromwell server. You can then use /yourstorageaccountname/yourcontainername/path in the trigger, WDL, CWL, inputs and workflow options files.
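Both kinds of containers-to-mount entry follow a fixed pattern, so they can be generated rather than typed. The sketch below shows one way to build an entry and append it to the downloaded file contents before re-uploading the file. `mount_entry` and `append_entry` are hypothetical helper names, and the sample values are placeholders.

```python
from typing import Optional


def mount_entry(storage_account: str, container: str, sas_token: Optional[str] = None) -> str:
    """Build a containers-to-mount line, using the SAS URL form when a token is given."""
    if sas_token is None:
        return f"/{storage_account}/{container}"
    token = sas_token.lstrip("?")
    return f"https://{storage_account}.blob.core.windows.net:443/{container}?{token}"


def append_entry(file_text: str, entry: str) -> str:
    """Add the entry as the last line of the file unless it is already listed."""
    lines = file_text.splitlines()
    if entry not in lines:
        lines.append(entry)
    return "\n".join(lines) + "\n"


existing = "/yourstorageaccountname/yourcontainername\n"
print(append_entry(existing, mount_entry("otheraccount", "inputs", "<sastoken>")), end="")
```

After uploading the edited file to the "configuration" container, the VM still needs to be restarted for the new mount to appear.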
Use a batch account for which I have already requested or received increased cores quota from Azure Support
Log on to the host VM using the ssh connection string as described in the instructions.
Cromwell on Azure version 2.x
Replace the BatchAccountName variable in the env-01-account-names.txt file with the name of the desired batch account and save your changes.
cd /data/cromwellazure/
sudo nano env-01-account-names.txt
# Modify the BatchAccountName to your Batch Account name and save the file
Cromwell on Azure version 1.x
Replace the BatchAccountName environment variable for the "tes" service in the docker-compose.yml file with the name of the desired batch account and save your changes.
cd /data/cromwellazure/
sudo nano docker-compose.yml
# Modify the BatchAccountName to your Batch Account name and save the file
To allow the host VM to use a batch account, add the VM identity as a Contributor to the Azure batch account via Azure Portal or Azure CLI.
To allow the host VM to read prices and information about types of machines available for the batch account, add the VM identity as a Billing Reader to the subscription with the configured Batch Account.
For these changes to take effect, be sure to restart your Cromwell on Azure VM through the Azure Portal UI or run sudo reboot.
Cromwell on Azure supports private Docker images for your WDL tasks hosted on Azure Container Registry (ACR).
To allow the host VM to use an ACR, add the VM identity as a Contributor to the Container Registry via Azure Portal or Azure CLI.
Configure my Cromwell on Azure instance to always use dedicated batch VMs to avoid getting preempted
By default, your workflows will run on low priority Azure batch nodes.
If you prefer to use dedicated Azure Batch nodes for all tasks, do the following:
Cromwell on Azure version 2.x
In the file cromwell-application.conf, in the configuration container in the default storage account, in the backend section, change preemptible: true to preemptible: false. Save your changes and restart the VM.
Note that you can override this setting for each task individually by setting the preemptible boolean flag to true or false in the "runtime" attributes section of your task.
Cromwell on Azure version 1.x
Log on to the host VM using the ssh connection string as described in the instructions. Change the UsePreemptibleVmsOnly environment variable for the "tes" service to "false" in the docker-compose.yml file and save your changes.
cd /data/cromwellazure/
sudo nano docker-compose.yml
# Modify UsePreemptibleVmsOnly to false and save the file
For these changes to take effect, be sure to restart your Cromwell on Azure VM through the Azure Portal UI or run sudo reboot.
Cromwell is run in server mode on the Linux host VM. After logging in to the host VM, it can be accessed via curl as described below:
Get all workflows
curl -X GET "http://localhost:8000/api/workflows/v1/query" -H "accept: application/json"
Get specific workflow's status by id
curl -X GET "http://localhost:8000/api/workflows/v1/{id}/status" -H "accept: application/json"
Get call-caching difference between two workflow calls
curl -X GET "http://localhost:8000/api/workflows/v1/callcaching/diff?workflowA={workflowId1}&callA={workflowName.callName1}&workflowB={workflowId2}&callB={workflowName.callName2}" -H "accept: application/json"
You can perform other Cromwell API calls following a similar pattern. To see all available API endpoints, see Cromwell's REST API here
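The same endpoints can be called from a script running on the host VM. The sketch below uses only the Python standard library and the URLs shown in the curl examples above. The helper names are made up for illustration, and `get_json` will only succeed when run where `localhost:8000` reaches Cromwell.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000/api/workflows/v1"


def status_url(workflow_id: str) -> str:
    """URL for a specific workflow's status."""
    return f"{BASE_URL}/{workflow_id}/status"


def callcaching_diff_url(workflow_a: str, call_a: str, workflow_b: str, call_b: str) -> str:
    """URL for the call-caching difference between two workflow calls."""
    query = urlencode({
        "workflowA": workflow_a,
        "callA": call_a,
        "workflowB": workflow_b,
        "callB": call_b,
    })
    return f"{BASE_URL}/callcaching/diff?{query}"


def get_json(url: str, timeout: float = 30.0):
    """Send a GET request with the JSON accept header and decode the response body."""
    request = Request(url, headers={"accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example (run on the host VM): get_json(status_url("<workflow id>"))
print(status_url("<workflow id>"))
```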
To learn more about your Cromwell on Azure Resource Group's cost, navigate to the "Cost Analysis" menu item in the "Cost Management" section of your Azure Resource Group on the Azure Portal. More information here.
You can also use the Pricing Calculator to estimate your monthly cost.
VM price data is used to select the cheapest per-hour VM that meets a task's runtime requirements, and is also stored in the TES database to allow calculation of total workflow cost. VM price data is obtained from the Azure RateCard API. Accessing the Azure RateCard API requires the Billing Reader role to be assigned to the VM's identity at the Azure subscription scope. If you don't have the Owner role, or both the Contributor and User Access Administrator roles, assigned to your Azure subscription, the deployer will not be able to complete this on your behalf, and you will need to contact your Azure subscription administrator(s) to complete this for you. You will see a warning in the TES logs indicating that default VM prices are being used until this is resolved.
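As a rough illustration of the selection rule, the sketch below picks the lowest hourly price among VM sizes that satisfy a task's request. It is not the TES implementation: the `VmOption` fields, the assumption that the request covers cores, memory, disk, and preemptibility, and every entry in the sample catalogue are hypothetical placeholders rather than Azure data.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class VmOption:
    name: str
    cores: int
    memory_gb: float
    disk_gb: float
    low_priority: bool
    price_per_hour: float


def cheapest_vm(options: Iterable[VmOption], cores: int, memory_gb: float,
                disk_gb: float, preemptible: bool) -> Optional[VmOption]:
    """Return the cheapest option that meets every requirement, or None if none does."""
    eligible = [
        vm for vm in options
        if vm.low_priority == preemptible
        and vm.cores >= cores
        and vm.memory_gb >= memory_gb
        and vm.disk_gb >= disk_gb
    ]
    return min(eligible, key=lambda vm: vm.price_per_hour, default=None)


# Hypothetical catalogue: names, sizes, and prices are placeholders.
CATALOGUE = [
    VmOption("vm-small", 2, 8, 50, True, 0.03),
    VmOption("vm-medium", 4, 16, 100, True, 0.06),
    VmOption("vm-large", 8, 32, 200, True, 0.12),
    VmOption("vm-medium-dedicated", 4, 16, 100, False, 0.25),
]

choice = cheapest_vm(CATALOGUE, cores=3, memory_gb=12, disk_gb=80, preemptible=True)
print(choice.name if choice else "no VM size satisfies the request")
```

A task that asks for slightly more than one size offers is bumped to the next size up, which is why trimming requests to fit a smaller VM can lower cost.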
This section is COMING SOON.
The host VM is running multiple Docker containers that enable Cromwell on Azure - mysql, broadinstitute/cromwell, cromwellonazure/tes, cromwellonazure/triggerservice. On rare occasions, you may want to debug and diagnose issues with the Docker containers. After logging in to the host VM, run:
sudo docker ps
This command will list the names of all the Docker containers currently running. To get logs for a particular container, run:
sudo docker logs 'containerName'
To ensure that no data is corrupted for MySQL backed storage for Cromwell, Cromwell on Azure mounts MySQL files on to an Azure Managed Data Disk of size 32G. In case there is a need to increase the size of this data disk, follow instructions here.
Running workflows written in the Common Workflow Language (CWL) format is possible with a few modifications to your workflow submission.
For CWL workflows, all CWL resource keywords are supported, plus preemptible (not in the CWL spec). preemptible defaults to true (set in the Cromwell configuration file), so use preemptible only if setting it to false (to run on a dedicated machine). TES keywords are also supported in CWL workflows, but we advise users to use the CWL ones.
CWL keywords: (CWL workflows only)
coresMin: number
ramMin: size in MB
tmpdirMin: size in MB - Cromwell on Azure version 2.0 and above only
outdirMin: size in MB - Cromwell on Azure version 2.0 and above only
(the final disk size is the sum of the tmpdirMin and outdirMin values)
TES keywords: (both CWL and WDL workflows)
preemptible: true|false
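The disk-size rule for the CWL keywords above is simple arithmetic. The sketch below works it through for a sample resource request; the helper name and the sample values are illustrative, and treating a missing keyword as 0 is an assumption.

```python
from typing import Mapping


def requested_disk_mb(resources: Mapping[str, float]) -> float:
    """Sum tmpdirMin and outdirMin (both in MB); a missing keyword counts as 0."""
    return resources.get("tmpdirMin", 0) + resources.get("outdirMin", 0)


# Sample CWL resource keywords with illustrative values.
requirement = {"coresMin": 4, "ramMin": 8192, "tmpdirMin": 10240, "outdirMin": 20480}
print(requested_disk_mb(requirement))  # 30720
```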
Cromwell on Azure version 1.x known issue for CWL files: Cannot request specific HDD size. Unfortunately, this is a bug in how Cromwell currently parses CWL files, and thus it must be addressed in the Cromwell source code directly.
The current workaround is to increase the number of vCPUs or the memory requested for a task, which will indirectly increase the amount of working disk space available. However, because this may cause inconsistent performance, we advise converting your workflow to the WDL format if you are running a task that might consume a large amount of local scratch space.