Commit

[TalkingAvatar] Add sample code for TTS talking avatar real-time API (#2133)

* [TalkingAvatar] Add sample code for TTS talking avatar real-time API

* Sample code for batch avatar synthesis

* Address repository check failure

* update

---------

Co-authored-by: Yulin Li <[email protected]>
yinhew and Yulin Li authored Nov 14, 2023
1 parent cee450a commit d56bdd1
Showing 11 changed files with 1,401 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -18,6 +18,7 @@ proguard-rules.pro text
 *.cpp text
 *.cs text
 *.csproj text
+*.css text
 *.editorconfig text
 *.entitlements text
 *.go text
16 changes: 16 additions & 0 deletions samples/batch-avatar/README.md
@@ -0,0 +1,16 @@
# Examples to use Batch Avatar Synthesis

The Batch Avatar Synthesis API (Preview) provides asynchronous synthesis of a talking avatar, generating avatar video content from text input.
The functionality is exposed through a REST API and is easy to access from many programming languages. The samples here do **NOT** require installation of the Cognitive Services Speech SDK; they use the REST API directly instead.

For a detailed explanation, see the [batch synthesis documentation](https://docs.microsoft.com/azure/cognitive-services/speech-service/batch-synthesis) and the `README.md` in the language-specific subdirectories.

Available samples:

| Language | Directory | Description |
| ---------- | -------- | ----------- |
| Python | [python](python) | Python client calling batch avatar synthesis REST API |

## Note

Refer to [this README](../js/browser/avatar/README.md) for real-time avatar synthesis.
27 changes: 27 additions & 0 deletions samples/batch-avatar/python/README.md
@@ -0,0 +1,27 @@
# How to use the Speech Services Batch Avatar Synthesis API from Python

## Install dependencies

The sample uses the `requests` library. You can install it with the following command:

```sh
pip install requests
```

## Run the sample code

The sample code itself is [synthesis.py](synthesis.py) and can be run using Python 3.8 or higher.
You will need to provide the following information to run the sample:


1. Your Cognitive Services subscription key and region.

Some notes:

- You can get the subscription key from the "Keys and Endpoint" tab on your Cognitive Services or Speech resource in the Azure Portal.
- Batch avatar synthesis is only available for paid subscriptions; free subscriptions are not supported.
- Batch avatar synthesis is only available in these service regions: `West US 2`, `West Europe`, and `Southeast Asia`.

2. (Optional) The mapping between custom voice names and deployment IDs, if you want to use custom voices.
3. (Optional) The URI of a writable Azure blob container, if you want to store the synthesized output files in your own Azure storage.
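
For illustration, here is a minimal sketch of how these two optional settings might appear in the request payload built by [synthesis.py](synthesis.py). The `customVoices` field is taken from the sample itself; the `destinationContainerUrl` property name is an assumption based on the batch synthesis API and may differ:

```python
# Hypothetical fragment of the batch avatar synthesis request payload.
payload = {
    # ... required fields such as 'displayName', 'inputs' and 'synthesisConfig' ...

    # Optional: map each custom voice name to its deployment (endpoint) ID.
    'customVoices': {
        'YOUR_CUSTOM_VOICE_NAME': 'YOUR_DEPLOYMENT_ID',
    },

    'properties': {
        # Assumed property name: SAS URI of a writable blob container where
        # the service stores the synthesized results.
        'destinationContainerUrl': 'https://<account>.blob.core.windows.net/<container>?<sas-token>',
    },
}
```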

You can use a development environment like PyCharm or VS Code to edit, debug, and execute the sample.
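
For example, from a shell you can provide the key and region through the environment variables that the script reads, and then run it (the region value here is just an example):

```sh
export SPEECH_KEY=your-speech-resource-key
export SPEECH_REGION=westus2
python synthesis.py
```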
121 changes: 121 additions & 0 deletions samples/batch-avatar/python/synthesis.py
@@ -0,0 +1,121 @@
#!/usr/bin/env python
# coding: utf-8

# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license. See LICENSE.md file in the project root for full license information.

import json
import logging
import os
import sys
import time
from pathlib import Path

import requests

logging.basicConfig(stream=sys.stdout, level=logging.INFO,  # set to logging.DEBUG for verbose output
                    format="[%(asctime)s] %(message)s", datefmt="%m/%d/%Y %I:%M:%S %p %Z")
logger = logging.getLogger(__name__)

# Your Speech resource key and region
# This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"

SUBSCRIPTION_KEY = os.environ.get('SPEECH_KEY')
SERVICE_REGION = os.environ.get('SPEECH_REGION')

NAME = "Simple avatar synthesis"
DESCRIPTION = "Simple avatar synthesis description"

# The service host suffix.
SERVICE_HOST = "customvoice.api.speech.microsoft.com"


def submit_synthesis():
    url = f'https://{SERVICE_REGION}.{SERVICE_HOST}/api/texttospeech/3.1-preview1/batchsynthesis/talkingavatar'
    header = {
        'Ocp-Apim-Subscription-Key': SUBSCRIPTION_KEY,
        'Content-Type': 'application/json'
    }

    payload = {
        'displayName': NAME,
        'description': DESCRIPTION,
        'textType': 'PlainText',
        'synthesisConfig': {
            'voice': 'en-US-JennyNeural',
        },
        # Replace with your custom voice name and deployment ID if you want to use a custom voice.
        # Multiple voices are supported, and mixing custom voices with platform voices is allowed.
        # An invalid voice name or deployment ID will be rejected.
        'customVoices': {
            # 'YOUR_CUSTOM_VOICE_NAME': 'YOUR_CUSTOM_VOICE_ID'
        },
        'inputs': [
            {
                'text': "Hi, I'm a virtual assistant created by Microsoft.",
            },
        ],
        'properties': {
            'customized': False,  # set to True if you want to use a customized avatar
            'talkingAvatarCharacter': 'lisa',  # talking avatar character
            'talkingAvatarStyle': 'graceful-sitting',  # talking avatar style; required for prebuilt avatar, optional for custom avatar
            'videoFormat': 'webm',  # mp4 or webm; webm is required for transparent background
            'videoCodec': 'vp9',  # hevc, h264 or vp9; vp9 is required for transparent background; default is hevc
            'subtitleType': 'soft_embedded',
            'backgroundColor': 'transparent',
        }
    }

    response = requests.post(url, json.dumps(payload), headers=header)
    if response.status_code < 400:
        logger.info('Batch avatar synthesis job submitted successfully')
        logger.info(f'Job ID: {response.json()["id"]}')
        return response.json()['id']
    else:
        logger.error(f'Failed to submit batch avatar synthesis job: {response.text}')


def get_synthesis(job_id):
    url = f'https://{SERVICE_REGION}.{SERVICE_HOST}/api/texttospeech/3.1-preview1/batchsynthesis/talkingavatar/{job_id}'
    header = {
        'Ocp-Apim-Subscription-Key': SUBSCRIPTION_KEY
    }
    response = requests.get(url, headers=header)
    if response.status_code < 400:
        logger.debug('Get batch synthesis job successfully')
        logger.debug(response.json())
        if response.json()['status'] == 'Succeeded':
            logger.info(f'Batch synthesis job succeeded, download URL: {response.json()["outputs"]["result"]}')
        return response.json()['status']
    else:
        logger.error(f'Failed to get batch synthesis job: {response.text}')


def list_synthesis_jobs(skip: int = 0, top: int = 100):
    """List all batch synthesis jobs in the subscription"""
    url = f'https://{SERVICE_REGION}.{SERVICE_HOST}/api/texttospeech/3.1-preview1/batchsynthesis/talkingavatar?skip={skip}&top={top}'
    header = {
        'Ocp-Apim-Subscription-Key': SUBSCRIPTION_KEY
    }
    response = requests.get(url, headers=header)
    if response.status_code < 400:
        logger.info(f'List batch synthesis jobs successfully, got {len(response.json()["values"])} jobs')
        logger.info(response.json())
    else:
        logger.error(f'Failed to list batch synthesis jobs: {response.text}')


if __name__ == '__main__':
    job_id = submit_synthesis()
    if job_id is not None:
        while True:
            status = get_synthesis(job_id)
            if status == 'Succeeded':
                logger.info('batch avatar synthesis job succeeded')
                break
            elif status == 'Failed':
                logger.error('batch avatar synthesis job failed')
                break
            else:
                logger.info(f'batch avatar synthesis job is still running, status [{status}]')
                time.sleep(5)
74 changes: 74 additions & 0 deletions samples/js/browser/avatar/README.md
@@ -0,0 +1,74 @@
# Instructions to run Microsoft Azure TTS Talking Avatar sample code

## Basic Sample

This sample demonstrates the basic usage of Azure text-to-speech avatar real-time API.

* Step 1: Run the sample code by opening `basic.html` in a browser.

* Step 2: Fill in or select the following information:
    * Azure Speech Resource
        * Region - the region of your Azure speech resource.
        * Subscription Key - the subscription key of your Azure speech resource.
    * ICE Server
        * URL - the ICE server URL for WebRTC, e.g. `turn:relay.communication.microsoft.com:3478`. You can get an ICE server from ACS ([Azure Communication Services](https://learn.microsoft.com/azure/communication-services/overview)): first follow [Create communication resource](https://learn.microsoft.com/azure/communication-services/quickstarts/create-communication-resource?tabs=windows&pivots=platform-azp) to create an ACS resource, then follow [Getting the relay configuration](https://learn.microsoft.com/azure/communication-services/quickstarts/relay-token?pivots=programming-language-python#getting-the-relay-configuration) to get the ICE server URL, username, and credential (see the tip at the end of this document for a way to fetch these programmatically). For the ICE server URL, make sure to use the `turn:` prefix, not `stun:`.
        * IceServerUsername - the username of the ICE server, provided together with the ICE server URL (see above).
        * IceServerCredential - the credential (password) of the ICE server, provided together with the ICE server URL (see above).
    * TTS Configuration
        * TTS Voice - the voice for TTS. Here is the [list of available TTS voices](https://learn.microsoft.com/azure/ai-services/speech-service/language-support?tabs=tts#supported-languages).
        * Custom Voice Deployment ID (Endpoint ID) - the deployment ID (also called endpoint ID) of your custom voice. If you are not using a custom voice, leave it empty.
    * Avatar Configuration
        * Avatar Character - the character of the avatar. By default it is 'lisa'; you can update this value to use a different avatar.
        * Avatar Style - the style of the avatar. You can update this value to use a different avatar style. This parameter is optional for a custom avatar.
        * Background Color - the background color of the avatar video.
        * Custom Avatar - check this if you are using a custom avatar.
        * Transparent Background - check this if you want a transparent background for the avatar. When this is checked, the background color of the video stream from the server side is automatically set to green (#00FF00FF), and the JavaScript on the client side (see the `makeBackgroundTransparent` function in main.js) does real-time matting by replacing the green color with transparency (a rough sketch of this per-pixel rule appears after the steps below).
        * Video Crop - by checking this, you can crop the video stream from the server side to a smaller size. This is useful when you want to place the avatar video in a custom rectangular area.

* Step 3: Click the `Start Session` button to set up the video connection with the Azure TTS Talking Avatar service. If everything goes well, you should see a live video of an avatar on the web page.

* Step 4: Type some text in the `Spoken Text` text box and click the `Speak` button to send the text to the Azure TTS Talking Avatar service. The service synthesizes the text into talking avatar video and streams the video back to the browser, which plays it. You should see the avatar speaking the text you typed, with the voice synchronized to its mouth movement.

* Step 5: You can either continue typing text in the `Spoken Text` text box and have the avatar speak it by clicking the `Speak` button, or click the `Stop Session` button to stop the video connection with the Azure TTS Talking Avatar service. After stopping, you can click `Start Session` to start a new video connection.
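
The real-time matting mentioned in the `Transparent Background` option above is implemented in JavaScript in main.js (`makeBackgroundTransparent`). As a language-neutral illustration of the same per-pixel idea, here is a rough Python/NumPy sketch; the chroma-key rule and threshold are assumptions, and the actual sample's logic may differ:

```python
import numpy as np

def make_background_transparent(frame_rgb: np.ndarray) -> np.ndarray:
    """Turn the green-screen background of an RGB video frame into transparency.

    frame_rgb: uint8 array of shape (height, width, 3) with a solid green
    (#00FF00) background, as produced when 'Transparent Background' is enabled.
    Returns a uint8 RGBA array of shape (height, width, 4).
    """
    r = frame_rgb[..., 0].astype(np.int16)
    g = frame_rgb[..., 1].astype(np.int16)
    b = frame_rgb[..., 2].astype(np.int16)

    # Assumed chroma-key rule: a pixel is "background" when green strongly
    # dominates both other channels; 100 is an illustrative threshold.
    background = (g - r > 100) & (g - b > 100)

    # Fully transparent where background, fully opaque elsewhere.
    alpha = np.where(background, 0, 255).astype(np.uint8)
    return np.dstack([frame_rgb, alpha])
```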

## Chat Sample

This sample demonstrates the chat scenario, with integration of Azure speech-to-text, Azure OpenAI, and Azure text-to-speech avatar real-time API.

* Step 1: Run the sample code by opening `chat.html` in a browser.

* Step 2: Fill in or select the following information:
    * Azure Speech Resource
        * Region - the region of your Azure speech resource.
        * Subscription Key - the subscription key of your Azure speech resource.
    * Azure OpenAI Resource
        * Endpoint - the endpoint of your Azure OpenAI resource, e.g. https://your-openai-resource-name.openai.azure.com/, which can be found in the `Keys and Endpoint` section of your Azure OpenAI resource in the Azure portal.
        * API Key - the API key of your Azure OpenAI resource, which can be found in the `Keys and Endpoint` section of your Azure OpenAI resource in the Azure portal.
        * Deployment Name - the name of your Azure OpenAI model deployment, which can be found in the `Model deployments` section of your Azure OpenAI resource in the Azure portal.
        * Enable BYOD (Bring Your Own Data) - check this if you want to use your own data to constrain the chat. If you check this, you need to fill in the `Azure Cognitive Search Resource` section below (a hedged sketch of the resulting request shape appears after the steps below).
    * Azure Cognitive Search Resource - if you want to constrain the chat to your own data, follow [Quickstart: Chat with Azure OpenAI models using your own data](https://learn.microsoft.com/azure/cognitive-services/openai/use-your-data-quickstart?pivots=programming-language-studio) to create your data source, and then fill in the following information:
        * Endpoint - the endpoint of your Azure Cognitive Search resource, e.g. https://your-cogsearch-resource-name.search.windows.net/, which can be found in the `Overview` section of your Azure Cognitive Search resource in the Azure portal, in the `Essentials -> Url` field.
        * API Key - the API key of your Azure Cognitive Search resource, which can be found in the `Keys` section of your Azure Cognitive Search resource in the Azure portal. Make sure to use the `Admin Key`, not a `Query Key`.
        * Index Name - the name of your Azure Cognitive Search index, which can be found in the `Indexes` section of your Azure Cognitive Search resource in the Azure portal.
    * ICE Server
        * URL - the ICE server URL for WebRTC, e.g. `turn:relay.communication.microsoft.com:3478`. You can get an ICE server from ACS ([Azure Communication Services](https://learn.microsoft.com/azure/communication-services/overview)): first follow [Create communication resource](https://learn.microsoft.com/azure/communication-services/quickstarts/create-communication-resource?tabs=windows&pivots=platform-azp) to create an ACS resource, then follow [Getting the relay configuration](https://learn.microsoft.com/azure/communication-services/quickstarts/relay-token?pivots=programming-language-python#getting-the-relay-configuration) to get the ICE server URL, username, and credential (see the tip at the end of this document for a way to fetch these programmatically). For the ICE server URL, make sure to use the `turn:` prefix, not `stun:`.
        * IceServerUsername - the username of the ICE server, provided together with the ICE server URL (see above).
        * IceServerCredential - the credential (password) of the ICE server, provided together with the ICE server URL (see above).
    * STT / TTS Configuration
        * STT Locale - the locale for STT. Here is the [list of available STT languages](https://learn.microsoft.com/azure/ai-services/speech-service/language-support?tabs=stt#supported-languages).
        * TTS Voice - the voice for TTS. Here is the [list of available TTS voices](https://learn.microsoft.com/azure/ai-services/speech-service/language-support?tabs=tts#supported-languages).
        * Custom Voice Deployment ID (Endpoint ID) - the deployment ID (also called endpoint ID) of your custom voice. If you are not using a custom voice, leave it empty.
    * Avatar Configuration
        * Avatar Character - the character of the avatar. By default it is 'lisa'; you can update this value to use a different avatar.
        * Avatar Style - the style of the avatar. You can update this value to use a different avatar style. This parameter is optional for a custom avatar.
        * Custom Avatar - check this if you are using a custom avatar.

* Step 3: Click the `Open Video Connection` button to set up the video connection with the Azure TTS Talking Avatar service. If everything goes well, you should see a live video of an avatar on the web page.

* Step 4: Click the `Start Microphone` button to start the microphone (make sure to allow microphone access when the browser prompts you), and then start chatting with the avatar by voice. The chat history (the text of what you said and the response text from the Azure OpenAI chat API) is shown beside the avatar, and the avatar speaks the response aloud.

* Step 5: If you want to clear the chat history and start a new round of chat, click the `Clear Chat History` button. To stop the avatar service, click the `Close Video Connection` button to close the connection with the avatar service.
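
For reference, when `Enable BYOD` is checked, the chat request is sent to the Azure OpenAI "chat with your own data" extensions endpoint with an `AzureCognitiveSearch` data source. The sample itself does this in JavaScript; the following is a hedged Python sketch, where the API version and field names are assumptions based on the 2023 preview API and may have changed since:

```python
import requests

def chat_with_own_data(aoai_endpoint, aoai_key, deployment,
                       search_endpoint, search_key, index_name, user_text):
    # Assumed preview endpoint and payload shape for Azure OpenAI "on your data".
    url = (f'{aoai_endpoint}/openai/deployments/{deployment}'
           f'/extensions/chat/completions?api-version=2023-08-01-preview')
    payload = {
        # Constrain answers to your own Azure Cognitive Search index.
        'dataSources': [{
            'type': 'AzureCognitiveSearch',
            'parameters': {
                'endpoint': search_endpoint,
                'key': search_key,
                'indexName': index_name,
            },
        }],
        'messages': [{'role': 'user', 'content': user_text}],
    }
    response = requests.post(url, json=payload, headers={'api-key': aoai_key})
    return response.json()['choices'][0]['message']['content']
```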

## Additional Tips

* For the chat sample, you can edit the text in the `System Prompt` text box to preset the context for the chat API. The chat API then generates responses based on this context.
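* Instead of copying the ICE server URL, username, and credential from the portal by hand, you can fetch them programmatically from your ACS resource. Below is a minimal sketch, assuming the `azure-communication-networktraversal` Python package from the linked relay-token quickstart; package and method names may have changed since:

```python
from azure.communication.identity import CommunicationIdentityClient
from azure.communication.networktraversal import CommunicationRelayClient

# Assumed placeholder: the connection string of your ACS resource.
connection_string = 'endpoint=https://<acs-resource>.communication.azure.com/;accesskey=<key>'

identity_client = CommunicationIdentityClient.from_connection_string(connection_string)
relay_client = CommunicationRelayClient.from_connection_string(connection_string)

user = identity_client.create_user()
config = relay_client.get_relay_configuration(user=user)

for ice_server in config.ice_servers:
    # Use the entry whose URL starts with 'turn:' in the sample's ICE Server fields.
    print(ice_server.urls, ice_server.username, ice_server.credential)
```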