Tencent Cloud

Speech Recognition

Last updated: 2024-03-01 14:57:12

Overview
This document describes how to use the SDK to create, delete, query, and modify speech recognition templates in Cloud Infinite (CI).
API	Operation	Description
Creating automatic speech recognition template	Creates a template	Creating a template
Deleting automatic speech recognition template	Deletes a template	Deleting a template
DescribeTemplates	Queries templates	Querying the list of templates
Updating automatic speech recognition template	Modifies a template	Modifying a template
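The sample requests on this page call methods on a `client` object and pass a `bucket_name` variable, neither of which is defined in the samples themselves. The sketch below shows one way they might be set up, assuming the `cos-python-sdk-v5` package (module `qcloud_cos`) is installed. The environment variable names (`COS_SECRET_ID`, `COS_SECRET_KEY`, `COS_REGION`, `COS_BUCKET`) and both helper functions are hypothetical and not part of the SDK.

```python
import os


def load_cos_settings(env=None):
    """Collect client settings from environment variables.

    The variable names used here are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    names = ("COS_SECRET_ID", "COS_SECRET_KEY", "COS_REGION", "COS_BUCKET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "SecretId": env["COS_SECRET_ID"],
        "SecretKey": env["COS_SECRET_KEY"],
        "Region": env["COS_REGION"],
        "Bucket": env["COS_BUCKET"],  # expected in BucketName-APPID form
    }


def make_client(settings):
    """Build a CosS3Client from the settings returned by load_cos_settings."""
    # Imported lazily so the settings helper works without the SDK installed.
    from qcloud_cos import CosConfig, CosS3Client

    config = CosConfig(
        Region=settings["Region"],
        SecretId=settings["SecretId"],
        SecretKey=settings["SecretKey"],
    )
    return CosS3Client(config)


# Typical use before running the samples on this page:
#   settings = load_cos_settings()
#   client = make_client(settings)
#   bucket_name = settings["Bucket"]
```

Reading credentials from the environment keeps them out of source files; any other secure configuration source would work equally well.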
Basic Operations
Creating template
Feature description
This API is used to create a template.
Method prototype
def ci_create_asr_template(self, Bucket, Name, EngineModelType, ChannelNum,
                           ResTextFormat, FilterDirty=0, FilterModal=0, ConvertNumMode=0, SpeakerDiarization=0,
                           SpeakerNumber=0, FilterPunc=0, OutputFileType='txt', **kwargs)
Parameter description
Request has the following sub-nodes:
Node Name (Keyword)	Description	Type	Required
Bucket	Bucket name	String	Yes
Name	Template name, which can contain letters, digits, underscores (_), hyphens (-), and asterisks (*).	String	Yes
EngineModelType	Engine model type, divided into phone call and non-phone call scenarios. Phone call scenarios: 8k_zh: 8 kHz, for Mandarin in general scenarios (available for dual-channel audio). 8k_zh_s: 8 kHz, for Mandarin with speaker separation (available for mono-channel audio only). 8k_en: 8 kHz, for English. Non-phone call scenarios: 16k_zh: 16 kHz, for Mandarin in general scenarios. 16k_zh_video: 16 kHz, for audio/video scenarios. 16k_en: 16 kHz, for English. 16k_ca: 16 kHz, for Cantonese. 16k_ja: 16 kHz, for Japanese. 16k_zh_edu: For Mandarin in education scenarios. 16k_en_edu: For English in education scenarios. 16k_zh_medical: For healthcare scenarios. 16k_th: For Thai. 16k_zh_dialect: Multi-dialect, for up to 23 dialects.	String	Yes
ChannelNum	Number of sound channels. 1: Mono. If EngineModelType is not a phone call engine, only mono is supported. 2: Dual (for the 8k_zh engine only, where the two channels correspond to the caller and callee respectively).	int	No
ResTextFormat	Format of the returned recognition result. 0: Recognition result text, including the list of segment timestamps. 1: Detailed word-level recognition result, excluding punctuation marks but including the speech speed value (the list of word timestamps, generally used to generate subtitles). 2: Detailed word-level recognition result, including punctuation marks and the speech speed value.	int	Yes
FilterDirty	Whether to filter restricted words (for the Mandarin engine only). 0: Does not filter. 1: Filters. 2: Replaces restricted words with *. Default value: 0.	int	No
FilterModal	Whether to filter modal particles (for the Mandarin engine only). 0: Does not filter. 1: Filters partially. 2: Filters strictly. Default value: 0.	int	No
ConvertNumMode	Whether to intelligently convert Chinese numbers to Arabic numerals (for the Mandarin engine only). 0: Directly outputs Chinese numbers. 1: Intelligently converts based on the scenario. 3: Enables mathematical number conversion. Default value: 0.	int	No
SpeakerDiarization	Whether to enable speaker separation. 0: No. 1: Yes (for mono-channel audio with the 8k_zh, 16k_zh, or 16k_zh_video engine only). Default value: 0. Note: In the 8 kHz phone call scenario, we recommend you use dual channels to distinguish between the caller and callee by setting ChannelNum=2, so you don't need to enable speaker separation.	int	No
SpeakerNumber	Number of speakers to be separated (with speaker separation enabled). Value range: 0–10. 0: Automatic separation (currently for six or fewer people only). 1–10: Specified number of speakers to be separated. Default value: 0.	int	No
FilterPunc	Whether to filter punctuation marks (currently for the Mandarin engine only). 0: Does not filter. 1: Filters the punctuation mark at the end of the sentence. 2: Filters all punctuation marks. Default value: 0.	int	No
OutputFileType	Output file type. Valid values: txt (default), srt.	String	No
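Several of the parameters above constrain one another (for example, dual channels are limited to the 8k_zh engine, and speaker separation is limited to mono audio on three engines). As a rough illustration, the sketch below checks those documented constraints locally before a request is sent. The function `check_asr_template_args` is hypothetical and not part of the SDK, and it covers only the rules stated in the table; the service may enforce others.

```python
# Engines that support speaker separation, per the SpeakerDiarization row.
DIARIZATION_ENGINES = {"8k_zh", "16k_zh", "16k_zh_video"}


def check_asr_template_args(EngineModelType, ChannelNum, ResTextFormat,
                            SpeakerDiarization=0, SpeakerNumber=0,
                            OutputFileType="txt"):
    """Return a list of problems; an empty list means no documented rule is broken."""
    problems = []
    if ChannelNum not in (1, 2):
        problems.append("ChannelNum must be 1 or 2")
    elif ChannelNum == 2 and EngineModelType != "8k_zh":
        problems.append("ChannelNum=2 is supported by the 8k_zh engine only")
    if ResTextFormat not in (0, 1, 2):
        problems.append("ResTextFormat must be 0, 1, or 2")
    if SpeakerDiarization not in (0, 1):
        problems.append("SpeakerDiarization must be 0 or 1")
    elif SpeakerDiarization == 1:
        if EngineModelType not in DIARIZATION_ENGINES:
            problems.append("speaker separation requires 8k_zh, 16k_zh, or 16k_zh_video")
        if ChannelNum != 1:
            problems.append("speaker separation requires mono audio (ChannelNum=1)")
    if not 0 <= SpeakerNumber <= 10:
        problems.append("SpeakerNumber must be between 0 and 10")
    if OutputFileType not in ("txt", "srt"):
        problems.append("OutputFileType must be 'txt' or 'srt'")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every inconsistent setting at once.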
Sample request
def ci_create_asr_template():
    # Create a speech recognition template
    response = client.ci_create_asr_template(
        Bucket=bucket_name,
        Name='templateName',
        EngineModelType='16k_zh',
        ChannelNum=1,
        ResTextFormat=2,
    )
    print(response)
    return response
Response description
{
    'RequestId': 'NjMyMjliMWZfZWM0YTYyNjRfNWNmNF8xMDBh', 
    'Template': {
        'TemplateId': 't1c1287c04c147443da0b2cc7b8fbabf32', 
        'Name': 'templateName', 
        'State': 'Normal', 
        'Tag': 'SpeechRecognition', 
        'CreateTime': '2022-09-15T11:25:19+0800', 
        'UpdateTime': '2022-09-15T11:25:19+0800', 
        'BucketId': 'testpic-1253960454', 
        'Category': 'Custom', 
        'SpeechRecognition': {
            'EngineModelType': '16k_zh', 
            'ChannelNum': '1', 
            'ResTextFormat': '2', 
            'FilterDirty': '0', 
            'FilterModal': '0', 
            'ConvertNumMode': '0', 
            'SpeakerDiarization': '0', 
            'SpeakerNumber': '0', 
            'FilterPunc': '0', 
            'OutputFileType': 'txt'
        }
    }
}
For more information on response fields, see the response in Creating Speech Recognition Template.
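In the sample response above, numeric settings such as ChannelNum come back as strings ('1', '0'). The sketch below shows one way to pull out the template ID and convert those values to integers for later use. `summarize_template` is a hypothetical helper, and it assumes the response has the shape shown in the sample.

```python
def summarize_template(template):
    """Flatten a 'Template' dict into its ID, name, and typed settings."""
    settings = {}
    for key, value in template.get("SpeechRecognition", {}).items():
        # Convert digit-only strings such as '0' to int; leave values
        # like '16k_zh' or 'txt' untouched.
        if isinstance(value, str) and value.isdigit():
            settings[key] = int(value)
        else:
            settings[key] = value
    return {
        "TemplateId": template["TemplateId"],
        "Name": template["Name"],
        "Settings": settings,
    }


# Typical use with the create call shown earlier:
#   summary = summarize_template(response["Template"])
#   template_id = summary["TemplateId"]
```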
Deleting template
Feature description
This API is used to delete a template.
Method prototype
def ci_delete_asr_template(self, Bucket, TemplateId, **kwargs)
Parameter description
Parameter	Description	Type	Required
Bucket	Bucket name in the format of BucketName-APPID. For more information, see Bucket Overview.	String	Yes
TemplateId	ID of the template to be deleted	String	Yes
Sample request
def ci_delete_asr_template():
    # Delete the specified speech recognition template
    response = client.ci_delete_asr_template(
        Bucket=bucket_name,
        TemplateId='t1bdxxxxxxxxxxxxxxxxx94a9',
    )
    print(response)
    return response
Response description
{
    'RequestId': 'NjMyMjlkZmRfZWM0YTYyNjRfNWNmNF8xMDBi', 
    'TemplateId': 't1c1287c04c147443da0b2cc7b8fbabf32'
}
For more information on response fields, see the response in DeleteTemplate.
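When several templates need to be removed, the single-template call can be wrapped in a loop. The sketch below is a minimal illustration, assuming the response carries a 'TemplateId' key as in the sample above. `delete_asr_templates` is a hypothetical helper; it does not catch errors, so an exception raised by the SDK for one ID stops the loop.

```python
def delete_asr_templates(client, bucket, template_ids):
    """Delete each template in turn and return the IDs echoed by the service."""
    deleted = []
    for template_id in template_ids:
        response = client.ci_delete_asr_template(
            Bucket=bucket,
            TemplateId=template_id,
        )
        # Fall back to the requested ID if the response omits the field.
        deleted.append(response.get("TemplateId", template_id))
    return deleted
```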
Querying template list
Feature description
This API is used to query the template list.
Method prototype
def ci_get_asr_template(self, Bucket, Category='Custom', Ids='', Name='', PageNumber=1, PageSize=10, **kwargs)
Parameter description
Parameter	Description	Type	Required
Bucket	Bucket name in the format of BucketName-APPID. For more information, see Bucket Overview.	String	Yes
Category	Template category: Custom or Official. Default value: Custom.	String	No
Ids	Template ID. To specify multiple IDs, separate them with commas.	String	No
Name	Template name prefix	String	No
PageNumber	Page number	Integer	No
PageSize	Number of entries per page	Integer	No
Sample request
def ci_get_asr_template():
    # Get the information of speech recognition templates
    response = client.ci_get_asr_template(
        Bucket=bucket_name,
    )
    print(response)
    return response
Response description
{
    'TotalCount': '1', 
    'RequestId': 'NjMyMjljNTlfMTIwNjUzMDlfMmUzYV8xMWNh', 
    'PageNumber': '1', 
    'PageSize': '10', 
    'TemplateList': [
        {
            'TemplateId': 't1c1287c04c147443da0b2cc7b8fbabf32', 
            'Name': 'templateName', 
            'State': 'Normal', 
            'Tag': 'SpeechRecognition', 
            'CreateTime': '2022-09-15T11:25:19+0800', 
            'UpdateTime': '2022-09-15T11:25:19+0800', 
            'BucketId': 'testpic-1253960454', 
            'Category': 'Custom', 
            'SpeechRecognition': {
                'EngineModelType': '16k_zh', 
                'ChannelNum': '1', 
                'ResTextFormat': '2', 
                'FilterDirty': '0', 
                'FilterModal': '0', 
                'ConvertNumMode': '0', 
                'SpeakerDiarization': '0', 
                'SpeakerNumber': '0', 
                'FilterPunc': '0', 
                'OutputFileType': 'txt'
            }
        }
    ]
}
For more information on response fields, see the response in DescribeTemplates.
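Because the query is paginated, listing every template means repeating the call with an increasing PageNumber until TotalCount entries have been seen. The generator below sketches that loop, assuming the response shape shown in the sample (string-valued TotalCount, list-valued TemplateList). `iter_asr_templates` is a hypothetical helper, and the handling of a dict-valued TemplateList is a defensive assumption rather than documented behavior.

```python
def iter_asr_templates(client, bucket, page_size=10, **filters):
    """Yield every matching template, walking pages until TotalCount is reached."""
    page_number = 1
    seen = 0
    while True:
        response = client.ci_get_asr_template(
            Bucket=bucket,
            PageNumber=page_number,
            PageSize=page_size,
            **filters  # for example Category='Custom' or Name='prefix'
        )
        templates = response.get("TemplateList") or []
        if isinstance(templates, dict):
            # Defensive: treat a lone entry as a one-item list.
            templates = [templates]
        for template in templates:
            yield template
        seen += len(templates)
        total = int(response.get("TotalCount", 0))
        # Stop on an empty page as well, so a wrong TotalCount cannot loop forever.
        if not templates or seen >= total:
            break
        page_number += 1
```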
Modifying template
Feature description
This API is used to modify a template.
Method prototype
def ci_update_asr_template(self, Bucket, TemplateId, Name, EngineModelType, ChannelNum,
                           ResTextFormat, FilterDirty=0, FilterModal=0, ConvertNumMode=0, SpeakerDiarization=0,
                           SpeakerNumber=0, FilterPunc=0, OutputFileType='txt', **kwargs)  
Parameter description
Node Name (Keyword)	Description	Type	Required
Bucket	Bucket name in the format of BucketName-APPID. For more information, see Bucket Overview.	String	Yes
TemplateId	ID of the template to be modified	String	Yes
Note: Other parameters are the same as those of the template creation API, as described in Creating Speech Recognition Template.
Sample request
def ci_update_asr_template():
    # Modify a speech recognition template
    response = client.ci_update_asr_template(
        Bucket=bucket_name,
        TemplateId='t1bdxxxxxxxxxxxxxxxxx94a9',
        Name='QueueId1',
        EngineModelType='16k_zh',
        ChannelNum=1,
        ResTextFormat=1,
    )
    print(response)
    return response
Response description
{
    'RequestId': 'NjMyMjlkNzhfMTIwNjUzMDlfMmUxZF8xMGM4', 
    'Template': {
        'TemplateId': 't1c1287c04c147443da0b2cc7b8fbabf32', 
        'Name': 'QueueId1', 
        'State': 'Normal', 
        'Tag': 'SpeechRecognition', 
        'CreateTime': '2022-09-15T11:25:19+0800', 
        'UpdateTime': '2022-09-15T11:35:20+0800', 
        'BucketId': 'testpic-1253960454', 
        'Category': 'Custom', 
        'SpeechRecognition': {
            'EngineModelType': '16k_zh', 
            'ChannelNum': '1', 
            'ResTextFormat': '1', 
            'FilterDirty': '0', 
            'FilterModal': '0', 
            'ConvertNumMode': '0', 
            'SpeakerDiarization': '0', 
            'SpeakerNumber': '0', 
            'FilterPunc': '0', 
            'OutputFileType': 'txt'
        }
    }
}
For more information on response fields, see the response in Updating Speech Recognition Template.
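The method prototype requires Name, EngineModelType, ChannelNum, and ResTextFormat on every update, even when only one setting changes. One possible approach is to start from a template dict returned by the query or create call and overlay the changes. The sketch below assumes that dict has the shape shown in the sample responses; `build_update_kwargs` is a hypothetical helper, not part of the SDK.

```python
def build_update_kwargs(template, **changes):
    """Merge changes into an existing template's settings.

    Returns keyword arguments for ci_update_asr_template (Bucket excluded).
    """
    kwargs = {
        "TemplateId": template["TemplateId"],
        "Name": template["Name"],
    }
    for key, value in template.get("SpeechRecognition", {}).items():
        # Responses carry numbers as strings; the prototype defaults are ints.
        if isinstance(value, str) and value.isdigit():
            kwargs[key] = int(value)
        else:
            kwargs[key] = value
    kwargs.update(changes)  # caller-supplied values win
    return kwargs


# Typical use, with `template` taken from a query or create response:
#   response = client.ci_update_asr_template(
#       Bucket=bucket_name,
#       **build_update_kwargs(template, ResTextFormat=1)
#   )
```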