tencent cloud

Speech Recognition

Last updated: 2024-03-01 14:57:12

    Overview

    This document describes how to use the SDK to create, delete, query, and modify CI speech recognition templates.
    | API | Operation | Description |
    | --- | --- | --- |
    |  | Creates a template | Creating a template |
    | Deleting automatic speech recognition template | Deletes a template | Deleting a template |
    |  | Queries templates | Querying the list of templates |
    |  | Modifies a template | Modifying a template |

    Basic Operations

    Creating template

    Feature description

    This API is used to create a template.

    Method prototype

    def ci_create_asr_template(self, Bucket, Name, EngineModelType, ChannelNum,
                               ResTextFormat, FilterDirty=0, FilterModal=0, ConvertNumMode=0,
                               SpeakerDiarization=0, SpeakerNumber=0, FilterPunc=0,
                               OutputFileType='txt', **kwargs)

    Parameter description

    Request has the following sub-nodes:
    | Node Name (Keyword) | Description | Type | Required |
    | --- | --- | --- | --- |
    | Bucket | Bucket name | String | Yes |
    | Name | Template name, which can contain letters, digits, underscores (_), hyphens (-), and asterisks (*). | String | Yes |
    | EngineModelType | Engine model type, divided into phone call and non-phone call scenarios. Phone call scenarios: 8k_zh (8 kHz, for Mandarin in general scenarios; available for dual-channel audio), 8k_zh_s (8 kHz, for Mandarin with speaker separation; available for mono-channel audio only), 8k_en (8 kHz, for English). Non-phone call scenarios: 16k_zh (16 kHz, for Mandarin in general scenarios), 16k_zh_video (16 kHz, for audio/video scenarios), 16k_en (16 kHz, for English), 16k_ca (16 kHz, for Cantonese), 16k_ja (16 kHz, for Japanese), 16k_zh_edu (for Mandarin in education scenarios), 16k_en_edu (for English in education scenarios), 16k_zh_medical (for healthcare scenarios), 16k_th (for Thai), 16k_zh_dialect (multi-dialect, for up to 23 dialects). | String | Yes |
    | ChannelNum | Number of sound channels. 1: Mono; if EngineModelType is not a phone call engine, only mono is supported. 2: Dual (for the 8k_zh engine only, where the two channels correspond to the caller and callee respectively). The method prototype defines no default value, so this argument must be passed. | Integer | Yes |
    | ResTextFormat | Format of the returned recognition result. 0: Recognition result text, including the list of segment timestamps. 1: Detailed word-level recognition result, excluding punctuation marks but including the speech speed value (the list of word timestamps, generally used to generate subtitles). 2: Detailed word-level recognition result, including punctuation marks and the speech speed value. | Integer | Yes |
    | FilterDirty | Whether to filter restricted words (for the Mandarin engine only). 0: Does not filter. 1: Filters. 2: Replaces restricted words with *. Default value: 0. | Integer | No |
    | FilterModal | Whether to filter modal particles (for the Mandarin engine only). 0: Does not filter. 1: Filters partially. 2: Filters strictly. Default value: 0. | Integer | No |
    | ConvertNumMode | Whether to intelligently convert Chinese numbers to Arabic numerals (for the Mandarin engine only). 0: Directly outputs Chinese numbers. 1: Intelligently converts based on the scenario. 3: Enables mathematical number conversion. Default value: 0. | Integer | No |
    | SpeakerDiarization | Whether to enable speaker separation. 0: No. 1: Yes (for mono-channel audio with the 8k_zh, 16k_zh, or 16k_zh_video engine only). Default value: 0. Note: In the 8 kHz phone call scenario, we recommend that you use dual channels to distinguish between the caller and callee by setting ChannelNum=2, in which case you don't need to enable speaker separation. | Integer | No |
    | SpeakerNumber | Number of speakers to be separated (takes effect only when speaker separation is enabled). Value range: 0–10. 0: Automatic separation (currently for six or fewer speakers only). 1–10: Specified number of speakers to be separated. Default value: 0. | Integer | No |
    | FilterPunc | Whether to filter punctuation marks (currently for the Mandarin engine only). 0: Does not filter. 1: Filters the punctuation mark at the end of the sentence. 2: Filters all punctuation marks. Default value: 0. | Integer | No |
    | OutputFileType | Output file type. Valid values: txt (default), srt. | String | No |
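The value constraints listed in the parameter descriptions can be checked locally before a request is sent. The following is a minimal sketch under stated assumptions: `check_asr_template_params` is a hypothetical helper, not part of the SDK, and it encodes only the rules spelled out in the table above. The service may enforce additional rules.

```python
# Engines for which the table allows SpeakerDiarization=1 (mono audio only).
DIARIZATION_ENGINES = {"8k_zh", "16k_zh", "16k_zh_video"}


def check_asr_template_params(EngineModelType, ChannelNum, ResTextFormat,
                              ConvertNumMode=0, SpeakerDiarization=0,
                              SpeakerNumber=0, OutputFileType="txt"):
    """Hypothetical helper: return a list of rule violations (empty if none)."""
    problems = []
    if ChannelNum not in (1, 2):
        problems.append("ChannelNum must be 1 or 2")
    elif ChannelNum == 2 and EngineModelType != "8k_zh":
        problems.append("ChannelNum=2 is only available for the 8k_zh engine")
    if ResTextFormat not in (0, 1, 2):
        problems.append("ResTextFormat must be 0, 1, or 2")
    if ConvertNumMode not in (0, 1, 3):
        problems.append("ConvertNumMode must be 0, 1, or 3")
    if SpeakerDiarization not in (0, 1):
        problems.append("SpeakerDiarization must be 0 or 1")
    elif SpeakerDiarization == 1 and (
            EngineModelType not in DIARIZATION_ENGINES or ChannelNum != 1):
        problems.append("SpeakerDiarization=1 requires mono audio with "
                        "8k_zh, 16k_zh, or 16k_zh_video")
    if not 0 <= SpeakerNumber <= 10:
        problems.append("SpeakerNumber must be between 0 and 10")
    if OutputFileType not in ("txt", "srt"):
        problems.append("OutputFileType must be 'txt' or 'srt'")
    return problems


print(check_asr_template_params("16k_zh", 1, 2))
print(check_asr_template_params("16k_zh", 2, 2))
```

Returning a list rather than raising on the first violation lets a caller report every problem at once. Whether to raise, log, or ignore the result is left to the calling code.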

    Sample request

    def ci_create_asr_template():
        # Create a speech recognition template.
        # `client` and `bucket_name` are assumed to be defined elsewhere.
        response = client.ci_create_asr_template(
            Bucket=bucket_name,
            Name='templateName',
            EngineModelType='16k_zh',
            ChannelNum=1,
            ResTextFormat=2,
        )
        print(response)
        return response

    Response description

    {
        'RequestId': 'NjMyMjliMWZfZWM0YTYyNjRfNWNmNF8xMDBh',
        'Template': {
            'TemplateId': 't1c1287c04c147443da0b2cc7b8fbabf32',
            'Name': 'templateName',
            'State': 'Normal',
            'Tag': 'SpeechRecognition',
            'CreateTime': '2022-09-15T11:25:19+0800',
            'UpdateTime': '2022-09-15T11:25:19+0800',
            'BucketId': 'testpic-1253960454',
            'Category': 'Custom',
            'SpeechRecognition': {
                'EngineModelType': '16k_zh',
                'ChannelNum': '1',
                'ResTextFormat': '2',
                'FilterDirty': '0',
                'FilterModal': '0',
                'ConvertNumMode': '0',
                'SpeakerDiarization': '0',
                'SpeakerNumber': '0',
                'FilterPunc': '0',
                'OutputFileType': 'txt'
            }
        }
    }
    For more information on response fields, see the response in Creating Speech Recognition Template.
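The sample response returns numeric settings as strings (for example, 'ChannelNum': '1'). The sketch below shows one way to extract the template ID and convert those values back to integers. `summarize_template` is a hypothetical helper, not an SDK function, and it assumes the response has the shape shown in the sample above.

```python
def summarize_template(response):
    """Hypothetical helper: return (TemplateId, settings) from a response dict."""
    template = response["Template"]
    settings = {}
    for key, value in template["SpeechRecognition"].items():
        # Convert digit-only strings such as '1' to int; leave values
        # like '16k_zh' or 'txt' unchanged.
        if isinstance(value, str) and value.isdigit():
            settings[key] = int(value)
        else:
            settings[key] = value
    return template["TemplateId"], settings


# Abbreviated copy of the sample response above.
sample = {
    "RequestId": "NjMyMjliMWZfZWM0YTYyNjRfNWNmNF8xMDBh",
    "Template": {
        "TemplateId": "t1c1287c04c147443da0b2cc7b8fbabf32",
        "Name": "templateName",
        "SpeechRecognition": {
            "EngineModelType": "16k_zh",
            "ChannelNum": "1",
            "ResTextFormat": "2",
            "OutputFileType": "txt",
        },
    },
}

template_id, settings = summarize_template(sample)
print(template_id, settings)
```

The returned template ID is the value that the delete and modify methods take as `TemplateId`.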

    Deleting template

    Feature description

    This API is used to delete a template.

    Method prototype

    def ci_delete_asr_template(self, Bucket, TemplateId, **kwargs)

    Parameter description

    | Parameter | Description | Type | Required |
    | --- | --- | --- | --- |
    | Bucket | Bucket name in the format of BucketName-APPID. For more information, see Bucket Overview. | String | Yes |
    | TemplateId | ID of the template to be deleted | String | Yes |

    Sample request

    def ci_delete_asr_template():
        # Delete the specified speech recognition template
        response = client.ci_delete_asr_template(
            Bucket=bucket_name,
            TemplateId='t1bdxxxxxxxxxxxxxxxxx94a9',
        )
        print(response)
        return response

    Response description

    {
        'RequestId': 'NjMyMjlkZmRfZWM0YTYyNjRfNWNmNF8xMDBi',
        'TemplateId': 't1c1287c04c147443da0b2cc7b8fbabf32'
    }
    For more information on response fields, see the response in DeleteTemplate.

    Querying template list

    Feature description

    This API is used to query the template list.

    Method prototype

    def ci_get_asr_template(self, Bucket, Category='Custom', Ids='', Name='', PageNumber=1, PageSize=10, **kwargs)

    Parameter description

    | Parameter | Description | Type | Required |
    | --- | --- | --- | --- |
    | Bucket | Bucket name in the format of BucketName-APPID. For more information, see Bucket Overview. | String | Yes |
    | Category | Template category: Custom or Official. Default value: Custom. | String | No |
    | Ids | Template ID. To query multiple IDs, separate them with commas. | String | No |
    | Name | Template name prefix | String | No |
    | PageNumber | Page number. Default value in the method prototype: 1. | Integer | No |
    | PageSize | Number of entries per page. Default value in the method prototype: 10. | Integer | No |

    Sample request

    def ci_get_asr_template():
        # Get the information of speech recognition templates
        response = client.ci_get_asr_template(
            Bucket=bucket_name,
        )
        print(response)
        return response

    Response description

    {
        'TotalCount': '1',
        'RequestId': 'NjMyMjljNTlfMTIwNjUzMDlfMmUzYV8xMWNh',
        'PageNumber': '1',
        'PageSize': '10',
        'TemplateList': [
            {
                'TemplateId': 't1c1287c04c147443da0b2cc7b8fbabf32',
                'Name': 'templateName',
                'State': 'Normal',
                'Tag': 'SpeechRecognition',
                'CreateTime': '2022-09-15T11:25:19+0800',
                'UpdateTime': '2022-09-15T11:25:19+0800',
                'BucketId': 'testpic-1253960454',
                'Category': 'Custom',
                'SpeechRecognition': {
                    'EngineModelType': '16k_zh',
                    'ChannelNum': '1',
                    'ResTextFormat': '2',
                    'FilterDirty': '0',
                    'FilterModal': '0',
                    'ConvertNumMode': '0',
                    'SpeakerDiarization': '0',
                    'SpeakerNumber': '0',
                    'FilterPunc': '0',
                    'OutputFileType': 'txt'
                }
            }
        ]
    }
    For more information on response fields, see the response in DescribeTemplates.
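Because results are paged through PageNumber and PageSize, listing every template takes repeated calls. The following sketch walks the pages until TotalCount entries have been seen. It is an illustration under assumptions, not SDK code: `iter_asr_templates` and `FakeClient` are hypothetical names, the bucket name is a placeholder, and `FakeClient` only imitates the response shape shown above so that the example runs without network access.

```python
def iter_asr_templates(client, bucket, page_size=10, **filters):
    """Hypothetical helper: yield templates page by page until TotalCount is reached."""
    page = 1
    seen = 0
    while True:
        response = client.ci_get_asr_template(
            Bucket=bucket, PageNumber=page, PageSize=page_size, **filters)
        templates = response.get("TemplateList") or []
        for template in templates:
            yield template
        seen += len(templates)
        # TotalCount is a string in the sample response, so convert it.
        if not templates or seen >= int(response.get("TotalCount", 0)):
            break
        page += 1


class FakeClient:
    """Stand-in for a real client; returns slices of an in-memory list."""

    def __init__(self, names):
        self.names = names

    def ci_get_asr_template(self, Bucket, PageNumber=1, PageSize=10, **kwargs):
        start = (PageNumber - 1) * PageSize
        chunk = self.names[start:start + PageSize]
        return {
            "TotalCount": str(len(self.names)),
            "PageNumber": str(PageNumber),
            "PageSize": str(PageSize),
            "TemplateList": [{"Name": name} for name in chunk],
        }


fake = FakeClient(["a", "b", "c"])
names = [t["Name"] for t in
         iter_asr_templates(fake, "examplebucket-1250000000", page_size=2)]
print(names)
```

With a real client, extra keyword arguments such as `Category` or `Name` would be passed through `**filters` to narrow the query.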

    Modifying template

    Feature description

    This API is used to modify a template.

    Method prototype

    def ci_update_asr_template(self, Bucket, TemplateId, Name, EngineModelType, ChannelNum,
                               ResTextFormat, FilterDirty=0, FilterModal=0, ConvertNumMode=0,
                               SpeakerDiarization=0, SpeakerNumber=0, FilterPunc=0,
                               OutputFileType='txt', **kwargs)

    Parameter description

    | Node Name (Keyword) | Description | Type | Required |
    | --- | --- | --- | --- |
    | Bucket | Bucket name in the format of BucketName-APPID. For more information, see Bucket Overview. | String | Yes |
    | TemplateId | ID of the template to be modified | String | Yes |

    Note:
    Other parameters are the same as those of the template creation API as described in Creating Speech Recognition Template.

    Sample request

    def ci_update_asr_template():
        # Modify a speech recognition template
        response = client.ci_update_asr_template(
            Bucket=bucket_name,
            TemplateId='t1bdxxxxxxxxxxxxxxxxx94a9',
            Name='QueueId1',
            EngineModelType='16k_zh',
            ChannelNum=1,
            ResTextFormat=1,
        )
        print(response)
        return response

    Response description

    {
        'RequestId': 'NjMyMjlkNzhfMTIwNjUzMDlfMmUxZF8xMGM4',
        'Template': {
            'TemplateId': 't1c1287c04c147443da0b2cc7b8fbabf32',
            'Name': 'QueueId1',
            'State': 'Normal',
            'Tag': 'SpeechRecognition',
            'CreateTime': '2022-09-15T11:25:19+0800',
            'UpdateTime': '2022-09-15T11:35:20+0800',
            'BucketId': 'testpic-1253960454',
            'Category': 'Custom',
            'SpeechRecognition': {
                'EngineModelType': '16k_zh',
                'ChannelNum': '1',
                'ResTextFormat': '1',
                'FilterDirty': '0',
                'FilterModal': '0',
                'ConvertNumMode': '0',
                'SpeakerDiarization': '0',
                'SpeakerNumber': '0',
                'FilterPunc': '0',
                'OutputFileType': 'txt'
            }
        }
    }
    For more information on response fields, see the response in Updating Speech Recognition Template.
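The modify method's prototype takes Name, EngineModelType, ChannelNum, and ResTextFormat as required arguments, so changing a single setting still means supplying the rest. One possible approach is to start from a template entry returned by the query method and overlay the changes. The sketch below assumes that entry has the shape shown in the sample responses. `build_update_kwargs` is a hypothetical helper, not part of the SDK, and whether omitted optional settings fall back to their defaults on the server is not stated in this document.

```python
def build_update_kwargs(template, **changes):
    """Hypothetical helper: merge changes into an existing template's settings."""
    kwargs = {
        "TemplateId": template["TemplateId"],
        "Name": template["Name"],
    }
    for key, value in template["SpeechRecognition"].items():
        # Settings come back as strings; convert digit-only values to int
        # so they match the integer arguments in the method prototype.
        if isinstance(value, str) and value.isdigit():
            kwargs[key] = int(value)
        else:
            kwargs[key] = value
    # Explicit changes override the existing values.
    kwargs.update(changes)
    return kwargs


# Abbreviated template entry in the shape of the sample responses.
existing = {
    "TemplateId": "t1c1287c04c147443da0b2cc7b8fbabf32",
    "Name": "templateName",
    "SpeechRecognition": {
        "EngineModelType": "16k_zh",
        "ChannelNum": "1",
        "ResTextFormat": "2",
        "OutputFileType": "txt",
    },
}

update_kwargs = build_update_kwargs(existing, Name="QueueId1", ResTextFormat=1)
print(update_kwargs)
```

The result could then be passed as `client.ci_update_asr_template(Bucket=bucket_name, **update_kwargs)`, since the keys under SpeechRecognition match the parameter names in the method prototype.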
    