tencent cloud

Feedback

Creating Speech Recognition Template

Last updated: 2024-03-01 17:29:56

    Feature Description

    This API is used to create a speech recognition template.
    

    Request

    Sample request

    POST /template HTTP/1.1
    Host: <BucketName-APPID>.ci.<Region>.myqcloud.com
    Date: <GMT Date>
    Authorization: <Auth String>
    Content-Length: <length>
    Content-Type: application/xml
    
    <body>
    Note:
    Authorization: Auth String (for more information, see Request Signature).
    When this feature is used by a sub-account, relevant permissions must be granted. For more information, see Authorization Granularity Details.

    Request headers

    This API only uses common request headers. For more information, see Common Request Headers.

    Request body

    This request requires the following request body:
    <Request>
    <Tag>SpeechRecognition</Tag>
    <Name>TemplateName</Name>
    <SpeechRecognition>
    <EngineModelType>16k_zh</EngineModelType>
    <ResTextFormat>1</ResTextFormat>
    <FilterDirty>0</FilterDirty>
    <FilterModal>1</FilterModal>
    <ConvertNumMode>0</ConvertNumMode>
    <SpeakerDiarization>1</SpeakerDiarization>
    <SpeakerNumber>0</SpeakerNumber>
    <FilterPunc>0</FilterPunc>
    <OutputFileType>txt</OutputFileType>
    </SpeechRecognition>
    </Request>
    The nodes are described as follows:
    Node Name (Keyword)
    Parent Node
    Description
    Type
    Required
    Request
    None
    Request container
    Container
    Yes
    Request has the following sub-nodes:
    Node Name (Keyword)
    Parent Node
    Description
    Type
    Required
    Constraints
    Tag
    Request
    Template tag: SpeechRecognition.
    String
    Yes
    No
    Name
    Request
    Template name, which can contain letters, digits, underscores (_), hyphens (-), and asterisks (*).
    String
    Yes
    None
    SpeechRecognition
    Request
    Speech recognition parameter.
    Container
    Yes
    None
    SpeechRecognition has the following sub-nodes:
    
    SpeechRecognition has the following sub-nodes:
    Node Name (Keyword)
    Parent Node
    Description
    Type
    Required
    EngineModelType
    Request.Speech Recognition
    Engine model type, divided into phone call and non-phone call scenarios.
    Phone call scenarios:
    8k_zh: 8 kHz, for Mandarin in general scenarios (available for dual-channel audio).
    8k_zh_s: 8 kHz, for Mandarin with speaker separation (available for mono-channel audio only).
    8k_en: 8 kHz, for English.
    Non-phone call scenarios:
    16k_zh: 16 kHz, for Mandarin in general scenarios.
    16k_zh_video: 16 kHz, for audio/video scenarios.
    16k_en: 16 kHz, for English.
    16k_ca: 16 kHz, for Cantonese.
    16k_ja: 16 kHz, for Japanese.
    16k_zh_edu: For Mandarin in education scenarios.
    16k_en_edu: For English in education scenarios.
    16k_zh_medical: For healthcare scenarios.
    16k_th: For Thai.
    16k_zh_dialect: Multi-dialect, for up to 23 dialects.
    Fast ASR is only supported for 8k_zh, 16k_zh, 16k_en, and 16k_zh_video.
    String
    Yes
    ChannelNum
    Request.Speech Recognition
    This parameter is supported only for general ASR.
    Number of sound channels:
    1: Mono. If EngineModelType is not the phone call scenario, only mono channel is supported.
    2: Dual (for the 8k_zh engine only, where the two channels correspond to the caller and callee respectively).
    Integer
    Yes
    ResTextFormat
    Request.Speech Recognition
    This parameter is supported only for general ASR.
    Format of the returned recognition result.
    0: Recognition result text, including the list of segment timestamps.
    1: Detailed word-level recognition result, excluding punctuation marks but including the speech speed value (the list of word timestamps, generally used to generate subtitles).
    2: Detailed word-level recognition result, including punctuation marks and the speech speed value.
    Integer
    Yes
    FilterDirty
    Request.Speech Recognition
    Whether to filter restricted words (for the Mandarin engine only).
    0: Does not filter.
    1: Filters.
    2: Replaces restricted words with *.
    Default value: 0.
    Integer
    No
    FilterModal
    Request.Speech Recognition
    Whether to filter modal particles (for the Mandarin engine only).
    0: Does not filter.
    1: Filters partially.
    2: Filters strictly.
    Default value: 0.
    Integer
    No
    ConvertNumMode
    Request.Speech Recognition
    Whether to intelligently convert Chinese numbers to Arabic numerals (for the Mandarin engine only):
    0: Directly outputs Chinese numbers.
    1: Intelligently converts based on the scenario.
    3: Enables mathematic number conversion. This parameter is supported only for general ASR.
    Default value: 0.
    Integer
    No
    SpeakerDiarization
    Request.Speech Recognition
    Whether to enable speaker separation:
    0: No.
    1: Yes (for mono-channel audios with the 8k_zh, 16k_zh, or 16k_zh_video engine only).
    Default value: 0. Note: In the 8 kHz phone call scenario, we recommend you use dual channels to distinguish between the caller and callee by setting ChannelNum=2, so you don't need to enable speaker separation.
    Integer
    No
    SpeakerNumber
    Request.Speech Recognition
    This parameter is supported only for general ASR.
    Number of speakers to be separated (with speaker separation enabled). Value range: 0–10.
    0: Automatic separation (currently only for six or fewer people only). 1–10: Specified number of speakers to be separated. Default value: 0.
    Integer
    No
    FilterPunc
    Request.Speech Recognition
    Whether to filter punctuation marks (currently for the Mandarin engine only):
    0: Does not filter.
    1: Filters the punctuation mark at the end of the sentence.
    2: Filters all punctuation marks.
    Default value: 0.
    Integer
    No
    OutputFileType
    Request.Speech Recognition
    Output file type. Valid values: txt (default), srt
    txt is supported only for fast ASR.
    String
    No
    FlashAsr
    Request.Speech Recognition
    Whether to enable fast ASR. Valid values: true, false (default).
    String
    No
    Format
    Request.Speech Recognition
    Audio format for fast ASR. Supported formats include WAV, PCM, OGG-OPUS, SPEEX, SILK, MP3, M4A, and AAC.
    String
    Yes if FlashAsr is true
    FirstChannelOnly
    Request.Speech Recognition
    Whether to recognize only the first sound channel (in fast ASR mode). Valid values: 0 (recognizes all sound channels); 1 (recognizes the first sound channel). Default value: 1.
    Integer
    No
    WordInfo
    Request.Speech Recognition
    Whether to display word-level timestamps (in fast ASR mode). Valid values: 0 (no); 1 (yes, excluding punctuation timestamps), 2 (yes, including punctuation timestamps). Default value: 0.
    Integer
    No
    

    Response

    Response headers

    This API only returns common response headers. For more information, see Common Response Headers.

    Response body

    The response body returns application/xml data. The following contains all the nodes:
    <Response>
    <Template>
    <Tag>SpeechRecognition</Tag>
    <Name>TemplateName</Name>
    <State>Normal</State>
    <Tag>SpeechRecognition</Tag>
    <CreateTime></CreateTime>
    <UpdateTime></UpdateTime>
    <BucketId></BucketId>
    <Category>Custom</Category>
    <SpeechRecognition>
    <EngineModelType>16k_zh</EngineModelType>
    <ResTextFormat>1</ResTextFormat>
    <FilterDirty>0</FilterDirty>
    <FilterModal>1</FilterModal>
    <ConvertNumMode>0</ConvertNumMode>
    <SpeakerDiarization>1</SpeakerDiarization>
    <SpeakerNumber>0</SpeakerNumber>
    <FilterPunc>0</FilterPunc>
    <OutputFileType>txt</OutputFileType>
    </SpeechRecognition>
    </Template>
    </Response>
    The nodes are as described below:
    Node Name (Keyword)
    Parent Node
    Description
    Type
    Response
    None
    Result storage container
    Container
    Response has the following sub-nodes:
    Node Name (Keyword)
    Parent Node
    Description
    Type
    TemplateId
    Response.Template
    Template ID
    String
    Name
    Response.Template
    Template name
    String
    BucketId
    Response.Template
    Template bucket
    String
    Category
    Response.Template
    Template category: Custom or Official
    String
    Tag
    Response.Template
    Template tag: SpeechRecognition.
    String
    UpdateTime
    Response.Template
    Update time
    String
    CreateTime
    Response.Template
    Creation time
    String
    SpeechRecognition
    Response.Template
    Same as Request.SpeechRecognition in the request body.
    Container

    Error codes

    There are no special error messages for this request. For common error messages, see Error Codes.

    Samples

    Request

    POST /template HTTP/1.1
    Authorization: q-sign-algorithm=sha1&q-ak=AKIDZfbOAo7cllgPvF9cXFrJD0a1ICvR****&q-sign-time=1497530202;1497610202&q-key-time=1497530202;1497610202&q-header-list=&q-url-param-list=&q-signature=28e9a4986df11bed0255e97ff90500557e0e****
    Host: test-1234567890.ci.ap-chongqing.myqcloud.com
    Content-Length: 1666
    Content-Type: application/xml
    
    <Request>
    <Tag>SpeechRecognition</Tag>
    <Name>TemplateName</Name>
    <SpeechRecognition>
    <EngineModelType>16k_zh</EngineModelType>
    <ResTextFormat>1</ResTextFormat>
    <FilterDirty>0</FilterDirty>
    <FilterModal>1</FilterModal>
    <ConvertNumMode>0</ConvertNumMode>
    <SpeakerDiarization>1</SpeakerDiarization>
    <SpeakerNumber>0</SpeakerNumber>
    <FilterPunc>0</FilterPunc>
    <OutputFileType>txt</OutputFileType>
    </SpeechRecognition>
    </Request>

    Response

    HTTP/1.1 200 OK
    Content-Type: application/xml
    Content-Length: 100
    Connection: keep-alive
    Date: Thu, 14 Jul 2022 12:37:29 GMT
    Server: tencent-ci
    x-ci-request-id: NTk0MjdmODlfMjQ4OGY3XzYzYzhf****
    
    <Response>
    <Template>
    <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
    <Name>TemplateName</Name>
    <State>Normal</State>
    <Tag>SpeechRecognition</Tag>
    <CreateTime>2020-08-05T11:35:24+0800</CreateTime>
    <UpdateTime>2020-08-31T16:15:20+0800</UpdateTime>
    <BucketId>test-1234567890</BucketId>
    <Category>Custom</Category>
    <SpeechRecognition>
    <EngineModelType>16k_zh</EngineModelType>
    <ChannelNum>1</ChannelNum>
    <ResTextFormat>0</ResTextFormat>
    <FilterDirty>1</FilterDirty>
    <FilterModal>0</FilterModal>
    <ConvertNumMode>1</ConvertNumMode>
    <SpeakerDiarization>0</SpeakerDiarization>
    <SpeakerNumber>0</SpeakerNumber>
    <FilterPunc>0</FilterPunc>
    </SpeechRecognition>
    </Template>
    </Response>
    
    Contact Us

    Contact our sales team or business advisors to help your business.

    Technical Support

    Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

    7x24 Phone Support