Creating Automatic Speech Recognition Template

Last updated: 2022-09-27 15:22:07

    Feature Description

    This API is used to create a speech recognition template.

    We recommend using API Explorer to call this API.
    Tencent Cloud API Explorer provides capabilities such as online calls, signature verification, SDK code generation, and quick API search. You can also use it to view the request and response of each API call and to generate sample code for calls.

    Request

    Sample request

    POST /template HTTP/1.1
    Host: <BucketName-APPID>.ci.<Region>.myqcloud.com
    Date: <GMT Date>
    Authorization: <Auth String>
    Content-Length: <length>
    Content-Type: application/xml
    <body>
    
    Note:

    • Authorization: Auth String (for more information, see Request Signature).
    • When this feature is used by a sub-account, relevant permissions must be granted.

    Request headers

    This API only uses common request headers. For more information, see Common Request Headers.

    Request body

    This request requires the following request body:

    <Request>
      <Tag>SpeechRecognition</Tag>
      <Name>TemplateName</Name>
      <SpeechRecognition>
          <EngineModelType>16k_zh</EngineModelType>
          <ChannelNum>1</ChannelNum>
          <ResTextFormat>1</ResTextFormat>
          <FilterDirty>0</FilterDirty>
          <FilterModal>1</FilterModal>
          <ConvertNumMode>0</ConvertNumMode>
          <SpeakerDiarization>1</SpeakerDiarization>
          <SpeakerNumber>0</SpeakerNumber>
          <FilterPunc>0</FilterPunc>
          <OutputFileType>txt</OutputFileType>
      </SpeechRecognition>
    </Request>
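
The request body above can be assembled programmatically rather than written by hand. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `build_template_request` and its arguments are illustrative assumptions, not part of the API.

```python
import xml.etree.ElementTree as ET


def build_template_request(name, speech_params):
    """Build the <Request> body for a speech recognition template.

    `name` is the template name; `speech_params` maps node names such as
    "EngineModelType" to their values. Both names are hypothetical helpers.
    """
    root = ET.Element("Request")
    # The Tag node is fixed to SpeechRecognition for this API.
    ET.SubElement(root, "Tag").text = "SpeechRecognition"
    ET.SubElement(root, "Name").text = name
    speech = ET.SubElement(root, "SpeechRecognition")
    for key, value in speech_params.items():
        # Every sub-node is serialized as text, e.g. <ChannelNum>1</ChannelNum>.
        ET.SubElement(speech, key).text = str(value)
    return ET.tostring(root, encoding="utf-8")


body = build_template_request(
    "TemplateName",
    {"EngineModelType": "16k_zh", "ChannelNum": 1, "ResTextFormat": 1},
)
```

The returned bytes can be used as the body of the `POST /template` request with `Content-Type: application/xml`.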
    

    The nodes are described as follows:

    Node Name (Keyword) Parent Node Description Type Required
    Request None Request container. Container Yes


    Request has the following sub-nodes:

    Node Name (Keyword) Parent Node Description Type Required Constraints
    Tag Request Template tag: SpeechRecognition. String Yes None
    Name Request Template name, which can contain letters, digits, underscores (_), hyphens (-), and asterisks (*). String Yes None
    SpeechRecognition Request Speech recognition parameter. Container Yes None
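
The naming rule for `Name` can be checked on the client before the request is sent. The sketch below assumes that "letters" means ASCII letters and that no length limit applies, since neither is specified here; `is_valid_template_name` is a hypothetical helper.

```python
import re

# Letters, digits, underscores, hyphens, and asterisks, per the Name row above.
# Assumption: "letters" is read as ASCII letters only.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_*-]+")


def is_valid_template_name(name):
    """Return True if `name` uses only the characters allowed for Name."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_template_name("TemplateName")` returns `True`, while a name containing a space is rejected.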


    SpeechRecognition has the following sub-nodes:

    Node Name (Keyword) Parent Node Description Type Required
    EngineModelType Request.SpeechRecognition
    Engine model type, divided into phone call and non-phone call scenarios.
    Phone call scenarios:
    • 8k_zh: 8 kHz, for Mandarin in general scenarios (available for dual-channel audio).
    • 8k_zh_s: 8 kHz, for Mandarin with speaker separation (available for mono-channel audio only).
    • 8k_en: 8 kHz, for English.
    Non-phone call scenarios:
    • 16k_zh: 16 kHz, for Mandarin in general scenarios.
    • 16k_zh_video: 16 kHz, for audio/video scenarios.
    • 16k_en: 16 kHz, for English.
    • 16k_ca: 16 kHz, for Cantonese.
    • 16k_ja: 16 kHz, for Japanese.
    • 16k_zh_edu: For Mandarin in education scenarios.
    • 16k_en_edu: For English in education scenarios.
    • 16k_zh_medical: For healthcare scenarios.
    • 16k_th: For Thai.
    • 16k_zh_dialect: Multi-dialect, for up to 23 dialects.
    String Yes
    ChannelNum Request.SpeechRecognition
    Number of sound channels:
    • 1: Mono. If EngineModelType is not a phone call engine, only mono is supported.
    • 2: Dual (for the 8k_zh engine only; the two channels correspond to the caller and callee respectively).
    Integer Yes
    ResTextFormat Request.SpeechRecognition
    Format of the returned recognition result.
    • 0: Recognition result text, including the list of segment timestamps.
    • 1: Detailed word-level recognition result without punctuation marks, including the speech speed value and the list of word timestamps (generally used to generate subtitles).
    • 2: Detailed word-level recognition result, including punctuation marks and the speech speed value.
    Integer Yes
    FilterDirty Request.SpeechRecognition
    Whether to filter restricted words (for the Mandarin engine only).
    • 0: Does not filter.
    • 1: Filters.
    • 2: Replaces restricted words with *.
    • Default value: 0.
    Integer No
    FilterModal Request.SpeechRecognition
    Whether to filter modal particles (for the Mandarin engine only).
    • 0: Does not filter.
    • 1: Filters partially.
    • 2: Filters strictly.
    • Default value: 0.
    Integer No
    ConvertNumMode Request.SpeechRecognition
    Whether to intelligently convert Chinese numbers to Arabic numerals (for the Mandarin engine only):
    • 0: Directly outputs Chinese numbers.
    • 1: Intelligently converts based on the scenario.
    • 3: Enables conversion of math-related numbers.
    • Default value: 0.
    Integer No
    SpeakerDiarization Request.SpeechRecognition
    Whether to enable speaker separation:
    • 0: No.
    • 1: Yes (for mono-channel audio with the 8k_zh, 16k_zh, or 16k_zh_video engine only).
    • Default value: 0.
    • Note: In the 8 kHz phone call scenario, we recommend you use dual channels to distinguish between the caller and callee by setting ChannelNum=2, so you don't need to enable speaker separation.
    Integer No
    SpeakerNumber Request.SpeechRecognition
    Number of speakers to be separated (takes effect only when speaker separation is enabled). Value range: 0–10.
    • 0: Automatic separation (currently supports up to six speakers).
    • 1–10: Specified number of speakers to be separated.
    • Default value: 0.
    Integer No
    FilterPunc Request.SpeechRecognition
    Whether to filter punctuation marks (currently for the Mandarin engine only):
    • 0: Does not filter.
    • 1: Filters the punctuation mark at the end of the sentence.
    • 2: Filters all punctuation marks.
    • Default value: 0.
    Integer No
    OutputFileType Request.SpeechRecognition
    Output file type. Valid values: txt, srt. Default value: txt.
    String No
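
The constraints in the table above can be checked before a template is submitted. The following sketch encodes only the rules stated in the table (required nodes, valid values, and the ChannelNum and SpeakerDiarization restrictions); `check_speech_params` and the constant names are hypothetical, and the service may apply further checks that are not documented here.

```python
# Valid values copied from the parameter table above.
ALLOWED_VALUES = {
    "ChannelNum": {1, 2},
    "ResTextFormat": {0, 1, 2},
    "FilterDirty": {0, 1, 2},
    "FilterModal": {0, 1, 2},
    "ConvertNumMode": {0, 1, 3},
    "SpeakerDiarization": {0, 1},
    "FilterPunc": {0, 1, 2},
    "OutputFileType": {"txt", "srt"},
}
REQUIRED = ("EngineModelType", "ChannelNum", "ResTextFormat")
# Engines for which the table allows speaker separation.
DIARIZATION_ENGINES = {"8k_zh", "16k_zh", "16k_zh_video"}


def check_speech_params(params):
    """Return a list of problems found in a SpeechRecognition dict."""
    problems = []
    for key in REQUIRED:
        if key not in params:
            problems.append(f"{key} is required")
    for key, allowed in ALLOWED_VALUES.items():
        if key in params and params[key] not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")
    engine = params.get("EngineModelType")
    channels = params.get("ChannelNum")
    # Dual channel is documented for the 8k_zh engine only.
    if channels == 2 and engine != "8k_zh":
        problems.append("ChannelNum=2 is supported by the 8k_zh engine only")
    # Speaker separation needs mono audio and one of three engines.
    if params.get("SpeakerDiarization", 0) == 1:
        if engine not in DIARIZATION_ENGINES or channels != 1:
            problems.append(
                "SpeakerDiarization=1 requires mono audio with the "
                "8k_zh, 16k_zh, or 16k_zh_video engine"
            )
    speakers = params.get("SpeakerNumber", 0)
    if not isinstance(speakers, int) or not 0 <= speakers <= 10:
        problems.append("SpeakerNumber must be an integer from 0 to 10")
    return problems
```

An empty list means the parameters are consistent with the table; otherwise each entry names one violated rule.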

    Response

    Response headers

    This API only returns common response headers. For more information, see Common Response Headers.

    Response body

    The response body returns application/xml data. The following contains all the nodes:

    <Response>
      <Template>
          <TemplateId></TemplateId>
          <Name>TemplateName</Name>
          <State>Normal</State>
          <Tag>SpeechRecognition</Tag>
          <CreateTime></CreateTime>
          <UpdateTime></UpdateTime>
          <BucketId></BucketId>
          <Category>Custom</Category>
          <SpeechRecognition>
              <EngineModelType>16k_zh</EngineModelType>
              <ChannelNum>1</ChannelNum>
              <ResTextFormat>1</ResTextFormat>
              <FilterDirty>0</FilterDirty>
              <FilterModal>1</FilterModal>
              <ConvertNumMode>0</ConvertNumMode>
              <SpeakerDiarization>1</SpeakerDiarization>
              <SpeakerNumber>0</SpeakerNumber>
              <FilterPunc>0</FilterPunc>
              <OutputFileType>txt</OutputFileType>
          </SpeechRecognition>
      </Template>
    </Response>
    

    The nodes are as described below:

    Node Name (Keyword) Parent Node Description Type
    Response None Response container Container


    Response contains a Template container, which has the following sub-nodes:

    Node Name (Keyword) Parent Node Description Type
    TemplateId Response.Template Template ID. String
    Name Response.Template Template name. String
    BucketId Response.Template Template bucket. String
    Category Response.Template Template category: Custom or Official. String
    Tag Response.Template Template tag: SpeechRecognition. String
    UpdateTime Response.Template Update time. String
    CreateTime Response.Template Creation time. String
    SpeechRecognition Response.Template Same as the Request.SpeechRecognition in the request body. Container
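
A client typically needs the `TemplateId` from this response in order to reference the template later. The following is a minimal parsing sketch with Python's standard `xml.etree.ElementTree`; `parse_template_response` is a hypothetical helper, and error handling for non-200 responses is left out.

```python
import xml.etree.ElementTree as ET


def parse_template_response(xml_text):
    """Return the Template node of a response as a plain dict."""
    root = ET.fromstring(xml_text)
    template = root.find("Template")
    if template is None:
        raise ValueError("response has no Template node")
    # Leaf nodes such as TemplateId and Name become string entries.
    info = {child.tag: child.text or "" for child in template if len(child) == 0}
    # The SpeechRecognition container becomes a nested dict.
    speech = template.find("SpeechRecognition")
    info["SpeechRecognition"] = (
        {node.tag: node.text or "" for node in speech} if speech is not None else {}
    )
    return info
```

With the sample response in this document, `parse_template_response(...)["TemplateId"]` gives the ID string of the created template.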

    Error codes

    There are no special error messages for this request. For common error messages, see Error Codes.

    Samples

    Request

    POST /template HTTP/1.1
    Authorization: q-sign-algorithm=sha1&q-ak=AKIDZfbOAo7cllgPvF9cXFrJD0a1ICvR****&q-sign-time=1497530202;1497610202&q-key-time=1497530202;1497610202&q-header-list=&q-url-param-list=&q-signature=28e9a4986df11bed0255e97ff90500557e0e****
    Host: test-1234567890.ci.ap-chongqing.myqcloud.com
    Content-Length: 1666
    Content-Type: application/xml
    <Request>
      <Tag>SpeechRecognition</Tag>
      <Name>TemplateName</Name>
      <SpeechRecognition>
          <EngineModelType>16k_zh</EngineModelType>
          <ChannelNum>1</ChannelNum>
          <ResTextFormat>1</ResTextFormat>
          <FilterDirty>0</FilterDirty>
          <FilterModal>1</FilterModal>
          <ConvertNumMode>0</ConvertNumMode>
          <SpeakerDiarization>1</SpeakerDiarization>
          <SpeakerNumber>0</SpeakerNumber>
          <FilterPunc>0</FilterPunc>
          <OutputFileType>txt</OutputFileType>
      </SpeechRecognition>
    </Request>
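
The sample request above can be prepared in Python with the standard library. This sketch only constructs the request object and does not send it. It assumes HTTPS, and it takes the `Authorization` value as an input because computing the signature is covered by Request Signature and is not reproduced here. The function name `build_create_template_request` and its arguments are hypothetical.

```python
import urllib.request
from email.utils import formatdate


def build_create_template_request(bucket, region, body, auth_string):
    """Prepare (but do not send) the POST /template request.

    `bucket` is in <BucketName-APPID> form, `body` is the XML request body
    as bytes, and `auth_string` is a signature computed elsewhere.
    """
    host = f"{bucket}.ci.{region}.myqcloud.com"
    return urllib.request.Request(
        url=f"https://{host}/template",  # assumption: HTTPS endpoint
        data=body,
        method="POST",
        headers={
            "Authorization": auth_string,
            "Content-Type": "application/xml",
            "Date": formatdate(usegmt=True),  # GMT date, as in the syntax above
        },
    )


request = build_create_template_request(
    "test-1234567890", "ap-chongqing", b"<Request></Request>", "<Auth String>"
)
```

Passing the prepared object to `urllib.request.urlopen` would send it; `Content-Length` is then added automatically from the body length.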
    

    Response

    HTTP/1.1 200 OK
    Content-Type: application/xml
    Content-Length: 100
    Connection: keep-alive
    Date: Thu, 14 Jul 2022 12:37:29 GMT
    Server: tencent-ci
    x-ci-request-id: NTk0MjdmODlfMjQ4OGY3XzYzYzhf****
    <Response>
      <Template>
          <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
          <Name>TemplateName</Name>
          <State>Normal</State>
          <Tag>SpeechRecognition</Tag>
          <CreateTime>2020-08-05T11:35:24+0800</CreateTime>
          <UpdateTime>2020-08-31T16:15:20+0800</UpdateTime>
          <BucketId>test-1234567890</BucketId>
          <Category>Custom</Category>
          <SpeechRecognition>
              <EngineModelType>16k_zh</EngineModelType>
              <ChannelNum>1</ChannelNum>
              <ResTextFormat>0</ResTextFormat>
              <FilterDirty>1</FilterDirty>
              <FilterModal>0</FilterModal>
              <ConvertNumMode>1</ConvertNumMode>
              <SpeakerDiarization>0</SpeakerDiarization>
              <SpeakerNumber>0</SpeakerNumber>
              <FilterPunc>0</FilterPunc>
          </SpeechRecognition>
      </Template>
    </Response>
    