Speech Recognition (Async)

1. Introduction

A speech recognition API that converts long, user-supplied speech data (up to 4 hours) into text. The API operates asynchronously, which enables you to build applications or automated speech-to-text conversion services on top of the text obtained from speech data.

WARNING

⚠ Songs or audio with loud background music are not supported.

2. Example

Below is a simple usage example. A POST request submits a long audio file by URL and returns a request ID (rid). A GET request with that rid returns the processing status and, once transcription is complete, the text.

POST

shell
curl -X POST 'https://apis.daglo.ai/stt/v1/async/transcripts' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <API_TOKEN>' \
--data '{
    "audio": {
        "source": {
            "url": "https://storage.googleapis.com/bkt-actionpower-examples/audio/actionpower_hello.wav"
        }
    }
}'
text
{"rid":"12345678-abcd-efgh-1234-abcdefghijkl"}
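The same submission can be sketched in Python with only the standard library. The helper names `build_submit_request` and `submit` are assumptions made for illustration; the endpoint, headers, and body shape are copied from the curl example above. The sketch assumes a 200 response with a JSON body and does not handle the error codes.

```python
import json
import urllib.request

API_URL = "https://apis.daglo.ai/stt/v1/async/transcripts"


def build_submit_request(api_token: str, audio_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = {"audio": {"source": {"url": audio_url}}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


def submit(api_token: str, audio_url: str) -> str:
    """Send the request and return the rid from the JSON response.

    Assumes a 200 response whose body looks like {"rid": "..."}.
    """
    request = build_submit_request(api_token, audio_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["rid"]
```

Calling `submit("<API_TOKEN>", "<audio URL>")` performs the network request; `build_submit_request` alone is useful for inspecting what would be sent.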

GET

shell
curl 'https://apis.daglo.ai/stt/v1/async/transcripts/<RID>' \
  --header 'Authorization: Bearer <API_TOKEN>'
text
{
    "rid": "12345678-abcd-efgh-1234-abcdefghijkl",
    "status": "transcribed",
    "sttResults": [
        {
            "transcript": "안녕하세요. 액션 파워입니다. 음성 인식의 선두주자 액션 파워의 기술을 만나보세요. ",
            "words": [
                {
                    "word": "안녕하세요. ",
                    "startTime": {
                        "nanos": 60000000,
                        "seconds": "0"
                    },
                    "endTime": {
                        "nanos": 860000000,
                        "seconds": "0"
                    },
                    "segmentId": "1"
                },
                {
                    "word": "액션 ",
                    "startTime": {
                        "nanos": 820000000,
                        "seconds": "0"
                    },
                    "endTime": {
                        "nanos": 419999999,
                        "seconds": "1"
                    },
                    "segmentId": "2"
                },
                {
                    "word": "파워입니다. ",
                    "startTime": {
                        "nanos": 379999999,
                        "seconds": "1"
                    },
                    "endTime": {
                        "nanos": 20000000,
                        "seconds": "2"
                    },
                    "segmentId": "2"
                },
                {
                    "word": "음성 ",
                    "startTime": {
                        "nanos": 990000000,
                        "seconds": "1"
                    },
                    "endTime": {
                        "nanos": 509999999,
                        "seconds": "2"
                    },
                    "segmentId": "3"
                },
                {
                    "word": "인식의 ",
                    "startTime": {
                        "nanos": 470000000,
                        "seconds": "2"
                    },
                    "endTime": {
                        "nanos": 830000000,
                        "seconds": "2"
                    },
                    "segmentId": "3"
                },
                {
                    "word": "선두주자 ",
                    "startTime": {
                        "nanos": 790000000,
                        "seconds": "2"
                    },
                    "endTime": {
                        "nanos": 150000000,
                        "seconds": "4"
                    },
                    "segmentId": "3"
                },
                {
                    "word": "액션 ",
                    "startTime": {
                        "nanos": 110000000,
                        "seconds": "4"
                    },
                    "endTime": {
                        "nanos": 669999999,
                        "seconds": "4"
                    },
                    "segmentId": "3"
                },
                {
                    "word": "파워의 ",
                    "startTime": {
                        "nanos": 629999999,
                        "seconds": "4"
                    },
                    "endTime": {
                        "nanos": 950000000,
                        "seconds": "4"
                    },
                    "segmentId": "3"
                },
                {
                    "word": "기술을 ",
                    "startTime": {
                        "nanos": 950000000,
                        "seconds": "4"
                    },
                    "endTime": {
                        "nanos": 879999999,
                        "seconds": "5"
                    },
                    "segmentId": "3"
                },
                {
                    "word": "만나보세요. ",
                    "startTime": {
                        "nanos": 839999999,
                        "seconds": "5"
                    },
                    "endTime": {
                        "nanos": 479999999,
                        "seconds": "6"
                    },
                    "segmentId": "3"
                }
            ],
            "keywords": null
        }
    ]
}
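In the sample response, each timestamp carries `seconds` as a string and `nanos` as a number. A minimal sketch of turning those pairs into floating-point seconds follows; the function names `to_seconds` and `word_timings` are illustrative assumptions, and the embedded sample is a trimmed copy of the response above.

```python
import json


def to_seconds(timestamp: dict) -> float:
    """Combine the string `seconds` and numeric `nanos` fields into float seconds."""
    return int(timestamp["seconds"]) + timestamp["nanos"] / 1_000_000_000


def word_timings(response: dict) -> list:
    """Return (word, start, end) tuples for every word in every sttResults entry."""
    timings = []
    for result in response.get("sttResults") or []:
        for word in result.get("words") or []:
            timings.append(
                (
                    word["word"].strip(),  # words carry a trailing space in the sample
                    to_seconds(word["startTime"]),
                    to_seconds(word["endTime"]),
                )
            )
    return timings


# Trimmed copy of the sample response (first two words only).
sample = json.loads("""
{
  "rid": "12345678-abcd-efgh-1234-abcdefghijkl",
  "status": "transcribed",
  "sttResults": [{
    "transcript": "안녕하세요. 액션 ",
    "words": [
      {"word": "안녕하세요. ",
       "startTime": {"nanos": 60000000, "seconds": "0"},
       "endTime": {"nanos": 860000000, "seconds": "0"},
       "segmentId": "1"},
      {"word": "액션 ",
       "startTime": {"nanos": 820000000, "seconds": "0"},
       "endTime": {"nanos": 419999999, "seconds": "1"},
       "segmentId": "2"}
    ],
    "keywords": null
  }]
}
""")

for word, start, end in word_timings(sample):
    print(f"{start:.2f}-{end:.2f}  {word}")
```

Note that in the sample, adjacent words can overlap slightly (the first word ends at 0.86 s while the second starts at 0.82 s), so consumers should not assume strictly non-overlapping intervals.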

3. Instructions

1) Getting an API Key

  • Create an account in the API console.
  • Go to the token menu and issue a new token.
  • Copy the issued token and use it as the authentication token when sending requests.

2) Sending a request

  • Send a request to the specified endpoint with the required parameters.
  • For more detailed API parameter information, please refer to the API Reference.

a. POST Send long audio to transcribe

  • Endpoint
shell
POST https://apis.daglo.ai/stt/v1/async/transcripts

b. GET Get long audio transcription

  • Endpoint
shell
GET https://apis.daglo.ai/stt/v1/async/transcripts/{rid}

c. Get Callback

  • If you want the result to be handled by a callback, the server sends the completion status to the URL you specify once the task is complete. For more information, please refer to the Get(Polling) and Callback document.

3) Verifying that the audio format is supported

  • File size: up to 2GB
  • Audio duration: up to 4 hours
  • Supported file formats

🔊 audio

.3gp, .3gpp, .ac3, .aac, .aiff, .amr, .au, .flac, .m4a, .mp3, .mxf, .opus, .ra, .wav, .weba

📹 video

.asx, .avi, .ogm, .ogv, .m4v, .mov, .mp4, .mpeg, .mpg, .wmv

WARNING

⚠ Even if the file format is supported, transcription may not proceed if the actual content (encoding) differs from that format.
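A client-side pre-check against these limits can catch obvious problems before uploading. The sketch below is an assumption-laden illustration: the function name `precheck` is hypothetical, "2GB" is read as 2 GiB, and the caller is expected to supply the duration (for example from a media probing tool). As the warning above notes, passing this check does not guarantee that the encoding is accepted.

```python
from pathlib import Path

MAX_BYTES = 2 * 1024**3      # assumption: "2GB" interpreted as 2 GiB
MAX_SECONDS = 4 * 60 * 60    # 4 hours

AUDIO_EXTENSIONS = {
    ".3gp", ".3gpp", ".ac3", ".aac", ".aiff", ".amr", ".au", ".flac",
    ".m4a", ".mp3", ".mxf", ".opus", ".ra", ".wav", ".weba",
}
VIDEO_EXTENSIONS = {
    ".asx", ".avi", ".ogm", ".ogv", ".m4v", ".mov", ".mp4", ".mpeg",
    ".mpg", ".wmv",
}


def precheck(filename: str, size_bytes: int, duration_seconds: float) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in AUDIO_EXTENSIONS | VIDEO_EXTENSIONS:
        problems.append(f"unsupported extension: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 2GB")
    if duration_seconds > MAX_SECONDS:
        problems.append("audio longer than 4 hours")
    return problems
```

For example, `precheck("talk.WAV", 1000, 60)` returns an empty list because the extension comparison is case-insensitive.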

4) Getting a response

a. POST Send long audio to transcribe

  • Success
    • 200
    • 204: No Content. The request was successful, but no result is returned. For transcription, this means transcription is complete but the result is empty.
  • Error
    • 400: Bad Request. Invalid format.
    • 401: Unauthorized.
    • 403: Forbidden. Unauthorized access.
    • 413: Payload Too Large. Request too large.
    • 415: Unsupported Media Type.
    • 429: Too Many Requests.
    • 500: Internal Server Error.
    • 503: Service Unavailable. The server is processing so many requests that it is temporarily unable to respond. Please try again in a moment.
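One way a client might act on these codes is sketched below. Treating 429, 500, and 503 as retryable is an assumption on our part (the list above only says to try again for 503), and the names `classify_status` and `backoff_delays` are hypothetical helpers, not part of the API.

```python
# Assumption: these codes are treated as transient and worth retrying.
RETRYABLE = {429, 500, 503}


def classify_status(code: int) -> str:
    """Map an HTTP status code from the list above to a client action."""
    if code == 200:
        return "ok"
    if code == 204:
        return "empty"   # request succeeded, nothing to read
    if code in RETRYABLE:
        return "retry"
    return "fail"        # 400, 401, 403, 413, 415: fix the request first


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list:
    """Exponential backoff schedule in seconds, capped at `cap`."""
    return [min(cap, base * 2**i) for i in range(attempts)]
```

A caller would sleep for each delay in `backoff_delays(n)` between attempts while `classify_status` keeps returning `"retry"`, and stop immediately on `"fail"`.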

b. GET Get long audio transcription

  • Endpoint

```shell
GET https://apis.daglo.ai/stt/v1/async/transcripts/{rid}
```

  • Success
    • 200: The status field of the response is one of the following:
      • ai_requested: The request has been started.
      • uploaded: File upload completed.
      • file_processing: Pre-processing the file.
      • transcribing: Transcription in progress.
      • post_processing: Post-processing.
      • transcribed: Transcription completed.
      • transcript_error: An error occurred while transcribing. Please wait a moment and request again.
      • file_error: There is an error in the file. Please check the file and request again.
    • 204: No Content. The request was successful, but no result is returned. For transcription, this means transcription is complete but the result is empty.

  • Error
    • 401: Unauthorized.
    • 403: Forbidden. Unauthorized access.
    • 404: Not Found.
    • 429: Too Many Requests.
    • 500: Internal Server Error.

c. Receiving the response as a callback

  • If you want your request to be handled by a callback, the API server can send the completion status to a URL you specify once the task is complete. For more information, please refer to the Get(Polling) and Callback document.
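When a callback is not used, the status values listed for the GET request can drive a polling loop. The following is a minimal sketch, not the documented polling procedure: `poll_until_done` is a hypothetical helper, the interval and polling budget are arbitrary defaults, and `fetch` stands for any caller-supplied function that performs the GET request and returns the parsed JSON body.

```python
import time
from typing import Callable

# Status values taken from the GET response list in this document.
IN_PROGRESS = {
    "ai_requested", "uploaded", "file_processing",
    "transcribing", "post_processing",
}
FAILED = {"transcript_error", "file_error"}


def poll_until_done(
    fetch: Callable[[], dict],
    interval: float = 5.0,    # assumption: seconds between polls
    max_polls: int = 120,     # assumption: give up after this many polls
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` until the status is terminal, then return or raise."""
    for _ in range(max_polls):
        result = fetch()
        status = result.get("status")
        if status == "transcribed":
            return result
        if status in FAILED:
            raise RuntimeError(f"transcription failed: {status}")
        if status not in IN_PROGRESS:
            raise ValueError(f"unexpected status: {status!r}")
        sleep(interval)
    raise TimeoutError("transcription did not finish within the polling budget")
```

Passing `fetch` and `sleep` as parameters keeps the loop independent of any HTTP library and easy to exercise with canned responses. A 204 response has no body, so a real `fetch` would need to handle that case before returning.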

Update history

  • 20240902 ver1.0 API document has been created.