Speech Recognition (Async)

1. Introduction

This speech recognition API converts user-supplied long audio (up to 4 hours) into text. It operates asynchronously: you submit the audio first and retrieve the result later, which makes it suitable for building applications or automated speech-to-text services on top of the transcribed text.

WARNING

⚠ Songs or audio with loud background music are not supported.

2. Example

Below is a simple usage example. A POST request submits a long audio file by URL and returns a request ID (rid); a GET request with that rid returns the status and, once transcription is complete, the text.

POST

shell

curl -X POST 'https://apis.daglo.ai/stt/v1/async/transcripts' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <API_TOKEN>' \
--data '{
    "audio": {
        "source": {
            "url": "https://storage.googleapis.com/bkt-actionpower-examples/audio/actionpower_hello.wav"
        }
    }
}'

text

{"rid":"12345678-abcd-efgh-1234-abcdefghijkl"}

GET

shell

curl 'https://apis.daglo.ai/stt/v1/async/transcripts/<RID>' \
  --header 'Authorization: Bearer <API_TOKEN>'

text

{
    "rid": "12345678-abcd-efgh-1234-abcdefghijkl",
    "status": "transcribed",
    "sttResults": [
        {
            "transcript": "안녕하세요. 액션 파워입니다. 음성 인식의 선두주자 액션 파워의 기술을 만나보세요. ",
            "words": [
                {
                    "word": "안녕하세요. ",
                    "startTime": {
                        "nanos": 60000000,
                        "seconds": "0"
                    },
                    "endTime": {
                        "nanos": 860000000,
                        "seconds": "0"
                    },
                    "segmentId": "1"
                },
                {
                    "word": "액션 ",
                    "startTime": {
                        "nanos": 820000000,
                        "seconds": "0"
                    },
                    "endTime": {
                        "nanos": 419999999,
                        "seconds": "1"
                    },
                    "segmentId": "2"
                },
                {
                    "word": "파워입니다. ",
                    "startTime": {
                        "nanos": 379999999,
                        "seconds": "1"
                    },
                    "endTime": {
                        "nanos": 20000000,
                        "seconds": "2"
                    },
                    "segmentId": "2"
                },
                {
                    "word": "음성 ",
                    "startTime": {
                        "nanos": 990000000,
                        "seconds": "1"
                    },
                    "endTime": {
                        "nanos": 509999999,
                        "seconds": "2"
                    },
                    "segmentId": "3"
                },
                {
                    "word": "인식의 ",
                    "startTime": {
                        "nanos": 470000000,
                        "seconds": "2"
                    },
                    "endTime": {
                        "nanos": 830000000,
                        "seconds": "2"
                    },
                    "segmentId": "3"
                },
                {
                    "word": "선두주자 ",
                    "startTime": {
                        "nanos": 790000000,
                        "seconds": "2"
                    },
                    "endTime": {
                        "nanos": 150000000,
                        "seconds": "4"
                    },
                    "segmentId": "3"
                },
                {
                    "word": "액션 ",
                    "startTime": {
                        "nanos": 110000000,
                        "seconds": "4"
                    },
                    "endTime": {
                        "nanos": 669999999,
                        "seconds": "4"
                    },
                    "segmentId": "3"
                },
                {
                    "word": "파워의 ",
                    "startTime": {
                        "nanos": 629999999,
                        "seconds": "4"
                    },
                    "endTime": {
                        "nanos": 950000000,
                        "seconds": "4"
                    },
                    "segmentId": "3"
                },
                {
                    "word": "기술을 ",
                    "startTime": {
                        "nanos": 950000000,
                        "seconds": "4"
                    },
                    "endTime": {
                        "nanos": 879999999,
                        "seconds": "5"
                    },
                    "segmentId": "3"
                },
                {
                    "word": "만나보세요. ",
                    "startTime": {
                        "nanos": 839999999,
                        "seconds": "5"
                    },
                    "endTime": {
                        "nanos": 479999999,
                        "seconds": "6"
                    },
                    "segmentId": "3"
                }
            ],
            "keywords": null
        }
    ]
}
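
The word-level timestamps in this response split each time into a `seconds` string and an integer `nanos` part. A minimal Python sketch, assuming only the field names visible in the sample above, converts them to floating-point seconds and regroups words by `segmentId`. The helper names `to_seconds` and `segments` are illustrative and not part of the API.

```python
import json

# Trimmed copy of the sample response above (first three words only).
SAMPLE = json.loads("""
{
  "rid": "12345678-abcd-efgh-1234-abcdefghijkl",
  "status": "transcribed",
  "sttResults": [{
    "words": [
      {"word": "안녕하세요. ", "startTime": {"nanos": 60000000, "seconds": "0"},
       "endTime": {"nanos": 860000000, "seconds": "0"}, "segmentId": "1"},
      {"word": "액션 ", "startTime": {"nanos": 820000000, "seconds": "0"},
       "endTime": {"nanos": 419999999, "seconds": "1"}, "segmentId": "2"},
      {"word": "파워입니다. ", "startTime": {"nanos": 379999999, "seconds": "1"},
       "endTime": {"nanos": 20000000, "seconds": "2"}, "segmentId": "2"}
    ]
  }]
}
""")


def to_seconds(ts):
    """Combine the string `seconds` and integer `nanos` fields into a float."""
    return int(ts["seconds"]) + ts["nanos"] / 1_000_000_000


def segments(result):
    """Join words that share a segmentId, keeping first-seen order."""
    grouped = {}
    for w in result["words"]:
        grouped.setdefault(w["segmentId"], []).append(w["word"])
    return {seg: "".join(words).strip() for seg, words in grouped.items()}


if __name__ == "__main__":
    first = SAMPLE["sttResults"][0]
    for w in first["words"]:
        start, end = to_seconds(w["startTime"]), to_seconds(w["endTime"])
        print(f'{start:.2f}-{end:.2f} {w["word"]}')
    print(segments(first))
```

Note that `seconds` arrives as a string, so it has to be converted with `int()` before any arithmetic.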

3. Instructions

1) Getting an API Key

Create an account in the API console.
Go to the token menu and issue a new token.
Copy the issued token and use it as the authentication token when sending requests.
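
One way to keep the issued token out of source code is to read it from an environment variable and build the `Authorization` header from it. This is only a sketch: the variable name `DAGLO_API_TOKEN` and the helper `auth_headers` are assumptions made for illustration, not something the console or the API defines.

```python
import os


def auth_headers(env_var="DAGLO_API_TOKEN"):
    """Return the bearer-token header used in the curl examples.

    `env_var` is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to the token issued in the API console")
    return {"Authorization": f"Bearer {token}"}
```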

2) Sending a request

Send a request to the specified endpoint with the required parameters.
For more detailed API parameter information, please refer to the API Reference.

a. POST Send long audio to transcribe

Endpoint

shell

POST https://apis.daglo.ai/stt/v1/async/transcripts
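
The same request as the curl example can be assembled in Python with only the standard library. The following is a minimal sketch, assuming the JSON body shape shown in the Example section; `build_post_request` and `submit` are illustrative helper names, not part of any official client.

```python
import json
import urllib.request

ENDPOINT = "https://apis.daglo.ai/stt/v1/async/transcripts"


def build_post_request(audio_url, api_token):
    """Build (but do not send) the POST request that submits audio by URL."""
    body = {"audio": {"source": {"url": audio_url}}}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


def submit(request):
    """Send the request and return the `rid` from the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))["rid"]
```

Keeping construction and sending separate makes it possible to inspect the request (URL, headers, body) before any network call is made.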

b. GET Get long audio transcription

Endpoint

shell

GET https://apis.daglo.ai/stt/v1/async/transcripts/{rid}
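
Retrieving a result can be sketched the same way. The snippet below assumes the `rid` is placed directly in the URL path, as in the endpoint above, and treats a 204 response as "no result", following the response codes listed under 4) Get a response. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://apis.daglo.ai/stt/v1/async/transcripts"


def build_get_request(rid, api_token):
    """Build (but do not send) the GET request for one transcription."""
    url = f"{BASE_URL}/{urllib.parse.quote(rid, safe='')}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_token}"}, method="GET"
    )


def parse_body(status_code, raw_body):
    """Return the decoded JSON for a 200, or None for a 204 (empty result)."""
    if status_code == 204 or not raw_body:
        return None
    return json.loads(raw_body.decode("utf-8"))


def fetch(rid, api_token):
    """Send the GET request and return the parsed result (or None)."""
    request = build_get_request(rid, api_token)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return parse_body(resp.status, resp.read())
```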

c. Get Callback

If you prefer to be notified by a callback instead of polling, the server sends the completion status to a URL you specify once the task is finished. For more information, please refer to the Get(Polling) and Callback document.

3) Verifying audio format is supported

File size: up to 2GB
Audio duration: up to 4 hours
Supported file formats

🔊 audio

.3gp, .3gpp, .ac3, .aac, .aiff, .amr, .au, .flac, .m4a, .mp3, .mxf, .opus, .ra, .wav, .weba

📹 video

.asx, .avi, .ogm, .ogv, .m4v, .mov, .mp4, .mpeg, .mpg, .wmv

WARNING

⚠ Even if the file extension matches a supported format, transcription may not proceed if the actual content (encoding) differs from it.
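
The limits above can be checked before a file is submitted. The sketch below is a rough pre-flight check under stated assumptions: "2GB" is read as 2 GiB, the caller already knows the size and duration, and `check_media` is a hypothetical helper. As the warning notes, an extension check cannot detect a mismatched encoding, so passing it does not guarantee that transcription will proceed.

```python
import os

# Extensions copied from the supported-format lists above.
AUDIO_EXTS = {".3gp", ".3gpp", ".ac3", ".aac", ".aiff", ".amr", ".au", ".flac",
              ".m4a", ".mp3", ".mxf", ".opus", ".ra", ".wav", ".weba"}
VIDEO_EXTS = {".asx", ".avi", ".ogm", ".ogv", ".m4v", ".mov", ".mp4", ".mpeg",
              ".mpg", ".wmv"}

MAX_BYTES = 2 * 1024**3      # assumption: "2GB" interpreted as 2 GiB
MAX_SECONDS = 4 * 60 * 60    # 4 hours


def check_media(filename, size_bytes, duration_seconds):
    """Return a list of problems; an empty list means the basic limits are met."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in AUDIO_EXTS | VIDEO_EXTS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 2GB")
    if duration_seconds > MAX_SECONDS:
        problems.append("audio is longer than 4 hours")
    return problems
```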

4) Get a response

a. POST Send long audio to transcribe

Success

200: OK.
204: No Content. The request was successful, but no result is returned. For transcription, this means transcription is complete but the result is empty.

Error

400: Bad Request. Invalid format.
401: Unauthorized.
403: Forbidden. Unauthorized access.
413: Payload Too Large. Request too large.
415: Unsupported Media Type.
429: Too Many Requests.
500: Internal Server Error.
503: Service Unavailable. The server is handling too many requests and is temporarily unable to respond. Please try again in a moment.
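
Since the 503 description asks the client to try again, a caller may want a simple retry policy. The following sketch assumes that only 429 and 503 are worth retrying and uses exponential backoff with arbitrary defaults (1 second base, 60 second cap); none of these values come from the API itself, and `send` stands in for whatever function performs the request.

```python
import time

RETRYABLE = {429, 503}  # assumption: only these two codes are retried


def retry_delay(status_code, attempt, base=1.0, cap=60.0):
    """Return seconds to wait before the next try, or None if not retryable.

    `attempt` is 0 for the first failed try, 1 for the second, and so on.
    """
    if status_code not in RETRYABLE:
        return None
    return min(cap, base * 2 ** attempt)


def send_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until it returns a non-retryable status or tries run out.

    `send` is any zero-argument callable returning (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        delay = retry_delay(status, attempt)
        if delay is None or attempt == max_attempts - 1:
            return status, body
        sleep(delay)
```

Other error codes, such as 400 or 415, are returned to the caller immediately because repeating the same request would not be expected to change the outcome.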

b. GET Get long audio transcription

Endpoint

```shell
GET https://apis.daglo.ai/stt/v1/async/transcripts/{rid}
```

Success

200: OK. The status field of the response is one of the following values:
- ai_requested: The request has been started.
- uploaded: File upload completed.
- file_processing: Pre-processing the file.
- transcribing: Transcription in progress.
- post_processing: Post-processing.
- transcribed: Transcription completed.
- transcript_error: An error occurred while transcribing. Please wait a moment and request again.
- file_error: There is an error in the file. Please check the file and request it again.
204: No Content. The request was successful, but no result is returned. For transcription, this means transcription is complete but the result is empty.

Error

401: Unauthorized.
403: Forbidden. Unauthorized access.
404: Not Found.
429: Too Many Requests.
500: Internal Server Error.

c. Receiving the response as a callback

If you want the result delivered by a callback, the API server can send the completion status to a URL you specify once processing is complete. For more information, please refer to the Get(Polling) and Callback document.
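
When polling is used instead of a callback, the status values listed above can drive a simple loop. The sketch below is one possible approach, not the documented method: it treats `transcribed` as success, `transcript_error` and `file_error` as failures, and everything else as still in progress. The 5 second interval and 4 hour timeout are arbitrary assumptions, and `fetch_status` is a hypothetical callable that performs the GET request and returns the parsed JSON (or None for a 204).

```python
import time

# Status values taken from the 200 response list above.
DONE = {"transcribed"}
FAILED = {"transcript_error", "file_error"}


def wait_for_transcript(fetch_status, interval=5.0, timeout=4 * 3600,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a final status and return that response.

    Returns None if `fetch_status` returns None (a 204: finished but empty).
    """
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        if result is None:
            return None
        status = result.get("status")
        if status in DONE:
            return result
        if status in FAILED:
            raise RuntimeError(f"transcription failed with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError("gave up waiting for the transcription")
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to exercise in tests without real waiting.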

Update history

20240902 ver1.0 API document has been created.
