Speech Recognition (Async)
1. Introduction
A speech recognition API that converts long user-supplied speech data (up to 4 hours) into text. The API operates asynchronously, which enables you to build applications or automated speech-to-text conversion services on top of the text obtained from speech data.
WARNING
⚠ Songs or audio with loud background music are not supported.
2. Example
Below is a simple usage example. The POST request sends a long audio file to the API and returns a request ID (rid). The GET request then uses that rid to retrieve the text converted from the speech.
POST
curl -X POST 'https://apis.daglo.ai/stt/v1/async/transcripts' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <API_TOKEN>' \
--data '{
  "audio": {
    "source": {
      "url": "https://storage.googleapis.com/bkt-actionpower-examples/audio/actionpower_hello.wav"
    }
  }
}'
{"rid":"12345678-abcd-efgh-1234-abcdefghijkl"}
GET
curl 'https://apis.daglo.ai/stt/v1/async/transcripts/<RID>' \
--header 'Authorization: Bearer <API_TOKEN>'
{
"rid": "12345678-abcd-efgh-1234-abcdefghijkl",
"status": "transcribed",
"sttResults": [
{
"transcript": "안녕하세요. 액션 파워입니다. 음성 인식의 선두주자 액션 파워의 기술을 만나보세요. ",
"words": [
{
"word": "안녕하세요. ",
"startTime": {
"nanos": 60000000,
"seconds": "0"
},
"endTime": {
"nanos": 860000000,
"seconds": "0"
},
"segmentId": "1"
},
{
"word": "액션 ",
"startTime": {
"nanos": 820000000,
"seconds": "0"
},
"endTime": {
"nanos": 419999999,
"seconds": "1"
},
"segmentId": "2"
},
{
"word": "파워입니다. ",
"startTime": {
"nanos": 379999999,
"seconds": "1"
},
"endTime": {
"nanos": 20000000,
"seconds": "2"
},
"segmentId": "2"
},
{
"word": "음성 ",
"startTime": {
"nanos": 990000000,
"seconds": "1"
},
"endTime": {
"nanos": 509999999,
"seconds": "2"
},
"segmentId": "3"
},
{
"word": "인식의 ",
"startTime": {
"nanos": 470000000,
"seconds": "2"
},
"endTime": {
"nanos": 830000000,
"seconds": "2"
},
"segmentId": "3"
},
{
"word": "선두주자 ",
"startTime": {
"nanos": 790000000,
"seconds": "2"
},
"endTime": {
"nanos": 150000000,
"seconds": "4"
},
"segmentId": "3"
},
{
"word": "액션 ",
"startTime": {
"nanos": 110000000,
"seconds": "4"
},
"endTime": {
"nanos": 669999999,
"seconds": "4"
},
"segmentId": "3"
},
{
"word": "파워의 ",
"startTime": {
"nanos": 629999999,
"seconds": "4"
},
"endTime": {
"nanos": 950000000,
"seconds": "4"
},
"segmentId": "3"
},
{
"word": "기술을 ",
"startTime": {
"nanos": 950000000,
"seconds": "4"
},
"endTime": {
"nanos": 879999999,
"seconds": "5"
},
"segmentId": "3"
},
{
"word": "만나보세요. ",
"startTime": {
"nanos": 839999999,
"seconds": "5"
},
"endTime": {
"nanos": 479999999,
"seconds": "6"
},
"segmentId": "3"
}
],
"keywords": null
}
]
}
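The word timings in the response above are split into a `seconds` string and a `nanos` integer. The following is a minimal sketch, not an official client, of how they could be turned into plain seconds and grouped by `segmentId`. The helper names `to_seconds` and `words_by_segment` are hypothetical, and defaulting a missing field to 0 is an assumption.

```python
from collections import defaultdict


def to_seconds(ts: dict) -> float:
    """Convert a {"seconds": "1", "nanos": 419999999} timestamp to float seconds."""
    # In the sample response "seconds" is a string and "nanos" is an integer.
    # Assumption: a missing field counts as 0.
    return int(ts.get("seconds", 0)) + int(ts.get("nanos", 0)) / 1_000_000_000


def words_by_segment(stt_result: dict) -> dict:
    """Group words by segmentId as (word, start_seconds, end_seconds) tuples."""
    segments = defaultdict(list)
    for w in stt_result.get("words") or []:
        segments[w["segmentId"]].append(
            (w["word"].strip(), to_seconds(w["startTime"]), to_seconds(w["endTime"]))
        )
    return dict(segments)


# Two words copied from the sample response.
sample = {
    "words": [
        {"word": "액션 ", "segmentId": "2",
         "startTime": {"nanos": 820000000, "seconds": "0"},
         "endTime": {"nanos": 419999999, "seconds": "1"}},
        {"word": "파워입니다. ", "segmentId": "2",
         "startTime": {"nanos": 379999999, "seconds": "1"},
         "endTime": {"nanos": 20000000, "seconds": "2"}},
    ]
}

for segment_id, words in words_by_segment(sample).items():
    text = " ".join(word for word, _, _ in words)
    print(f"segment {segment_id}: {words[0][1]:.2f}s-{words[-1][2]:.2f}s {text}")
```

Note that adjacent words in the sample overlap slightly in time, so the sketch does not assume that one word ends before the next begins.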
3. Instructions
1) Getting an API Key
- Create an account in the API console.
- Go to the token menu and issue a new token.
- Copy the issued token and use it as the authentication token when sending requests.
2) Sending a request
- Send a request to the specified endpoint with the required parameters.
- For detailed API parameter information, please refer to the API Reference.
a. POST
Send long audio to transcribe
- Endpoint
POST https://apis.daglo.ai/stt/v1/async/transcripts
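For readers calling the endpoint from code rather than curl, here is a rough Python equivalent of the POST request from the Example section, using only the standard library. It mirrors the headers and body of the curl example. The function names `build_transcript_request` and `submit`, and the 30-second timeout, are illustrative assumptions rather than part of the API.

```python
import json
import urllib.request

ENDPOINT = "https://apis.daglo.ai/stt/v1/async/transcripts"


def build_transcript_request(audio_url: str, api_token: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    # Same JSON body as the curl example: audio.source.url points at the file.
    body = json.dumps({"audio": {"source": {"url": audio_url}}}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
    )


def submit(audio_url: str, api_token: str) -> str:
    """Send the request and return the rid from the JSON response."""
    request = build_transcript_request(audio_url, api_token)
    # Assumption: a 30-second timeout is long enough for the submit call.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["rid"]
```

Calling `submit("<AUDIO_URL>", "<API_TOKEN>")` would return the rid to use with the GET endpoint. Error handling is left out of the sketch; `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses.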
b. GET
Get long audio transcription
- Endpoint
GET https://apis.daglo.ai/stt/v1/async/transcripts/{rid}
c. Get Callback
- To receive the result through a callback instead of polling, specify a URL with the request. When the server completes the task, it sends the completion status to that URL. For more information, please refer to the Get(Polling) and Callback document.
3) Verifying that the audio format is supported
- File size: up to 2GB
- Audio duration: up to 4 hours
- Supported file formats
🔊 audio
.3gp, .3gpp, .ac3, .aac, .aiff, .amr, .au, .flac, .m4a, .mp3, .mxf, .opus, .ra, .wav, .weba
📹 video
.asx, .avi, .ogm, .ogv, .m4v, .mov, .mp4, .mpeg, .mpg, .wmv
WARNING
⚠ Even if the file extension is supported, transcription may not proceed if the actual content (encoding) differs from it.
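The limits above can be checked on the client before a file is submitted. The sketch below is one possible pre-flight check and not part of the API. The function name `check_upload` is hypothetical, "2GB" is read conservatively as 2 × 1000³ bytes (an assumption), and the caller is expected to supply the size and duration from their own tooling. An extension check cannot detect the encoding mismatch described in the warning.

```python
from pathlib import Path

# Assumption: "2GB" is interpreted as 2 * 1000**3 bytes, the stricter reading.
MAX_BYTES = 2 * 1000**3
MAX_SECONDS = 4 * 60 * 60

AUDIO_EXTENSIONS = {
    ".3gp", ".3gpp", ".ac3", ".aac", ".aiff", ".amr", ".au", ".flac",
    ".m4a", ".mp3", ".mxf", ".opus", ".ra", ".wav", ".weba",
}
VIDEO_EXTENSIONS = {
    ".asx", ".avi", ".ogm", ".ogv", ".m4v", ".mov", ".mp4", ".mpeg", ".mpg", ".wmv",
}


def check_upload(filename: str, size_bytes: int, duration_seconds: float) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in AUDIO_EXTENSIONS | VIDEO_EXTENSIONS:
        problems.append(f"unsupported file extension: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 2GB")
    if duration_seconds > MAX_SECONDS:
        problems.append("audio is longer than 4 hours")
    return problems


print(check_upload("meeting.WAV", 150_000_000, 3_600))
print(check_upload("podcast.ogg", 3_000_000_000, 5 * 3_600))
```

The first call reports no problems. The second reports all three, because `.ogg` is not in the supported list and both limits are exceeded.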
4) Getting a response
a. POST
Send long audio to transcribe
- Success
200
204
- No Content. The request was successful, but no result is returned.
- For transcription, this means that transcription is complete but the result is empty.
- Error
400
: Bad Request. Invalid format.
401
: Unauthorized.
403
: Forbidden. Unauthorized access.
413
: Payload Too Large. Request too large.
415
: Unsupported Media Type.
429
: Too Many Requests.
500
: Internal Server Error.
503
: We are processing so many requests that we are temporarily unable to respond. Please try again in a moment.
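A client has to decide which of these errors are worth retrying. The sketch below shows one way to encode that decision. Only the 503 description explicitly asks the caller to try again. Treating 429 as retryable, and the exponential backoff delays, are assumptions made for illustration and not documented behavior. The names `should_retry` and `backoff_delay` are hypothetical.

```python
# 503 is retryable per its description above; 429 is included as an assumption.
RETRYABLE_STATUS_CODES = {429, 503}


def should_retry(status_code: int) -> bool:
    """Return True when the same request may succeed if sent again later."""
    return status_code in RETRYABLE_STATUS_CODES


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


for code in (400, 429, 503):
    print(code, "retry" if should_retry(code) else "do not retry")
```

Errors such as 400, 401, 413, and 415 indicate a problem with the request itself, so the sketch does not retry them; the request needs to be corrected first.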
b. GET
Get long audio transcription
- Endpoint
```shell
GET https://apis.daglo.ai/stt/v1/async/transcripts/{rid}
```
- Success
200
- The status field of the response takes one of the following values.
ai_requested
: Request has been started.
uploaded
: File upload completed.
file_processing
: Pre-processing file.
transcribing
: Transcription in progress.
post_processing
: Post-processing.
transcribed
: Transcription completed.
transcript_error
: An error occurred while transcribing. Please wait a moment and request again.
file_error
: There is an error in the file. Please check the file and request it again.
204
- No Content. The request was successful, but no result is returned.
- For transcription, this means that transcription is complete but the result is empty.
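The status values above suggest a simple polling loop: keep calling the GET endpoint while the status is an in-progress value, stop on `transcribed`, and fail on an error status. The following is a minimal sketch under stated assumptions. `wait_for_transcript` and its `fetch` parameter are hypothetical; `fetch(rid)` stands for any function that calls the GET endpoint and returns the parsed JSON body. The 5-second interval and the poll limit are arbitrary choices, not values recommended by the API.

```python
import time

IN_PROGRESS = {"ai_requested", "uploaded", "file_processing", "transcribing", "post_processing"}
FAILED = {"transcript_error", "file_error"}


def wait_for_transcript(fetch, rid, interval=5.0, max_polls=120, sleep=time.sleep):
    """Poll fetch(rid) until the status is 'transcribed' and return that response."""
    for _ in range(max_polls):
        result = fetch(rid)
        status = result.get("status")
        if status == "transcribed":
            return result
        if status in FAILED:
            raise RuntimeError(f"transcription failed with status {status!r}")
        if status not in IN_PROGRESS:
            # Fail loudly on a status this sketch does not know about.
            raise RuntimeError(f"unexpected status {status!r}")
        sleep(interval)
    raise TimeoutError(f"transcript {rid} not ready after {max_polls} polls")
```

Passing `fetch` and `sleep` as parameters keeps the loop independent of any HTTP library and makes it easy to exercise with a fake. Handling of a 204 response, which carries no body, is left to the `fetch` implementation.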
- Error
401
: Unauthorized.
403
: Forbidden. Unauthorized access.
404
: Not Found.
429
: Too Many Requests.
500
: Internal Server Error.
c. Receiving the response as a callback
- If you want your request to be processed by a callback function, the API server can send the completion status to a URL you have specified once the task completes. For more information, please refer to the Get(Polling) and Callback document.
Update history
- 20240902 ver1.0 API document has been created.