Real-time Speech Recognition (Streaming)

1. Introduction

This guide shows how to convert speech to text in real time in a web browser, using the Daglo API library in a JavaScript environment.

⚠️ Notes

Currently available only in JavaScript environments running in web browsers.
Real-time speech recognition currently supports Korean only.
Songs and audio with loud background music cannot be converted.

1) Key Features

Real-time speech recognition: Convert microphone input (audio stream) to text in real-time
High-accuracy text conversion: Uses the latest speech recognition technology to produce accurate text
Fast response time: Process speech in real-time for immediate text output
Web browser support: Use immediately in web browsers without additional installation

2. Prerequisites

1) Requirements

A recent version of a web browser (Chrome, Edge, Firefox, etc.)
Microphone access permission
Daglo API account and API token
Daglo API GitHub repository

3. Sample Code

Below is an example of implementing real-time speech recognition using the Daglo API library.

html

<!DOCTYPE html>
<html lang="ko">
  <head>
    <meta charset="utf-8">
    <title>dagloAPI stream STT Example</title>
    <link href="./index.css" rel="stylesheet">
  </head>
  <body>
    <div id="liveView" class="videoView">
      <label>API Token</label>
      <input id="token" placeholder="YOUR TOKEN">
      <br>
      <button id="enableButton" class="enable-btn">
        <span class="enable-btn-label">Microphone ON</span>
      </button>

      <p id="result"></p>
      <div id="transcripts"></div>
      <div id="speech-list"></div>
    </div>

    <script type="module">
      import { DagloAPI } from 'https://actionpower.github.io/dagloapi-js-beta/lib/daglo-api.module.js';

      document.getElementById('enableButton').addEventListener('click', async () => {
        // Read the API token entered by the user
        const dagloToken = document.getElementById('token').value.trim();
        if (!dagloToken) {
          return alert('Please enter your API token.');
        }

        const client = new DagloAPI({
          apiToken: dagloToken
        });
        const transcriber = client.stream.transcriber();

        // Append each piece of recognized text to the page
        transcriber.on('transcript', (data) => {
          console.log('[#] onTranscript', data);

          if (data?.text) {
            const span = document.createElement('span');
            span.textContent = data.text;
            document.getElementById('transcripts').append(span);
          }
        });

        let stream;

        try {
          // Capture the microphone
          stream = await navigator.mediaDevices.getUserMedia({ audio: true });
        } catch (err) {
          // Fails when permission is denied, no microphone exists, or the API is unavailable
          console.error('The following error occurred: ' + err);
          return alert('Could not access the microphone. Check browser support and permissions.');
        }

        // Start streaming microphone audio to the transcriber
        transcriber.connect(stream);
      });
    </script>
  </body>
</html>

4. Daglo API Library Description

This section walks through the key parts of the sample code to explain how to use the Daglo API library.

1) Loading the Library

javascript

import { DagloAPI } from 'https://actionpower.github.io/dagloapi-js-beta/lib/daglo-api.module.js';

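This imports the DagloAPI class from the hosted ES module. Because it is an import statement, it must run inside a script tag with type="module", as in the sample code.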

2) Creating a DagloAPI Instance

javascript

let client = new DagloAPI({
  apiToken: 'YOUR_API_TOKEN', // Required: your API token
});

Create a DagloAPI instance with your API token (in the sample code, the value the user types into the token field). This instance provides access to the features of the API.

3) Creating a Transcriber

javascript

let transcriber = client.stream.transcriber();

Create a transcriber for speech stream processing through the client.stream object.

4) Registering a Text Conversion Event Listener

javascript

transcriber.on('transcript', (data) => {
  console.log('[#] onTranscript', data);

  // Only render events that actually carry text
  if (data?.text) {
    const span = document.createElement('span');
    span.textContent = data.text;
    document.getElementById('transcripts').append(span);
  }
});

Register a callback function that runs on every transcript event. This event fires each time speech is converted to text.

The callback receives a data object containing the converted text. The sample logs the object to the console and, if it contains text, appends that text to the page.
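As an alternative to appending one span per event, the recognized text can be collected into a single running string. The following is a minimal sketch, not part of the Daglo library: createTranscriptBuffer is a hypothetical helper, and it assumes that each transcript event carries a finished piece of text in data.text (the only field the sample code uses). If the service also sends interim results, the joining logic would need to change.

```javascript
// Hypothetical helper (not part of the Daglo library): collects the text of
// successive transcript events into one running string.
// Assumption: every event carries a finished segment in `data.text`.
function createTranscriptBuffer() {
  const parts = [];
  return {
    // Add the text of one event (if any) and return the full transcript so far.
    push(data) {
      const text = typeof data?.text === 'string' ? data.text.trim() : '';
      if (text) parts.push(text);
      return parts.join(' ');
    },
    // Return the full transcript without adding anything.
    toString() {
      return parts.join(' ');
    },
  };
}

// Usage inside the sample's click handler:
//   const buffer = createTranscriptBuffer();
//   transcriber.on('transcript', (data) => {
//     document.getElementById('result').textContent = buffer.push(data);
//   });
```

Keeping the buffer separate from the DOM code makes it easy to save or copy the full transcript later.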

5) Accessing the Microphone and Acquiring an Audio Stream

javascript

stream = await navigator.mediaDevices.getUserMedia({ audio: true });

Use the browser's getUserMedia API to request microphone access permission and acquire an audio stream.
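getUserMedia rejects its promise for several different reasons, so it can help to tell the user which one occurred. The sketch below maps the standard DOMException names that getUserMedia uses to readable messages. describeMicError is a hypothetical helper name, and the message wording is only a suggestion.

```javascript
// Hypothetical helper: turns a getUserMedia rejection into a readable message.
// The error names are the standard ones defined for getUserMedia.
function describeMicError(err) {
  switch (err?.name) {
    case 'NotAllowedError':
      return 'Microphone access was denied by the user or by browser policy.';
    case 'NotFoundError':
      return 'No microphone was found on this device.';
    case 'NotReadableError':
      return 'The microphone could not be read; another application may be using it.';
    default:
      return 'Could not access the microphone: ' + (err?.message ?? String(err));
  }
}

// Usage:
//   try {
//     stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   } catch (err) {
//     alert(describeMicError(err));
//   }
```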

6) Connecting the Audio Stream to the Transcriber

javascript

transcriber.connect(stream);

Connect the acquired audio stream to the transcriber to start real-time voice recognition.
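This document does not describe how to end a session, so check the API Reference for any stop or disconnect method on the transcriber. Independently of the library, the microphone itself can be released with the standard MediaStream API, as in this minimal sketch. stopMicrophone is a hypothetical helper name.

```javascript
// Hypothetical helper: releases the microphone by stopping every track of the
// MediaStream returned by getUserMedia. Uses only standard MediaStream calls.
// Returns the number of tracks that were stopped.
function stopMicrophone(stream) {
  if (!stream) return 0;
  const tracks = stream.getTracks();
  tracks.forEach((track) => track.stop());
  return tracks.length;
}

// Usage (for example, from a "Microphone OFF" button handler):
//   stopMicrophone(stream);
```

Once all tracks are stopped, the browser's recording indicator turns off and no further audio is captured from that stream.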

5. Troubleshooting

1) Microphone Access Permission

Browsers allow microphone access only in secure contexts, which in practice means pages served over HTTPS.
During local development, localhost is also treated as a secure context, but deployed pages must be served over HTTPS.
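To diagnose this case before requesting the microphone, a page can read the standard window.isSecureContext property. The sketch below takes the window object as a parameter so it can be exercised outside a browser. micContextProblem is a hypothetical helper name.

```javascript
// Hypothetical helper: returns null when the page runs in a secure context,
// otherwise a message explaining why microphone capture will not work.
function micContextProblem(win) {
  if (win.isSecureContext) return null;
  const origin = win.location.protocol + '//' + win.location.host;
  return 'Microphone capture requires HTTPS or localhost; this page was loaded from ' + origin + '.';
}

// Usage in the browser:
//   const problem = micContextProblem(window);
//   if (problem) alert(problem);
```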

2) Browser Compatibility

The Daglo API is optimized for the latest web browsers.
The audio capture API (getUserMedia) is not supported in Internet Explorer, so a recent version of Chrome, Firefox, Safari, or Edge is recommended.
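Rather than checking browser names, a page can test for the capture API directly. This is a minimal feature-detection sketch. supportsMicCapture is a hypothetical helper name, and it receives the navigator object as a parameter.

```javascript
// Hypothetical helper: true when the given navigator object exposes
// mediaDevices.getUserMedia, the API the sample code relies on.
function supportsMicCapture(nav) {
  return Boolean(nav?.mediaDevices && typeof nav.mediaDevices.getUserMedia === 'function');
}

// Usage in the browser:
//   if (!supportsMicCapture(navigator)) {
//     alert('This browser does not support microphone capture.');
//   }
```

Note that browsers also hide navigator.mediaDevices on pages that are not served from a secure context, so a false result does not always mean the browser is outdated.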

3) Network Connection

Real-time speech recognition requires a stable network connection.
Speech recognition performance may degrade if the network connection is unstable.
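Browsers fire standard online and offline events on the window object, which a page can use to warn the user when the connection drops. The sketch below is one way to wire this up. watchConnection is a hypothetical helper name. These events only report whether the browser has a network connection at all, so they cannot detect a slow or unstable link.

```javascript
// Hypothetical helper: calls onChange(true) when the browser comes online and
// onChange(false) when it goes offline. Returns a function that removes the
// listeners again.
function watchConnection(target, onChange) {
  const goOnline = () => onChange(true);
  const goOffline = () => onChange(false);
  target.addEventListener('online', goOnline);
  target.addEventListener('offline', goOffline);
  return () => {
    target.removeEventListener('online', goOnline);
    target.removeEventListener('offline', goOffline);
  };
}

// Usage in the browser:
//   const stopWatching = watchConnection(window, (online) => {
//     document.getElementById('result').textContent =
//       online ? '' : 'Network connection lost. Recognition may stop.';
//   });
```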

6. Conclusion

This document has shown how to implement real-time speech recognition using the Daglo API library. Starting from the example code, you can add speech recognition to a web page and adapt it to your own application.

For more details, please refer to the API Reference and GitHub repository.
