
Real-time Speech Recognition (Streaming)

1. Introduction

This document explains how to convert speech to text in real time in a web browser using the Daglo API. It shows how to implement speech recognition with the Daglo API JavaScript library.


⚠️ Notes

  • Currently only available in JavaScript environments running in web browsers.
  • Real-time speech recognition currently supports only Korean.
  • Conversion is not supported for songs or audio with loud background music.

1) Key Features

  • Real-time speech recognition: Converts microphone input (an audio stream) to text in real time
  • High-accuracy text conversion: Uses the latest speech recognition technology to transcribe speech accurately
  • Fast response time: Processes speech as it arrives, so text appears immediately
  • Web browser support: Runs directly in the web browser with no additional installation

2. Prerequisites

1) Requirements

  • A recent version of a web browser (Chrome, Edge, Firefox, etc.)
  • Permission to access the microphone
  • A Daglo API account and API token
  • Access to the Daglo API GitHub repository (the sample imports the library from it)

3. Sample Code

Below is an example of implementing real-time speech recognition using the Daglo API library.

html
<!DOCTYPE html>
<html lang="ko">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>dagloAPI stream STT Example</title>
    <link href="./index.css" rel="stylesheet">
  </head>
  <body>
    <div id="liveView" class="videoView">
      <label>API Token</label>
      <input id="token" placeholder="YOUR TOKEN">
      <br>
      <button id="enableButton" class="enable-btn">
        <span class="enable-btn-label">Microphone ON</span>
      </button>

      <p id="result"></p>
      <div id="transcripts"></div>
      <div id="speech-list"></div>
    </div>

    <script type="module">
      import { DagloAPI } from 'https://actionpower.github.io/dagloapi-js-beta/lib/daglo-api.module.js';

      document.getElementById('enableButton').addEventListener('click', async (event) => {
        const dagloToken = document.getElementById('token').value.trim();
        if (!dagloToken) {
          return alert('Please enter your API token.');
        }

        const client = new DagloAPI({
          apiToken: dagloToken
        });
        const transcriber = client.stream.transcriber();

        transcriber.on('transcript', (data) => {
          console.log('[#] onTranscript', data);

          if (data?.text) {
            const span = document.createElement('span');
            span.textContent = data?.text;
            document.getElementById('transcripts').append(span);
          }
        });

        let stream;

        try {
          // capture the microphone
          stream = await navigator.mediaDevices.getUserMedia({ audio: true });
        }
        catch (err) {
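          // Fails when permission is denied, no microphone is found, or
          // getUserMedia is unavailable (e.g. the page is not served over HTTPS)
          console.error('The following error occurred: ' + err);
          return alert('Could not access the microphone: ' + err.message);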
        }

        if (stream) {
          transcriber.connect(stream);
        }
      });
    </script>
  </body>
</html>

4. Daglo API Library Description

This section walks through the key parts of the example code to explain how to use the Daglo API library.

1) Loading the Library

javascript
import { DagloAPI } from 'https://actionpower.github.io/dagloapi-js-beta/lib/daglo-api.module.js';

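This imports the DagloAPI class from the library's ES module build. Because it uses an import statement, the code must run in a module script (<script type="module">), as in the sample above.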

2) Creating a DagloAPI Instance

javascript
let client = new DagloAPI({
  apiToken: 'YOUR_API_TOKEN', // Required: your Daglo API token
});

Create a DagloAPI instance with the API token entered by the user. The client is the entry point to the API's features, such as the client.stream object used in the next step.

3) Creating a Transcriber

javascript
let transcriber = client.stream.transcriber();

Create a transcriber from the client.stream object. The transcriber receives the audio stream and emits recognition results as events.

4) Registering a Text Conversion Event Listener

javascript
transcriber.on('transcript', (data) => {
  console.log('[#] onTranscript', data);

  if (data?.text) {
    const span = document.createElement('span');
    span.textContent = data?.text;
    document.getElementById('transcripts').append(span);
  }
});

Register a callback that runs each time a transcript event is emitted, that is, whenever a portion of the speech has been converted to text.

The callback receives a data object describing the recognition result; the recognized text is in its text property. The example logs the whole object to the console and, when text is present, appends it as a <span> to the #transcripts element.
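If you need the full transcript rather than separate spans (for example, to copy or save it), the callback can also collect each segment. The sketch below uses a hypothetical helper, createTranscriptCollector, that relies only on data.text, as the sample does. It assumes, like the sample, that every transcript event carries a new piece of text; if the API also delivers interim results that are later revised, appending would duplicate text, so check the API Reference for the exact event payload. Joining segments with a single space is an arbitrary choice.

```javascript
// Hypothetical helper (not part of the Daglo API library): accumulates the
// text of each 'transcript' event. Only data.text is used, as in the sample.
function createTranscriptCollector() {
  const segments = [];
  return {
    // Usage: transcriber.on('transcript', collector.onTranscript);
    // Uses the closed-over array, not `this`, so it can be passed unbound.
    onTranscript(data) {
      if (data?.text) {
        segments.push(data.text);
      }
    },
    // Returns all collected segments joined with a single space.
    getText() {
      return segments.join(' ');
    },
    // Number of segments collected so far.
    get count() {
      return segments.length;
    },
  };
}
```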

5) Accessing the Microphone and Acquiring an Audio Stream

javascript
stream = await navigator.mediaDevices.getUserMedia({ audio: true });

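Use the browser's getUserMedia API to request permission to use the microphone and obtain an audio stream (a MediaStream object). getUserMedia returns a Promise, so it is awaited inside the async click handler; the Promise rejects if, for example, the user denies permission or no microphone is available.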
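getUserMedia rejects with a DOMException whose name identifies the cause, so the catch block can show a more specific message than a generic alert. The sketch below is a hypothetical helper, describeMicrophoneError. The error names are standard (Media Capture and Streams specification); the message wording is illustrative. Treating TypeError as "API unavailable" is a heuristic: on a non-secure page navigator.mediaDevices is undefined, so calling navigator.mediaDevices.getUserMedia throws a TypeError.

```javascript
// Hypothetical helper: maps getUserMedia failures to user-facing messages.
// Usage inside the sample's catch block:
//   catch (err) { return alert(describeMicrophoneError(err)); }
function describeMicrophoneError(err) {
  switch (err?.name) {
    case 'NotAllowedError': // user or browser denied permission
      return 'Microphone permission was denied. Allow microphone access in the browser settings and try again.';
    case 'NotFoundError': // no audio input device available
      return 'No microphone was found. Connect a microphone and try again.';
    case 'NotReadableError': // device exists but could not be opened
      return 'The microphone is in use by another application or could not be started.';
    case 'SecurityError': // user media support disabled for this document
      return 'Microphone access is blocked by the browser for this page.';
    case 'TypeError': // heuristic: usually navigator.mediaDevices is undefined
      return 'getUserMedia is unavailable. The page may not be served over HTTPS.';
    default:
      return 'Could not access the microphone: ' + (err?.message ?? String(err));
  }
}
```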

6) Connecting the Audio Stream to the Transcriber

javascript
transcriber.connect(stream);

Connect the acquired audio stream to the transcriber to start real-time voice recognition.
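When recognition is no longer needed, the microphone should be released so the browser can turn off its recording indicator. A minimal sketch using the standard MediaStream.getTracks() and MediaStreamTrack.stop() methods follows; stopMicrophone is a hypothetical name. This document does not describe how to end a transcriber session, so whether the transcriber also needs an explicit call is left to the API Reference.

```javascript
// Hypothetical helper: stops every track of a MediaStream obtained from
// getUserMedia, releasing the microphone. Returns the number of tracks stopped.
// Note: this does not close the transcriber; see the API Reference for that.
function stopMicrophone(stream) {
  if (!stream) {
    return 0; // nothing was captured
  }
  const tracks = stream.getTracks();
  for (const track of tracks) {
    track.stop(); // the track's readyState becomes 'ended'
  }
  return tracks.length;
}
```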

5. Troubleshooting

1) Microphone Access Permission

  • Browsers only allow microphone access (getUserMedia) in secure contexts, which in practice means pages served over HTTPS.
  • localhost is treated as a secure context, so plain HTTP works during local development, but the deployed page must be served over HTTPS.
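In a running page, window.isSecureContext tells you directly whether getUserMedia can be used. To check a deployment URL ahead of time, the rule can be approximated as in the sketch below (isLikelySecureOrigin is a hypothetical helper): HTTPS URLs and localhost or loopback addresses count as secure, and plain HTTP elsewhere does not. Browsers handle additional cases, so this is not a complete implementation of the secure-context rules.

```javascript
// Rough approximation of the browser's secure-context rule for a page URL.
// In a real page, window.isSecureContext is the authoritative check.
function isLikelySecureOrigin(pageUrl) {
  const url = new URL(pageUrl);
  if (url.protocol === 'https:') {
    return true;
  }
  const host = url.hostname;
  return (
    host === 'localhost' ||
    host.endsWith('.localhost') ||
    host === '[::1]' || // IPv6 loopback (URL keeps the brackets)
    /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(host) // IPv4 loopback range
  );
}
```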

2) Browser Compatibility

  • The Daglo API is optimized for the latest web browsers.
  • Internet Explorer does not support the audio capture API (getUserMedia), so use a recent version of Chrome, Firefox, Safari, Edge, or another modern browser.
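Rather than checking the browser name, you can feature-detect the capture API before enabling the Microphone ON button. A minimal sketch follows; supportsMicrophoneCapture is a hypothetical helper that takes the navigator object as a parameter so it can also be exercised outside a browser.

```javascript
// Hypothetical helper: true if the given navigator exposes
// mediaDevices.getUserMedia (the API the sample uses to capture audio).
function supportsMicrophoneCapture(nav) {
  return typeof nav?.mediaDevices?.getUserMedia === 'function';
}

// In the sample page, this could disable the button on unsupported browsers:
// if (!supportsMicrophoneCapture(navigator)) {
//   document.getElementById('enableButton').disabled = true;
// }
```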

3) Network Connection

  • Real-time speech recognition requires a stable network connection.
  • Speech recognition performance may degrade if the network connection is unstable.
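To warn users when the connection drops, a page can listen for the standard online and offline events on window. A minimal sketch follows; watchConnectivity is a hypothetical helper. These events only reflect whether the browser has network access at all, not whether the Daglo API server is reachable, and this document does not say how the transcriber itself reports network errors, so consult the API Reference for that.

```javascript
// Hypothetical helper: calls onChange(true/false) when the browser goes
// online/offline. Returns a function that removes both listeners.
// In a page: watchConnectivity(window, (online) => { if (!online) alert('Network connection lost.'); });
function watchConnectivity(target, onChange) {
  const handleOnline = () => onChange(true);
  const handleOffline = () => onChange(false);
  target.addEventListener('online', handleOnline);
  target.addEventListener('offline', handleOffline);
  return () => {
    target.removeEventListener('online', handleOnline);
    target.removeEventListener('offline', handleOffline);
  };
}
```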

6. Conclusion

This document has shown how to implement real-time speech recognition with the Daglo API library. Using the example code provided, you can add speech recognition to a web page with a few lines of JavaScript and use it in a variety of applications.

For more details, please refer to the API Reference and GitHub repository.