Getting Started with TensorFlow.js – Real-Time Object Detection

Ever wondered how object detection works in web applications? With TensorFlow.js, you can leverage pre-trained models to build powerful machine learning applications directly in the browser. In this guide, I’ll walk you through creating a real-time object detection app using TensorFlow.js and the pre-trained Coco-SSD model. This project is beginner-friendly and perfect for exploring the potential of TensorFlow.js.

What are we building?

A web-based app that:

  • Accesses your webcam feed.
  • Uses a pre-trained object detection model (Coco-SSD).
  • Displays detected objects in real-time with bounding boxes and labels.

What do you need?

  • A modern web browser (e.g., Chrome, Edge).
  • Basic JavaScript knowledge.
  • A text editor (VS Code or similar). Opening the HTML file directly in the browser works in most cases, though a local web server is more reliable for camera access.

The Markup

Here’s the markup the code will live in. It needs only minimal styling, pulls in the TensorFlow.js and Coco-SSD libraries from a CDN, and finally loads the script.js file where the action lives.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>TensorFlow Object Detection</title>
    <style>
        body, html {
            margin: 0;
            padding: 0;
            height: 100%;
            overflow: hidden;
        }

        video, canvas {
            position: absolute;
            top: 0;
            left: 0;
        }
    </style>

    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/coco-ssd"></script>
    <script src="script.js"></script>

</head>
<body>
    <h1>TensorFlow Object Detection</h1>
</body>
</html>

The Script

Here’s the full script we’ll use for object detection. Let’s break it into sections to understand what each part does.

window.onload = async () => {
  // 1. Create and set up the video element
  const video = document.createElement('video');
  video.width = 640;
  video.height = 480;
  video.setAttribute('playsinline', ''); // needed for inline playback on iOS Safari
  document.body.appendChild(video);

  // 2. Create and set up the canvas element
  const canvas = document.createElement('canvas');
  canvas.width = 640;
  canvas.height = 480;
  document.body.appendChild(canvas);
  const ctx = canvas.getContext('2d');

  // 3. Access the webcam
  try {
    const stream = await navigator.mediaDevices.getUserMedia({ video: true });
    video.srcObject = stream;
    await video.play();
  } catch (error) {
    console.error('Error accessing the webcam:', error);
    return;
  }

  // 4. Load the pre-trained Coco-SSD model
  const model = await cocoSsd.load();
  console.log('Coco-SSD model loaded!');

  // 5. Define a function to draw predictions
  function drawPredictions(predictions) {
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    predictions.forEach((prediction) => {
      const [x, y, width, height] = prediction.bbox;
      ctx.strokeStyle = 'red';
      ctx.lineWidth = 2;
      ctx.strokeRect(x, y, width, height);
      ctx.font = '18px Arial';
      ctx.fillStyle = 'red';
      ctx.fillText(
        `${prediction.class} (${Math.round(prediction.score * 100)}%)`,
        x,
        y > 10 ? y - 5 : 10
      );
    });
  }

  // 6. Detect objects and draw predictions in a loop
  async function detectAndDraw() {
    const predictions = await model.detect(video);
    drawPredictions(predictions);
    requestAnimationFrame(detectAndDraw);
  }

  // Start the detection loop
  detectAndDraw();
};

The Breakdown

  1. Set Up the Video and Canvas Elements
    • The video element is used to display the webcam feed.
    • The canvas element acts as an overlay to draw bounding boxes and labels for detected objects. The ctx variable provides a 2D drawing context for the canvas.
  2. Access the Webcam
    • The navigator.mediaDevices.getUserMedia API requests access to the webcam. If successful, the webcam feed is set as the srcObject of the video element.
    • If access is denied or an error occurs, the error is logged to the console.
  3. Load the Coco-SSD Model
    • The cocoSsd.load() function loads the pre-trained object detection model. This model recognizes the 80 object classes of the COCO dataset, including people, cars, animals, and more.
  4. Draw Predictions
    • The drawPredictions function loops through each detected object and:
      • Draws a bounding box around the detected object.
      • Displays the object’s label and confidence score as text.
  5. Detect and Draw in Real-Time
    • The detectAndDraw function runs the model’s detect method on the video feed to get predictions.
    • It calls drawPredictions to update the canvas with the latest results.
    • The requestAnimationFrame method ensures the detection loop runs smoothly and continuously.
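Each prediction returned by model.detect has a simple shape: a class string, a score between 0 and 1, and a bbox array of [x, y, width, height]. Here’s a small sketch (with made-up values) of how drawPredictions builds its label text from one of these objects:

```javascript
// A hypothetical prediction object, shaped like what Coco-SSD returns.
const samplePrediction = {
  class: 'person',
  score: 0.87,                // confidence between 0 and 1
  bbox: [120, 45, 210, 380],  // [x, y, width, height] in pixels
};

// Build the label text the same way drawPredictions does.
function formatLabel(prediction) {
  return `${prediction.class} (${Math.round(prediction.score * 100)}%)`;
}

console.log(formatLabel(samplePrediction)); // → "person (87%)"
```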

What’s Happening?

This project combines TensorFlow.js’s machine learning capabilities with the browser’s native APIs for video and drawing. It’s a lightweight and powerful demonstration of AI in the browser, without requiring any server-side processing.

Building a real-time object detection app is a rewarding way to get started with TensorFlow.js. This breakdown helps you understand how all the pieces fit together, making it easier to expand or adapt for future projects.
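One easy adaptation is to filter the predictions before drawing them, for example keeping only certain classes above a confidence threshold. A minimal sketch (the class list and threshold here are arbitrary choices, not part of the original script):

```javascript
// Keep only the detections we care about before handing them to drawPredictions.
function filterPredictions(predictions, allowedClasses, minScore) {
  return predictions.filter(
    (p) => allowedClasses.includes(p.class) && p.score >= minScore
  );
}

// Hypothetical detections from one frame.
const predictions = [
  { class: 'person', score: 0.92, bbox: [0, 0, 100, 200] },
  { class: 'cup',    score: 0.40, bbox: [50, 50, 30, 40] },
  { class: 'dog',    score: 0.81, bbox: [200, 100, 120, 90] },
];

// Keep people and dogs detected with at least 60% confidence.
console.log(filterPredictions(predictions, ['person', 'dog'], 0.6));
// → keeps the 'person' and 'dog' entries only
```

Inside detectAndDraw you would call this between model.detect and drawPredictions.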

Further Reading

  1. TensorFlow.js Documentation
  2. Web APIs
