Ever wondered how object detection works in web applications? With TensorFlow.js, you can leverage pre-trained models to build powerful machine learning applications directly in the browser. In this guide, I’ll walk you through creating a real-time object detection app using TensorFlow.js and the pre-trained Coco-SSD model. This project is beginner-friendly and perfect for exploring the potential of TensorFlow.js.
What are we building?
A web-based app that:
- Accesses your webcam feed.
- Uses a pre-trained object detection model (Coco-SSD).
- Displays detected objects in real-time with bounding boxes and labels.
What is needed?
- A modern web browser (e.g., Chrome, Edge).
- Basic JavaScript knowledge.
- A text editor (VS Code or similar) and a web server (or just open the HTML file locally).
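Browsers only expose the camera through navigator.mediaDevices.getUserMedia in secure contexts, so it can help to check up front whether the page you opened can reach the webcam at all. Here is a minimal sketch of such a check. The helper name checkCameraSupport and its return shape are my own assumptions for illustration, and the environment is passed in as a parameter so the function can also be exercised outside a browser.

```javascript
// Hypothetical helper (not part of the original project): reports whether
// the camera API looks reachable. `env` is normally `window`.
function checkCameraSupport(env) {
  if (!env.isSecureContext) {
    return {
      ok: false,
      reason: 'Page is not in a secure context (try HTTPS or localhost).',
    };
  }
  const devices = env.navigator && env.navigator.mediaDevices;
  if (!devices || typeof devices.getUserMedia !== 'function') {
    return {
      ok: false,
      reason: 'navigator.mediaDevices.getUserMedia is not available.',
    };
  }
  return { ok: true, reason: '' };
}

// In the page you would call it like this:
//   const support = checkCameraSupport(window);
//   if (!support.ok) console.warn(support.reason);
```

If the check fails when you open the file straight from disk, serving the folder from a local web server is the usual workaround.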
The Markup
Here’s the markup that hosts the code. It needs only minimal styling, loads TensorFlow.js and the Coco-SSD model from a CDN, and finally includes your script.js file, where the action lives.
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>TensorFlow Object Detection</title>
  <style>
    body, html {
      margin: 0;
      padding: 0;
      height: 100%;
      overflow: hidden;
    }
    canvas {
      position: absolute;
      left: 0;
    }
  </style>
  <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
  <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/coco-ssd"></script>
  <script src="script.js"></script>
</head>
<body>
  <h1>TensorFlow Object Detection</h1>
</body>
</html>
The script
Here’s the full script we’ll use for object detection. Let’s break it into sections to understand what each part does.
window.onload = async () => {
  // 1. Create and set up the video element
  const video = document.createElement('video');
  video.width = 640;
  video.height = 480;
  document.body.appendChild(video);

  // 2. Create and set up the canvas element
  const canvas = document.createElement('canvas');
  canvas.width = 640;
  canvas.height = 480;
  document.body.appendChild(canvas);
  const ctx = canvas.getContext('2d');

  // 3. Access the webcam
  try {
    const stream = await navigator.mediaDevices.getUserMedia({ video: true });
    video.srcObject = stream;
    await video.play();
  } catch (error) {
    console.error('Error accessing the webcam:', error);
    return;
  }

  // 4. Load the pre-trained Coco-SSD model
  const model = await cocoSsd.load();
  console.log('Coco-SSD model loaded!');

  // 5. Define a function to draw predictions
  function drawPredictions(predictions) {
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    predictions.forEach((prediction) => {
      const [x, y, width, height] = prediction.bbox;
      ctx.strokeStyle = 'red';
      ctx.lineWidth = 2;
      ctx.strokeRect(x, y, width, height);
      ctx.font = '18px Arial';
      ctx.fillStyle = 'red';
      ctx.fillText(
        `${prediction.class} (${Math.round(prediction.score * 100)}%)`,
        x,
        y > 10 ? y - 5 : 10
      );
    });
  }

  // 6. Detect objects and draw predictions in a loop
  async function detectAndDraw() {
    const predictions = await model.detect(video);
    drawPredictions(predictions);
    requestAnimationFrame(detectAndDraw);
  }

  // Start the detection loop
  detectAndDraw();
};
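Each prediction the script reads has a bbox array ([x, y, width, height]), a class name, and a score between 0 and 1. Once things are running, you may want to hide low-confidence boxes before drawing them. Here is a rough sketch of how that could look. The helper names filterByScore and formatLabel, the 0.6 threshold, and the sample data are my own assumptions for illustration; they are not part of the script above or of the Coco-SSD API.

```javascript
// Hypothetical helpers, not part of the original script.

// Keep only predictions at or above a confidence threshold (0..1).
function filterByScore(predictions, minScore = 0.6) {
  return predictions.filter((prediction) => prediction.score >= minScore);
}

// Build the same label text the script draws next to each box.
function formatLabel(prediction) {
  return `${prediction.class} (${Math.round(prediction.score * 100)}%)`;
}

// Made-up sample data in the shape the script reads from model.detect().
const sample = [
  { bbox: [10, 20, 100, 200], class: 'person', score: 0.87 },
  { bbox: [300, 40, 80, 60], class: 'cup', score: 0.52 },
];

const confident = filterByScore(sample, 0.6);
console.log(confident.map(formatLabel)); // only the "person (87%)" label remains
```

In the script, the equivalent change would be to call drawPredictions(filterByScore(predictions)) inside detectAndDraw.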
The Breakdown
- Set Up the Video and Canvas Elements
  - The video element is used to display the webcam feed.
  - The canvas element acts as an overlay to draw bounding boxes and labels for detected objects. The ctx variable provides a 2D drawing context for the canvas.
- Access the Webcam
  - The navigator.mediaDevices.getUserMedia API requests access to the webcam. If successful, the webcam feed is set as the srcObject of the video element.
  - If access is denied or an error occurs, the error is logged to the console and the script stops.
- Load the Coco-SSD Model
  - The cocoSsd.load() function loads the pre-trained object detection model. This model recognizes 80 object classes from the COCO dataset, including people, cars, animals, and more.
- Draw Predictions
  - The drawPredictions function clears the canvas, then loops through each detected object and:
    - Draws a bounding box around the detected object.
    - Displays the object’s label and confidence score as text.
- Detect and Draw in Real-Time
  - The detectAndDraw function runs the model’s detect method on the video feed to get predictions.
  - It calls drawPredictions to update the canvas with the latest results.
  - The requestAnimationFrame method schedules the next detection pass for the browser’s next repaint, so the loop runs continuously without blocking the page.
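The loop described above runs forever and has no error handling around the detect call, so a single failed detection would end it silently. Here is a minimal sketch of a variant that can be stopped and that keeps going after an error. The name createDetectionLoop and its options (detect, draw, schedule, onError) are hypothetical; they are injected so the sketch does not depend on a real model, canvas, or browser.

```javascript
// Hypothetical helper, not part of the original script: a detection loop
// that can be stopped and that reports errors instead of dying on them.
function createDetectionLoop({ detect, draw, schedule, onError = console.error }) {
  // A fresh token is created on every start(); stop() clears it.
  // A tick only continues while its own token is still the current one.
  let currentToken = null;

  async function tick(token) {
    if (token !== currentToken) return;
    try {
      const predictions = await detect();
      // stop() may have been called while detect() was pending.
      if (token === currentToken) draw(predictions);
    } catch (error) {
      onError(error);
    }
    if (token === currentToken) schedule(() => tick(token));
  }

  return {
    start() {
      if (currentToken) return; // already running
      currentToken = {};
      tick(currentToken);
    },
    stop() {
      currentToken = null;
    },
  };
}

// In the script from this post, it could be wired up like this:
//   const loop = createDetectionLoop({
//     detect: () => model.detect(video),
//     draw: drawPredictions,
//     schedule: (callback) => requestAnimationFrame(callback),
//   });
//   loop.start();
//   // later, e.g. from a button click: loop.stop();
```

Passing the scheduler in as a function also makes it easy to swap requestAnimationFrame for something slower, such as a setTimeout wrapper, if detection on every frame is too heavy for your machine.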
What’s Happening?
This project combines TensorFlow.js’s machine learning capabilities with the browser’s native APIs for video and drawing. It’s a lightweight and powerful demonstration of AI in the browser, without requiring any server-side processing.
Building a real-time object detection app is a rewarding way to get started with TensorFlow.js. This breakdown helps you understand how all the pieces fit together, making it easier to expand or adapt for future projects.
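As one small example of adapting the project, you could tally how many objects of each class appear in a frame and show that summary next to the video. Here is a quick sketch, assuming predictions in the same shape the script already uses. The helper name countByClass and the sample data are made up for illustration.

```javascript
// Hypothetical helper, not part of the original script: counts how many
// detections there are per class name in one batch of predictions.
function countByClass(predictions) {
  const counts = {};
  for (const prediction of predictions) {
    const label = prediction.class;
    counts[label] = (counts[label] || 0) + 1;
  }
  return counts;
}

// Made-up sample data in the shape returned by model.detect().
const frame = [
  { bbox: [0, 0, 50, 80], class: 'person', score: 0.91 },
  { bbox: [60, 10, 50, 80], class: 'person', score: 0.78 },
  { bbox: [200, 120, 40, 30], class: 'dog', score: 0.66 },
];

console.log(countByClass(frame)); // two "person" entries and one "dog"
```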
References and Resources
- TensorFlow.js Documentation
- Official TensorFlow.js site: https://www.tensorflow.org/js
- Coco-SSD model: https://github.com/tensorflow/tfjs-models/tree/master/coco-ssd
- Web APIs
- MediaDevices API: https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices
- Canvas API: https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API