
Amazon Nova is a generative AI service on AWS Bedrock that enables developers to build intelligent video analysis applications -- from content annotation and object detection to automated video summarization. This guide walks through Amazon Nova's multimodal capabilities with production-ready TypeScript examples for processing both local and S3-hosted videos, including bounding box object detection.
Amazon Nova is Amazon's generative AI service designed for multimodal content analysis, with robust capabilities for processing video data. Nova can:

- Analyze and summarize video content, whether stored locally or in S3
- Detect objects in video frames and return bounding box coordinates
- Generate annotations and descriptions of visual content

These capabilities make Nova a strong choice for applications requiring deep video analysis, content moderation, accessibility features, and automated content summarization.
Before diving into implementation, ensure you have:

- An AWS account with Bedrock access and the Nova models enabled in your region
- Node.js and TypeScript installed
- AWS credentials configured locally
Full-fledged TypeScript examples are available in the GitHub repository.
First, install the necessary dependencies for working with Nova:
Refer to the package.json below for the dependencies:
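The original package.json is not reproduced here; the following is a minimal sketch, assuming the AWS SDK v3 Bedrock Runtime client and `tsx` for running TypeScript directly (version ranges are illustrative, not pinned from the original):

```json
{
  "name": "nova-video-analysis",
  "type": "module",
  "dependencies": {
    "@aws-sdk/client-bedrock-runtime": "^3.0.0"
  },
  "devDependencies": {
    "@types/node": "^20.0.0",
    "tsx": "^4.0.0",
    "typescript": "^5.0.0"
  }
}
```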
You'll also need to set up your AWS credentials either via environment variables or AWS CLI configuration.
One of the most common use cases is analyzing videos stored in S3 buckets. A core difference between processing local videos and S3 videos is the size limit: local videos are capped at 25 MB, while S3 videos can be up to 1 GB. Here's an example snippet; only the relevant code is shown.
For development and testing, you might want to process videos stored locally. The core difference in the request schema is the "source" field: local videos use "bytes", while S3 videos use "s3Location" together with a "bucketOwner" field.
You should see a response similar to the one below, along with logs that are useful for debugging:
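The raw response body can be unpacked as follows; this is a sketch assuming the Nova invoke response nests the generated text under `output.message.content` and reports token counts under `usage` (field names are assumptions worth verifying against the actual response you receive):

```typescript
// Assumed shape of the decoded Nova invoke response.
interface NovaResponse {
  output: { message: { content: Array<{ text?: string }> } };
  usage?: { inputTokens: number; outputTokens: number };
}

// Parse the decoded body, log the result and token usage, and return both.
function extractResult(body: string): { text: string; usage?: NovaResponse["usage"] } {
  const parsed = JSON.parse(body) as NovaResponse;
  const text = parsed.output.message.content.map((c) => c.text ?? "").join("");
  console.log("Results:", text);
  if (parsed.usage) {
    console.log(`tokens in=${parsed.usage.inputTokens} out=${parsed.usage.outputTokens}`);
  }
  return { text, usage: parsed.usage };
}
```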
One of Nova's most powerful features is its ability to detect objects in videos and provide bounding box coordinates. This can be used for applications like content moderation, accessibility, or interactive video experiences.
The following example demonstrates how to craft a prompt for object detection and process the video with the Nova API using retry logic.
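The full example lives in the repository; below is a sketch of the two pieces named above. The detection prompt's JSON schema and its 0-1000 normalized coordinate convention are assumptions, not something Nova enforces, so the post-processing must match whatever convention you request. The retry wrapper is generic and can wrap any `client.send(...)` call:

```typescript
// A detection prompt that asks Nova for machine-readable bounding boxes.
// The output schema and 0-1000 coordinate grid are our own conventions.
const DETECTION_PROMPT = `Detect every person and vehicle in the video.
Return JSON only: an array of {"label": string, "timestampMs": number,
"box": [x1, y1, x2, y2]} with coordinates normalized to a 0-1000 grid.`;

// Retry an async API call a few times before giving up, doubling the
// delay between attempts.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```

Usage is a one-line wrap around the invocation, e.g. `await withRetry(() => client.send(command))`, which absorbs transient throttling errors without changing the calling code.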
Check the sample images below with bounding boxes detected:
Based on the examples and documentation, here are some best practices to follow when working with Nova:
Implement exponential backoff with jitter for API calls:
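A sketch of the "full jitter" variant described in AWS's backoff guidance: each retry sleeps a random duration in `[0, base * 2^attempt]`, capped, which spreads retries from many concurrent clients instead of synchronizing them. The defaults below are illustrative:

```typescript
// Full-jitter delay: random sleep in [0, min(cap, base * 2^attempt)).
function fullJitterDelay(attempt: number, baseMs = 200, capMs = 10_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * ceiling);
}

// Retry fn with full-jitter backoff, rethrowing the last error when the
// attempt budget is exhausted.
async function callWithBackoff<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, fullJitterDelay(attempt)));
    }
  }
}
```

Full jitter trades predictable delays for lower contention: two clients throttled at the same moment will almost never retry at the same moment.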
Nova's pricing is based on token usage. Monitor and log token consumption:
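A sketch of usage logging built on the token counts returned with each response. The per-1k-token rates below are placeholders, not Nova's actual prices; always look up current Bedrock pricing for your model and region:

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Estimate request cost from token usage. Rates are PLACEHOLDERS for
// illustration only; substitute the published Bedrock prices.
function estimateCostUsd(
  usage: TokenUsage,
  ratePer1kInput = 0.00006,  // placeholder rate
  ratePer1kOutput = 0.00024  // placeholder rate
): number {
  return (
    (usage.inputTokens / 1000) * ratePer1kInput +
    (usage.outputTokens / 1000) * ratePer1kOutput
  );
}

// Log consumption per request so costs can be aggregated downstream.
function logUsage(usage: TokenUsage): void {
  console.log(
    `tokens in=${usage.inputTokens} out=${usage.outputTokens} ` +
      `estCost=$${estimateCostUsd(usage).toFixed(6)}`
  );
}
```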
Amazon Nova's video analysis capabilities can be applied in various domains:

- Content moderation and policy enforcement
- Accessibility features, such as describing video content for visually impaired users
- Automated video summarization and annotation
- Interactive video experiences built on object detection
Amazon Nova represents a significant advancement in multimodal AI capabilities, particularly for video analysis. By providing developers with powerful tools to extract meaning from video content, Nova enables a wide range of applications that were previously challenging to implement.
As multimodal AI continues to evolve, services like Nova will become increasingly important for developers looking to build sophisticated applications that can understand and process visual content. By following the guidelines and best practices outlined in this article, you can effectively leverage Nova's capabilities to enhance your applications with intelligent video analysis.
Aaron is a senior software engineer and AI researcher specializing in generative AI, multimodal systems, and cloud-native AI infrastructure. He writes about cutting-edge AI developments, practical tutorials, and deep technical analysis at fp8.co.