HunyuanVideo Latent Feature Extraction Tool

This project provides an efficient tool for extracting latent features from videos, preparing them for subsequent video generation and processing tasks.

Features

Support for various video formats and resolutions
Multi-GPU parallel processing for improved efficiency
Support for multiple aspect ratios
High-performance VAE model for feature extraction
Automatic skipping of already processed videos, supporting resume functionality

Usage

1. Configuration File

Input Dataset Format

The input metadata file (meta_file.list) is a plain-text list of JSON file paths, one path per line. Each JSON file describes a single video.

The format of meta_file.list (e.g., ./assets/demo/i2v_lora/train_dataset/meta_file.list) is as follows:


/path/to/0.json
/path/to/1.json
/path/to/2.json
...

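IMPORTANT: Make sure each video's video_id is unique. Output files are named after video_id ({video_id}.npy), so duplicate IDs would point to the same output file.

The format of /path/to/0.json (e.g., ./assets/demo/i2v_lora/train_dataset/meta_data.json) is as follows: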


{
  "video_path": "/path/to/video.mp4",
  "raw_caption": {
    "long caption": "Detailed description text of the video"
  }
}
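As a rough sketch of how these two files could be produced for a dataset, the snippet below writes one JSON file per video and a meta_file.list that indexes them. The helper name write_metadata and the numeric file names (0.json, 1.json, ...) are assumptions made for illustration; only the field names come from the format above.

```python
import json
from pathlib import Path


def write_metadata(entries, out_dir):
    """Write one JSON file per video plus a meta_file.list that indexes them.

    `entries` is a list of (video_path, caption) pairs. This helper and the
    numeric file names are hypothetical, not part of the project itself.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    json_paths = []
    for index, (video_path, caption) in enumerate(entries):
        # Field names follow the JSON format documented above.
        record = {
            "video_path": str(video_path),
            "raw_caption": {"long caption": caption},
        }
        json_path = out_dir / f"{index}.json"
        json_path.write_text(
            json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        json_paths.append(json_path.resolve())

    # meta_file.list holds one JSON path per line.
    list_path = out_dir / "meta_file.list"
    list_path.write_text("".join(f"{p}\n" for p in json_paths), encoding="utf-8")
    return list_path


# Example usage (paths are placeholders):
# write_metadata([("/path/to/video.mp4", "Detailed description text of the video")],
#                "./my_dataset")
```

The returned path is what video_url_files in the configuration should point to.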

Configure parameters in hyvideo/hyvae_extract/vae.yaml:


vae_path: "./ckpts/hunyuan-video-i2v-720p/vae" # VAE model path
video_url_files: "/path/to/meta_file.list"     # Video metadata file list
output_base_dir: "/path/to/output/directory"   # Output directory
sample_n_frames: 129                           # Number of frames to sample
target_size:                                   # Target size
  - bucket_size
  - bucket_size
enable_multi_aspect_ratio: True                # Enable multiple aspect ratios
use_stride: True                               # Use stride sampling

Bucket Size Reference

The target_size parameter defines the resolution bucket size. Here are the recommended values for different quality levels:

Quality    Bucket Size    Typical Resolution
720p       960            1280×720 or similar
540p       720            960×540 or similar
360p       480            640×360 or similar

When enable_multi_aspect_ratio is set to True, the system uses the bucket size as a base to generate buckets for multiple aspect ratios. Larger bucket sizes preserve more detail but require more GPU memory, so choose the largest size your hardware can handle comfortably.

2. Run Extraction


# Set environment variables
export HOST_GPU_NUM=8  # Set the number of GPUs to use

# Run extraction script
cd HunyuanVideo-I2V
bash hyvideo/hyvae_extract/start.sh

3. Single GPU Run


cd HunyuanVideo-I2V
export PYTHONPATH=${PYTHONPATH}:`pwd`
export HOST_GPU_NUM=1
CUDA_VISIBLE_DEVICES=0 python3 -u hyvideo/hyvae_extract/run.py --local_rank 0 --config 'hyvideo/hyvae_extract/vae.yaml'

Output Files

The program generates the following files in the specified output directory:

{video_id}.npy - Latent feature array of the video
json_path/{video_id}.json - JSON file containing video metadata, including:
- video_id: Video ID
- latent_shape: Shape of the latent features
- video_path: Original video path
- prompt: Video description/prompt
- npy_save_path: Path where the latent features are saved


output_base_dir/
│
├── {video_id_1}.npy          # Latent feature array for video 1
├── {video_id_2}.npy          # Latent feature array for video 2
├── {video_id_3}.npy          # Latent feature array for video 3
│   ...
├── {video_id_n}.npy          # Latent feature array for video n
│
└── json_path/                # Directory containing metadata JSON files
    ├── {video_id_1}.json     # Metadata for video 1
    ├── {video_id_2}.json     # Metadata for video 2
    ├── {video_id_3}.json     # Metadata for video 3
    │   ...
    └── {video_id_n}.json     # Metadata for video n
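To consume these outputs downstream, something like the following sketch could iterate over the metadata files and load the matching latent arrays. The function name load_latents is hypothetical, and the code assumes the directory layout shown above (each {video_id}.npy sitting directly in output_base_dir) rather than relying on npy_save_path, which may not be valid on another machine.

```python
import json
from pathlib import Path

import numpy as np


def load_latents(output_base_dir):
    """Yield (metadata, latent) pairs from an extraction output directory."""
    output_base_dir = Path(output_base_dir)
    for meta_file in sorted((output_base_dir / "json_path").glob("*.json")):
        meta = json.loads(meta_file.read_text(encoding="utf-8"))
        # Assumption: the latent file is named after video_id and lives
        # directly under output_base_dir, as in the tree above.
        latent = np.load(output_base_dir / f"{meta['video_id']}.npy")
        yield meta, latent


# Example usage (path is a placeholder):
# for meta, latent in load_latents("/path/to/output/directory"):
#     print(meta["video_id"], latent.shape)
```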

Advanced Configuration

Multiple Aspect Ratio Processing

When enable_multi_aspect_ratio is set to True, the system selects the target size closest to the original aspect ratio of the video, rather than forcing it to be cropped to a fixed size. This is useful for maintaining the integrity of the video content.
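The selection step can be pictured with a minimal sketch like the one below. It is not the project's actual code: the bucket list is made up for illustration, and using the absolute difference between width/height ratios as the distance measure is an assumption.

```python
def closest_bucket(width, height, buckets):
    """Return the (bucket_width, bucket_height) whose aspect ratio is
    nearest to that of a width x height video."""
    video_ratio = width / height
    return min(buckets, key=lambda wh: abs(wh[0] / wh[1] - video_ratio))


# Hypothetical buckets; the real ones are generated by the tool from target_size.
example_buckets = [(960, 960), (1280, 720), (720, 1280), (1104, 832), (832, 1104)]

print(closest_bucket(1920, 1080, example_buckets))  # (1280, 720)
print(closest_bucket(1080, 1920, example_buckets))  # (720, 1280)
```

A 16:9 source maps to the 16:9 bucket and a vertical 9:16 source to the 9:16 bucket, so neither has to be cropped to a square.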

Stride Sampling

When use_stride is set to True, the system automatically adjusts the sampling stride based on the video's frame rate:

When frame rate >= 50fps, stride is 2
When frame rate < 50fps, stride is 1
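The rule above amounts to a one-line decision, sketched here together with the number of source frames that a 129-frame sample would span. The function names are hypothetical, and the behavior with use_stride disabled (stride 1) is an assumption, since the text only describes the enabled case.

```python
def sampling_stride(fps, use_stride=True):
    """Return the frame stride for a video with the given frame rate."""
    if not use_stride:
        # Assumption: with use_stride disabled, every frame is taken.
        return 1
    return 2 if fps >= 50 else 1


def frames_spanned(fps, sample_n_frames=129, use_stride=True):
    """Number of source frames covered when sampling sample_n_frames frames."""
    stride = sampling_stride(fps, use_stride)
    return (sample_n_frames - 1) * stride + 1


print(sampling_stride(60), frames_spanned(60))  # 2 257
print(sampling_stride(30), frames_spanned(30))  # 1 129
```

Under these assumptions, a 60 fps clip needs at least 257 source frames to yield 129 samples, while a 30 fps clip needs 129.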
