jacobkong<jacobkong@tencent.com>
clean code and add README

Read in Chinese

HunyuanVideo Latent Feature Extraction Tool

This project provides an efficient tool for extracting latent features from videos, preparing them for subsequent video generation and processing tasks.

Features

  • Support for various video formats and resolutions
  • Multi-GPU parallel processing for improved efficiency
  • Support for multiple aspect ratios
  • High-performance VAE model for feature extraction
  • Automatic skipping of already processed videos, supporting resume functionality

Usage

1. Configuration File

Input dataset Format

The input video metadata file (meta_file.list) is a list of JSON file paths. Each JSON file describes one video and contains the fields shown in the example further down.

The format of meta_file.list (e.g., ./assets/demo/i2v_lora/train_dataset/meta_file.list) is as follows:

    /path/to/0.json
    /path/to/1.json
    /path/to/2.json
    ...

The format of /path/to/0.json (e.g., ./assets/demo/i2v_lora/train_dataset/meta_data.json) is as follows:

    {
      "video_path": "/path/to/video.mp4",
      "raw_caption": {
        "long caption": "Detailed description text of the video"
      }
    }
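As an illustration, the two files above could be generated with a short script. This is a minimal sketch, not part of the project: the helper name `write_meta_files`, the numeric file names (0.json, 1.json, ...), and the one-path-per-line layout of meta_file.list are assumptions.

```python
import json
from pathlib import Path


def write_meta_files(videos, out_dir):
    """Write one JSON file per video plus a meta_file.list that indexes them.

    `videos` is a list of (video_path, caption) pairs. The helper name and
    the numeric file names are illustrative, not part of the project.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    json_paths = []
    for index, (video_path, caption) in enumerate(videos):
        # Same two fields as the example JSON above.
        record = {
            "video_path": str(video_path),
            "raw_caption": {"long caption": caption},
        }
        json_path = out_dir / f"{index}.json"
        json_path.write_text(
            json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        json_paths.append(json_path)

    # Assumption: meta_file.list holds one JSON path per line.
    list_path = out_dir / "meta_file.list"
    list_path.write_text(
        "\n".join(str(p) for p in json_paths) + "\n", encoding="utf-8"
    )
    return list_path
```

The returned path is the value to put in video_url_files in the configuration file.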

Configure parameters in hyvideo/hyvae_extract/vae.yaml:

    vae_path: "./ckpts/hunyuan-video-i2v-720p/vae"   # VAE model path
    video_url_files: "/path/to/meta_file.list"       # Video metadata file list
    output_base_dir: "/path/to/output/directory"     # Output directory
    sample_n_frames: 129                             # Number of frames to sample
    target_size:                                     # Target size
      - bucket_size
      - bucket_size
    enable_multi_aspect_ratio: True                  # Enable multiple aspect ratios
    use_stride: True                                 # Use stride sampling

Bucket Size Reference

The target_size parameter defines the resolution bucket size. Here are the recommended values for different quality levels:

    Quality    Bucket Size    Typical Resolution
    720p       960            1280×720 or similar
    540p       720            960×540 or similar
    360p       480            640×360 or similar

When enable_multi_aspect_ratio is set to True, the system will use these bucket sizes as a base to generate multiple aspect ratio buckets. For optimal performance, choose a bucket size that balances quality and memory usage based on your hardware capabilities.
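The recommended values are consistent with a simple rule: each bucket size is the side of a square with the same pixel area as the typical resolution (for example, 960 × 960 = 1280 × 720). The sketch below checks that arithmetic. The function name `bucket_size_for` is hypothetical, and whether the tool derives its buckets this way internally is an assumption.

```python
import math


def bucket_size_for(width, height):
    """Side length of a square with the same pixel area as width x height."""
    return math.isqrt(width * height)


TYPICAL_RESOLUTIONS = {
    "720p": (1280, 720),
    "540p": (960, 540),
    "360p": (640, 360),
}

for quality, (width, height) in TYPICAL_RESOLUTIONS.items():
    print(quality, bucket_size_for(width, height))
# 720p 960
# 540p 720
# 360p 480
```

The same calculation can give a starting point for a resolution that is not in the table, subject to the memory limits of your hardware.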

2. Run Extraction

    # Set environment variables
    export HOST_GPU_NUM=8  # Set the number of GPUs to use

    # Run extraction script
    cd HunyuanVideo-I2V
    bash hyvideo/hyvae_extract/start.sh

3. Single GPU Run

    cd HunyuanVideo-I2V
    export PYTHONPATH=${PYTHONPATH}:`pwd`
    export HOST_GPU_NUM=1
    CUDA_VISIBLE_DEVICES=0 python3 -u hyvideo/hyvae_extract/run.py --local_rank 0 --config 'hyvideo/hyvae_extract/vae.yaml'

Output Files

The program generates the following files in the specified output directory:

  1. {video_id}.npy - Latent feature array of the video
  2. json_path/{video_id}.json - JSON file containing video metadata, including:
    • video_id: Video ID
    • latent_shape: Shape of the latent features
    • video_path: Original video path
    • prompt: Video description/prompt
    • npy_save_path: Path where the latent features are saved

    output_base_dir/
    │
    ├── {video_id_1}.npy          # Latent feature array for video 1
    ├── {video_id_2}.npy          # Latent feature array for video 2
    ├── {video_id_3}.npy          # Latent feature array for video 3
    │   ...
    ├── {video_id_n}.npy          # Latent feature array for video n
    │
    └── json_path/                # Directory containing metadata JSON files
        ├── {video_id_1}.json     # Metadata for video 1
        ├── {video_id_2}.json     # Metadata for video 2
        ├── {video_id_3}.json     # Metadata for video 3
        │   ...
        └── {video_id_n}.json     # Metadata for video n
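
Given this layout, a downstream script can collect the finished videos by pairing each metadata file with its feature array. The following is a minimal sketch using only the standard library. The helper name `load_completed` is hypothetical, and treating a video as complete when both files exist is an assumption, not necessarily the check the tool uses when it skips processed videos.

```python
import json
from pathlib import Path


def load_completed(output_base_dir):
    """Map video_id -> metadata for each video with both a .npy and a .json file."""
    base = Path(output_base_dir)
    completed = {}
    for json_file in sorted((base / "json_path").glob("*.json")):
        video_id = json_file.stem
        # Keep only entries whose latent array is present next to json_path/.
        if (base / f"{video_id}.npy").exists():
            completed[video_id] = json.loads(
                json_file.read_text(encoding="utf-8")
            )
    return completed
```

Each returned entry carries the fields listed above, so the latent array can then be read with, for example, `numpy.load(meta["npy_save_path"])`.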

Advanced Configuration

Multiple Aspect Ratio Processing

When enable_multi_aspect_ratio is set to True, the system selects the target size whose aspect ratio is closest to that of the original video, instead of cropping every video to a single fixed size. This preserves more of the original frame content.
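
The selection step can be pictured as a nearest-ratio lookup. The sketch below is illustrative only: `closest_bucket` and the `BUCKETS` list are hypothetical, and the actual bucket list and distance measure used by the tool may differ.

```python
def closest_bucket(width, height, buckets):
    """Return the (w, h) bucket whose aspect ratio is nearest to the video's."""
    video_ratio = width / height
    return min(buckets, key=lambda wh: abs(wh[0] / wh[1] - video_ratio))


# Hypothetical buckets around a 960 base; the tool generates its own list.
BUCKETS = [(1280, 720), (1104, 832), (960, 960), (832, 1104), (720, 1280)]

print(closest_bucket(1920, 1080, BUCKETS))  # (1280, 720)
print(closest_bucket(1080, 1920, BUCKETS))  # (720, 1280)
print(closest_bucket(640, 480, BUCKETS))    # (1104, 832)
```

A landscape, portrait, or 4:3 source each lands in a bucket of matching shape, so only a small border needs to be cropped after resizing.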

Stride Sampling

When use_stride is set to True, the system automatically adjusts the sampling stride based on the video's frame rate:

  • When frame rate >= 50fps, stride is 2
  • When frame rate < 50fps, stride is 1
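
The stride rule above can be written as a short function. This is a minimal sketch: the names `sampling_stride` and `sample_frame_indices` are hypothetical, starting at frame 0 is an assumption, and the behavior for videos too short to supply sample_n_frames frames is not specified here (the sketch simply returns fewer indices).

```python
def sampling_stride(fps):
    """Stride rule from the list above: 2 at 50 fps or more, otherwise 1."""
    return 2 if fps >= 50 else 1


def sample_frame_indices(fps, total_frames, sample_n_frames=129, start=0):
    """Frame indices to read; start=0 is an assumption of this sketch."""
    stride = sampling_stride(fps)
    return list(range(start, total_frames, stride))[:sample_n_frames]


indices_60fps = sample_frame_indices(fps=60, total_frames=300)
indices_30fps = sample_frame_indices(fps=30, total_frames=300)
print(len(indices_60fps), indices_60fps[-1])  # 129 256
print(len(indices_30fps), indices_30fps[-1])  # 129 128
```

With a stride of 2, the 129 sampled frames span 257 source frames, so a 60 fps clip covers roughly the same duration as a 30 fps clip sampled at stride 1.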