This project provides an efficient tool for extracting latent features from videos, preparing them for subsequent video generation and processing tasks.
The input video metadata file (meta_file.list) is a list of JSON file paths, one per video. Each JSON file describes a single video using the fields shown further below.
The format of meta_file.list (e.g., ./assets/demo/i2v_lora/train_dataset/meta_file.list) is as follows:
/path/to/0.json
/path/to/1.json
/path/to/2.json
...
IMPORTANT: Make sure each video's video_id is unique!!!
The format of /path/to/0.json (e.g., ./assets/demo/i2v_lora/train_dataset/meta_data.json) is as follows:
{
    "video_path": "/path/to/video.mp4",
    "raw_caption": {
        "long caption": "Detailed description text of the video"
    }
}
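As a rough illustration of how these two files fit together, the sketch below writes one JSON file per video and then a meta_file.list that indexes them. The helper name `write_metadata`, the `0.json`, `1.json` naming scheme, and the one-path-per-line layout are assumptions made for this example; they are not part of the project.

```python
import json
from pathlib import Path


def write_metadata(videos, out_dir):
    """Write one JSON file per video plus a meta_file.list indexing them.

    `videos` is a list of (video_path, caption) pairs.
    Hypothetical helper: the name and file layout are assumptions.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    json_paths = []
    for index, (video_path, caption) in enumerate(videos):
        # Field names follow the JSON format documented above.
        record = {
            "video_path": str(video_path),
            "raw_caption": {"long caption": caption},
        }
        json_path = out_dir / f"{index}.json"
        json_path.write_text(
            json.dumps(record, indent=4, ensure_ascii=False), encoding="utf-8"
        )
        json_paths.append(json_path.resolve())

    # Assumption: meta_file.list holds one JSON path per line.
    list_path = out_dir / "meta_file.list"
    list_path.write_text(
        "\n".join(str(p) for p in json_paths) + "\n", encoding="utf-8"
    )
    return list_path
```

Calling `write_metadata([("/path/to/video.mp4", "Detailed description text of the video")], "./train_dataset")` would produce `./train_dataset/0.json` and `./train_dataset/meta_file.list`, which can then be referenced from `video_url_files` in the configuration.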
Configure parameters in hyvideo/hyvae_extract/vae.yaml:
vae_path: "./ckpts/hunyuan-video-i2v-720p/vae" # VAE model path
video_url_files: "/path/to/meta_file.list" # Video metadata file list
output_base_dir: "/path/to/output/directory" # Output directory
sample_n_frames: 129 # Number of frames to sample
target_size: # Target size
  - bucket_size
  - bucket_size
enable_multi_aspect_ratio: True # Enable multiple aspect ratios
use_stride: True # Use stride sampling
The target_size parameter defines the resolution bucket size. Here are the recommended values for different quality levels:
| Quality | Bucket Size | Typical Resolution |
|---|---|---|
| 720p | 960 | 1280×720 or similar |
| 540p | 720 | 960×540 or similar |
| 360p | 480 | 640×360 or similar |
When enable_multi_aspect_ratio is set to True, the system will use these bucket sizes as a base to generate multiple aspect ratio buckets. For optimal performance, choose a bucket size that balances quality and memory usage based on your hardware capabilities.
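To make the mapping from quality level to `target_size` concrete, here is a minimal sketch that looks up the bucket size from the table above and returns the two-element list expected in the configuration. The function name `target_size_for` is hypothetical and only serves this example.

```python
# Bucket sizes copied from the table above.
BUCKET_SIZES = {"720p": 960, "540p": 720, "360p": 480}


def target_size_for(quality):
    """Return the [bucket_size, bucket_size] pair for a quality level.

    Hypothetical helper, not part of the project.
    """
    if quality not in BUCKET_SIZES:
        known = ", ".join(sorted(BUCKET_SIZES))
        raise ValueError(f"unknown quality {quality!r}; expected one of: {known}")
    bucket = BUCKET_SIZES[quality]
    return [bucket, bucket]
```

For example, `target_size_for("540p")` yields the pair of values to place under `target_size` when extracting latents for 540p training.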
# Set environment variables
export HOST_GPU_NUM=8 # Set the number of GPUs to use
# Run extraction script
cd HunyuanVideo-I2V
bash hyvideo/hyvae_extract/start.sh
cd HunyuanVideo-I2V
export PYTHONPATH=${PYTHONPATH}:`pwd`
export HOST_GPU_NUM=1
CUDA_VISIBLE_DEVICES=0 python3 -u hyvideo/hyvae_extract/run.py --local_rank 0 --config 'hyvideo/hyvae_extract/vae.yaml'
The program generates the following files in the specified output directory:
{video_id}.npy - Latent feature array of the video
json_path/{video_id}.json - JSON file containing video metadata
output_base_dir/
│
├── {video_id_1}.npy          # Latent feature array for video 1
├── {video_id_2}.npy          # Latent feature array for video 2
├── {video_id_3}.npy          # Latent feature array for video 3
│   ...
├── {video_id_n}.npy          # Latent feature array for video n
│
└── json_path/                # Directory containing metadata JSON files
    │
    ├── {video_id_1}.json     # Metadata for video 1
    ├── {video_id_2}.json     # Metadata for video 2
    ├── {video_id_3}.json     # Metadata for video 3
    │   ...
    └── {video_id_n}.json     # Metadata for video n
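After a run, it can be useful to confirm that every latent file has a matching metadata file. The sketch below does this with the standard library only, assuming the directory layout described above; the function name `check_outputs` is hypothetical.

```python
from pathlib import Path


def check_outputs(output_base_dir):
    """Compare {video_id}.npy files against json_path/{video_id}.json files.

    Returns three sorted lists of video ids:
    (paired, npy_without_json, json_without_npy).
    Hypothetical helper, not part of the project.
    """
    base = Path(output_base_dir)
    npy_ids = {p.stem for p in base.glob("*.npy")}
    json_ids = {p.stem for p in (base / "json_path").glob("*.json")}
    return (
        sorted(npy_ids & json_ids),
        sorted(npy_ids - json_ids),
        sorted(json_ids - npy_ids),
    )
```

Ids in the second or third list point to videos whose extraction likely did not complete. The paired `.npy` files can then be read with `numpy.load` for downstream training.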
When enable_multi_aspect_ratio is set to True, the system selects the target size closest to the original aspect ratio of the video, rather than forcing it to be cropped to a fixed size. This is useful for maintaining the integrity of the video content.
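The idea of picking the bucket closest to a video's aspect ratio can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function name `closest_bucket`, the example bucket list, and the use of a log-ratio distance are all choices made for this example, and the actual set of buckets generated by the tool is not shown in this document.

```python
import math


def closest_bucket(width, height, buckets):
    """Pick the (w, h) bucket whose aspect ratio is nearest to width / height.

    Distance is measured on log aspect ratios so that, for example,
    2:1 and 1:2 are treated as equally far from 1:1.
    Hypothetical helper, not part of the project.
    """
    if not buckets:
        raise ValueError("buckets must not be empty")
    log_ratio = math.log(width / height)
    return min(buckets, key=lambda wh: abs(math.log(wh[0] / wh[1]) - log_ratio))


# Illustrative bucket list only; these values are assumptions.
EXAMPLE_BUCKETS = [(960, 960), (1280, 720), (720, 1280)]
```

With this list, a 1920×1080 landscape video maps to `(1280, 720)` and a 1080×1920 portrait video maps to `(720, 1280)`, so neither needs to be cropped to a square.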
When use_stride is set to True, the system automatically adjusts the sampling stride based on the video's frame rate: