CUDA 12.8
NVIDIA L20
GCC 11.4.1
Build
cmake -B build
cmake --build build --parallel `nproc`
Run the benchmarks. The output can be redirected to log files:
./build/gpu-benchmark-naive > naive.log
./build/gpu-benchmark-cublas > cublas.log
./build/gpu-benchmark-hierarchical-tiling-1 > hierarchical-tiling.log
./build/gpu-benchmark-vecmemcpy-unrollmemcpy-2 > vecmemcpy-unrollmemcpy.log
Then plot the results with Python. The comparison chart is saved to perf_comparison.png:
conda create -n gemm python=3.11 -y
conda activate gemm
pip install -r requirements.txt
python plot_benchmark.py a.log b.log c.log ...
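
The run and plot steps above can be chained in a small wrapper script. The following is only a sketch and is not part of the repository: the helper `log_name` is hypothetical, and the script assumes that the four binaries listed above exist under ./build and that plot_benchmark.py takes the log files as positional arguments, as in the command above. It should be run inside the activated gemm environment.

```shell
#!/bin/sh
# Sketch only: run each benchmark, collect its log, then plot all logs.
set -e

# Hypothetical helper: drop a trailing "-<digit>" from the benchmark name so
# the log names match the ones used above
# (e.g. hierarchical-tiling-1 -> hierarchical-tiling.log).
log_name() {
  printf '%s.log\n' "${1%-[0-9]}"
}

# Collect the produced log files in the positional parameters.
set --
for name in naive cublas hierarchical-tiling-1 vecmemcpy-unrollmemcpy-2; do
  bin="./build/gpu-benchmark-${name}"
  if [ ! -x "$bin" ]; then
    # Skip binaries that have not been built yet.
    echo "skipping ${bin}: not built" >&2
    continue
  fi
  log="$(log_name "$name")"
  "$bin" > "$log"
  set -- "$@" "$log"
done

# Plot only if at least one benchmark produced a log.
if [ "$#" -gt 0 ]; then
  python plot_benchmark.py "$@"
fi
```

Binaries that are missing are skipped with a warning, so the script can also be used after building only a subset of the benchmarks.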