YYmicro<yymicro1044566658@outlook.com>
doc: update README

CUDA 12.8
NVIDIA L20
GCC 11.4.1

Build

    cmake -B build
    cmake --build build --parallel `nproc`

Run the benchmarks. The output of each one can be redirected to a log file:

    ./build/gpu-benchmark-naive > naive.log
    ./build/gpu-benchmark-cublas > cublas.log
    ./build/gpu-benchmark-hierarchical-tiling-1 > hierarchical-tiling.log
    ./build/gpu-benchmark-vecmemcpy-unrollmemcpy-2 > vecmemcpy-unrollmemcpy.log
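The four runs can also be scripted rather than typed by hand. The following is a minimal sketch, not part of the repository: `BENCHMARKS`, `log_name`, and `run_all` are hypothetical names introduced here. It assumes the binaries sit in `build/` and that each log is named by dropping the `gpu-benchmark-` prefix and any trailing `-<number>`, which matches the file names used in the commands above.

```python
import re
import subprocess
from pathlib import Path

# Assumption: binaries are placed in ./build by the CMake step.
BUILD_DIR = Path("build")

# Binary names copied from the run commands in this README.
BENCHMARKS = [
    "gpu-benchmark-naive",
    "gpu-benchmark-cublas",
    "gpu-benchmark-hierarchical-tiling-1",
    "gpu-benchmark-vecmemcpy-unrollmemcpy-2",
]


def log_name(binary: str) -> str:
    """Hypothetical helper: derive the log file name for a benchmark binary.

    Drops the 'gpu-benchmark-' prefix and a trailing '-<number>' suffix,
    e.g. 'gpu-benchmark-hierarchical-tiling-1' -> 'hierarchical-tiling.log'.
    """
    stem = binary.removeprefix("gpu-benchmark-")
    stem = re.sub(r"-\d+$", "", stem)
    return f"{stem}.log"


def run_all(build_dir: Path = BUILD_DIR) -> list[Path]:
    """Run every benchmark in order, writing its stdout to its own log file."""
    logs = []
    for binary in BENCHMARKS:
        log_path = Path(log_name(binary))
        with log_path.open("w") as log_file:
            # check=True raises CalledProcessError if a benchmark exits non-zero.
            subprocess.run([str(build_dir / binary)], stdout=log_file, check=True)
        logs.append(log_path)
    return logs
```

Calling `run_all()` from the repository root after the build step writes the four log files into the current directory and returns their paths.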

Then use Python to plot the comparison data. The resulting figure is saved to perf_comparison.png:

    conda create -n gemm python=3.11 -y
    conda activate gemm
    pip install -r requirements.txt
    python plot_benchmark.py a.log b.log c.log ...