This document records test commands and IDE run-configuration settings for the repository. Keep it in sync when adding test subpackages, pytest options, or benchmark output helpers.
For installed-package user examples, including Clarabel on gw_maxcut from a
downstream project, see docs/convex-benchmark-usage.md.
For native wheel variant behavior and release CI checks, see
docs/native-wheel-variants.md and docs/release-ci.md.
The C++ cone benchmark compares real backends:
uv run pytest -q .\tests\benchmark-cpp\test_cpp_operator_benchmarks.py --tb=short
The cross-backend report compares cpp paths for NumPy, Torch CPU, Torch CUDA, and CuPy CUDA inputs against the matching Python backend implementation:
uv run pytest -q .\tests\benchmark-cpp\test_cpp_backend_interop_benchmarks.py --tb=short
Run both benchmark reports together with:
uv run pytest -q .\tests\benchmark-cpp --tb=short
The test prints a C++ cone backend comparisons table in pytest terminal
summary output. The table includes cone kind, operation, rows, sample type,
allocation mode, cpp input backend, Python/backend baseline, median cpp time,
median Python/backend time, ratio, status, and documented reason when cpp is
not expected to beat the baseline yet.
Benchmark status values:
- fast: cpp median time is at least 5% faster than host.
- parity: cpp is within the configured parity band, currently 10% relative
  or 0.002 ms absolute for tiny helpers where timer resolution dominates.
- documented: cpp is slower than the parity band and the test includes a
  concrete implementation reason that must be visible in the table.

If native C++ sources or bindings changed, rebuild the editable package in the Visual Studio Developer Shell before running the benchmark:
$vsDevShell = 'A:\Visual Studio\18\Enterprise\Common7\Tools\Launch-VsDevShell.ps1'
& $vsDevShell -Arch amd64 -HostArch amd64 -SkipAutomaticLocation
uv sync --reinstall-package optfuncs --group torch-cu130 --group cupy-cu13 --extra convex --dev
uv run pytest -q .\tests\benchmark-cpp\test_cpp_operator_benchmarks.py --tb=short
-q is safe for this benchmark; the comparison table is emitted through
pytest_terminal_summary, not ordinary print output.
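
The benchmark status values described earlier (fast, parity, documented) can be sketched as a small classifier. This is a minimal sketch under stated assumptions: the function name, the parameter names, the order of the checks, and the choice to raise when a documented row has no reason are all illustrative. Only the 5%, 10%, and 0.002 ms thresholds come from this document.

```python
from typing import Optional


def classify_status(
    cpp_ms: float,
    host_ms: float,
    reason: Optional[str] = None,
    fast_margin: float = 0.05,    # "at least 5% faster than host"
    parity_rel: float = 0.10,     # 10% relative parity band
    parity_abs_ms: float = 0.002, # absolute band for tiny helpers
) -> str:
    """Map a pair of median timings to fast, parity, or documented."""
    # Assumption: the fast check runs first, so a clear win is never
    # downgraded to parity by the absolute band.
    if cpp_ms <= host_ms * (1.0 - fast_margin):
        return "fast"
    # Inside the band if either the relative or the absolute limit holds.
    if cpp_ms - host_ms <= max(parity_rel * host_ms, parity_abs_ms):
        return "parity"
    # Slower than the band: a concrete reason must accompany the row.
    if not reason:
        raise ValueError("documented rows need a concrete implementation reason")
    return "documented"
```

For example, classify_status(0.003, 0.002) lands in parity because the 0.001 ms gap is below the 0.002 ms absolute band, even though it is 50% slower in relative terms.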
Stage 3 benchmarks should measure allocator, adapter, wrapper, kernel, and solver-loop costs separately. The goal is not to make every fresh-return helper beat NumPy, Torch, or CuPy allocation. The goal is to make performance APIs use backend-native allocation, caller-provided outputs, workspace reuse, and batch/fused kernels.
Planned benchmark groups:
| Group | What it measures | Expected result |
|---|---|---|
| alloc only | Backend-native empty/zero allocation with no cone math. | Fresh-return helpers may be parity/documented. |
| fill only | Zero-fill or backend fill on caller-owned buffers. | Native/backend path should match or beat host. |
| one write only | Single scalar write into existing output. | Native/backend path should beat fresh allocation. |
| wrapper overhead | Python dispatch into adapter/native and return. | Must be reported separately from kernel time. |
| full API | User-facing unit_vector, project, or scaling allocation path. | Convenience API targets backend-native parity. |
| out API | unit_vector_into, project_into, and scaling into output. | Must preserve output pointer identity and beat full API. |
| workspace | Repeated calls using preallocated scratch buffers. | Must reduce allocations in solver-loop workloads. |
| fused | Combined operations such as project plus violation. | Must beat equivalent repeated scalar calls. |
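
The "preserve output pointer identity" requirement for the out API row can be checked without any native backend. The sketch below uses the standard-library array type as a stand-in buffer. The two functions are hypothetical pure-Python stand-ins that only share their names with the real unit_vector and unit_vector_into, and the standard-basis-vector semantics are an assumption made for illustration. The sketch shows the contract only; its timings would say nothing about the native paths.

```python
from array import array


def unit_vector(n, index=0):
    """Fresh-return path (stand-in): allocates a new buffer on every call."""
    out = array("d", [0.0]) * n
    out[index] = 1.0
    return out


def unit_vector_into(out, index=0):
    """Out path (stand-in): zero-fills the caller-owned buffer, writes one
    entry, and returns the same object."""
    for i in range(len(out)):
        out[i] = 0.0
    out[index] = 1.0
    return out


buf = array("d", [0.0]) * 256
address_before = buf.buffer_info()[0]  # address of the underlying memory
result = unit_vector_into(buf, index=3)

# The out API must hand back the same object backed by the same memory.
assert result is buf
assert buf.buffer_info()[0] == address_before

# The fresh-return path produces a different object on every call.
assert unit_vector(256, index=3) is not unit_vector(256, index=3)
```

A real benchmark would make the equivalent check with the backend's own pointer accessor, for example a NumPy array's ctypes.data or a Torch tensor's data_ptr().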
Recommended future command shape:
uv run pytest -q .\tests\benchmark-cpp --tb=short
When Stage 3-specific files are added, keep them under tests/benchmark-cpp
and reuse pytest terminal summary output. Each row should include backend, cone,
operation, dimension, allocation mode, memory state, median time, ratio, status,
and documented reason when native cannot win.
CUDA benchmark rules:
- Avoid .item(), host logging, implicit default-stream synchronization, and
  accidental host-device transfers inside timed regions.

Stage 3 acceptance:
- unit_vector() fresh-return may be parity or documented.
- unit_vector_into, batch/fused helpers, project/violation/scaling kernels,
  and workspace solver-loop paths should beat Stage 2 or the fair host baseline.

Use the repository .venv created by uv sync as the project interpreter:
B:\FUNQITANG\optimization-test\.venv\Scripts\python.exe
Create a Python tests > pytest run configuration:
| Field | Value |
|---|---|
| Target | tests\benchmark-cpp\test_cpp_operator_benchmarks.py |
| Working directory | B:\FUNQITANG\optimization-test |
| Additional arguments | -q --tb=short |
| Environment variables | PYTHONUTF8=1 |
Run the configuration from PyCharm's Run tool window. The benchmark table appears after the pytest warnings summary. If PyCharm does not show the table, remove any setting that hides pytest terminal summary output.
When native C++ changed, run the VS DevShell uv sync --reinstall-package
command above from a terminal first, then run the PyCharm pytest configuration.
For Stage 3 benchmark development, duplicate this configuration and change the
target to tests\benchmark-cpp so new adapter, workspace, batch, and fused
benchmark files run together and still emit one terminal summary.
Use CLion's CMake support for navigation and an external run configuration for pytest.
For native code indexing, open the repository root and configure CMake with:
cmake -S . -B build\clion-check2 -G "Visual Studio 18 2026" -A x64 -DCMAKE_CUDA_ARCHITECTURES=75
Create an External Tool or Shell Script run configuration:
| Field | Value |
|---|---|
| Program | C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe |
| Arguments | -NoProfile -ExecutionPolicy Bypass -Command "uv run pytest -q .\tests\benchmark-cpp\test_cpp_operator_benchmarks.py --tb=short" |
| Working directory | B:\FUNQITANG\optimization-test |
For a rebuild-and-test configuration, use this PowerShell command in the same run configuration:
$vsDevShell = 'A:\Visual Studio\18\Enterprise\Common7\Tools\Launch-VsDevShell.ps1'; & $vsDevShell -Arch amd64 -HostArch amd64 -SkipAutomaticLocation; uv sync --reinstall-package optfuncs --group torch-cu130 --group cupy-cu13 --extra convex --dev; uv run pytest -q .\tests\benchmark-cpp\test_cpp_operator_benchmarks.py --tb=short
The rebuild-and-test command is slower but guarantees CLion is running against the extension produced from current native sources.
For Stage 3 native work, add CMake variants for allocator and library policy
experiments instead of editing source-level defaults. Useful configure flags are
-DOPTFUNC_BLAS=AUTO|OFF|SYSTEM|OPENBLAS,
-DOPTFUNC_USE_CUBLAS=ON|OFF, -DOPTFUNC_NATIVE_CPU_TUNE=ON, and
-DOPTFUNC_MSVC_ARCH=AVX2.
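
One way to keep such variants reproducible is to generate the configure commands from a small table instead of typing them by hand. The sketch below is a hypothetical helper: the build directory names are invented for illustration, while the generator, architecture, and -D flags are the ones named in this document.

```python
def configure_command(build_dir, **options):
    """Build the cmake configure argument list for one variant."""
    args = ["cmake", "-S", ".", "-B", build_dir,
            "-G", "Visual Studio 18 2026", "-A", "x64"]
    args += [f"-D{name}={value}" for name, value in sorted(options.items())]
    return args


def as_command_line(args):
    """Join arguments for display, quoting any that contain spaces."""
    return " ".join(f'"{a}"' if " " in a else a for a in args)


# Build directory names are hypothetical; adjust them to local conventions.
VARIANTS = {
    r"build\variant-blas-off": {"OPTFUNC_BLAS": "OFF"},
    r"build\variant-openblas": {"OPTFUNC_BLAS": "OPENBLAS", "OPTFUNC_USE_CUBLAS": "ON"},
    r"build\variant-avx2": {"OPTFUNC_NATIVE_CPU_TUNE": "ON", "OPTFUNC_MSVC_ARCH": "AVX2"},
}

for build_dir, options in VARIANTS.items():
    print(as_command_line(configure_command(build_dir, **options)))
```

Each printed line can be pasted into a terminal or into a CLion CMake profile, which keeps policy experiments out of the source-level defaults.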
The cpp backend must not call NumPy functions from C++ or hide a NumPy fast
path behind native benchmark results. It can use native techniques that NumPy
also relies on internally:
- CPU-specific tuning with OPTFUNC_NATIVE_CPU_TUNE=ON and, on MSVC,
  OPTFUNC_MSVC_ARCH=AVX2 or another supported /arch value.

cmake -S . -B build\clion-blas -G "Visual Studio 18 2026" -A x64 -DOPTFUNC_BLAS=OPENBLAS
Use -DOPTFUNC_BLAS=AUTO for the default non-fatal probe, OFF to disable
host BLAS, SYSTEM to require CMake's default BLAS, or OPENBLAS to require
OpenBLAS. CUDA dense helpers can link cuBLAS through
-DOPTFUNC_USE_CUBLAS=ON; turn it off only when comparing custom CUDA kernels
against vendor libraries.
When a C++ path still cannot beat the host backend, the benchmark must print a
documented row with the concrete reason. Current expected reasons are
allocation-dominated tiny helpers, absence of a requested native library, or a
portable fallback such as a PSD eigensolver path that has not yet been replaced
with BLAS/LAPACK/cuSolver.
Run cone operator correctness and backend parity:
uv run pytest -q tests\test_cone_operators.py --tb=short
Run the standalone C++ cone subpackage embedding checks used to protect downstream conic solver imports:
uv run pytest -q tests\test_cpp_package_embed.py --tb=short
This test blocks SciPy, CVXPY, Torch, and CuPy in subprocesses, changes to a
non-repository working directory, and verifies that optfunc.cvxs.cones.cpp
can still be imported and used with NumPy arrays. It also builds a wheel and
installs it with pip install --no-deps --target semantics to catch missing
native extension or stub files. It now builds the plain optfuncs wheel as
CPU-only and verifies has_cuda() is false without the addon. If the active uv
environment has no pip module, the test uses uv pip install --no-deps --target as the equivalent installer path.
When nvcc is available, the same file also builds the optfuncs-cu130 addon
wheel and verifies that it contains optfuncs_cuda130/_cpp_cuda* without
owning any optfunc/ package files.
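
The import-blocking part of this check can be approximated with a meta-path finder installed in a child interpreter. The following is a minimal sketch of the technique, not the code in tests\test_cpp_package_embed.py: the helper name, the child script, and the use of a temporary directory as the non-repository working directory are assumptions made for illustration.

```python
import subprocess
import sys
import tempfile

BLOCKED = ("scipy", "cvxpy", "torch", "cupy")

# Child script: argv[1] is the module to import, argv[2:] are blocked packages.
CHILD = r'''
import importlib, sys

target, blocked = sys.argv[1], set(sys.argv[2:])

class Blocker:
    """Meta-path finder that refuses blocked top-level packages."""
    def find_spec(self, fullname, path=None, target=None):
        if fullname.partition(".")[0] in blocked:
            raise ImportError(fullname + " is blocked for this check")
        return None  # defer to the normal finders

sys.meta_path.insert(0, Blocker())
importlib.import_module(target)
print("imported", target)
'''


def import_in_isolation(target, blocked=BLOCKED):
    """Import target in a fresh interpreter, outside the repository."""
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(
            [sys.executable, "-c", CHILD, target, *blocked],
            cwd=workdir,  # a source checkout on disk cannot shadow the install
            capture_output=True,
            text=True,
        )
```

Calling import_in_isolation("optfunc.cvxs.cones.cpp") and asserting that returncode is 0 would then fail if the subpackage imported any blocked dependency, directly or transitively.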
Run the full suite:
uv run pytest -q
Run cone code lint checks:
uv run ruff check src\optfunc\cvxs\cones tests\benchmark-cpp tests\test_cone_operators.py