
optfunc Test And IDE Guide

This document records test commands and IDE run-configuration settings for the repository. Keep it in sync when adding test subpackages, pytest options, or benchmark output helpers.

For installed-package user examples, including Clarabel on gw_maxcut from a downstream project, see docs/convex-benchmark-usage.md.

For native wheel variant behavior and release CI checks, see docs/native-wheel-variants.md and docs/release-ci.md.

Cone C++ Benchmark

The C++ cone operator benchmark compares the cpp path against real backend implementations:

uv run pytest -q .\tests\benchmark-cpp\test_cpp_operator_benchmarks.py --tb=short

The cross-backend report compares cpp paths for NumPy, Torch CPU, Torch CUDA, and CuPy CUDA inputs against the matching Python backend implementation:

uv run pytest -q .\tests\benchmark-cpp\test_cpp_backend_interop_benchmarks.py --tb=short

Run both benchmark reports together with:

uv run pytest -q .\tests\benchmark-cpp --tb=short

Each benchmark test prints a "C++ cone backend comparisons" table in the pytest terminal summary output. The table includes cone kind, operation, rows, sample type, allocation mode, cpp input backend, Python/backend baseline, median cpp time, median Python/backend time, ratio, status, and a documented reason when cpp is not yet expected to beat the baseline.

Benchmark status values:

fast: cpp median time is at least 5% faster than the Python/backend baseline.
parity: cpp is within the configured parity band, currently 10% relative or 0.002 ms absolute for tiny helpers where timer resolution dominates.
documented: cpp is slower than the parity band and the test includes a concrete implementation reason that must be visible in the table.
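
The status rules above can be sketched as a small classifier. This is a minimal illustration, not the repository's implementation: the function name `classify_status`, its parameters, the order in which the rules are checked, and the reading of "10% relative or 0.002 ms absolute" as the larger of the two bands are all assumptions. Only the 5%, 10%, and 0.002 ms thresholds come from the text.

```python
# Hypothetical helper; thresholds are taken from the status list above.
FAST_MARGIN = 0.05          # cpp must be at least 5% faster to count as "fast"
PARITY_RELATIVE = 0.10      # 10% relative parity band
PARITY_ABSOLUTE_MS = 0.002  # absolute band for tiny helpers


def classify_status(cpp_ms, baseline_ms, reason=None):
    """Return 'fast', 'parity', or 'documented' for one benchmark row."""
    if cpp_ms <= baseline_ms * (1.0 - FAST_MARGIN):
        return "fast"
    # Assumption: the wider of the relative and absolute bands applies.
    band = max(PARITY_RELATIVE * baseline_ms, PARITY_ABSOLUTE_MS)
    if abs(cpp_ms - baseline_ms) <= band:
        return "parity"
    # Slower than the parity band: a concrete reason is mandatory.
    if not reason:
        raise ValueError("a 'documented' row needs a concrete reason")
    return "documented"
```

Raising when the reason is missing mirrors the requirement that a documented row must show its reason in the table.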

If native C++ sources or bindings changed, rebuild the editable package in the Visual Studio Developer Shell before running the benchmark:

$vsDevShell = 'A:\Visual Studio\18\Enterprise\Common7\Tools\Launch-VsDevShell.ps1'
& $vsDevShell -Arch amd64 -HostArch amd64 -SkipAutomaticLocation
uv sync --reinstall-package optfuncs --group torch-cu130 --group cupy-cu13 --extra convex --dev
uv run pytest -q .\tests\benchmark-cpp\test_cpp_operator_benchmarks.py --tb=short

-q is safe for this benchmark; the comparison table is emitted through pytest_terminal_summary, not ordinary print output.
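
For readers unfamiliar with the hook, the following sketch shows how a `conftest.py` can emit a table through `pytest_terminal_summary`. The hook name and the `terminalreporter.section` / `write_line` calls are standard pytest; the `_ROWS` list, the `record_row` helper, the column layout, and the ratio direction (cpp divided by baseline) are hypothetical and do not describe the repository's actual code.

```python
# conftest.py (illustrative sketch)
_ROWS = []  # hypothetical store that benchmark tests append to


def record_row(cone, operation, cpp_ms, baseline_ms, status, reason=""):
    """Collect one comparison row during the test run."""
    _ROWS.append(
        dict(cone=cone, operation=operation, cpp_ms=cpp_ms,
             baseline_ms=baseline_ms, status=status, reason=reason)
    )


def pytest_terminal_summary(terminalreporter, exitstatus, config):
    """Standard pytest hook, called once after all tests have finished."""
    if not _ROWS:
        return
    terminalreporter.section("C++ cone backend comparisons")
    terminalreporter.write_line(
        f"{'cone':<8}{'operation':<18}{'cpp ms':>10}{'base ms':>10}"
        f"{'ratio':>8}  status  reason"
    )
    for row in _ROWS:
        ratio = row["cpp_ms"] / row["baseline_ms"]  # assumed: cpp / baseline
        terminalreporter.write_line(
            f"{row['cone']:<8}{row['operation']:<18}"
            f"{row['cpp_ms']:>10.4f}{row['baseline_ms']:>10.4f}{ratio:>8.2f}"
            f"  {row['status']}  {row['reason']}"
        )
```

Because the rows are written by the terminal reporter rather than by `print`, they do not depend on output capture settings such as `-s`.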

Stage 3 Adapter Benchmarks

Stage 3 benchmarks should measure allocator, adapter, wrapper, kernel, and solver-loop costs separately. The goal is not to make every fresh-return helper beat NumPy, Torch, or CuPy allocation. The goal is to make performance APIs use backend-native allocation, caller-provided outputs, workspace reuse, and batch/fused kernels.

Planned benchmark groups:

| Group | What it measures | Expected result |
| --- | --- | --- |
| `alloc only` | Backend-native empty/zero allocation with no cone math. | Fresh-return helpers may be parity/documented. |
| `fill only` | Zero-fill or backend fill on caller-owned buffers. | Native/backend path should match or beat host. |
| `one write only` | Single scalar write into existing output. | Native/backend path should beat fresh allocation. |
| `wrapper overhead` | Python dispatch into adapter/native and return. | Must be reported separately from kernel time. |
| `full API` | User-facing `unit_vector`, `project`, or scaling allocation path. | Convenience API targets backend-native parity. |
| `out API` | `unit_vector_into`, `project_into`, and scaling into output. | Must preserve output pointer identity and beat full API. |
| `workspace` | Repeated calls using preallocated scratch buffers. | Must reduce allocations in solver-loop workloads. |
| `fused` | Combined operations such as project plus violation. | Must beat equivalent repeated scalar calls. |
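
The difference between the `full API` and `out API` rows, and what "preserve output pointer identity" means in practice, can be illustrated with a NumPy stand-in. The signatures below are hypothetical: the document names `unit_vector` and `unit_vector_into` but does not show their parameters, and a plain standard-basis vector is used here only as a placeholder for whatever each cone defines as its unit vector.

```python
import numpy as np


def unit_vector(n, index=0):
    """Fresh-return path (stand-in): allocates a new array on every call."""
    out = np.zeros(n)
    out[index] = 1.0
    return out


def unit_vector_into(out, index=0):
    """Out path (stand-in): fills the caller's buffer in place and returns it."""
    out.fill(0.0)
    out[index] = 1.0
    return out


buf = np.empty(1024)
address_before = buf.ctypes.data  # raw data pointer of the caller's buffer

result = unit_vector_into(buf, index=3)

# Pointer identity: same Python object and same underlying memory address.
assert result is buf
assert result.ctypes.data == address_before
# Both paths must agree on the values they produce.
assert np.array_equal(result, unit_vector(1024, index=3))
```

A benchmark for the `out API` group would time only the `unit_vector_into` call on a preallocated buffer, so allocation cost is excluded from the measurement.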

Recommended future command shape:

uv run pytest -q .\tests\benchmark-cpp --tb=short

When Stage 3-specific files are added, keep them under tests/benchmark-cpp and reuse pytest terminal summary output. Each row should include backend, cone, operation, dimension, allocation mode, memory state, median time, ratio, status, and documented reason when native cannot win.

CUDA benchmark rules:

Use CUDA event timing for device kernels.
Report warm and cold memory-pool states separately.
Avoid .item(), host logging, implicit default-stream synchronization, and accidental host-device transfers inside timed regions.
Validate Torch and CuPy stream semantics before trusting timing data.
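
As one way to follow the first rule, the sketch below times a callable with Torch CUDA events. `torch.cuda.Event`, `record`, `synchronize`, and `elapsed_time` are existing Torch APIs; the helper name `cuda_event_median_ms` and its `warmup` / `iters` parameters are hypothetical. It measures a warm memory-pool state only and records on the current stream, so it is a starting point rather than a complete harness.

```python
import statistics


def cuda_event_median_ms(fn, warmup=10, iters=50):
    """Median device time of fn() in milliseconds, measured with CUDA events."""
    import torch  # imported lazily so this module loads without Torch

    if not torch.cuda.is_available():
        raise RuntimeError("a CUDA device is required for event timing")

    # Warm-up runs populate the caching allocator and trigger lazy init.
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()

    samples = []
    for _ in range(iters):
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()     # enqueued on the current stream
        fn()               # must not call .item() or copy to host
        end.record()
        end.synchronize()  # wait outside the timed region
        samples.append(start.elapsed_time(end))  # milliseconds
    return statistics.median(samples)
```

A cold-pool measurement would need the allocator cache released before each sample (for Torch, `torch.cuda.empty_cache()`), and it should be reported as a separate row.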

Stage 3 acceptance:

unit_vector() fresh-return may be parity or documented.
unit_vector_into, batch/fused helpers, project/violation/scaling kernels, and workspace solver-loop paths should beat Stage 2 or the fair host baseline.
Tests must print every benchmark comparison, not only pass/fail.

PyCharm Configuration

Use the repository .venv created by uv sync as the project interpreter:

B:\FUNQITANG\optimization-test\.venv\Scripts\python.exe

Create a Python tests > pytest run configuration:

| Field | Value |
| --- | --- |
| Target | `tests\benchmark-cpp\test_cpp_operator_benchmarks.py` |
| Working directory | `B:\FUNQITANG\optimization-test` |
| Additional arguments | `-q --tb=short` |
| Environment variables | `PYTHONUTF8=1` |

Run the configuration from PyCharm's Run tool window. The benchmark table appears after the pytest warnings summary. If PyCharm does not show the table, remove any setting that hides pytest terminal summary output.

When native C++ changed, run the VS DevShell uv sync --reinstall-package command above from a terminal first, then run the PyCharm pytest configuration.

For Stage 3 benchmark development, duplicate this configuration and change the target to tests\benchmark-cpp so new adapter, workspace, batch, and fused benchmark files run together and still emit one terminal summary.

CLion Configuration

Use CLion's CMake support for navigation and an external run configuration for pytest.

For native code indexing, open the repository root and configure CMake with:

cmake -S . -B build\clion-check2 -G "Visual Studio 18 2026" -A x64 -DCMAKE_CUDA_ARCHITECTURES=75

Create an External Tool or Shell Script run configuration:

| Field | Value |
| --- | --- |
| Program | `C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe` |
| Arguments | `-NoProfile -ExecutionPolicy Bypass -Command "uv run pytest -q .\tests\benchmark-cpp\test_cpp_operator_benchmarks.py --tb=short"` |
| Working directory | `B:\FUNQITANG\optimization-test` |

For a rebuild-and-test configuration, use this PowerShell command in the same run configuration:

$vsDevShell = 'A:\Visual Studio\18\Enterprise\Common7\Tools\Launch-VsDevShell.ps1'; & $vsDevShell -Arch amd64 -HostArch amd64 -SkipAutomaticLocation; uv sync --reinstall-package optfuncs --group torch-cu130 --group cupy-cu13 --extra convex --dev; uv run pytest -q .\tests\benchmark-cpp\test_cpp_operator_benchmarks.py --tb=short

The rebuild-and-test command is slower but guarantees CLion is running against the extension produced from current native sources.

For Stage 3 native work, add CMake variants for allocator and library policy experiments instead of editing source-level defaults. Useful configure flags are -DOPTFUNC_BLAS=AUTO|OFF|SYSTEM|OPENBLAS, -DOPTFUNC_USE_CUBLAS=ON|OFF, -DOPTFUNC_NATIVE_CPU_TUNE=ON, and -DOPTFUNC_MSVC_ARCH=AVX2.

Native Optimization Options

The cpp backend must not call NumPy functions from C++ or hide a NumPy fast path behind native benchmark results. It can use native techniques that NumPy also relies on internally:

Compiler optimization and auto-vectorization through Release CMake flags.
Opt-in machine tuning through OPTFUNC_NATIVE_CPU_TUNE=ON and, on MSVC, OPTFUNC_MSVC_ARCH=AVX2 or another supported /arch value.
OpenMP for large host loops where thread overhead is amortized.
C++-owned ndarray allocation through nanobind and capsule ownership for allocation-dominated helpers such as unit vectors and identity matrices.
Cone-specific algebraic reductions, for example applying SOC Lorentz quadratic operators as structured rank updates instead of dense O(n^3) matrix multiplications.
Optional host BLAS selected by CMake:

cmake -S . -B build\clion-blas -G "Visual Studio 18 2026" -A x64 -DOPTFUNC_BLAS=OPENBLAS

Use -DOPTFUNC_BLAS=AUTO for the default non-fatal probe, OFF to disable host BLAS, SYSTEM to require CMake's default BLAS, or OPENBLAS to require OpenBLAS. CUDA dense helpers can link cuBLAS through -DOPTFUNC_USE_CUBLAS=ON; turn it off only when comparing custom CUDA kernels against vendor libraries.
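
To make the "structured rank update" item in the list above concrete, here is a NumPy sketch of the standard second-order-cone quadratic representation, Q_w = 2 w wᵀ − det(w) J with J = diag(1, −1, …, −1) and det(w) = w₀² − ‖w̄‖². Applying it to a vector needs only a dot product and a scaled sign flip, with no dense matrix. This is a textbook identity used for illustration; the document does not state which formula the cpp backend implements, and the function names are hypothetical.

```python
import numpy as np


def arrow(w):
    """Arrow matrix Arw(w) of the SOC Jordan algebra."""
    a = w[0] * np.eye(w.size)
    a[0, 1:] = w[1:]
    a[1:, 0] = w[1:]
    return a


def soc_quadratic_dense(w):
    """Dense reference: Q_w = 2 Arw(w)^2 - Arw(w o w)."""
    w_sq = np.concatenate(([w @ w], 2.0 * w[0] * w[1:]))  # Jordan square
    return 2.0 * arrow(w) @ arrow(w) - arrow(w_sq)


def soc_quadratic_apply(w, x):
    """Structured O(n) application: 2 w (w.x) - det(w) J x."""
    det = w[0] ** 2 - w[1:] @ w[1:]
    jx = x.copy()
    jx[1:] *= -1.0  # J flips the sign of the tail
    return 2.0 * w * (w @ x) - det * jx


rng = np.random.default_rng(0)
w = rng.standard_normal(6)
x = rng.standard_normal(6)
assert np.allclose(soc_quadratic_dense(w) @ x, soc_quadratic_apply(w, x))
```

The dense reference is kept only to check the structured path; a performance path would never form it.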

When a C++ path still cannot beat the host backend, the benchmark must print a documented row with the concrete reason. Current expected reasons are allocation-dominated tiny helpers, absence of a requested native library, or a portable fallback such as a PSD eigensolver path that has not yet been replaced with BLAS/LAPACK/cuSolver.

Common Test Commands

Run cone operator correctness and backend parity:

uv run pytest -q tests\test_cone_operators.py --tb=short

Run the standalone C++ cone subpackage embedding checks used to protect downstream conic solver imports:

uv run pytest -q tests\test_cpp_package_embed.py --tb=short

This test blocks SciPy, CVXPY, Torch, and CuPy in subprocesses, changes to a non-repository working directory, and verifies that optfunc.cvxs.cones.cpp can still be imported and used with NumPy arrays. It also builds a wheel and installs it with pip install --no-deps --target semantics to catch missing native extension or stub files. It now builds the plain optfuncs wheel as CPU-only and verifies has_cuda() is false without the addon. If the active uv environment has no pip module, the test uses uv pip install --no-deps --target as the equivalent installer path.

When nvcc is available, the same file also builds the optfuncs-cu130 addon wheel and verifies that it contains optfuncs_cuda130/_cpp_cuda* without owning any optfunc/ package files.
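
One common way to block optional dependencies in a subprocess, sketched below, is to set their `sys.modules` entries to `None`, which makes any later `import` of that name raise `ImportError`. This illustrates the general technique only; the mechanism actually used by `tests\test_cpp_package_embed.py` is not shown in this document. The `json` import stands in for the package under test, and the helper name `run_isolated` is hypothetical.

```python
import subprocess
import sys
import tempfile
import textwrap

BLOCKED = ("scipy", "cvxpy", "torch", "cupy")

# Script executed by the child interpreter.
CHILD = textwrap.dedent("""
    import sys
    for name in {blocked!r}:
        sys.modules[name] = None  # a None entry makes `import name` fail
    import json  # stand-in for the package under test
    failures = []
    for name in {blocked!r}:
        try:
            __import__(name)
        except ImportError:
            failures.append(name)
    print(json.dumps(failures))
""").format(blocked=BLOCKED)


def run_isolated():
    """Run the child in a temporary, non-repository working directory."""
    with tempfile.TemporaryDirectory() as workdir:
        proc = subprocess.run(
            [sys.executable, "-c", CHILD],
            cwd=workdir,
            capture_output=True,
            text=True,
            check=True,
        )
    return proc.stdout.strip()
```

Running the child from a temporary directory keeps the repository root off the default import path, so an import that succeeds cannot be resolved from the source tree by accident.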

Run the full suite:

uv run pytest -q

Run cone code lint checks:

uv run ruff check src\optfunc\cvxs\cones tests\benchmark-cpp tests\test_cone_operators.py
