optfunc Developer Guide

This document is for maintainers and contributors. The public README focuses on install and user-facing APIs; this file covers development workflows, release checks, project structure, interface contracts, and design conventions.

Before changing cone APIs, native cone sources, backend packages, or CMake/scikit-build wiring, read .codex/build.dev.md and treat it as the project-local build contract.

For pytest commands, benchmark output expectations, and PyCharm/CLion run configuration details, read README.test.md.

User-facing long-form guides for installed-package usage belong under docs/. Keep the root README.md focused on entry points and link to docs/ for detailed tutorials.

Local Development

Shared prerequisites:

  • Python 3.12 or newer. .python-version pins the default development interpreter to 3.12; native wheels have been smoke-built locally for CPython 3.12, 3.13, and 3.14.
  • uv for environment synchronization, lockfile refresh, builds, and publish commands.
  • CMake, scikit-build-core, nanobind, and a C++17 compiler for native cone builds.
  • CUDA 13.x toolkit only when building or validating the optfuncs-cuda130 addon.

Windows native development requires MSVC/Visual Studio Build Tools. Load the project VS DevShell in the same PowerShell session before native rebuilds, benchmark runs that compile extensions, or uv build:

$vsDevShell = 'A:\Visual Studio\18\Enterprise\Common7\Tools\Launch-VsDevShell.ps1'
& $vsDevShell -Arch amd64 -HostArch amd64 -SkipAutomaticLocation
uv sync --group torch-cu130 --group cupy-cu13 --extra convex --dev
uv run pytest -q

Linux native development needs a C++ toolchain, CMake, and Ninja/pkg-config as needed by the platform. CPU development does not require CUDA. If CMake reports that Python Development.Module is missing, install a uv-managed interpreter and build with --managed-python --python 3.12.12:

uv python install 3.12.12
uv sync --no-default-groups --group torch-cpu --extra convex --dev
uv run pytest -q

Linux CUDA compile development should run in an environment with CUDA 13.x headers and nvcc, matching the CI compiler image:

uv sync --no-default-groups --group torch-cu130 --group cupy-cu13 --extra convex --dev
uv run pytest -q tests/test_cone_operators.py --tb=short

Sync one Torch backend group for unconstrained optfunc work:

uv sync --group torch-cpu
uv run pytest -q --optfunc-report

For convex benchmark development, the convex extra is sufficient. Convex generation uses NumPy/Python parameters and must not import Torch.

uv sync --extra convex
uv run pytest tests/test_convex_benchmarks.py -q

Run the full suite with one Torch group plus the convex extra, because unconstrained optfunc tests exercise PyTorch:

uv sync --group torch-cpu --extra convex --dev
uv run pytest -q

The CUDA development path used in this repository is:

$vsDevShell = 'A:\Visual Studio\18\Enterprise\Common7\Tools\Launch-VsDevShell.ps1'
& $vsDevShell -Arch amd64 -HostArch amd64 -SkipAutomaticLocation
uv sync --group torch-cu130 --group cupy-cu13 --extra convex --dev
uv run pytest -q --optfunc-report

Only one Torch group should be active at a time. The Torch groups are mutually exclusive in pyproject.toml; the convex extra and cupy-cu13 test group are additive.

Local Build And Publish Checks

Builds use scikit-build-core, CMake, and nanobind. The main package and CUDA addon are intentionally split:

  • optfuncs owns src/optfunc and installs the CPU-only optfunc/cvxs/cones/cpp/_cpp* extension.
  • optfuncs-cuda130 owns only optfuncs_cuda130 and installs optfuncs_cuda130/_cpp_cuda*.

The local release-matrix build uses CPython 3.12, 3.13, and 3.14 for both the CPU main wheel and the CUDA addon. On Linux, prefer uv-managed Python so CMake can find Development.Module without relying on distro Python headers:

export DIST_DIR=dist-local
export PYTHON_VERSIONS="3.12.12 3.13.11 3.14.2"
export CUDA_ARCHS="75;80;86;89;90"
uv python install $PYTHON_VERSIONS
uv build --sdist --no-sources --managed-python --python 3.12.12 --out-dir "$DIST_DIR"

for PY in $PYTHON_VERSIONS; do
  uv build --wheel --no-sources --managed-python --python "$PY" \
    --out-dir "$DIST_DIR" \
    --config-setting=cmake.define.OPTFUNC_WITH_CUDA=OFF \
    --config-setting=cmake.define.CMAKE_BUILD_TYPE=Release

  uv build --wheel --no-sources --managed-python --python "$PY" \
    packages/optfuncs-cuda130 --out-dir "$DIST_DIR" \
    "--config-setting=cmake.define.CMAKE_CUDA_ARCHITECTURES=$CUDA_ARCHS" \
    --config-setting=cmake.define.OPTFUNC_WITH_CUDA=ON \
    --config-setting=cmake.define.OPTFUNC_CUDA_ADDON=ON \
    --config-setting=cmake.define.OPTFUNC_USE_CUBLAS=ON \
    --config-setting=cmake.define.OPTFUNC_USE_CUSOLVER=ON \
    --config-setting=cmake.define.CMAKE_BUILD_TYPE=Release
done

REPAIRED_DIST_DIR="${DIST_DIR}-repaired"
rm -rf "$REPAIRED_DIST_DIR"
mkdir -p "$REPAIRED_DIST_DIR"
for WHL in "$DIST_DIR"/*.whl; do
  uvx --from auditwheel --with patchelf auditwheel repair \
    --plat manylinux_2_38_x86_64 \
    --exclude libcublas.so.13 \
    --exclude libcublasLt.so.13 \
    --exclude libcusolver.so.12 \
    --exclude libcusparse.so.12 \
    --exclude libnvJitLink.so.13 \
    -w "$REPAIRED_DIST_DIR" \
    "$WHL"
done
cp "$DIST_DIR"/*.tar.gz "$REPAIRED_DIST_DIR"/

Before publishing locally or tagging a release, verify wheel contents from a clean directory. The main wheel must include optfunc/cvxs/cones/cpp/_cpp* and must not include optfuncs_cuda130; the addon wheel must include optfuncs_cuda130/_cpp_cuda* and must not include optfunc/. Linux addon wheels must not vendor NVIDIA CUDA libraries; cuBLAS/cuSolver should be provided by the user's CUDA installation or by the optional optfuncs-cuda130[cuda-toolkit] dependency.

python scripts/verify_release_dist.py \
  --dist "$REPAIRED_DIST_DIR" \
  --version 0.0.7 \
  --platform linux \
  --require-sdist
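
The wheel content rules above can also be spot-checked by hand with the standard library. The following is a minimal sketch, not a reproduction of scripts/verify_release_dist.py; the helper name wheel_layout_errors and the in-memory demo wheel (including its file names) are illustrative assumptions.

```python
import fnmatch
import io
import zipfile


def wheel_layout_errors(names, required, forbidden_prefixes):
    """Return layout problems for one wheel's member list.

    names: archive member names, e.g. zipfile.ZipFile(path).namelist()
    required: glob patterns that must each match at least one member
    forbidden_prefixes: path prefixes that must not appear at all
    """
    errors = []
    for pattern in required:
        if not any(fnmatch.fnmatchcase(name, pattern) for name in names):
            errors.append(f"missing {pattern}")
    for prefix in forbidden_prefixes:
        if any(name.startswith(prefix) for name in names):
            errors.append(f"unexpected {prefix}")
    return errors


# Demo: a tiny fake main wheel built in memory (hypothetical file names).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("optfunc/__init__.py", "")
    archive.writestr("optfunc/cvxs/cones/cpp/_cpp.abi3.so", "")

with zipfile.ZipFile(buffer) as archive:
    main_names = archive.namelist()

# The same file list passes the main-wheel rules and fails the addon rules.
main_errors = wheel_layout_errors(
    main_names,
    required=["optfunc/cvxs/cones/cpp/_cpp*"],
    forbidden_prefixes=["optfuncs_cuda130/"],
)
addon_errors = wheel_layout_errors(
    main_names,
    required=["optfuncs_cuda130/_cpp_cuda*"],
    forbidden_prefixes=["optfunc/"],
)
print(main_errors)
print(addon_errors)
```

For a real wheel, pass zipfile.ZipFile(path).namelist() as names.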

On Windows, load the project VS DevShell first, then build the same matrix:

$DistDir = 'dist-windows'
$PythonVersions = @('3.12.12', '3.13.11', '3.14.2')
$CudaArchs = '75;80;86;89;90'
foreach ($Py in $PythonVersions) {
    uv python install $Py
}
foreach ($Py in $PythonVersions) {
    uv build --wheel --no-sources --managed-python --python $Py --out-dir $DistDir `
      --config-setting=cmake.define.OPTFUNC_WITH_CUDA=OFF `
      --config-setting=cmake.define.CMAKE_BUILD_TYPE=Release
    uv build --wheel --no-sources --managed-python --python $Py `
      packages/optfuncs-cuda130 --out-dir $DistDir `
      "--config-setting=cmake.define.CMAKE_CUDA_ARCHITECTURES=$CudaArchs" `
      --config-setting=cmake.define.OPTFUNC_WITH_CUDA=ON `
      --config-setting=cmake.define.OPTFUNC_CUDA_ADDON=ON `
      --config-setting=cmake.define.OPTFUNC_USE_CUBLAS=ON `
      --config-setting=cmake.define.OPTFUNC_USE_CUSOLVER=ON `
      --config-setting=cmake.define.CMAKE_BUILD_TYPE=Release
}

python scripts/verify_release_dist.py `
  --dist $DistDir `
  --version 0.0.7 `
  --platform windows

Local verification on May 17, 2026 covered Windows and WSL Linux cp312, cp313, and cp314 CPU and CUDA addon wheels using CMAKE_CUDA_ARCHITECTURES=75;80;86;89;90.

Local publish dry-runs should use the same upload and check URLs as CI:

export RELEASE_DIST_DIR=dist-release
rm -rf "$RELEASE_DIST_DIR"
mkdir -p "$RELEASE_DIST_DIR"
cp dist-linux-repaired/* "$RELEASE_DIST_DIR"/
cp dist-windows/* "$RELEASE_DIST_DIR"/
python scripts/verify_release_dist.py \
  --dist "$RELEASE_DIST_DIR" \
  --version 0.0.7 \
  --platform linux \
  --platform windows \
  --require-sdist
export UV_PUBLISH_TOKEN="${PYPI_TOKEN:-dry-run-token}"
uv publish --dry-run --publish-url https://upload.pypi.org/legacy/ \
  --check-url https://pypi.org/simple/optfuncs/ "$RELEASE_DIST_DIR"/*

Remove --dry-run and use a real token only after the file list and target URL are correct.

PyPI never allows reusing a filename after it has been uploaded and deleted. For v0.0.7, the release CI applies wheel build tag 1 to avoid filenames burned by an interrupted upload while keeping the public version at 0.0.7.
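
For orientation, the build tag is the optional third dash-separated field of a wheel filename, between the version and the Python tag, which is why it changes the filename without changing the public version. A minimal sketch of that split follows; the helper name split_wheel_filename and the example filenames are illustrative, not actual release artifacts.

```python
def split_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # name-version-python-abi-platform: no build tag present
        parts.insert(2, None)
    elif len(parts) != 6:
        raise ValueError(f"unexpected wheel filename: {filename}")
    keys = ("name", "version", "build", "python", "abi", "platform")
    return dict(zip(keys, parts))


# Hypothetical filenames that differ only by the build tag.
plain = split_wheel_filename("optfuncs-0.0.7-cp312-cp312-win_amd64.whl")
tagged = split_wheel_filename("optfuncs-0.0.7-1-cp312-cp312-win_amd64.whl")
print(plain["build"], tagged["build"], tagged["version"])
```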

When documenting usage from downstream uv projects, recommend splitting optfuncs dependencies by downstream optional extra. For example:

uv add --optional test "optfuncs[convex]>=0.0.7"
uv add --optional torch "optfuncs[torch-cu130]>=0.0.7"

This produces a downstream shape where test owns CVXPY/MOSEK benchmark support without installing Torch, and the target project explicitly chooses the Torch hardware runtime. Published Torch extras are intentionally hardware-specific: torch-cpu, torch-cu118, torch-cu126, torch-cu128, torch-cu130, torch-rocm, and torch-xpu. Keep downstream extra names aligned with the target project's packaging scheme, but keep the selected optfuncs[...] extra hardware-specific.

| Downstream extra | Dependency |
| --- | --- |
| test | optfuncs[convex]>=0.0.7 |
| torch | optfuncs[torch-cu130]>=0.0.7 |

Release Flow

The release pipeline is tag-driven: pushing vX.Y.Z triggers the CNB tag_push pipeline. CNB publishes Linux cp312, cp313, and cp314 optfuncs CPU wheels, matching Linux optfuncs-cuda130 CUDA addon wheels, and the optfuncs sdist from a regular CPU runner. Windows cp312, cp313, and cp314 wheels are published by the external Windows workflow because CNB hosted runners are Linux Docker runners.

  1. Confirm the worktree contains only intended release changes.
  2. Update pyproject.toml to the target version.
  3. Refresh the lockfile after dependency or metadata changes.
  4. Run tests and build local smoke wheels for the full Python/platform matrix.
  5. Merge the release commit to main.
  6. Create or overwrite the release tag.
  7. Push main and the tag so CNB publishes Linux wheels and sdist.
  8. Confirm the external Windows workflow publishes win_amd64 CPU and CUDA addon wheels for cp312, cp313, and cp314.

Typical commands:

uv version X.Y.Z --frozen
uv lock
uv run pytest -q
# Build the full local release matrix with the commands above, then tag.

git switch main
git merge --ff-only <release-branch>
git tag -f -a vX.Y.Z -m "Release vX.Y.Z"
git push origin main
git push --force origin vX.Y.Z

For native-cone releases, local Windows smoke builds use the same matrix commands from a PowerShell session after loading the project VS DevShell. Verify all produced wheel names before tagging:

$vsDevShell = 'A:\Visual Studio\18\Enterprise\Common7\Tools\Launch-VsDevShell.ps1'
& $vsDevShell -Arch amd64 -HostArch amd64 -SkipAutomaticLocation
Get-ChildItem dist-windows -Filter *.whl | Sort-Object Name

The CNB tag pipeline uses nvidia/cuda:13.0.2-devel-ubuntu24.04 on regular cnb:arch:amd64, builds with explicit portable CUDA architectures for the addon, and fails before publish if the Linux wheels do not contain their expected native extensions. Detailed release CI behavior is documented in docs/release-ci.md.

Use git tag -f and git push --force origin vX.Y.Z only when intentionally overwriting an existing release tag. Before force-pushing a tag, verify the tag target:

git show --no-patch --format="%h %D%n%s" vX.Y.Z
git ls-remote --tags origin "vX.Y.Z^{}"

PyPI Post-Publish Checks

After CNB publishes the tag, verify both metadata and runtime behavior from an environment that is not importing the local src tree.

Check the published page and package metadata:

python -m pip index versions optfuncs
python -m pip download --no-deps optfuncs==X.Y.Z -d /tmp/optfuncs-check
python -m pip show optfuncs

Create a clean environment and install from PyPI:

python -m venv /tmp/optfuncs-pypi-check
/tmp/optfuncs-pypi-check/bin/python -m pip install --upgrade pip
/tmp/optfuncs-pypi-check/bin/python -m pip install "optfuncs[torch-cpu,convex]==X.Y.Z"

On Windows PowerShell:

python -m venv $env:TEMP\optfuncs-pypi-check
& $env:TEMP\optfuncs-pypi-check\Scripts\python -m pip install --upgrade pip
& $env:TEMP\optfuncs-pypi-check\Scripts\python -m pip install "optfuncs[torch-cpu,convex]==X.Y.Z"

Run import and API smoke checks from outside the repository:

cd /tmp
/tmp/optfuncs-pypi-check/bin/python - <<'PY'
import optfunc
from optfunc import BenchmarkRegistry, ConvexFamily, OptFuncRegistry

print(optfunc.__file__)
assert "site-packages" in (optfunc.__file__ or "")

sphere = OptFuncRegistry.create("sphere", dim=3)
assert sphere.meta.name == "Sphere"

gw = BenchmarkRegistry.create("gw_maxcut", constraints="convex", dim=6, seed=5)
assert gw.meta.optimal_value is not None
assert gw.known_solution("X").shape == (6, 6)

family = ConvexFamily.create("gw_maxcut", dim=6)
assert family.name == "GoemansWilliamsonMaxCutSDP"

print("optfuncs PyPI smoke check passed")
PY

If a release includes new extras or dependency changes, inspect wheel metadata:

python -m pip download --no-deps optfuncs==X.Y.Z -d /tmp/optfuncs-wheel
python -m zipfile -l /tmp/optfuncs-wheel/optfuncs-X.Y.Z-*.whl

Confirm that optfunc/convex.py, optfunc/cvxs/base.py, optfunc/testing/core.py, optfunc/__init__.pyi, optfunc/py.typed, optfunc/cvxs/cones/cpp/_cpp*, and optfuncs-X.Y.Z.dist-info/METADATA are present in the main platform wheels. The metadata should list the expected optional extras: convex, torch-cpu, torch-cu118, torch-cu126, torch-cu128, torch-cu130, cupy-cu13, torch-rocm, and torch-xpu. The torch-cu130 and cupy-cu13 extras must require optfuncs-cuda130==X.Y.Z. The addon wheel must contain optfuncs_cuda130/_cpp_cuda* and must not contain optfunc/ package files. The addon may expose optfuncs-cuda130[cuda-toolkit] for users who want uv to install cuda-toolkit[cublas,cusolver]; do not make those NVIDIA runtime packages mandatory, because many CUDA users provide cuBLAS/cuSOLVER themselves.
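
The extras and the addon pin can be read out of a downloaded wheel programmatically. The sketch below uses only the standard library; the helper name read_wheel_metadata is hypothetical, and the demo METADATA is made up for illustration rather than copied from a published wheel.

```python
import io
import zipfile
from email.parser import Parser


def read_wheel_metadata(wheel):
    """Parse <name>-<version>.dist-info/METADATA from a wheel path or file object."""
    with zipfile.ZipFile(wheel) as archive:
        metadata_name = next(
            name for name in archive.namelist()
            if name.endswith(".dist-info/METADATA")
        )
        text = archive.read(metadata_name).decode("utf-8")
    # METADATA uses email-style headers, so the stdlib email parser can read it.
    return Parser().parsestr(text)


# Demo wheel built in memory; this METADATA content is invented for the example.
demo_metadata = "\n".join([
    "Metadata-Version: 2.1",
    "Name: optfuncs",
    "Version: 0.0.7",
    "Provides-Extra: convex",
    "Provides-Extra: torch-cu130",
    'Requires-Dist: optfuncs-cuda130==0.0.7; extra == "torch-cu130"',
    "",
])
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("optfuncs-0.0.7.dist-info/METADATA", demo_metadata)

metadata = read_wheel_metadata(buffer)
extras = set(metadata.get_all("Provides-Extra", []))
cuda_pins = [
    requirement
    for requirement in metadata.get_all("Requires-Dist", [])
    if requirement.startswith("optfuncs-cuda130==")
]
# Compare against a (shortened) expected list; the demo wheel lacks torch-cpu.
missing = {"convex", "torch-cpu", "torch-cu130"} - extras
print(sorted(extras), cuda_pins, sorted(missing))
```

Against a real download, pass the wheel path instead of the in-memory buffer and compare extras with the full list above.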

Project Structure

The package uses a standard src layout.

| Path | Responsibility |
| --- | --- |
| src/optfunc/__init__.py | Lazy runtime public exports for optfunc classes, registries, convex APIs, and testing helpers. |
| src/optfunc/__init__.pyi | IDE/type-checker visible public exports matching the lazy runtime facade. |
| src/optfunc/functional.py | Core Torch optfunc abstraction, callable wrapper, and unconstrained registry. |
| src/optfunc/registry.py | Torch-free BenchmarkRegistry router for constraints="none" and constraints="convex". |
| src/optfunc/func/ | Built-in differentiable benchmark functions registered through OptFuncRegistry. |
| src/optfunc/convex.py | Static compatibility facade for old convex imports. |
| src/optfunc/cvxs/base.py | ConvexFamily, ProblemFamily, and shared native-first family contracts. |
| src/optfunc/cvxs/native.py | CppNativeProblem*, Clarabel-style canonical P, q, A, b, and ordered cone block data. |
| src/optfunc/cvxs/cones/ | Cone descriptors, abstract operator protocols, backend router, and backend packages. |
| src/optfunc/cvxs/cones/cpp/ | Dedicated nanobind C++/CUDA cone backend package and CLion-readable native CMake subproject. |
| src/optfunc/cvxs/cones/*_backend/ | Host, NumPy, PyTorch CPU, PyTorch CUDA, and CuPy CUDA operator implementations used for comparison and routing. |
| src/optfunc/cvxs/translators/ | Translation from optfuncs conic data to solver/file formats such as MOSEK PTF. |
| src/optfunc/cvxs/_utils/ | Cone-specific and data-structure-specific helpers, including Clarabel-style PSD svec. |
| src/optfunc/cvxs/*.py | Built-in native-first convex problem families registered through ConvexFamily. |
| src/optfunc/testing/core.py | Pytest-oriented optimizer cases, problem wrappers, result reports, and evaluator logic. |
| src/optfunc/pytest_plugin.py | Pytest plugin option and terminal report collection. |
| docs/ | User-facing installed-package guides for cone backends, convex benchmarks, and solver integration. |
| tests/test_benchmarks.py | Unit and smoke tests for optfunc metadata, derivatives, minimizers, and Torch optimizer harness behavior. |
| tests/test_convex_benchmarks.py | Unit, solver, batch, and packaging smoke tests for convex problem generation and APIs. |
| tests/test_cone_operators.py | Native cone correctness, ownership, zero-copy rejection, CUDA availability, and performance tests. |
| tests/test_adam_optimizer.py | Example generated pytest case for the built-in Adam helper. |
| scripts/ | Helper scripts for selecting a Torch backend extra. |
| CMakeLists.txt | scikit-build-core CMake entry point that delegates native cone targets to src/optfunc/cvxs/cones/cpp/CMakeLists.txt. |
| pyproject.toml | Package metadata, extras, pytest entry point, uv indexes, scikit-build-core configuration, and hardware dependency groups. |
| uv.lock | Locked dependency graph. Refresh with uv lock after dependency or metadata changes. |

Interface Specification

Torch Optfuncs

TorchOptFunction is the base class for differentiable unconstrained benchmark functions.

  • Inputs are one-dimensional torch.Tensor values with shape (dim,).
  • forward(x) returns a scalar tensor.
  • grad, value_and_grad, hessian, and hvp use Torch autograd by default.
  • global_minimizer() returns a known minimizer when available.
  • distance_to_optimum(x) measures distance to the known optimum set.
  • project_to_bounds(x) clamps into the documented search box.

Implementations should keep metadata complete: name, dimension, bounds, category, global minimum, note, and source URL where possible.

CallableOptFunction adapts user callables into the same interface. It is primarily useful for tests and small local benchmarks.

OptFuncRegistry owns unconstrained Torch benchmark names. Registration names are normalized by lowercasing and removing spaces, underscores, and hyphens.
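
As an illustration of the normalization rule just described, here is a minimal sketch. The function name normalize_registry_name is hypothetical and this is not the registry's actual code; it only mirrors the stated rule.

```python
def normalize_registry_name(name: str) -> str:
    """Lowercase the name and drop spaces, underscores, and hyphens."""
    return name.lower().replace(" ", "").replace("_", "").replace("-", "")


# Spelling variants collapse to the same lookup key.
keys = {normalize_registry_name(n) for n in ("GW_MaxCut", "gw-maxcut", "gw maxcut")}
print(keys)
```

Under this rule, spelling variants that differ only in case or separators resolve to one registry entry.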

Convex Problem Families

ProblemFamily is the base class for a family of parameterised convex programs. In cvxs, theta means the parameter dictionary required to generate one optimization problem; it is not the solver variable. The unified user-facing methods are:

sample_parameters(seed: int = 0) -> dict[str, numpy.ndarray | float | int]
sample_parameters(n: int, *, seed: int = 0) -> list[dict[str, numpy.ndarray | float | int]]
sample_theta_seed(seed: int = 0) -> dict[str, int]
sample_theta_seed(n: int, *, seed: int = 0) -> list[dict[str, int]]
build_instance(
    theta: Mapping[str, numpy.ndarray | float | int]
    | Sequence[Mapping[str, numpy.ndarray | float | int]],
) -> ConicProblem | list[ConicProblem]
perturb(
    theta: Mapping[str, numpy.ndarray | float | int],
    n: int | None = None,
    *,
    magnitude: float = 1.0,
    seed: int = 0,
) -> dict[str, numpy.ndarray | float | int] | list[dict[str, numpy.ndarray | float | int]]
generate_sequence(n: int, *, seed: int = 0) -> list[ConicProblem]
_analytic_optimal_solution(problem: ConicProblem) -> dict[str, numpy.ndarray] | None
_analytic_optimal_value(problem: ConicProblem) -> float | None
_clarabel_fallback_optimal_value(problem: ConicProblem) -> float | None

sample_theta_* names identify concrete sampling methods and must follow the same single-or-many convention as sample_parameters. Matching perturb_* methods identify how a theta sequence is derived from an existing theta, such as the default sample_theta_seed / perturb_seed pair or the dense-QP sample_theta_uniform / perturb_uniform pair. gw_maxcut also exposes sample_theta_weights / perturb_weights for explicit edge-weight theta where neighboring problems differ by small continuous perturbations instead of a fresh seed. lyapunov_lmi and markowitz_sdp use smooth explicit theta as their default sampler because they model warm-start sequences directly.

ConicProblem is an alias for CppNativeProblem. The family and registry path returns cpp-native problem descriptions: each instance has .theta, .cpp_native, .to_problem_data(), .to_cvxpy_problem(), .to_ptf_text(), and .write_ptf(path).

ConicProblemData is the internal solver-facing representation. It stores Clarabel-style P, q, A, b, and cone block metadata for min 1/2 x.T P x + q.T x subject to A x + s = b, s in K. Cone blocks are ordered as zero, nonnegative, second_order, psd, then exponential; sort_cones must be used whenever blocks are assembled out of order because it reorders A rows and b slices with the cone metadata.
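
To make the reordering requirement concrete, the sketch below sorts cone blocks into the canonical kind order and moves the matching rows of A and entries of b with them. It is an illustration under stated assumptions, not the sort_cones implementation: the function name sort_cone_blocks, the dense NumPy A, and the (kind, n_rows) block tuples are all hypothetical, and at least one block is assumed.

```python
import numpy as np

CONE_ORDER = ("zero", "nonnegative", "second_order", "psd", "exponential")


def sort_cone_blocks(A, b, blocks):
    """Reorder blocks into canonical kind order, moving rows of A and b along.

    blocks: non-empty list of (kind, n_rows) in the current row order of A and b.
    """
    starts = np.concatenate(([0], np.cumsum([n for _, n in blocks])))
    # sorted() is stable, so blocks of the same kind keep their relative order.
    order = sorted(range(len(blocks)), key=lambda i: CONE_ORDER.index(blocks[i][0]))
    row_perm = np.concatenate([np.arange(starts[i], starts[i + 1]) for i in order])
    return A[row_perm], b[row_perm], [blocks[i] for i in order]


A = np.arange(8.0).reshape(4, 2)
b = np.array([10.0, 11.0, 12.0, 13.0])
blocks = [("psd", 1), ("zero", 1), ("nonnegative", 2)]

A_sorted, b_sorted, sorted_blocks = sort_cone_blocks(A, b, blocks)
print(sorted_blocks)
print(b_sorted)
```

The point of the sketch is that the row permutation is derived from the block metadata, so A, b, and the cone list can never drift out of sync.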

Cone block descriptors live under optfunc.cvxs.cones. ZeroCone, NonnegativeCone, SecondOrderCone, PsdCone, and ExponentialCone describe Clarabel metadata only: kind, dim, rows, and optional name. They do not own projection, containment, or violation behavior.

Use estimate_problem_memory_requirements(...) and can_fit_problem_on_current_devices(...) before handing generated cvxs problems to a solver backend. Cone descriptors, ConicProblemBuilder, and ConicProblemData also run configurable construction-time memory checks so oversized dense NT-scaling or native workspace requirements fail before operators or batch plans are created. The guard is controlled by OPTFUNC_CVXS_MEMORY_CHECK, OPTFUNC_CVXS_MEMORY_LIMIT_BYTES, OPTFUNC_CVXS_CUDA_MEMORY_LIMIT_BYTES, and OPTFUNC_CVXS_MEMORY_FRACTION.

The top-level optfunc facade remains lazy for public exports, and cone-only packages such as optfunc.cvxs.cones.cpp must stay importable without importing problem-family modules. Inside cvxs problem-family modules, CVXPY is a required convex dependency and may be imported directly; users still receive cpp-native problem descriptions first and convert by chaining .to_cvxpy_problem() when they need CVXPY.

Compute behavior lives in backend-specific operator packages. Use make_cone_operator(cone, backend="cpp") for routed construction, or import a backend operator directly from packages such as optfunc.cvxs.cones.cpp, numpy_backend, host_torch_backend, cuda_torch_backend, or cuda_cupy_backend. Operators implement project, project_into, contains, violation, unit_vector, nt_scaling_matrix, and sparse H-matrix value fills such as hmatrix_csr_values_into and hmatrix_coo_values_into. The native backend is optfunc.cvxs.cones.cpp._cpp, installed inside the cpp package and exposed through thin Python wrapper classes such as PsdOperatorCpp and PsdConeCpp. Normal code must not depend on optfunc.cvxs.cones._native.

For fixed problem data, make_cone_batch_plan(...) returns ConeBatchPlanCpp, which owns the cone-block grouping plan for problem-level NT-scaling and sparse H fills. It exposes serial correctness methods and planned batch methods; the default problem-level paths use the planned methods, not reduced-KKT ownership. Factorization, solve, permutation lifecycle, and reduced-KKT storage remain solver responsibilities outside optfunc.cvxs. COO sparse H fill has explicit row_sorted and block_scan methods. The current tested default is CPU sorted rows -> row_sorted, and CUDA or unsorted rows -> block_scan; preference tests must be updated whenever a new strategy changes that segmentation.
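
The COO default described above amounts to a two-way decision. The sketch below restates it in code; the function name choose_coo_fill_strategy, the string device labels, and the reading of "sorted rows" as non-decreasing COO row indices are assumptions for illustration, not the package's selection logic.

```python
import numpy as np


def choose_coo_fill_strategy(device: str, rows: np.ndarray) -> str:
    """Return 'row_sorted' only for CPU data with non-decreasing row indices."""
    rows_sorted = bool(np.all(rows[1:] >= rows[:-1]))
    if device == "cpu" and rows_sorted:
        return "row_sorted"
    return "block_scan"


sorted_rows = np.array([0, 0, 1, 2, 2])
unsorted_rows = np.array([2, 0, 1, 0, 2])
choices = [
    choose_coo_fill_strategy("cpu", sorted_rows),
    choose_coo_fill_strategy("cpu", unsorted_rows),
    choose_coo_fill_strategy("cuda", sorted_rows),
]
print(choices)
```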

Native method defaults are selected from pytest preference matrices, not a single hard-coded benchmark. Current CUDA NT-scaling defaults are:

  • SOC: cuBLAS for rows <= 32 and batch >= 16; custom otherwise.
  • PSD, dim <= 2: analytic.
  • PSD, 3 <= dim < 8 and batch < 16: custom.
  • PSD, tested small/batch cases through dim <= 32: cuBLAS.
  • PSD, 33 <= dim <= 128: cuSolver matrix powers plus cuBLAS GEMM.

PSD conversion supports dim <= 128. PSD CUDA sparse H uses direct W-matrix CSR/COO gather for dim >= 3, so it does not materialize cone-space dense H for large PSD blocks.

Native wrappers may choose NumPy, Torch, or CuPy primitives for simple diagonal operators when pytest preference cases show the backend path beats native binding overhead. Current examples are diagonal zero/nonnegative dense-CSR H fills and CPU diagonal PSD dense-CSR H fills; non-canonical or non-diagonal patterns continue to native kernels.

External solvers may pass descriptor-like objects with kind, dim, rows, and optional name to make_cone_operator. Keep this duck-typed contract stable: descriptor objects are metadata, while C++ operator instances expose kind, rows, descriptor_dim, to_descriptor(...), and to_descriptors(...) for integration into solver-owned cone tables.
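
A solver-side descriptor that satisfies this duck-typed contract could look like the sketch below. ConeDescriptorLike, SolverConeBlock, and describe_cone are hypothetical names used only for illustration; the sketch does not call make_cone_operator and only shows the attribute shape the contract relies on.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class ConeDescriptorLike(Protocol):
    """Required metadata attributes; name is optional and read with getattr."""

    kind: str
    dim: int
    rows: int


@dataclass(frozen=True)
class SolverConeBlock:
    """A solver-owned descriptor that never imports optfunc types."""

    kind: str
    dim: int
    rows: int
    name: Optional[str] = None


def describe_cone(cone) -> str:
    """Format any descriptor-like object, rejecting objects without the metadata."""
    if not isinstance(cone, ConeDescriptorLike):
        raise TypeError("expected an object with kind, dim, and rows")
    label = getattr(cone, "name", None) or cone.kind
    return f"{label}: kind={cone.kind}, dim={cone.dim}, rows={cone.rows}"


summary = describe_cone(SolverConeBlock(kind="nonnegative", dim=4, rows=4))
print(summary)
```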

Native calls accept only one-dimensional contiguous float64 inputs; non-contiguous views and dtype mismatches must fail instead of copying. Torch CUDA and CuPy calls pass the active backend CUDA stream pointer to native kernels. Benchmark comparisons must include construction, validation, allocation, wrapper overhead, and the actual operation.
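
The fail-instead-of-copy rule can be illustrated on the NumPy side with a small validator. This is a sketch only: require_native_vector is a hypothetical name, the real checks live in the native bindings, and Torch or CuPy inputs are not covered here.

```python
import numpy as np


def require_native_vector(x: np.ndarray) -> np.ndarray:
    """Return x unchanged if it is a 1-D contiguous float64 array, else raise."""
    if not isinstance(x, np.ndarray):
        raise TypeError("expected numpy.ndarray")
    if x.dtype != np.float64:
        raise TypeError(f"expected float64, got {x.dtype}")
    if x.ndim != 1:
        raise ValueError(f"expected a one-dimensional array, got ndim={x.ndim}")
    if not x.flags.c_contiguous:
        raise ValueError("expected contiguous memory; refusing to copy")
    return x


def is_rejected(x) -> bool:
    """Report whether the validator refuses x."""
    try:
        require_native_vector(x)
    except (TypeError, ValueError):
        return True
    return False


good = np.zeros(4)
strided_view = np.zeros(8)[::2]          # non-contiguous view
wrong_dtype = np.zeros(4, dtype=np.float32)
matrix = np.zeros((2, 2))
results = [is_rejected(v) for v in (good, strided_view, wrong_dtype, matrix)]
print(results)
```

Calling np.ascontiguousarray or astype inside the validator would hide the copy that the contract is meant to surface, so the sketch deliberately never converts.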

CppNativeProblem wraps cpp-native conic data plus CVXPY conversion and exposes:

  • problem: the underlying cvxpy.Problem;
  • variables: named CVXPY variables;
  • theta: parameter dictionary used to generate this concrete problem;
  • data: deterministic instance data used to construct the problem;
  • problem_data: optfuncs-owned Clarabel-style conic data for translation;
  • cpp_native: alias for the raw ConicProblemData;
  • to_problem_data() / to_clarabel_data(): return solver-facing conic data;
  • to_cvxpy_problem(): return the CVXPY problem converted from the native description;
  • to_ptf_text() / write_ptf(path): export the generated problem to PTF;
  • meta: name, cone type, dimension, seed, reference solver, and optimum value;
  • solve(...): default MOSEK solve with SCS fallback;
  • known_solution(name=None): known theoretical or encoded reference solution;
  • distance_to_optimum(name=None): distance from current variable value to the known solution.

Convex internals use np.float64 by default through optfunc.cvxs._utils.data.CVX_DTYPE. Family data, helper outputs, and stored known solutions should preserve at least 10 decimal places unless an external solver returns a looser tolerance. When a construction needs reductions such as matrix-vector products, trace products, or sums for known optima, use the CVX_HIGH_DTYPE helpers in optfunc.cvxs._utils.data and then store the resulting public data as np.float64.

ConvexFamily is the only registry and routing proxy for convex problem family names. Its public entry points are:

  • ConvexFamily.create(name, **family_kwargs) for a ProblemFamily;
  • ConvexFamily.available() for registered normalized names.

BenchmarkRegistry.create(name, constraints="convex", ...) is a convenience router that creates a family with ConvexFamily.create(...), samples one theta from seed unless an explicit theta is provided, and calls family.build_instance(...). Prefer ConvexFamily directly when working with parameter sequences or batches.

Current family-backed convex registry names are zero/zero_cone_qp, nonneg/nonnegative_cone_qp, psd/psd_cone_projection, gw_maxcut/maxcut_sdp, qcp/constrained_qp, lyapunov_lmi/lyapunov_stability_lmi, and markowitz_sdp/markowitz_portfolio_sdp. Direct subclass use, such as GoemansWilliamsonMaxCutSDPFamily(dim=8), ContinuousTimeLyapunovStabilityLMIFamily(dim=4), LongOnlyMarkowitzPortfolioSDPFamily(n_assets=8, n_factors=3), or ConstrainedQP(n_vars=50, n_eq=12, n_ineq=25), must expose the same sample_parameters, build_instance, generate_sequence, and perturb behavior as the registry path.

Each ProblemFamily subclass docstring must state the theta sequence model, whether an analytic optimizer and optimal value are available, and the DOI for the mathematical source or modeling background. Families with an analytic reference implement _analytic_optimal_solution(...) and _analytic_optimal_value(...); solver-only families return None from those methods and use _clarabel_fallback_optimal_value(...) for regression tests.

PSD Vectorization Notes

PSD cone vectorization is not the same as ordinary matrix flattening. The helper optfunc.cvxs._utils.psd.svec_clarabel follows the Clarabel SDP example: lower-triangular row order, equivalently upper-triangular column order, with off-diagonal entries scaled by sqrt(2).

For a symmetric matrix [[x1, x2, x4], [x2, x3, x5], [x4, x5, x6]], svec_clarabel returns [x1, sqrt(2) * x2, x3, sqrt(2) * x4, sqrt(2) * x5, x6]. Use vec_col_major or vec_row_major from optfunc.cvxs._utils.data only for ordinary matrix vectorization, not for PSD cone data. If a future backend needs MOSEK cookbook lower-column svec, add a separate helper such as svec_mosek_lower_col instead of reusing svec_clarabel.
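
The ordering and scaling above can be reproduced in a few lines of NumPy. This is a sketch of the convention as described, not the source of svec_clarabel; the function name svec_sketch is hypothetical.

```python
import numpy as np


def svec_sketch(matrix: np.ndarray) -> np.ndarray:
    """Stack the lower triangle row by row, scaling off-diagonals by sqrt(2)."""
    n = matrix.shape[0]
    # Lower-triangular indices in row order: (0,0), (1,0), (1,1), (2,0), ...
    rows, cols = np.tril_indices(n)
    values = matrix[rows, cols].astype(np.float64)  # astype returns a copy
    values[rows != cols] *= np.sqrt(2.0)
    return values


# x1..x6 = 1..6 in the layout [[x1, x2, x4], [x2, x3, x5], [x4, x5, x6]].
M = np.array([[1.0, 2.0, 4.0], [2.0, 3.0, 5.0], [4.0, 5.0, 6.0]])
v = svec_sketch(M)
print(v)

# The sqrt(2) scaling preserves inner products: v.v equals trace(M @ M).
print(np.isclose(v @ v, np.trace(M @ M)))
```

The inner-product check is a quick way to catch a helper that forgot the sqrt(2) scaling or used plain flattening.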

Solver Translation Notes

Translation modules should consume ConicProblemData, not raw CVXPY objects, unless the translator explicitly validates and canonicalizes the CVXPY problem. The MOSEK PTF translator writes linear conic constraints directly from b - A x in K. Quadratic objectives with PSD P are exported through a second-order-cone epigraph variable; indefinite P, non-DCP CVXPY problems, and unsupported cones must raise explicit translation errors.

Optimizer Testing

OptimizerCase describes one pytest item. For unconstrained functions:

  • constraints="none";
  • opt_func is a registry name or a TorchOptFunction;
  • the optimizer receives OptimizationProblem;
  • the optimizer returns a final torch.Tensor or OptimizerResult.

For convex benchmarks:

  • constraints="convex";
  • opt_func is a ConvexFamily family name, a ProblemFamily, or a direct CppNativeProblem;
  • batch_size controls how many same-family instances to generate;
  • perturb_magnitude controls successive parameter perturbations;
  • the optimizer receives ConvexOptimizationProblem;
  • the optimizer may solve instances in place and return None, or return one or more CppNativeProblem instances.

ConvexOptimizationProblem exposes:

  • family: the ProblemFamily for generated cases, or None for direct instances;
  • instances: concrete cpp-native problem objects;
  • parameters: the sampled parameter dictionaries;
  • problem: the single instance for non-batch cases;
  • budget and case_id.

EvaluationReport is shared across both paths. For convex batches, steps is the number of instances and history records per-instance value, gap, and distance metrics.

Project Design Language

The project should keep a small, explicit API surface. Public objects should be easy to import from optfunc, and implementation details should stay inside their modules.

Core conventions:

  • Registry first: users choose built-ins by stable names, while direct class or object construction stays available for advanced use.
  • Deterministic randomness: stochastic benchmark families accept seeds or numpy.random.Generator; generated data should be reproducible.
  • Explicit references: if a benchmark has a known optimum, encode it in the problem object. If not, make the reference-solver path clear.
  • Optional dependencies by capability: Torch backends live in Torch extras; CVXPY/MOSEK live in convex. Convex modules must not import Torch, and import errors should point users to the right extra.
  • Pytest oriented: testing helpers should produce normal pytest test functions, independent parametrized cases, readable failure messages, and optional terminal summaries.
  • Conservative compatibility: keep older registry aliases when adding new abstractions. Route old entry points through new implementations when possible.
  • Small abstractions: add a class only when it describes a real concept in the package, such as a differentiable function, a convex family, a concrete CVXPY instance, or an optimizer evaluation case.
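
The deterministic-randomness convention can be met with numpy.random.default_rng, which accepts either an integer seed or an existing Generator. The sketch below is illustrative only; sample_symmetric_weights is a hypothetical helper, not a family method from this package.

```python
import numpy as np


def sample_symmetric_weights(dim, seed=0):
    """Draw a symmetric, zero-diagonal weight matrix reproducibly.

    seed may be an int or a numpy.random.Generator; default_rng returns
    a Generator unchanged when it is given one.
    """
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.uniform(0.0, 1.0, size=(dim, dim)), k=1)
    return upper + upper.T


first = sample_symmetric_weights(4, seed=5)
second = sample_symmetric_weights(4, seed=5)
from_generator = sample_symmetric_weights(4, seed=np.random.default_rng(5))
print(np.array_equal(first, second), np.array_equal(first, from_generator))
```

Accepting a Generator lets a caller thread one random stream through several sampling steps, while an integer seed keeps single calls reproducible.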

Avoid hidden behavior that makes benchmark results hard to interpret. Optimizer wrappers should decide how to solve, when to stop, and what history to expose. The harness should evaluate and report, not disguise algorithm choices.

Standard Usage Paths

Use the direct class path for simple manual experiments:

from optfunc import Sphere

f = Sphere(dim=8)
value = f(f.global_minimizer())

Use OptFuncRegistry for configurable unconstrained tests:

from optfunc import OptFuncRegistry

f = OptFuncRegistry.create("rosenbrock", dim=4)

Use BenchmarkRegistry only when one call site must switch between constraint modes:

from optfunc import BenchmarkRegistry

benchmark = BenchmarkRegistry.create("sphere", constraints="none", dim=4)
convex = BenchmarkRegistry.create("gw_maxcut", constraints="convex", dim=8, seed=0)

Use ConvexFamily for parameterised convex programs and batches:

from optfunc import ConvexFamily

family = ConvexFamily.create("gw_maxcut", dim=8)
theta_sequence = family.sample_parameters(3, seed=0)
problems = family.build_instance(theta_sequence)

assert [problem.theta for problem in problems] == theta_sequence
cpp_data = problems[0].cpp_native
ptf_text = problems[0].to_ptf_text()
cvxpy_problem = problems[0].to_cvxpy_problem()

Use make_optimizer_tests for pytest integration:

from optfunc.testing import OptimizerCase, make_optimizer_tests, make_torch_adam

test_adam = make_optimizer_tests(
    optimizer=make_torch_adam(),
    cases=[OptimizerCase(opt_func="sphere", constraints="none", dim=4)],
)

For convex solver tests:

from optfunc.testing import ConvexOptimizationProblem, OptimizerCase, make_optimizer_tests


def cvxpy_solver(problem: ConvexOptimizationProblem):
    for instance in problem:
        instance.solve()


test_cvxpy = make_optimizer_tests(
    optimizer=cvxpy_solver,
    cases=[
        OptimizerCase(
            opt_func="gw_maxcut",
            constraints="convex",
            dim=8,
            batch_size=4,
            seed=0,
        )
    ],
)

Adding A Built-In Torch Optfunc

  1. Add a module under src/optfunc/func/.
  2. Subclass TorchOptFunction.
  3. Register it with @OptFuncRegistry.register(...).
  4. Add the import and __all__ entry in src/optfunc/func/__init__.py.
  5. Add lazy export metadata in src/optfunc/__init__.py if users should import the class from optfunc.
  6. Add tests for registry availability, known minimizer value, gradient at the minimizer when meaningful, and optimizer smoke behavior if appropriate.
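
The registration and testing steps above follow a common decorator-registry pattern. The sketch below is a self-contained toy, not optfunc's implementation: ToyRegistry and ToySphere are hypothetical stand-ins, and a real addition would subclass TorchOptFunction and use @OptFuncRegistry.register(...) with Torch tensors instead of lists.

```python
from typing import Callable, Dict, Type


class ToyRegistry:
    """Hypothetical name-to-class registry used only for illustration."""

    _classes: Dict[str, Type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type], Type]:
        def decorator(func_cls: Type) -> Type:
            if name in cls._classes:
                raise ValueError(f"duplicate registration: {name!r}")
            cls._classes[name] = func_cls
            return func_cls  # return the class unchanged so it stays importable

        return decorator

    @classmethod
    def create(cls, name: str, **kwargs):
        return cls._classes[name](**kwargs)


@ToyRegistry.register("toy_sphere")
class ToySphere:
    """f(x) = sum of x_i squared, minimized at the origin with value 0."""

    def __init__(self, dim: int):
        self.dim = dim

    def __call__(self, x):
        return sum(v * v for v in x)

    def global_minimizer(self):
        return [0.0] * self.dim


# The two cheapest checks from step 6: registry availability and the
# known minimizer value.
f = ToyRegistry.create("toy_sphere", dim=4)
assert isinstance(f, ToySphere)
assert f(f.global_minimizer()) == 0.0
```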

Adding A Built-In Convex Family

  1. Add a ProblemFamily subclass in src/optfunc/cvxs/<family_name>.py.
  2. Register it with @ConvexFamily.register(...).
  3. Import convex contracts from .base and helper code from ._utils.*; do not import from optfunc.convex.
  4. Implement build_instance. Customize sample_parameters through a named sample_theta_* method, and override the matching perturb_* method when the family theta is not seed-only.
  5. Return cpp-native CppNativeProblem instances from build_instance, and accept either a single theta dictionary or a sequence of theta dictionaries. Do not add a representation selector to the user-facing family or registry interface.
  6. Pass the generation theta into the returned instance and ensure .cpp_native, .to_cvxpy_problem(), and .to_ptf_text() work from the same generated description.
  7. Include deterministic data, complete meta, and known solutions when available.
  8. Add the module import and public exports in src/optfunc/cvxs/__init__.py.
  9. Add tests for deterministic sampling, seed variation, DCP validity, solver behavior, known optimum checks, sequence generation, and packaging import.
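
Step 5 asks build_instance to accept either one theta dictionary or a sequence of them. One way to handle that is a small dispatch helper, sketched below. The name build_for_each is hypothetical, and returning a single instance for a single dictionary (rather than a one-element list) is an assumption that this guide does not confirm.

```python
from collections.abc import Mapping, Sequence
from typing import Any, Callable, Union

Theta = Mapping[str, Any]


def build_for_each(
    theta: Union[Theta, Sequence[Theta]],
    build_one: Callable[[Theta], Any],
):
    """Call build_one once per theta and mirror the input shape in the output."""
    if isinstance(theta, Mapping):
        # Single theta dictionary: return a single instance (assumed behavior).
        return build_one(theta)
    # Sequence of theta dictionaries: preserve order so results align with inputs.
    return [build_one(t) for t in theta]
```

Preserving input order is what lets a check such as comparing each problem's theta against the sampled sequence pass without any sorting.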

Packaging Notes

The package is imported as optfunc but published as optfuncs.

Builds use scikit-build-core and CMake. Runtime packages are copied from src/optfunc, and the native cone extension is installed into optfunc/cvxs/cones/cpp:

[build-system]
requires = ["scikit-build-core>=0.11", "nanobind>=2.4"]
build-backend = "scikit_build_core.build"

[tool.scikit-build]
wheel.packages = ["src/optfunc"]

When adding new modules under src/optfunc, uv build --no-sources should show those modules being copied into the wheel. tests/test_convex_benchmarks.py contains a wheel import smoke test to catch missing package-data or discovery issues for the convex APIs.

The cone package has a dedicated nanobind C++/CUDA backend, optfunc.cvxs.cones.cpp._cpp. The repository root CMakeLists.txt delegates to src/optfunc/cvxs/cones/cpp/CMakeLists.txt, which owns the native cone subpackage for IDE parsing and package installation. The backend provides C++ classes for zero, nonnegative, second-order, PSD, and exponential cones, with host and CUDA projection paths. Stage 2 CUDA validation is mandatory, so local CUDA cone tests should fail when the CUDA toolkit/runtime or CUDA Python backends are unavailable.

Packaging smoke tests must verify that a built wheel contains optfunc/cvxs/cones/cpp/_cpp* and optfunc/cvxs/cones/cpp/__init__.pyi, and that optfunc.cvxs.cones.cpp can be imported from a clean downstream working directory with optional dependencies blocked. This protects the C++ cone subpackage as a reusable component for independently developed conic solvers.
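
The wheel-content half of that check can be done with the standard library alone, since a wheel is a zip archive. This is a minimal sketch under that assumption; missing_wheel_entries is a hypothetical helper name, and it does not cover the clean-directory import check.

```python
import fnmatch
import zipfile

# Paths taken from the packaging requirement above.
REQUIRED_PATTERNS = (
    "optfunc/cvxs/cones/cpp/_cpp*",
    "optfunc/cvxs/cones/cpp/__init__.pyi",
)


def missing_wheel_entries(wheel_path, patterns=REQUIRED_PATTERNS):
    """Return the patterns that match no file inside the wheel archive."""
    with zipfile.ZipFile(wheel_path) as wheel:
        names = wheel.namelist()
    # fnmatchcase keeps matching case-sensitive on every platform.
    return [
        pattern
        for pattern in patterns
        if not any(fnmatch.fnmatchcase(name, pattern) for name in names)
    ]
```

A pytest test could assert that missing_wheel_entries(path_to_built_wheel) returns an empty list, so a failure message names exactly which native artifact or stub file was left out.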

Native cone optimization is split between compiler policy and algorithm policy. Portable Release builds stay wheel-compatible by default. Machine-specific host CPU tuning is opt-in through CMake options such as OPTFUNC_NATIVE_CPU_TUNE and OPTFUNC_MSVC_ARCH; use those for local benchmark experiments, not published wheel defaults. Per-cone C++ kernels should keep small-input serial paths, OpenMP host paths, and CUDA stream-aware paths explicit so tests can compare total user-facing runtime across machines.

Native cone code may use CMake-selected BLAS libraries, but it must not call NumPy from C++ in order to pass native benchmarks. Host BLAS is selected with OPTFUNC_BLAS=AUTO|OFF|SYSTEM|OPENBLAS. AUTO, the default, is a non-fatal probe. SYSTEM is the right choice for CI images that provide MKL, BLIS, Accelerate, or another tuned BLAS through CMake. OPENBLAS is the portable explicit choice for local Linux/Windows performance builds. When no external BLAS target is found, native cone kernels use the project C++ implementation and compiler auto-vectorization rather than a Python/NumPy fallback. Eigen can be considered as a future header-only fallback, but it is not expected to beat tuned BLAS for large dense kernels. CUDA dense helpers can link cuBLAS with OPTFUNC_USE_CUBLAS=ON, which remains separate from optional Python package dependencies such as optfuncs-cuda130[cuda-toolkit].

Stage 3 may replace the Stage 2 native/backend internals. The target shape is a Python semantic API, backend adapters that handle allocation and validation, and Python-free native kernels. See .codex/stage-cpp-cones.md for the Stage 3 task list and README.test.md for the adapter/workspace/fused benchmark matrix.

On Windows, load the project-local VS DevShell from .codex/skills/vs-dev-shell/SKILL.md in the same PowerShell session before uv sync, tests that rebuild editable wheels, or uv build:

$vsDevShell = 'A:\Visual Studio\18\Enterprise\Common7\Tools\Launch-VsDevShell.ps1'
& $vsDevShell -Arch amd64 -HostArch amd64 -SkipAutomaticLocation
uv sync --group torch-cu130 --group cupy-cu13 --extra convex --dev
uv run pytest -q tests/test_cone_operators.py
uv build --no-sources