# Custom Mapping
This section shows how to exploit XXZ structure explicitly. See Optimization of the XXZ Model, where the XXZ model, optimization logic, and helper functions are defined. The public model constructor itself is documented in XXZ.
We exploit two facts: many off-diagonal elements share a single value, and the mapping kernel is memory-bound. By reducing index and data traffic in the mapping, we usually speed up the full POLFED pipeline.
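To illustrate the shared-value structure, here is a toy check that is independent of Polfed's helpers: build a small open-chain XXZ Hamiltonian by hand (names like `xxz_dense` are ours, purely illustrative) and confirm that every off-diagonal nonzero carries the same value.

```julia
using LinearAlgebra

# Toy construction, not Polfed code: open-chain spin-1/2 XXZ Hamiltonian
# H = sum_i 1/2 (S+_i S-_{i+1} + S-_i S+_{i+1}) + Delta Sz_i Sz_{i+1}
function xxz_dense(L, Delta)
    Sz = [0.5 0.0; 0.0 -0.5]
    Sp = [0.0 1.0; 0.0 0.0]
    Sm = collect(Sp')
    Id(n) = Matrix(1.0I, n, n)
    op(A, i) = kron(Id(2^(i - 1)), A, Id(2^(L - i)))   # embed A at site i
    H = zeros(2^L, 2^L)
    for i in 1:(L - 1)
        H .+= 0.5 .* (op(Sp, i) * op(Sm, i + 1) .+ op(Sm, i) * op(Sp, i + 1))
        H .+= Delta .* (op(Sz, i) * op(Sz, i + 1))
    end
    return H
end

H = xxz_dense(4, 0.55)
offvals = [H[i, j] for i in axes(H, 1), j in axes(H, 2) if i != j && H[i, j] != 0]
@assert all(==(0.5), offvals)   # every off-diagonal nonzero is the same value
```

Only the Sz–Sz terms touch the diagonal; all hopping terms contribute the single value `1/2`, which is what the compact representation below exploits.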
## Build a Structured Mapping
The workflow below uses the public constructor `xxz_hamiltonian` together with the helper functions defined in Optimization of the XXZ Model:

- `get_diags_and_offdiagonals_single_value(Delta, L, Lup; ...)`
- `mapvec_with_xxz!(...)`
In practice, we first construct the XXZ Hamiltonian through the public model constructor, then extract diagonal/offdiagonal elements and use them to build a custom mapping.
```julia
using Polfed
using Polfed.Models: xxz_hamiltonian
using BenchmarkTools
using LinearAlgebra

L = 18
Lup = L ÷ 2
Delta = 0.55

mat = xxz_hamiltonian(L, Lup, 1.0, Delta, 0.0; boundary=:periodic, field=0.0, use_sparse=true)
diags, offdiag_val, flat, starts = get_diags_and_offdiagonals_single_value(Delta, L, Lup)
f_custom! = mapvec_with_xxz!(diags, offdiag_val, flat, starts)
```

What these two helpers do:
- `get_diags_and_offdiagonals_single_value(Delta, L, Lup; ...)` builds the XXZ matrix through the public constructor and extracts a compact representation: diagonal values (`diags`), the common off-diagonal value (`offdiag_val`), flattened off-diagonal column indices (`flat`), and row-start pointers (`starts`).
- `mapvec_with_xxz!(diags, offdiag_val, flat, starts)` builds and returns a mapping function `f_custom!(Y, X)` that applies the matrix action using that compact representation, avoiding generic sparse-matrix traversal overhead in each mapping call.
For more details about the XXZ model and the related mapping, see Optimization of the XXZ Model.
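As a mental model, a mapping built from such a compact representation could look roughly like the sketch below. This is an illustrative re-implementation, not Polfed's actual code; it assumes `starts` holds CSR-style row pointers into `flat`, so row `i`'s off-diagonal column indices are `flat[starts[i]:starts[i+1]-1]`.

```julia
# Illustrative sketch only -- not Polfed's implementation.
function make_mapping(diags, offdiag_val, flat, starts)
    return function (Y, X)
        @inbounds for i in eachindex(diags)
            acc = diags[i] * X[i]                 # diagonal contribution
            for k in starts[i]:(starts[i+1] - 1)
                acc += offdiag_val * X[flat[k]]   # one shared value: no per-entry nzval loads
            end
            Y[i] = acc
        end
        return Y
    end
end

# Tiny usage check against a dense matrix with one shared off-diagonal value
A = [1.0 0.5 0.0; 0.5 2.0 0.5; 0.0 0.5 3.0]
f! = make_mapping([1.0, 2.0, 3.0], 0.5, [2, 1, 3, 2], [1, 2, 4, 5])
Y = zeros(3)
f!(Y, [1.0, 1.0, 1.0])   # Y ≈ A * ones(3) == [1.5, 3.0, 3.5]
```

Compared with a generic sparse `mul!`, the inner loop never reads a per-entry values array, which is where the memory-traffic saving comes from.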
## Benchmark Mapping Against Generic mul!
Spectral transformation is the dominant runtime component of POLFED, so optimizing the mapping often speeds up almost the whole algorithm.
When constructing new mappings, it is useful to benchmark the mapping kernels directly first, for several reasons. It isolates the benchmark to the mapping alone, making it fast to iterate on. It also avoids run-to-run variation between full polfed runs caused by differing memory allocation and layout effects. Because the benchmark evaluates the same kernel repeatedly, the average timing is reliable and tells you whether a speedup can be expected from this specific mapping.
```julia
X = rand(size(mat, 1)); Y = similar(X)
@btime mul!($Y, $mat, $X)
@btime $f_custom!($Y, $X)
```

For smaller system sizes (L = 18) there is a speedup factor of roughly 1.4, whereas for larger system sizes a factor of about 2.5 up to 3.0 seems to persist.
| L | generic `mul!` | custom mapping | speedup factor |
|---|---|---|---|
| 18 | 856.653 μs | 579.533 μs | 1.48 |
| 20 | 5.402 ms | 2.340 ms | 2.31 |
| 22 | 25.727 ms | 10.202 ms | 2.52 |
| 24 | 109.943 ms | 41.825 ms | 2.63 |
## Use Custom Mapping in POLFED
After benchmarking the mapping in isolation, we can test it in full POLFED runs.
```julia
x0 = rand(size(mat, 1)); x0 ./= norm(x0)
howmany = 120
target = 0.0

# Generic mapping, equivalent to passing `mat` directly (shown for reference)
f_mul! = (Y, X) -> mul!(Y, mat, X)

# Full POLFED with the matrix -> mul! path
vals_mul, vecs_mul, report_mul = polfed(mat, x0, howmany, target; produce_report=true)

# Full POLFED with the custom mapping path
vals_custom, vecs_custom, report_custom = polfed(f_custom!, x0, howmany, target; produce_report=true)

display_report(report_mul)
display_report(report_custom)
```

The speedup trend should be similar to the mapping-kernel benchmark: if `f_custom!` is faster than `mul!`, POLFED usually benefits to a similar degree because the mapping dominates the total work. Compare the matrix-multiplication counts and timing blocks in the two reports.
As discussed in Optimized Mapping and Reducing Memory Access, it is important to reduce memory access as much as possible. That is why it is beneficial to construct a rescaled mapping, with the spectral rescaling folded directly into the stored coefficients.
```julia
Emin = first(collect(Polfed.Lanczos.lanczos(f_custom!, x0, 1; which=:SR, maxdim=1000)[1]))
Emax = last(collect(Polfed.Lanczos.lanczos(f_custom!, x0, 1; which=:LR, maxdim=1000)[1]))

spread = (Emax - Emin) / 2
center = (Emax + Emin) / 2

diags_rescaled = @. (diags - center) / spread
offdiag_val_rescaled = offdiag_val / spread

f_benchmark_rescaled! = mapvec_with_xxz!(diags_rescaled, offdiag_val_rescaled, flat, starts)

mapping_rescaled = MappingConfig(
    f!_rescaled=f_benchmark_rescaled!,
    Emin=Emin,
    Emax=Emax,
)

vals_rescaled, vecs_rescaled, report_rescaled = polfed(f_custom!, x0, howmany, target; produce_report=true, mapping=mapping_rescaled)
display_report(report_rescaled)
```

This gives another speedup by removing runtime rescaling overhead. It is exactly what POLFED does for you when `optimize_mapping=true` is used, as demonstrated in Automatic Optimization.
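The rescaling applied to `diags` and `offdiag_val` above is the affine map `E -> (E - center) / spread`, which sends the spectrum from `[Emin, Emax]` onto `[-1, 1]`. A quick standalone sanity check on a random symmetric matrix (a toy example, independent of the Polfed session above):

```julia
using LinearAlgebra

# Toy check: the affine rescaling maps the spectrum onto [-1, 1]
let
    M = randn(50, 50)
    H = Symmetric((M + M') / 2)
    E = eigvals(H)                              # sorted ascending
    lo, hi = first(E), last(E)
    E_rescaled = @. (E - (hi + lo) / 2) / ((hi - lo) / 2)
    # The spectral extremes land exactly on -1 and +1
    @assert isapprox(first(E_rescaled), -1.0; atol=1e-10)
    @assert isapprox(last(E_rescaled), 1.0; atol=1e-10)
end
```

Because the map is affine, applying it once to the stored coefficients is equivalent to rescaling the output of every mapping call, but without the extra pass over memory.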
This page was generated using Literate.jl.