Time-Series Clustering
Functions in pyflow_acdc.Time_series_clustering. Representative-period
clustering reduces long time-series inputs to a weighted set of scenarios for
multi_scenario_TEP(), multi_period_MS_TEP(),
and related TEP drivers.
Workflow guides: Transmission Expansion Planning (TEP and MS TEP), Multi-period Transmission Expansion Planning (MP TEP and MP+MS TEP).
clustering_options
TEP functions accept a clustering_options dict, normally processed by
cluster_analysis(). To reload saved clusters without
re-running the algorithms, pass precomputed_clusters_path (see
Precomputed clusters and the MS TEP example in
pyflow_tests/doc_examples/tep/02_multi_scenario_tep.py).
Load NS_MTDC_2025 with years_data="23,24" so that the time-series length matches
the JSON payload.
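| Key | Role |
|---|---|
| n_clusters | Number of representative periods (scenarios) passed to the clustering algorithm. |
| time_series | TS types to include (e.g. "price", "Load"). |
| central_market | List of price-zone names whose attached series are kept for market-linked types. |
| thresholds | Two-element list; [0, 0.8] in the Live clustering example, matching the cv_threshold and correlation_threshold defaults of cluster_TS(). |
| correlation_decisions | Three-element list; [False, "1", False] in the Live clustering example. |
| cluster_algorithm | Clustering algorithm name, e.g. "Kmeans". |
| precomputed_clusters_path | JSON path; skips re-clustering when set (see Precomputed clusters). |
| print_details | When True, prints clustering diagnostics. |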
The four keys highlighted in the doc example work together as a preprocessing
pipeline before the chosen cluster_algorithm runs: restrict which series
enter the feature matrix (time_series, central_market), optionally drop
high-CV or highly correlated columns (thresholds,
correlation_decisions), then form n_clusters representative periods.
Set print_details=True while tuning a case; set it False once options
are fixed.
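As a minimal sketch of how these keys fit together, the snippet below builds an options dict with the same keys as the Live clustering example and checks the list lengths stated in the table. check_clustering_options is a hypothetical helper written for illustration; it is not part of pyflow_acdc, and the library may validate its inputs differently or not at all.

```python
def check_clustering_options(options):
    """Hypothetical shape check for a clustering_options dict (not a pyflow_acdc API)."""
    problems = []
    n_clusters = options.get("n_clusters")
    if not isinstance(n_clusters, int) or n_clusters < 1:
        problems.append("n_clusters must be a positive integer")
    # The table describes thresholds as a two-element list.
    if "thresholds" in options and len(options["thresholds"]) != 2:
        problems.append("thresholds must have two elements")
    # The table describes correlation_decisions as a three-element list.
    if "correlation_decisions" in options and len(options["correlation_decisions"]) != 3:
        problems.append("correlation_decisions must have three elements")
    for key in ("time_series", "central_market"):
        if not isinstance(options.get(key, []), list):
            problems.append(f"{key} must be a list")
    return problems


clustering_options = {
    "n_clusters": 2,
    "time_series": ["price", "Load"],
    "central_market": [],
    "thresholds": [0, 0.8],
    "correlation_decisions": [False, "1", False],
    "cluster_algorithm": "Kmeans",
    "print_details": True,  # verbose while tuning; switch to False afterwards
}

problems = check_clustering_options(clustering_options)
print(problems or "options look consistent")
```

A check like this is cheap to run before handing the dict to a TEP driver, where a malformed list would otherwise surface only after the case has been loaded.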
Examples
Runnable scripts live in pyflow_tests/doc_examples/clustering/ and are executed
by test_docs_clustering.py.
Precomputed clusters
"""Docs: api/clustering.rst — Precomputed clusters"""
import pyflow_acdc as pyf
from pyflow_tests.test_constants import north_sea_ms_clustering_options
grid, _ = pyf.cases["NS_MTDC_2025"](years_data="23,24", expandable=False, online=False)
n_clusters, clustered = pyf.cluster_analysis(grid, north_sea_ms_clustering_options())
assert clustered is True
assert n_clusters == 4
assert 4 in grid.Clusters
Live clustering
"""Docs: api/clustering.rst — Live clustering"""
import pyflow_acdc as pyf
grid, _ = pyf.cases["NS_MTDC_2025"](years_data="24", expandable=False, online=False)
clustering_options = {
    "n_clusters": 2,
    "time_series": ["price", "Load"],
    "central_market": [],
    "thresholds": [0, 0.8],
    "correlation_decisions": [False, "1", False],
    "cluster_algorithm": "Kmeans",
    "print_details": False,
}
n_clusters, clustered = pyf.cluster_analysis(grid, clustering_options)
assert clustered is True
assert n_clusters == 2
assert 2 in grid.Clusters
Exploratory sweep
"""Docs: api/clustering.rst — Exploratory clustering sweep"""
import tempfile
import pyflow_acdc as pyf
from pyflow_acdc.Time_series_clustering import run_clustering_analysis
grid, _ = pyf.cases["NS_MTDC_2025"](years_data="24", expandable=False, online=False)
with tempfile.TemporaryDirectory() as save_path:
    results = run_clustering_analysis(
        grid,
        save_path=save_path,
        algorithms=["kmeans", "kmeans_medoids", "kmedoids"],
        n_clusters_list=[2, 4],
        time_series=["price", "Load"],
        print_details=False,
        ts_options=[None, 0, 0.8],
        correlation_decisions=[False, "1", False],
        plotting=False,
        identifier="doc_example",
    )
assert len(results) >= 3
assert set(results["algorithm"]) >= {"kmeans", "kmeans_medoids", "kmedoids"}
Cluster analysis
- cluster_analysis(grid, clustering_options)
Main entry point used inside TEP when clustering_options is passed.
- cluster_TS(grid, n_clusters, time_series=None, central_market=None, algorithm='kmeans', cv_threshold=0, correlation_threshold=0.8, print_details=False, correlation_decisions=None, critical_idx=None, base_critical_ratio=0.5, scaler_type='robust', forced_centers=None, **kwargs)
Cluster time-series profiles into representative operating states.
Runs correlation-based reduction (identify_correlations()) and then the selected clustering algorithm, optionally weighting a set of “critical” rows more heavily.
- Parameters:
grid (Grid) – Grid whose time series are clustered.
n_clusters (int) – Number of representative states (clusters) to produce.
time_series, central_market (list, optional) – Time-series selection and central-market references.
algorithm (str, optional) – One of 'kmeans', 'kmedoids', 'ward', 'pam_hierarchical' (default 'kmeans').
cv_threshold, correlation_threshold (float, optional) – Coefficient-of-variation and correlation thresholds for reduction.
critical_idx (list, optional) – Indices treated as critical (clustered separately).
base_critical_ratio (float or int, optional) – Fraction (or count) of clusters reserved for critical rows.
scaler_type (str, optional) – Scaler used before clustering (default 'robust').
**kwargs – Extra algorithm-specific options, e.g. random_state, n_init, max_iter (kmeans) or method, init, metric (kmedoids).
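To illustrate what "representative operating states" means, here is a minimal pure-Python k-means sketch on a toy one-dimensional series. It is not the implementation behind cluster_TS(), which also scales the data and reduces correlated columns first; toy_kmeans and the sample values are hypothetical. Each cluster's weight is assumed here to be its share of the time steps, which is one common way to weight a reduced scenario set.

```python
def toy_kmeans(values, n_clusters, n_iter=20):
    """Cluster 1-D values; return (centers, weights). Illustrative only."""
    # Deterministic start: spread the initial centers over the sorted values.
    ordered = sorted(values)
    step = (len(ordered) - 1) / max(n_clusters - 1, 1)
    centers = [ordered[round(i * step)] for i in range(n_clusters)]

    members = [[] for _ in centers]
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest center.
        members = [[] for _ in centers]
        for v in values:
            nearest = min(range(n_clusters), key=lambda k: abs(v - centers[k]))
            members[nearest].append(v)
        # Update step: move each center to the mean of its members.
        new_centers = [
            sum(m) / len(m) if m else c for m, c in zip(members, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers

    # Weight of a representative state = fraction of time steps it stands for.
    weights = [len(m) / len(values) for m in members]
    return centers, weights


# Toy hourly load values (hypothetical): four low hours and two high hours.
load = [10.0, 11.0, 9.0, 10.0, 50.0, 52.0]
centers, weights = toy_kmeans(load, n_clusters=2)
print(centers)
print(weights)
```

With this input the low hours collapse into one state and the high hours into another, and the weights sum to one, so a weighted cost over the two states approximates the cost over all six hours.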
- identify_correlations(grid, time_series=None, correlation_threshold=0, cv_threshold=0, central_market=None, print_details=False, correlation_decisions=None)
Identify highly correlated time series variables.
- Parameters:
grid – Grid object containing time series
correlation_threshold – Correlation coefficient threshold (default 0 in the signature above; cluster_TS() defaults to 0.8)
cv_threshold – Minimum coefficient-of-variation threshold (default: 0)
- Returns:
- Dictionary containing:
correlation_matrix: Full correlation matrix
high_correlations: List of tuples (var1, var2, corr_value) for highly correlated pairs
groups: List of groups of correlated variables
- Return type:
dict
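The returned structure can be pictured with a small pure-Python sketch. It computes Pearson correlations between named series, lists the pairs whose absolute correlation reaches a threshold, and merges those pairs into groups. The name toy_identify_correlations, the use of the absolute value, and the sample series are assumptions made for illustration; the library function works on a Grid and may differ in detail.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def toy_identify_correlations(series, correlation_threshold=0.8):
    """Illustrative stand-in returning the three keys listed above."""
    names = list(series)
    matrix = {a: {b: pearson(series[a], series[b]) for b in names} for a in names}
    high = [
        (a, b, matrix[a][b])
        for a, b in combinations(names, 2)
        if abs(matrix[a][b]) >= correlation_threshold
    ]
    # Merge correlated pairs into groups (simple connected components).
    groups = []
    for a, b, _ in high:
        touching = [g for g in groups if a in g or b in g]
        merged = {a, b}.union(*touching)
        groups = [g for g in groups if g not in touching] + [merged]
    return {
        "correlation_matrix": matrix,
        "high_correlations": high,
        "groups": [sorted(g) for g in groups],
    }


# Hypothetical series: load_b is a scaled copy of load_a, price is only
# weakly correlated with either.
series = {
    "load_a": [1.0, 2.0, 3.0, 4.0],
    "load_b": [2.0, 4.0, 6.0, 8.0],
    "price": [5.0, 1.0, 4.0, 2.0],
}
result = toy_identify_correlations(series)
print(result["high_correlations"])
print(result["groups"])
```

Grouping matters because only one member of each group needs to enter the feature matrix; the others carry nearly the same information.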
Precomputed clusters
Exploratory analysis
See Exploratory sweep for a minimal sweep with
run_clustering_analysis().
- run_clustering_analysis(grid, save_path='clustering_results', algorithms=None, n_clusters_list=None, time_series=None, print_details=False, ts_options=None, correlation_decisions=None, plotting=False, plotting_options=None, identifier=None)
Sweep clustering algorithms and cluster counts for exploratory analysis.
Runs cluster_TS() for each (algorithm, n_clusters) pair, collects quality metrics, optionally saves representative-period plots, and writes clustering_summary_<identifier>.csv under save_path.
- Parameters:
grid (Grid) – Grid with attached time series.
save_path (str) – Output directory for CSV summaries and optional plots.
algorithms (list of str, optional) – Clustering methods passed to cluster_TS() (default includes kmeans, kmedoids, ward, pam_hierarchical).
n_clusters_list (list of int, optional) – Cluster counts to test (defaults to DEFAULT_CLUSTER_NUMBERS).
time_series (list, optional) – TS types to include (empty list keeps grid defaults).
print_details (bool) – Verbose clustering diagnostics from cluster_TS().
ts_options (list, optional) – [central_market, cv_threshold, correlation_threshold] for filtering.
correlation_decisions (list, optional) – Passed through to identify_correlations().
plotting (bool) – When True, save time-series plots per sweep step.
plotting_options (list, optional) – [variable_name_or_None, file_extension] for plots.
identifier (str, optional) – Suffix for output filenames.
- Returns:
One row per successful (algorithm, n_clusters) run with timing and quality metrics.
- Return type:
pandas.DataFrame
Sweeps clustering algorithms and cluster counts on the attached time series, records quality metrics (coefficient of variation, inertia, Davies–Bouldin), and writes clustering_summary_<identifier>.csv under save_path. Set plotting=True to save representative-period plots while sweeping. Use this to tune clustering_options before calling TEP; production solves normally use cluster_analysis() inside the TEP drivers.
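The sweep pattern itself can be sketched without the library. The example below loops over every (algorithm, n_clusters) pair, records one row per successful run, and writes a CSV named clustering_summary_<identifier>.csv. toy_sweep, placeholder_score, and the column names are hypothetical; the real function calls cluster_TS() and records its own timing and quality metrics.

```python
import csv
import os
import tempfile
import time
from itertools import product


def toy_sweep(algorithms, n_clusters_list, score, save_path, identifier):
    """Run score(algorithm, n_clusters) for every pair and write a CSV summary."""
    rows = []
    for algorithm, n_clusters in product(algorithms, n_clusters_list):
        start = time.perf_counter()
        try:
            metric = score(algorithm, n_clusters)
        except Exception:
            # Keep only successful runs, one row each; skip failures.
            continue
        rows.append({
            "algorithm": algorithm,
            "n_clusters": n_clusters,
            "metric": metric,
            "seconds": time.perf_counter() - start,
        })

    os.makedirs(save_path, exist_ok=True)
    csv_path = os.path.join(save_path, f"clustering_summary_{identifier}.csv")
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["algorithm", "n_clusters", "metric", "seconds"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return rows, csv_path


def placeholder_score(algorithm, n_clusters):
    """Hypothetical metric; a real sweep would cluster the data here."""
    return 1.0 / n_clusters


with tempfile.TemporaryDirectory() as save_path:
    rows, csv_path = toy_sweep(
        ["kmeans", "kmedoids"], [2, 4], placeholder_score, save_path, "doc_example"
    )
    print(len(rows), os.path.basename(csv_path))
```

Comparing rows across cluster counts for a fixed algorithm is usually the quickest way to see where adding representative periods stops improving the metric.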