Time-Series Clustering
======================

Functions in :mod:`pyflow_acdc.Time_series_clustering`.

Representative-period clustering reduces long time-series inputs to a weighted
set of scenarios for :func:`~pyflow_acdc.multi_scenario_TEP`,
:func:`~pyflow_acdc.multi_period_MS_TEP`, and related TEP drivers.

Workflow guides: :doc:`../usage_tep` (TEP and MS TEP),
:doc:`../usage_mp_tep` (MP TEP and MP+MS TEP).

``clustering_options``
----------------------

TEP functions accept a ``clustering_options`` dict, normally processed by
:func:`~pyflow_acdc.cluster_analysis`. To reload saved clusters without
re-running the algorithms, pass ``precomputed_clusters_path`` (see
:ref:`clustering_example_precomputed` and the MS TEP example in
:file:`pyflow_tests/doc_examples/tep/02_multi_scenario_tep.py`). Use with
``years_data="23,24"`` on ``NS_MTDC_2025`` so time-series length matches the
JSON payload.

.. list-table::
   :widths: 28 52
   :header-rows: 1

   * - Key
     - Role
   * - ``n_clusters``
     - Number of representative periods (scenarios) passed to the clustering
       algorithm.
   * - ``time_series``
     - TS **types** to include (e.g. ``["price", "Load", "WPP"]``). Series on
       the grid whose ``type`` is not listed are dropped before clustering.
   * - ``central_market``
     - List of price-zone names whose attached series are kept for
       market-linked types (``price``, ``Load``, ``PGL_min``, ``PGL_max``,
       ``a_CG``, ``b_CG``, ``c_CG``). An **empty list** (``[]``) keeps all
       zones in the grid; this is the usual choice on multi-zone cases such as
       ``NS_MTDC_2025``. To cluster only around selected hubs, pass e.g.
       ``["NL", "DE"]``; series tied to other zones are then excluded.
   * - ``thresholds``
     - Two-element list ``[cv_threshold, correlation_threshold]`` used in
       :func:`~pyflow_acdc.filter_data` and
       :func:`~pyflow_acdc.identify_correlations`. With ``[0, 0.8]``
       (typical): no CV pre-filtering (``0`` disables it), and pairs with
       ``|correlation| > 0.8`` form correlated groups. When
       ``cv_threshold > 0``, series whose coefficient of variation
       **exceeds** that value are removed before clustering.
   * - ``correlation_decisions``
     - Three-element list ``[clean, method, scale_groups]`` for
       :func:`~pyflow_acdc.identify_correlations`. ``[True, 3, True]`` means:
       reduce redundant series in each correlated group (**clean**), use
       **method** ``3`` (PCA representative: keep the member most aligned with
       the group's first principal component), and **scale** the kept series
       by ``sqrt(group size)`` so merged information is not under-weighted.
       Set ``clean`` to ``False`` to skip correlation reduction. Methods ``1``
       and ``2`` keep the highest-variance member or replace the group with a
       single PC1 column respectively (see
       :func:`~pyflow_acdc.identify_correlations`).
   * - ``cluster_algorithm``
     - e.g. ``kmedoids``, ``kmeans_medoids``, ``Kmeans``
   * - ``precomputed_clusters_path``
     - JSON path; skips re-clustering when set (see
       :func:`~pyflow_acdc.load_precomputed_clusters_to_grid`)
   * - ``print_details``
     - When ``True``, print filtering statistics (mean, std, CV per series),
       excluded columns, correlated groups, deduplication choices, and
       clustering diagnostics to stdout. Use ``False`` in batch tests to keep
       output quiet.

The four keys highlighted in the doc example work together as a preprocessing
pipeline before the chosen ``cluster_algorithm`` runs: restrict which series
enter the feature matrix (``time_series``, ``central_market``), optionally drop
high-CV or highly correlated columns (``thresholds``,
``correlation_decisions``), then form ``n_clusters`` representative periods.
Set ``print_details=True`` while tuning a case; set it ``False`` once options
are fixed.

Examples
--------

Runnable scripts live in ``pyflow_tests/doc_examples/clustering/`` and are
executed by ``test_docs_clustering.py``.

.. _clustering_example_precomputed:

Precomputed clusters
~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: ../../pyflow_tests/doc_examples/clustering/01_precomputed_clusters.py

.. _clustering_example_live:

Live clustering
~~~~~~~~~~~~~~~

.. literalinclude:: ../../pyflow_tests/doc_examples/clustering/02_live_clustering.py

.. _clustering_example_exploratory:

Exploratory sweep
~~~~~~~~~~~~~~~~~

.. literalinclude:: ../../pyflow_tests/doc_examples/clustering/03_exploratory_clustering.py

Cluster analysis
----------------

.. autofunction:: pyflow_acdc.cluster_analysis

   Main entry used inside TEP when ``clustering_options`` is passed.

.. autofunction:: pyflow_acdc.cluster_TS

.. autofunction:: pyflow_acdc.identify_correlations

Precomputed clusters
--------------------

.. autofunction:: pyflow_acdc.load_precomputed_clusters_to_grid

Exploratory analysis
--------------------

See :ref:`clustering_example_exploratory` for a minimal sweep with
:func:`~pyflow_acdc.run_clustering_analysis`.

.. autofunction:: pyflow_acdc.run_clustering_analysis

   Sweeps clustering algorithms and cluster counts on the attached time
   series, records quality metrics (coefficient of variation, inertia,
   Davies–Bouldin), and writes ``clustering_summary_.csv`` under
   ``save_path``. Set ``plotting=True`` to save representative-period plots
   while sweeping.

   Use this to tune ``clustering_options`` before calling TEP; production
   solves normally use :func:`~pyflow_acdc.cluster_analysis` inside the TEP
   drivers.

.. autofunction:: pyflow_acdc.run_clustering_analysis_and_plot
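
Once a sweep has suggested suitable settings, they can be collected into a
``clustering_options`` dict. The sketch below uses only the keys documented in
the table above. The values are illustrative (``n_clusters=10`` is an arbitrary
choice, not a recommendation), and ``check_clustering_options`` is a
hypothetical helper that is not part of ``pyflow_acdc``; it only checks the
list shapes described in the table.

.. code-block:: python

   # Illustrative values only; the keys are those documented in the table above.
   clustering_options = {
       "n_clusters": 10,                          # arbitrary illustrative count
       "time_series": ["price", "Load", "WPP"],   # TS types kept for clustering
       "central_market": [],                      # empty list keeps all price zones
       "thresholds": [0, 0.8],                    # [cv_threshold, correlation_threshold]
       "correlation_decisions": [True, 3, True],  # [clean, method, scale_groups]
       "cluster_algorithm": "kmedoids",
       "print_details": False,                    # quiet output once options are fixed
   }


   def check_clustering_options(options):
       """Hypothetical helper (not part of pyflow_acdc).

       Returns a list of human-readable problems with the list-valued keys;
       an empty list means the documented shapes are respected.
       """
       problems = []

       n_clusters = options.get("n_clusters")
       if not isinstance(n_clusters, int) or isinstance(n_clusters, bool) or n_clusters < 1:
           problems.append("n_clusters should be a positive integer")

       # thresholds is documented as a two-element list.
       if len(options.get("thresholds", [])) != 2:
           problems.append("thresholds should be [cv_threshold, correlation_threshold]")

       # correlation_decisions is documented as a three-element list with method 1, 2 or 3.
       decisions = options.get("correlation_decisions", [])
       if len(decisions) != 3:
           problems.append("correlation_decisions should be [clean, method, scale_groups]")
       elif decisions[1] not in (1, 2, 3):
           problems.append("correlation_decisions method should be 1, 2 or 3")

       # time_series and central_market are documented as lists.
       for key in ("time_series", "central_market"):
           if not isinstance(options.get(key, []), list):
               problems.append(f"{key} should be a list")

       return problems


   print(check_clustering_options(clustering_options))

A dict of this shape is what the TEP drivers receive as ``clustering_options``;
the exact call signature of each driver is given in its own reference entry.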