Distribution Module

Distribution

Distribution(
    name: str,
    data: List[Any],
    distribution_type: str,
    dimensions: List[int],
    x_categories: List[Union[int, float, str]],
    x_label: str,
    y_categories: Optional[List[Union[int, float, str]]],
    y_label: str,
)

Represents a probability distribution or histogram and its metadata.

See histogram.py for Histogram1D and Histogram2D, which facilitate the computation of distributions from data.

Author: Roger B. Dannenberg

Parameters:

  • name (str) –

    The name of the distribution used for plot titles.

  • data (List[Any]) –

    The data points for the distribution.

  • distribution_type (str) –

    The type of distribution. Currently used values are described below. "weights" can mean either probabilities or raw counts.

    • "pitch_class" - weights for 12 pitch classes, possibly weighted by duration.
    • "interval" - weights for pitch intervals, possibly weighted by duration.
    • "pitch_class_interval" - weights for pitch class intervals, possibly weighted by duration. (mod 12, not currently used; should there also be a pitch_class_size based on absolute interval size mod 12? And if so, should there be an interval_class computed from interval mod 12? And interval_size_class based on absolute interval size mod 12? Note that "interval" ignores intervals larger than one octave.)
    • "duration" - weights for durations
    • "interval_size" - weights for interval sizes, possibly weighted by duration.
    • "interval_direction" - proportion of upward intervals for each interval size, possibly weighted by duration.
    • "pitch_class_transition" - weights for pitch class transitions, possibly weighted by duration (2-dimensional distribution).
    • "interval_transition" - weights for interval transitions, possibly weighted by duration (2-dimensional distribution).
    • "duration_transition" - weights for duration transitions, possibly weighted by duration (2-dimensional distribution).
    • "key_correlation" - weights for key correlations, generally correlation with 12 major key profiles followed by correlations with 12 minor key profiles.
    • "symmetric_key_profile" - weights for symmetric key profiles. Key profiles themselves are distributions. Symmetric keys use the same 12 weights for all keys (e.g., Krumhansl-Kessler), simply rotated for each key.
    • "asymmetric_key_profile" - weights for asymmetric key profiles, where each key has its own set of 12 weights (e.g., Bellman-Budge).
    • "root_support_weights" - root support weights (see amads.harmony.root_finding.parncutt)

    This list is open-ended and is currently just informational. The value is not used for plotting or any other purpose.

  • dimensions (List[int]) –

    The dimensions of the distribution, e.g. [12] for a pitch class distribution or [25, 25] for an interval_transition (intervals are from -12 to +12 and include 0 for unison, intervals larger than one octave are ignored).

  • x_categories (List[Union[int, float, str]]) –

    The categories for the x-axis.

  • x_label (str) –

    The label for the x-axis.

  • y_categories (Optional[List[Union[int, float, str]]]) –

    The categories for the y-axis, if any, otherwise None.

  • y_label (str) –

    The label for the y-axis.

Attributes:

  • same as Parameters (above)
Source code in amads/core/distribution.py
def __init__(
    self,
    name: str,
    data: List[Any],
    distribution_type: str,
    dimensions: List[int],
    x_categories: List[Union[int, float, str]],
    x_label: str,
    y_categories: Optional[List[Union[int, float, str]]],
    y_label: str,
):
    """
    Initialize a Distribution instance.
    """
    self.name = name
    self.data = data
    self.distribution_type = distribution_type
    self.dimensions = dimensions
    self.x_categories = x_categories
    self.x_label = x_label
    self.y_categories = y_categories
    self.y_label = y_label
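
As a minimal sketch of how the constructor arguments relate to one another, the following builds the arguments for a 2-D "interval_transition" distribution using plain Python only. The name, the axis labels, the choice of which axis is "from" and which is "to", and the all-zero data are illustrative assumptions, not taken from the library. The resulting dictionary could then be passed as Distribution(**kwargs).

```python
# Intervals from -12 to +12 semitones, including 0 for unison: 25 categories.
categories = list(range(-12, 13))
n = len(categories)

# Placeholder data: an n x n table of zero weights.
data = [[0.0] * n for _ in range(n)]

# Keyword arguments matching the Distribution constructor signature above.
# The name and axis labels are hypothetical.
kwargs = {
    "name": "Interval transitions (example)",
    "data": data,
    "distribution_type": "interval_transition",
    "dimensions": [n, n],
    "x_categories": categories,
    "x_label": "Interval (to)",
    "y_categories": categories,
    "y_label": "Interval (from)",
}

print(kwargs["dimensions"])  # [25, 25]
```

For a 1-D distribution, dimensions would hold a single length (for example [12] for pitch classes) and y_categories would be None.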

Functions

normalize

normalize()

Convert weights or counts to a probability distribution that sums to 1.

Source code in amads/core/distribution.py
def normalize(self):
    """
    Convert weights or counts to a probability distribution that sums to 1.
    """
    self.data = normalize(self.data, "Sum").tolist()
    return self

plot

plot(
    color: Optional[str] = None,
    option: Optional[str] = None,
    show: bool = True,
    fig: Optional[Figure] = None,
    ax: Optional[Axes] = None,
) -> Figure

Virtual plot function for Distribution. Supports standalone plotting of a Distribution (when fig and ax are None) and also lets callers invoke this function, or an overriding variant in a subclass, to draw into a subplot when fig and ax are provided as arguments.

Parameters:

  • color (Optional[str], default: None ) –

    Plot color string specification. In this particular plot function, it is handled in 1-D distributions and ignored in 2-D distributions. None for default option (Distribution.DEFAULT_BAR_COLOR).

  • option (Optional[str], default: None ) –

    Plot style string specification. In this particular plot function, only {"bar", "line"} are valid string arguments that will be handled in a 1-D distribution, while any argument is ignored in 2-D distributions. None for default option ("bar").

  • show (bool, default: True ) –

    Whether to call plt.show() at the end.

  • fig (Figure, default: None ) –

    Provide existing Figure to draw on; if omitted, a new figure is created.

  • ax (Axes, default: None ) –

    Provide existing axes to draw on; if omitted, a new figure and axes are created.

Raises:

  • ValueError

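    A ValueError is raised if:

    • only one of fig (Figure) and ax (Axes) is provided
    • the number of dimensions (len(dimensions)) is not 1 or 2
    • option is not "bar" or "line" for a 1-D distribution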
Notes

Behavior of this specific plot method:

  • 1-D: bar chart (default), or line chart when option is "line"
  • 2-D: heatmap
Source code in amads/core/distribution.py
def plot(
    self,
    color: Optional[str] = None,
    option: Optional[str] = None,
    show: bool = True,
    fig: Optional[Figure] = None,
    ax: Optional[Axes] = None,
) -> Figure:
    """
    Virtual plot function for Distribution.
    Allows standalone plotting of a Distribution (when fig and ax are None),
    while providing enough extensibility to invoke this plot function or its
    overwritten variants for subplotting when fig and ax are provided as
    arguments.

    Parameters
    ----------
    color : Optional[str]
        Plot color string specification. In this particular plot function,
        it is handled in 1-D distributions and ignored in 2-D distributions.
        None for default option (Distribution.DEFAULT_BAR_COLOR).
    option : Optional[str]
        Plot style string specification. In this particular plot function,
        only {"bar", "line"} are valid string arguments that will be handled
        in a 1-D distribution, while any argument is ignored in 2-D
        distributions. None for default option ("bar").
    show : bool
        Whether to call ``plt.show()`` at the end.
    fig : Figure
        Provide existing Figure to draw on; if omitted, a new
        figure is created.
    ax : Axes
        Provide existing axes to draw on; if omitted, a new
        figure and axes are created.

    Raises
    ------
    ValueError
        A ValueError is raised if:

        - `ax` (axes) but not `fig` (Figure) is provided
        - `dims` is not 1 or 2

    Notes
    -----
    Behavior of this specific plot method:

    - 1-D: bar (default) or line when option is "line"
    - 2-D: heatmap
    """
    dims = len(self.dimensions)
    if dims not in (1, 2):
        raise ValueError(
            "Unsupported number of dimensions for Distribution class"
        )

    # Figure/axes handling: either both `fig` and `ax` are provided, or
    # neither; in the latter case, create a new figure/axes pair.
    if fig is None:
        if ax is not None:
            raise ValueError("invalid figure/axis combination")
        fig, ax = plt.subplots()
    else:
        if ax is None:
            raise ValueError("invalid figure/axis combination")

    if dims == 1:
        if color is None:
            color = Distribution.DEFAULT_BAR_COLOR
        if option is None:
            option = "bar"
        x = range(len(self.x_categories))
        # 1-D distributions: draw either a bar chart or a line chart.
        if option == "bar":
            ax.bar(x, self.data, color=color)
        elif option == "line":
            ax.plot(x, self.data, color=color, marker="o")
        else:
            raise ValueError(f"unknown option for 1D plot: {option}")

        ax.set_xticks(list(x))
        ax.set_xticklabels([str(label) for label in self.x_categories])
        ax.set_xlabel(self.x_label)
        ax.set_ylabel(self.y_label)
        ax.set_title(self.name)

    else:  # dims == 2
        # 2-D distributions: render as a heatmap with a colorbar.
        data = np.asarray(self.data)
        cax = ax.imshow(
            data, cmap="gray_r", aspect="auto", interpolation="nearest"
        )
        fig.colorbar(cax, ax=ax, label="Proportion")

        ax.set_xlabel(self.x_label)
        ax.set_ylabel(self.y_label)
        ax.set_title(self.name)

        ax.set_xticks(range(len(self.x_categories)))
        ax.set_xticklabels([str(label) for label in self.x_categories])
        if self.y_categories is not None:
            ax.set_yticks(range(len(self.y_categories)))
            ax.set_yticklabels([str(label) for label in self.y_categories])

        ax.invert_yaxis()

    fig.tight_layout()
    if show:
        plt.show()
    return fig
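
The figure/axes rule enforced above (pass both fig and ax, or neither) can be sketched in isolation. The helper name check_fig_ax is hypothetical, and the sketch uses placeholder strings instead of matplotlib objects so that it runs without any plotting backend.

```python
from typing import Any, Optional


def check_fig_ax(fig: Optional[Any], ax: Optional[Any]) -> bool:
    """Return True if a new figure/axes pair must be created.

    Raises ValueError when exactly one of fig and ax is given.
    """
    if (fig is None) != (ax is None):
        raise ValueError("invalid figure/axis combination")
    return fig is None


print(check_fig_ax(None, None))   # True: the caller should create fig and ax
print(check_fig_ax("fig", "ax"))  # False: draw on the supplied pair
```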

plot_multiple classmethod

plot_multiple(
    dists: List[Distribution],
    show: bool = True,
    options: Optional[Union[str, List[str]]] = None,
    colors: Optional[Union[str, List[str]]] = None,
) -> Optional[Figure]

Plot multiple distributions into a single Figure using vertically stacked subplots.

Returns:

  • Figure or None

    A matplotlib Figure when at least one distribution is plotted; otherwise None when dists is empty.

Parameters:

  • dists (list[Distribution]) –

    Distributions to plot. 2-D distributions are rendered as heatmaps and 1-D distributions as bar or line charts, each in its own subplot.

  • show (bool, default: True ) –

    Whether to call plt.show() at the end.

  • options (str | list[str] | None, default: None ) –

    Plot style per distribution (e.g. "bar" or "line"). If a single string is given, it is broadcast to all distributions. If None, defaults to "bar".

  • colors (str | list[str] | None, default: None ) –

    Color option per distribution. If a single string is given, it is broadcast to all distributions (the base plot function uses it only for 1-D distributions). If None, defaults to the single color Distribution.DEFAULT_BAR_COLOR.

Notes
  • Distributions are plotted in the same order as they appear in the dists list.
  • As long as a Distribution or inheriting class implements a valid plot function, its plot is added to the figure at the corresponding axes.
  • options and colors are passed to every distribution's plot function.
  • Although the base plot function uses option and color only in the 1-D case, a class inheriting from Distribution may use these arguments in other cases.
  • You can pass either a list (one entry per distribution) or a single string. A single string is broadcast to all inputs. For example, options="line" makes all 1-D plots line charts.
Source code in amads/core/distribution.py
@classmethod
def plot_multiple(
    cls,
    dists: List["Distribution"],
    show: bool = True,
    options: Optional[Union[str, List[str]]] = None,
    colors: Optional[Union[str, List[str]]] = None,
) -> Optional[Figure]:
    """
    Plot multiple distributions into a single Figure using vertically
    stacked subplots.

    Returns
    -------
    Figure or None
        A matplotlib Figure when at least one distribution is plotted;
        otherwise None when `dists` is empty.

    Parameters
    ----------
    dists : list[Distribution]
        Distributions to plot. 2-D are rendered as heatmaps; 1-D below them.
    show : bool
        Whether to call ``plt.show()`` at the end.
    options : str | list[str] | None
        plot style per distribution (e.g. "bar" or "line"). If a single
        string is given, it is broadcast to all distributions. If None,
        defaults to "bar".
    colors : str | list[str] | None
        color option per distribution. If a single string is given, it is
        broadcast to all 1-D distributions. If None, defaults to
        the single color Distribution.DEFAULT_BAR_COLOR.

    Notes
    -----
    - distributions are plotted in the same order they were presented in
      dists list
    - as long as a Distribution or inherited class has a valid plot function
      implemented, the relevant plot will be added to the figure at the
      specified axes.
    - `options` and `colors` apply to all distributions
    - Although the original plot function is only limited to
      `option` and `color` being used in the 1-D case, it is not to say
      that a class inheriting Distribution won't leverage these arguments.
    - You can pass either a list (per-series) or a single string. When a
      single string is provided, it will be broadcast to all inputs.
      For example, options="line" makes all 1-D plots line charts.
    """
    if not dists:
        return None

    # when single string, broadcast to all distributions
    options = options or ["bar"] * len(dists)
    colors = colors or [Distribution.DEFAULT_BAR_COLOR] * len(dists)
    if isinstance(options, str):
        options = [options] * len(dists)
    if isinstance(colors, str):
        colors = [colors] * len(dists)
    if len(options) != len(dists) or len(colors) != len(dists):
        raise ValueError(
            "options/colors must match number of distributions in list case"
        )

    # Create a vertical stack of subplots sized to total count
    fig, axes = plt.subplots(len(dists), 1, squeeze=False)
    axes = axes.ravel()
    # use an axes iterator here
    ax_iter = iter(axes)
    for d, k, c in zip(dists, options, colors):
        ax = next(ax_iter)
        d.plot(color=c, option=k, show=False, fig=fig, ax=ax)

    fig.tight_layout()
    if show:
        plt.show()
    return fig
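
The broadcasting of options and colors described above can be sketched as a small standalone helper. The name broadcast is hypothetical, and this is a simplified variant of the idea rather than the library's code (for instance, it treats an empty list as a length mismatch instead of falling back to the default).

```python
from typing import List, Optional, Union


def broadcast(
    value: Optional[Union[str, List[str]]], n: int, default: str
) -> List[str]:
    """Expand value to a list with one entry per distribution."""
    if value is None:
        return [default] * n  # nothing given: use the default everywhere
    if isinstance(value, str):
        return [value] * n  # single string: repeat it for every distribution
    if len(value) != n:
        raise ValueError("list length must match number of distributions")
    return list(value)


print(broadcast("line", 3, "bar"))           # ['line', 'line', 'line']
print(broadcast(None, 2, "bar"))             # ['bar', 'bar']
print(broadcast(["bar", "line"], 2, "bar"))  # ['bar', 'line']
```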

plot_grouped_1d classmethod

plot_grouped_1d(
    dists: List[Distribution],
    show: bool = True,
    options: Optional[Union[str, List[str]]] = None,
    colors: Optional[Union[str, List[str]]] = None,
) -> Optional[Figure]

Overlay multiple 1-D distributions on a single axes.

This function draws all input 1-D distributions in one matplotlib Axes so that each category (x bin) shows a "group" of values, one per distribution. You can mix plotting styles using the options argument (for example, some as bars and others as lines with markers). Colors are controlled via the colors argument.

Parameters:

  • dists (list[Distribution]) –

    1-D distributions to compare in a single plot.

  • show (bool, default: True ) –

    Whether to call plt.show() at the end.

  • options (str | list[str] | None, default: None ) –

    Per-distribution plot style. Allowed values: "bar" or "line". You can provide a single string to apply to all series (broadcast), or a list with length len(dists). If None, all series default to "bar".

  • colors (str | list[str] | None, default: None ) –

    Per-distribution color list. You can provide a single string to apply to all series (broadcast), or a list with length len(dists). If None, distinct default colors are taken from the tab10 palette, cycling when there are more distributions than palette entries.

Returns:

  • Figure or None

    A matplotlib Figure if any distributions are plotted; None when dists is empty.

Constraints
  • Only 1-D distributions are accepted. All inputs must have the same length (number of categories) so they can be grouped per category.
  • The x/y labels and category names are taken from the first distribution in dists. A ValueError is raised if any other distribution has different labels or categories, so this function does not support overlaying 1-D distributions with different categories and labels.
How this differs from plot_multiple
  • plot_grouped_1d overlays all 1-D distributions on a single axes to allow:
    1. intuitive and compact per-category (bin-by-bin) comparison for grouped bar graphs
    2. intuitive and compact gradient comparison for overlaid line graphs.

    Since all distributions are drawn in a single plot, they share a single legend.
  • plot_multiple creates a vertical stack of subplots, one per distribution, using the plot method of each Distribution (and also supports 2-D heatmaps).

Source code in amads/core/distribution.py
@classmethod
def plot_grouped_1d(
    cls,
    dists: List["Distribution"],
    show: bool = True,
    options: Optional[Union[str, List[str]]] = None,
    colors: Optional[Union[str, List[str]]] = None,
) -> Optional[Figure]:
    """Overlay multiple 1-D distributions on a single axes.

    This function draws all input 1-D distributions in one matplotlib
    Axes so that each category (x bin) shows a "group" of values, one
    per distribution. You can mix plotting styles using the `options`
    argument (for example, some as bars and others as lines with
    markers). Colors are controlled via the `colors` argument.

    Parameters
    ----------
    dists : list[Distribution]
        1-D distributions to compare in a single plot.
    show : bool
        Whether to call ``plt.show()`` at the end.
    options : str | list[str] | None
        Per-distribution plot style. Allowed values: "bar" or "line".
        You can provide a single string to apply to all series (broadcast),
        or a list with length `len(dists)`. If None, all series default to
        "bar".
    colors : str | list[str] | None
        Per-distribution color list. You can provide a single string to
        apply to all series (broadcast), or a list with length `len(dists)`.
        If None, a distinct default color palette is applied (rcParams cycle
        or the tab10 palette).

    Returns
    -------
    Figure or None
        A matplotlib Figure if any distributions are plotted; None when
        `dists` is empty.

    Constraints
    -----------
    - Only 1-D distributions are accepted. All inputs must have the same
      length (number of categories) so they can be grouped per category.
    - The x/y labels and category names are taken from the first
      distribution in `dists`. Hence, this function does not support
      overlaying 1-D distributions with different categories and labels.

    How this differs from plot_multiple
    -----------------------------------
    - plot_grouped_1d overlays all 1-D distributions on a single axes
      to allow:
        1. per-category (bin-by-bin) comparison intuitive and compact
           for grouped bar graphs
        2. intuitive and compact gradient comparison for overlaid line
           graphs.

      Since all distributions are plotted in a single plot, we can
      compare all plots within a single legend.
    - plot_multiple creates a vertical stack of subplots, one per
      distribution, while leveraging the plot attribute of each
      Distribution (and also supports 2-D heatmaps).
    """
    # Validate inputs
    if not dists:
        return None
    if any(len(d.dimensions) != 1 for d in dists):
        raise ValueError(
            "All distributions must be 1-D for grouped plotting"
        )
    # number of categories for each plot in the 1d distribution
    dimension = dists[0].dimensions[0]
    if any(d.dimensions[0] != dimension for d in dists):
        raise ValueError("All 1-D distributions must have the same length")
    # labels and categories will need to be the same...
    # or else some of the data visualization for axes will be misleading
    # since this function does not support plotting multiple axes labels
    # and categories on the same plot
    if any(
        d.x_label != dists[0].x_label or d.y_label != dists[0].y_label
        for d in dists
    ):
        raise ValueError("All 1-D distributions must have same axes labels")
    if any(
        d.x_categories != dists[0].x_categories
        or d.y_categories != dists[0].y_categories
        for d in dists
    ):
        raise ValueError(
            "All 1-D distributions must have same axes categories"
        )

    # when single string, broadcast to all
    if isinstance(options, str):
        options = [options] * len(dists)
    if isinstance(colors, str):
        colors = [colors] * len(dists)
    if options is None:
        options = ["bar"] * len(dists)
    if colors is None:
        # get the default ListedColormap; get_cmap does not always
        # return an object with .colors, so we have to ignore the type:
        base_colors = plt.get_cmap("tab10").colors  # type: ignore
        colors = [
            base_colors[i % len(base_colors)] for i in range(len(dists))
        ]
    if len(options) != len(dists) or len(colors) != len(dists):
        raise ValueError(
            "options and colors must match number of distributions"
        )

    bar_graph_info = None
    line_graph_info = None
    # partition bar graphs and line graphs to be plotted separately
    # (so that line graphs don't each take up a bin themselves)
    if isinstance(options, list):
        bar_graph_info = [
            (dist, color)
            for dist, kind, color in zip(dists, options, colors)
            if kind == "bar"
        ]
        line_graph_info = [
            (dist, color)
            for dist, kind, color in zip(dists, options, colors)
            if kind in ("line", "plot")
        ]

    fig, ax = plt.subplots()

    # Grouped bar arithmetic (unit bar width, grouped per category)
    # must have at least 1 bin for the line plot to be valid
    n = max(len(bar_graph_info), 1)
    # bar_width does not matter here, since everything in the grouped bar
    # graph is scaled according to this variable
    bar_width = 1
    x_coords = np.arange(dimension) * bar_width * n
    bottom_half, upper_half = n // 2, n - n // 2
    width_idxes = range(-bottom_half, upper_half + 1)
    is_even_offset = ((n + 1) % 2) * bar_width / 2

    # setting plot axes
    ax.set_xticks(x_coords)
    ax.set_xticklabels([str(d) for d in dists[0].x_categories])
    ax.set_xlabel(dists[0].x_label)
    ax.set_ylabel(dists[0].y_label)
    ax.set_title("Grouped Histogram Plot for 1-D Distributions")

    for width_idx, (dist, color) in zip(width_idxes, bar_graph_info):
        x_axis = x_coords + width_idx * bar_width + is_even_offset
        ax.bar(
            x_axis, dist.data, width=bar_width, label=dist.name, color=color
        )

    for dist, color in line_graph_info:
        ax.plot(
            x_coords, dist.data, color=color, marker="o", label=dist.name
        )

    ax.legend()
    fig.tight_layout()
    if show:
        plt.show()
    return fig
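
The grouped-bar arithmetic above is easier to follow with concrete numbers. The following sketch isolates the offset computation in a hypothetical helper, bar_offsets, which returns the distance of each bar's center from its category tick. It mirrors the expressions in the source but is only an illustration.

```python
from typing import List


def bar_offsets(n_bars: int, bar_width: float = 1.0) -> List[float]:
    """Offsets of each bar's center from the group's tick position."""
    n = max(n_bars, 1)
    bottom_half, upper_half = n // 2, n - n // 2
    width_idxes = range(-bottom_half, upper_half + 1)
    # Shift by half a bar when n is even so the group is centered on the tick.
    is_even_offset = ((n + 1) % 2) * bar_width / 2
    # zip stops after n_bars entries, as in the plotting loop.
    return [
        idx * bar_width + is_even_offset
        for idx, _ in zip(width_idxes, range(n_bars))
    ]


print(bar_offsets(2))  # [-0.5, 0.5]
print(bar_offsets(3))  # [-1.0, 0.0, 1.0]
print(bar_offsets(4))  # [-1.5, -0.5, 0.5, 1.5]
```

Because the ticks are spaced n * bar_width apart, each group of n bars exactly fills the space between neighboring ticks.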

Histogram Module

histogram

Compute histograms and distributions.

This module provides Histogram1D and Histogram2D classes for computing one-dimensional and two-dimensional histograms, respectively. Histograms can be normalized to form probability distributions.

The bins attribute can be directly assigned to the data attribute of the Distribution class in core.distribution.

Histogram bins can be specified either by their centers or their boundaries. When centers are provided:

  • the number of centers gives the number of bins
  • if ignore_extrema is False, the first and last bins are open-ended, counting all values below the first center and above the last center. Boundaries can be computed from centers using either linear or logarithmic interpolation. If boundaries are provided, they may have length len(centers) + 1, in which case the first and last values are ignored (since the bins are open-ended); otherwise, they must have length len(centers) - 1.
  • if ignore_extrema is True, the first and last bins are closed, and values outside the bin boundaries are ignored. In this case, boundaries must be provided and have length len(centers) + 1.

When centers are not provided, boundaries must be provided:

  • the number of bins is len(boundaries) - 1
  • bin centers can be computed as arithmetic or geometric means of boundaries.
  • if ignore_extrema is False, the upper and lower boundaries are ignored, making the first and last bins open-ended.
  • if ignore_extrema is True, the first and last bins are closed, and values outside the bin boundaries are ignored.
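
The two interpolation modes can be illustrated with a short sketch that computes the points between neighboring values, either as arithmetic means ("linear") or as geometric means ("log"). The helper name midpoints is hypothetical and this is not the library's centers_to_boundaries or boundaries_to_centers implementation. It only covers the case of len(values) - 1 results, such as the inner boundaries between open-ended bins.

```python
import math
from typing import List


def midpoints(values: List[float], interpolation: str = "linear") -> List[float]:
    """Return the len(values) - 1 points between neighboring values."""
    if interpolation == "linear":
        # Arithmetic mean of each neighboring pair.
        return [(a + b) / 2 for a, b in zip(values, values[1:])]
    if interpolation == "log":
        # Geometric mean of each neighboring pair; requires positive values.
        return [math.sqrt(a * b) for a, b in zip(values, values[1:])]
    raise ValueError(f"unknown interpolation: {interpolation}")


centers = [1.0, 2.0, 4.0, 8.0]
print(midpoints(centers, "linear"))  # [1.5, 3.0, 6.0]
print(midpoints(centers, "log"))     # geometric means of neighboring centers
```

Geometric means suit quantities that are spaced by ratios, such as the doubling values in this example.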

Classes

Histogram1D

Histogram1D(
    bin_centers: Optional[list[float]] = None,
    bin_boundaries: Optional[list[float]] = None,
    interpolation: str = "linear",
    ignore_extrema: bool = False,
    initial_value: float = 0.0,
)

Class for computing one-dimensional histograms.

Parameters:

  • bin_centers (list of float, default: None ) –

    Centers of the histogram bins.

  • bin_boundaries (list of float, default: None ) –

    Boundaries of the histogram bins.

  • interpolation (str, default: 'linear' ) –

    Interpolation method for missing bin_centers or bin_boundaries. "linear" to use the average of neighboring values, "log" for geometric mean.

  • ignore_extrema (bool, default: False ) –

    If True, values below the lowest bin edge and above the highest bin edge are ignored. If False, they are counted in the first and last bins, respectively. Default is False.

  • initial_value (float, default: 0.0 ) –

    The initial bin values are all set to this value (default is 0). This avoids a divide-by-a-zero-total problem when normalizing bins that are all zero. This can also avoid zero-probability bins by giving all bins a non-zero "prior." The divide-by-zero problem is avoided in any case: When normalizing and all bins are zero, the bin values are left at zero.

Attributes:

  • bin_boundaries (list of float) –

    Boundaries of the histogram bins. If ignore_extrema is True, bin_boundaries has length len(bin_centers) + 1 and surround all bins. If ignore_extrema is False, bin_boundaries has length len(bin_centers) - 1 and the first and last bins are open-ended, so bin_boundaries are boundaries between bins only.

  • bin_centers (list of float) –

    Centers of the histogram bins (used for plot labels)

  • bins (list of float) –

    (weighted) counts or probability of data points in each bin.

  • ignore_extrema (bool) –

    If True, values outside the bin boundaries are ignored.

Source code in amads/core/histogram.py
def __init__(
    self,
    bin_centers: Optional[list[float]] = None,
    bin_boundaries: Optional[list[float]] = None,
    interpolation: str = "linear",
    ignore_extrema: bool = False,
    initial_value: float = 0.0,
):
    if not bin_centers and not bin_boundaries:
        raise ValueError(
            "Must provide either bin_centers or " "bin_boundaries."
        )
    if not bin_boundaries:
        if ignore_extrema:
            raise ValueError(
                "When ignore_extrema is True, "
                "bin_boundaries must be provided."
            )
        centers = cast(list[float], bin_centers)
        bin_boundaries = centers_to_boundaries(centers, interpolation)
    elif bin_centers:  # we have both bin_boundaries and bin_centers
        blen = len(bin_boundaries)
        clen = len(bin_centers)
        if ignore_extrema:
            if blen != clen + 1:
                raise ValueError(
                    "When ignore_extrema is True, "
                    "len(bin_boundaries) must be len(bin_centers) + 1"
                )
        elif blen == clen + 1:  # allowed, but we trim the boundaries:
            bin_boundaries = bin_boundaries[1:-1]
        elif blen != clen - 1:
            raise ValueError(
                "When ignore_extrema is False, "
                "len(bin_boundaries) must be len(bin_centers) + 1 "
                "or len(bin_centers) - 1"
            )
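    if not bin_centers:
        bin_centers = boundaries_to_centers(bin_boundaries, interpolation)
        # open-ended bins do not use the outermost boundaries, so trim
        # them when ignore_extrema is False (see the assertion below)
        if not ignore_extrema:
            bin_boundaries = bin_boundaries[1:-1]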

    # now, we need len(bin_boundaries) to respect ignore_extrema
    blen = len(bin_boundaries)
    clen = len(bin_centers)
    assert (not ignore_extrema and (blen == clen - 1)) or (
        (ignore_extrema) and (blen == clen + 1)
    )

    self.ignore_extrema = ignore_extrema
    self.bins = [initial_value] * len(bin_centers)
    self.bin_centers = bin_centers
    self.bin_boundaries = bin_boundaries
Functions
find_bin
find_bin(value: float)

Find the index i of the first boundary that is greater than the given value. If the value is greater than or equal to the highest boundary, len(bin_boundaries) is returned.

Source code in amads/core/histogram.py
def find_bin(self, value: float):
    """
    Find the index i of the first boundary that is greater than
    the given value. If the value is greater than or equal to the
    highest boundary, len(bin_boundaries) is returned.
    """
    i = 0  # for the strange case of len(bin_boundaries) == 0
    for i in range(len(self.bin_boundaries)):
        if self.bin_boundaries[i] > value:
            return i
    return len(self.bin_boundaries)
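
Assuming the boundaries are sorted in ascending order, the linear scan above returns the same index as the standard library's bisect_right. The following sketch shows that equivalence with a hypothetical standalone function; it is an alternative formulation, not the library's method.

```python
from bisect import bisect_right
from typing import List


def find_bin_sorted(bin_boundaries: List[float], value: float) -> int:
    """Index of the first boundary greater than value, or len(bin_boundaries)."""
    return bisect_right(bin_boundaries, value)


boundaries = [1.0, 2.0, 3.0]
print([find_bin_sorted(boundaries, v) for v in (0.5, 1.0, 2.5, 3.0, 9.0)])
# [0, 1, 2, 3, 3]
```

A value equal to a boundary falls into the bin above that boundary, which matches the strict `>` comparison in find_bin.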
add_point
add_point(data: float, weight: float = 1.0)

Record one count or weight update to the histogram

Parameters:

  • data (float) –

    value to be recorded in the histogram

  • weight (float, default: 1.0 ) –

    weight to add to the appropriate bin (default is 1.0)

Returns:

  • Optional[int]

    bin number where the data point was recorded or None if data was out of bounds

Source code in amads/core/histogram.py
def add_point(self, data: float, weight: float = 1.0):
    """Record one count or weight update to the histogram
    Parameters
    ----------
    data : float
        value to be recorded in the histogram
    weight : float
        weight to add to the appropriate bin (default is 1.0)

    Returns
    -------
    Optional[int]
        bin number where the data point was recorded or
        None if data was out of bounds
    """
    # prevent a Histogram2D from using this method:
    if isinstance(self.bins[0], list):
        raise ValueError("Histogram2D must use add_point_2d method")
    i = self.find_bin(data)
    if self.ignore_extrema:
        if i == 0 or i == len(self.bins):
            return None  # out of bounds
        else:
            i -= 1  # bin[0] corresponds to bounds[1:2]
    self.bins[i] += weight
    return i
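
As a rough illustration of the ignore_extrema=False path, the sketch below mimics add_point on plain lists. It assumes three bins separated by two interior boundaries, so the first and last bins are open-ended; the module-level bins and bin_boundaries variables and the free function are stand-ins, not the amads API.

```python
from bisect import bisect_right

# Assumed setup: 3 bins, 2 interior boundaries, open-ended outer bins.
bin_boundaries = [1.5, 2.5]
bins = [0.0, 0.0, 0.0]


def add_point(data: float, weight: float = 1.0) -> int:
    """Stand-in for the ignore_extrema=False path: locate the bin, add weight."""
    i = bisect_right(bin_boundaries, data)  # index of first boundary above data
    bins[i] += weight
    return i


add_point(1.0)              # below 1.5: bin 0
add_point(2.0, weight=0.5)  # between 1.5 and 2.5: bin 1
add_point(99.0)             # above every boundary: last bin
```

After these three calls, bins holds [1.0, 0.5, 1.0]. The weight argument is what allows, for example, counts weighted by duration rather than raw counts.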
normalize
normalize()

Convert the histogram into a probability distribution. If all bins are zero, the resulting bins remain at zero.

Source code in amads/core/histogram.py
def normalize(self):
    """
    Convert the histogram into a probability distribution.
    If all bins are zero, the resulting bins remain at zero.
    """
    total = sum(self.bins)
    if total > 0:
        self.bins = [b / total for b in self.bins]
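
The same rule can be tried on a plain list. This is a minimal sketch in which normalize_bins is a hypothetical free function that returns a new list instead of updating self.bins in place.

```python
def normalize_bins(bins: list[float]) -> list[float]:
    """Scale bins so they sum to 1; leave all-zero bins unchanged."""
    total = sum(bins)
    if total > 0:
        return [b / total for b in bins]
    return list(bins)


probabilities = normalize_bins([1.0, 1.0, 2.0])  # [0.25, 0.25, 0.5]
still_zero = normalize_bins([0.0, 0.0])          # [0.0, 0.0], no division by zero
```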

Histogram2D

Histogram2D(
    bin_centers: Optional[list[float]] = None,
    bin_boundaries: Optional[list[float]] = None,
    interpolation: str = "linear",
    ignore_extrema: bool = False,
    initial_value: float = 0.0,
)

Bases: Histogram1D

Class for computing two-dimensional histograms.

Parameters:

  • bin_centers (list of float, default: None ) –

    Centers of the histogram bins.

  • bin_boundaries (list of float, default: None ) –

    boundaries of the histogram bins.

  • interpolation (str, default: 'linear' ) –

    Interpolation method for missing bin_centers or bin_boundaries. "linear" to use the average of neighboring values, "log" for geometric mean.

  • ignore_extrema (bool, default: False ) –

    If True, values below the lowest bin edge and above the highest bin edge are ignored. If False, they are counted in the first and last bins, respectively. Default is False.

  • initial_value (float, default: 0.0 ) –

    Initial value assigned to every bin.

Attributes:

  • bin_boundaries (list of float) –

    Boundaries of the histogram bins.

  • bin_centers (list of float) –

    Centers of the histogram bins (used for plot labels)

  • bins (list of list of float) –

    (weighted) counts or probabilities, indexed as bins[i][j], where i is the bin for data1 and j is the bin for data2.

  • ignore_extrema (bool) –

    If True, values outside the bin boundaries are ignored.

Source code in amads/core/histogram.py
def __init__(
    self,
    bin_centers: Optional[list[float]] = None,
    bin_boundaries: Optional[list[float]] = None,
    interpolation: str = "linear",
    ignore_extrema: bool = False,
    initial_value: float = 0.0,
):
    # Histogram1D takes care of the messy establishment of
    # centers and boundaries, which are the same for 2D:
    super().__init__(
        bin_centers, bin_boundaries, interpolation, ignore_extrema
    )
    # now we just have to fix bins to be 2D:
    self.bins = [
        [initial_value] * len(self.bins) for _ in range(len(self.bins))
    ]
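
The constructor builds the 2D bins with a list comprehension so that every row is a separate list. The short sketch below, using made-up sizes, shows the aliasing pitfall that approach avoids.

```python
n = 3
initial_value = 0.0

# Independent rows, built the same way as in the constructor above.
good = [[initial_value] * n for _ in range(n)]
good[0][1] += 1.0

# Pitfall: multiplying the outer list repeats ONE row object n times,
# so an update through any row shows up in all of them.
aliased = [[initial_value] * n] * n
aliased[0][1] += 1.0
```

Here good changes only in row 0, while every row of aliased reads [0.0, 1.0, 0.0].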
Functions
find_bin
find_bin(value: float)

Find the bin index i for a given value such that bin_boundaries[i] is the first boundary greater than value. If value is greater than or equal to the highest boundary, len(bin_boundaries) is returned.

Source code in amads/core/histogram.py
def find_bin(self, value: float):
    """
    Find the bin index i for a given value such that bin_boundaries[i]
    is the first boundary greater than value. If value is greater than
    or equal to the highest boundary, len(bin_boundaries) is returned.
    """
    i = 0  # for the strange case of len(bin_boundaries) == 0
    for i in range(len(self.bin_boundaries)):
        if self.bin_boundaries[i] > value:
            return i
    return len(self.bin_boundaries)
add_point
add_point(data: float, weight: float = 1.0)

Record one count or weight update to the histogram

Parameters:

  • data (float) –

    value to be recorded in the histogram

  • weight (float, default: 1.0 ) –

    weight to add to the appropriate bin (default is 1.0)

Returns:

  • Optional[int]

    bin number where the data point was recorded or None if data was out of bounds

Source code in amads/core/histogram.py
def add_point(self, data: float, weight: float = 1.0):
    """Record one count or weight update to the histogram
    Parameters
    ----------
    data : float
        value to be recorded in the histogram
    weight : float
        weight to add to the appropriate bin (default is 1.0)

    Returns
    -------
    Optional[int]
        bin number where the data point was recorded or
        None if data was out of bounds
    """
    # prevent a Histogram2D from using this method:
    if isinstance(self.bins[0], list):
        raise ValueError("Histogram2D must use add_point_2d method")
    i = self.find_bin(data)
    if self.ignore_extrema:
        if i == 0 or i == len(self.bins):
            return None  # out of bounds
        else:
            i -= 1  # bin[0] corresponds to bounds[1:2]
    self.bins[i] += weight
    return i
add_point_2d
add_point_2d(
    data1: Optional[float],
    data2: float,
    weight: float = 1.0,
    prev: Optional[int] = None,
)

Record one count or weight update to the histogram

A typical use is to record consecutive elements of a sequence as data1 along with the next element as data2. In this case, data2 will become data1 in the next call, so the returned bin index for data2 can be provided as prev in the next call to avoid recomputing the bin index for data1.

To further support this use case, if data1 is None, the histogram is not changed, but the bin index for data2 is still computed and returned. Thus, you can pass None for data1 and the first element as data2 to get things started.

If ignore_extrema is True and data1 or data2 is out of bounds, the histogram is not updated and None is returned. In that case, data1 should be passed as None in the next call, as if starting a new sequence.

Parameters:

  • data1 (Optional[float]) –

    value for dimension 1 (or None to skip)

  • data2 (float) –

    value for dimension 2

  • weight (float, default: 1.0 ) –

    weight to add to the appropriate bin (default is 1.0)

  • prev (Optional[int], default: None ) –

    optional previous bin index for data1; if provided, this value is used instead of recomputing the bin index for data1.

Returns:

  • Optional[int]

    bin number for data2, or None if data1 or data2 was out of bounds and ignore_extrema is True (in which case the bin index cannot be reused as prev in the next call).

Source code in amads/core/histogram.py
def add_point_2d(
    self,
    data1: Optional[float],
    data2: float,
    weight: float = 1.0,
    prev: Optional[int] = None,
):
    """Record one count or weight update to the histogram

    A typical use is to record consecutive elements of a sequence
    as data1 along with the next element as data2. In this case, data2
    will become data1 in the next call, so the returned bin index for
    data2 can be provided as prev in the next call to avoid recomputing
    the bin index for data1.

    To further support this use case, if data1 is None, the histogram
    is not changed, but the bin index for data2 is still computed and
    returned. Thus, you can pass None for data1 and the first element
    as data2 to get things started.

    If ignore_extrema is True and data1 or data2 is out of bounds, the
    histogram is not updated and None is returned, in which case data1
    should be passed as None in the next call as if starting a new
    sequence.

    Parameters
    ----------
    data1 : float
        value for dimension 1 (or None to skip)
    data2 : float
        value for dimension 2
    weight : float
        weight to add to the appropriate bin (default is 1.0)
    prev : Optional[int]
        optional previous bin index for data1; if provided, this
        value is used instead of recomputing the bin index for data1.

    Returns
    -------
    int
        bin number for data2 if data were used to add to
        the histogram, else None (which means the bin must
        be calculated).
    """
    i = None  # index for data1
    if data1 is not None:
        if prev:
            i = prev
        else:
            i = self.find_bin(data1)
            if i == 0 or i == len(self.bins):
                if self.ignore_extrema:
                    return None  # out of bounds

    j = self.find_bin(data2)  # index for data2
    if j == 0 or j == len(self.bins) + 1:
        if self.ignore_extrema:
            return None

    if i is not None:
        self.bins[i][j] += weight
    return j
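
The calling pattern described above (start with data1=None, then feed each returned bin index back in as prev) can be sketched as follows. This is a self-contained stand-in for the ignore_extrema=False path only: the free function, the module-level bins and bin_boundaries, and the example sequence are all assumptions for illustration. Unlike the listing, the stand-in tests prev against None explicitly.

```python
from bisect import bisect_right
from typing import Optional

bin_boundaries = [0.5, 1.5]           # assumed: 3 bins, open-ended at both ends
bins = [[0.0] * 3 for _ in range(3)]  # bins[i][j]: weight of moves from bin i to bin j


def add_point_2d(data1: Optional[float], data2: float,
                 weight: float = 1.0, prev: Optional[int] = None) -> int:
    """Stand-in for the ignore_extrema=False path of add_point_2d."""
    j = bisect_right(bin_boundaries, data2)  # bin index for data2
    if data1 is not None:
        # Reuse the caller-supplied index when there is one.
        i = prev if prev is not None else bisect_right(bin_boundaries, data1)
        bins[i][j] += weight
    return j


sequence = [0, 1, 1, 2, 0]
prev_value: Optional[float] = None  # None on the first call: nothing is recorded
prev_bin: Optional[int] = None
for value in sequence:
    prev_bin = add_point_2d(prev_value, value, prev=prev_bin)
    prev_value = value
```

The five elements produce four transitions (0 to 1, 1 to 1, 1 to 2, 2 to 0), so bins ends up as [[0, 1, 0], [0, 1, 1], [1, 0, 0]] in float form.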

Functions

boundaries_to_centers

boundaries_to_centers(
    boundaries: list[float], interpolation: str = "linear"
) -> list[float]

Convert bin boundaries to bin centers.

The lowest and highest boundaries are used only to compute the centers of the outermost bins, so the returned list has length len(boundaries) - 1. As a result, the lowest bin counts all values below boundaries[1], and the highest bin counts all values above boundaries[-2].

If interpolation is linear, the center between two boundaries x1 and x2 is (x1 + x2) / 2. If interpolation is 'log', the center is sqrt(x1 * x2).

Parameters:

  • boundaries (list[float]) –

    List of bin boundaries.

  • interpolation (str, default: 'linear' ) –

    "linear" for arithmetic mean, "log" for geometric mean.

Returns:

  • list[float]

    List of bin centers.

Source code in amads/core/histogram.py
def boundaries_to_centers(
    boundaries: list[float], interpolation: str = "linear"
) -> list[float]:
    """
    Convert bin boundaries to bin centers.

    The lower and upper boundaries are only used to compute the
    centers of the bins in between, so the returned list has length
    len(boundaries) - 1, so the lower bin will count all values
    below boundaries[1], and the upper bin will count all values
    above boundaries[-2].

    If interpolation is linear, the center between two boundaries x1
    and x2 is (x1 + x2) / 2. If interpolation is 'log', the center is
    sqrt(x1 * x2).

    Parameters
    ----------
    boundaries: list[float]
        List of bin boundaries.
    interpolation: str
        "linear" for arithmetic mean, "log" for geometric mean.

    Returns
    -------
    list[float]
        List of bin centers.
    """
    if interpolation == "linear":
        centers = [
            (boundaries[i] + boundaries[i + 1]) / 2
            for i in range(len(boundaries) - 1)
        ]
    elif interpolation == "log":
        centers = [
            math.sqrt(boundaries[i] * boundaries[i + 1])
            for i in range(len(boundaries) - 1)
        ]
    else:
        raise ValueError("interpolation must be 'linear' or 'log'")
    return centers
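
A small worked example of the two interpolation modes follows. The midpoints function is a hypothetical standalone re-statement of the listing above, so the snippet runs without importing amads.

```python
import math


def midpoints(values: list[float], interpolation: str = "linear") -> list[float]:
    """Arithmetic ("linear") or geometric ("log") mean of each adjacent pair."""
    pairs = list(zip(values, values[1:]))
    if interpolation == "linear":
        return [(a + b) / 2 for a, b in pairs]
    if interpolation == "log":
        return [math.sqrt(a * b) for a, b in pairs]
    raise ValueError("interpolation must be 'linear' or 'log'")


linear_centers = midpoints([0.0, 2.0, 4.0, 8.0])   # [1.0, 3.0, 6.0]
log_centers = midpoints([1.0, 4.0, 16.0], "log")   # [2.0, 8.0]
```

The "log" mode suits quantities that are naturally compared by ratio: with boundaries 1, 4, 16, each center is a factor of 2 away from both of its boundaries.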

centers_to_boundaries

centers_to_boundaries(
    centers: list[float], interpolation: str = "linear"
) -> list[float]

Convert bin centers to bin boundaries.

The returned list has length len(centers) - 1, with the first and last bins being open-ended.

To get a closed interval around upper or lower centers, simply add an additional center below or above and ignore the resulting values. In the case of a distribution, to truly throw out the outliers, you will need to extract the desired sub-vector or sub-matrix and re-normalize.

If interpolation is "linear", the boundary between two centers x1 and x2 is (x1 + x2) / 2. If interpolation is "log", the boundary is sqrt(x1 * x2).

Parameters:

  • centers (list[float]) –

    List of bin centers.

  • interpolation (str, default: 'linear' ) –

    "linear" for arithmetic mean, "log" for geometric mean.

Returns:

  • list[float]

    List of bin boundaries.

Source code in amads/core/histogram.py
def centers_to_boundaries(
    centers: list[float], interpolation: str = "linear"
) -> list[float]:
    """
    Convert bin centers to bin boundaries.

    The returned list has length len(centers) - 1, with the first and
    last bins being open-ended.

    To get a closed interval around upper or lower centers, simply add an
    additional center below or above and ignore the resulting values. In
    the case of a distribution, to truly throw out the outliers, you will
    need to extract the desired sub-vector or sub-matrix and re-normalize.

    If interpolation is "linear", the boundary between two centers
    x1 and x2 is (x1 + x2) / 2. If interpolation is "log", the boundary
    is sqrt(x1 * x2).

    Parameters
    ----------
    centers : list[float]
        List of bin centers.
    interpolation : str
        "linear" for arithmetic mean, "log" for geometric mean.

    Returns
    -------
    list[float]
        List of bin boundaries.
    """
    # strangely, this is the same function as boundaries_to_centers:
    return boundaries_to_centers(centers, interpolation)
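
The padding trick described above (add an extra center below and above, then discard those bins and re-normalize) might look like the following sketch. The centers, the data values, and the use of bisect_right for bin lookup are illustrative assumptions; the snippet does not call amads.

```python
from bisect import bisect_right

# Centers of interest are 60, 61, 62; 59 and 63 are padding centers whose
# bins exist only to catch outliers.
centers = [59.0, 60.0, 61.0, 62.0, 63.0]
boundaries = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
# boundaries == [59.5, 60.5, 61.5, 62.5]: one fewer than the number of centers

counts = [0.0] * len(centers)
for value in [40.0, 60.0, 60.2, 61.0, 62.4, 90.0]:
    counts[bisect_right(boundaries, value)] += 1.0

inner = counts[1:-1]  # drop the two padding bins (they hold 40.0 and 90.0)
total = sum(inner)
distribution = [c / total for c in inner] if total > 0 else inner
```

With this data, counts is [1, 2, 1, 1, 1] and the re-normalized distribution over the three inner bins is [0.5, 0.25, 0.25].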