Distribution Module
Distribution
Distribution(
name: str,
data: List[Any],
distribution_type: str,
dimensions: List[int],
x_categories: List[Union[int, float, str]],
x_label: str,
y_categories: Optional[List[Union[int, float, str]]],
y_label: str,
)
Represents a probability distribution or histogram and its metadata.
See histogram.py for Histogram1D and Histogram2D, which facilitate the computation of distributions from data.
Author: Roger B. Dannenberg
Parameters:
- name (str) – The name of the distribution, used for plot titles.
- data (List[Any]) – The data points for the distribution.
- distribution_type (str) – The type of distribution. Currently used values are described below; "weights" can mean either probabilities or raw counts.
- "pitch_class" - weights for 12 pitch classes, possibly weighted by duration.
- "interval" - weights for pitch intervals, possibly weighted by duration.
- "pitch_class_interval" - weights for pitch class intervals, possibly weighted by duration. (mod 12, not currently used; should there also be a pitch_class_size based on absolute interval size mod 12? And if so, should there be an interval_class computed from interval mod 12? And interval_size_class based on absolute interval size mod 12? Note that "interval" ignores intervals larger than one octave.)
- "duration" - weights for durations
- "interval_size" - weights for interval sizes, possibly weighted by duration.
- "interval_direction" - proportion of upward intervals for each interval size, possibly weighted by duration.
- "pitch_class_transition" - weights for pitch class transitions, possibly weighted by duration (2-dimensional distribution).
- "interval_transition" - weights for interval transitions, possibly weighted by duration (2-dimensional distribution).
- "duration_transition" - weights for duration transitions, possibly weighted by duration (2-dimensional distribution).
- "key_correlation" - weights for key correlations, generally correlation with 12 major key profiles followed by correlations with 12 minor key profiles.
- "symmetric_key_profile" - weights for symmetric key profiles. Key profiles themselves are distributions. Symmetric keys use the same 12 weights for all keys (e.g., Krumhansl-Kessler), simply rotated for each key.
- "asymmetric_key_profile" - weights for asymmetric key profiles, where each key has its own set of 12 weights (e.g., Bellman-Budge).
- "root_support_weights" - root support weights (see amads.harmony.root_finding.parncutt)
This list is open-ended and is currently just informational. The value is not used for plotting or any other purpose.
- dimensions (List[int]) – The dimensions of the distribution, e.g. [12] for a pitch class distribution or [25, 25] for an interval_transition distribution (intervals range from -12 to +12, including 0 for unison; intervals larger than one octave are ignored).
- x_categories (List[Union[int, float, str]]) – The categories for the x-axis.
- x_label (str) – The label for the x-axis.
- y_categories (Optional[List[Union[int, float, str]]]) – The categories for the y-axis, if any, otherwise None.
- y_label (str) – The label for the y-axis.
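To make the `dimensions` convention concrete, here is a minimal sketch of how a 25 x 25 interval_transition table could be filled from a melody. The function name `interval_transition_counts` and the use of MIDI key numbers as input are assumptions for illustration, not part of amads; the sketch maps an interval of k semitones to index k + 12 and skips any transition involving an interval larger than one octave, as described above.

```python
from typing import List


def interval_transition_counts(pitches: List[int]) -> List[List[float]]:
    """Hypothetical helper: row/column k stands for an interval of k - 12 semitones."""
    counts = [[0.0] * 25 for _ in range(25)]
    # Successive intervals between consecutive MIDI key numbers.
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    for first, second in zip(intervals, intervals[1:]):
        # Intervals larger than one octave are ignored.
        if abs(first) > 12 or abs(second) > 12:
            continue
        counts[first + 12][second + 12] += 1.0
    return counts


melody = [60, 62, 64, 62, 60, 72]  # C D E D C C' as MIDI key numbers
table = interval_transition_counts(melody)
```

The resulting `table`, together with `list(range(-12, 13))` as both x_categories and y_categories, is the kind of data a 2-dimensional Distribution with dimensions [25, 25] would hold.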
Attributes: same as Parameters (above).
Source code in amads/core/distribution.py (lines 134-155)
Functions
normalize
normalize()
Convert weights or counts to a probability distribution that sums to 1.
Source code in amads/core/distribution.py (lines 157-162)
plot
plot(
color: Optional[str] = None,
option: Optional[str] = None,
show: bool = True,
fig: Optional[Figure] = None,
ax: Optional[Axes] = None,
) -> Figure
Virtual plot function for Distribution. It allows standalone plotting of a Distribution (when fig and ax are None) while remaining extensible: this plot function, or an overridden version in a subclass, can be invoked for subplotting by passing fig and ax as arguments.
Parameters:
- color (Optional[str], default: None) – Plot color string specification. In this plot function, it is used for 1-D distributions and ignored for 2-D distributions. None selects the default (Distribution.DEFAULT_BAR_COLOR).
- option (Optional[str], default: None) – Plot style string specification. In this plot function, only "bar" and "line" are handled, and only for 1-D distributions; the argument is ignored for 2-D distributions. None selects the default ("bar").
- show (bool, default: True) – Whether to call plt.show() at the end.
- fig (Figure, default: None) – An existing Figure to draw on; if omitted, a new figure is created.
- ax (Axes, default: None) – Existing axes to draw on; if omitted, a new figure and axes are created.
Raises:
- ValueError – If ax (Axes) is provided but fig (Figure) is not, or if dims is not 1 or 2.
Notes
Behavior specific to this plot method:
- 1-D: bar chart (default), or line chart when option is "line"
- 2-D: heatmap
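The rules above (which style applies to which dimensionality, and when ValueError is raised) can be summarized in a small, matplotlib-free sketch. `choose_plot_style` is a hypothetical helper, not the amads implementation; it only reports what would be drawn. How the real method treats option strings other than "bar" and "line" is not documented here, so this sketch passes them through unchanged.

```python
from typing import Any, List, Optional


def choose_plot_style(dimensions: List[int], option: Optional[str] = None,
                      fig: Any = None, ax: Any = None) -> str:
    """Hypothetical helper: return "bar", "line" or "heatmap" per the documented rules."""
    if ax is not None and fig is None:
        raise ValueError("ax was provided without fig")
    dims = len(dimensions)
    if dims not in (1, 2):
        raise ValueError(f"dims must be 1 or 2, got {dims}")
    if dims == 2:
        return "heatmap"  # option (and color) are ignored for 2-D data
    # 1-D: None means the default "bar"; other strings are passed through as-is.
    return option if option is not None else "bar"


print(choose_plot_style([12]), choose_plot_style([12], "line"), choose_plot_style([25, 25]))
```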
Source code in amads/core/distribution.py (lines 164-273)
plot_multiple (classmethod)
plot_multiple(
dists: List[Distribution],
show: bool = True,
options: Optional[Union[str, List[str]]] = None,
colors: Optional[Union[str, List[str]]] = None,
) -> Optional[Figure]
Plot multiple distributions into a single Figure using vertically stacked subplots.
Returns:
- Figure or None – A matplotlib Figure when at least one distribution is plotted; None when dists is empty.
Parameters:
- dists (list[Distribution]) – Distributions to plot. 2-D distributions are rendered as heatmaps; 1-D distributions below them.
- show (bool, default: True) – Whether to call plt.show() at the end.
- options (str | list[str] | None, default: None) – Plot style per distribution (e.g. "bar" or "line"). If a single string is given, it is broadcast to all distributions. If None, defaults to "bar".
- colors (str | list[str] | None, default: None) – Color per distribution. If a single string is given, it is broadcast to all 1-D distributions. If None, defaults to the single color Distribution.DEFAULT_BAR_COLOR.
Notes
- Distributions are plotted in the same order as they appear in dists.
- Any Distribution, or class inheriting from it, that implements a valid plot method has its plot added to the figure at the specified axes.
- options and colors apply to all distributions. Although the base plot method uses option and color only in the 1-D case, a class inheriting from Distribution may use these arguments in other cases.
- You can pass either a list (one entry per distribution) or a single string, which is broadcast to all inputs. For example, options="line" makes all 1-D plots line charts.
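The broadcasting of `options` and `colors` described above could be implemented along these lines. `broadcast` is a hypothetical helper, and raising ValueError for a list of the wrong length is an assumption, since the documentation does not say what happens in that case.

```python
from typing import List, Optional, Union


def broadcast(value: Optional[Union[str, List[str]]], n: int, default: str) -> List[str]:
    """Hypothetical helper: expand a per-distribution argument to one entry per distribution."""
    if value is None:
        return [default] * n  # None: use the default for every distribution
    if isinstance(value, str):
        return [value] * n  # single string: broadcast to all
    if len(value) != n:  # assumption: a mismatched list is treated as an error
        raise ValueError(f"expected {n} entries, got {len(value)}")
    return list(value)


options = broadcast("line", 3, default="bar")
# "tab:blue" is a placeholder, not necessarily Distribution.DEFAULT_BAR_COLOR.
colors = broadcast(None, 3, default="tab:blue")
```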
Source code in amads/core/distribution.py (lines 285-360)
plot_grouped_1d (classmethod)
plot_grouped_1d(
dists: List[Distribution],
show: bool = True,
options: Optional[Union[str, List[str]]] = None,
colors: Optional[Union[str, List[str]]] = None,
) -> Optional[Figure]
Overlay multiple 1-D distributions on a single axes.
This function draws all input 1-D distributions in one matplotlib Axes so that each category (x bin) shows a "group" of values, one per distribution. You can mix plotting styles using the options argument (for example, some series as bars and others as lines with markers). Colors are controlled via the colors argument.
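Grouped bar charts in matplotlib are commonly drawn by shifting each series by a fraction of the spacing between categories. The sketch below computes such x positions; whether plot_grouped_1d uses exactly this layout, including the assumed total group width of 0.8, is an assumption, and `grouped_bar_positions` is a hypothetical helper.

```python
from typing import List


def grouped_bar_positions(n_categories: int, n_series: int,
                          group_width: float = 0.8) -> List[List[float]]:
    """Hypothetical helper: x positions [series][category] for side-by-side bars."""
    width = group_width / n_series  # each bar gets an equal share of the group
    positions = []
    for k in range(n_series):
        # Offsets are symmetric around the integer category position.
        offset = (k - (n_series - 1) / 2) * width
        positions.append([c + offset for c in range(n_categories)])
    return positions


pos = grouped_bar_positions(n_categories=3, n_series=2)
```

Each row of `pos` could then be passed as the x argument of matplotlib's `Axes.bar`, with `width=group_width / n_series`, to draw one series.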
Parameters:
- dists (list[Distribution]) – 1-D distributions to compare in a single plot.
- show (bool, default: True) – Whether to call plt.show() at the end.
- options (str | list[str] | None, default: None) – Per-distribution plot style. Allowed values: "bar" or "line". You can provide a single string to apply to all series (broadcast), or a list with length len(dists). If None, all series default to "bar".
- colors (str | list[str] | None, default: None) – Per-distribution color. You can provide a single string to apply to all series (broadcast), or a list with length len(dists). If None, a distinct default color palette is applied (the rcParams color cycle or the tab10 palette).
Returns:
- Figure or None – A matplotlib Figure if any distributions are plotted; None when dists is empty.
Constraints
- Only 1-D distributions are accepted. All inputs must have the same length (number of categories) so they can be grouped per category.
- The x/y labels and category names are taken from the first distribution in dists. Hence, this function does not support overlaying 1-D distributions with different categories and labels.
How this differs from plot_multiple
- plot_grouped_1d overlays all 1-D distributions on a single axes, which makes:
  - per-category (bin-by-bin) comparison intuitive and compact for grouped bar graphs
  - comparison of trends (gradients) intuitive and compact for overlaid line graphs.
  Since all distributions share a single plot, they can all be compared using a single legend.
- plot_multiple creates a vertical stack of subplots, one per distribution, using each Distribution's own plot method (and also supports 2-D heatmaps).
Source code in amads/core/distribution.py (lines 362-523)
Histogram Module
histogram
Compute histograms and distributions.
This module provides Histogram1D and Histogram2D classes for computing one-dimensional and two-dimensional histograms, respectively. Histograms can be normalized to form probability distributions.
The bins attribute can be directly assigned to the data attribute of the Distribution class in core.distribution.
Histogram bins can be specified either by their centers or their boundaries. When centers are provided:
- the number of centers gives the number of bins
- if ignore_extrema is False, the first and last bins are open-ended: the first bin also counts all values below the first center, and the last bin all values above the last center. Boundaries can be computed from centers using either linear or logarithmic interpolation. If provided, boundaries can have length len(centers) + 1, in which case the first and last values are ignored (since the bins are open-ended); otherwise, boundaries have length len(centers) - 1.
- if ignore_extrema is True, the first and last bins are closed, and values outside the bin boundaries are ignored. In this case, boundaries must be provided and have length len(centers) + 1.
When centers are not provided, boundaries must be provided:
- the number of bins is len(boundaries) - 1
- bin centers can be computed as arithmetic or geometric means of boundaries.
- if ignore_extrema is False, the upper and lower boundaries are ignored, making the first and last bins open-ended.
- if ignore_extrema is True, the first and last bins are closed, and values outside the bin boundaries are ignored.
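The length rules above can be checked mechanically. `bin_layout` is a hypothetical validation helper written for illustration, not part of amads; it returns the number of bins implied by the arguments and raises ValueError for combinations the rules do not allow. It checks lengths only, not whether the values are sorted.

```python
from typing import List, Optional


def bin_layout(centers: Optional[List[float]], boundaries: Optional[List[float]],
               ignore_extrema: bool) -> int:
    """Hypothetical helper: number of bins implied by centers/boundaries."""
    if centers is not None:
        n = len(centers)
        if ignore_extrema:
            # Closed bins: boundaries are required and surround every bin.
            if boundaries is None or len(boundaries) != n + 1:
                raise ValueError("closed bins need len(centers) + 1 boundaries")
        elif boundaries is not None and len(boundaries) not in (n - 1, n + 1):
            # Open-ended bins: only the inner boundaries matter.
            raise ValueError("open-ended bins take len(centers) - 1 or + 1 boundaries")
        return n
    if boundaries is None:
        raise ValueError("either centers or boundaries must be given")
    return len(boundaries) - 1


print(bin_layout([1.0, 2.0, 3.0], [0.5, 1.5, 2.5, 3.5], ignore_extrema=True))
```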
Classes
Histogram1D
Histogram1D(
bin_centers: Optional[list[float]] = None,
bin_boundaries: Optional[list[float]] = None,
interpolation: str = "linear",
ignore_extrema: bool = False,
initial_value: float = 0.0,
)
Class for computing one-dimensional histograms.
Parameters:
- bin_centers (list of float, default: None) – Centers of the histogram bins.
- bin_boundaries (list of float, default: None) – Boundaries of the histogram bins.
- interpolation (str, default: 'linear') – Interpolation method for computing missing bin_centers or bin_boundaries: "linear" uses the arithmetic mean of neighboring values, "log" the geometric mean.
- ignore_extrema (bool, default: False) – If True, values below the lowest bin boundary and above the highest bin boundary are ignored. If False, they are counted in the first and last bins, respectively.
- initial_value (float, default: 0.0) – All bins are initially set to this value. A non-zero value avoids a zero total when normalizing, and can also avoid zero-probability bins by giving every bin a non-zero "prior." (Division by zero is avoided in any case: when normalizing and all bins are zero, the bin values are left at zero.)
Attributes:
- bin_boundaries (list of float) – Boundaries of the histogram bins. If ignore_extrema is True, bin_boundaries has length len(bin_centers) + 1 and surrounds all bins. If ignore_extrema is False, bin_boundaries has length len(bin_centers) - 1 and the first and last bins are open-ended, so bin_boundaries contains only the boundaries between bins.
- bin_centers (list of float) – Centers of the histogram bins (used for plot labels).
- bins (list of float) – (Weighted) counts or probabilities of data points in each bin.
- ignore_extrema (bool) – If True, values outside the bin boundaries are ignored.
Source code in amads/core/histogram.py (lines 156-208)
Functions
find_bin
find_bin(value: float)
Find the bin index for a given value: the returned index i refers to the next boundary above value. If the value is greater than or equal to the highest boundary, len(bin_boundaries) is returned.
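Python's `bisect.bisect_right` returns exactly this kind of index (the position of the first element greater than the value), so find_bin can be sketched in one line. This is an illustration assuming sorted boundaries, not necessarily how amads implements it; note that in this sketch a value equal to an inner boundary maps to the index of the boundary after it.

```python
from bisect import bisect_right
from typing import List


def find_bin(bin_boundaries: List[float], value: float) -> int:
    """Sketch: return i such that bin_boundaries[i] is the first boundary above value."""
    # bisect_right skips every boundary <= value, so a value equal to or
    # greater than the highest boundary yields len(bin_boundaries).
    return bisect_right(bin_boundaries, value)


boundaries = [1.5, 2.5, 3.5]
indices = [find_bin(boundaries, v) for v in (0.0, 2.0, 2.5, 3.5, 10.0)]
```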
Source code in amads/core/histogram.py (lines 210-220)
add_point
add_point(data: float, weight: float = 1.0)
Record one count or weight update to the histogram.
Parameters:
- data (float) – Value to be recorded in the histogram.
- weight (float, default: 1.0) – Weight to add to the appropriate bin.
Returns:
- Optional[int] – Bin number where the data point was recorded, or None if data was out of bounds.
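A minimal sketch of how add_point might map a value to a bin under both ignore_extrema settings, assuming the boundary conventions described for the bin_boundaries attribute. `TinyHistogram` is a hypothetical stand-in, not amads code; treating a value equal to the highest boundary as out of bounds follows from the find_bin description but is still an assumption.

```python
from bisect import bisect_right
from typing import List, Optional


class TinyHistogram:
    """Hypothetical stand-in for Histogram1D's counting logic (not amads code)."""

    def __init__(self, boundaries: List[float], ignore_extrema: bool,
                 initial_value: float = 0.0):
        self.boundaries = boundaries
        self.ignore_extrema = ignore_extrema
        # Closed bins: n + 1 boundaries surround n bins.
        # Open-ended bins: n - 1 boundaries separate n bins.
        n_bins = len(boundaries) - 1 if ignore_extrema else len(boundaries) + 1
        self.bins = [initial_value] * n_bins

    def add_point(self, data: float, weight: float = 1.0) -> Optional[int]:
        i = bisect_right(self.boundaries, data)  # index of next boundary above data
        if self.ignore_extrema:
            if i == 0 or i == len(self.boundaries):
                return None  # below the lowest or at/above the highest boundary
            i -= 1  # bin i - 1 lies between boundaries[i - 1] and boundaries[i]
        self.bins[i] += weight
        return i


h = TinyHistogram([1.5, 2.5], ignore_extrema=False)
recorded = [h.add_point(0.0), h.add_point(2.0), h.add_point(9.0, weight=2.0)]
```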
Source code in amads/core/histogram.py (lines 222-247)
normalize
normalize()
Convert the histogram into a probability distribution. If all bins are zero, the resulting bins remain at zero.
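Normalization divides each bin by the total and leaves all-zero bins untouched. The sketch below is a hypothetical free function operating on a plain list rather than the method itself; it also shows the effect of a non-zero initial_value acting as a prior.

```python
from typing import List


def normalize(bins: List[float]) -> List[float]:
    """Sketch: return probabilities summing to 1; all-zero input is returned unchanged."""
    total = sum(bins)
    if total == 0:
        return list(bins)  # leave all-zero bins at zero instead of dividing by zero
    return [b / total for b in bins]


counts = [1.0, 0.0]
prior = 1.0  # plays the role of initial_value
smoothed = normalize([prior + c for c in counts])
```

With a prior of 1.0 and observed counts of 1 and 0, the bins hold 2.0 and 1.0, so the result is [2/3, 1/3] instead of containing a zero-probability bin.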
Source code in amads/core/histogram.py (lines 249-256)
Histogram2D
Histogram2D(
bin_centers: Optional[list[float]] = None,
bin_boundaries: Optional[list[float]] = None,
interpolation: str = "linear",
ignore_extrema: bool = False,
initial_value: float = 0.0,
)
Bases: Histogram1D
Class for computing two-dimensional histograms.
Parameters:
- bin_centers (list of float, default: None) – Centers of the histogram bins.
- bin_boundaries (list of float, default: None) – Boundaries of the histogram bins.
- interpolation (str, default: 'linear') – Interpolation method for computing missing bin_centers or bin_boundaries: "linear" uses the arithmetic mean of neighboring values, "log" the geometric mean.
- ignore_extrema (bool, default: False) – If True, values below the lowest bin boundary and above the highest bin boundary are ignored. If False, they are counted in the first and last bins, respectively.
Attributes:
- bin_boundaries (list of float) – Boundaries of the histogram bins.
- bin_centers (list of float) – Centers of the histogram bins (used for plot labels).
- bins (list of float) – (Weighted) counts or probabilities of data points in each bin.
- ignore_extrema (bool) – If True, values outside the bin boundaries are ignored.
Source code in amads/core/histogram.py (lines 289-305)
Functions
find_bin and add_point are inherited unchanged from Histogram1D (same source, amads/core/histogram.py lines 210-220 and 222-247); see their documentation under Histogram1D above.
add_point_2d
add_point_2d(
data1: Optional[float],
data2: float,
weight: float = 1.0,
prev: Optional[int] = None,
)
Record one count or weight update to the histogram.
A typical use is to record consecutive elements of a sequence: each element as data1 along with the next element as data2. In this case, data2 becomes data1 in the next call, so the bin index returned for data2 can be provided as prev in the next call to avoid re-computing the bin index for data1.
To further support this use case, if data1 is None, the histogram is not changed, but the bin index for data2 is still computed and returned. Thus, you can pass None for data1 and the first element as data2 to get things started.
If the histogram is not updated because data1 or data2 is out of bounds (and ignore_extrema is True), None is returned, in which case data1 should be passed as None in the next call, as if starting a new sequence.
Parameters:
- data1 (Optional[float]) – Value for dimension 1 (or None to skip).
- data2 (float) – Value for dimension 2.
- weight (float, default: 1.0) – Weight to add to the appropriate bin.
- prev (Optional[int], default: None) – Optional previous bin index for data1; if provided, this value is used instead of recomputing the bin index for data1.
Returns:
- Optional[int] – Bin number for data2 (also when data1 is None and nothing is recorded), or None if data1 or data2 is out of bounds and ignore_extrema is True, in which case the next call must compute the bin index itself.
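The calling pattern described above (start each sequence with data1=None, pass the returned bin index back as prev, and restart after a None result) is sketched below. `TinyTransitionHistogram` is a hypothetical stand-in over integer categories 0..n-1 that treats every other value as out of bounds, roughly as if ignore_extrema were True; it follows the start-up behavior described above by returning data2's bin index when data1 is None.

```python
from typing import List, Optional


class TinyTransitionHistogram:
    """Hypothetical stand-in showing the add_point_2d calling pattern (not amads code)."""

    def __init__(self, n: int):
        self.n = n
        self.bins: List[List[float]] = [[0.0] * n for _ in range(n)]

    def find_bin(self, value: int) -> Optional[int]:
        # Categories are the integers 0..n-1; anything else is out of bounds.
        return value if 0 <= value < self.n else None

    def add_point_2d(self, data1: Optional[int], data2: int,
                     weight: float = 1.0, prev: Optional[int] = None) -> Optional[int]:
        j = self.find_bin(data2)
        if data1 is None:
            return j  # start of a sequence: nothing recorded, but j is returned
        i = prev if prev is not None else self.find_bin(data1)
        if i is None or j is None:
            return None  # out of bounds: caller restarts with data1=None
        self.bins[i][j] += weight
        return j


seq = [0, 1, 1, 2, 7, 2, 3]  # 7 is out of bounds and breaks the sequence
hist = TinyTransitionHistogram(4)
prev_value: Optional[int] = None
prev_bin: Optional[int] = None
for x in seq:
    prev_bin = hist.add_point_2d(prev_value, x, prev=prev_bin)
    # After a None result, start over as if beginning a new sequence.
    prev_value = x if prev_bin is not None else None
```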
Source code in amads/core/histogram.py (lines 307-368)
Functions
boundaries_to_centers
boundaries_to_centers(
boundaries: list[float], interpolation: str = "linear"
) -> list[float]
Convert bin boundaries to bin centers.
The lower and upper boundaries are used only to compute the centers of the first and last bins, so the returned list has length len(boundaries) - 1. The lowest bin will count all values below boundaries[1], and the highest bin will count all values above boundaries[-2].
If interpolation is "linear", the center between two boundaries x1 and x2 is (x1 + x2) / 2. If interpolation is "log", the center is sqrt(x1 * x2).
Parameters:
- boundaries (list[float]) – List of bin boundaries.
- interpolation (str, default: 'linear') – "linear" for arithmetic mean, "log" for geometric mean.
Returns:
- list[float] – List of bin centers.
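A direct transcription of the two interpolation formulas above, as a sketch rather than the amads source; raising ValueError for any other interpolation string is an assumption.

```python
import math
from typing import List


def boundaries_to_centers(boundaries: List[float], interpolation: str = "linear") -> List[float]:
    """Sketch: one center between each pair of neighboring boundaries."""
    pairs = list(zip(boundaries, boundaries[1:]))
    if interpolation == "linear":
        return [(x1 + x2) / 2 for x1, x2 in pairs]  # arithmetic mean
    if interpolation == "log":
        return [math.sqrt(x1 * x2) for x1, x2 in pairs]  # geometric mean
    raise ValueError(f"unknown interpolation {interpolation!r}")


print(boundaries_to_centers([1.0, 4.0, 16.0], "log"))
```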
Source code in amads/core/histogram.py (lines 38-78)
centers_to_boundaries
centers_to_boundaries(
centers: list[float], interpolation: str = "linear"
) -> list[float]
Convert bin centers to bin boundaries.
The returned list has length len(centers) - 1, with the first and last bins being open-ended.
To get a closed interval around the upper or lower centers, simply add an additional center below or above and ignore the values counted in the resulting extra bins. In the case of a distribution, to truly throw out the outliers, you will need to extract the desired sub-vector or sub-matrix and re-normalize.
If interpolation is "linear", the boundary between two centers x1 and x2 is (x1 + x2) / 2. If interpolation is "log", the boundary is sqrt(x1 * x2).
Parameters:
- centers (list[float]) – List of bin centers.
- interpolation (str, default: 'linear') – "linear" for arithmetic mean, "log" for geometric mean.
Returns:
- list[float] – List of bin boundaries.
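With linear interpolation, each boundary is the mean of two neighboring centers, so n centers yield n - 1 boundaries. The sketch below applies the padding trick described above: one extra center is added at each end, the extra outer bins are dropped after counting, and the remaining bins are re-normalized. `pairwise_means` is a hypothetical stand-in for centers_to_boundaries with "linear" interpolation, and the counts are example values for illustration, not real data.

```python
from typing import List


def pairwise_means(xs: List[float]) -> List[float]:
    """Hypothetical stand-in: boundary between neighboring centers is their mean."""
    return [(a + b) / 2 for a, b in zip(xs, xs[1:])]


centers = [1.0, 2.0, 3.0]
padded = [0.0] + centers + [4.0]  # extra centers one step beyond each end
boundaries = pairwise_means(padded)  # open-ended bins around all 5 padded centers

# Example counts for the 5 padded bins; the first and last hold the outliers.
counts = [5.0, 2.0, 3.0, 5.0, 10.0]
inner = counts[1:-1]  # keep only the bins for the original centers
total = sum(inner)
probs = [c / total for c in inner]  # re-normalize after discarding outliers
```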
Source code in amads/core/histogram.py (lines 81-112)