arviz_base.references_to_dataset

arviz_base.references_to_dataset(references, ds, sample_dims=None, ref_dim=None)

Generate a Dataset compatible with ds from references.

Cast references provided in common formats to a compatible Dataset. This function does not aim to be exhaustive; for anything peculiar or complex, building the Dataset manually will probably work better.
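For cases the casting rules do not cover, a compatible Dataset can be assembled directly with xarray. The following is a minimal sketch, not part of the arviz_base API: `ds` is a synthetic stand-in for real data (the variable names `mu` and `theta` and the school labels are made up), and the only property illustrated is that the hand-built reference dataset shares variable names and non-sample dimensions with `ds`.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a posterior: sample dims "chain" and "draw",
# plus one extra "school" dimension on theta (all names are assumptions).
rng = np.random.default_rng(0)
schools = ["A", "B", "C"]
ds = xr.Dataset(
    {
        "mu": (("chain", "draw"), rng.normal(size=(2, 50))),
        "theta": (("chain", "draw", "school"), rng.normal(size=(2, 50, 3))),
    },
    coords={"school": schools},
)

# Hand-built references: a single value for mu and a different value
# per school for theta. No sample dims are present.
ref_ds = xr.Dataset(
    {
        "mu": xr.DataArray(0.0),
        "theta": xr.DataArray([-1.0, 0.0, 1.0], dims="school"),
    },
    coords={"school": schools},
)

# xarray broadcasts ref_ds against ds by variable name and dimension name.
above = (ds > ref_ds).mean(dim=["chain", "draw"])
print(above)
```

Building the dataset by hand allows a different reference per coordinate value, which the scalar and array-like shortcuts described below do not express.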

Parameters:
references : scalar or 1D array_like or dict or xarray.DataArray or xarray.Dataset

References to cast into a compatible dataset.

  • scalar inputs are interpreted as a single reference line for each combination of variable and coordinate values over the dimensions not in sample_dims.

  • array-like inputs are interpreted as multiple reference lines for each combination of variable and coordinate values over the dimensions not in sample_dims. Every subset gets the same references, and every reference applies to every subset.

  • dict inputs are interpreted as array-like with each array matched to the variable corresponding to that dictionary key.

  • DataArray inputs are interpreted as an array-like if unnamed or as a single-key dictionary if named.

  • Dataset inputs are returned as is; passing one does not raise an error.

ds : xarray.Dataset

Dataset containing the data that references should be compatible with.

sample_dims : iterable of hashable, optional

Sample dimensions in ds. The dimensions in the output will be the dimensions in ds minus sample_dims, plus optionally a “ref_dim” dimension for non-scalar references.

ref_dim : str or list, optional

Names for the new dimensions created during reference value broadcasting. Defaults to None, in which case “ref_dim” is added for 1D references and “ref_dim_x” for N-dimensional references when broadcasting over one or more variables.

Returns:
xarray.Dataset

A Dataset containing a subset of the variables, dimensions, and coordinate names from ds, with additional “ref_dim” dimensions added when multiple references are requested for one or more variables.

See also

xarray.Dataset

Dataset constructor

Examples

Generate a reference dataset with a single reference value of 0, compatible with the centered eight example data:

from arviz_base import load_arviz_data, references_to_dataset
idata = load_arviz_data("centered_eight")
references_to_dataset(0, idata.posterior.dataset)
<xarray.Dataset> Size: 608B
Dimensions:  (school: 8)
Coordinates:
    chain    int64 8B 0
    draw     int64 8B 0
  * school   (school) <U16 512B 'Choate' 'Deerfield' ... 'Mt. Hermon'
Data variables:
    mu       int64 8B 0
    theta    (school) int64 64B 0 0 0 0 0 0 0 0
    tau      int64 8B 0
Attributes:
    created_at:                 2022-10-13T14:37:37.315398
    arviz_version:              0.13.0.dev0
    inference_library:          pymc
    inference_library_version:  4.2.2
    sampling_time:              7.480114936828613
    tuning_steps:               1000

Generate a reference dataset with different references for each variable:

references_to_dataset({"mu": -1, "tau": 1, "theta": 0}, idata.posterior.dataset)
<xarray.Dataset> Size: 600B
Dimensions:  (ref_dim: 1, school: 8)
Coordinates:
  * ref_dim  (ref_dim) int64 8B 0
  * school   (school) <U16 512B 'Choate' 'Deerfield' ... 'Mt. Hermon'
Data variables:
    mu       (ref_dim) int64 8B -1
    theta    (school, ref_dim) int64 64B 0 0 0 0 0 0 0 0
    tau      (ref_dim) int64 8B 1

Or a similar case, but with a different number of references for each variable:

ref_ds = references_to_dataset(
    {"mu": [-1, 0, 1], "tau": [1, 10], "theta": 0},
    idata.posterior.dataset
)
ref_ds
<xarray.Dataset> Size: 776B
Dimensions:  (ref_dim: 3, school: 8)
Coordinates:
  * ref_dim  (ref_dim) int64 24B 0 1 2
  * school   (school) <U16 512B 'Choate' 'Deerfield' ... 'Mt. Hermon'
Data variables:
    mu       (ref_dim) int64 24B -1 0 1
    theta    (school, ref_dim) float64 192B 0.0 nan nan 0.0 ... nan 0.0 nan nan
    tau      (ref_dim) float64 24B 1.0 10.0 nan

Once we have a compatible dataset, we can, for example, compute the probability of the samples being above the reference value(s):

(idata.posterior.dataset > ref_ds).mean()
<xarray.Dataset> Size: 24B
Dimensions:  ()
Data variables:
    mu       float64 8B 0.8893
    theta    float64 8B 0.2809
    tau      float64 8B 0.3377
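Note that a bare `.mean()` reduces over every dimension, including `ref_dim`. Because comparisons against NaN evaluate to False, NaN-padded entries of a ragged reference dataset count as "not above" in that average. If one probability per reference is wanted, one possible approach is to reduce over the sample dimensions only and then mask the padded slots. The sketch below uses synthetic data (not the centered eight posterior) and assumes sample dimensions named `chain` and `draw`; it is an illustration, not a documented arviz_base workflow.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Synthetic samples with sample dims ("chain", "draw"); values are made up.
samples = xr.DataArray(
    rng.normal(size=(4, 250)), dims=("chain", "draw"), name="tau"
)

# Two real references padded with NaN to length 3, mimicking the
# NaN padding shown for ragged dict inputs.
refs = xr.DataArray([-1.0, 1.0, np.nan], dims="ref_dim", name="tau")

above = samples > refs                    # comparisons with NaN are False
prob = above.mean(dim=["chain", "draw"])  # one probability per reference
prob = prob.where(refs.notnull())         # put NaN back in padded slots
print(prob.values)
```

Keeping `ref_dim` in the result makes each probability attributable to a specific reference value, and the final `where` avoids reporting a misleading 0 for slots that never held a reference.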