arviz_base.references_to_dataset
- arviz_base.references_to_dataset(references, ds, sample_dims=None, ref_dim=None)
Generate a Dataset compatible with ds from references.
Cast common reference formats to a compatible Dataset. This function does not aim to be exhaustive; for anything peculiar or complex, building the Dataset manually will probably work better.
- Parameters:
- references : scalar or 1D array_like or dict or xarray.DataArray or xarray.Dataset
References to cast into a compatible dataset.
Scalar inputs are interpreted as a single reference line for each combination of variable and coordinate not in sample_dims.
Array-like inputs are interpreted as multiple reference lines for each combination of variable and coordinate not in sample_dims. Every subset gets the same references, and all references apply to every subset.
dict inputs are interpreted as array-like, with each array matched to the variable named by its dictionary key.
DataArray inputs are interpreted as array-like if unnamed, or as a single-key dictionary if named.
Dataset inputs are returned as is, without raising an error.
- ds : xarray.Dataset
Dataset containing the data that references should be compatible with.
- sample_dims : iterable of hashable, optional
Sample dimensions in ds. The dimensions in the output will be the dimensions in ds minus sample_dims, plus optionally a “ref_dim” dimension for non-scalar references.
- ref_dim : str or list, optional
Names for the new dimensions created during reference value broadcasting. Defaults to None. By default, “ref_dim” is added for 1D references and “ref_dim_x” for N-dimensional references when broadcasting over one or more variables.
- Returns:
xarray.Dataset
A Dataset containing a subset of the variables, dimensions, and coordinate names from ds, with additional “ref_dim” dimensions added when multiple references are requested for one or more variables.
See also
xarray.Dataset
Dataset constructor
Examples
Generate a reference dataset with a single reference value of 0, compatible with the centered eight example data:
from arviz_base import load_arviz_data, references_to_dataset
idata = load_arviz_data("centered_eight")
references_to_dataset(0, idata.posterior.dataset)
<xarray.Dataset> Size: 608B
Dimensions:  (school: 8)
Coordinates:
    chain    int64 8B 0
    draw     int64 8B 0
  * school   (school) <U16 512B 'Choate' 'Deerfield' ... 'Mt. Hermon'
Data variables:
    mu       int64 8B 0
    theta    (school) int64 64B 0 0 0 0 0 0 0 0
    tau      int64 8B 0
Attributes:
    created_at:                 2022-10-13T14:37:37.315398
    arviz_version:              0.13.0.dev0
    inference_library:          pymc
    inference_library_version:  4.2.2
    sampling_time:              7.480114936828613
    tuning_steps:               1000
Generate a reference dataset with different references for each variable:
references_to_dataset({"mu": -1, "tau": 1, "theta": 0}, idata.posterior.dataset)
<xarray.Dataset> Size: 600B
Dimensions:  (ref_dim: 1, school: 8)
Coordinates:
  * ref_dim  (ref_dim) int64 8B 0
  * school   (school) <U16 512B 'Choate' 'Deerfield' ... 'Mt. Hermon'
Data variables:
    mu       (ref_dim) int64 8B -1
    theta    (school, ref_dim) int64 64B 0 0 0 0 0 0 0 0
    tau      (ref_dim) int64 8B 1
Or a similar case, but with a different number of references for each variable:
ref_ds = references_to_dataset(
    {"mu": [-1, 0, 1], "tau": [1, 10], "theta": 0},
    idata.posterior.dataset
)
ref_ds
<xarray.Dataset> Size: 776B
Dimensions:  (ref_dim: 3, school: 8)
Coordinates:
  * ref_dim  (ref_dim) int64 24B 0 1 2
  * school   (school) <U16 512B 'Choate' 'Deerfield' ... 'Mt. Hermon'
Data variables:
    mu       (ref_dim) int64 24B -1 0 1
    theta    (school, ref_dim) float64 192B 0.0 nan nan 0.0 ... nan 0.0 nan nan
    tau      (ref_dim) float64 24B 1.0 10.0 nan
Once we have a compatible dataset we can, for example, compute the probability of the samples being above the reference value(s):
(idata.posterior.dataset > ref_ds).mean()
<xarray.Dataset> Size: 24B
Dimensions:  ()
Data variables:
    mu       float64 8B 0.8893
    theta    float64 8B 0.2809
    tau      float64 8B 0.3377
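Note that calling .mean() with no arguments reduces over every dimension, including ref_dim, and that comparisons against NaN evaluate to False, so NaN-padded reference slots count as "not above". If a separate probability per reference value is wanted, one possible approach is to reduce over the sample dimensions only and then mask the padded slots. The sketch below illustrates this with plain xarray on synthetic data; the posterior and refs objects are hypothetical stand-ins built by hand, not output of references_to_dataset.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for a posterior: 2 chains x 500 draws of one variable.
rng = np.random.default_rng(0)
posterior = xr.Dataset({"mu": (("chain", "draw"), rng.normal(size=(2, 500)))})

# Hand-built references with one NaN-padded slot, mimicking the layout of ref_ds.
refs = xr.Dataset(
    {"mu": ("ref_dim", [0.0, np.nan])},
    coords={"ref_dim": [0, 1]},
)

above = posterior > refs               # comparisons against NaN evaluate to False
prob = above.mean(["chain", "draw"])   # reduce over the sample dimensions only
prob = prob.where(refs.notnull())      # put NaN back in the padded slots
print(prob["mu"].values)
```

The result keeps the ref_dim dimension, with one probability per real reference value and NaN where no reference was given.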