General purpose modeling tools

“Tools” can include, inter alia:

  • Code for retrieving data from specific data sources and adapting it for use with message_ix_models.

  • Code for modifying scenarios. Tools for building models, however, should go in message_ix_models.model.

Exogenous data (tools.exo_data)

Generic tools for working with exogenous data sources.

MEASURES

Supported measures.

SOURCES

Known sources for data.

DemoSource(source, source_kw)

Example source of exogenous population and GDP data.

ExoDataSource(source, source_kw)

Base class for sources of exogenous data.

iamc_like_data_for_query(path, query, *[, ...])

Load data from path in IAMC-like format and transform to Quantity.

prepare_computer(context, c[, source, ...])

Prepare c to compute GDP, population, or other exogenous data.

register_source(cls)

Register ExoDataSource cls as a source of exogenous data.

class message_ix_models.tools.exo_data.DemoSource(source, source_kw)

Example source of exogenous population and GDP data.

Parameters:
  • source (str) – Must be like test s1, where “s1” is a scenario ID from (“s0”…“s4”).

  • source_kw (dict) – Must contain an element “measure”, one of MEASURES.

id: str = 'DEMO'

Identifier for this particular source.

static random_data()

Generate some random data with n, y, s, and v dimensions.

message_ix_models.tools.exo_data.MEASURES = ('GDP', 'POP')

Supported measures. Subclasses of ExoDataSource may provide support for other measures.

Todo

Store this in a separate code list or concept scheme.

message_ix_models.tools.exo_data.SOURCES: Dict[str, Type[ExoDataSource]] = {'ADVANCE': <class 'message_ix_models.project.advance.data.ADVANCE'>, 'DEMO': <class 'message_ix_models.tools.exo_data.DemoSource'>, 'GEA': <class 'message_ix_models.project.gea.data.GEA'>, 'GFEI': <class 'message_ix_models.tools.gfei.GFEI'>, 'IEA EEI': <class 'message_ix_models.tools.iea.eei.IEA_EEI'>, 'IEA_EWEB': <class 'message_ix_models.tools.iea.web.IEA_EWEB'>, 'SHAPE': <class 'message_ix_models.project.shape.data.SHAPE'>, 'SSP': <class 'message_ix_models.project.ssp.data.SSPOriginal'>, 'SSP update': <class 'message_ix_models.project.ssp.data.SSPUpdate'>}

Known sources for data. Use register_source() to add to this collection.

message_ix_models.tools.exo_data.iamc_like_data_for_query(path: Path, query: str, *, archive_member: str | None = None, drop: List[str] | None = None, non_iso_3166: Literal['keep', 'discard'] = 'discard', replace: dict | None = None, unique: str = 'MODEL SCENARIO VARIABLE UNIT', **kwargs) → AttrSeries

Load data from path in IAMC-like format and transform to Quantity.

The steps involved are:

  1. Read the data file; use pyarrow for better performance.

  2. Immediately apply query to reduce the data to be handled in subsequent steps.

  3. Assert that Model, Scenario, Variable, and Unit are unique; store the unique values. This means that query must result in data with unique values for these dimensions.

  4. Transform “Region” labels to ISO 3166-1 alpha-3 codes using iso_3166_alpha_3().

  5. Drop entire time series without such codes; for instance “World”.

  6. Transform to a pd.Series with “n” and “y” index levels; ensure the latter are int.

  7. Transform to Quantity with units.

The result is cached.

Parameters:
  • archive_member (str, optional) – If given, path may be an archive with 2 or more members. The member named by archive_member is extracted and read.

  • non_iso_3166 ("keep" or "discard", optional) – If “discard” (default), “Region” labels that are not ISO 3166-1 country names are discarded, along with associated data. If “keep”, such labels are kept.

Data returned by this function is cached using cached(); see also SKIP_CACHE.
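The steps listed above can be illustrated with a simplified, plain-Python sketch. This is not the library's implementation: it skips pyarrow, caching, and the final conversion to Quantity, and the ISO_CODES lookup, the load_iamc_like() function, and the sample data are hypothetical stand-ins (ISO_CODES takes the place of iso_3166_alpha_3()).

```python
import csv
import io

# Hypothetical stand-in for iso_3166_alpha_3(): a tiny name-to-code lookup.
ISO_CODES = {"Austria": "AUT", "Kenya": "KEN"}

# Small IAMC-like table in "wide" format: one column per year.
RAW = """Model,Scenario,Region,Variable,Unit,2020,2030
M1,S1,Austria,Population,million,8.9,9.1
M1,S1,Kenya,Population,million,53.8,66.4
M1,S1,World,Population,million,7800,8500
M1,S1,Austria,GDP,billion USD,430,500
"""

ID_COLS = ("Model", "Scenario", "Region", "Variable", "Unit")


def load_iamc_like(text, keep_row, non_iso_3166="discard"):
    # Steps 1-2: read the data and filter it immediately.
    rows = [r for r in csv.DictReader(io.StringIO(text)) if keep_row(r)]

    # Step 3: Model, Scenario, Variable, and Unit must each be unique.
    unique = {}
    for col in ("Model", "Scenario", "Variable", "Unit"):
        values = {r[col] for r in rows}
        assert len(values) == 1, f"{col} is not unique: {sorted(values)}"
        unique[col] = values.pop()

    result = {}
    for r in rows:
        # Step 4: transform the region label to an alpha-3 code.
        n = ISO_CODES.get(r["Region"])
        if n is None:
            # Step 5: drop (or optionally keep) labels without a code.
            if non_iso_3166 == "discard":
                continue
            n = r["Region"]
        # Step 6: index by (n, y), with y stored as int.
        for col, value in r.items():
            if col not in ID_COLS:
                result[(n, int(col))] = float(value)

    # Step 7 (attaching units to a Quantity) is represented by returning them.
    return result, unique["Unit"]


data, unit = load_iamc_like(RAW, lambda r: r["Variable"] == "Population")
```

Without the filter on "Variable", the uniqueness check in step 3 would fail, because the sample contains both population and GDP rows. This is why query must narrow the data to a single model, scenario, variable, and unit.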

message_ix_models.tools.exo_data.register_source(cls: Type[ExoDataSource]) → Type[ExoDataSource]

Register ExoDataSource cls as a source of exogenous data.
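Because register_source() accepts a class and returns a class, it can plausibly be applied as a class decorator. The sketch below shows that general registry pattern with hypothetical stand-ins (Source, REGISTRY, register); it assumes entries are keyed by the class's id attribute, which matches the 'DEMO' entry in SOURCES, and it is not the library's code.

```python
from typing import Dict, Type


class Source:
    """Hypothetical stand-in for ExoDataSource."""

    id: str = ""


# Registry of known sources, keyed by identifier.
REGISTRY: Dict[str, Type[Source]] = {}


def register(cls: Type[Source]) -> Type[Source]:
    """Add `cls` to REGISTRY and return it, so this works as a decorator."""
    # Assumption for this sketch: duplicate identifiers are rejected.
    if cls.id in REGISTRY:
        raise ValueError(f"{cls.id!r} is already registered")
    REGISTRY[cls.id] = cls
    return cls


@register
class MySource(Source):
    id = "MY SOURCE"
```

Returning the class unchanged lets the decorated name MySource remain usable as an ordinary class.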

message_ix_models.tools.exo_data.prepare_computer(context, c: Computer, source='test', source_kw: Mapping | None = None, *, strict: bool = True) → Tuple[Key, ...]

Prepare c to compute GDP, population, or other exogenous data.

Check each ExoDataSource in SOURCES to determine whether it recognizes and can handle source and source_kw. If a source is identified, add tasks to c that retrieve and process data into a Quantity with, at least, dimensions \((n, y)\).

Parameters:
  • source (str) – Identifier of the source, possibly with other information to be handled by an ExoDataSource.

  • source_kw (dict, optional) –

    Keyword arguments for a Source class. These can include indexers, selectors, or other information needed by the source class to identify the data to be returned.

    If the key “measure” is present, it should be one of MEASURES.

  • strict (bool, optional) – Raise an exception if any of the keys to be added already exist.

Return type:

tuple of Key

Raises:

ValueError – if no source is registered which can handle source and source_kw.

The first returned key, like {measure}:n-y, triggers the following computations:

  1. Load data by invoking an ExoDataSource.

  2. Aggregate on the \(n\) (node) dimension according to Config.regions.

  3. Interpolate on the \(y\) (year) dimension according to Config.years.

Additional key(s) include:

  • {measure}:n-y:y0 indexed: same as {measure}:n-y, indexed to values as of \(y_0\) (the first model year).

See particular data source classes, like SSPOriginal, for particular examples of usage.
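To make the computations behind these keys concrete, the following plain-Python sketch mimics them on a small dictionary keyed by (n, y). It does not use genno or message_ix_models; the functions aggregate(), interpolate(), and index_to(), the group "R_X", and all values are hypothetical, and summation is assumed as the aggregation rule.

```python
def aggregate(data, groups):
    """Sum values over the members of each group, per year."""
    out = {}
    for (n, y), value in data.items():
        for group, members in groups.items():
            if n in members:
                out[(group, y)] = out.get((group, y), 0.0) + value
    return out


def interpolate(data, years):
    """Linearly interpolate each node's series at the given years.

    Years outside the range of the known data are omitted (no extrapolation).
    """
    out = {}
    for n in {n for n, _ in data}:
        known = sorted((y, v) for (m, y), v in data.items() if m == n)
        for y in years:
            for (y0, v0), (y1, v1) in zip(known, known[1:]):
                if y0 <= y <= y1:
                    out[(n, y)] = v0 + (v1 - v0) * (y - y0) / (y1 - y0)
                    break
    return out


def index_to(data, y0):
    """Divide each node's values by its value in year y0."""
    return {(n, y): v / data[(n, y0)] for (n, y), v in data.items()}


raw = {
    ("AAA", 2020): 10.0,
    ("AAA", 2030): 14.0,
    ("BBB", 2020): 30.0,
    ("BBB", 2030): 46.0,
}
groups = {"R_X": ["AAA", "BBB"]}

# Analogous to {measure}:n-y
pop = interpolate(aggregate(raw, groups), years=[2020, 2025, 2030])
# Analogous to {measure}:n-y:y0 indexed
pop_indexed = index_to(pop, y0=2020)
```

Here pop holds 40.0, 50.0, and 60.0 for "R_X" in 2020, 2025, and 2030, and pop_indexed holds the same series divided by the 2020 value.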

Todo

Extend to also prepare to compute values indexed to a particular \(n\).

class message_ix_models.tools.exo_data.ExoDataSource(source: str, source_kw: Mapping)[source]

Base class for sources of exogenous data.

abstract __call__() → AttrSeries

Return the data.

The Quantity returned by this method must have dimensions \((n, y) \cup \text{extra_dims}\). If the original/upstream/raw data has different dimensionality (fewer or more dimensions; different dimension IDs), the code must transform these, make appropriate selections, etc.

abstract __init__(source: str, source_kw: Mapping) → None

Handle source and source_kw.

An implementation must:

  • Raise ValueError if it does not recognize or cannot handle the arguments in source or source_kw.

  • Recognize and handle (if possible) a “measure” keyword in source_kw from MEASURES.

It may:

  • Transform these into other values, for instance by mapping certain values to others, applying regular expressions, or other operations.

  • Store those values as instance attributes for use in __call__().

  • Set name and/or extra_dims to control the behaviour of prepare_computer().

  • Log messages that give information that may help to debug a ValueError for source or source_kw that cannot be handled.

It should not actually load data or perform any time- or memory-intensive operations; these should only be triggered by __call__().
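The division of labour between __init__() and __call__() can be sketched as follows. SourceBase is a hypothetical stand-in so that the example runs without message_ix_models installed; a real source would subclass ExoDataSource and return a Quantity rather than a plain dictionary.

```python
from abc import ABC, abstractmethod
from typing import Mapping


class SourceBase(ABC):
    """Hypothetical stand-in for ExoDataSource, for illustration only."""

    id = ""

    @abstractmethod
    def __init__(self, source: str, source_kw: Mapping) -> None: ...

    @abstractmethod
    def __call__(self): ...


class ExampleSource(SourceBase):
    id = "EXAMPLE"

    def __init__(self, source, source_kw):
        # Must raise ValueError for arguments this class cannot handle.
        if source != self.id:
            raise ValueError(f"unrecognized source {source!r}")
        measure = source_kw.get("measure")
        if measure not in ("GDP", "POP"):
            raise ValueError(f"unsupported measure {measure!r}")
        # May store values for later use; no data is loaded here.
        self.measure = measure
        self.name = measure.lower()

    def __call__(self):
        # Only now perform the (potentially expensive) load.
        return {("AAA", 2020): 1.0}


source = ExampleSource("EXAMPLE", {"measure": "POP"})
data = source()
```

Keeping __init__() cheap matters because prepare_computer() checks each registered source in turn to find one that accepts the arguments.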

aggregate: bool = True

True if transform() should aggregate data on the \(n\) dimension.

extra_dims: Tuple[str, ...] = ()

Optional additional dimensions for the returned Key/Quantity. If not set by __init__(), the dimensions are \((n, y)\).

id: str = ''

Identifier for this particular source.

interpolate: bool = True

True if transform() should interpolate data on the \(y\) dimension.

name: str = ''

Optional name for the returned Key/Quantity. If not set by __init__(), then the “measure” keyword is used.

raise_on_extra_kw(kwargs) → None

Helper for subclasses to handle the source_kw argument.

  1. Store aggregate and interpolate, if they remain in kwargs.

  2. Raise ValueError if there are any other, unhandled keyword arguments in kwargs.
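A minimal sketch of these two steps, written as a method on a hypothetical stand-in class (KwHelper) rather than on ExoDataSource itself; the actual implementation may differ in detail, for instance in its error message.

```python
class KwHelper:
    """Hypothetical stand-in showing the two steps of raise_on_extra_kw()."""

    aggregate = True
    interpolate = True

    def raise_on_extra_kw(self, kwargs: dict) -> None:
        # 1. Store aggregate and interpolate, if they remain in kwargs.
        self.aggregate = kwargs.pop("aggregate", self.aggregate)
        self.interpolate = kwargs.pop("interpolate", self.interpolate)
        # 2. Anything left over was not handled by the subclass.
        if kwargs:
            raise ValueError(f"unhandled keyword arguments: {sorted(kwargs)}")


helper = KwHelper()
helper.raise_on_extra_kw({"aggregate": False})
```

A subclass would typically pop the keywords it understands from a copy of source_kw first, then pass the remainder to this helper.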

transform(c: Computer, base_key: Key) → Key

Prepare c to transform raw data from base_key.

base_key identifies the Quantity that is returned by __call__(). Before the data is returned, transform() allows the data source to add additional tasks or computations to c that further transform the data. (These operations may be done in __call__() directly, but transform() allows use of other genno operators and conveniences.)

The default implementation:

  1. If aggregate is True, aggregates the data ( genno.operator.aggregate()) on the \(n\) dimension using the key “n::groups”.

  2. If interpolate is True, interpolates the data ( genno.operator.interpolate()) on the \(y\) dimension using “y::coords”.

ADVANCE data (tools.advance)

Deprecated since version 2023.11: Use project.advance instead.

get_advance_data([query])

Return data from the ADVANCE Work Package 2 data snapshot at LOCATION.

advance_data(variable[, query])

Return a single ADVANCE data variable as a genno.Quantity.

message_ix_models.tools.advance.LOCATION = ('advance', 'advance_compare_20171018-134445.csv.zip')

This is a location relative to a parent directory. The specific parent directory depends on whether message_data is available:

Without message_data:

The code finds the data within the location described under “(4) Other, system-specific (‘local’) directories”; see the discussion there for how to configure this location. Users should:

  1. Visit https://tntcat.iiasa.ac.at/ADVANCEWP2DB/dsd?Action=htmlpage&page=about and register for access to the data.

  2. Log in.

  3. Download the snapshot with the file name given in LOCATION to a subdirectory advance/ within their local data directory.

With message_data:

The code finds the data within the location described under “(3) data/ directory in the message_data repository”. The snapshot is stored directly in the repository using Git LFS.

Handle data from the ADVANCE project.

message_ix_models.tools.advance.DIMS = ['model', 'scenario', 'region', 'variable', 'unit', 'year']

Standard dimensions for data produced as snapshots from the IIASA ENE Program “WorkDB”.

message_ix_models.tools.advance._read_workdb_snapshot(path: Path, name: str) → Series

Read the data file.

The expected format is a ZIP archive at path containing a member at name in CSV format, with columns corresponding to DIMS, except for “year”, which is stored as column headers (‘wide’ format). (This corresponds to an older version of the “IAMC format,” without more recent additions intended to represent sub-annual time resolution using a separate column.)
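The layout described above can be illustrated with the standard library alone. The sketch builds a small archive in memory, then reads the named member and converts the ‘wide’ year columns to one entry per year. The member name, the values, and the read_wide_member() function are hypothetical; the real function returns a pandas Series.

```python
import csv
import io
import zipfile

DIMS = ["model", "scenario", "region", "variable", "unit", "year"]

# Build a small archive in memory; the member name is hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr(
        "snapshot.csv",
        "model,scenario,region,variable,unit,2020,2030\n"
        "M1,S1,World,Population,million,7800,8500\n",
    )


def read_wide_member(archive, name):
    """Read CSV member `name` and return {DIMS-ordered tuple: value}."""
    with zipfile.ZipFile(archive) as zf, zf.open(name) as f:
        rows = list(csv.DictReader(io.TextIOWrapper(f, encoding="utf-8")))
    id_cols = DIMS[:-1]
    result = {}
    for row in rows:
        labels = tuple(row[c] for c in id_cols)
        for col, value in row.items():
            if col not in id_cols:
                # Year labels come from the column headers.
                result[labels + (int(col),)] = float(value)
    return result


series = read_wide_member(buffer, "snapshot.csv")
```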

Deprecated since version 2023.11: Use iamc_like_data_for_query() instead.

Data returned by this function is cached using cached(); see also SKIP_CACHE.

message_ix_models.tools.advance.advance_data(variable: str, query: str | None = None) → AttrSeries

Return a single ADVANCE data variable as a genno.Quantity.

Deprecated since version 2023.11: Use ADVANCE through exo_data.prepare_computer() instead.

Parameters:

query (str, optional) – Passed to get_advance_data().

Returns:

A Quantity with the dimensions DIMS and name variable. If the units of the data for variable are consistent and parseable by pint, the returned Quantity has these units; otherwise units are discarded and the returned Quantity is dimensionless.

Return type:

genno.Quantity

message_ix_models.tools.advance.get_advance_data(query: str | None = None) → Series

Return data from the ADVANCE Work Package 2 data snapshot at LOCATION.

Deprecated since version 2023.11: Use ADVANCE through exo_data.prepare_computer() instead.

Parameters:

query (str, optional) – Passed to pandas.DataFrame.query() to limit the returned values.

Returns:

A Series with a pandas.MultiIndex having the levels DIMS.

Return type:

pandas.Series

Data returned by this function is cached using cached(); see also SKIP_CACHE.

IAMC data structures (tools.iamc)

Tools for working with IAMC-structured data.

message_ix_models.tools.iamc.describe(data: DataFrame, extra: str | None = None) → StructureMessage

Generate SDMX structure information from data in IAMC format.

Parameters:
  • data – Data in “wide” or “long” IAMC format.

  • extra (str, optional) – Extra text added to the description of each Codelist.

Returns:

The message contains one Codelist for each of the MODEL, SCENARIO, REGION, VARIABLE, and UNIT dimensions. Codes for the VARIABLE code list have annotations with id="preferred-unit-measure" that give the corresponding UNIT Code(s) that appear with each VARIABLE.
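The information gathered for such a message can be approximated in plain Python, without the sdmx package: one sorted list of labels per dimension, plus the units observed with each variable. The describe_sketch() function and the sample rows are hypothetical and only illustrate the idea; describe() itself returns SDMX objects.

```python
from collections import defaultdict

# Hypothetical IAMC-format records in "long" form, without year and value.
rows = [
    {"MODEL": "M1", "SCENARIO": "S1", "REGION": "World",
     "VARIABLE": "Population", "UNIT": "million"},
    {"MODEL": "M1", "SCENARIO": "S2", "REGION": "World",
     "VARIABLE": "GDP", "UNIT": "billion USD"},
    {"MODEL": "M1", "SCENARIO": "S2", "REGION": "AUT",
     "VARIABLE": "GDP", "UNIT": "billion EUR"},
]

DIMENSIONS = ("MODEL", "SCENARIO", "REGION", "VARIABLE", "UNIT")


def describe_sketch(rows):
    # One "code list" (here: a sorted list of labels) per dimension.
    codelists = {d: sorted({r[d] for r in rows}) for d in DIMENSIONS}
    # Units appearing with each variable, like "preferred-unit-measure".
    units = defaultdict(set)
    for r in rows:
        units[r["VARIABLE"]].add(r["UNIT"])
    return codelists, {v: sorted(u) for v, u in units.items()}


codelists, units_for_variable = describe_sketch(rows)
```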

Return type:

sdmx.message.StructureMessage

World Bank structures (tools.wb)

Tools for World Bank data.

message_ix_models.tools.wb.assign_income_groups(cl_node: sdmx.model.common.Codelist, cl_income_group: sdmx.model.common.Codelist, method: str = 'population', replace: Dict[str, str] | None = None) → None

Annotate cl_node with income groups.

Each node is assigned an Annotation with id="wb-income-group", according to the income groups of its children (countries), as reflected in cl_income_group (see get_income_group_codelist()).

Parameters:
  • method ("population" or "count") –

    Method for aggregation:

    • "population" (default): the WB World Development Indicators (WDI) 2020 population for each country is used as a weight, so that the node’s income group is the income group of the plurality of the population of its children.

    • "count": each country is weighted equally, so that the node’s income group is the mode (most frequently occurring value) of its children’s income groups.

  • replace (dict) – Mapping from wb-income-group annotation text appearing in cl_income_group to texts to be attached to cl_node. Mapping two keys to the same value effectively combines or aggregates those groups. See make_map().
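The difference between the two aggregation methods can be seen in a small, self-contained sketch. The income_group() function, the country codes, and the population figures are hypothetical and do not come from World Bank data; the function only illustrates the plurality and mode rules described above.

```python
from collections import Counter


def income_group(children, groups, weights=None):
    """Return the group with the largest total weight among `children`."""
    totals = Counter()
    for country in children:
        # With no weights, every country counts equally ("count" method).
        totals[groups[country]] += 1 if weights is None else weights[country]
    return totals.most_common(1)[0][0]


# Illustrative values only; not actual World Bank data.
groups = {"AAA": "HIC", "BBB": "LIC", "CCC": "LIC"}
population = {"AAA": 100.0, "BBB": 20.0, "CCC": 30.0}
children = ["AAA", "BBB", "CCC"]

by_population = income_group(children, groups, population)
by_count = income_group(children, groups)
```

In this example by_population is "HIC", because one populous country outweighs two smaller ones, while by_count is "LIC", because two of the three countries are in that group.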

Example

Annotate the R12 node list with income group information, mapping high income countries (HIC) and upper-middle income countries (UMC) into one group and aggregating by population.

>>> from message_ix_models.model.structure import get_codelist
>>> from message_ix_models.tools.wb import assign_income_groups, get_income_group_codelist, make_map
>>> cl_node = get_codelist("node/R12")
>>> cl_ig = get_income_group_codelist()
>>> replace = make_map({"HIC": "HMIC", "UMC": "HMIC"})
>>> assign_income_groups(cl_node, cl_ig, replace=replace)
>>> cl_node["R12_NAM"].get_annotation(id="wb-income-group").text
HMIC

message_ix_models.tools.wb.fetch_codelist(id: str) → sdmx.model.common.Codelist

Retrieve code lists related to the WB World Development Indicators.

In principle this could be done with sdmx.Client("WB_WDI").codelist(id), but the World Bank SDMX REST API does not support queries for a specific code list. See https://datahelpdesk.worldbank.org/knowledgebase/articles/1886701-sdmx-api-queries.

fetch_codelist() retrieves http://api.worldbank.org/v2/sdmx/rest/codelist/WB/, the structure message containing all code lists; and extracts and returns the one with the given id.

message_ix_models.tools.wb.get_income_group_codelist() → sdmx.model.common.Codelist

Return a Codelist with World Bank income group information.

The returned code list is a modified version of the one with URN …Codelist=WB:CL_REF_AREA_WDI(1.0), via fetch_codelist().

This is augmented with information about the income group and lending category concepts as described at https://datahelpdesk.worldbank.org/knowledgebase/articles/906519

The information is stored two ways:

  • Existing codes in the list like “HIC: High income” that designate groups of countries are associated with child codes that are designated as members of that group. These can be accessed at Code.child.

  • Existing codes in the list like “ABW: Aruba” are annotated with this information, for instance with an annotation that has id="wb-income-group". These can be accessed using Code.annotations, Code.get_annotation, and other methods.

message_ix_models.tools.wb.make_map(source: Dict[str, str], expand_key_urn: bool = True, expand_value_urn: bool = False) → Dict[str, str]

Prepare the replace parameter of assign_income_groups().

The result has one (key, value) pair for each pair in source.

Parameters:
  • expand_key_urn (bool) – If True (the default), replace each key from source with the URN for the code in CL_REF_AREA_WDI with id=key.

  • expand_value_urn (bool) – If True, replace each value from source with the URN for the code in CL_REF_AREA_WDI with id=value.
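The effect of the two flags can be sketched as below. This is an illustration rather than the library's code: make_map_sketch() and URN_TEMPLATE are hypothetical, and the template assumes that code URNs follow the general SDMX pattern built on the code list reference WB:CL_REF_AREA_WDI(1.0) mentioned above.

```python
# Assumption: code URNs follow the general SDMX pattern below, built on the
# code list reference "WB:CL_REF_AREA_WDI(1.0)" quoted earlier on this page.
URN_TEMPLATE = "urn:sdmx:org.sdmx.infomodel.codelist.Code=WB:CL_REF_AREA_WDI(1.0).{}"


def make_map_sketch(source, expand_key_urn=True, expand_value_urn=False):
    """Return a copy of `source` with keys and/or values expanded to URNs."""
    result = {}
    for key, value in source.items():
        new_key = URN_TEMPLATE.format(key) if expand_key_urn else key
        new_value = URN_TEMPLATE.format(value) if expand_value_urn else value
        result[new_key] = new_value
    return result


mapping = make_map_sketch({"HIC": "HMIC", "UMC": "HMIC"})
```

With the default flags, the keys become URNs while values such as "HMIC", which need not correspond to any existing code, are left as plain text.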