General purpose modeling tools (`tools`)

“Tools” can include, inter alia:

Codes for retrieving data from specific data sources and adapting it for use with message_ix_models.
Codes for modifying scenarios, although tools for building models should go in message_ix_models.model.

Exogenous data (`tools.exo_data`)

Generic tools for working with exogenous data sources.

The tools in this module support use of data from arbitrary sources and formats in model-building code. For each source/format, a subclass of ExoDataSource adds tasks to a genno.Computer that retrieve/load and transform the source data into genno.Quantity.

The following example uses one such class, message_ix_models.project.advance.data.ADVANCE:

from genno import Computer

from message_ix_models.project.advance.data import ADVANCE

# Keyword arguments corresponding to ADVANCE.Options
kw = dict(
    measure="Transport|Service demand|Road|Passenger|LDV",
    model="MESSAGE",
    scenario="ADV3TRAr2_Base",
)

# Add tasks to retrieve and transform data.
# `context` is an existing Context instance.
c = Computer()
keys = c.apply(ADVANCE.add_tasks, context=context, **kw)

# Retrieve some of the data
q_result = c.get(keys[0])

# Pass the data into further calculations.
# `k_other` is the key of another quantity already added to `c`.
c.add("derived", "mul", keys[1], k_other)

`MEASURES`	Measures recognized by some data sources.
`SOURCES`	Registered sources for data.
`BaseOptions`([aggregate, interpolate, ...])	Options for a concrete ExoDataSource subclass.
`DemoSource`(*args, **kwargs)	Example source of exogenous population and GDP data.
`ExoDataSource`(*args, **kwargs)	Abstract class for sources of exogenous data.
`add_structure`(c, *, context[, strict])	Add structural information to c.
`prepare_computer`(context, c[, source, ...])	Prepare c to compute GDP, population, or other exogenous data.
`register_source`(cls, *[, id])	Register `ExoDataSource` cls as a source of exogenous data.

class message_ix_models.tools.exo_data.BaseOptions(aggregate: bool = True, interpolate: bool = True, measure: str = '', name: str = '', dims: tuple[str, ...] = ('n', 'y'))[source]

Options for a concrete ExoDataSource subclass.

See ExoDataSource.Options.

aggregate: bool = True: True if ExoDataSource.transform() should aggregate data on the \(n\) dimension.

dims: tuple[str, ...] = ('n', 'y'): Dimensions for the returned Key/Quantity.

classmethod from_args(source_id: str | ExoDataSource, *args, **kwargs)[source]

Construct an instance from keyword arguments.

Parameters: source_id – For backwards-compatibility with prepare_computer().

interpolate: bool = True: True if ExoDataSource.transform() should interpolate data on the \(y\) dimension.

measure: str = '': Identifier for the primary measure of retrieved/returned data.

name: str = '': Name for the returned Key/Quantity.

class message_ix_models.tools.exo_data.DemoSource(*args, **kwargs)[source]

Example source of exogenous population and GDP data.

class Options(aggregate: bool = True, interpolate: bool = True, measure: str = '', name: str = '', dims: tuple[str, ...] = ('n', 'y'), scenario: str = '')[source]

get() → AnyQuantity[source]

Return the data.

Implementations in concrete classes may load data from file, retrieve from remote sources or local caches, generate data, or anything else.

The Quantity returned by this method must have dimensions corresponding to key. If the original/upstream/raw data has different dimensionality (fewer or more dimensions; different dimension IDs), a concrete class must transform these, make appropriate selections, etc.

static random_data() → AnyQuantity[source]: Generate some random data with n, y, s, and v dimensions.

message_ix_models.tools.exo_data.MEASURES = ('GDP', 'POP'): Measures recognized by some data sources. Concrete ExoDataSource subclasses may provide support for other measures.

Todo

Store this in a separate code list or concept scheme.

message_ix_models.tools.exo_data.SOURCES: dict[str, type[ExoDataSource]] = {'ADVANCE': <class 'message_ix_models.project.advance.data.ADVANCE'>, 'BACI': <class 'message_ix_models.tools.cepii.BACI'>, 'DemoSource': <class 'message_ix_models.tools.exo_data.DemoSource'>, 'GEA': <class 'message_ix_models.project.gea.data.GEA'>, 'GFEI': <class 'message_ix_models.tools.gfei.GFEI'>, 'IEA_EEI': <class 'message_ix_models.tools.iea.eei.IEA_EEI'>, 'IEA_EWEB': <class 'message_ix_models.tools.iea.web.IEA_EWEB'>, 'PRICE_EMISSION': <class 'message_ix_models.model.emissions.PRICE_EMISSION'>, 'SHAPE': <class 'message_ix_models.project.shape.data.SHAPE'>, 'SSPOriginal': <class 'message_ix_models.project.ssp.data.SSPOriginal'>, 'SSPUpdate': <class 'message_ix_models.project.ssp.data.SSPUpdate'>}: Registered sources for data. Use register_source() to add to this collection.

message_ix_models.tools.exo_data.add_structure(c: Computer, *, context: Context, strict: bool = True) → None[source]

Add structural information to c.

Helper for ExoDataSource.add_tasks() and prepare_computer().

The added tasks include:

“context”: context, if not already set.
“n::codes”: get_codes() for the node code list according to Config.regions.
“n::groups”: codelist_to_groups() called on “n::codes”.
“y”: list of periods according to Config.years, if not already set.
“y0”: first element of “y”.
“y::coords”: dict mapping str("y") to the elements of “y”.
“yv::coords”: dict mapping str("yv") to the elements of “y”.
“y0::coord”: dict mapping str("y") to “y0”.

Parameters: strict – if True, raise exceptions if the keys to be added are already in c.
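The relationships among the year-related keys listed above can be illustrated without genno. This is a minimal sketch in plain Python, assuming an illustrative list of periods; `structure_for` is a hypothetical helper that only mirrors the key names and is not part of message_ix_models.

```python
def structure_for(years: list[int]) -> dict:
    """Return a plain dict mirroring the year-related keys described above."""
    y0 = years[0]  # "y0": first element of "y"
    return {
        "y": years,
        "y0": y0,
        "y::coords": {"y": years},
        "yv::coords": {"yv": years},
        "y0::coord": {"y": y0},
    }


# Illustrative periods only; real values come from Config.years
structure = structure_for([2020, 2025, 2030])
```

In the real function these values are added as tasks to a Computer rather than stored in a dict.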

message_ix_models.tools.exo_data.register_source(cls: type[ExoDataSource], *, id: str | None = None) → type[ExoDataSource][source]: Register ExoDataSource cls as a source of exogenous data.
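The registry pattern suggested by the signature can be sketched with stand-ins. The `SOURCES` dict and `register_source` function below are local re-implementations for illustration, not the library code. Two points are assumptions: the default ID is the class name (consistent with the keys shown in SOURCES above), and a duplicate ID raises an error.

```python
from typing import Optional

SOURCES: dict[str, type] = {}  # stand-in for the module-level registry


def register_source(cls: type, *, id: Optional[str] = None) -> type:
    """Store `cls` in SOURCES under `id`, defaulting to the class name."""
    key = id or cls.__name__
    if key in SOURCES:
        # Assumption: duplicate IDs are treated as an error
        raise ValueError(f"{key!r} is already registered")
    SOURCES[key] = cls
    return cls  # returning the class allows use as a decorator


@register_source
class MySource:
    """Hypothetical stand-in for a concrete ExoDataSource subclass."""


class Other:
    """Second hypothetical source, registered under an explicit ID."""


register_source(Other, id="custom")
```

Because the function returns the class unchanged, it can be used either as a decorator or as a plain call.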

class message_ix_models.tools.exo_data.ExoDataSource(*args, **kwargs)[source]

Abstract class for sources of exogenous data.

As an abstract class, ExoDataSource must be subclassed to be used. Concrete subclasses must implement at least the get() method, which loads the raw data when executed, and may override others, as described below.

The class method ExoDataSource.add_tasks() adds tasks to a genno.Computer. It returns a genno.Key that refers to the loaded and transformed data. This method usually should not be modified for subclasses.

The behaviour of a subclass can be customized in these ways:

Create a subclass of BaseOptions and set it as the Options class attribute.
Override __init__(), which receives keyword arguments via add_tasks().
Override transform(), which is called to add further tasks which will transform the data.

See the documentation for these methods and attributes for further details.

Options

Class defining per-instance options understood by this data source.

A concrete class may override this with a subclass of BaseOptions. That subclass may change the default values of any attributes of BaseOptions, or add others.

alias of BaseOptions
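Changing a default and adding an attribute can be sketched as follows, assuming the options class behaves like a standard dataclass. `BaseOptions` here is a local stand-in that copies the fields and defaults documented above, and `MyOptions` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BaseOptions:
    """Stand-in with the same fields and defaults as documented above."""

    aggregate: bool = True
    interpolate: bool = True
    measure: str = ""
    name: str = ""
    dims: tuple[str, ...] = ("n", "y")


@dataclass
class MyOptions(BaseOptions):
    """Hypothetical options: change one default, add one attribute."""

    interpolate: bool = False  # changed default
    scenario: str = ""  # added attribute


# Illustrative values only
opts = MyOptions(measure="GDP", scenario="SSP2")
```

A concrete source would then set `Options = MyOptions` as a class attribute, so that keyword arguments passed through add_tasks() populate these fields.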

__init__(*args, **kwargs) → None[source]

Create an instance and prepare info for transform()/get().

The base implementation:

Sets options—if not already set—by passing kwargs to Options.
Raises an exception if there are other/unhandled args or kwargs.
If key is not set, constructs it with:
- Name name or measure in lower case.
- Dimensions dims.
Subclasses may pre-empt this behaviour by setting key statically or dynamically.

A concrete class implementation must:

Set options, either directly or by calling super().__init__() with or without keyword arguments.
Set key, either directly or by calling super().__init__(). In the latter case, it may set name, measure, and/or dims to control the behaviour.
Raise an exception if unrecognized or invalid kwargs are passed.

and may:

Transform kwargs or options arguments into other values, for instance by mapping certain values to others, applying regular expressions, or other operations.
Store those values as instance attributes for use in get().
Log messages that give information that helps to debug exceptions.

It must not perform any time- or memory-intensive operations, such as actually loading or fetching data. Those operations should be in get().

classmethod _where() → list[str | Path][source]

Helper for __init__() methods in concrete classes.

Return where.

If use_test_data is True, also append "test".

classmethod add_tasks(c: Computer, *args, context: Context | None = None, strict: bool = True, **kwargs) → tuple[source]

Add tasks to c to provide and transform the data.

The first returned key is key, and will trigger the following tasks:

1. Load or retrieve data by invoking ExoDataSource.get().
2. If BaseOptions.aggregate is True, aggregate on the \(n\) (node) dimension according to Config.regions.
3. If BaseOptions.interpolate is True, interpolate on the \(y\) (year) dimension according to Config.years.

Steps (2) and (3) are added by transform() and may differ in concrete classes.

Other returned keys include further transformations:

key + "y0_indexed": same as key, but indexed to the values as of the first model period.

Other keys that are created but not returned can be accessed on c:

key + "message_ix_models.foo.bar.CLASS": the raw data, with a tag from the fully-qualified name of the ExoDataSource class.

To support the loading and transformation of data, add_structure() is first called with c.

Todo

Add option/tasks to index to a particular label on the \(n\) dimension.

Parameters:

context – Passed to add_structure().
strict – Passed to add_structure().

Return type:

tuple of Key
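One plausible reading of the "y0_indexed" key is that each value is divided by the value for the same node in the first model period. The sketch below illustrates that reading in plain Python. It is an assumption about the meaning of "indexed", and `index_to_y0` is a hypothetical helper, not a function of message_ix_models.

```python
def index_to_y0(
    data: dict[tuple[str, int], float], y0: int
) -> dict[tuple[str, int], float]:
    """Divide each value by the value for the same node in period `y0`."""
    return {(n, y): value / data[(n, y0)] for (n, y), value in data.items()}


# Made-up values keyed by (node, year)
data = {("A", 2020): 2.0, ("A", 2030): 3.0, ("B", 2020): 4.0, ("B", 2030): 5.0}
indexed = index_to_y0(data, y0=2020)
```

Under this reading, every node has the value 1.0 in the first period and later values express growth relative to it.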

abstractmethod get() → AnyQuantity[source]

Return the data.

Implementations in concrete classes may load data from file, retrieve from remote sources or local caches, generate data, or anything else.

The Quantity returned by this method must have dimensions corresponding to key. If the original/upstream/raw data has different dimensionality (fewer or more dimensions; different dimension IDs), a concrete class must transform these, make appropriate selections, etc.

key: Key: Key for the returned Quantity. This may either be set statically on a concrete subclass, or created via __init__().

options: BaseOptions

Instance of the Options class.

A concrete class that overrides Options should redefine this attribute, to facilitate type checking.

transform(c: Computer, base_key: Key) → Key[source]

Add tasks to c to transform raw data from base_key.

base_key refers to the Quantity returned by get(). Via add_tasks(), transform() adds additional tasks to c that further transform the data. (Such operations may be done in get() directly, but transform() allows use of genno operators and conveniences.)

In the default implementation:

If aggregate is True, aggregate the data ( genno.operator.aggregate()) on the \(n\) dimension using the key “n::groups”.
If interpolate is True, interpolate the data ( genno.operator.interpolate()) on the \(y\) dimension using “y::coords”.

Concrete classes may override this method to, for instance, change how aggregate and interpolate are handled, or add further steps. Such overrides may call the base implementation, or not.

Returns: Key referring to the data from base_key after any transformation. This may be the same as base_key.
Return type: Key
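The two default steps can be pictured with a small plain-Python sketch. It is not the genno implementation. It assumes that aggregation sums the member nodes of each group and that interpolation is linear between known periods. The functions, node labels, and numbers are all illustrative.

```python
Data = dict[tuple[str, int], float]


def aggregate(data: Data, groups: dict[str, list[str]]) -> Data:
    """Sum values of member nodes into each group, per period."""
    result: Data = {}
    for group, members in groups.items():
        for (n, y), value in data.items():
            if n in members:
                result[(group, y)] = result.get((group, y), 0.0) + value
    return result


def interpolate(data: Data, years: list[int]) -> Data:
    """Linearly interpolate each node's series onto `years`."""
    result: Data = {}
    for node in {n for n, _ in data}:
        known = sorted((y, v) for (n, y), v in data.items() if n == node)
        for year in years:
            # Find the pair of known periods that brackets `year`
            for (y_a, v_a), (y_b, v_b) in zip(known, known[1:]):
                if y_a <= year <= y_b:
                    weight = (year - y_a) / (y_b - y_a)
                    result[(node, year)] = v_a + weight * (v_b - v_a)
                    break
    return result


raw = {("AT", 2020): 1.0, ("AT", 2030): 3.0, ("DE", 2020): 10.0, ("DE", 2030): 14.0}
groups = {"R_EUR": ["AT", "DE"]}  # hypothetical grouping, in the role of "n::groups"

transformed = interpolate(aggregate(raw, groups), years=[2020, 2025, 2030])
```

This sketch drops periods outside the known range instead of extrapolating; the real operators may behave differently.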

use_test_data: bool = False: True to allow the class to look up and use test data. If no test data exists, this setting has no effect. See _where().

where: list[str | Path] = []: where keyword argument to path_fallback(). See _where().

message_ix_models.tools.exo_data.prepare_computer(context, c: Computer, source='test', source_kw: Mapping | None = None, *, strict: bool = True) → tuple[Key, ...][source]

Prepare c to compute GDP, population, or other exogenous data.

Check each ExoDataSource in SOURCES to determine whether it recognizes and can handle source and source_kw. If a source is identified, add tasks to c that retrieve and process data into a Quantity with, at least, dimensions \((n, y)\).

Deprecated since version 2025-06-06: Use ExoDataSource.add_tasks() instead. See exo_data.

Return type: tuple of Key
Raises: ValueError – if no source is registered which can handle source and source_kw.

Deprecated since version 2025-06-06: Use c.apply(SOURCE.add_tasks, …) as shown above.

IAMC data structures (`tools.iamc`)

Tools for working with IAMC-structured data.

message_ix_models.tools.iamc.compare(left: DataFrame, right: DataFrame, atol: float = 0.001, ignore=list[str | re.Pattern]) → list[str][source]

Compare IAMC-structured data in left and right.

The returned messages may include:

“No left data for model=’…’, scenario=’…’”.
“No right data for model=’…’, scenario=’…’”.
“variable=’…’: no left data”
“variable=’…’: no right data”
“variable=’…’: ### missing left entries”
“variable=’…’: ### missing right entries”
“variable=’…’: units mismatch: ‘…’ != ‘…’”
“variable=’…’: ### of ### values with |diff| > {atol}”
“### matching of ### left and ### right values”

Parameters:

left – Data frames with columns ‘model’, ‘scenario’, ‘region’, ‘variable’, and ‘year’ (that is, in ‘long’ IAMC structure).
right – Data frames with columns ‘model’, ‘scenario’, ‘region’, ‘variable’, and ‘year’ (that is, in ‘long’ IAMC structure).
atol – Absolute tolerance for differences.
ignore – Collection of regular expressions

Returns:

A collection of messages describing differences between left and right.

Return type:

list of str
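The role of atol and the style of the returned messages can be sketched on plain dictionaries. `compare_values` is a hypothetical, much-reduced stand-in for compare(). It handles one variable's values keyed by illustrative `(variable, year)` tuples, and it assumes that "missing right entries" means keys present in left but absent from right.

```python
def compare_values(left: dict, right: dict, atol: float = 0.001) -> list[str]:
    """Return messages describing differences between two {key: value} mappings."""
    messages = []
    missing_left = set(right) - set(left)
    missing_right = set(left) - set(right)
    if missing_left:
        messages.append(f"{len(missing_left)} missing left entries")
    if missing_right:
        messages.append(f"{len(missing_right)} missing right entries")
    # Compare only keys present on both sides, using an absolute tolerance
    common = set(left) & set(right)
    n_diff = sum(1 for k in common if abs(left[k] - right[k]) > atol)
    if n_diff:
        messages.append(f"{n_diff} of {len(common)} values with |diff| > {atol}")
    return messages


# Made-up values
left = {("GDP", 2020): 1.0, ("GDP", 2030): 2.0, ("POP", 2020): 5.0}
right = {("GDP", 2020): 1.0005, ("GDP", 2030): 2.5}
messages = compare_values(left, right)
```

Here the 2020 values differ by less than atol and are treated as matching, while the 2030 values are reported as different.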

message_ix_models.tools.iamc.describe(data: DataFrame, extra: str | None = None) → StructureMessage[source]

Generate SDMX structure information from data in IAMC format.

Parameters:

data – Data in “wide” or “long” IAMC format.
extra (str, optional) – Extra text added to the description of each Codelist.

Returns:

The message contains one Codelist for each of the MODEL, SCENARIO, REGION, VARIABLE, and UNIT dimensions. Codes for the VARIABLE code list have annotations with id="preferred-unit-measure" that give the corresponding UNIT Code(s) that appear with each VARIABLE.

Return type:

sdmx.message.StructureMessage
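The information gathered for the structure message can be approximated in plain Python: one list of codes per dimension, plus the units seen with each variable. `summarize` is a hypothetical helper operating on rows in "long" form with upper-case column names. It does not build SDMX objects, and the row values are made up.

```python
def summarize(
    rows: list[dict[str, str]],
) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Collect unique codes per dimension and the units used by each variable."""
    dims = ("MODEL", "SCENARIO", "REGION", "VARIABLE", "UNIT")
    codes = {d: sorted({row[d] for row in rows}) for d in dims}
    units: dict[str, set[str]] = {}
    for row in rows:
        units.setdefault(row["VARIABLE"], set()).add(row["UNIT"])
    return codes, {v: sorted(u) for v, u in units.items()}


rows = [
    {"MODEL": "m", "SCENARIO": "s", "REGION": "World", "VARIABLE": "GDP", "UNIT": "USD"},
    {"MODEL": "m", "SCENARIO": "s", "REGION": "World", "VARIABLE": "POP", "UNIT": "million"},
    {"MODEL": "m", "SCENARIO": "s", "REGION": "AUT", "VARIABLE": "GDP", "UNIT": "USD"},
]
codes, units_by_variable = summarize(rows)
```

The second mapping plays the role of the "preferred-unit-measure" annotations described above.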

message_ix_models.tools.iamc.iamc_like_data_for_query(path: pathlib.Path, query: str, *, archive_member: str | None = None, drop: list[str] | None = None, non_iso_3166: Literal['keep', 'discard'] = 'discard', replace: dict | None = None, unique: str = 'MODEL SCENARIO VARIABLE UNIT', **kwargs) → AnyQuantity[source]

Load data from path in an IAMC-like format and transform to Quantity.

The steps involved are:

1. Read the data file. Additional kwargs are passed to pandas.read_csv(). By default (unless kwargs explicitly give a different value), pyarrow is used for better performance.
2. Pass the result through to_quantity(), with the parameters query, drop, non_iso_3166, replace, and unique.
3. Cache the result using cached. Subsequent calls with the same arguments will yield the cached result rather than repeating steps (1) and (2).

Parameters: archive_member (str, optional) – If given, path may be a tar or ZIP archive with 1 or more members. The member named by archive_member is extracted and read using tarfile.TarFile or zipfile.ZipFile.
Returns: Quantity of the same structure as returned by to_quantity().
Return type: genno.Quantity

Data returned by this function is cached using cached(); see also SKIP_CACHE.

message_ix_models.tools.iamc.to_quantity(data: pd.DataFrame, *, query: str | None = None, drop: list[str] | None = None, non_iso_3166: Literal['keep', 'discard'] = 'discard', replace: dict | None = None, unique: str = 'MODEL SCENARIO VARIABLE UNIT') → AnyQuantity[source]

Convert data in IAMC ‘wide’ structure to genno.Quantity.

data is processed via the following steps:

1. Drop columns given in drop, if any.

2. Apply query. This is done early to reduce the data handled in subsequent steps. The query string must use the original column names (with matching case) as appearing in data (or, for iamc_like_data_for_query(), in the file at path).

3. Apply replacements from replace, if any.

4. Drop columns that are entirely empty.

5. Rename all columns/dimensions to upper case.

6. Assert that the unique columns each contain exactly 1 unique value, then drop these columns. This means that query must result in data with unique values for these dimensions.

7. Transform “REGION” codes via iso_3166_alpha_3() to an “n” dimension containing ISO 3166-1 alpha-3 codes. If non_iso_3166 is “keep”, preserve codes that do not appear in the standard.

8. Drop entire time series where step 7 does not yield an “n” code.

9. Transform to pandas.Series with “n” and “y” index levels; ensure the latter are int.

10. Transform to Quantity and attach units.

Parameters:

data – Data frame in IAMC ‘wide’ format. The column names “Model”, “Scenario”, “Region”, “Variable”, and “Unit” may be in any case.
query – Query to select a subset of data, passed to pandas.DataFrame.query().
drop – Identifiers of columns in data, passed to pandas.DataFrame.drop().
non_iso_3166 – If “discard” (default), “region” labels that are not ISO 3166-1 country names are discarded, along with associated data. If “keep”, such labels are kept.
replace – Replacements for values in columns, passed to pandas.DataFrame.replace().
unique – Columns which must contain unique values. These columns are dropped from the result.

Returns:

Quantity with at least dimensions ("n", "y"), and then a subset of ("MODEL", "SCENARIO", "VARIABLE", "UNIT"), namely only those dimensions not indicated by unique. If “UNIT” is in unique, the quantity has the given, unique units; otherwise, it is dimensionless.

Return type:

genno.Quantity
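Several of the processing steps can be traced on a tiny example without pandas or genno. The sketch below is a simplification under stated assumptions. `to_series` and the two-entry `ISO` lookup are hypothetical stand-ins, a Python predicate replaces the query string, and the result is a plain dict keyed by `(n, y)` with no units attached.

```python
ISO = {"Austria": "AUT", "Germany": "DEU"}  # tiny stand-in for iso_3166_alpha_3()


def to_series(rows, predicate, unique=("MODEL", "SCENARIO", "VARIABLE", "UNIT"),
              non_iso_3166="discard"):
    """Select rows, check uniqueness, map regions, and return {(n, y): value}."""
    # Select rows using the original column names, then upper-case the names
    selected = [{str(k).upper(): v for k, v in r.items()} for r in rows if predicate(r)]
    # Each `unique` column must hold exactly one value
    for col in unique:
        values = {r[col] for r in selected}
        assert len(values) == 1, f"{col} is not unique: {values}"
    result = {}
    for r in selected:
        fallback = r["REGION"] if non_iso_3166 == "keep" else None
        n = ISO.get(r["REGION"], fallback)
        if n is None:
            continue  # discard the entire time series
        for col, value in r.items():
            if col.isdigit():  # year columns of the 'wide' layout
                result[(n, int(col))] = value
    return result


# Made-up rows in 'wide' layout
rows = [
    {"Model": "m", "Scenario": "s", "Region": "Austria", "Variable": "GDP",
     "Unit": "USD", "2020": 1.0, "2030": 2.0},
    {"Model": "m", "Scenario": "s", "Region": "World", "Variable": "GDP",
     "Unit": "USD", "2020": 9.0, "2030": 9.5},
    {"Model": "m", "Scenario": "s", "Region": "Austria", "Variable": "POP",
     "Unit": "million", "2020": 8.9, "2030": 9.1},
]
series = to_series(rows, lambda r: r["Variable"] == "GDP")
kept = to_series(rows, lambda r: r["Variable"] == "GDP", non_iso_3166="keep")
```

Without the predicate, the uniqueness check would fail because both "GDP" and "POP" appear in the VARIABLE column, which is why the query must narrow the data first.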

Policies (`tools.policy`)

Policies.

class message_ix_models.tools.policy.Policy[source]

Base class for policies.

This class has no attributes or public methods. Other modules in message_ix_models:

should subclass Policy to represent different kinds of policy.
may add attributes, methods, etc. to aid with the implementation of those policies in concrete scenarios.
in contrast, may use minimal subclasses as mere flags to be interpreted by other code.

The default implementation of hash() returns the same value for every instance of a given subclass. This means that two instances of the same subclass hash equal. See Config.policy.

message_ix_models.tools.policy.single_policy_of_type(collection: Collection[Policy], cls: type[T]) → T | None[source]: Return a single member of collection of type cls.
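The hashing behaviour and the lookup helper can be sketched with local stand-ins. The classes and function below are re-implemented for illustration and are not the library code. Three points are assumptions: equality follows the type, matching uses isinstance(), and more than one match raises an error. `CarbonTax` and `Subsidy` are hypothetical policies.

```python
from collections.abc import Collection
from typing import Optional, TypeVar


class Policy:
    """Stand-in base class: instances of one subclass hash equal."""

    def __hash__(self) -> int:
        return hash(type(self))

    def __eq__(self, other: object) -> bool:
        # Assumption for this sketch: equality also follows the type
        return type(self) is type(other)


class CarbonTax(Policy):
    """Hypothetical policy used as a mere flag."""


class Subsidy(Policy):
    """Second hypothetical policy."""


T = TypeVar("T", bound=Policy)


def single_policy_of_type(collection: Collection[Policy], cls: type[T]) -> Optional[T]:
    """Return the one member of `collection` that is an instance of `cls`, if any."""
    matches = [p for p in collection if isinstance(p, cls)]
    if len(matches) > 1:
        # Assumption: more than one match is treated as an error
        raise ValueError(f"{len(matches)} policies of type {cls.__name__}")
    return matches[0] if matches else None


# A set keeps at most one instance of each subclass
policies = {CarbonTax(), CarbonTax(), Subsidy()}
found = single_policy_of_type(policies, CarbonTax)
```

With this hashing, a set of policies acts like a collection of at most one policy per type.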

World Bank structures (`tools.wb`)

Tools for World Bank data.

message_ix_models.tools.wb.assign_income_groups(cl_node: sdmx.model.common.Codelist, cl_income_group: sdmx.model.common.Codelist, method: str = 'population', replace: dict[str, str] | None = None) → None[source]

Annotate cl_node with income groups.

Each node is assigned an Annotation with id="wb-income-group", according to the income groups of its children (countries), as reflected in cl_income_group (see get_income_group_codelist()).

Parameters:

method ("population" or "count") –
Method for aggregation:
- "population" (default): the WB World Development Indicators (WDI) 2020 population for each country is used as a weight, so that the node’s income group is the income group of the plurality of the population of its children.
- "count": each country is weighted equally, so that the node’s income group is the mode (most frequently occurring value) of its children’s.
replace (dict) – Mapping from wb-income-group annotation text appearing in cl_income_group to texts to be attached to cl_node. Mapping two keys to the same value effectively combines or aggregates those groups. See make_map().

Example

Annotate the R12 node list with income group information, mapping high income countries (HIC) and upper-middle income countries (UMC) into one group and aggregating by population.

>>> cl_node = get_codelist(f"node/R12")
>>> cl_ig = get_income_group_codelist()
>>> replace = make_map({"HIC": "HMIC", "UMC": "HMIC"})
>>> assign_income_groups(cl_node, cl_ig, replace=replace)
>>> cl_node["R12_NAM"].get_annotation(id="wb-income-group").text
HMIC
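The difference between the two method values can be seen on a toy node with three children. The sketch is plain Python with made-up labels and weights (not WDI data), and `income_group` is a hypothetical helper, not part of message_ix_models.

```python
from collections import Counter
from typing import Optional


def income_group(
    children: list[str],
    groups: dict[str, str],
    weights: Optional[dict[str, float]] = None,
) -> str:
    """Return the group with the largest total weight among `children`."""
    totals: Counter = Counter()
    for child in children:
        # Without weights, each child counts once ("count" method)
        totals[groups[child]] += 1 if weights is None else weights[child]
    return totals.most_common(1)[0][0]


groups = {"A": "HIC", "B": "LIC", "C": "LIC"}
population = {"A": 100, "B": 20, "C": 30}  # made-up weights

by_population = income_group(["A", "B", "C"], groups, population)
by_count = income_group(["A", "B", "C"], groups)
```

Weighted by population, the single large child determines the group; weighted equally, the two smaller children do.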

message_ix_models.tools.wb.fetch_codelist(id: str) → sdmx.model.common.Codelist[source]

Retrieve code lists related to the WB World Development Indicators.

In principle this could be done with sdmx.Client("WB_WDI").codelist(id), but the World Bank SDMX REST API does not support queries for a specific code list. See https://datahelpdesk.worldbank.org/knowledgebase/articles/1886701-sdmx-api-queries.

fetch_codelist() retrieves http://api.worldbank.org/v2/sdmx/rest/codelist/WB/, the structure message containing all code lists; and extracts and returns the one with the given id.

message_ix_models.tools.wb.get_income_group_codelist() → sdmx.model.common.Codelist[source]

Return a Codelist with World Bank income group information.

The returned code list is a modified version of the one with URN …Codelist=WB:CL_REF_AREA_WDI(1.0), via fetch_codelist().

This is augmented with information about the income group and lending category concepts as described at https://datahelpdesk.worldbank.org/knowledgebase/articles/906519

The information is stored two ways:

Existing codes in the list like “HIC: High income” that designate groups of countries are associated with child codes that are designated as members of that group. These can be accessed at Code.child.
Existing codes in the list like “ABW: Aruba” are annotated with:
- id="wb-income-group": the URN of the income group code, for instance “urn:sdmx:org.sdmx.infomodel.codelist.Code=WB:CL_REF_AREA_WDI(1.0).HIC”. This is an unambiguous reference to a code in the same list.
- id="wb-lending-category": the name of the lending category, if any.
These can be accessed using Code.annotations, Code.get_annotation, and other methods.

message_ix_models.tools.wb.make_map(source: dict[str, str], expand_key_urn: bool = True, expand_value_urn: bool = False) → dict[str, str][source]

Prepare the replace parameter of assign_income_groups().

The result has one (key, value) pair for each pair in source.

Parameters:

expand_key_urn (bool) – If True (the default), replace each key from source with the URN for the code in CL_REF_AREA_WDI with id=key.
expand_value_urn (bool) – If True, replace each value from source with the URN for the code in CL_REF_AREA_WDI with id=value.
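The effect of the two flags can be sketched as follows. It assumes that a code's URN can be formed by appending its ID to the prefix shown in the example URN above; the real function may instead look the code up in the code list. `make_map_sketch` and `URN_PREFIX` are hypothetical names.

```python
URN_PREFIX = "urn:sdmx:org.sdmx.infomodel.codelist.Code=WB:CL_REF_AREA_WDI(1.0)."


def make_map_sketch(
    source: dict[str, str],
    expand_key_urn: bool = True,
    expand_value_urn: bool = False,
) -> dict[str, str]:
    """Return a copy of `source` with keys and/or values expanded to URNs."""
    return {
        (URN_PREFIX + k if expand_key_urn else k): (
            URN_PREFIX + v if expand_value_urn else v
        )
        for k, v in source.items()
    }


replace = make_map_sketch({"HIC": "HMIC", "UMC": "HMIC"})
```

With the defaults, keys become URNs that match the "wb-income-group" annotation text, while values stay as short labels such as "HMIC".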

Tools for scenario manipulation

`add_AFOLU_CO2_accounting`	Add `land_output` and set entries for accounting AFOLU emissions of CO2.
`add_CO2_emission_constraint`	Add bound for generic relation at the global level.
`add_FFI_CO2_accounting`	Add accounting possibility for CO2 emissions from FFI.
`add_alternative_TCE_accounting`	Add structure and data for emission constraints.
`add_budget`	Add a budget constraint to a given region.
`add_dac`	Created on Mon Mar 20 15:41:32 2023
`add_emission_trajectory`	Modify scen to include an emission bound.
`add_tax_emission`	Add a global CO2 price to scen.
`inter_pipe`	Inter-pipe tools.
`remove_emission_bounds`	Remove `bound_emission` and `tax_emission` data from a scenario.
`update_h2_blending`	Revise hydrogen-blending constraints.
