General purpose modeling tools
“Tools” can include, inter alia:
- Codes for retrieving data from specific data sources and adapting it for use with message_ix_models.
- Codes for modifying scenarios; although tools for building models should go in message_ix_models.model.
Exogenous data (tools.exo_data)
Generic tools for working with exogenous data sources.
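- MEASURES: Supported measures.
- SOURCES: Known sources for data.
- DemoSource: Example source of exogenous population and GDP data.
- ExoDataSource: Base class for sources of exogenous data.
- iamc_like_data_for_query: Load data from path in IAMC-like format and transform to Quantity.
- prepare_computer: Prepare c to compute GDP, population, or other exogenous data.
- register_source: Register ExoDataSource cls as a source of exogenous data.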
- class message_ix_models.tools.exo_data.DemoSource(source, source_kw)
Example source of exogenous population and GDP data.
- message_ix_models.tools.exo_data.MEASURES = ('GDP', 'POP')
Supported measures. Subclasses of ExoDataSource may provide support for other measures.
Todo
Store this in a separate code list or concept scheme.
- message_ix_models.tools.exo_data.SOURCES: Dict[str, Type[ExoDataSource]] = {'ADVANCE': <class 'message_ix_models.project.advance.data.ADVANCE'>, 'DEMO': <class 'message_ix_models.tools.exo_data.DemoSource'>, 'GEA': <class 'message_ix_models.project.gea.data.GEA'>, 'GFEI': <class 'message_ix_models.tools.gfei.GFEI'>, 'IEA EEI': <class 'message_ix_models.tools.iea.eei.IEA_EEI'>, 'IEA_EWEB': <class 'message_ix_models.tools.iea.web.IEA_EWEB'>, 'SHAPE': <class 'message_ix_models.project.shape.data.SHAPE'>, 'SSP': <class 'message_ix_models.project.ssp.data.SSPOriginal'>, 'SSP update': <class 'message_ix_models.project.ssp.data.SSPUpdate'>, 'iea-future-of-trucks': <class 'message_ix_models.model.transport.data.IEA_Future_of_Trucks'>, 'transport MERtoPPP': <class 'message_ix_models.model.transport.data.MERtoPPP'>}
Known sources for data. Use register_source() to add to this collection.
- message_ix_models.tools.exo_data.iamc_like_data_for_query(path: Path, query: str, *, archive_member: str | None = None, drop: List[str] | None = None, non_iso_3166: Literal['keep', 'discard'] = 'discard', replace: dict | None = None, unique: str = 'MODEL SCENARIO VARIABLE UNIT', **kwargs) -> AttrSeries
Load data from path in IAMC-like format and transform to Quantity.
The steps involved are:
- Read the data file; use pyarrow for better performance.
- Immediately apply query to reduce the data to be handled in subsequent steps.
- Assert that Model, Scenario, Variable, and Unit are unique; store the unique values. This means that query must result in data with unique values for these dimensions.
- Transform “Region” labels to ISO 3166-1 alpha-3 codes using iso_3166_alpha_3().
- Drop entire time series without such codes; for instance “World”.
- Transform to a pd.Series with “n” and “y” index levels; ensure the latter are int.
- Transform to Quantity with units.
The result is cached.
- Parameters:
archive_member (str, optional) – If given, path may be an archive with 2 or more members. The member named by archive_member is extracted and read.
non_iso_3166 ("keep" or "discard", optional) – If “discard” (default), “region” labels that are not ISO 3166-1 country names are discarded, along with associated data. If “keep”, such labels are kept.
Data returned by this function is cached using cached(); see also SKIP_CACHE.
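The steps listed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the implementation in message_ix_models: the sample data, the REGION_TO_ISO lookup (a stand-in for iso_3166_alpha_3()) and the load_iamc_like function are hypothetical, a Python predicate stands in for the query string, and a plain dict stands in for the Quantity.

```python
import csv
import io

# Hypothetical sample data in a long, IAMC-like layout
SAMPLE = """MODEL,SCENARIO,REGION,VARIABLE,UNIT,YEAR,VALUE
m,s,Austria,Population,million,2020,8.9
m,s,Austria,Population,million,2030,9.1
m,s,World,Population,million,2020,7800
m,s,Austria,GDP,billion USD,2020,430
"""

# Stand-in for iso_3166_alpha_3(); covers only the sample data
REGION_TO_ISO = {"Austria": "AUT"}

UNIQUE = ("MODEL", "SCENARIO", "VARIABLE", "UNIT")


def load_iamc_like(text, predicate):
    # Read, then filter immediately to reduce the data handled later
    rows = [r for r in csv.DictReader(io.StringIO(text)) if predicate(r)]

    # Each of these dimensions must have exactly one value after filtering
    attrs = {}
    for dim in UNIQUE:
        values = {r[dim] for r in rows}
        if len(values) != 1:
            raise ValueError(f"Non-unique values for {dim}: {sorted(values)}")
        attrs[dim] = values.pop()

    # Map region labels to alpha-3 codes; drop series without a code ("World")
    data = {}
    for r in rows:
        n = REGION_TO_ISO.get(r["REGION"])
        if n is None:
            continue
        data[(n, int(r["YEAR"]))] = float(r["VALUE"])  # "y" labels as int

    return data, attrs


data, attrs = load_iamc_like(SAMPLE, lambda r: r["VARIABLE"] == "Population")
print(data)
print(attrs["UNIT"])
```

A predicate that leaves more than one VARIABLE in the data raises ValueError, which mirrors the requirement that query must select unique values for these dimensions.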
- message_ix_models.tools.exo_data.register_source(cls: Type[ExoDataSource]) -> Type[ExoDataSource]
Register ExoDataSource cls as a source of exogenous data.
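Because the function takes a class and returns a class, it can plausibly be used as a class decorator. The following minimal sketch shows that registry pattern in general. BaseSource, REGISTRY, register and the use of an `id` attribute as the key are all hypothetical stand-ins, and the duplicate check is an assumption, not documented behaviour of register_source().

```python
from typing import Dict, Type


class BaseSource:
    """Hypothetical stand-in for a base class of data sources."""


# Hypothetical registry, playing the role that SOURCES plays above
REGISTRY: Dict[str, Type[BaseSource]] = {}


def register(cls: Type[BaseSource]) -> Type[BaseSource]:
    """Add cls to REGISTRY and return it unchanged, so it works as a decorator."""
    key = getattr(cls, "id", cls.__name__)
    if key in REGISTRY:  # assumption: duplicate identifiers are rejected
        raise ValueError(f"{key!r} is already registered")
    REGISTRY[key] = cls
    return cls


@register
class MySource(BaseSource):
    id = "MY SOURCE"


print(sorted(REGISTRY))
```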
- message_ix_models.tools.exo_data.prepare_computer(context, c: Computer, source='test', source_kw: Mapping | None = None, *, strict: bool = True) -> Tuple[Key, ...]
Prepare c to compute GDP, population, or other exogenous data.
Check each ExoDataSource in SOURCES to determine whether it recognizes and can handle source and source_kw. If a source is identified, add tasks to c that retrieve and process data into a Quantity with, at least, dimensions \((n, y)\).
- Parameters:
source (str) – Identifier of the source, possibly with other information to be handled by an ExoDataSource.
source_kw (dict, optional) – Keyword arguments for a Source class. These can include indexers, selectors, or other information needed by the source class to identify the data to be returned. If the key “measure” is present, it should be one of MEASURES.
strict (bool, optional) – Raise an exception if any of the keys to be added already exist.
- Return type: Tuple[Key, ...]
- Raises: ValueError – if no source is registered which can handle source and source_kw.
The first returned key, like {measure}:n-y, triggers the following computations:
- Load data by invoking an ExoDataSource.
- Aggregate on the \(n\) (node) dimension according to Config.regions.
- Interpolate on the \(y\) (year) dimension according to Config.years.
Additional key(s) include:
- {measure}:n-y:y0 indexed: same as {measure}:n-y, indexed to values as of \(y_0\) (the first model year).
See particular data source classes, like SSPOriginal, for specific examples of usage.
Todo
Extend to also prepare to compute values indexed to a particular \(n\).
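To make “indexed to values as of \(y_0\)” concrete, the sketch below assumes that indexing means dividing each value by the value for the same node in year \(y_0\). The function name index_to, the dict-based data structure, and the sample numbers are hypothetical; the actual computation is added to the Computer by prepare_computer().

```python
def index_to(data, y0):
    """Divide each (n, y) value by the value for the same n in year y0."""
    return {(n, y): value / data[(n, y0)] for (n, y), value in data.items()}


# Hypothetical sample values with dimensions (n, y)
gdp = {
    ("R12_NAM", 2020): 20.0,
    ("R12_NAM", 2030): 25.0,
    ("R12_WEU", 2020): 16.0,
    ("R12_WEU", 2030): 18.0,
}

indexed = index_to(gdp, 2020)
print(indexed[("R12_NAM", 2030)])  # 25.0 / 20.0
```

Every series then takes the value 1.0 in \(y_0\), which makes growth paths of different nodes directly comparable.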
- class message_ix_models.tools.exo_data.ExoDataSource(source: str, source_kw: Mapping)
Base class for sources of exogenous data.
- abstract __call__() -> AttrSeries
Return the data.
The Quantity returned by this method must have dimensions \((n, y) \cup \text{extra_dims}\). If the original/upstream/raw data has different dimensionality (fewer or more dimensions; different dimension IDs), the code must transform these, make appropriate selections, etc.
- abstract __init__(source: str, source_kw: Mapping) -> None
Handle source and source_kw.
An implementation must:
- Raise ValueError if it does not recognize or cannot handle the arguments in source or source_kw.
- Recognize and handle (if possible) a “measure” keyword in source_kw from MEASURES.
It may:
- Transform these into other values, for instance by mapping certain values to others, applying regular expressions, or other operations.
- Store those values as instance attributes for use in __call__().
- Set name and/or extra_dims to control the behaviour of prepare_computer().
- Log messages that give information that may help to debug a ValueError for source or source_kw that cannot be handled.
It should not actually load data or perform any time- or memory-intensive operations; these should only be triggered by __call__().
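The contract above can be illustrated with a toy class. This sketch does not import message_ix_models: SourceBase is a hypothetical stand-in for ExoDataSource, ConstantSource and its "value" keyword are invented for illustration, and a dict stands in for the Quantity that a real __call__() would return.

```python
import logging
from abc import ABC, abstractmethod
from typing import Mapping

log = logging.getLogger(__name__)

MEASURES = ("GDP", "POP")


class SourceBase(ABC):
    """Hypothetical stand-in for the base class described above."""

    @abstractmethod
    def __init__(self, source: str, source_kw: Mapping) -> None: ...

    @abstractmethod
    def __call__(self): ...


class ConstantSource(SourceBase):
    """Toy source: handles only source="constant" and a known measure."""

    def __init__(self, source: str, source_kw: Mapping) -> None:
        kw = dict(source_kw)  # copy, so the caller's mapping is not modified
        measure = kw.pop("measure", None)
        if source != "constant" or measure not in MEASURES:
            # Log details that may help to debug the ValueError
            log.debug("Cannot handle source=%r, measure=%r", source, measure)
            raise ValueError(f"source={source!r}, measure={measure!r}")
        # Store values for later use; no data is loaded here
        self.measure = measure
        self.value = float(kw.pop("value", 1.0))
        if kw:
            raise ValueError(f"Unhandled keyword arguments: {sorted(kw)}")

    def __call__(self) -> dict:
        # Only now is the (trivial) data produced
        return {("AUT", 2020): self.value}


src = ConstantSource("constant", {"measure": "POP", "value": 3})
print(src())
```

Raising ValueError from __init__() for unrecognized arguments is what lets a caller try each registered class in turn until one accepts source and source_kw.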
- aggregate: bool = True
True if transform() should aggregate data on the \(n\) dimension.
- extra_dims: Tuple[str, ...] = ()
Optional additional dimensions for the returned Key/Quantity. If not set by __init__(), the dimensions are \((n, y)\).
- interpolate: bool = True
True if transform() should interpolate data on the \(y\) dimension.
- name: str = ''
Optional name for the returned Key/Quantity. If not set by __init__(), then the “measure” keyword is used.
- raise_on_extra_kw(kwargs) -> None
Helper for subclasses to handle the source_kw argument.
- Store aggregate and interpolate, if they remain in kwargs.
- Raise ValueError if there are any other, unhandled keyword arguments in kwargs.
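A minimal sketch of the behaviour described for this helper, assuming that “store” means popping the two keywords from kwargs and setting same-named attributes. KwHandler is a hypothetical class used only for illustration; the real method may differ in its details.

```python
class KwHandler:
    # Defaults matching the class attributes documented above
    aggregate = True
    interpolate = True

    def raise_on_extra_kw(self, kwargs: dict) -> None:
        # Store the two recognized options, if present
        self.aggregate = kwargs.pop("aggregate", self.aggregate)
        self.interpolate = kwargs.pop("interpolate", self.interpolate)
        # Anything left over was not handled by the subclass
        if kwargs:
            raise ValueError(f"Unhandled keyword arguments: {sorted(kwargs)}")


handler = KwHandler()
handler.raise_on_extra_kw({"aggregate": False})
print(handler.aggregate, handler.interpolate)
```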
- transform(c: Computer, base_key: Key) -> Key
Prepare c to transform raw data from base_key.
base_key identifies the Quantity that is returned by __call__(). Before the data is returned, transform() allows the data source to add additional tasks or computations to c that further transform the data. (These operations may be done in __call__() directly, but transform() allows use of other genno operators and conveniences.)
The default implementation:
- If aggregate is True, aggregates the data (genno.operator.aggregate()) on the \(n\) dimension using the key “n::groups”.
- If interpolate is True, interpolates the data (genno.operator.interpolate()) on the \(y\) dimension using “y::coords”.
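The two default steps can be illustrated without genno. The sketch below assumes that aggregation sums the members of each group and that interpolation is linear between the nearest known years; the genno operators offer more options and may behave differently at the edges. The names aggregate_n, interpolate_y and R_EUR, and the sample values, are hypothetical.

```python
def aggregate_n(data, groups):
    """Sum values over the members of each group on the n dimension."""
    result = {}
    for group, members in groups.items():
        for (n, y), value in data.items():
            if n in members:
                result[(group, y)] = result.get((group, y), 0.0) + value
    return result


def interpolate_y(data, years):
    """Linearly interpolate the series for each n to the requested years."""
    series = {}
    for (n, y), value in data.items():
        series.setdefault(n, {})[y] = value

    result = {}
    for n, points in series.items():
        for y in years:
            if y in points:
                result[(n, y)] = points[y]
                continue
            # Nearest known years below and above y
            lo = max((k for k in points if k < y), default=None)
            hi = min((k for k in points if k > y), default=None)
            if lo is None or hi is None:
                continue  # this sketch does not extrapolate
            w = (y - lo) / (hi - lo)
            result[(n, y)] = points[lo] * (1 - w) + points[hi] * w
    return result


raw = {
    ("AUT", 2020): 9.0,
    ("AUT", 2030): 10.0,
    ("DEU", 2020): 83.0,
    ("DEU", 2030): 85.0,
}
groups = {"R_EUR": ["AUT", "DEU"]}  # plays the role of "n::groups"

agg = aggregate_n(raw, groups)
out = interpolate_y(agg, [2020, 2025, 2030])  # plays the role of "y::coords"
print(out)
```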
ADVANCE data (tools.advance)
Deprecated since version 2023.11: Use project.advance instead.
- get_advance_data: Return data from the ADVANCE Work Package 2 data snapshot at LOCATION.
- advance_data: Return a single ADVANCE data variable as a genno.Quantity.
- message_ix_models.tools.advance.LOCATION = ('advance', 'advance_compare_20171018-134445.csv.zip')
This is a location relative to a parent directory.
The specific parent directory depends on whether message_data is available:
- Without message_data: The code finds the data within (4) Other, system-specific (“local”) directories (see discussion there for how to configure this location). Users should:
  - Visit https://tntcat.iiasa.ac.at/ADVANCEWP2DB/dsd?Action=htmlpage&page=about and register for access to the data.
  - Log in.
  - Download the snapshot with the file name given in LOCATION to a subdirectory advance/ within their local data directory.
- With message_data: The code finds the data within (3) data/ directory in the message_data repository. The snapshot is stored directly in the repository using Git LFS.
Handle data from the ADVANCE project.
- message_ix_models.tools.advance.DIMS = ['model', 'scenario', 'region', 'variable', 'unit', 'year']
Standard dimensions for data produced as snapshots from the IIASA ENE Program “WorkDB”.
- message_ix_models.tools.advance._read_workdb_snapshot(path: Path, name: str) -> Series
Read the data file.
The expected format is a ZIP archive at path containing a member at name in CSV format, with columns corresponding to DIMS, except for “year”, which is stored as column headers (‘wide’ format). (This corresponds to an older version of the “IAMC format,” without more recent additions intended to represent sub-annual time resolution using a separate column.)
Deprecated since version 2023.11: Use iamc_like_data_for_query() instead.
Data returned by this function is cached using cached(); see also SKIP_CACHE.
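The file layout described above (a ZIP archive with a CSV member in ‘wide’ format) can be read with the standard library as in the following sketch. It is not the deprecated function itself: the archive is built in memory, the member name snapshot.csv, the lowercase column headers and the values are hypothetical, and a dict keyed by the DIMS labels stands in for the pandas Series.

```python
import csv
import io
import zipfile

DIMS = ["model", "scenario", "region", "variable", "unit", "year"]


def make_sample_archive() -> bytes:
    """Build a small in-memory ZIP archive with one 'wide' CSV member."""
    text = (
        "model,scenario,region,variable,unit,2020,2030\n"
        "m,s,World,Population,million,7800,8500\n"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("snapshot.csv", text)
    return buffer.getvalue()


def read_wide_member(archive: bytes, name: str) -> dict:
    """Read member `name` and convert from 'wide' to 'long' layout."""
    id_cols = DIMS[:-1]  # every dimension except "year"
    result = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf, zf.open(name) as f:
        reader = csv.DictReader(io.TextIOWrapper(f, encoding="utf-8", newline=""))
        for row in reader:
            key = tuple(row[c] for c in id_cols)
            for column, value in row.items():
                if column in id_cols or value in ("", None):
                    continue
                # Remaining column headers are years
                result[key + (int(column),)] = float(value)
    return result


long_data = read_wide_member(make_sample_archive(), "snapshot.csv")
print(long_data)
```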
- message_ix_models.tools.advance.advance_data(variable: str, query: str | None = None) -> AttrSeries
Return a single ADVANCE data variable as a genno.Quantity.
Deprecated since version 2023.11: Use ADVANCE through exo_data.prepare_computer() instead.
- Parameters:
query (str, optional) – Passed to get_advance_data().
- Returns:
with the dimensions DIMS and name variable. If the units of the data for variable are consistent and parseable by pint, the returned Quantity has these units; otherwise units are discarded and the returned Quantity is dimensionless.
- Return type:
genno.Quantity
- message_ix_models.tools.advance.get_advance_data(query: str | None = None) -> Series
Return data from the ADVANCE Work Package 2 data snapshot at LOCATION.
Deprecated since version 2023.11: Use ADVANCE through exo_data.prepare_computer() instead.
- Parameters:
query (str, optional) – Passed to pandas.DataFrame.query() to limit the returned values.
- Returns:
with a pandas.MultiIndex having the levels DIMS.
- Return type:
pandas.Series
Data returned by this function is cached using cached(); see also SKIP_CACHE.
IAMC data structures (tools.iamc)
Tools for working with IAMC-structured data.
- message_ix_models.tools.iamc.describe(data: DataFrame, extra: str | None = None) -> StructureMessage
Generate SDMX structure information from data in IAMC format.
- Parameters:
data – Data in “wide” or “long” IAMC format.
extra (str, optional) – Extra text added to the description of each Codelist.
- Returns:
The message contains one Codelist for each of the MODEL, SCENARIO, REGION, VARIABLE, and UNIT dimensions. Codes for the VARIABLE code list have annotations with id="preferred-unit-measure" that give the corresponding UNIT Code(s) that appear with each VARIABLE.
- Return type:
StructureMessage
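The kind of information collected for the returned message can be sketched in plain Python: one set of labels per dimension, plus the units observed with each variable. The function summarize and the sample rows are hypothetical; describe() itself returns SDMX objects rather than sets and dicts.

```python
from collections import defaultdict

DIMENSIONS = ("MODEL", "SCENARIO", "REGION", "VARIABLE", "UNIT")


def summarize(rows):
    """Collect unique labels per dimension and the units seen per variable."""
    codes = {dim: set() for dim in DIMENSIONS}
    units_for_variable = defaultdict(set)
    for row in rows:
        for dim in DIMENSIONS:
            codes[dim].add(row[dim])
        units_for_variable[row["VARIABLE"]].add(row["UNIT"])
    return codes, dict(units_for_variable)


# Hypothetical rows in "long" IAMC layout (year and value omitted)
rows = [
    {"MODEL": "m", "SCENARIO": "s", "REGION": "World",
     "VARIABLE": "Population", "UNIT": "million"},
    {"MODEL": "m", "SCENARIO": "s", "REGION": "World",
     "VARIABLE": "GDP|PPP", "UNIT": "billion USD"},
]

codes, units_for_variable = summarize(rows)
print(sorted(codes["VARIABLE"]))
print(units_for_variable["Population"])
```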
World Bank structures (tools.wb)
Tools for World Bank data.
- message_ix_models.tools.wb.assign_income_groups(cl_node: sdmx.model.common.Codelist, cl_income_group: sdmx.model.common.Codelist, method: str = 'population', replace: Dict[str, str] | None = None) -> None
Annotate cl_node with income groups.
Each node is assigned an Annotation with id="wb-income-group", according to the income groups of its children (countries), as reflected in cl_income_group (see get_income_group_codelist()).
- Parameters:
method ("population" or "count") – Method for aggregation:
- "population" (default): the WB World Development Indicators (WDI) 2020 population for each country is used as a weight, so that the node’s income group is the income group of the plurality of the population of its children.
- "count": each country is weighted equally, so that the node’s income group is the mode (most frequently occurring value) of its children’s.
replace (dict) – Mapping from wb-income-group annotation text appearing in cl_income_group to texts to be attached to cl_node. Mapping two keys to the same value effectively combines or aggregates those groups. See make_map().
Example
Annotate the R12 node list with income group information, mapping high income countries (HIC) and upper-middle income countries (UMC) into one group and aggregating by population.
>>> cl_node = get_codelist("node/R12")
>>> cl_ig = get_income_group_codelist()
>>> replace = make_map({"HIC": "HMIC", "UMC": "HMIC"})
>>> assign_income_groups(cl_node, cl_ig, replace=replace)
>>> cl_node["R12_NAM"].get_annotation(id="wb-income-group").text
HMIC
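The difference between the two aggregation methods can be shown with a small weighted-vote sketch. The function node_income_group, the placeholder country codes AAA, BBB and CCC, and the population numbers are hypothetical and are not World Bank data.

```python
from collections import Counter


def node_income_group(children, income_group, population=None):
    """Return the income group with the largest total weight among children.

    With population=None each child counts once ("count" method);
    otherwise each child is weighted by its population ("population" method).
    """
    weights = Counter()
    for child in children:
        weights[income_group[child]] += 1 if population is None else population[child]
    return weights.most_common(1)[0][0]


children = ["AAA", "BBB", "CCC"]
income_group = {"AAA": "HIC", "BBB": "LMC", "CCC": "LMC"}
population = {"AAA": 100, "BBB": 10, "CCC": 20}

print(node_income_group(children, income_group))              # by count
print(node_income_group(children, income_group, population))  # by population
```

In this sample the two methods disagree: two of three children are in the LMC group, but the single HIC child holds most of the population.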
- message_ix_models.tools.wb.fetch_codelist(id: str) -> sdmx.model.common.Codelist
Retrieve code lists related to the WB World Development Indicators.
In principle this could be done with sdmx.Client("WB_WDI").codelist(id), but the World Bank SDMX REST API does not support queries for a specific code list. See https://datahelpdesk.worldbank.org/knowledgebase/articles/1886701-sdmx-api-queries.
fetch_codelist() retrieves http://api.worldbank.org/v2/sdmx/rest/codelist/WB/, the structure message containing all code lists; and extracts and returns the one with the given id.
- message_ix_models.tools.wb.get_income_group_codelist() -> sdmx.model.common.Codelist
Return a Codelist with World Bank income group information.
The returned code list is a modified version of the one with URN …Codelist=WB:CL_REF_AREA_WDI(1.0), via fetch_codelist().
This is augmented with information about the income group and lending category concepts as described at https://datahelpdesk.worldbank.org/knowledgebase/articles/906519
The information is stored in two ways:
- Existing codes in the list like “HIC: High income” that designate groups of countries are associated with child codes that are designated as members of that group. These can be accessed at Code.child.
- Existing codes in the list like “ABW: Aruba” are annotated with:
  - id="wb-income-group": the URN of the income group code, for instance “urn:sdmx:org.sdmx.infomodel.codelist.Code=WB:CL_REF_AREA_WDI(1.0).HIC”. This is an unambiguous reference to a code in the same list.
  - id="wb-lending-category": the name of the lending category, if any.
  These can be accessed using Code.annotations, Code.get_annotation, and other methods.
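Code that reads the wb-income-group annotation may need the short code ID rather than the full URN. The sketch below splits the example URN given above using a regular expression. PATTERN and parse_code_urn are hypothetical helpers written for this illustration and are not part of message_ix_models or sdmx.

```python
import re

URN = "urn:sdmx:org.sdmx.infomodel.codelist.Code=WB:CL_REF_AREA_WDI(1.0).HIC"

# agency ":" code list ID "(" version ")" "." code ID
PATTERN = re.compile(
    r"Code=(?P<agency>[^:]+):(?P<codelist>[^(]+)\((?P<version>[^)]+)\)\.(?P<code>.+)$"
)


def parse_code_urn(urn: str) -> dict:
    """Split the URN of a code into agency, code list, version and code ID."""
    match = PATTERN.search(urn)
    if match is None:
        raise ValueError(f"Not a URN for a code: {urn!r}")
    return match.groupdict()


parts = parse_code_urn(URN)
print(parts["code"])
```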