lp_diag: basic diagnostics of linear programming (LP) problems
Description
LPdiag provides basic information about the LP problems defined by the corresponding MPS-format files.
The diagnostics focus on the implied numerical properties of the underlying optimization problem.
In this context, the term outlier denotes the model entities having values in either the lower or the upper tail of the corresponding value distribution.
The tails are defined by orders of magnitude, computed as \(int(log10(abs(val)))\), where val stands for the value of the corresponding coefficient.
The default values of the tails are equal to \((-6, 6)\), respectively; they can be redefined, if desired.
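The definition above can be illustrated with a minimal Python sketch. The function names and the returned labels are hypothetical and are not part of LPdiag. The sketch assumes that a coefficient belongs to a tail when its order of magnitude is at or beyond the tail value, and it uses the \((-6, 6)\) pair quoted above as defaults.

```python
import math


def magnitude(val: float) -> int:
    """Order of magnitude computed as int(log10(abs(val))).

    Note that int() truncates toward zero, so both 0.5 and 5.0 have magnitude 0.
    """
    if val == 0:
        raise ValueError("a zero coefficient has no order of magnitude")
    return int(math.log10(abs(val)))


def classify(val: float, lo_tail: int = -6, up_tail: int = 6) -> str:
    """Return 'lower', 'upper' or 'core' (hypothetical labels).

    Assumption: the tails are inclusive, i.e. magnitude <= lo_tail or >= up_tail.
    """
    mag = magnitude(val)
    if mag <= lo_tail:
        return "lower"
    if mag >= up_tail:
        return "upper"
    return "core"


for value in (2.5e-8, 0.5, 42.0, -3.0e6):
    print(value, magnitude(value), classify(value))
```

Because the sign is dropped by abs(), a large negative coefficient falls into the upper tail in the same way as a large positive one.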
A rule of thumb: the maximum and minimum orders of magnitude of the LP matrix coefficients passed to optimization should differ by at most four.
LPdiag helps to achieve this goal by providing information on outliers.
Such information can be used, e.g., for:
reconsideration of measurement units of the corresponding variables and relations,
consideration of replacing small (in relation to other coefficients in the same row or column) elements by zero,
splitting the corresponding rows and/or columns,
verification of the coefficients’ values.
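The first remedy in the list above, changing measurement units, can be sketched as follows. The matrix, its row and column names, and the scaling factor are purely illustrative assumptions. The point is only that multiplying one column by a constant can reduce the spread between the largest and smallest orders of magnitude in the whole matrix.

```python
import math


def magnitude_spread(matrix: dict) -> int:
    """Largest minus smallest order of magnitude over the non-zero coefficients."""
    mags = [int(math.log10(abs(v))) for v in matrix.values() if v != 0]
    return max(mags) - min(mags)


# Hypothetical 2x2 matrix keyed by (row, column); column "x" uses a unit
# that makes its coefficients very small compared to those of column "y".
matrix = {
    ("r1", "x"): 2.5e-7,
    ("r2", "x"): 4.0e-6,
    ("r1", "y"): 3.0,
    ("r2", "y"): 150.0,
}
print("spread before rescaling:", magnitude_spread(matrix))

# Measuring the variable "x" in a unit that is 1e6 times larger multiplies
# every coefficient in that column by 1e6.
rescaled = {
    (row, col): (val * 1e6 if col == "x" else val)
    for (row, col), val in matrix.items()
}
print("spread after rescaling:", magnitude_spread(rescaled))
```

In this example the spread drops from eight orders of magnitude to two, which satisfies the rule of thumb of at most four.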
Features
The current LPdiag version provides the following information:
characteristics of the problem (including numbers of rows, columns, non-zero coefficients and distributions of their values),
distributions of diverse values characterizing the LP matrix,
location (row and column) of each outlier,
ranges of values of other coefficients in each such row or column, as well as the corresponding bounds (LHS, RHS for rows, lower and upper bounds for columns).
The functionality of LPdiag will be gradually enhanced to meet the actual needs of the message_ix modelers.
Usage
The tool analyzes provided MPS-format files.
Several small MPS files are provided in message_ix/tests/data/lp_diag/ for testing local installations and for becoming familiar with LPdiag.
The small MPS files are structured as follows:
aez.mps: agro-ecological zones, medium size.
diet.mps: classical small LP.
jg_korh.mps: tiny testing problem.
lotfi.mps: classical medium size.
error_*.mps: various MPS specs testing the error-handling logic in the code.
Hints on generating MPS files are provided below.
Feel free to store arbitrarily large MPS files in message_ix/tools/lp_diag/data/mps/, but note that these should not be committed to GitHub.
We suggest the following steps for becoming familiar with LPdiag and then using it for analysis of actual MPS files:
becoming familiar with LPdiag,
preparing an MPS file,
actual analysis.
We outline each of these steps below.
Becoming familiar with LPdiag
Note that LPdiag should be run at the terminal prompt.
Navigate to the folder message_ix/tools/lp_diag.
For initial testing, run the following command, which analyzes the default (pre-specified) MPS file provided in the test_mps folder. Other provided MPS examples can be run by using the --mps option explained below:
message-ix lp-diag
To display the available LPdiag options run:
$ message-ix lp-diag --help
Usage: message-ix lp-diag [OPTIONS]

  Diagnostics of basic properties of LP problems stored in the MPS format.

  Examples:
    message-ix lp-diag
    message-ix lp-diag --help
    message-ix lp-diag --mps aez.mps --outp foo.txt

Options:
  --wdir PATH            Working directory.
  --mps PATH             MPS file name or path.
  -L, --lo-tail INTEGER  Magnitude order of the lower tail (default: -7).
  -U, --up-tail INTEGER  Magnitude order of the upper tail (default: 5).
  --outp PATH            Path for file output.
  --help                 Show this message and exit.
Further details about the optional parameters:
--wdir: the desired work-directory (by default, the directory in which LPdiag is located).
--mps: name of the MPS file to be analysed; if the file is not located in the work-directory, then the name should include the path to the file (see the example above).
--outp: name of the file to which the output shall be redirected. By default the output is written to stdout, i.e., to the terminal window. It can be redirected to a file either by using the --outp file_name option, as illustrated by the last example in the --help output shown above, or by including the redirection in the command itself, e.g.:
message-ix lp-diag --help > foo.txt
--lo-tail, --up-tail: these are passed to LPdiag.print_statistics(). To obtain the numbers of coefficients at every magnitude in the MPS file, specify equal or overlapping values:
message-ix lp-diag -L 0 -U 0 --mps file.mps
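When the tool is driven from a script, the options described above can be assembled programmatically. The following is a minimal sketch: the helper name lp_diag_command is hypothetical, and only the option names listed in the --help output above are used. The command is built but not executed, since running it requires a working message_ix installation.

```python
import shlex
from typing import List, Optional


def lp_diag_command(
    mps: Optional[str] = None,
    outp: Optional[str] = None,
    lo_tail: Optional[int] = None,
    up_tail: Optional[int] = None,
    wdir: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `message-ix lp-diag` (hypothetical helper).

    Options left as None are omitted, so the defaults of the tool apply.
    """
    cmd = ["message-ix", "lp-diag"]
    if wdir is not None:
        cmd += ["--wdir", str(wdir)]
    if mps is not None:
        cmd += ["--mps", str(mps)]
    if lo_tail is not None:
        cmd += ["--lo-tail", str(lo_tail)]
    if up_tail is not None:
        cmd += ["--up-tail", str(up_tail)]
    if outp is not None:
        cmd += ["--outp", str(outp)]
    return cmd


cmd = lp_diag_command(mps="aez.mps", outp="foo.txt")
print(shlex.join(cmd))
# To actually run the analysis (requires message_ix to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when the MPS path contains spaces.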
Generation of the MPS file in the message_ix environment
The MPS format is the oldest format for specifying LP problems and is still widely used. Most modeling environments provide ways of generating MPS files.
In the message_ix environment one can generate the MPS file, e.g., upon solving a message_ix.Scenario, by passing to message_ix.Scenario.solve() the writemps option together with the desired name of the MPS file.
The MPS file will then be generated and placed in the message_ix/model/ directory.
Details are available in the GAMS documentation.
Example of specifying the option:
scenario.solve(solve_options={"writemps": "<file_name>.mps"})
Actual analysis
For actual analysis one needs to specify the corresponding MPS file in the command (still run in the directory message_ix/tools/lp_diag):
message-ix lp-diag --mps loc/name
…where loc and name stand for the path to the directory where the MPS file is located and for the corresponding file name, respectively.
Other option(s) can be included in the command, as explained above.
If the output redirection is desired (e.g., for results to be shared or composed of many lines), then run:
message-ix lp-diag --mps loc/name --outp outfile.txt
Extensions in the file names are optional. An alternative way of output redirection is explained above.
Summary of the provided analysis results
The results are composed of the following elements:
Info on the work-directory.
Info during reading the MPS file:
Should a syntax error occur during reading the file, then the corresponding exception is thrown with the corresponding details.
Basic info during processing of each MPS section.
Basic attributes of the read MPS.
Distribution of values of the objective (goal function) coefficients.
Distribution of \(abs(val)\) of the matrix elements.
Distribution of values of \(int(log10(abs(values)))\).
Distribution of values of \(int(log10(abs(values)))\) sorted by magnitudes of values (magnitudes of zero-occurrences skipped).
For each (lower and upper) tail of the matrix coefficient values of the corresponding sub-matrix:
Distributions of diverse values (\(value, abs(val), log10(abs(val))\)) of the matrix elements.
For each order of magnitude: number of elements
Row-wise location of each outlier with:
info on other coefficients in the same row, and
order of magnitude of the row’s LHS and RHS.
Column-wise location of each outlier with:
info on other coefficients in the same column, and
order of magnitude of the column’s lower and upper bounds.
The processing start- and end-times.
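The magnitude distributions listed above can be approximated with a few lines of Python. This is a minimal sketch using only the standard library, with illustrative coefficient values. It is not the LPdiag implementation, and the function name is hypothetical.

```python
import math
from collections import Counter


def magnitude_distribution(values) -> dict:
    """Count non-zero values per order of magnitude int(log10(abs(value))).

    The result is sorted by magnitude; magnitudes with no occurrences
    simply do not appear.
    """
    counts = Counter(int(math.log10(abs(v))) for v in values if v != 0)
    return dict(sorted(counts.items()))


# Illustrative coefficients, not taken from any of the provided MPS files
coeffs = [1.5, -20.0, 0.03, 2.5e-8, 4.2e6, 7.0, -0.5]
for mag, count in magnitude_distribution(coeffs).items():
    print(f"magnitude {mag:3d}: {count} element(s)")
```

Here the two entries with magnitudes -7 and 6 are the ones that would show up as candidates for the lower and upper tails, respectively.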
API reference
Analyse MPS-format files.
- class message_ix.tools.lp_diag.LPdiag
Process the MPS-format input file and provide its basic diagnostics.
The diagnostics currently include:
handling formal errors of the MPS file
basic statistics of the matrix coefficients.
- add_bnd(words: List[str], n_line: int)
Process current line of the BOUNDS section.
The section defines bounds on the values of columns (variables). Each line specifies one bound for one column.
- add_coeff(words: List[str], n_line: int)
Process current line of the COLUMNS section.
The section defines both column names and values of the matrix coefficients. One line can have either one or two matrix elements.
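To make the "one or two matrix elements per line" rule concrete, here is a minimal parsing sketch. It assumes whitespace-separated (free-format) fields in the order column name, row name, value, optionally followed by a second row name and value, and it ignores MARKER lines. The function is hypothetical and is not the implementation of add_coeff().

```python
from typing import List, Tuple


def parse_columns_line(words: List[str], n_line: int) -> List[Tuple[str, str, float]]:
    """Return (row_name, col_name, value) for each element on a COLUMNS line."""
    if len(words) not in (3, 5):
        raise ValueError(
            f"line {n_line}: expected 3 or 5 fields in COLUMNS, got {len(words)}"
        )
    col_name = words[0]
    elements = []
    # Fields 1.. come in (row name, value) pairs: one pair or two pairs.
    for i in range(1, len(words), 2):
        elements.append((words[i], col_name, float(words[i + 1])))
    return elements


print(parse_columns_line("x1 cost 1.0 lim1 2.5".split(), n_line=7))
print(parse_columns_line("x2 lim1 -3e-8".split(), n_line=8))
```

Reporting the line number in the error message mirrors the n_line argument of the methods documented here, which makes malformed MPS files easier to fix.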
- add_range(words: str, n_line: int)
Process current line of the RANGES section.
The section defines ranges of values of rows. One line can have either one or two range values.
- add_rhs(words: List[str], n_line: int)
Process current line of the RHS section.
The section defines right-hand-side values of rows. One line can have either one or two RHS values.
- add_row(words: List[str], n_line: int)
Process current line of the ROWS section.
While processing the ROWS section the row attributes are initialized to the default (for the corresponding row type) values. The attributes are updated for optionally defined values in the (also optional) RHS and RANGES sections. The interpretation of the MPS format (in particular of values in the RANGES section) follows the original MPS standard; see, e.g., “Advanced Linear Programming” by Bruce A. Murtagh, or the standard summary at https://lpsolve.sourceforge.net/5.5/mps-format.htm .
- get_entity_info(mat_row: Series, by_row: bool = True) → Tuple[int, str]
Return info on the entity (row or col) defining the given matrix coefficient.
Each row of the DataFrame contains the definition (composed of the row_seq, col_seq, value, log(value)) of one matrix coefficient. The function returns the seq_id and name of either the row or the col of the currently considered coefficient.
- Parameters:
mat_row (pandas.Series) – Record of the df with the data of currently processed element.
by_row (bool) – True/False for returning the seq_id and name of the corresponding row/col.
- get_entity_range(seq_id: int, by_row: bool = True) → str
Return formatted ranges of feasible values of either a row or a column.
The returned values of ranges are either ‘none’ (for plus/minus infinity) or int(log10(abs(val))) for other values. Small values, defined as abs(value) < 1e-10, are represented by 0.
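The formatting rule described above can be sketched as follows. The function names and the bracketed output layout are hypothetical. Only the three cases (infinite, small, other) are taken from the description, and this is not the code of get_entity_range().

```python
import math


def magnitude_or_none(val: float):
    """Map a bound to 'none', 0, or its order of magnitude."""
    if math.isinf(val):
        return "none"  # plus/minus infinity, i.e. no bound
    if abs(val) < 1e-10:
        return 0  # small values (including zero) are represented by 0
    return int(math.log10(abs(val)))


def format_range(lo: float, up: float) -> str:
    """Format a (lower, upper) pair; the bracket layout is an assumption."""
    return f"[{magnitude_or_none(lo)}, {magnitude_or_none(up)}]"


print(format_range(-math.inf, 2500.0))
print(format_range(0.0, math.inf))
print(format_range(3e-12, 4.2e6))
```

Checking for small values before taking the logarithm also avoids calling log10 on zero, which would raise an error.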
- locate_outliers(small: bool = True, thresh: int = -7, max_rec: int = 500)
Locations of outliers, i.e., elements having small/large coefficient values.
Locate outliers in terms of the matrix coefficient values. The provided ranges of values in the corresponding row/col indicate the potential for simple scaling.
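A simplified view of what locating outliers involves is sketched below. The data layout (a list of row, column, value triples) and the function name are hypothetical. The sketch assumes that small outliers have a magnitude at or below thresh, large outliers a magnitude at or above thresh, and that max_rec caps the number of reported records. The actual behaviour of locate_outliers() may differ in these details.

```python
import math
from typing import List, Tuple

Coeff = Tuple[str, str, float]  # (row name, column name, value)


def find_outliers(
    coeffs: List[Coeff], small: bool = True, thresh: int = -7, max_rec: int = 500
) -> List[Coeff]:
    """Return at most max_rec coefficients whose magnitude is beyond thresh."""
    found = []
    for row, col, val in coeffs:
        if val == 0:
            continue  # zeros are not stored in the LP matrix
        mag = int(math.log10(abs(val)))
        is_outlier = mag <= thresh if small else mag >= thresh
        if is_outlier:
            found.append((row, col, val))
            if len(found) >= max_rec:
                break
    return found


coeffs = [
    ("r1", "x", 2.5e-8),
    ("r1", "y", 3.0),
    ("r2", "x", 4.2e6),
    ("r2", "y", 1.5e-9),
]
print(find_outliers(coeffs, small=True, thresh=-7))
print(find_outliers(coeffs, small=False, thresh=6))
```

Once an outlier is located, the magnitudes of the other coefficients in its row and column indicate whether rescaling that row or column would bring it into line.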
- print_statistics(lo_tail: int = -7, up_tail: int = 6)
Basic statistics of the matrix coefficients.
Focus on distributions of magnitudes of non-zero coefficients represented by values of int(log10(abs(coeff))). Additionally, tails (low and upp) of the distributions are reported.
- row_att(row_seq: int, row_name: str, row_type: str, sec_name: str, val: float = 0.0)
Process values defined in the ROWS, RHS and RANGES sections.
The corresponding row attributes are stored or updated.
While processing the ROWS section the row attributes are initialized to the default (for the corresponding row type) values. The attributes are updated for optionally defined values in the (also optional) RHS and RANGES sections. The interpretation of the MPS format (in particular of values in the RANGES section) follows the original MPS standard; see, e.g., “Advanced Linear Programming” by Bruce A. Murtagh, or the standard summary at https://lpsolve.sourceforge.net/5.5/mps-format.htm .
- Parameters:
row_seq (int) – Position of row in dictionaries and the matrix df.
row_name (str) – Row name (defined in the ROWS section).
row_type (str) – Row type (defined in the ROWS section).
sec_name (str) – Identifies the MPS section: either ‘rows’ (for initialization) or ‘rhs’ or ‘ranges’ (for updates).
val (float) – Value of the row attribute defining either lo_bnd or up_bnd of the row (the type checked while processing the MPS section).
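The way the ROWS, RHS and RANGES sections combine into the lower and upper bounds of a row can be summarized in a short sketch. It follows the conventions of the MPS standard as described in the lp_solve summary cited above. The function name and signature are hypothetical, and this is not the implementation of row_att().

```python
import math
from typing import Optional, Tuple

INF = math.inf


def row_bounds(
    row_type: str, rhs: float = 0.0, rng: Optional[float] = None
) -> Tuple[float, float]:
    """Return (lo_bnd, up_bnd) of a row from its type, RHS and optional RANGES value."""
    if row_type == "N":  # free (e.g. objective) row: no bounds
        return (-INF, INF)
    if row_type == "E":
        lo, up = rhs, rhs
    elif row_type == "L":
        lo, up = -INF, rhs
    elif row_type == "G":
        lo, up = rhs, INF
    else:
        raise ValueError(f"unknown row type {row_type!r}")
    if rng is not None:
        if row_type == "L":
            lo = rhs - abs(rng)
        elif row_type == "G":
            up = rhs + abs(rng)
        elif rng >= 0:  # E row: the sign of the range selects the side
            up = rhs + abs(rng)
        else:
            lo = rhs - abs(rng)
    return (lo, up)


print(row_bounds("L", 10.0))        # default bounds for an L row
print(row_bounds("L", 10.0, 4.0))   # RANGES adds a finite lower bound
print(row_bounds("E", 5.0, -2.0))   # negative range on an E row
```

The default RHS of 0.0 reflects that both the RHS and RANGES sections are optional, so a row defined only in ROWS keeps the defaults for its type.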