lp_diag: basic diagnostics of linear program (LP) problems

Description

LPdiag provides basic information about the LP problems defined by the corresponding MPS-format files. The diagnostics focus on the implied numerical properties of the underlying optimization problem.

In this context, the term outlier denotes model entities whose values lie in either the lower or the upper tail of the corresponding value distribution. The tails are defined by orders of magnitude, computed as \(int(log10(abs(val)))\), where val stands for the value of the corresponding coefficient. The default values of the tails are equal to \((-6, 6)\), respectively; they can be redefined, if desired.
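The definition above can be sketched in a few lines of Python. This is only an illustration, not the LPdiag implementation: the helper names `magnitude` and `tail` are hypothetical, and treating a coefficient as an outlier when its magnitude is less than or equal to the lower tail (or greater than or equal to the upper tail) is an assumption.

```python
import math


def magnitude(val: float) -> int:
    """Order of magnitude as defined above: int(log10(abs(val)))."""
    if val == 0:
        raise ValueError("the magnitude of a zero coefficient is undefined")
    # int() truncates toward zero, so log10 values of -7.6 and 6.4
    # map to the magnitudes -7 and 6, respectively.
    return int(math.log10(abs(val)))


def tail(val: float, lo_tail: int = -6, up_tail: int = 6) -> str:
    """Classify a coefficient as 'lower', 'upper' or 'none' (hypothetical helper).

    Assumption: the comparisons with the tail magnitudes are inclusive.
    """
    mag = magnitude(val)
    if mag <= lo_tail:
        return "lower"
    if mag >= up_tail:
        return "upper"
    return "none"
```

Because `int()` truncates toward zero, all values with absolute value between 0.1 and 10 share the magnitude 0 in this sketch.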

The rule of thumb says: the maximum and minimum orders of magnitude of the LP matrix coefficients passed to optimization should differ by at most four. LPdiag helps to achieve this goal by providing information on outliers. Such information can be used, e.g., for:

  • reconsideration of measurement units of the corresponding variables and relations,

  • consideration of replacing small (in relation to other coefficients in the same row or column) elements by zero,

  • splitting the corresponding rows and/or columns,

  • verification of the coefficients’ values.
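The rule of thumb and the effect of changing a measurement unit can be illustrated with a minimal sketch. The function `magnitude_spread` and the sample coefficients are hypothetical and serve only as an example; they are not part of LPdiag.

```python
import math


def magnitude_spread(coeffs):
    """Return (min_mag, max_mag, spread) over the non-zero coefficients."""
    mags = [int(math.log10(abs(c))) for c in coeffs if c != 0]
    if not mags:
        raise ValueError("no non-zero coefficients")
    return min(mags), max(mags), max(mags) - min(mags)


# Hypothetical coefficients of one row.
row = [4.0e5, 1.2, 3.0e-2]
lo, hi, spread = magnitude_spread(row)
well_scaled = spread <= 4  # the rule of thumb quoted above

# Substituting x = 1e-3 * x' for the first variable (a change of its
# measurement unit) multiplies its coefficient by 1e-3.
rescaled = [row[0] * 1e-3, row[1], row[2]]
print(magnitude_spread(row), well_scaled, magnitude_spread(rescaled))
```

In this example the spread shrinks from six to three orders of magnitude after the substitution.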

Features

The current LPdiag version provides the following information:

  • characteristics of the problem (including numbers of rows, columns, non-zero coefficients and distributions of their values),

  • distributions of diverse values characterizing the LP matrix,

  • location (row and column) of each outlier,

  • ranges of values of other coefficients in each such row or column, as well as the corresponding bounds (LHS, RHS for rows, lower and upper bounds for columns).

The functionality of LPdiag will be gradually enhanced to meet the actual needs of message_ix modelers.

Usage

The tool analyzes the provided MPS-format files. We provide several small MPS files in message_ix/tests/data/lp_diag/ for testing local installations and for becoming familiar with LPdiag. The files are:

  • aez.mps: agro-ecological zones, medium size.

  • diet.mps: classical small LP.

  • jg_korh.mps: tiny testing problem.

  • lotfi.mps: classical medium size.

  • error_*.mps: various MPS-specs testing error-handling logic in the code.

Hints on generating MPS files are provided below. Feel free to store arbitrarily large MPS files in message_ix/tools/lp_diag/data/mps/, but note that these should not be committed to GitHub.

We suggest the following steps for becoming familiar with LPdiag and then using it to analyze actual MPS files:

  • become familiar with LPdiag,

  • prepare the MPS file,

  • run the actual analysis.

We outline each of these steps below.

Becoming familiar with LPdiag

Note that LPdiag should be run at the terminal prompt.

  • Navigate to the folder message_ix/tools/lp_diag.

  • For initial testing, run the following command, which analyzes the default (pre-specified) MPS file provided in the test_mps folder. The other provided MPS examples can be analyzed by using the --mps option explained below:

    message-ix lp-diag
    
  • To display the available LPdiag options run:

    $ message-ix lp-diag --help
    Usage: message-ix lp-diag [OPTIONS]
    
      Diagnostics of basic properties of LP problems stored in the MPS format.
    
      Examples:
        message-ix lp-diag
        message-ix lp-diag --help
        message-ix lp-diag --mps aez.mps --outp foo.txt
    
    Options:
      --wdir PATH            Working directory.
      --mps PATH             MPS file name or path.
      -L, --lo-tail INTEGER  Magnitude order of the lower tail (default: -7).
      -U, --up-tail INTEGER  Magnitude order of the upper tail (default: 5).
      --outp PATH            Path for file output.
      --help                 Show this message and exit.
    

Further details about the optional parameters:

  • --wdir: specification of the desired work-directory (by default, the work-directory is the directory in which LPdiag is located).

  • --mps: name of the MPS file to be analysed; if the file is not located in the work-directory, then the name should include the path to the file (see the example above).

  • --outp: name of the file to which the output shall be redirected. By default, the output is written to stdout, i.e., to the terminal window. The output can be redirected to a file either by using the --outp file_name option, as illustrated by the last example in the --help output shown above, or by including the redirection in the command itself, e.g.:

    message-ix lp-diag --help > foo.txt
    
  • --lo-tail, --up-tail: These are passed to LPdiag.print_statistics().

    To obtain the numbers of coefficients at every magnitude in the MPS file, specify equal or overlapping values:

    message-ix lp-diag -L 0 -U 0 --mps file.mps
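For orientation, the kind of per-magnitude count mentioned here can be reproduced for a handful of coefficients with a short sketch. It does not call LPdiag; the function `magnitude_counts` and the sample values are hypothetical.

```python
import math
from collections import Counter


def magnitude_counts(coeffs):
    """Count the non-zero coefficients per order of magnitude, sorted by magnitude."""
    counts = Counter(int(math.log10(abs(c))) for c in coeffs if c != 0)
    # Magnitudes that do not occur are simply absent from the result.
    return dict(sorted(counts.items()))


# Hypothetical coefficient values; the zero entry is skipped.
coeffs = [1.0, -2.5, 40.0, 3.0e-4, 7.0e5, 0.0, 5.5e5]
print(magnitude_counts(coeffs))
```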
    

Generation of the MPS file in the message_ix environment

MPS is the oldest format for specifying LP problems and is still widely used. Most modeling environments provide various ways of generating MPS files.

In the message_ix environment, one can generate the MPS file, e.g., when solving a message_ix.Scenario, by passing to message_ix.Scenario.solve() the writemps option together with the desired name of the MPS file. The MPS file is then generated and stored in the message_ix/model/ directory. Details are available in the GAMS documentation.

Example of specification of the corresponding option:

scenario.solve(solve_options={"writemps": "<file_name>.mps"})

Actual analysis

For actual analysis one needs to specify the corresponding MPS file in a command run (still in the directory message_ix/tools/lp_diag):

message-ix lp-diag --mps loc/name

…where loc stands for the path to the directory in which the MPS file is located, and name stands for the corresponding file name. Other option(s) can be included in the command, as explained above.

If the output redirection is desired (e.g., for results to be shared or composed of many lines), then run:

message-ix lp-diag --mps loc/name --outp outfile.txt

Extensions in the file names are optional. An alternative way of output redirection is explained above.

Summary of the provided analysis results

The results are composed of the following elements:

  • Info on the work-directory.

  • Info during reading the MPS file:

    • Should a syntax error occur while reading the file, an exception is raised with the corresponding details.

    • Basic info during processing of each MPS section.

  • Basic attributes of the read MPS.

  • Distribution of values of the objective (goal function) coefficients.

  • Distribution of \(abs(val)\) of the matrix elements.

  • Distribution of values of \(int(log10(abs(values)))\).

  • Distribution of values of \(int(log10(abs(values)))\) sorted by magnitudes of values (magnitudes of zero-occurrences skipped).

  • For each (lower and upper) tail of the matrix coefficient values of the corresponding sub-matrix:

    • Distributions of diverse values (\(value, abs(val), log10(abs(val))\)) of the matrix elements.

    • For each order of magnitude: the number of elements.

    • Row-wise location of each outlier with:

      1. info on other coefficients in the same row, and

      2. order of magnitude of the row’s LHS and RHS.

    • Column-wise location of each outlier with:

      1. info on other coefficients in the same column, and

      2. order of magnitude of the column’s lower and upper bounds.

  • The processing start- and end-times.

API reference

Analyse MPS-format files.

class message_ix.tools.lp_diag.LPdiag

Process the MPS-format input file and provide its basic diagnostics.

The diagnostics currently include:

  • handling formal errors of the MPS file

  • basic statistics of the matrix coefficients.

add_bnd(words: list[str], n_line: int)

Process current line of the BOUNDS section.

The section defines lower and/or upper bounds of the columns. One line defines one bound.

Parameters:
  • words (list[str]) – Words of the current line.

  • n_line (int) – Sequence number of the current MPS line.

add_coeff(words: list[str], n_line: int)

Process current line of the COLUMNS section.

The section defines both column names and values of the matrix coefficients. One line can have either one or two matrix elements.

Parameters:
  • words (list[str]) – Words of the current line.

  • n_line (int) – Sequence number of the current MPS line.
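How a COLUMNS line with one or two matrix elements can be split is sketched below. This is not the LPdiag code: the function `parse_columns_line` is hypothetical, it assumes that the line has already been split into words, and it does not handle MARKER lines.

```python
def parse_columns_line(words, n_line):
    """Split a COLUMNS line into (col_name, [(row_name, value), ...]).

    A line holds the column name followed by one or two (row name, value)
    pairs, i.e., it consists of either 3 or 5 words.
    """
    if len(words) not in (3, 5):
        raise ValueError(f"line {n_line}: expected 3 or 5 words, got {len(words)}")
    col_name = words[0]
    elements = []
    for i in range(1, len(words), 2):
        row_name, raw = words[i], words[i + 1]
        try:
            elements.append((row_name, float(raw)))
        except ValueError:
            raise ValueError(f"line {n_line}: {raw!r} is not a number") from None
    return col_name, elements
```

For example, `parse_columns_line("x1 cost 2.5 lim1 1.0".split(), 7)` yields the column name `x1` with two elements, one in the row `cost` and one in the row `lim1`.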

add_range(words: str, n_line: int)

Process current line of the RANGES section.

The section defines ranges of the row values. One line can have either one or two values.

Parameters:
  • words (str) – Words of the current line.

  • n_line (int) – Sequence number of the current MPS line.

add_rhs(words: list[str], n_line: int)

Process current line of the RHS section.

The section defines right-hand-side values of the rows. One line can have either one or two values.

Parameters:
  • words (list[str]) – Words of the current line.

  • n_line (int) – Sequence number of the current MPS line.

add_row(words: list[str], n_line: int)

Process current line of the ROWS section.

While processing the ROWS section, the row attributes are initialized to the default values for the corresponding row type. The attributes are then updated with the values optionally defined in the (also optional) RHS and RANGES sections. The interpretation of the MPS format (in particular of values in the RANGES section) follows the original MPS standard, see e.g., “Advanced Linear Programming,” by Bruce A. Murtagh, or the standard summary at https://lpsolve.sourceforge.net/5.5/mps-format.htm .

Parameters:
  • words (list[str]) – Words of the current line.

  • n_line (int) – Sequence number of the current MPS line.

get_entity_info(mat_row: Series, by_row: bool = True) -> tuple[int, str]

Return info on the entity (row or col) defining the given matrix coefficient.

Each row of the DataFrame contains the definition (composed of row_seq, col_seq, value, and log(value)) of one matrix coefficient. The function returns the seq_id and name of either the row or the col of the currently considered coefficient.

Parameters:
  • mat_row (pandas.Series) – Record of the df with the data of currently processed element.

  • by_row (bool) – True/False for returning the seq_id and name of the corresponding row/col.

get_entity_range(seq_id: int, by_row: bool = True) -> str

Return formatted ranges of feasible values of either a row or a column.

The returned values of ranges are either ‘none’ (for plus/minus infinity) or int(log10(abs(val))) for other values. Small values, defined as abs(value) < 1e-10, are represented by 0.

Parameters:
  • seq_id (int) – Sequence number of either row or col.

  • by_row (bool) – True/False for returning the ranges of the corresponding row/col.
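The representation of range values described above can be sketched as follows. The helpers `format_bound` and `format_range`, as well as the bracketed output layout, are assumptions made for illustration; only the three rules (infinite, small, other values) come from the description.

```python
import math


def format_bound(val: float) -> str:
    """Represent one bound as described above (hypothetical helper)."""
    if math.isinf(val):
        return "none"  # plus/minus infinity
    if abs(val) < 1e-10:
        return "0"  # small values are represented by 0
    return str(int(math.log10(abs(val))))


def format_range(lo: float, up: float) -> str:
    """Format a (lower, upper) pair; the bracketed layout is an assumption."""
    return f"[{format_bound(lo)}, {format_bound(up)}]"
```

Note that the sign of a bound is not visible in this representation: the bounds -2500 and 2500 both map to 3.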

locate_outliers(small: bool = True, thresh: int = -7, max_rec: int = 500)

Locations of outliers, i.e., elements having small/large coefficient values.

Outliers are identified by the values of the matrix coefficients. The ranges of values reported for the corresponding row/col indicate the potential for simple scaling.

Parameters:
  • small (bool) – True/False for applying the threshold to either small or large coefficients.

  • thresh (int) – Magnitude of the threshold, expressed as int(log10(abs(coeff))); e.g., -7 denotes values < 10^(-6).

  • max_rec (int) – Maximum number of processed coefficients.
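The idea of locating small outliers and reporting the magnitudes of the other coefficients in the same row can be sketched without pandas. This is a simplified illustration under stated assumptions, not the method itself: the function `locate_small_outliers` is hypothetical, the matrix is a plain dictionary keyed by (row, col), and a coefficient counts as a small outlier when its magnitude is less than or equal to thresh.

```python
import math


def locate_small_outliers(matrix, thresh=-7, max_rec=500):
    """Return a list of (row, col, magnitude, row_range) for small outliers.

    matrix: dict mapping (row_name, col_name) to a coefficient value.
    row_range: (min, max) magnitude of the other non-zero coefficients
    in the same row, or None if there are none.
    """
    def mag(val):
        return int(math.log10(abs(val)))

    report = []
    for (row, col), val in matrix.items():
        if val == 0 or mag(val) > thresh:
            continue  # not a small outlier
        others = [
            mag(v)
            for (r, c), v in matrix.items()
            if r == row and c != col and v != 0
        ]
        row_range = (min(others), max(others)) if others else None
        report.append((row, col, mag(val), row_range))
        if len(report) >= max_rec:
            break  # limit the number of reported coefficients
    return report


# Hypothetical matrix with one small coefficient in row r1.
matrix = {
    ("r1", "x1"): 2.0e-8,
    ("r1", "x2"): 3.0,
    ("r1", "x3"): 450.0,
    ("r2", "x1"): 1.0,
}
print(locate_small_outliers(matrix))
```

A large gap between the magnitude of the outlier and the reported row range suggests that rescaling alone may not be enough and that the coefficient value deserves verification.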

plot_hist()

Plot histograms.

Note

Not implemented.

print_statistics(lo_tail: int = -7, up_tail: int = 6)

Basic statistics of the matrix coefficients.

Focus on distributions of magnitudes of non-zero coefficients represented by values of int(log10(abs(coeff))). Additionally, the tails (lower and upper) of the distributions are reported.

Parameters:
  • lo_tail (int) – Magnitude order of the low-tail (-7 denotes values < 10^(-6)).

  • up_tail (int) – Magnitude order of the upper-tail (6 denotes values >= 10^6).

read_mps(fname)

Process the MPS file.

row_att(row_seq: int, row_name: str, row_type: str, sec_name: str, val: float = 0.0)

Process values defined in the ROWS, RHS and RANGES sections.

The corresponding row attributes are stored or updated.

While processing the ROWS section, the row attributes are initialized to the default values for the corresponding row type. The attributes are then updated with the values optionally defined in the (also optional) RHS and RANGES sections. The interpretation of the MPS format (in particular of values in the RANGES section) follows the original MPS standard, see e.g., “Advanced Linear Programming,” by Bruce A. Murtagh, or the standard summary at https://lpsolve.sourceforge.net/5.5/mps-format.htm .

Parameters:
  • row_seq (int) – Position of row in dictionaries and the matrix df.

  • row_name (str) – Row name (defined in the ROWS section).

  • row_type (str) – Row type (defined in the ROWS section).

  • sec_name (str) – Identifies the MPS section: either ‘rows’ (for initialization) or ‘rhs’ or ‘ranges’ (for updates).

  • val (float) – Value of the row attribute defining either lo_bnd or up_bnd of the row (the type checked while processing the MPS section).
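The initialization and update logic described above can be sketched as follows. The functions are hypothetical and are not the LPdiag implementation; they follow the conventional interpretation of row types and RANGES values in the MPS format, assume that the RHS value of a row is processed before its RANGES value, and ignore RHS values given for N rows.

```python
import math


def initial_row_bounds(row_type):
    """Default (lo_bnd, up_bnd) for a row type from the ROWS section."""
    defaults = {
        "N": (-math.inf, math.inf),  # free row, e.g., the objective
        "E": (0.0, 0.0),             # equality
        "L": (-math.inf, 0.0),       # less than or equal
        "G": (0.0, math.inf),        # greater than or equal
    }
    return defaults[row_type]


def apply_rhs(row_type, bounds, val):
    """Update the bounds with a value from the RHS section."""
    lo, up = bounds
    if row_type == "E":
        return (val, val)
    if row_type == "L":
        return (lo, val)
    if row_type == "G":
        return (val, up)
    return bounds  # N rows: ignored in this sketch


def apply_range(row_type, bounds, val):
    """Update the bounds with a value R from the RANGES section."""
    lo, up = bounds
    if row_type == "L":
        return (up - abs(val), up)   # [rhs - |R|, rhs]
    if row_type == "G":
        return (lo, lo + abs(val))   # [rhs, rhs + |R|]
    if row_type == "E":
        # The sign of R selects the side on which the range is opened.
        return (lo, lo + abs(val)) if val >= 0 else (lo - abs(val), up)
    return bounds  # N rows: no range
```

For example, an L row with the RHS value 10 and the RANGES value 4 ends up with the bounds (6, 10).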