We are interested in solving linear, or matrix, systems of arbitrary shape and size,

(1) A*x= b

where the solution obtained from a naïve solution method is badly behaved.

This bad behavior can happen for various reasons, such as duplicated data, but the interesting case is when the model matrix, A, has a poor condition number (I.e., some of the singular values of A drop to small values), and the right-hand side has substantial observation error. This situation comes up commonly in “inverse problems”. Inverse problems are those in which we are trying to “back up” a natural process to an earlier time, or across some distance in space, or both. For example, in solving the “inverse heat problem” we are trying to determine what the heat map of some device was at an earlier time.

For the following example, the matrix, A, will contain a
simple heat diffusion “model” (details are not important) of size 13 rows by 17
columns, and x_{t} will contain the “true”
original heat profile, which will be a simple step function. Then

(2)
b_{t} = A*x_{t} will
contain the ideal, or “true”, observations, and

(3)
b = b_{t} + e
will be the more realistic observations with random error added.

Then when (1) is solved for x, we hope it be usefully close
to the original, x_{t} .

Here is a graph of the ideal, or true x_{t}:

And here is a graph of b, which contains realistic error.

Here is an example solution to (1), using the Pseudoinverse
of A: x = A^{+}b.

Since that is a horrible, useless result, we will apply our methods to the solution process. We will try to explain our method now.

The natural way of solving any hard linear system is to start with the widely available Singular Value Decomposition:

(4)
A = U*S*V^{T}

Where U*U^{T} = I, U^{T}*U=I, V*V^{T}
=I, V^{T}*V=I, and S is a diagonal matrix of decreasing “singular
values”. We will ignore the minor aggravation of whether U and V and the full,
square matrices of theory, or a reduced shape which takes advantage of the
chance to save space by reducing the sizes of U and V when there are singular
values which are zero. Those matters are only of concern to the code developer.

We solve the system (1) as follows:

(5) A * x = b

(6)
U * S * V^{T
}* x = b

(7)
S * V^{T }* x = U^{T }* b

(8)
V^{T} *^{ }x = S^{+} * U^{T }*
b = Picard Condition Vector

(9)
x = V * S^{+} * U^{T }* b

Here S^{+} is the pseudo-inverse of S, meaning that
0 or negligible values of S[i] are treated instead as
being infinity (∞).

This “pseudoinverse” approach only helps when the difficulty
is the presence of one or more “simple” singularities due to such problems as
duplicated rows. Sometimes that is enough. In statistical contexts the matrix V
* S^{+} * U^{T}_{ }is referred to as the “pseudo-inverse
of A”.

The next step in trying to solve (1) is to set MORE elements
of S to zero (or ∞, whichever way you want to think of it).
For example, if the average accuracy of the elements of A is, say, 4
significant digits, then one might set all the singular values at the end of S
to zero that add together to less than 0.0001 times the average element of A,
or 0.0001 times S[1]. But this doesn’t really help in general because the
problem is usually with the error in b, not the error in A.

For our purposes zero singular values are not a difficulty… we always treat them as the pseudoinverse
does. The more important case is SMALL singular values.

Papers have been written for over thirty years on how to
automatically resolve the problem of excessively small singular values, or
“ill-conditioning” in linear equation systems. These problems are characterized
as having wildly unacceptable solutions, especially when the solution is
expected to be non-negative. And, even irrelevantly
small changes to the observed data, b, can cause the solution to be wildly
different yet, and still unacceptable.

Our interest in dealing with this ill-conditioning is in the
right hand side of (8), which we have labelled “PCV”
for Picard Condition Vector. Picard’s condition states that the elements of the
PCV must converge toward zero for the
system to be solvable stably. When the system (1) is ill-conditioned, that fact
will show here by the PCV growing inappropriately in its latter elements. This
happens because the elements of U^{T }*
b drop to zero slower that the singular values, S[i],
drop to zero. In fact, what happens is that the S[i]
march steadily toward zero, as that is the nature of the singular values in these case, whereas the latter elements of U^{T }* b
remain at a significantly high level due to noise in A and/or b. Here is an
example PCV for the small inverse heat problem we mentioned.

Obviously there is a real problem here! Next we zoom in to see the first few values:

All appears to behave well with the PCV for the first half
dozen elements. Then something wrong happens. Below is a spread sheet of the
singular values and the elements of U^{T }* b. The PCV is computed by
dividing the elements of U^{T }* b
in the second column by the singular values in the first column. That
quotient is in the third column. You can see that the singular values are
marching nicely to zero. But the U^{T }* b values seem to plateau at
random plus and minus values a little over 0.0001 in magnitude. Since the
latter values in U^{T }* b should – by Picard – have been two or three
orders of magnitude smaller than they are,
we can attribute maybe 99% of their magnitude to noise in A and b. In
fact, since U (and U^{T}) have
all ones for their singular values, the latter element of U^{T }* b are irrelevantly transformed noise samples of b. So we can take the “latter values” of U^{T }* b to
be of the same magnitude as the original error in b.

NOW… the crux of our situation: if we knew an algorithm to
automatically tell WHERE to start in the second column
we could automatically estimate the root-mean-square, of the error in b – it is
the same as the RMS of these latter values in U^{T }* b. That would be
very helpful because it would let us then determine how much to regularize the
solution process (more on this later). You will notice that the choice of where
to start counting the second column estimates as error samples is not
particularly critical on the high side: we consider rows 6 to 13 to be error
samples; or with little change, consider rows 7 to 13 to be error samples; and
in this problem we could also use 8 to 13, and even further. But often we will
not have to luxury of so many rows of this table to work with. Many cases of
small systems will have only a couple of error samples: so
they are precious and must not be wasted. On the other hand, estimating the
usable rows as 4 or 5 instead of 6 or 7 could be catastrophic to the solution
as that choice would unjustifiably increase the RMS of the “error samples”.
This would cause us to over-regularize the problem.

So you may be asking, if the “usable rank” of this problem is clearly around 5 to 7, why not just try setting all singular values past 5, 6, and 7 to zero and select one of those answers? For two reasons: we need an AUTOMATIC process. How would we choose between those three automatically? That is a difficult matter. And second, NONE of those three may be very good. We can do better than that, and automatically. (However, for the record, in this particular example zeroing singular value past rank 5 produces a useful answer almost as good as autosol’s result. This is equivalent to using a Generalized Inverse with a Singular Value cutoff of 0.004. But using a cutoff of 0.005 would product a bad solution.)

Before talking about our algorithm for determining “usable rank”, I want to emphasize that the above example is unusually clear. The PCV can vary drastically. I have seen it be roughly level for a few values, then steadily rise. It can oscillate between near zero and very positive and negative large values. It is very unpredictable. So our methods must be aware of the complexity and pitfalls.

We have developed for autosol two
heuristic algorithms for choosing that point in the PCV at which noise starts
dominating the elements of U^{T}*b. We have tuned the parameters with
many problems for many years. The first algorithm, called splita
in the code, uses a pair of weighted moving averages. One lazily follows
successive values of the PCV which are **larger** than its current value.
The second lazily follows successive values of the PCV which are **smaller**
than its current value, but also VERY lazily follows larger values of the PCV.
When the first average becomes substantially larger than the second average
then autosol declares that point to be the end of any
possible useful data.

The second algorithm, called splitb
in the code, re-analyses the PCV data, up to the upper rank limit determined by
splita, using a fixed width, unweighted average of
the SQUARES of the elements of the PCV. This allows a closer look at when the
values get too large. The fixed width that is chosen depends on problem size.
We look carefully at these sums and stop when one successive moving sum is
substantially larger than the previous sum. This result from splitb becomes what we call the “usable rank”, and values
beyond that in U^{T}*b are taken to be samples of the error in the
original RHS, b.

So how do we use these noise values to find a solution? Simply: the RMS (root-mean-square) of them becomes the parameter SIGMA that we use to proceed with the solution. This process of ”Tikhonov regularization” can be presented several ways, but this straightforward method seems adequate:

To the original A*x=b, we add equations asking the x values to be zero. That is, we add

(10)x_{i} = 0.0 , for all indices i

But we weight these extra equations by a small value which is usually called lambda, or λ.

(11)λ x_{i} = 0.0 , for all indices i

We write this enhanced system in matrix form like this:

(12) A * x = b

(13) λ * I * x = 0 ( I is an appropriately sized identity matrix)

If λ is zero, there is no change to x. But the larger lambda
is, the more we will force the computed vector, x, from its nominal
pseudoinverse result toward a solution of all zeros. We choose which x we want with the
“discrepancy method”: that is, we search for a value of λ for which the
resulting solution, x(λ), satisfies this (where x(0) is the pseudoinverse
solution, for which λ is not present):

(14)
RMS( x(λ) -x(0) ) = sigma

That is, we, or rather, autosol asks that
the new, auto-regularized solution deviate from the unregularized solution by
exactly SIGMA on average (in RMS sense).

And that does it. The “Gordian knot” has been untied. This is the
resulting auto-regularized solution for the inverse heat problem above:

Further, if the user so asks, we append a non-negativity algorithm
to the above solution process. This is similar to the
classic book, “Solving Least Squares Problems” by Lawson & Hanson. Basically, we pick a negative result, and
zero its column. Then we re-solve. That value will now be zero, and the
solution will have adapted to that change. Then we pick another to zero out.
The question is, what ORDER should these be picked? The original algorithm has
some difficulties and has been modified many ways. Our choice is to force the most
negative x_{i }to become zero. We iterate on this until there are no
non-roundoff negative solution values. This is the result with the nonnegative
option:

Note that the negative values didn’t just disappear: the solution
adjusted better represent the flat region in the first 10 samples. This may not
look like a great solution, but in most applications involving these
ill-conditioned systems, this is a fabulous automatic solution.

Finally, some of our packages include a further routine which
allows you more control over the shape of the solution. In the Python package
we include Arlshape() which allows the user to state
three constraints, in any combination:

1.
Non-negativity

Or not

2.
Non-decreasing (that is, increasing or flat
between every successive pair of solution values)

Or Non-increasing (that is, decreasing or flat
between every successive pair of solution values)

Or neither

3.
Concave up (or straight)

Or Concave down (or straight)

Or neither

When we ask for Non-negative and Non-increasing,
we get the following wonderful solution, in red, overlaying the ideal solution
in blue.

Return to Our Main Page.