A Comprehensive Introduction to Our Methods and Software for Solving Ill-Conditioned Systems of Linear Equations
Original material by Rondall Jones, 2020 (see rejtrix.net)
Presentation by Carl Kleffner, 2021
(Using Python for examples when specifics are needed.)
1. A typical problem without difficulties
1*x + 1*y = 2
1*x - 1*y = 0
The answer is x=1, y=1.
2. Missing Variable.
1*x + 0*y = 1
2*x + 0*y = 2
Our programs return x=1, y=0. This is the
minimum norm solution.
3. Effectively missing equation.
1*x + 1*y = 2
0*x + 0*y = 0
Our programs return x=1, y=1. This is the
minimum norm solution.
4. Dependent but consistent equations.
1*x + 1*y = 2
2*x + 2*y = 4
Our programs return x=1, y=1. This is the
minimum norm solution.
5. Dependent and inconsistent
equations.
1*x + 1*y = 2
2*x + 2*y = 6
Our programs recognize the problem as ill-conditioned,
apply an automatic regularization method and give x=1.2625,
y=1.2625. This is reasonable behavior.
6. Independent, consistent, but
unreasonable equations.
1*x + 1.00*y = 2
1*x + 1.01*y = 3
A simple solution returns x=-98, y=100, which is probably NOT what
the user expected.
Our programs recognize the problem as ill-conditioned, apply an
automatic regularization method and return x=1.1165, y=1.1334. This
is the sort of behavior which our programs have been specifically designed to
provide.
7. An underdetermined system.
1x + 2y = 2
Our programs return the minimum-norm solution, x=0.4,
y=0.8. This is actually an easy problem.
8. An overdetermined system.
1*x + 2*y = 15.1
2*x + 2*y = 15.9
-1*x + 1*y = 6.5
Our programs return x=0.7276, y=7.2069. When
you substitute these values into the equations, the resulting right side values
are: 15.141, 15.869, and 6.479, which is excellent!
These are all illustrative, or “toy”,
problems. Our programs are usually used to solve much more involved
problems!
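To make Example 6 concrete, here is a minimal sketch in NumPy (the arls call in the comment follows the usage described later in this document; the exact regularized values depend on the solver's internal tuning):

import numpy as np

# Example 6: independent, consistent, but unreasonable equations.
A = np.array([[1.0, 1.00],
              [1.0, 1.01]])
b = np.array([2.0, 3.0])

# A standard solver happily returns the "exact" answer: x = -98, y = 100.
print(np.linalg.solve(A, b))

# The automatic regularizing solver described later returns a damped answer:
# from arls import arls
# x = arls(A, b)[0]    # roughly [1.12, 1.13]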
Introduction. The
Discrete Picard Condition was probably popularized most by Per Christian
Hansen. For example, see “The discrete Picard condition for discrete
ill-posed problems,” in BIT, Volume 30, Number 4. Here is the
abstract for that paper:
We investigate
the approximation properties of regularized solutions to discrete ill-posed
least squares problems. A necessary condition for obtaining good
regularized solutions is that the Fourier coefficients of the right-hand
side, when expressed in terms of the generalized SVD associated with the
regularization problem, on the average decay to zero faster than the
generalized singular values. This is the discrete Picard condition. We
illustrate the importance of this condition theoretically as well as
experimentally.
This topic has
also been discussed in sources such as the book “Regularization of Inverse Problems”
by Heinz W. Engl et al., in Theorem 2.8. (Note: the need for the Fourier
coefficients to converge has been understood for many years. We used this
method in the 1980s, before the popularization of the concept, and it was well
known to any advanced mathematics student well before that. But it was not
widely discussed before the popularization by Hansen.)
The impact of
this statement is significant for anyone attempting to regularize an
ill-conditioned matrix system. But it is not obscure. If we are solving
A x = b
by using the SVD then we have
U S V^{T} x = b
or
x = V S^{+} U^{T} b.
(S^{+} is the same as
S^{-1} except zero is used when S_{i} is zero.)
Here the Fourier
coefficients are the elements of U^{T}*b. If the Picard condition is
satisfied then the elements of U^{T}*b decrease faster than
the elements of S, so the elements of the “quotient” vector S^{+}*U^{T}*b
then decrease. If instead the elements of S^{+}*U^{T}*b
do something else – such as begin to drop, then reverse and grow –
then the Picard condition is not satisfied (and vice versa).
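As a minimal sketch of the above (assuming NumPy, and a simple near-zero cutoff for the singular values), the “quotient” vector can be computed directly:

import numpy as np

def picard_condition_vector(A, b):
    # Elements of S+ * U^T * b, computed from the SVD of A.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    utb = U.T @ b                 # the Fourier coefficients
    sp = np.zeros_like(s)
    nz = s > 1e-12 * s[0]         # pseudoinverse rule: 1/s, but 0 where s is ~0
    sp[nz] = 1.0 / s[nz]
    return sp * utb

If these values decay overall, the Picard condition holds; if they bottom out and then grow, it does not.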
Application. The value
of noting this condition is, as the abstract says, that the condition must be
satisfied in order to obtain “good regularized solutions”. Why is this
condition often not satisfied? Usually because the elements of U^{T}*b
do not continue to decrease: random errors in b prevent the numeric
cancellation that should occur to make the elements get smaller. For example, an
ill-conditioned set of equations might produce a U^{T}*b vector
that decreases to a certain level, say about 0.001, then just randomly varies
around that level. On the other hand, the elements of S may decrease straight to
zero, or to roundoff level (maybe 10^{-14}).
So what must be
done if the Picard condition is not satisfied, so that we can get a usable solution?
We must change the matrix, A, or the right side vector, b, or both, in some way.
For example, Tikhonov regularization can be viewed as modifying A (by
increasing the small singular values) or as modifying b. (To view Tikhonov as a
modification of b, just compute the usual regularized solution, x_{r},
then compute the corresponding right hand side, A*x_{r}, and compare
this to the original b.)
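A short sketch of that view, for a given (assumed) value of λ, using the stacked least-squares form of Tikhonov regularization described near the end of this document:

import numpy as np

def tikhonov_solution(A, b, lam):
    # Solve min ||A x - b||^2 + lam^2 ||x||^2 via a stacked least-squares system.
    n = A.shape[1]
    A_aug = np.vstack([A, lam * np.eye(n)])
    b_aug = np.concatenate([b, np.zeros(n)])
    return np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]

# x_r = tikhonov_solution(A, b, 0.1)
# b_r = A @ x_r     # the "modified b" to compare against the original b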
In other words,
we must change either the model, or the observed data, or both.
Considerations. Note that
the Picard condition does not say that the system cannot be regularized to produce
a useful solution; rather, it says that an ill-conditioned system must be
regularized (some way or another) if a useful solution is to be obtained. A
solution based on a vector S^{+}*U^{T}*b which does not
decrease is generally not useful.
We use this
Picard condition in our AUTOREG automatic regularization scheme. We have developed
heuristics which we use to decide exactly when to declare a system of equations
clearly ill-conditioned, and to decide a “usable rank”. This decision is
harder than you may expect, because the vector S^{+}*U^{T}*b seldom
decreases smoothly, even when the system is well conditioned. So we have to
look instead at a moving average of the values in S^{+}*U^{T}*b (or
their squares) and look for a minimum, followed by significant growth in the
moving average. The point where the average grows above the minimum by a
certain factor helps locate the point where error is evidently
dominating U^{T}*b. Thus a “usable rank” is determined, an
error estimate for the right side vector can be made, and this facilitates
Tikhonov regularization. See the details and other
documentation on this web site for more information.
Other details. We do not
mean to imply that there are no other issues of interest here. There are. For
example, note that we have no information yet on how close a regularized
solution may be to the “true” solution. Yet this may be very important to know
in some situations. That is a topic for further study.
I have a small garden that gets a lot of attention. I was concerned
about proper fertilization, as it seemed that various commercial fertilizers
used very different formulas for their products. So I studied the subject a bit
and came up with this set of preferences for what a garden fertilizer should
contain of each element, in pounds, per 250 square feet:
3.0  N
2.5  P
0.0  K
2.3  Ca
8.0  S
0.2  Fe
0.5  Mg
Now, my problem was that I didn’t have access to individual chemicals,
such as plain iron (Fe). What I had was various bags of commercial fertilizer
with various compounds in various amounts. Here are photos of a couple of them:
Note the percentages of each element (by weight). They
have conveniently given percentages of each ELEMENT, not just per chemical,
like iron sulfate. What I need is some combination of my on-hand fertilizers
that would give me something close to the proportions I need. Here is a table
of the things I had on hand when I was trying to figure this all out, with the
stated amounts, if any, of Nitrogen (N), Phosphorus (P), Potassium (K), Calcium
(Ca), Sulfur (S), Iron (Fe) and Magnesium (Mg) (see the rightmost column).
     Gypsum  Sulph90%  HypoN  3xP   AceStarter  Ironite  Copperas  Epsom  Mur_Of_K  MiraGro1  MiraGro2  Plantone
N    0       0         0.21   0     0.2         0.01     0         0      0         0.1       0.24      0.05
P    0       0         0      0.45  0.27        0        0         0      0         0.1       0.08      0.03
K    0       0         0      0     0.05        0.01     0         0      0.6       0.1       0.16      0.03
Ca   0.21    0         0      0     0           0.12     0         0      0         0         0         0.03
S    0.16    0.9       0      0     0           0.1      0.11      0.27   0         0.2       0         0.01
Fe   0       0         0      0     0.01        0.045    0.19      0      0         0         0         0
Mg   0       0         0      0     0           0.1      0         0.2    0         0         0         0.01
So here we have a nice linear algebra problem: the grid of numbers
above (without the labels) is the matrix, and the column of preferred amounts in the earlier
table is the desired right hand side. Solving this problem will give us the
amount of each of these commercial products that we should use to achieve
exactly the amount of each element I need… a proper balance of seven elements,
not just two or three.
Note that it turns out this is NOT an ill-conditioned problem for
which we need regularization. But there can still be problems. Here is the
solution I got from autorej (presented as a column vector, as is traditional):
[ 8.368921, 6.140388, 3.679031, 1.521761, 4.197724, 4.043942,
  -0.12608, 0.382606, -2.07644, 3.108546, 3.921425, 1.908453 ]
So we do have a problem. There is no way possible to add a
NEGATIVE amount of the 7th and 9th fertilizers.
How should we solve this situation? We could be more careful and delete just
the most negative fertilizer. But let’s try just deleting those two
fertilizers, leaving 10 columns. Any reasonable solver will give this answer:
[ 9.066108, 6.988927, 10.1148, 2.198512, 6.467905,
  3.007132, 0.937665, 1.53124, 1.47252, 1.17538 ]
So now we have TWO MORE fertilizers that need to be deleted. This
is a common problem in real life. The proportions of the elements that I want
are not easily obtained from the products on my shelf. But autorej has a
nonnegative solver.
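For readers who want to try this themselves, here is a sketch using the arlsnn() solver described later in this document (the matrix is a direct transcription of the table above):

import numpy as np
from arls import arlsnn

# Rows: N, P, K, Ca, S, Fe, Mg.  Columns: the twelve products, in table order.
A = np.array([
    [0.00, 0.0, 0.21, 0.00, 0.20, 0.010, 0.00, 0.00, 0.0, 0.1, 0.24, 0.05],  # N
    [0.00, 0.0, 0.00, 0.45, 0.27, 0.000, 0.00, 0.00, 0.0, 0.1, 0.08, 0.03],  # P
    [0.00, 0.0, 0.00, 0.00, 0.05, 0.010, 0.00, 0.00, 0.6, 0.1, 0.16, 0.03],  # K
    [0.21, 0.0, 0.00, 0.00, 0.00, 0.120, 0.00, 0.00, 0.0, 0.0, 0.00, 0.03],  # Ca
    [0.16, 0.9, 0.00, 0.00, 0.00, 0.100, 0.11, 0.27, 0.0, 0.2, 0.00, 0.01],  # S
    [0.00, 0.0, 0.00, 0.00, 0.01, 0.045, 0.19, 0.00, 0.0, 0.0, 0.00, 0.00],  # Fe
    [0.00, 0.0, 0.00, 0.00, 0.00, 0.100, 0.00, 0.20, 0.0, 0.0, 0.00, 0.01],  # Mg
])
b = np.array([3.0, 2.5, 0.0, 2.3, 8.0, 0.2, 0.5])  # pounds wanted per 250 sq ft

x = arlsnn(A, b)[0]    # nonnegative amounts of each product
print(x)               # compare A @ x to b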
Let’s
let it worry about the algorithmic issues. Here is its solution:
[ 10.95238, 6.063144, 14.28571, 5.555556, 0, 0,
  1.052632, 2.500000, 0, 0, 0, 0 ]
For a few years now I have used this or a similar combination…
depending on what new products I have, or what products I had to drop due to
running out of them. The vegetable garden has done well with it. And the grass has
done fantastically. But this particular formula is not right for everything: fruit
trees should not have this much Nitrogen. So I have to reformulate a bit to cut
the Nitrogen down when making fertilizer for my small orchard (20 modest
trees).
For about a year I interacted with a very interesting engineer
living in India. I will call him C.H. He was working with well logging mostly.
This means that instruments are lowered down an oil well which is being
drilled, for the purpose of detecting properties of the surrounding earth. The
instrument package emits various probing signals, and the surrounding earth
responds in some fashion, depending on the nature of the probing signal:
various wavelengths of radiation, and presumably certain radioactive materials,
and I am not sure what else. Plus, I assume it measures easy-to-measure things
such as temperature.
I am not sure exactly what the emitters and sensors were, because
it didn’t matter to me. I was just trying to help him solve the difficult
equation systems that resulted. One reason they were difficult is that this
arrangement is in part a classic indirect sensing problem. Indirect sensing
problems are often ill-conditioned due to the nature of trying to reverse nature.
In this case signals are emitted, which is basically a “diffusive” process.
The signals spread out, and the observer can only measure certain effects of
that diffusive process. Reversing nature’s stable processes, such as heat flow,
is generally difficult, inaccurate, and unstable. Trying to calculate an
earlier heat pattern in a material from an observed later, diffused heat pattern
is perhaps the most iconic example of such calculations, and inverse heat
problems are perfect example problems for ill-conditioned systems. So they are
used in testing algorithms like the ones autorej uses.
Here is one of his matrix problems (a.k.a. linear algebra systems)
that we worked on. The last column is the observations.

56.22  1070    1.95  4.78   0.015   303.44   28.8    8.4       8.57566
 3.91     4.08 1.22  2.64   0.25      4.51    3.2    2.77      1.85502
 0        0    0     0      0      4000     180    250        19.5
 0        0    0     0.001  0      2000       8     35         4.75051
 0        0.001 0    0.001  0       300      17      8         1.53051
 0        0    0     0.001  0         0       0.1    6         0.065506
Here
is the Picard Condition Vector for this problem (see the Tutorial for more on
the PCV):
[0.00448 0.00669 0.04127 0.03892 0.34695 0.37723]
This is not a bad PCV at all, but it is a little odd, as
the first several values are very small, and the first, which is often the
largest, is in fact the smallest. But I have seen this behavior in other
contexts as well. Here is the solution (as would come from any reasonable equation
solver). I will show it graphically because the graphs convey the situation
well here.
Kind of an interesting graph. The problem is that this is not at
all what C.H. expected. The scale of the vertical axis is also too small.
Discussing this, I realized that it was known that the solution values should
add to 1. Autorej’s solve_constrained() accepts one constraint
equation that is to be obeyed exactly: not as a normal least
squares equation. To ask the sum to be 1.0 we set all the coefficients to 1,
creating a sum, and set the right hand side value to 1, which is the sum we
want:
[ 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 ], with b = 1.0
Here
is the result:
Wow! What a difference! Almost shockingly different… and much more
reasonable, and as expected. So, if you are having difficulty getting a good
solution, be sure to ask yourself whether you have included all the data you have.
I should mention that the algorithms in autorej could be
much more general than provided here, but we have chosen to include only the
most common features. For example, my C++ library allows any number of such
“equality” constraints, and any number of inequality constraints, such as
equations that ask a particular value to be no less than 10.0, or
for the sum to be not over 5, etc.
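Here is a sketch of the whole computation using the arlseq() solver described later in this document as a stand-in for solve_constrained() (the data is transcribed from the table above):

import numpy as np
from arls import arlseq

A = np.array([
    [56.22, 1070.0, 1.95, 4.78,  0.015,  303.44,  28.8,   8.40],
    [ 3.91,    4.08, 1.22, 2.64,  0.25,     4.51,   3.2,   2.77],
    [ 0.00,    0.00, 0.00, 0.000, 0.00,  4000.00, 180.0, 250.00],
    [ 0.00,    0.00, 0.00, 0.001, 0.00,  2000.00,   8.0,  35.00],
    [ 0.00,    0.001,0.00, 0.001, 0.00,   300.00,  17.0,   8.00],
    [ 0.00,    0.00, 0.00, 0.001, 0.00,     0.00,   0.1,   6.00],
])
b = np.array([8.57566, 1.85502, 19.5, 4.75051, 1.53051, 0.065506])

# One equality constraint, obeyed exactly: the eight unknowns sum to 1.
E = np.array([[1.0] * 8])
f = np.array([1.0])

x = arlseq(A, b, E, f)[0]
print(x, x.sum())    # x.sum() will be 1.0 to machine precision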
Our house was over 25 years old when the (exterior) stucco
seriously needed some patching and a new color coat. Unfortunately, the
builder, who was long gone, had used his private, just-right blend of standard
stucco colors, and we could not determine what that mix was. So I decided to try to match the color myself. Looking
back, I wish we had just picked the closest match among the standard colors, as
when you coat a whole house with a color it will likely look different than you
imagined from looking at tiny samples. But I was determined to do it. So I photographed the appropriate page of the booklet
of standard colors in El Rey’s sampler of 75 colors. Then I took a photograph
of a typical section of our stucco wall. I took all these photos at the same
time and place with the same camera and the same lighting. This was done by
taping the sample color page to the wall and taking the photos at the same
time. Here is a recreation of the four sample colors I decided to work with,
with the last being the current stucco color I want to find a new mix for:
LUZ GRAY CORAL ADOBE desired
I know some of these look very
similar. But if you could see them on a whole wall there would be visible
differences.
Now, how do
we turn this into some linear equations to solve? Using PaintShop it is easy to point the cursor at a color
and see the Red-Green-Blue values for that color. Each color channel is usually
represented by a single 8-bit byte, so each channel has a range from 0 to 255. So
(0,0,0) for (R,G,B) would be black, and (255,255,255) would be the brightest
white.
Here are the
RGB values for the four samples, plus the color I need to find a mix for. (By
the way, you can insert comment lines in autorej’s input
file by putting the hash symbol (#) as the first character.)

#Luz  Gray  Coral  Adobe  Desired
123   190   165    116    135
 84   190   136     77     92
 70   190   132     59     77
What we want
are the weights for the first four colors that will reproduce the DESIRED RGB
values in the last column. So there is the
linear algebra problem to solve! We just put the bottom four lines in a
tab-separated file and use it as input to autorej’s P.read_problem() function. Here is the result (as a row)
from P.solve():
[ 0.44108058  0.11496714  0.32839188  -0.41729469 ]
Oops! We forgot to tell autorej that the solution values must be nonnegative!
Try again with P.solve_nonneg(). Here is its
answer:
[ 1.09399306  0.  0.00225098  0. ]
When we look
at the table above we can see what is
happening. The desired elements of column 5 are almost exactly 10%
more (that is, a bit brighter) than the corresponding elements of Luz
in column 1. Solve_nonneg() figured out that the
best way to make that adjustment is to add a little bit of the
substantially lighter Coral in column three to column one. That is, 0.00225
bags of Coral to 1.09 bags of Luz. (Of course, for real-world convenience
the largest number here should be scaled to 1 bag, giving 1 bag of Luz to
0.00206 bags of Coral.)
Note that the
RGB value obtained by our solver is precisely (134.9, 92.2, 76.9), which rounds
to exactly the color we wanted.
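For completeness, here is a sketch of this little problem using the arls package described later in this document (rather than the autorej object interface used above):

import numpy as np
from arls import arls, arlsnn

# Columns: Luz, Gray, Coral, Adobe.  Rows: R, G, B.
A = np.array([[123.0, 190.0, 165.0, 116.0],
              [ 84.0, 190.0, 136.0,  77.0],
              [ 70.0, 190.0, 132.0,  59.0]])
b = np.array([135.0, 92.0, 77.0])   # the desired RGB

print(arls(A, b)[0])      # unconstrained: one weight comes out negative
x = arlsnn(A, b)[0]       # nonnegative weights only
print(x, A @ x)           # A @ x lands within a fraction of a unit of b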
In the interest
of full disclosure, I should tell you that when creating this example I couldn’t find the final solution that I used
for the actual stuccoing nearly 15 years ago, so I recreated the data here to
make an appropriate example, with the same number of “input” colors and roughly
the same hues.
In the actual
case the amount of the dark color I used (Santa Fe Brown, not Luz) in the
computed solution was close to half of the amount of Gray. But not wanting to
cause the workers extra trouble, and possibly have successive wheelbarrow mixes
vary due to their likely inexact measurements, I just told the workers to mix
Santa Fe Brown and Gray equally. What a stupid mistake. The result was much
darker than I actually wanted. It was not
practical to fix, but fortunately the sun bleached it a little over the years so it still looks OK. The math was good, but
I didn’t have the conviction to insist on an
inconvenient mix. I hope I learned that lesson.
For years before I became interested in Python I
had a web site offering somewhat similar software for C++ as autorej is for Python. One day a fellow from Finland
contacted me and asked for the source code. (Sometimes I offered the code, but
most times I offered apps that used the code.) I of course emailed it to him,
but I asked casually what his application was. He said playing MineSweeper! I pointed out that if he got answers that were
between 0 and 1 for a hidden square he should probably consider those to
be probabilities. He assured me he was only interested in sure things: 0 for no
bomb in a square, 1 for a bomb.
So when prepping these examples for Python users (and anyone
else who is interested), I decided to try my hand at solving Minesweeper
with linear equations, as I had never bothered to ask the customer HOW exactly
he did that. I guess I figured it was straightforward. Let’s take
this little segment, which I chose from a much larger problem because it is a
legitimate standalone problem in itself, except for the three bottom squares in
the rightmost column, which we will not be able to determine.
So what are the equations? Let’s refer to the elements of the above grid as we
would elements of a matrix that size, e.g., A_{11}, A_{12}, etc.
And call the NUMBER which is (or should be) in that square #A_{11},
etc. Now first, the unknowns for our problem are the values in the blank,
unexploded squares. For example, we want to know how many bombs (zero or one)
are at A_{38}. After some thought I realized there are two types of equations:
1. Each
square with a bomb, such as square (2,2), gives a simple equation, e.g., #A_{22} = 1.0
2. Each
number which is next (by left, right, up, down, or diagonal) to an unknown
square also gives us an equation, e.g., from A_{23} we get
#A_{22} + #A_{32} + #A_{33} = 2
So we have to go create all those
equations. Here is my matrix file as
read by autorej’s read_problem()
method (except for the last column, which was helpful while building the file
but cannot be present in the input file). Also note the comment lines marked
with “#”.
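The original matrix file is not reproduced here, but generating the two equation types is mechanical. Here is a toy sketch with a small, purely hypothetical board (None marks an unknown square, 'B' a known bomb, an integer a revealed count):

import numpy as np

board = [[1,  'B', 1   ],
         [1,   2,  None],
         [0,   1,  None]]
nrows, ncols = 3, 3

unknowns = [(r, c) for r in range(nrows) for c in range(ncols)
            if board[r][c] is None or board[r][c] == 'B']
index = {sq: j for j, sq in enumerate(unknowns)}

rows, rhs = [], []
# Type 1: each known bomb square gives the equation  #A = 1.
for (r, c) in unknowns:
    if board[r][c] == 'B':
        row = [0.0] * len(unknowns)
        row[index[(r, c)]] = 1.0
        rows.append(row); rhs.append(1.0)
# Type 2: each revealed number constrains the unknown squares adjacent to it.
for r in range(nrows):
    for c in range(ncols):
        if isinstance(board[r][c], int):
            row = [0.0] * len(unknowns)
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (r + dr, c + dc) in index:
                        row[index[(r + dr, c + dc)]] = 1.0
            rows.append(row); rhs.append(float(board[r][c]))

A, b = np.array(rows), np.array(rhs)   # ready for a least-squares solver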
Now… before trying to solve this
problem to find the numbers in the blank squares, or at least SOME of them on
this first try, we should ask ourselves exactly WHAT we expect to see. Do we
have any right to expect to get back ONLY INTEGERS? Actually,
we would like that, but we have not inserted any constraints into
the problem that would even encourage that, much less require it. The
calculation process will begin by computing the Singular Value Decomposition of
our matrix. Here are the singular values for this matrix:
[3.66882986e+00 3.29958015e+00 2.74052834e+00 2.44205520e+00
 2.33557658e+00 1.85017432e+00 1.51369996e+00 1.28268753e+00
 1.15207922e+00 9.33783720e-01 6.28601369e-01 5.72988469e-01
 4.98618121e-01 4.38608280e-01 2.59847585e-16 1.82895960e-16
 1.16177952e-16 3.02411221e-17 1.69841264e-33 0.00000000e+00]
No hint here of any likelihood of integer answers!
Here are the answers that autorej’s normal solver gives:
[4.04451282e-16 1.00000000e+00 1.03922202e-15 1.51275221e-02
 1.00000000e+00 1.00000000e+00 4.05824099e-16 4.41513009e-03
 1.36256212e-15 2.74346447e-01 2.74346447e-01 1.32495853e-15
 2.74346447e-01 8.62826776e-01 8.62826776e-01 0.00000000e+00]
Most of the answers are indeed VERY CLOSE to zero or 1. But a few, such as the
10th and 13th, are clearly neither 0 nor 1. What does that mean? It
means we should try autorej’s nonnegative
solver. Here is the answer that it gives:
[1. 1. 0. 0. 0. 1. 0. 0. 1. 1.
 0. 0. 0. 0. 0. 0. 0. 1.02359314 0.97640686 0.]
Now that’s pretty good!! Even the two that are
slightly off of one are very near to one.
Terrific!
If we
go into the problem and mark the new squares that are now solved with a
red symbol we have this:
All the remaining blank squares are actually
empty. But I wasn’t actually playing the game at this point, so I was not able to click them to show they are
empty. Our problem setup had no information about certain squares, such as the
last three in the last column.
Arls.py is a complete package for AUTOMATICALLY solving difficult
linear algebraic systems (or “matrix problems”). Basically, these methods
completely free the user who has a difficult, usually ill-conditioned, problem
from worrying about:
· choosing
a rank to use
· choosing
a level of singular value to zero when using a Generalized Inverse
· estimating
the error in the matrix coefficients
· estimating
the error in the right-hand side
· choosing
the Tikhonov regularization parameter, lambda
Arls
for Python is available on the Python Package Index, PyPI.
You
should be able to install it in your Python installation with this command:
pip install arls
How to Use arls. We
assume you have some equations to solve, and can use Python. Put
the coefficients in the rows of a matrix, A, and put the right hand side values in a
column, b. It does not matter whether the number of rows equals the number of
unknowns, or is larger or smaller than the number of unknowns, as arls’s routines handle all
cases. Singularity or ill-conditioning is expected and considered
normal behavior by these methods.
· If
the number of rows is less than or equal to the number of unknowns and the data
is consistent and accurate, then the solution will typically be exact except
for roundoff error, as with any good standard linear equation solver.
· If
the number of rows is larger than the number of unknowns, or the data is
significantly in error, then a best fit, or “least-squares”, solution will be
found.
· If
the system of equations has serious problems such as unintended dependency
between rows, or “singularities”, or ill-conditioning, then arls will usually be able to address those issues
automatically and still return a reasonable solution. This is the interesting
case for arls, and the reason for its
existence. Arls was developed to
automatically solve so-called “ill-conditioned” systems which often otherwise
require special handling or are considered impossible to solve. Of course,
when arls’ algorithms produce a solution of such
a difficult problem the residual error (that is, the values in Ax-b, or the
single value norm(Ax-b)) will of necessity be larger than what the residual
would have been if a standard least-squares solver had been able to handle the
problem. So always check the result for reasonableness and appropriateness to
your technical needs.
Case
1. Once you have
created your matrix, A, and the corresponding right-hand side, b, we have the
problem
A x = b
To
solve it, import arls’s solvers and
call arls:
from arls import arls, arlseq, arlsgt, arlsnn, arlsall
x = arls(A,b)[0]
Case 2. If
your system of equations results in some negative values, and that is not
acceptable, then just call our nonnegative solver:
from arls import arls, arlseq, arlsgt, arlsnn, arlsall
x = arlsnn(A,b)[0]
Case 3. If
you have a special situation, such as needing the solution to add to, say,
100.0 percent, or the first value to start at exactly 0., or any such exact requirement
(not just the more usual “least-squares” equations), then put those requirements
as rows in a separate set of equations, Ex = f, and then solve the dual system
by calling our “equality constraint” solver:
from arls import arls, arlseq, arlsgt, arlsnn, arlsall
x = arlseq(A,b,E,f)[0]
For
example, if there are 5 unknowns, to require the solution to add to 100, use
this:
E = [[1, 1, 1, 1, 1]]
f = [100] (or
just use f = 100. Arls will convert the
number to a vector of length 1.)
Note: Depending on what you put in Ex = f, it may be impossible to
satisfy all the equations. For example, you might accidentally say x[0] = 1.,
and also x[0] = 2. In such cases arlseq() will
attempt to retain the equations that affect the result most. (The exact
algorithm isn’t worth explaining here.) You can check the status of the
equality constraints by examining the residual:
R = E*x - f (where
x is the solution returned by arlseq())
Normally all elements of R will be zero. If R[3], for example, is
nonzero, then constraint equation 3 was not satisfied exactly. You should try
to resolve what is going on with these constraints.
Case
4. If you have some special need for certain unknowns
to satisfy given bounds, then put those requirements in a separate set of
equations, Gx >= h. Be sure to express them all in “greater than” form, such
as x[3] > 1.0. If you need to use a “less than” equation, such as x[0] <
2.0, then multiply the equation on both sides by -1 and you will get a “greater
than” constraint instead of a “less than” constraint:
-x[0] >= -2.0
Note
that using this technique you can bound a value from below and above, such as
x[3] >= 2
x[3] <= 4 (a.k.a. -x[3] >= -4)
If we
assume there are, say, 5 unknowns, then these two constraints can be
represented by
G = [ [0, 0, 0, 1, 0], (remember Python indexing starts at 0, not 1)
      [0, 0, 0, -1, 0] ] (you have to negate “less than” equations)
h = [ 2, -4 ]
Then
you can solve this combined system by calling:
from arls import arls, arlseq, arlsgt, arlsnn, arlsall
x = arlsgt(A,b,G,h)[0]
Case 5. Lastly,
you can combine all the above features:
from arls import arls, arlseq, arlsgt, arlsnn, arlsall
x = arlsall(A,b,E,f,G,h)[0]
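Here is a complete toy run combining all of the pieces above (the numbers are illustrative only):

import numpy as np
from arls import arlsall

# Five unknowns; the third row is deliberately dependent to make the point
# that dependent rows are not a problem.
A = np.array([[1., 1., 1., 0., 0.],
              [0., 1., 1., 1., 0.],
              [1., 2., 2., 1., 0.]])    # row 3 = row 1 + row 2
b = np.array([2., 3., 5.])

E = np.array([[1., 1., 1., 1., 1.]])    # require the solution to sum...
f = np.array([100.])                    # ...to exactly 100
G = np.array([[0., 0., 0., 1., 0.]])    # require x[3] >= 2
h = np.array([2.])

x = arlsall(A, b, E, f, G, h)[0]
print(x, x.sum())    # the sum is 100 and x[3] is at least 2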
Extra Information: In
each case above you can ask for extra information to be returned, like this:
x, nr, ur, sigma, lambdah = arls(A,b)
(The regularization parameter is named “lambdah” here because “lambda” is a
reserved word in Python.) In this case the extra parameters have these meanings
when b contains only one column:
nr : (int) is the numerical rank of the
matrix A. This measure of rank ignores “singular values” that are near zero.
ur :
(int) is the computed “usable rank”, which is a function of both A and b, and
tells the solver at what point the errors in A and b combine to ruin the use of
more than ur rows of the orthogonalized
version of the system of equations. In other words, ur is
the breakpoint in the orthogonalized version of the system of equations between
rows in which the error in b is acceptably small, and rows in which the error
in b is uselessly large.
sigma :
(float) is the estimated right-hand-side
root-mean-square error. This means that our analysis reveals the actual error
in b – a unique feature!
lambdah :
(float) is the Tikhonov regularization parameter that goes with the estimated
right hand side error, sigma.
For the matrix, A, I computed an 11
by 11 sample of an inverse heat equation. This is a classic example of what are
often called “inverse problems”, which tend to be ill-conditioned. They often
involve attempting to reverse some natural process such as blurring or, in this
case, heat dissipation. For a “true” solution, xt, I picked a cusp of a sine
function, with a small negative value added, so I can illustrate nonnegative
solvers. That “true” solution is here: chosen "solution".
Then I computed the “true” right hand
side, b = A * xt.
Then, to make an interesting problem, I
added to b random noise of roughly Gaussian(0.0, 0.00005). For some of the
problems, for realism, I further added an “accidental” error of 0.01 to b[0].
The result for the right hand side is
[2.97863221 3.55482559 4.06729936 4.47735646 4.74273922 4.83465317
 4.74282974 4.47743033 4.06726914 3.55472706 2.98858201]
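The exact matrix used is in the accompanying data files, but for readers who want a self-contained stand-in, a discretized Gaussian blur kernel is a standard way to build this kind of test problem (the kernel width and noise seed below are my own assumptions, not the values used for the figures):

import numpy as np

n = 11
t = np.linspace(0.0, 1.0, n)

# A discretized Gaussian kernel: a common stand-in for an inverse heat matrix.
width = 0.15
A = np.exp(-((t[:, None] - t[None, :]) / width) ** 2)
A *= (t[1] - t[0]) / (width * np.sqrt(np.pi))   # rough quadrature scaling

# A cusp of a sine function, shifted slightly negative, as the "true" solution.
xt = np.abs(np.sin(np.pi * t)) ** 0.5 - 0.05

rng = np.random.default_rng(0)
b = A @ xt + rng.normal(0.0, 0.00005, n)   # "true" right side plus noise
b[0] += 0.01                               # the "accidental" glitch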
The solution by lstsq(), which is NOT
an appropriate solver for this problem, is here.
Note that the plot scale is actually
1.0E7. Also note that lstsq() considers this to be an OK solution. It has no
concept that the problem is ill-conditioned. It only diagnoses the result as
wrong when the system of equations is fully singular. This is why one cannot
use a standard solver and accept the solution as OK until it reports a
singularity. Instead, we want to know when the solution has worthless character
like this and REGULARIZE the solution process.
The solution by lsmr(), which often
produces beautiful results, happened to turn out poorly.
The solution by arls(), which is very
nice in this case, is here.
For the record, the nonnegative
solution by arlsnn() is here.
Next we remove the intentional glitch
of 0.01 that we added to b[0], and re-solve with lsmr(). The result
is much better… though not one of the
beautiful results that it sometimes produces. And here is what arls() does
with this glitch removed. Obviously
the result by arls() is not greatly changed by a minor change in the right hand
side error. Finally, we also reduced the error in the right hand side to roughly
Gaussian(0.0, 0.00002).
Lsmr() now produces a nice solution like arls().
Summary: lsmr() can produce some
beautiful results, but only sometimes. Arls() seems to more reliably produce a
useful solution.
We are interested in solving linear,
or matrix, systems of arbitrary shape and size,
(1) A*x = b
where the solution obtained from a naïve method is badly behaved.
This happens for various reasons, such as duplicated data, or other causes of
singularity. But the interesting case is when it happens because the right hand side has more data error than the system can
tolerate and still give a good answer. This comes up commonly in “inverse problems”,
meaning problems in which we are trying to back up a diffusive process in time.
For example, in solving the “inverse heat problem” we are trying to determine
what the heat profile of some linear arrangement was at an earlier time. Here
is a graph of the ideal original heat profile in this problem:
Here is an example “solution” obtained by naively solving the
13 row by 17 column problem (see the data files
accompanying this document):
SVD
The natural way of solving any hard linear system
is to start with the widely available Singular Value Decomposition:
(2) A = U*S*V^{T}
where U*U^{T} = I, U^{T}*U = I, V*V^{T} = I,
V^{T}*V = I, and S is a
diagonal matrix of decreasing “singular values”. We will ignore the minor
aggravation of whether U and V are the full, square matrices of theory, or a
reduced shape which takes advantage of the chance to save space by reducing the
sizes of U and V when there are singular values which are zero. Those matters
are only of concern to the code developer.
We
solve the system (1) as follows:
(3) A * x = b
(4) U * S * V^{T} * x = b
(5) S * V^{T} * x = U^{T} * b
(6) V^{T} * x = S^{+} * U^{T} * b = PCV
(7) x = V * S^{+} * U^{T} * b
Here S^{+} is
the pseudoinverse of S, meaning that zero or negligible values of S[i] are treated
as if they were infinite (∞), so that their reciprocals become zero.
This “pseudoinverse” approach only helps when the difficulty is
the presence of one or more “simple” singularities due to such problems as
duplicated rows. Sometimes that is enough. In statistical contexts the matrix V
* S^{+} * U^{T} is referred to as the
“pseudoinverse” of A.
The next step in trying to solve (1) is to set MORE elements of S
to zero (or ∞, whichever way you want to think of
it). For example, if the average accuracy of the elements of A is, say, 4
significant digits, then one might set to zero all the singular values at the end of S
that together add to less than 0.0001 times the average element of A,
or 0.0001 times S[1]. But this doesn’t really
help in general, because the problem is usually with the error in b, not the
error in A.
For our purposes zero singular values are not a
difficulty… we always treat them as the pseudoinverse does. The more
important case is SMALL singular values.
Papers have been written for over thirty years on how to
automatically resolve the problem of excessively small singular values, or
“ill-conditioning”, in linear equation systems. These problems are characterized
by wildly unacceptable solutions, especially when the solution is
expected to be nonnegative. And even irrelevantly small changes to the
observed data, b, can cause the solution to be wildly different again, and still
unacceptable.
I studied this problem many years ago as part of my Ph.D. under
the well-known Dr. Cleve Moler, the inventor of
MATLAB. Many papers have been published since then, with various approaches and
degrees of success, and various kinds of constraints or conditions. I have
presented later versions of my algorithm in C++ at appropriate conferences, and
I also conferred with a Russian mathematician on one of these competitive algorithms. My present algorithm is not very “theoretical”,
and not a very good candidate for
publication, but it is practical, efficient, and adaptable. For example, in my
C++ package I allow any number of both equality constraints and inequality
constraints to be added to the problem. However, there is little general use
for such matters. In the current package we offer only one optional extra
constraint: that the solution be nonnegative. With this perspective, let’s go on.
Our interest in dealing with this ill-conditioning is in the right hand side of (6), which we have labelled “PCV”
for Picard Condition Vector. Picard’s condition states that the elements of the
PCV must converge toward zero for the system to be solvable stably.
When the system (1) is ill-conditioned, that fact will show here by the PCV
growing inappropriately in its latter elements. This happens because the
elements of U^{T} * b drop to zero SLOWER than the
singular values, S[i], drop to zero. In fact, what
happens is that the S[i] march steadily toward zero,
as that is the nature of the singular values in these cases, whereas the latter
elements of U^{T} * b remain at a significantly high level due to
noise in A and/or b. Here is an example PCV for the small inverse heat problem
we mentioned.
Obviously there is a real problem
here! Next we zoom in to see the first few
values:
All appears well with the PCV for the first half dozen
elements. Then something goes wrong. Below is a spreadsheet of the singular
values and the elements of U^{T} * b. The PCV is computed by
dividing the elements of U^{T} * b in the second column
by the singular values in the first column. That quotient is in the third
column. You can see that the singular values are marching nicely to zero. But
the U^{T} * b values seem to plateau at random plus and minus
values a little over 0.0001 in magnitude. Since the latter values in U^{T} *
b should – by Picard – have been two or three orders of magnitude smaller than
they are, we can attribute maybe 99% of their magnitude to noise in
A and b. In fact, since U (and U^{T}) have all ones for
their singular values, the latter elements of U^{T} * b are
just orthogonally transformed noise samples of b, with their magnitudes unchanged. So we
can take the “latter values” of U^{T} * b to be of the same
magnitude as the original error in b.
NOW… the crux of our situation: if we knew an algorithm to
automatically tell WHERE to start in the second column, we could automatically
estimate the root-mean-square of the error in b – it is the same as the RMS of
these latter values in U^{T} * b. That would be very helpful, because it would let us determine how
much to regularize the solution process (more on this later). You will notice
that the choice of where to start counting the second column's entries as error
samples is not particularly critical on the high side: we consider rows 6 to 13
to be error samples; or, with little change, rows 7 to 13; and in this problem
we could also use 8 to 13, and even further. But
often we will not have the luxury of so many rows of this table to work with.
Many cases of small systems will have only a couple of error samples, so they are precious and must not be wasted. On the
other hand, estimating the usable rows as 4 or 5 instead of 6 or 7 could be
catastrophic to the solution, as that choice would unjustifiably increase the
RMS of the “error samples”. This would cause us to over-regularize the problem.
So you may be asking: if the “usable rank” of this problem is
clearly around 5 to 7, why not just try setting all singular values past 5, 6,
and 7 to zero and select one of those answers? For two reasons. First, we need an
AUTOMATIC process: how would we choose between those three automatically? That
is a difficult matter. And second, NONE of those three may be very good. We can do better than that, and automatically.
(However, for the record, in this particular example zeroing
singular values past rank 5 produces a useful answer almost as good as autosol’s result. This is equivalent to using a
Generalized Inverse with a Singular Value cutoff of 0.004. But using a cutoff
of 0.005 would produce a bad solution.)
Before talking about our algorithm for determining “usable rank”,
I want to emphasize that the above example is unusually clear. The PCV can vary
drastically. I have seen it be roughly level for a few values, then steadily
rise. It can oscillate between near zero and large
positive and negative values. It is very unpredictable. So our methods must be aware of the complexity and
pitfalls.
We have developed for autosol two
heuristic algorithms for choosing the point in the PCV at which noise starts
dominating the elements of U^{T}*b. We have tuned the parameters with
many problems over many years. The first algorithm, called splita in the code, uses a pair of weighted moving
averages. One lazily follows successive values of the PCV which are larger than
its current value. The second lazily follows successive values of the PCV which
are smaller than its current value, but also VERY lazily
follows larger values of the PCV. When the first average becomes substantially
larger than the second average, autosol declares
that point to be the end of any possible useful data.
The second algorithm, called splitb in
the code, reanalyzes the PCV data, up to the upper rank limit determined
by splita, using a fixed-width, unweighted
average of the SQUARES of the elements of the PCV. This allows a closer look at
where the values get too large. The fixed width that is chosen depends on
problem size but is never greater than 6. We look carefully at these sums and
stop when one successive moving sum is substantially larger than the previous
sum. This result from splitb becomes what
we call the “usable rank”, and values beyond that point in U^{T}*b are taken
to be samples of the error in the original RHS, b.
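The tuned production heuristics are not reproduced here, but this sketch conveys the splitb idea under simple assumptions (a fixed window of 3 and a growth factor of 2, both placeholders for the tuned parameters):

import numpy as np

def usable_rank_sketch(pcv, width=3, factor=2.0):
    # Moving sums of squared PCV elements; stop where one sum
    # substantially exceeds the one before it.
    sq = np.asarray(pcv) ** 2
    sums = [sq[i:i + width].sum() for i in range(len(sq) - width + 1)]
    for i in range(1, len(sums)):
        if sums[i - 1] > 0 and sums[i] > factor * sums[i - 1]:
            return i + width - 1   # elements beyond here look noise-dominated
    return len(sq)                 # no break found: the whole PCV is usable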
So how do we use these noise values to find a solution? Simply:
the RMS (root-mean-square) of them becomes the parameter SIGMA that we use to
proceed with the solution. This process of “Tikhonov regularization” can be
presented several ways, but this straightforward method seems adequate:
To the original A*x = b, we add equations asking the x values to be zero. That is,
we add
(8) x_{i} = 0.0, for all indices i
But we weight these extra equations by a small value which is usually called
lambda, or λ:
(9) λ * x_{i} = 0.0, for all indices i
We write this enhanced system in matrix form like this:
(10) A * x = b
(11) λ * I * x = 0 (I is an appropriately sized identity matrix)
If λ is zero, there is no change to x. But the larger lambda
is, the more we will force the computed vector, x, away from its nominal
pseudoinverse result toward a solution of all zeros. We choose which
x we want with the “discrepancy method”: that is, we search for a value of
λ for which the resulting solution, x(λ), satisfies this (where x(0)
is the pseudoinverse solution, for which λ is not present):
(12) RMS( x(λ) - x(0) ) = sigma
That is, we, or rather autosol, asks
that the new, auto-regularized solution deviate from the unregularized solution
by exactly SIGMA on average (in the RMS sense).
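A sketch of that search (assuming sigma has already been estimated, and using simple bisection, which suffices because the deviation grows with λ):

import numpy as np

def discrepancy_solve(A, b, sigma):
    # Find lambda so that RMS(x(lambda) - x(0)) = sigma; return x(lambda).
    m, n = A.shape
    x0 = np.linalg.pinv(A) @ b              # the pseudoinverse solution x(0)

    def x_of(lam):
        A_aug = np.vstack([A, lam * np.eye(n)])
        b_aug = np.concatenate([b, np.zeros(n)])
        return np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]

    def deviation(lam):
        return np.sqrt(np.mean((x_of(lam) - x0) ** 2))

    lo, hi = 0.0, np.linalg.norm(A, 2)      # deviation is 0 at lo, large at hi
    for _ in range(60):                     # bisection on lambda
        mid = 0.5 * (lo + hi)
        if deviation(mid) < sigma:
            lo = mid
        else:
            hi = mid
    return x_of(0.5 * (lo + hi))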
And
that does it. The “Gordian knot” has been untied.
Finally, if the user so asks, we append a nonnegativity algorithm
to the above solution process. This is as in the classic book “Solving Least
Squares Problems” by Lawson & Hanson (Hanson being another
member of my dissertation committee). Basically, we pick a negative
result and zero its column. Then we re-solve. That value will now be zero, and the
solution will have adapted in part to that change. Then we pick another to
zero out. The question is, what ORDER should these be picked in? Our choice is to
force the most negative x_{i} to become zero. We
iterate on this until there are no negative solution values beyond roundoff level.
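A sketch of that iteration (the inner regularized solve is abstracted here as plain least squares; the production code applies the regularized solver described above):

import numpy as np

def nonneg_by_column_zeroing(A, b, tol=1e-10):
    # Repeatedly zero out the column of the most negative unknown and re-solve.
    n = A.shape[1]
    free = np.ones(n, dtype=bool)     # unknowns still allowed to be nonzero
    x = np.zeros(n)
    while free.any():
        x[:] = 0.0
        x[free] = np.linalg.lstsq(A[:, free], b, rcond=None)[0]
        worst = int(np.argmin(x))
        if x[worst] >= -tol:          # nothing significantly negative remains
            break
        free[worst] = False           # force that unknown to zero
    return x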
This
is the resulting auto-regularized solution for the inverse heat problem above:
And this is the result with the nonnegative option:
Note that the negative values didn’t just
disappear: the solution adjusted to better represent the flat region in the first
10 samples. This may not look like a great solution, but in most applications
involving these ill-conditioned systems, this is a fabulous automatic solution.
Rondall
E. Jones rejones7@msn.com