A Comprehensive Introduction to Our Methods and Software for

Solving Ill-Conditioned Systems of Linear Equations

Original material by Rondall Jones, 2020 (see rejtrix.net)

Presentation by Carl Kleffner, 2021

(Using Python for examples when specifics are needed.)

# Even Small Systems of Equations Can be Hard!

1.       A typical problem without difficulties

1*x + 1*y = 2

1*x - 1*y = 0

The answer is x=1, y=1.

2.       Missing Variable.

1*x + 0*y = 1

2*x + 0*y = 2

Our programs return x=1, y=0.  This is the minimum norm solution.

3.       Effectively missing equation.

1*x + 1*y = 2

0*x + 0*y = 0

Our programs return x=1, y=1.  This is the minimum norm solution.

4.       Dependent but consistent equations.

1*x + 1*y = 2

2*x + 2*y = 4

Our programs return x=1, y=1.  This is the minimum norm solution.

5.       Dependent and inconsistent equations.

1*x + 1*y = 2

2*x + 2*y = 6

Our programs recognize the problem as ill-conditioned, apply an automatic regularization method, and give x=1.2625, y=1.2625. This is reasonable behavior.

6.       Independent, consistent, but unreasonable equations.

1*x + 1.00*y = 2

1*x + 1.01*y = 3

A simple solver returns x=-98, y=100, which is probably NOT what the user expected.

Our programs recognize the problem as ill-conditioned, apply an automatic regularization method and return x=1.1165, y=1.1334.  This is the sort of behavior which our programs have been specifically designed to do.
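To see this behavior concretely, here is a minimal numpy sketch of example 6. The truncated-SVD fix shown is just one simple regularization for illustration, not the exact method our programs use, so its numbers differ slightly from the answer above:

```python
import numpy as np

# The "unreasonable" system from example 6.
A = np.array([[1.0, 1.00],
              [1.0, 1.01]])
b = np.array([2.0, 3.0])

# Naive exact solve: mathematically correct, but wildly sensitive
# to the data (about [-98, 100]).
x_naive = np.linalg.solve(A, b)

# One simple regularization: truncate the tiny singular value.
U, s, Vt = np.linalg.svd(A)
k = 1                              # keep only the dominant singular value
x_reg = Vt[:k].T @ (U[:, :k].T @ b / s[:k])

print(x_naive, x_reg)
```

The regularized answer stays near (1, 1) instead of exploding to (-98, 100).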

7.       An under-determined system.

1*x + 2*y = 2

Our programs return the minimum-norm solution, x=0.4, y=0.8.  This is actually an easy problem.
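A quick numpy check of this example; numpy's lstsq also returns the minimum-norm solution for under-determined systems:

```python
import numpy as np

# The under-determined system 1*x + 2*y = 2 from example 7.
A = np.array([[1.0, 2.0]])
b = np.array([2.0])

# lstsq returns the minimum-norm least-squares solution.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)   # the minimum-norm solution, [0.4, 0.8]
```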

8.       An over-determined system

1*x + 2*y = 15.1

2*x + 2*y = 15.9

-1*x + 1*y =  6.5

Our programs return x=0.7276, y=7.2069.  When you substitute these values into the equations, the resulting right side values are: 15.141, 15.869, and 6.479, which is excellent!

These are all illustrative, or “toy” problems.  Our programs usually are used to solve much more involved problems!

# What Does the Picard Condition Say About Matrix Solution?

Introduction. The Discrete Picard Condition was probably popularized most by Per Christian Hansen. For example, see “The discrete Picard condition for discrete ill-posed problems” in BIT, Volume 30, Number 4. Here is the abstract for that paper:

We investigate the approximation properties of regularized solutions to discrete ill-posed least squares problems. A necessary condition for obtaining good regularized solutions is that the Fourier coefficients of the right-hand side, when expressed in terms of the generalized SVD associated with the regularization problem, on the average decay to zero faster than the generalized singular values. This is the discrete Picard condition. We illustrate the importance of this condition theoretically as well as experimentally.

This topic has also been discussed in sources such as the book “Regularization of Inverse Problems” by Heinz W. Engl et al, in Theorem 2.8. (Note: the need for Fourier coefficients to converge has been understood for many years. We used this method in the 1980’s before the popularization of the concept, and it was well known to any advanced mathematics student far before that. But it was not widely discussed before the popularization by Hansen.)

The impact of this statement is significant for anyone attempting to regularize an ill-conditioned matrix system. But it is not obscure. If we are solving

Ax = b

by using the SVD then we have

U*S*V^T*x = b

or

x = V*S^+*U^T*b.

(S^+ is the same as S^-1 except zero is used when S[i] is zero.)

Here the Fourier coefficients are the elements of U^T*b. If the Picard condition is satisfied then the elements of U^T*b decrease faster than the elements of S, so the elements of the “quotient” vector S^+*U^T*b then decrease. If instead the elements of S^+*U^T*b do something else – such as begin to drop, then reverse and grow – then the Picard condition is not satisfied (and vice-versa).
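As a concrete sketch, the Fourier coefficients and the quotient vector can be computed directly from numpy's SVD (the small matrix here is made up purely for illustration):

```python
import numpy as np

# A small made-up system with an exactly known solution.
A = np.array([[2.0, 1.0],
              [1.0, 3.0],
              [0.0, 1.0]])
x_true = np.array([1.0, 2.0])
b = A @ x_true                 # noise-free right-hand side

U, s, Vt = np.linalg.svd(A, full_matrices=False)
fourier = U.T @ b              # the Fourier coefficients, U^T*b
pcv = fourier / s              # the "quotient" vector, S^+*U^T*b (all s > 0 here)
x = Vt.T @ pcv                 # x = V*S^+*U^T*b recovers the solution
print(s, fourier, pcv, x)
```

With noise-free data the quotient vector is benign and V times it reconstructs the solution exactly; with noisy data its tail is where trouble shows up.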

Application. The value of noting this condition is, as the abstract says, that the condition must be satisfied in order to obtain “good regularized solutions”. Why is this condition often not satisfied? Usually because the elements of U^T*b do not continue to decrease: random errors in b prevent the numeric cancellation that should occur to make the elements get smaller. For example, an ill-conditioned set of equations might produce a U^T*b vector that decreases to a certain level, say about 0.001, then just randomly varies around that level. On the other hand, the elements of S may decrease straight to zero, or to round-off level (maybe 10^-14).

So what must be done if the Picard condition is not satisfied, so we can get a usable solution? We must change the matrix, A, or the right side vector, b, or both, in some way. For example, Tikhonov regularization can be viewed as modifying A (by increasing the small singular values) or as modifying b. (To view Tikhonov as a modification to b, just compute the usual regularized solution, xr, then compute the corresponding right hand side A*xr, and compare this to the original b.)
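Here is a small numpy sketch of both views of Tikhonov regularization, using the matrix from toy example 6. The lambda below is an arbitrary illustrative value, not an automatically chosen one, so the answer differs from what our programs return:

```python
import numpy as np

# The ill-conditioned system from toy example 6.
A = np.array([[1.0, 1.00],
              [1.0, 1.01]])
b = np.array([2.0, 3.0])

lam = 0.1                      # an illustrative, hand-picked lambda
U, s, Vt = np.linalg.svd(A)
# Tikhonov replaces the exact reciprocals 1/s by the filtered s/(s^2 + lam^2).
x_r = Vt.T @ ((s / (s**2 + lam**2)) * (U.T @ b))

# The "modified b" view: the right-hand side that the regularized
# solution actually reproduces, compared with the original b.
b_r = A @ x_r
print(x_r, b_r)
```

Note that b_r differs slightly from b: that difference is exactly the change to the observed data that the regularization implicitly made.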

In other words, we must change either the model, or the observed data, or both.

Considerations. Note that the Picard condition does not say that the system cannot be regularized to produce a useful solution; rather it says that an ill-conditioned system must be regularized (some way or another) if a useful solution is to be obtained. A solution based on a vector S^+*U^T*b which does not decrease is generally not useful.

We use this Picard condition in our AUTOREG automatic regularization scheme. We have developed heuristics which we use to decide exactly when to declare a system of equations clearly ill-conditioned, and to decide a “usable rank”. This decision is harder than you may expect, because the vector S^+*U^T*b seldom decreases smoothly, even when the system is well conditioned. So we have to look instead at a moving average of the values in S^+*U^T*b (or their squares) and look for a minimum, followed by significant growth in the moving average. The point where the average grows above the minimum by a certain factor helps locate the point where error is evidently dominating U^T*b. Thus a “usable rank” is determined, and an error estimate for the right side vector can be made, and this facilitates Tikhonov regularization. See details and other documentation in this web site for more information.
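The exact heuristic is not spelled out here, but the idea can be sketched roughly like this. The window size, growth factor, and return convention below are illustrative guesses, not our production values:

```python
import numpy as np

def usable_rank(pcv, window=2, factor=10.0):
    """Rough sketch of the idea described above: smooth the squared PCV
    with a moving average, find the minimum of that average, then flag
    the first later point where the average has grown past the minimum
    by `factor`.  Returns the number of leading entries judged usable."""
    sq = np.asarray(pcv, dtype=float) ** 2
    # moving average of the squared PCV values
    avg = np.convolve(sq, np.ones(window) / window, mode="valid")
    i_min = int(np.argmin(avg))
    for i in range(i_min + 1, len(avg)):
        if avg[i] > factor * avg[i_min]:
            return i          # error evidently dominates from this window on
    return len(pcv)           # the average never grows: use everything

# A PCV that decays, then is overwhelmed by error and grows:
pcv = [1.0, 0.5, 0.2, 0.1, 5.0, 10.0]
print(usable_rank(pcv))
```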

Other details. We do not mean to imply that there are no other issues of interest here. There are. For example, note that we have no information yet on how closely a regularized solution may be to the “true” solution. Yet this may be very important to know in some situations. That is a topic for further study.

# EXAMPLE: Garden fertilizer calculation

I have a small garden that gets a lot of attention. I was concerned about proper fertilization, as it seemed that various commercial fertilizers used very different formulas for their products. So I studied the subject a bit and came up with this set of preferences for what a garden fertilizer should contain of each element, by pounds, per 250 square feet:

 N: 3   P: 2.5   K: 0   Ca: 2.3   S: 8   Fe: 0.2   Mg: 0.5

Now, my problem was that I didn’t have access to individual chemicals, such as plain iron (Fe). What I had was various bags of commercial fertilizer with various compounds in various amounts. Here are photos of a couple of them. Note the percentages of each element (by weight): they have conveniently given percentages of each ELEMENT, not just per chemical, like iron sulfate. What I need is some combination of my on-hand fertilizers that would give me something close to the proportions I need. Here is a table of the things I had on hand when I was trying to figure this all out, with the stated amounts, if any, of Nitrogen (N), Phosphorus (P), Potassium (K), Calcium (Ca), Sulfur (S), Iron (Fe) and Magnesium (Mg) (see rightmost column).

 Gypsum  Sulph90%  HypoN  3xP   AceStarter  Ironite  Copperas  Epsom  Mur_Of_K  MiraGro1  MiraGro2  Plantone
 0       0         0.21   0     0.2         0.01     0         0      0         0.1       0.24      0.05      N
 0       0         0      0.45  0.27       0        0         0      0         0.1       0.08      0.03      P
 0       0         0      0     0.05       0.01     0         0      0.6       0.1       0.16      0.03      K
 0.21    0         0      0     0          0.12     0         0      0         0         0         0.03      Ca
 0.16    0.9       0      0     0          0.1      0.11      0.27   0         0.2       0         0.01      S
 0       0         0      0     0.01       0.045    0.19      0      0         0         0         0         Fe
 0       0         0      0     0          0.1      0         0.2    0         0         0         0.01      Mg

So here we have a nice linear algebra problem: the grid of numbers above (without the labels) is the matrix, and the list of desired amounts in the earlier table is the desired right hand side. Solving this problem will give us the amount of each of these commercial products that we should use to achieve exactly the amount of each element I need… a proper balance of seven elements, not just two or three.

Note that it turns out this is NOT an ill-conditioned problem for which we need regularization. But there can still be problems. Here is the solution I got from autorej (presented as a column vector, as is traditional):

 [ 8.36892, 6.14039, 3.67903, 1.52176, 4.19772, 4.04394, -0.12608, 0.382606, -2.07644, 3.10855, 3.92143, 1.90845 ]

So we do have a problem. There is no way possible to add a NEGATIVE amount of the 7th and 9th fertilizers.  How should we handle this situation? We could be more careful and delete just the most negative fertilizer.  But let’s try just deleting those two fertilizers, leaving 10 columns. Any reasonable solver will give this answer:

 [ 9.06611, 6.98893, 10.1148, 2.19851, 6.46791, 3.00713, 0.937665, -1.53124, -1.47252, 1.17538 ]

So now we have TWO MORE fertilizers that need to be deleted. This is a common problem in real life. The proportions of the elements that I want are not easily obtained from the products on my shelf. But autorej has a non-negative solver.

Let’s let it worry about the algorithmic issues. Here is its solution:

 [ 10.9524, 6.06314, 14.2857, 5.55556, 0, 0, 1.05263, 2.5, 0, 0, 0, 0 ]

For a few years now I have used this or a similar combination… depending on what new products I have, or what products I had to drop because I ran out. The vegetable garden has done well with it. And the grass has done fantastically. But this particular formula is not right for everything: fruit trees should not have this much Nitrogen. So I have to reformulate a bit to cut the Nitrogen down when making fertilizer for my small orchard (20 modest trees).
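For readers who want to reproduce the flavor of this example, here is a sketch using scipy's general-purpose nnls() in place of autorej's solver. It may pick a different combination than the one above, but it will likewise be non-negative and essentially exact:

```python
import numpy as np
from scipy.optimize import nnls

# Rows: N, P, K, Ca, S, Fe, Mg.  Columns: the 12 on-hand products
# (Gypsum, Sulph90%, HypoN, 3xP, AceStarter, Ironite, Copperas,
#  Epsom, Mur_Of_K, MiraGro1, MiraGro2, Plantone).
A = np.array([
    [0,    0,   0.21, 0,    0.2,  0.01,  0,    0,    0,   0.1, 0.24, 0.05],
    [0,    0,   0,    0.45, 0.27, 0,     0,    0,    0,   0.1, 0.08, 0.03],
    [0,    0,   0,    0,    0.05, 0.01,  0,    0,    0.6, 0.1, 0.16, 0.03],
    [0.21, 0,   0,    0,    0,    0.12,  0,    0,    0,   0,   0,    0.03],
    [0.16, 0.9, 0,    0,    0,    0.1,   0.11, 0.27, 0,   0.2, 0,    0.01],
    [0,    0,   0,    0,    0.01, 0.045, 0.19, 0,    0,   0,   0,    0],
    [0,    0,   0,    0,    0,    0.1,   0,    0.2,  0,   0,   0,    0.01],
])
b = np.array([3.0, 2.5, 0.0, 2.3, 8.0, 0.2, 0.5])   # pounds wanted per element

x, resid = nnls(A, b)     # non-negative least squares
print(np.round(x, 4), resid)
```

Since a non-negative combination that matches the desired amounts exactly does exist here, the returned residual is essentially zero.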

# EXAMPLE: Why you should use all your information.

For about a year I interacted with a very interesting engineer living in India. I will call him C.H. He was working with well logging mostly. This means that instruments are lowered down an oil well which is being drilled, for the purpose of detecting properties of the surrounding earth. The instrument package emits various probing signals, and the surrounding earth responds in some fashion, depending on the nature of the probing signal: various wavelengths of radiation, and presumably certain radioactive materials, and I am not sure what else. Plus, I assume it measures easy to measure things such as temperature.

I am not sure exactly what the emitters and sensors were, because it didn’t matter to me. I was just trying to help him solve the difficult equation systems that resulted. One reason they were difficult is that this arrangement is in part a classic indirect sensing problem. Indirect sensing problems are often ill-conditioned due to the nature of trying to reverse nature. In this case signals are emitted, which is basically a “diffusive” process. The signals spread out, and the observer can only measure certain effects of that diffusive process. Reversing nature’s stable processes, such as heat flow, is generally difficult, inaccurate, and unstable. Trying to calculate an earlier heat pattern in a material from an observed later, diffused heat pattern is perhaps the most iconic example of such calculations, and inverse heat problems are perfect example problems for ill-conditioned systems. So they are used in testing algorithms like those autorej uses.

Here is one of his matrix problems (a.k.a. linear algebra systems) that we worked on. The last column is the observations.

 56.22  1070    1.95  4.78   0.015   303.44   28.8    8.4    |  8.57566
  3.91     4.08  1.22  2.64   0.25      4.51    3.2    2.77  |  1.85502
  0         0    0     0      0      4000     180    250     | 19.5
  0         0    0     0.001  0      2000       8     35     |  4.75051
  0         0.001 0    0.001  0       300      17      8     |  1.53051
  0         0    0     0.001  0         0       0.1    6     |  0.065506

Here is the Picard Condition Vector for this problem (see the Tutorial for more on the PCV):

[0.00448    0.00669   0.04127   0.03892  0.34695  0.37723]

This is not a bad PCV at all, but it is a little odd, as the first several values are very small, and the first, which is often the largest, is in fact the smallest. But I have seen this behavior in other contexts as well. Here is the solution (which would come from any reasonable equation solver). I will show it graphically because the graphs convey the situation well here. Kind of an interesting graph. The problem is that this is not at all what C.H. expected. The scale of the vertical axis is also too small. Discussing this, I realized that it was known that the solution values should add to 1.  Autorej’s solve_constrained() accepts one constraint equation that is to be obeyed exactly: not as a normal least squares equation. To ask the sum to be 1.0 we set all the coefficients to 1, creating a sum, and set the right hand side value to 1, which is the sum we want:

[ 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  ], with b=1.0

Here is the result: Wow! What a difference! Almost shockingly different… and much more reasonable, and as expected. So if you are having difficulty getting a good solution, be sure to ask yourself whether you have included all the data you have.
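As a sketch of what an exact equality constraint does, here is the standard null-space method applied to this system with numpy. This is a textbook approach, not necessarily what solve_constrained() does internally:

```python
import numpy as np

def lse(A, b, E, f):
    """Least squares: minimize ||A x - b|| subject to E x = f exactly,
    via the null-space method (a standard textbook technique)."""
    E = np.atleast_2d(E)
    f = np.atleast_1d(f)
    x0, *_ = np.linalg.lstsq(E, f, rcond=None)   # a particular solution of E x = f
    _, s, Vt = np.linalg.svd(E)                  # columns of Z span the null space of E
    rank = int(np.sum(s > 1e-12 * s[0]))
    Z = Vt[rank:].T
    t, *_ = np.linalg.lstsq(A @ Z, b - A @ x0, rcond=None)
    return x0 + Z @ t                            # E x = f holds exactly by construction

# The well-logging system above, with the constraint that the sum is 1.
A = np.array([[56.22, 1070, 1.95, 4.78, 0.015, 303.44, 28.8, 8.4],
              [3.91, 4.08, 1.22, 2.64, 0.25, 4.51, 3.2, 2.77],
              [0, 0, 0, 0, 0, 4000, 180, 250],
              [0, 0, 0, 0.001, 0, 2000, 8, 35],
              [0, 0.001, 0, 0.001, 0, 300, 17, 8],
              [0, 0, 0, 0.001, 0, 0, 0.1, 6]])
b = np.array([8.57566, 1.85502, 19.5, 4.75051, 1.53051, 0.065506])
E = np.ones((1, 8))
f = np.array([1.0])

x = lse(A, b, E, f)
print(x, x.sum())
```

The least-squares fit is computed only within the null space of E, so the constraint is satisfied to machine precision regardless of the data.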

I should mention to you that the algorithms in autorej could be much more general than provided here, but we have chosen to include only the most common features. For example, my C++ library allows any number of such “equality” constraints, and any number of inequality constraints, such as equations that ask a particular value to be no less than 10.0, for example, or for the sum to be not over 5, etc.

# EXAMPLE: Around the house: matching stucco color

Our house was over 25 years old when the (exterior) stucco seriously needed some patching and a new color coat. Unfortunately, the builder, who was long gone, had used his private, just-right blend of standard stucco colors, and we could not determine what that mix was. So I decided to try to match the color myself. Looking back, I wish we had just picked the closest match among the standard colors, as when you coat a whole house with a color it will likely look different than you imagined from looking at tiny samples. But I was determined to do it. So I photographed the appropriate page of the booklet of standard colors in El Rey’s sampler of 75 colors. Then I took a photograph of a typical section of our stucco wall. I took all these photos at the same time and place, with the same camera and the same lighting, by taping the sample color page to the wall. Here is a re-creation of the four sample colors I decided to work with, with the last being the current stucco color I want to find a new mix for:

(Samples, left to right: Luz, Gray, Coral, Adobe, and the desired color.)

I know some of these look very similar. But if you could see it on a whole wall there would be visible differences.

Now, how do we turn this into some linear equations to solve? Using PaintShop it is easy to point the cursor at a color and see the Red-Green-Blue values for that color. Each color is usually represented by a single 8-bit byte, so each color has a range from 0 to 255. So (0,0,0) for (R,G,B) would be black, and (255,255,255) would be the brightest white.

Here are the RGB values for the four samples, plus the color I need to find a mix for. (By the way, you can insert comment lines in autorej’s input file by putting the hash symbol (#) as the first character.)

 #Luz   Gray   Coral  Adobe  Desired
 123    190    165    116    135
 84     190    136    77     92
 70     190    132    59     77

What we want are the weightings for the first four colors that will reproduce the DESIRED RGB values in the last column. So there is the linear algebra problem to solve! We just put the bottom four lines in a tab-separated file and use it as input to autorej’s P.read_problem() function. Here is the result (as a row) from P.solve():

[  0.44108058  -0.11496714   0.32839188   0.41729469 ]

Oops! We forgot to tell autorej that the solution values must be nonnegative! Try again with P.solve_nonneg(). Here is its answer:

[ 1.09399306     0.         0.00225098     0.    ]

When we look at the table above we can see what is happening. The desired elements of column 5 are almost exactly 10% more (that is, a bit brighter) than the corresponding elements of Luz in column 1. Solve_nonneg() figured out that the best way to do that adjustment is to add a little bit of the substantially lighter Coral in column three to column one. That is, 0.00225 bags of Coral to 1.09 bags of Luz. (Of course, for real-world convenience the largest number here should be scaled to 1 bag, giving 1 bag of Luz to 0.00206 bags of Coral.)

Note that the RGB value obtained by our solver is precisely (134.9, 92.2, 76.9), which rounds to exactly the color we wanted.
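Here is a sketch of the color problem using scipy's nnls() in place of autorej's non-negative solver:

```python
import numpy as np
from scipy.optimize import nnls

# Rows are R, G, B; columns are the four sample colors.
A = np.array([[123, 190, 165, 116],    # Luz, Gray, Coral, Adobe
              [ 84, 190, 136,  77],
              [ 70, 190, 132,  59]], dtype=float)
b = np.array([135.0, 92.0, 77.0])      # the desired color

x, resid = nnls(A, b)                  # non-negative least squares
print(np.round(x, 5), np.round(A @ x, 1))
```

The achieved color lands within a fraction of one RGB step of the target, which is far below what the eye can distinguish.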

In the interest of full disclosure, I should tell you that when creating this example I couldn’t find the final solution that I used for actually stuccoing nearly 15 years ago, so I recreated the data here to make an appropriate example, with the same number of “input” colors and roughly the same hues.

In the actual case the amount of the dark color I used (Santa Fe Brown, not Luz) in the computed solution was close to half of the amount of Gray. But not wanting to cause the workers extra trouble, and possibly have successive wheelbarrow mixes vary due to their likely inexact measurements, I just told the workers to mix Santa Fe Brown and Gray equally. What a stupid mistake. The result was much darker than I actually wanted. It was not practical to fix, but fortunately the sun bleached it a little over the years so it still looks OK. The math was good, but I didn’t have the conviction to insist on an inconvenient mix. I hope I learned that lesson.

# EXAMPLE: Getting the computer’s help to play Mine Sweeper.

For years before I became interested in Python I had a web site offering somewhat similar software for C++ as autorej is for Python. One day a fellow from Finland contacted me and asked for the source code. (Sometimes I offered the code, but most times I offered apps that used the code.) I of course emailed it to him, but I asked casually what his application was. He said playing MineSweeper! I pointed out that if he got answers that were between 0 and 1 for a hidden square then probably he should consider those to be probabilities. He assured me he was only interested in sure things: 0 for no bomb in a square, 1 for a bomb.

So when prepping these examples for Python users (and anyone else who is interested), I decided to try my hand at solving Minesweeper with linear equations, as I had never bothered to ask the customer HOW exactly he did that. I guess I figured it was straightforward. Let’s take this little segment, which I chose from a much larger problem because it is a legitimate stand-alone problem in itself, except for the three bottom squares in the right-most column, which we will not be able to determine. So what are the equations? Let’s refer to the elements of the above grid as we would elements of a matrix that size, e.g., A11, A12, etc. And call the NUMBER which is (or should be) in that square #A11, etc. Now first, the unknowns for our problem are the values in the blank, unexploded squares. For example, we want to know how many bombs (zero or one) are at A38. After some thought I realized there are two types of equations:

1.       Each square with a bomb, such as square (2,2), is a simple equation, e.g., #A22=1.0

2.       Each number which is next (by left, right, up, down, or diagonal) to an unknown square also gives us an equation, e.g., from A23 we get

#A22  + #A32 + #A33 = 2

So we have to go create all those equations. Here is my matrix file as read by autorej’s read_problem() method (except for the last column, which was helpful while building the file but cannot be present in the input file). Also note the comment lines marked with “#”.
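To make the two equation types concrete without reproducing the whole board, here is a hypothetical three-square strip of my own construction (not the actual board above):

```python
import numpy as np

# A hypothetical mini-board: the top row (a, b, c) is hidden; the
# bottom row has been revealed and shows the numbers 1, 1, 0.
# Each revealed number equals the count of bombs among its neighbors:
#   square below a sees a, b     ->  a + b     = 1
#   square below b sees a, b, c  ->  a + b + c = 1
#   square below c sees b, c     ->  b + c     = 0
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(x, 6))    # the bomb is under square a only
```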

Now… before trying to solve this problem to find the numbers in the blank squares, or at least SOME of them on this first try, we should ask ourselves exactly WHAT we expect to see. Do we have any right to expect to get back ONLY INTEGERS? Actually, we would like that, but we have not inserted any constraints into the problem that would even encourage that, much less require it. The calculation process will begin by computing the Singular Value Decomposition of our matrix. Here are the singular values for this matrix:

[3.66882986e+00  3.29958015e+00  2.74052834e+00  2.44205520e+00  2.33557658e+00
 1.85017432e+00  1.51369996e+00  1.28268753e+00  1.15207922e+00  9.33783720e-01
 6.28601369e-01  5.72988469e-01  4.98618121e-01  4.38608280e-01  2.59847585e-16
 1.82895960e-16  1.16177952e-16  3.02411221e-17  1.69841264e-33  0.00000000e+00]

No hint here of any likelihood of integer answers!

Here are the answers that Autorej’s normal solver gives:

[-4.04451282e-16  1.00000000e+00  1.03922202e-15 -1.51275221e-02
  1.00000000e+00  1.00000000e+00  4.05824099e-16  4.41513009e-03
 -1.36256212e-15  2.74346447e-01 -2.74346447e-01  1.32495853e-15
  2.74346447e-01  8.62826776e-01  8.62826776e-01  0.00000000e+00]

Most of the answers are indeed VERY CLOSE to zero or 1. But a couple are negative, the 11th by about -0.27. What does that mean? It means we should try autorej’s Non-negative solver. Here is the answer that it gives:

[1.  1.  0.  0.  0.  1.  0.  0.  1.  1.  0.  0.  0.  0.  0.  0.  0.  1.02359314  0.97640686  0.]

Now that’s pretty good!! Even the two that are slightly off of one are very near to one. Terrific!

If we go into the problem and mark the newly solved squares with a red symbol we have this: All the remaining blank squares are actually empty. But I wasn’t actually playing the game by this point, so I was not able to click them to show they are empty. Our problem setup had no information about certain squares, such as the last three in the last column.

# Arls.py for Automatic Regularization of Ill-Conditioned Linear Systems

Arls.py is a complete package for AUTOMATICALLY solving difficult linear algebraic systems (or “matrix problems”).  Basically, these methods completely free the user who has a difficult, usually ill-conditioned, problem from worrying about:

·         choosing a rank to use

·         choosing a level of singular value to zero when using a Generalized Inverse

·         estimating the error in the matrix coefficients

·         estimating the error in the right-hand side

·         choosing the Tikhonov regularization parameter, lambda

Arls for Python is available on the Python Package Index, PyPI.

You should be able to install it on your Python installation with this command:

pip install arls

How to Use arls. We assume you have some equations to solve, and can use Python. Put the coefficients in rows of matrix A, and put the right hand side values in a column, b. It does not matter whether the number of rows equals the number of unknowns, or is larger or smaller than the number of unknowns, as arls’s routines handle all cases.  Singularity or ill-conditioning are expected and considered normal behavior by these methods.

·         If the number of rows is less than or equal to the number of unknowns and the data is consistent and accurate, then the solution will typically be exact except for round-off error, as any good standard linear equation solver would do.

·         If the number of rows is larger than the number of unknowns, or the data is significantly in error, then a best fit, or “least-squares” solution will be found.

·         If the system of equations has serious problems such as unintended dependency between rows, or “singularities”, or ill-conditioning, then arls will usually be able to address those issues automatically and still return a reasonable solution. This is the interesting case for arls, and the reason for its existence. Arls was developed to automatically solve so-called “ill-conditioned” systems which often otherwise require special handling or are considered impossible to solve. Of course, when arls’ algorithms produce a solution of such a difficult problem the residual error (that is, the values in Ax-b, or the single value, norm(Ax-b)) will of necessity be larger than what the residual would have been if a standard least-squares solver had been able to handle the problem. So always check the result for reasonableness and appropriateness to your technical needs.

Case 1. Once you have created your matrix, A, and the corresponding right-hand-side, b, we have the problem

A x = b

To solve it, import arls’s solvers and call arls:

from arls import arls, arlseq, arlsgt, arlsnn, arlsall

x = arls(A,b)

Case 2. If your system of equations results in some negative values, and that is not acceptable, then just call our non-negative solver:

from arls import arls, arlseq, arlsgt, arlsnn, arlsall

x = arlsnn(A,b)

Case 3. If you have a special situation, such as needing the solution to add to, say, 100.0 percent, or the first value to start at exactly 0., or any such exact requirement (not just more regular “least-squares” equations) then put those requirements as rows in a separate set of equations, Ex = f and then solve the dual system, by calling our “equality constraint” solver:

from arls import arls, arlseq, arlsgt, arlsnn, arlsall

x = arlseq(A,b,E,f)

For example, if there are 5 unknowns, to require the solution to add to 100, use this:

E = [ [1, 1, 1, 1, 1] ]

f = [100.]                 (or just use f = 100. Arls will convert the number to a vector of length 1.)

Note: Depending on what you put in Ex = f, it may be impossible to satisfy all the equations. For example, you might accidentally say x = 1., and also x = 2. In such cases arlseq() will attempt to retain the equations that affect the result most. (The exact algorithm isn’t worth explaining here.) You can check the status of the equality constraints by examining the residual:

R = E*x - f                 (where x is the solution returned by arlseq() )

Normally all elements of R will be zero. If, for example, R[2] is non-zero, then equation 3 of the constraints was not satisfied exactly.  You should try to resolve what is going on with these constraints.

Case 4. If you have some special need for certain unknowns to satisfy given bounds, then put those requirements in a separate set of equations, Gx >= h. Be sure to express them all in “greater than” form, such as x > 1.0. If you need to use a “less than” equation, such as x < 2.0, then multiply the equation on both sides by -1 and you will get a “greater than” constraint instead of a “less than” constraint:

-x >= -2.0.

Note that using this technique you can bound a value from below and above, such as

x >= 2

x <= 4                 ( a.k.a.  -x >= -4 )

If we assume there are, say, 5 unknowns, then these two constraints can be represented by

G = [ [0, 0, 0, 1, 0],                                (remember Python indexing starts at 0, not 1)

      [0, 0, 0, -1, 0] ]                              (you have to negate “less than” equations)

h = [ 2,  -4 ]

Then you can solve this combined system by calling:

from arls import arls, arlseq, arlsgt, arlsnn, arlsall

x = arlsgt(A,b,G,h)
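For the special case where the inequality constraints are simple bounds on individual unknowns, scipy's lsq_linear() offers a point of comparison. (It handles bounds only, not general Gx >= h; the general case is what arlsgt is for.) The matrix and right-hand side below are made up for illustration:

```python
import numpy as np
from scipy.optimize import lsq_linear

# Five unknowns; the unconstrained least-squares answer would put
# x[3] = 10, but we require 2 <= x[3] <= 4, as in the example above.
A = np.eye(5)
b = np.array([0.0, 0.0, 0.0, 10.0, 0.0])

lb = np.full(5, -np.inf)
ub = np.full(5, np.inf)
lb[3] = 2.0
ub[3] = 4.0

res = lsq_linear(A, b, bounds=(lb, ub))
print(res.x)   # x[3] is clipped to its upper bound, 4
```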

Case 5. Lastly, you can combine all the above features:

from arls import arls, arlseq, arlsgt, arlsnn, arlsall

x = arlsall(A,b,E,f,G,h)

Extra Information: In each case above you can ask for extra information to be returned, like this:

x, nr, ur, msigma, mlambda = arls(A,b)

In this case the extra parameters have these meanings when b contains only one column:

nr : (int) is the numerical rank of the matrix A. This measure of rank ignores “singular values” that are near zero.

ur : (int) is the computed “usable rank”, which is a function of both A and b, and tells the solver at what point the errors in A and b combine to ruin the use of more than ur rows of the orthogonalized version of the system of equations. In other words, ur is the breakpoint in the orthogonalized version of the system of equations between rows in which the error in b is acceptably small, and rows in which the error in b is uselessly large.

msigma : (float) is the estimated right-hand-side root-mean-square error. This means that our analysis reveals the actual error level in b – a unique feature!

mlambda : (float) is the Tikhonov regularization parameter that goes with the estimated right hand side error, msigma.

# A comparison of the performance of arls() and lsmr()

For the matrix, A, I computed an 11 by 11 sample of an inverse heat equation. This is a classic example of what are often called “inverse problems”, which tend to be ill-conditioned. They often involve attempting to reverse some natural process such as blurring or, in this case, heat dissipation. For a “true” solution, xt, I picked a cusp of a sine function, with a small negative value added, so I can illustrate non-negative solvers. That “true” solution is shown here. Then I computed the “true” right hand side, b = A * xt.

Then to make an interesting problem I added to b random noise of roughly Gaussian(0.0, 0.00005). For some of the problems, for realism, I further added an “accidental” error to  b of  -0.01.

The result for the right hand side is

[2.97863221  3.55482559  4.06729936  4.47735646  4.74273922  4.83465317
 4.74282974  4.47743033  4.06726914  3.55472706  2.98858201]

The solution by lstsq(), which is NOT an appropriate solver for this problem, is here. Note that the plot scale is actually 1.0E7. Also note that lstsq() considers this to be an OK solution. It has no concept that the problem is ill-conditioned. It only diagnoses the result as wrong when the system of equations is fully singular. This is why one cannot use a standard solver and accept the solution as OK just because it reports no singularity. Instead, we want to know when the solution has worthless character like this and REGULARIZE the solution process.
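The experiment can be re-created in spirit with numpy alone. The Gaussian kernel, "true" solution, and glitch below are stand-ins for the author's actual test data, and truncated SVD stands in for the solvers compared here:

```python
import numpy as np

n = 11
i = np.arange(n)
# A Gaussian "heat smoothing" kernel: a standard stand-in for an
# ill-conditioned inverse heat matrix (not the author's exact matrix).
A = np.exp(-((i[:, None] - i[None, :]) ** 2) / (2.0 * 2.0 ** 2))
A /= A.sum(axis=1, keepdims=True)

xt = np.abs(np.sin(2 * np.pi * i / (n - 1))) - 0.05   # cusped "true" solution
b = A @ xt
b[5] -= 0.01                   # the "accidental" glitch added to b

x_naive = np.linalg.solve(A, b)       # standard solver: wildly amplified error

# Truncated-SVD regularization: simply drop the tiny singular values.
U, s, Vt = np.linalg.svd(A)
keep = s > 1e-3 * s[0]
x_reg = Vt[keep].T @ ((U[:, keep].T @ b) / s[keep])

print(np.linalg.norm(x_naive - xt), np.linalg.norm(x_reg - xt))
```

The naive error is many orders of magnitude larger than the regularized error, which is the whole point of the comparison above.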

The solution by lsmr(), which often produces beautiful results, happened to turn out poorly. The solution by arls(), which is very nice in this case, is here. For the record, the nonnegative solution by arlsnn() is here. Next we remove the intentional glitch of -0.01 that we added to b, and re-solve with lsmr(). The result is much better… though not one of the beautiful results that it sometimes produces. And here is what arls() does with this glitch removed. Obviously the result by arls() is not greatly changed by a minor change in the right hand side error. Finally, we also reduced the error in the right hand side to roughly Gaussian(0.0, 0.00002).

Lsmr() now produces a nice solution like arls(). Summary: lsmr() can produce some beautiful results, but only sometimes. Arls() seems to more reliably produce a useful solution.

# Further Background to Solving Ill-Conditioned Linear Systems

We are interested in solving linear, or matrix, systems of arbitrary shape and size,

(1)    A*x= b

Where the solution obtained from a naïve method is badly behaved. This happens for various reasons, such as duplicated data, or other reasons for singularity. But the interesting case is when it happens because the right hand side has more data error than the system can tolerate and still give a good answer. This comes up commonly in “inverse problems”, meaning problems in which we are trying to back up a diffusive process in time. For example, in solving the “inverse heat problem” we are trying to determine what the heat profile of some linear arrangement was at an earlier time. Here is a graph of the ideal original heat profile in this problem. And here is an example “solution” obtained by naively solving the 13 row by 17 column problem (see data files accompanying this document).

# SVD

The natural way of solving any hard linear system is to start with the widely available Singular Value Decomposition:

(2)    A = U*S*VT

where U*UT = I, UT*U = I, V*VT = I, VT*V = I, and S is a diagonal matrix of decreasing “singular values”. We will ignore the minor aggravation of whether U and V are the full, square matrices of theory, or a reduced shape which saves space by shrinking U and V when there are singular values which are zero. Those matters are only of concern to the code developer.
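For concreteness, here is a minimal numpy sketch of the decomposition (2) in its reduced form, verifying the orthogonality relations just stated:

```python
import numpy as np

# Illustrative sketch: SVD of a small rectangular matrix, using the
# reduced ("economy") form that numpy returns with full_matrices=False.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
U, s, VT = np.linalg.svd(A, full_matrices=False)

# s holds the singular values in decreasing order.
assert s[0] >= s[1] >= 0.0

# U and V have orthonormal columns: UT*U = I and VT*V = I.
assert np.allclose(U.T @ U, np.eye(2))
assert np.allclose(VT @ VT.T, np.eye(2))

# The factorization reproduces A.
assert np.allclose(U @ np.diag(s) @ VT, A)
```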

We solve the system (1) as follows:

(3)                  A * x = b

(4)    U  * S * VT * x = b

(5)            S * VT * x = UT * b

(6)                  VT * x = S+ * UT * b        = PCV

(7)                         x = V * S+ * UT * b

Here S+ is the pseudo-inverse of S, meaning that zero or negligible values of S[i] are treated as if they were infinite (∞), so that their reciprocals in S+ are zero.
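Steps (3) through (7) can be sketched in a few lines of numpy. The tolerance below, which decides what counts as a negligible singular value, is an illustrative choice, not the author's:

```python
import numpy as np

def svd_solve(A, b, tol=1e-12):
    """Solve A*x = b via the SVD, following steps (3)-(7)."""
    U, s, VT = np.linalg.svd(A, full_matrices=False)
    # Build S+: invert the significant singular values, zero the rest.
    significant = s > tol * s[0]
    s_plus = np.zeros_like(s)
    s_plus[significant] = 1.0 / s[significant]
    # x = V * S+ * UT * b, equation (7).
    return VT.T @ (s_plus * (U.T @ b))

# Example 4 from the introduction: dependent but consistent equations.
A = np.array([[1.0, 1.0],
              [2.0, 2.0]])
b = np.array([2.0, 4.0])
print(svd_solve(A, b))   # the minimum-norm solution, approximately [1., 1.]
```

Because the negligible singular value is never inverted, the dependent row causes no difficulty and the minimum-norm solution comes out directly.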

This “pseudoinverse” approach only helps when the difficulty is the presence of one or more “simple” singularities due to such problems as duplicated rows. Sometimes that is enough. In statistical contexts the matrix V * S+ * UT is referred to as the “pseudo-inverse of A” and seems to be important.

The next step in trying to solve (1) is to set MORE elements of S to zero (or to ∞, whichever way you want to think of it). For example, if the average accuracy of the elements of A is, say, 4 significant digits, then one might set to zero all the singular values at the end of S that add together to less than 0.0001 times the average element of A, or 0.0001 times S. But this doesn’t really help in general, because the problem is usually with the error in b, not the error in A.

For our purposes zero singular values are not a difficulty…  we always treat them as the pseudoinverse does. The more important case is SMALL singular values.

Papers have been written for over thirty years on how to automatically resolve the problem of excessively small singular values, or “ill-conditioning”, in linear equation systems. These problems are characterized by wildly unacceptable solutions, especially when the solution is expected to be non-negative. And even irrelevantly small changes to the observed data, b, can cause the solution to be wildly different, yet still unacceptable.

I studied this problem many years ago as part of my Ph.D. under the well-known Dr. Cleve Moler, the inventor of MATLAB. Many papers have been published since then, with various approaches and degrees of success, and various kinds of constraints or conditions. I have presented later versions of my algorithm in C++ at appropriate conferences, and I also conferred with a Russian mathematician on one of these competitive algorithms. My present algorithm is not very “theoretical”, and not a very good candidate for publication, but it is practical, efficient, and adaptable. For example, in my C++ package I allow any number of both equality constraints and inequality constraints to be added to the problem. However, there is little general use for such matters. In the current package we offer only one optional extra constraint: that the solution be non-negative. With this perspective, let’s go on:

Our interest in dealing with this ill-conditioning is in the right hand side of (6), which we have labelled “PCV” for Picard Condition Vector. Picard’s condition states that the elements of the PCV must converge toward zero for the system to be solvable stably. When the system (1) is ill-conditioned, that fact will show here by the PCV growing inappropriately in its latter elements. This happens because the elements of UT * b drop to zero FASTER than the singular values, S[i], drop to zero. In fact, what happens is that the S[i] march steadily toward zero, as that is the nature of the singular values in these cases, whereas the latter elements of UT * b remain at a significantly high level due to noise in A and/or b.

Here is an example PCV for the small inverse heat problem we mentioned. Obviously there is a real problem here! Next we zoom in to see the first few values: all appears to behave well with the PCV for the first half dozen elements. Then something goes wrong. Below is a spreadsheet of the singular values and the elements of UT * b. The PCV is computed by dividing the elements of UT * b in the second column by the singular values in the first column. That quotient is in the third column. You can see that the singular values are marching nicely to zero. But the UT * b values seem to plateau at random plus and minus values a little over 0.0001 in magnitude.

Since the latter values in UT * b should – by Picard – have been two or three orders of magnitude smaller than they are, we can attribute maybe 99% of their magnitude to noise in A and b. In fact, since U (and UT) have all ones for their singular values, the latter elements of UT * b are irrelevantly transformed noise samples of b. So we can take the “latter values” of UT * b to be of the same magnitude as the original error in b.
NOW… the crux of our situation: if we knew an algorithm to automatically tell WHERE noise begins in the second column, we could automatically estimate the root-mean-square of the error in b – it is the same as the RMS of these latter values in UT * b. That would be very helpful because it would let us then determine how much to regularize the solution process (more on this later). You will notice that the choice of where to start counting the second-column values as error samples is not particularly critical on the high side: we can consider rows 6 to 13 to be error samples; or, with little change, rows 7 to 13; and in this problem we could also use rows 8 to 13, and even further. But often we will not have the luxury of so many rows of this table to work with. Many cases of small systems will have only a couple of error samples, so they are precious and must not be wasted. On the other hand, estimating the usable rows as starting at 4 or 5 instead of 6 or 7 could be catastrophic to the solution, as that choice would unjustifiably increase the RMS of the “error samples”. This would cause us to over-regularize the problem.
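The quantities in that table are easy to compute. This sketch builds a PCV for a small ill-conditioned test matrix (a Hilbert matrix standing in for the heat-problem data, which is not reproduced here) and estimates SIGMA from the latter elements of UT * b; the split point is chosen by hand, which is exactly the choice that the algorithms described below automate:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 10x10 Hilbert matrix: a classic ill-conditioned stand-in.
n = 10
A = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)
x_true = np.ones(n)
b = A @ x_true + rng.normal(0.0, 1e-5, n)   # noisy right hand side

U, s, VT = np.linalg.svd(A)
utb = U.T @ b
pcv = utb / s                # the Picard Condition Vector, eq. (6)

# Past the usable rank, the elements of UT*b are essentially noise
# samples of b, so their RMS estimates the RMS error in b.
usable_rank = 6              # hand-picked for this sketch
sigma = np.sqrt(np.mean(utb[usable_rank:] ** 2))
print(sigma)                 # on the order of the injected noise, 1e-5
```

The latter elements of pcv blow up by many orders of magnitude, while the corresponding elements of utb plateau near the injected noise level, just as described above.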

So you may be asking, if the “usable rank” of this problem is clearly around 5 to 7, why not just try setting all singular values past 5, 6, and 7 to zero and select one of those answers? For two reasons: first, we need an AUTOMATIC process. How would we choose between those three automatically? That is a difficult matter. And second, NONE of those three may be very good. We can do better than that, and automatically. (However, for the record, in this particular example zeroing singular values past rank 5 produces a useful answer almost as good as autosol’s result. This is equivalent to using a Generalized Inverse with a singular value cutoff of 0.004. But using a cutoff of 0.005 would produce a bad solution.)
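The sensitivity to the cutoff that the parenthetical remark describes is easy to demonstrate. In this sketch (a small Hilbert matrix stands in for the heat-problem data, and the noise is deliberately placed along the last left singular vector to make the effect exact), a generalized inverse with a moderate cutoff gives a usable answer while a tiny cutoff ruins it:

```python
import numpy as np

def truncated_solve(A, b, cutoff):
    """Generalized inverse with a singular value cutoff: singular
    values below `cutoff` are zeroed instead of inverted."""
    U, s, VT = np.linalg.svd(A, full_matrices=False)
    s_inv = np.array([1.0 / v if v >= cutoff else 0.0 for v in s])
    return VT.T @ (s_inv * (U.T @ b))

n = 6
A = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)
x_true = np.ones(n)
U, s, VT = np.linalg.svd(A)
b = A @ x_true + 1e-4 * U[:, -1]    # noise along the worst direction

err_good = np.linalg.norm(truncated_solve(A, b, 1e-3) - x_true)
err_bad = np.linalg.norm(truncated_solve(A, b, 1e-12) - x_true)
print(err_good < err_bad)    # True: the tiny cutoff amplifies the noise
```

With the tiny cutoff the smallest singular value (about 1e-7 here) is inverted, amplifying the 1e-4 noise by a factor near a thousand; the moderate cutoff discards that direction entirely.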

Before talking about our algorithm for determining “usable rank”, I want to emphasize that the above example is unusually clear. The PCV can vary drastically. I have seen it stay roughly level for a few values and then steadily rise. It can oscillate between near zero and very large positive and negative values. It is very unpredictable. So our methods must allow for this complexity and its pitfalls.

We have developed for autosol two heuristic algorithms for choosing the point in the PCV at which noise starts dominating the elements of UT*b. We have tuned their parameters on many problems over many years. The first algorithm, called splita in the code, uses a pair of weighted moving averages. One lazily follows successive values of the PCV which are larger than its current value. The second lazily follows successive values of the PCV which are smaller than its current value, but also VERY lazily follows larger values of the PCV. When the first average becomes substantially larger than the second, autosol declares that point to be the end of any possible useful data.
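The production splita is tuned over many problems and its exact parameters are not reproduced here, but the dual-moving-average idea can be sketched as follows (the rates and the trigger factor are illustrative guesses, not autosol's):

```python
def splita_sketch(pcv, rate=0.3, very_lazy=0.02, factor=5.0):
    """Return the index where noise appears to start dominating the
    PCV, or len(pcv) if no such point is found."""
    g = [abs(v) for v in pcv]
    avg1 = avg2 = g[0]
    for i, v in enumerate(g[1:], start=1):
        # avg1 lazily follows the PCV in both directions.
        avg1 += rate * (v - avg1)
        # avg2 follows drops at the same lazy rate, but follows
        # rises only VERY lazily.
        avg2 += (very_lazy if v > avg2 else rate) * (v - avg2)
        if avg1 > factor * avg2:
            return i    # avg1 has pulled far ahead: noise dominates
    return len(g)

# A PCV that behaves, then blows up (compare the spreadsheet above):
print(splita_sketch([1.0, 0.5, 0.3, 0.2, 50.0, 80.0, 120.0]))   # 4
# A steadily decaying PCV triggers no split:
print(splita_sketch([1.0, 0.5, 0.25, 0.125, 0.0625]))           # 5
```

On a decaying or flat PCV the two averages move together and the trigger never fires; only a sustained rise separates them.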

The second algorithm, called splitb in the code, re-analyzes the PCV data, up to the upper rank limit determined by splita, using a fixed-width, unweighted moving average of the SQUARES of the elements of the PCV. This allows a closer look at where the values get too large. The fixed width that is chosen depends on problem size but is never greater than 6. We look carefully at these sums and stop when one successive moving sum is substantially larger than the previous sum. This result from splitb becomes what we call the “usable rank”, and values beyond that in UT*b are taken to be samples of the error in the original RHS, b.
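Again as an illustrative sketch rather than the tuned production code, splitb's moving sum of squares might look like this (the width and trigger factor are assumptions):

```python
def splitb_sketch(pcv, upper, width=3, factor=10.0):
    """Re-examine pcv[:upper] with a fixed-width moving sum of
    squared values; return the usable rank."""
    sq = [v * v for v in pcv[:upper]]
    if len(sq) < width + 1:
        return upper
    prev = sum(sq[0:width])
    for start in range(1, len(sq) - width + 1):
        cur = sum(sq[start:start + width])
        if cur > factor * prev:
            # The newly included element caused the jump; everything
            # before it is taken as usable.
            return start + width - 1
        prev = cur
    return upper

print(splitb_sketch([1.0, 1.0, 1.0, 1.0, 100.0, 100.0], upper=6))   # 4
```

Squaring the elements makes a single large value dominate its window, which is what gives this pass its sharper view of where the blow-up begins.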

So how do we use these noise values to find a solution? Simply: their RMS (root-mean-square) becomes the parameter SIGMA that we use to proceed with the solution. This process of “Tikhonov regularization” can be presented several ways, but this straightforward method seems adequate:

To the original A*x=b, we add equations asking the x values to be zero. That is, we add

(8)    x[i] = 0.0   , for all indices i

But we weight these extra equations by a small value which is usually called lambda, or λ.

(9)    λ * x[i] = 0.0   , for all indices i

We write this enhanced system in matrix form like this:

(10)        A * x = b

(11)  λ * I * x = 0               ( I is an appropriately sized identity matrix)

If λ is zero, there is no change to x. But the larger λ is, the more we force the computed vector x away from its nominal pseudoinverse result toward a solution of all zeros. We choose which x we want with the “discrepancy method”: that is, we search for a value of λ for which the resulting solution, x(λ), satisfies this (where x(0) is the pseudoinverse solution, for which λ is not present):

(12)  RMS( x(λ) - x(0) ) = sigma

That is, we – or rather, autosol – ask that the new, auto-regularized solution deviate from the unregularized solution by exactly SIGMA on average (in the RMS sense).
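Equations (10) through (12) can be sketched directly: append the weighted rows λ*I, then search for the λ whose solution deviates from the unregularized one by SIGMA. The bisection below is an illustrative way to perform that search, not necessarily autosol's:

```python
import numpy as np

def tikhonov(A, b, lam):
    """Solve the stacked system (10)-(11): [A; lam*I] x = [b; 0]."""
    n = A.shape[1]
    A_aug = np.vstack([A, lam * np.eye(n)])
    b_aug = np.concatenate([b, np.zeros(n)])
    return np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]

def discrepancy_solve(A, b, sigma, lam_hi=1e6, iters=60):
    """Find lambda such that RMS(x(lambda) - x(0)) = sigma, eq. (12)."""
    x0 = tikhonov(A, b, 0.0)          # unregularized solution
    lo, hi = 0.0, lam_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        x = tikhonov(A, b, mid)
        rms = np.sqrt(np.mean((x - x0) ** 2))
        if rms < sigma:
            lo = mid                  # deviation too small: damp more
        else:
            hi = mid
    return x

# Example 6 again; ask for a solution that deviates from the wild
# unregularized answer by an (arbitrarily chosen) sigma of 50.
A = np.array([[1.0, 1.00], [1.0, 1.01]])
b = np.array([2.0, 3.0])
x = discrepancy_solve(A, b, sigma=50.0)
print(x)    # much tamer than the unregularized [-98., 100.]
```

Bisection works here because the RMS deviation from x(0) grows monotonically with λ, from zero toward the RMS of x(0) itself.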

And that does it. The “Gordian knot” has been untied.

Finally, if the user so asks, we append a non-negativity algorithm to the above solution process, as in the classic book “Solving Least Squares Problems” by Lawson & Hanson (Hanson being another member of my dissertation committee). Basically, we pick a negative result and zero its column. Then we re-solve. That value will now be zero, and the solution will have adapted in part to that change. Then we pick another to zero out. The question is, in what ORDER should these be picked? Our choice is to force the most negative x to become zero first. We iterate on this until there are no non-roundoff negative solution values.
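The column-zeroing loop just described can be sketched like this (using a plain least-squares solve in place of autosol's internals):

```python
import numpy as np

def nonneg_solve(A, b, tol=1e-10):
    """Iteratively zero the column of the most negative solution
    component and re-solve, until no real negatives remain."""
    A = A.astype(float).copy()
    while True:
        x = np.linalg.lstsq(A, b, rcond=None)[0]
        worst = int(np.argmin(x))
        if x[worst] >= -tol:
            return np.maximum(x, 0.0)   # clear round-off negatives
        A[:, worst] = 0.0               # forces x[worst] to zero next pass

A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
b = np.array([1.0, -0.5])
print(nonneg_solve(A, b))   # approximately [1., 0.]
```

Zeroing a column guarantees the minimum-norm re-solve assigns that variable zero, and the loop terminates because each pass either finishes or removes one more column.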

This is the resulting auto-regularized solution for the inverse heat problem above. And this is the result with the nonnegative option. Note that the negative values didn’t just disappear: the solution adjusted to better represent the flat region in the first 10 samples. This may not look like a great solution, but in most applications involving these ill-conditioned systems, this is a fabulous automatic solution.

Rondall E. Jones  rejones7@msn.com