## P≠NP, proven by an analysis of “Subset sum problem”

Тъй като съм затруднен да публикувам това в научно издание или форум, го пускам тук. Който разбира нещо, да чете... :)

Update as of Jan 23rd 2016:

The Subset Sum Problem can not be

solved in polynomic time

By Dobri Bozhilov, MSc

Jan 23rd 2016

Abstract

This analysis proves mathematically that there cannot exist universal polynomic-time solution of Subset Sum Problem. The main idea is that it is impossible to define such a power index k that is equal in any subset, no matter of its size. Mathematical contradictions appear on such assumption and this leads to the conclusion that no polynomic-time solution is possible.

I.Introduction

The Subset Sum problem is one of the important problems in complexity theory. The problem is:  Given a set of integers, is there a non-empty subset whose sum is a given target number X? The problem is NP-complete. So like in all other problems, connected with P vs. NP, here exists the question if polynomic time solution is possible. Generally, finding the answer will help finding the answer of the millennium problem of P vs. NP.

II. The Most effective known algorithm

The most effective known algorithm on Subset Sum Problem is the one of Horowitz and Sahni. It lowers the complexity of solving the problem to O(2n/2). In last 4 decades since it was invented, no fundamentally better solution is found. So up to now all found solutions are exponential. Logically – after so long time of unsuccessful attempts, and based on intuition, many scientists believe that much better solution is impossible. Anyway there is no proof that such solution is impossible.

III.Mathematical proof

Let’s have 2 sets – with n and m elements respectively. Let’s for both we are having one and the same target number of X. Let’s suppose the theoretical most effective algorithm requires a steps to find a solution in first set (n) and b steps to find a solution in second set (m).

Let’s suppose we find k such as:

a=(f(n))k

Let’s suppose we find l such as:

b=(f(m))l

Obviously up to now we defined some dependencies between the number of steps required for finding a solution and the size of the initial set. So now we must check if k and l can be equal. If k=l, this will mean all solutions in any set will be reachable in polynomic time. If k≠l, then the number of steps will change exponentially.

So let’s assume k=l.

Then:

log f(n) a = log f(m) b

Therefore

log f(n) a = log f(n)b / log f(n)f(m)

log f(n) a / log f(n)b = 1/ log f(n)f(m)

log b a=1/ log f(n)f(m)

log a b = log f(n)f(m)

Obviously this equation is not possible. The right side of the equation is a constant, as it is dependent only on the sizes of the 2 sets. For any combinations of 2 sets, sized n and m respectively, the result will be one and the same.

Anyway this is not possible for a and b. The number of steps depends not only on the initial size of the set, but also on the values on numbers, their distribution, etc. In any case it is not possible to define such a strict constant dependency between a and b, like the one between f(n) and f(m). In some partial cases it is possible, but not in general case.

The simplest way to prove this is to declare a third set with a size of m, the same target number of X, but with different members. Based on them we will have a different minimum number of steps to find a solution – c.

Then as

log f(n)f(m)=const, for any sets sized n and m respectively

Therefore:

log a b = log a c, for b≠c

This is not true.

Therefore:

If log a b ≠ const

then

k≠l

Thinking in more general, as “Subset Sum Problem” is a NP-complete problem this must mean that P≠NP.

IV. Some non-math “philosophical” thoughts on problem and on P vs. NP

To have a “wormhole” you need a worm to dig it. If no worm, no hole either…

In short this is the solution of “P vs. NP” problem. The answer is “No”. P is not equal to NP, as there is no “worm” to dig a shortcut and this way bypass the exponential algorithm tragedy. There is no shortcut and such one is impossible.

What does it mean to prove that a specific NP-complete problem cannot be solved non-exponentially? Obviously this will prove that P≠NP, because otherwise contradiction will appear. If a specific NP-complete problem cannot be solved non-exponentially, this proves that P≠NP, because otherwise a contradiction will appear. If a NP-complete problem is proved as un-solvable in polynomic time and another NP-complete is solved then the first will also be solved as it could be converted into second one. Obviously not-possible.

To prove this assertion in previous section we used the “Subset sum problem”. Even with no mathematics the answer of “Subset sum problem” is very short and simple:

You can not avoid exponents (xn) in a problem where combinatorics is involved. No such possibility. And that is in short the solution of “P vs. NP”. P is not equal to NP.

And you can not solve the subset sum problem without combinatorics. Theoretically if such a decision was possible, so it will mean existence of functional dependency between the numbers or between the numbers and the target number. Functional dependency and calculating by formula is the only way to find the right answer among many other wrong answers. But this will contradict the initial condition of the problem – finding a subset in a set of arbitrary integers. The problem is defined as universal – the algorithm must work on ANY set of integers. And this excludes functional dependency. As no dependency exists, combinatorics is the way. And combinatorics means exponents.

You can do whatever you want with the initial set, but never the less what you do, some subset will remain for combinatorial checkup. Therefore the only result of your effort will be a hypothetic reduction of the exponent size (xn, xn/2…xn/1000…). But in all these cases some exponential element will remain. Generally the most used approach in this area is splitting the initial set to subsets, because this way the exponential power goes down. But anyway there remains an exponential component. In addition, splitting has some limits. You can not split too much, because you will start missing some combinations containing possible solution. To avoid this you will have to combine again subsets each other and this will bring you back to a previous level of splitting. In my previous thoughts, published on my blog, I reached some conclusions on what is the theoretical limit of lowering complexity when solving “Subset Sum Problem”. It is close to the one of Horowitz and Sahni. That explains why much better solution was not found for 43 years.

So in short the “philosophical” solution of “P vs. NP” is as follows:

“Subset sum problem” is a complete NP-problem. “Subset sum problem” can not be solved without using combinatorics at all. The existence of combinatorics means unavoidable exponents. And this means that no pure polynomic solution is possible. As “Subset sum problem” is proven unsolvable in polynomic time, and as it is a NP-complete, therefore: P≠NP

V.Conclusion

No matter whether you use Math of Philosophy, the finding that Subset Sum Problem is unsolvable in polynomic time is a serious progress in computer science. Although up to now about 80% of mathematicians and computer scientists believed that P is not NP, there is still much intellectual power spent on proving the opposite. Most computer algorithms and ideas, used in developing new technologies, rely on P≠NP. This is obviously right, as proved the analysis in this article. Now as the final answer is found the scientists will be able to focus on other concepts of simplifying the computer algorithms.

Jan 23rd 2016

Sofia

------------------------

References:

Horowitz, Ellis; Sahni, Sartaj (1974), "Computing partitions with applications to the knapsack problem", Journal of the Association for Computing Machinery 21: 277–292, doi:10.1145/321812.321823, MR 0354006

Bozhilov, Dobri (2016), “Is PNP, based on analysis of “Subset sum problem”, www.komentari.com

---------------------

Initial post as of Jan 04 2016::

P≠NP, proven by an analysis of

“Subset sum problem”

1.Solution in short

To have a “wormhole” you need a worm to dig it. If no worm, no hole either…

In short this is the solution of “P vs. NP” problem. The answer is “No”. P is not equal to NP, as there is no “worm” to dig a shortcut and this way bypass the exponential algorithm tragedy. There is no shortcut and such one is impossible. The purpose of this article is to prove this.

To achieve this the concept of this article is to prove that a specific NP-complete problem cannot be solved non-exponentially. This will prove that P≠NP, because otherwise contradiction will appear. If a NP-complete problem is proved as un-solvable in polynomic time and another NP-complete is solved then the first will also be solved as it could be converted into second one. Obviously not-possible.

To prove this assertion we can use the “Subset sum problem”. If we prove that it cannot be solved non-exponentially, this will be the end of “P vs. NP” discussion.

And the answer of “Subset sum problem” is very short and simple:

You can not avoid exponents (xn) in a problem where combinatorics is involved. No such possibility. And that is in short the solution of “P vs. NP”. P is not equal to NP.

You don’t need to read any more this article unless you are keen on brain gymnastics and developing intellect by thinking on complex issues. If you are interested just in the answer, that is all – exponents are inevitable with combinatorics involved.

And you can not solve the subset sum problem without combinatorics. Theoretically if such a decision was possible, so it will mean existence of functional dependency between the numbers. Functional dependency and calculating by formula is the only way to find the right answer among many other wrong answers. But this will contradict the initial condition of the problem – finding a subset in a set of arbitrary integers. The problem is defined as universal – the algorithm must work on ANY set of integers. And this excludes functional dependency. As no dependency exists, combinatorics is the way. And combinatorics means exponents.

You can do whatever you want with the initial set (below are some thoughts on this), but never the less what you do, some subset will remain for combinatorial checkup. Therefore the only result of your effort will be a hypothetic reduction of the exponent size (xn, xn/2…xn/1000…). But in all these cases some exponential element will remain. Generally the most used approach in this area is splitting the initial set to subsets, because this way the exponential power goes down. But anyway there remains an exponential component. In addition, splitting has some limits. You can not split too much, because you will start missing some combinations containing possible solution. To avoid this you will have to combine again subsets each other and this will bring you back to a previous level of splitting. But this is in the next part of analysis. The real solution of “P vs. NP” is as follows:

“Subset sum problem” is a complete NP-problem. “Subset sum problem” can not be solved without using combinatorics at all. The existence of combinatorics means unavoidable exponents. And this means that no pure polynomic solution is possible. As “Subset sum problem” is proven unsolvable in polynomic time, and as it is a NP-complete, therefore:

P≠NP

2.Some thoughts on what is theoretically lowest needed complexity for “Subset sum problem”

To have a “wormhole” you need a worm to dig it. If no worm, no hole either…

So let’s start searching for “worms”. I.e. for something that could help us reduce this horrifying check-all-combinations calculation burden. What could help us avoid this? Is there a “worm”, a “privilege”, any concept that could help us?

Unfortunately No.

All we have is a set of numbers. And nothing more. No number is marked as preferable. No any arithmetic operation on numbers can help us avoid some of the calculations. There is nothing that can help us. No “worm” to dig. No “filtering privilege” to send in trash these unneeded and exhausting combinations. No such savior among numbers.

Anyway we have something that can help. But not enough. We have 3 important things that can be used for filtering:

-We have the initial set itself and can somehow manipulate it. Generally we can split it

-We have a target number that the subsets must equal - for simplification we could accept this as zero(0).

-We have the value of the current calculated combination

That is all we have. In fact these elements can be used for some filtering. But not enoung to make any potential algorithm non-exponential. In fact in one way or another exactly these 3 elements are used in all algorithms for solving the problem, including the most effective ones. These elements can be used in different ways – directly or indirectly, by summing, sorting, merging, etc. But all ways descend visibly or not, from these 3 – manipulating the set, target number and value of current calculated combination.

So if based only on these 3 factors (worms, privilege filters) we can not have a non-exponential solution, therefore we can not have such a solution at all.

Let’s start with the set, because this is used together with the rest 2 factors.

Splitting the set is a powerful tool of lowering the exponential power. Working with smaller subsets you can reduce the number of combinations to be checked. Using it along with other filters the overall job done will go down. Anyway splitting does not remove the exponential nightmare for the computer. It just makes it smaller.

In addition there are limits of splitting. If you split only in 2, then you will lower the exponent power to the level of the bigger subset. If 2 subsets are equal, you will have exponential power of n/2. So why not then spilt in 4,5…1000? Why not split until we reach a subset of 1 number and solve the problem instantly? The answer is obvious. If you split on more than 2 subsets, you risk starting to miss some of the combinations needed for check. To avoid this you will have to combine again subsets each other. And this will bring you back to a previous level of splitting. At last you will be back to the initial split in 2 subsets. In this case with any subset having just one another subset as a possible “partner” there is no risk of missing combinations. So the theoretical limit of filtering power of splitting is n/2.

Now let’s go to the target number. It is the weaker filter. The main thing it can be used for is to conveniently divide the initial set into 2 subsets (for instance positive and negative after sorting). So it is an interesting “split point” that can be useful in some cases. In fact you can divide it at any number, but if it is not the target number you will lose some of the filtering power of this number. In fact some of algorithms are doing exactly this, but just because the “profit” from the other split is bigger than the loss from this. Anyway this is not so important issue, now we are on the matter of target value.

As we said above dividing the set to subsets is very useful because this way we can lower the complexity to calculate all possible combinations in each of the 2 subsets. For instance, if 2 subsets are equal, then we will have complexity lowered from O(2n) to (2*2n/2). If they are not equal we will have different reductions in each of 2 subsets.

Why the target number is interesting? The answer is because it splits a hypothetically sorted set of numbers to 2 subsets that does not need to calculate the combinations of numbers that are only from one of the sides of the target number. I.e. if the split is on positive and negative numbers and the target is 0, then you don’t need to check all combinations containing only positive or only negative numbers. You will save much of the time of your processor.

Anyway the target number as filtering factor is not so important. Sometimes it can be “more profitable” to split at any other number.

Once we have 2 subsets, theoretically we can go back to initial version by making each-with-each combination and then the complexity will go back to O(2n).

And here it comes the second filter – the value of the current combination. Using it we can prevent part of exactly this “going back”. In fact, we will really go back as we will have to check many combinations that contain members of both 2 subsets. But using the “current value” we can avoid part of the job.

So what is the “filtering power” of current value?

Generally we can benefit in 2 ways – by filtering final calculated values of other combinations, or by filtering the initial numbers that start creating combinations and also already calculated combinations from which next level of combinations descend.

These are 2 different approaches that are used in different algorithms. In this article I will not talk about any algorithm, nor focus on inventing algorithms. I am only analyzing the limits of filters that are one or another way implemented in these algorithms.

In filtering against final calculated values of other combinations, using the current value of current combination can be very powerful. If the subsets are sorted the current value can filter almost all useless combinations. Imagine a set of 1000 calculated positive values that are sorted. You have a negative current value. So you can theoretically filter (by some type of genius algorithm) all, or almost all, the values that are not equal to your current value. So the theoretical filtering power of current value is close to 100% when used against other calculated values that are sorted.

Anyway in this case the computer power will have to go to calculating other values and sorting them. In the case of a set divided in 2 subsets, we will have O(2n/2) for creating each subset plus some logarithmic cost of sorting. So if we want to use this extreme filtering power we will have to “invest” in initial calculations for creating subsets. If they are equal the complexity will be O(2n/2). But if we have unequal as size subsets then we risk increasing the complexity to the one of the bigger of the 2 subsets (for instance O(2n/4)- small and O(23n/4)-big).

So we come to the second approach. We can use the filtering power of current calculated value to filter combinations before they are calculated. For instance with current calculated value of -5 and a target of 0, we can filter all combinations that are based on positive numbers larger than 6. This means all 1-number possibilities and all 2,3,4…n combinations that descend from 6 and upper numbers. In addition we can on next step filter all 2-number combinations that exceed value of 5. For instance we will check the number 2, but will filter the combination “2+4” and all consecutive combinations. And so on.

All this can be done if we are using some type of tree-building algorithm with any new combination coming from the previous one. Combined with initially sorted set (or subset), this filtering can be very impressive as a size. And we don’t forget that in this case we have not still invested computer power in calculating all-possible-combinations.

But what is the limit of this filtering before calculating? In fact the calculation is not complex. In this case the dependency is something like “linear”. Not exactly linear because the function is based on discrete numbers and is not uninterrupted. But generally the calculations are the same. The number of calculations that can be filtered and avoided of calculating is depending on the absolute size of the current calculated value. If it is very low and has a low absolute value of the proportion of this value and maximal theoretic value for the subset, then we will be able to filter almost all of the combinations. Imagine current value of -2 and 1000 positive numbers in subset of positives with 999 numbers higher than 2. You will be able to filter 99,9% of all these combinations.

But look at the other side. Imagine current calculated value of -10000 and 1000 positive numbers with 999 of them lower than 10000. You will avoid just 0,01% of combinations. Of course you will have the ability to filter among 2,3…n number combinations. But theoretically if you have a current value that is close to the maximal possible calculated value (as absolute value, it can be negative) then you will be able to filter almost nothing.

So this dependency is something like linear. There are math rules here that lead us to filtering about half of the possible combinations. Тhe function is something like:

z=(y(x1)+y(x2)+…+y(xn))/n

y(x)=(x/constant1)*constant2.

Where

z – final number of calculated combinations

y - number of checked combinations for a given value of x

x – current calculated value

constant1 – theoretical max calculated value

constant2 – average density of combinations in set

The sum of all members(y) is something like half of the all possible combinations with linear-type function like this.

So is that good and enough?

If you don’t have any initial divide this economy is not so impressive. You will go to O(2n/2).  If we try a mixed approach we can calculate all and sort the smaller subset, and then try filtering with the bigger subset. In our previous cate - O(2n/4)- small and O(23n/4)-big). So the optimization will be to a level of O(23n/4/2). Obviously much worse than O(2n/2) with 2 equal as size subsets.

And that is the end. That is all. All possible possibilities are checked. The power to filter the combinations is limited to O(2n/2). And no algorithm based on these three “worms” can do better. In fact this is the answer of the question why after 4 decades of searching it is not yet discovered an algorithm better than the one of Horowitz and Sahni*. In fact this algorithm is very close to the theoretically possible lowest level complexity O(2n/2) and any other better algorithm will not bring enormous improvement.

So finally – if the “Subset sum problem” is proven unsolvable in non-exponential time, and as it is a NP-complete problem, therefore:

P≠NP

Jan-04-2016                                                                                   Author:  Dobri Bozhilov

------------------

References:

Horowitz, Ellis; Sahni, Sartaj (1974), "Computing partitions with applications to the knapsack problem", Journal of the Association for Computing Machinery 21: 277–292, doi:10.1145/321812.321823, MR 0354006