Surely you could implement this via a sorting algorithm? If you can prove the distance function is a metric and both lists contains elements from the same space under that metric, isn’t the answer to sort both?
unless the problem space includes all possible functions f , function f must itself have a complexity of at least n to use every number from both lists , else we can ignore some elements of either of the lists , therby lowering the complexity below O(n!²)
if the problem space does include all possible functions f , I feel like it will still be faster complexity wise to find what elements the function is dependant on than to assume it depends on every element , therefore either the problem cannot be solved in O(n!²) or it can be solved quicker
By “certain distance function”, I mean a specific function that forces the problem to be O(n!^2).
But fear not! I have an idea of such function.
So the idea of such function is the hamming distance of a hash (like sha256). The hash is computed iterably by h[n] = hash(concat(h[n - 1], l[n])).
This ensures:
We can save partial results of the hashing, so we don’t need to iterate through the entire lists for each permutation. Otherwise we would get another factor n in time complexity.
The cost of computing the hamming distance is O(1).
Order matters. We can’t cheat and come up with a way to exclude some permutations.
No idea of the practical use of such algorithm. Probably completely useless.
honestly I was very suspicious that you could get away with only calling the hash function once per permutation , but I couldn’t think how to prove one way or another.
so I implemented it, first in python for prototyping then in c++ for longer runs… well only half of it, ie iterating over permutations and computing the hash, but not doing anything with it. unfortunately my implementation is O(n²) anyway, unsure if there is a way to optimize it, whatever. code
as of writing I have results for lists of n ∈ 1 … 13 (13 took 18 minutes, 12 took about 1 minute, cant be bothered to run it for longer) and the number of hashes does not follow n! as your reasoning suggests, but closer to n! ⋅ n.
anyway with your proposed function it doesn’t seem to be possible to achieve O(n!²) complexity
also dont be so negative about your own creation. you could write an entire paper about this problem imho and have a problem with your name on it. though I would rather not have to title a paper “complexity of the magic lobster party problem” so yeah
Good effort of actually implementing it. I was pretty confident my solution is correct, but I’m not as confident anymore. I will think about it for a bit more.
you forgot about updating the hashes of items after items which were modified , so while it could be slightly faster than O((n!×n)²) , not by much as my data shows .
in other words , every time you update the first hash you also need to update all the hashes after it , etcetera
so the complexity is O(n×n + n×(n-1)×(n-1)+…+n!×1) , though I dont know how to simplify that
I’m taking into account that when I update a hash, all the hashes to the right of it should also be updated.
Number of hashes is about 2.71828 x n! as predicted. The time seems to be proportional to n! as well (n = 12 is about 12 times slower than n = 11, which in turn is about 11 times slower than n = 10).
Interestingly this program turned out to be a fun and inefficient way of calculating the digits of e.
if (recalc || numbers[i] != (hashstate[i] & 0xffffffff)) {
hashstate[i] = hasher.hash(((uint64_t)p << 32) | numbers[i]);
}
Since I decided to pack the hashes and previous number values into a single array and then forgot to actually properly format the values, the hash counts generated by my code were nonsense. Not sure why I did that honestly.
Also, my data analysis was trash, since even with the correct data, which as you noted is in a lineal correlation with n!, my reasoning suggests that its growing faster than it is.
Here is a plot of the incorrect ratios compared to the correct ones, which is the proper analysis and also clearly shows something is wrong.
Anyway, and this is totally unrelated to me losing an internet argument and not coping well with that, I optimized my solution a lot and turns out its actually faster to only preform the check you are doing once or twice and narrow it down from there. The checks I’m doing are for the last two elements and the midpoint (though I tried moving that about with seemingly no effect ???) with the end check going to a branch without a loop. I’m not exactly sure why, despite the hour or two I spent profiling, though my guess is that it has something to do with caching?
Also FYI I compared performance with -O3 and after modifying your implementation to use sdbm and to actually use the previous hash instead of the previous value (plus misc changes, see patch).
You have two lists of size n. You want to find the permutations of these two lists that minimizes a certain distance function between them.
Surely you could implement this via a sorting algorithm? If you can prove the distance function is a metric and both lists contains elements from the same space under that metric, isn’t the answer to sort both?
It’s essentially the traveling salesman problem
unless the problem space includes all possible functions f , function f must itself have a complexity of at least n to use every number from both lists , else we can ignore some elements of either of the lists , therby lowering the complexity below O(n!²)
if the problem space does include all possible functions f , I feel like it will still be faster complexity wise to find what elements the function is dependant on than to assume it depends on every element , therefore either the problem cannot be solved in O(n!²) or it can be solved quicker
By “certain distance function”, I mean a specific function that forces the problem to be O(n!^2).
But fear not! I have an idea of such function.
So the idea of such function is the hamming distance of a hash (like sha256). The hash is computed iterably by
h[n] = hash(concat(h[n - 1], l[n]))
.This ensures:
No idea of the practical use of such algorithm. Probably completely useless.
honestly I was very suspicious that you could get away with only calling the hash function once per permutation , but I couldn’t think how to prove one way or another.
so I implemented it, first in python for prototyping then in c++ for longer runs… well only half of it, ie iterating over permutations and computing the hash, but not doing anything with it. unfortunately my implementation is O(n²) anyway, unsure if there is a way to optimize it, whatever. code
as of writing I have results for lists of n ∈ 1 … 13 (13 took 18 minutes, 12 took about 1 minute, cant be bothered to run it for longer) and the number of hashes does not follow n! as your reasoning suggests, but closer to n! ⋅ n.
link for the desmos page
anyway with your proposed function it doesn’t seem to be possible to achieve O(n!²) complexity
also dont be so negative about your own creation. you could write an entire paper about this problem imho and have a problem with your name on it. though I would rather not have to title a paper “complexity of the magic lobster party problem” so yeah
Good effort of actually implementing it. I was pretty confident my solution is correct, but I’m not as confident anymore. I will think about it for a bit more.
So in your code you do the following for each permutation:
`for (int i = 0; i
you forgot about updating the hashes of items after items which were modified , so while it could be slightly faster than O((n!×n)²) , not by much as my data shows .
in other words , every time you update the first hash you also need to update all the hashes after it , etcetera
so the complexity is O(n×n + n×(n-1)×(n-1)+…+n!×1) , though I dont know how to simplify that
My implementation: https://pastebin.com/3PskMZqz
Results at bottom of file.
I’m taking into account that when I update a hash, all the hashes to the right of it should also be updated.
Number of hashes is about 2.71828 x n! as predicted. The time seems to be proportional to n! as well (n = 12 is about 12 times slower than n = 11, which in turn is about 11 times slower than n = 10).
Interestingly this program turned out to be a fun and inefficient way of calculating the digits of e.
Agh I made a mistake in my code:
Since I decided to pack the hashes and previous number values into a single array and then forgot to actually properly format the values, the hash counts generated by my code were nonsense. Not sure why I did that honestly.
Also, my data analysis was trash, since even with the correct data, which as you noted is in a lineal correlation with n!, my reasoning suggests that its growing faster than it is.
Here is a plot of the incorrect ratios compared to the correct ones, which is the proper analysis and also clearly shows something is wrong.
Anyway, and this is totally unrelated to me losing an internet argument and not coping well with that, I optimized my solution a lot and turns out its actually faster to only preform the check you are doing once or twice and narrow it down from there. The checks I’m doing are for the last two elements and the midpoint (though I tried moving that about with seemingly no effect ???) with the end check going to a branch without a loop. I’m not exactly sure why, despite the hour or two I spent profiling, though my guess is that it has something to do with caching?
Also FYI I compared performance with
-O3
and after modifying your implementation to use sdbm and to actually use the previous hash instead of the previous value (plus misc changes, see patch).It has been a pleasure having this internet argument with you. I learned a bit, and you learned a bit. It’s a win win :)
actually all of my effort was wasted since calculating the hamming distance between two lists of n hashes has a complexity of O(n) not O(1) agh
I realized this right after walking away from my devices from this to eat something :(
edit : you can calculate the hamming distance one element at a time just after rehashing that element so nevermind
Scalabe is not always quicker. Quicker is not always scalable.