Persistance - A Problem Point Of View.
What This Post Is About:
Disclaimer : I will assume the reader is familiar with basic path-copying persistance.
Here I will mostly elaborate some usecases of persistance, as these are ussually not given in a blog that teaches persistance. Note that these are mostly Competitive-Programming specific usecases.
Getting Started :
Persistance is mostly used when there are multiple query parameters. For example, if in a segment tree (as the most common usecase of persistance is in segment trees) if we have a query like (L,R,X) where X is some parameter, we would usually store a lot of information in each node and then binary search some information using the value X, in all the nodes that we would reach that would satisfy the (L,R) bound while traversing the segment tree, like we do in a merge sort tree. Persistance serves as an easy replacement for these types of problems, where it can reduce a O(logn) factor from our time complexity, at the cost of adding a O(logn) factor in our space complexity.
A Predominant Usecase:
PROBLEM:Given an array of numbers A[1...N] and Q queries of the form [L,R,X] where we have to report the number of numbers in range L to R that are less than X. Offline is allowed. All numbers are given to be distinct and less than 1e6. N,Q < 2e5.
This is perhaps the most basic example of the use case of persistance. When I first came across this problem, I used a mergesort tree, and sure enough got TLE. (This was a subproblem in another problem, so there was some additional time complexity factors, which made the extra logn of the merge sort tree problematic. Note that often these will come as subproblems of some larger problem, where this will just be used as a tool). Afterwards, a senior gave me a hint : use persistance! By that time, I had just learnt persistant data structures, and knew how it copied some small amount of information when a certain value was modified, so that all previous versions were intact. However, in the above problem , we are making NO update operations! And I was very confused as to how persistance might be useful in this scenario.
HINT1: offline is allowed.
HINT2: process the elements in sorted order
HINT3: version numbers can be made during construction and not just query.
SOLUTION: Create an array B[1...N] where initially B[i] = 0 for all i -- we will build our segment tree on this array. Now, we process the elements in sorted order. Now, whenever we insert an element A[k], we update B[k] to 1. Note that here K is the initial location of the element, and not the sorted location. Now, everytime we insert a new number, we increase the version. So essentially, its these update operations while constructing the segment tree that is modifying it. If we look at what we did, we basically made each different number a different version. How does this help? well, suppose we want to have an array which only contained numbers less than X. But that is exactly the version X of the segment tree. This is because till then we had only entered number less than X, into the segment tree, and the places which had greater numbers still have a 0 in them, meaning they are not filled. Regarding the actual query, we just go to the Xth version, and get the sum in the range [L,R] in that version. As all processed numbers have a 1 in that place, we get exactly how many numbers are there in that range.
If the condition that all numbers are distinct was removed, we only have a slight change. Instead of creating a new version for each value, we process the numbers in batch (each batch containing all numbers of same value) and update the version only for these batches. Key is to update the version for distinct numbers.
Extending:
Thus, in the last example, we just processed a 3dimensional query in time proportional to that of standard L,R queries. However a key observation is that if we have several datapoints per element in a list/array, we can only make those datapoints as the version number which are not dependent in online processing. Like in the last problem, if there were update queries involed as well, we could not use this method as changing an array element would mean changing all the versions after the version in which that number was initially processed in. However, such a cumulative udpate is not possible over the versions in a segment tree, which form the basis for the study of retroactive data structures.
The last problem is basically a tool for solving subproblems directly. Imagine it as a black-box technique that can solve a variety of similar problems, as long as we can choose one of the datapoints as a valid version number, so that such cascading updates over different versions are not formed.
Sometimes, instead of making an element of the array as a version number, we make the array index itself as the version number. case in point : MKTHNUM. This is however a widely discussed problem : i wont be going over its solution here.
The key intuition is thus that when we see a segment tree type problem, with some additional informations, we can usually build a persistant segment tree using the additional data.
Let's see another very common blackbox function : finding range intersection.
PROBLEM: We are given N ranges (X1,X2) and Q query ranges (L,R). we have to find number of ranges that fully cover it.
SOLUTION: First, we consider points (x1,x2) in the 2D plane. Now, construct a segment tree on the X axis, so that each node contains all y coordinates of points that fall in the range of the node. Previously we could have sorted vectors in these nodes for easy binary searching during the queries, which is basically a merge sort tree. However, instead we can make a persistant segtree where the version number is the X coordinate, and the segment tree is basically a frequency table for the y coordinate. The rest I will leave out to be figured out.
Just kidding: left as an excercise to reader is sometimes frustrating so I will outline the queries too.
For the query part, we just need to find the sum in the segtree for range L to R, for the versions L-1 and R. Then subtract the result to get our answer. A hint as to why we subtract is that when we update the segtree, we still store the previous sums and add to it, like in a cumulative sum. This is why we subtract to get the answer pertaining only to the current range.
For more on these types of problems, read my other blog -- "Moving up a dimension 1 and 2".
Some Tree Applications:
The previous usecases were somewhat out of the ordinary for someone who just learnt persistance. However a much more vanilla usecase is when we make operations on a tree.
- The first kind of operation is adding a node to any other node by an edge in a tree. This is very straightforward as we just make the update like we have learnt, and the version number is just he update number.
- A much better application is when we have to answer online queries regarding a tree, which would have been solved faster if it was offline. Suppose if we know the solution to the problem if it had been an array instead of a tree. And that for this problem, we would have to maintain a segment tree while doing a dfs. Ofcourse, if it were offline, we would do all the additions when visiting a child and then delete those information when going back from it, and do the calculations then and there. But now, we can only precompute -- the actual queries need to be answered online. Here, what we can do is that we can store the state of the segment tree after the preprocessing! What I mean is that when we go down the child, we again add the information in the segment tree. But we do this after creating a new version for our current node. Thus, the state for the previous node is saved under a different version, and my current version is also saved for this node where I made the modifications. Thus, we store all the precomputation for the nodes, and when we get the queries, we just access the corresponding versions, and the segment tree for that version is just as if we had done everything offline and maintained the segment tree through our DFS. I once made a problem, as an implementation overkill, where we had to maintain a mergesort tree during the DFS, and just for fun I made it online, so that you had to make the whole mergesort tree persistant XD. Note that sometimes it may be necessary to maintain a persistant segtree even while offline processing. This can be because the version number in the persistant segtree is something different due to the problem, and not the nodeID. Or because the operation that we perform while going to our child might not be invertible(reversed).
Persistant DSU
Suppose you need to make a DSU persistant. This is usually not required as rollbacks are more efficient for maintaining DSUs during suppose a tree DFS. However, if we also need to make updates to pervious versions, we obviously need to make it persistant. Now the way to make a DSU persistant is by making its underlying parent array persistant. The easiest way to make an array persistant is by imagining it as a segment tree. However note that in a persistant DSU, path copying is a bad choice as in path copying, the main motive was that access time became amortized with more and more queries. However in a persistant model, we can always go back to a bad history version and make queries, which would prevent amortization. Thus we need to apply rank compression, which is basically a small to large merge. As a general rule of thumb, amortized data structures are hard to make persistant because of this reason.
These are some of the usecases I could think of, that might not be obvious to someone who just learnt persistance, as some of these utilize the version concept much more differently than the standard interpretation. I will keep adding more.
Thanks for all your Valuable Comments...This Technology Topic is Very Interesting...I love to Learn more About this Topic...Keep Updating
ReplyDeleteJava training in chennai | Java training in annanagar | Java training in omr | Java training in porur | Java training in tambaram | Java training in velachery
Thank You Sir It is really! helping.
ReplyDeleteAw, this was a decent post. Taking the time and genuine exertion to create a brilliant article… yet what would i be able to say… I tarry a ton and never figure out how to complete anything. best interiors
ReplyDeleteAs you can guess from the name, EaseUS Data Recovery keygen is pirated software created to bypass the program's security system. Easeus Data Recovery Wizard Crack
ReplyDelete