Segment Tree Beats - An introduction

Segment Tree Beats 

Disclaimer : This post is based on a recent post made by jiry_2 on codeforces.


Before I begin, a clear understanding of segment tree and lazy propagation is required. 
If you don't know those, I would recommend you to go search it up online and read it thoroughly, before coming back. With that out of the way, here goes...

Hey wait wait wait. Hold on. Just to clarify, now that you have read it, in case you are confused with what I call terminal nodes in a segment tree update/query, it's basically the node that is completely inside the update range, the one where we generally end our search in a segment tree. I say generally.. well.. you will see :). Anyway without further ado... 

A simple problem

So the first problem I want to address, and it is fairly easy, is supporting two operations on a range:
  • Range minimize (for each i in range, make it min(i, update_value) ) 
  • point query

Solution :

This is just a variant of the normal lazy propagation - the lazy values will just contain the min value, and when making a pushdown(), instead of adding the value to the lazy value of the children, u instead make lazy_child = min(lazy_child, lazy_myVal).

To get the point query, just go down to the leaf and retrieve the value.

The Actual Problem


The actual problem at hand is supporting the following operation on a range:
  • Range minimize(defined in the same way)
  • Range sum

Why the above method fails ....

Here, in this problem , we need to maintain the sum at each node. Now see that, a node basically represents a range in the original array. Thus it may have several different values. In the lazy propagation world, what we want to do is calculate the new value of a node directly in constant time without visiting its children, if we see that it's range lies completely within the update range. 

See the problem ?! Yes. You can't do this update in case of minimization in constant time.
The problem is the following: this node represents several different values. After a range min update, the values that were below this wouldn't get affected, while those above it will be decreased. What more, each of the values will be decreased by a different amount. 

To give an example, say a terminal node represented the values {1,2,3,4,5,6,7} (these are not indices.. the actual values. I am taking them like this just to make it simpler). Now suppose the min update value was 5. So the end result would make the set of numbers like so : {1,2,3,4,5,5,5}.

However notice that, 1-5 were not decreased at all, 6 was decreased by 1, and 7 was decreased by 2.
( Note that even though in this case the numbers are *coincidentally* in an AP, its just for example. normally it would be messed up, and not even sorted)

The main idea of Segment Tree Beats

So before I dive into the actual solution and this variation of the segment tree structure, let me set the stage for it first.

The main idea is weirdly interesting. We define a MAX value and a SECOND_MAX value for each node, apart from its sum. MAX is the maximum value present in the range represented by the node, while SECOND_MAX is the ***STRICT*** second maximum value in the range.

The main idea is this: wouldn't it be great if we knew that the update min value was always between the MAX and the SECOND_MAX, or above max altogether ? Why you ask? Well consider this. If it was above MAX altogether, no value would be affected anyway. If it was less than MAX but more than the SECOND_MAX, then we would ***ONLY*** have to update the MAX and set it to update_val. See the advantage? Well, the original method failed because we were updating different values by different amounts. Here however, we update only the MAX value, ie, only one kind of value is being updated. Thus we know exactly the amount of change that it would be causing to the overall sum of the node: If there were CNT values in the range that were equal to MAX, then the overall change would be that we would have to subtract CNT*(MAX-updateValue) from the sum.

If the above part wasn't clear, let me give you an example. Suppose the range given by the terminal node was {1,3,5,6,6,7,9,9}(again, just for clarity, I present the data in a sorted fashion).
Now it is clear that MAX is 9 and SECOND_MAX is 7.

If we hold to our assumption that SECONDMAX<= queryval <= MAX, then 8 is a valid minimization query value. 
Here, it is clear that both the 9's will be decreased by 1. Point is, we know exactly which value will be decreased, as there can be only 1 such unique value. Moreover we keep a count of the number of this value present in the range represented by this terminal node, so that we can account for the change in all the same MAX values( in this case there are 2 9's).

But yes. That is not possible in a normal segment tree. Well I will change that opinion in a moment.
What are terminal nodes? Its when the range represented by the node is completely within the update range. This is where we set the lazy tag. So let's just call it a TAG_CONDITION. However, think about this: what if we added extra clauses ?? This will mean that the node will have to satisfy additional conditions if it were to be a terminal node, not just the normal within-the-update-range condititon. Seeing where we are going with this? Exactly. add the condition "SECOND_MAX<= queryVal <= MAX" as an additional clause to the TAG_CONDITION. This means that for a node to be demarked as a terminal node, not only will the range have to be totally inside, but also this above condition has to be satisfied. ( NOTE : its a AND clause, not a OR clause. Thus the TAG_CONDITION is now much stronger).

Thus this is exactly what we wanted to enforce in the first place. Now the terminal nodes are exactly how we wanted them to be: we can update the sum of those terminal nodes in constant time as I had demonstrated. 

This is the core idea of segment tree beats. 


.... Hold on!. Won't this stronger TAG_CONDITION result in visiting more nodes than usual? We will be visiting the children of the normal terminating nodes in this process too! And by looks of it, LOTS of them. How is this even upholding the nice complexities of a segment tree?

Well, it turns out that this specific problem is indeed nlogn using this aforementioned technique. Make the problem a bit harder and we get a log²n general proof.

However I will leave the proof for the next post.


[Credits: The original post in codeforces is obviously the main source of this blog. This is just my understanding and an attempt to simplify the logic, because the codeforces post was quite complicated to understand in one go, in which I hope this would help. Also I had a lot of insightful discussions with my friend Shashwat Goel which helped me better understand the concepts]












Comments

Post a Comment

Popular posts from this blog

Small-to-Large

Centroid decomposition