Posts

Showing posts from 2018

An Introduction to MongoDB

What is MongoDb? MongoDb is a opensource database, that stores data in a document-oriented data model. It is what an example of what is commonly known as a NoSQL database. But let's get a bit thorough on this.  Normally, we are usually introduced to Relational Databases, like MS Access. Those are primarily SQL based languages. They have tables in which data is stored. Each table has multiple rows, and each row has multiple columns or fields. The data stored follows a well defined schema. Moreover, relations can be set-up between various tables, so that redundant data is not stored. One of the main advantages is that we can thus change the data in only one location and the effects are observed everywhere. Also, due to the presence of a schema, we know beforehand what all fields exist in the tables. But enough about Relational/SQL databases.  MongoDB, however is very different from the setup. Instead of tables, it has what are called as "Collections", though o

Persistance - A Problem Point Of View.

What This Post Is About: Disclaimer : I will assume the reader is familiar with basic path-copying persistance. Here I will mostly elaborate some usecases of persistance, as these are ussually not given in a blog that teaches persistance. Note that these are mostly Competitive-Programming specific usecases. Getting Started : Persistance is mostly used when there are multiple query parameters. For example, if in a segment tree (as the most common usecase of persistance is in segment trees) if we have a query like (L,R,X) where X is some parameter, we would usually store a lot of information in each node and then binary search some information using the value X, in all the nodes that we would reach that would satisfy the (L,R) bound while traversing the segment tree, like we do in a merge sort tree. Persistance serves as an easy replacement for these types of problems, where it can reduce a O(logn) factor from our time complexity, at the cost of adding a O(logn) facto

Moving Up a Dimension - Part 2

Overlapping Ranges First, suppose we have the following problem: We are given some ranges of the form [L,R] For each query we will be given a range [QL,QR] and we will have to report the count of all the ranges that lie completely within this range. In an update, we can add/delete a new range Main Idea:  The main idea that is not easy to come up with, and could  seem as being an overkill for this but is useful for a variety of problems, of which this example is just a simple case. We are given ranges in a 1D numberline. The crux is that we want to convert these ranges from a 1D line segment to a 2D point. How? Well, for each range that we are given [L,R] just define a point as (x=L,y=R). Thus the updates are basically just adding or deleting points. And the queries? Well, define the query as a point P(Ql,Qr). Now we just have to count the number of points whose X-coordinate is more that that of P and whose Y-coordinate is less than that of P. Thus forming a query-rec

Moving Up a Dimension - Part 1

Merge-Sort Tree I'll start with this because this is quite useful, but does not have much material on it in the internet. The construct is simple enough. First, build a normal segment tree Wait! don't finish building it :P . Instead of store a single value in a node, we store a vector of values. When we merge two nodes, we do a 2-ptr on the vectors to build our resultant vector so that the sorted order is preserved. During queries, we just go down to our terminal nodes and usually do a binary search on the values. Proof of space and time complexity of building is very similar to merge sort. Query time is O(n log² n) (since we normally do an extra binsearch in each of the terminal nodes, which there are O(logn) overall) Just to give an example, suppose we are given an array of size N and there are Q queries each of which gives a L,R and K and asks how many values in range L to R are less than K. For this, first create a merge-sort tree where each node'

Small-to-Large

The Vanilla Method Let's take the given problem: We have some sets as U = {S1,S2,S3....Sn} where U is the "container" set. Initially each set contains a single element. After each operation each set will contain some numbers. We define the cost of a set as the number of unique elements in the set. For each query, we merge two unmerged set and then report the sum of cost of all the sets.  The most brute force approach is for each query, add all numbers in one of the sets and put them in the other set. This will take approximately O(nlogn) time per merge operation and since there can be about O(N) mergings (because after atmost N mergings there will be only 1 set left) this will take time proportional to O(n^2logn) time. For the report, part, we just return the sum of the sizes of the set. This part is O(1) as we only have to check the size of the new merged set as the sizes of the other sets haven't changed. This is obviously a poor algorithm for a seem

Intro To Recurrences (And DP)

Image
What Are Recurrences? Let's start with a simple problem:  There are N students standing in a line. The first student gets 1 chocolate. The second student gets 2 chocolates. The third student however demands that he gets twice as many chocolates as the first student plus the number of chocolate that the second student got. Similarly, the fourth student said that he wanted twice the amount of chocolate that the second student got plus the amount the third student got. Thus, each student wants the amount that the previous student got plus twice the amount the student before that had got. We need to find the total number of chocolate that we are going to need to give to all the students.  The obvious solution is to first give student1 1 chocolate, student2 2 chocolates, student3  2*1+2=4 chocolates, student4 = 2*2+4=8 chocolates and so on and then take the sum of all the chocolates that the students had got. Well, the key idea here was that if we know how many chocolates th

Spanning Trees

Image
Spanning Trees in a Graph We are given a graph with say N vertices and M edges. We call a subset of these edges a spanning tree  of the graph so that if we keep only these edges, the graph still remains connected and so that the residual graph is a tree. This can also be thought of as specifying discreetly that we have to select exactly N-1 edges out of the total M that we are given. Some points that should be specified here is that there can be a lot of different number of spanning trees of a graph. Also, ALL the nodes need to be connected by the set of edges that we had selected in our spanning tree. Moreover, it doesn't really depend on whether we have weighted edges or unweighted edges, though we will shortly be looking at different types of spanning trees, some of which do take advantage of edge weights.  Finding Spanning Trees To find any random spanning tree of a graph a simple DFS will obviously suffice. Assuming the graph is connected, the

DAGs and SCC

Centroid decomposition

DISCLAIMER: I will not  be going over centroid tree. Just the decomposition technique as its far more useful than the actual centroid tree. Usefulness Before I describe the technique, it's better if i go over the usefulness of this technique first. Suppose we have been given a problem where we have to find the sum of F that is commutative that we have to compute over all possible paths in out tree. Oh and yes.. THIS TECHNIQUE IS ONLY FOR TREES . For now, let F be sum of edgeweights. Note that this could be done using other techniques too, and even faster than centroid decomposition, but this will serve as a nice example for the basics. We will approach harder problems later. Main Idea So the idea here is, it would be very nice if we could root the tree at some point, and then calculate this function for all paths passing through the root. This would have been done easily with a DFS. The problem is, we will then need to root the tree at various points, and calculate

Trees - Euler Tours

Image
Euler Tour        We mostly divide our problems into mostly two types: Array problems and Tree/Graph problems.  But what if we can convert some specific type of tree problems into actually array problems?  Problem: We are given a Tree. We have to perform two types of operations:   Query: Find the sum of weights of all nodes in the subtree of the given query node. Update: Set the weight of a given node to X. Obvious Solution? For each update just make the corresponding weight of the given node to X. For Queries, traverse the whole subtree and find the sum. This has a worst case of O(N*Q). However we will come up with a O(nlogn) solution to this same problem by the end of this post. Start Time/End Time When we do a DFS, suppose we keep a global variable called time.Whenever we visit a new node, we are doing something new and thus we "increase" the time. This can be thought of as a time variable that we are manually keeping track of. NOT the s

Trees

Image
Trees Trees are a special form of undirected graphs. It is basically a graph that is only 1 connected component and that has no cycles in it. We generally draw a tree in a "reverse-tree" way --- starting from a single node and then branching out as it goes lower and lower. The main properties of a tree? As mentioned above, it has no cycles. and each node has exactly 1,what we call , parent, and multiple children. Here, for example, 3 has parent 1 and children 6 and 7. 4 has 2 as its parent and 8 and 9 as its children. A node can have any number of children, but exactly 1 parent. EXCEPT the "root" node. Here for example, node 1 is the root node. The root node is very important. It is used in several  calculations that are performed in a tree.  It is also the starting point for a BFS or a DFS in a tree. However, note how BFS and DFS are so much simpler in a tree rather than in a general graph. Here, when we do a DFS, we only go downwards and never

Graphs For Beginners - Part-3

Image
Types Of Graphs We generally encounter the following types of graph in a problem. Each of them has some special properties which are often useful. Trees - This is by FAR the most common thing we shall encounter. Some don't even consider problems based on it as graph problems, still I will add this here for sake of completeness. Directed Acyclic Graphs (DAGs) - This is also a commonly recurring type of graph with lots of nice properties. For example, since cycles (acyclic .. duhh) do not exist, we are able to easily calculate DP on it. Later, we will show how ALL DP problems are just DAGs.  Planar graphs - These are quite rare but do appear in problems nonetheless. Special properties includes the application of euler's theorem and consideration of its dual graph. The vanilla version - This is our plain old graph, either directed or undirected. There are no special properties for this graph. Implicit graphs - sometimes the problem statement may not directly look like