Learning Notes on Hashing

I wouldn’t think of summarizing problems on hashing if I didn’t include “hashing” in the tag for my training plan and see those nice problems recently… While hashing can be used in string checking (so-called “polynomial hashing”), it can be extended to do things like set identity checking.

Problems like this is one of the direct application: We can give each number a random number. Then if two set has the same xor sum, the content of the set will likely be the same.

Shortest/Longest Path Cut Vertex Checking: 1214D, 650D
Sometimes, if we want to check if a vertex is neccesary for a shortest path/longest path, we can use hashing to calculate the number of ways and see if $f(s, v) * f(v, t) = f(s, t)$, where $f(u, v)$ is number of ways to go from $u$ to $v$.
P.S. There are simpler ways as talked here.

Permutation Checking: 213E
We want to maintain a data structure which characterizes a permutation, and supports addition and deletion of an element. How to characterize a permutation? Well, we can put the question like this: ” What uniquely determines the permutation? ” One way that can be implemented relatively easily is inversion sequence, where is $i$th term is the number of numbers before the number $i$. We can keep this structure efficiently using a Fenwick Tree.

Symmetry Checking: 452F
In this problem, we want to check if a certain group of number is symmetric. To solve this, we can store two values for a group, giving powers to numbers in reverse order. Say the frequency list is $[1, 1, 1, 0, 0]$, we can give them two string $”11100″$ and $”00111″$. To check whether they are symmetric around, for example, the second element, we can shift them (in hash) and then check whether they are the same.

Learning notes on 2-SAT

Algorithm to solve 2-SAT

First decompose the implication graph into SCCs, and then if $x_i$ and $\neg x_i$ is in the same SCC, then the answer is impossible. Else it suffices to choose the variable that appears later in the topological order to be true.

Problems and Techniques

Enforcing one variable to be true: link
To do that, we can just add an edge from $\neg x_i$ to $x_i$

Enforcing at most one variable among several to be true
This can be done in $O(n)$. Order the nodes arbitrarily. Then we can build prefix nodes for each prefix part of the nodes and establish implication between those prefix nodes and the variables themselves.

Learning notes on Palindrome Tree

Basic things this structure can do
  1. the number of distinct palindrome substring in a string (which turns out to be the number of extra states, which is linear)
  2. the number of times each palindrome appears
  3. The number of palindromes (or sum of the corresponding properties) that ends in a certain location.
  1. The size of the tree is $O(n)$. This is a result of the fact that at most one new distinct palindrome can be added when we add a new character. (Otherwise, choose the longest of them, $s_{longest}$, and the shorter one will definitely appear if we flip it from the middle point of $s_{longest}$.)
  2. Each fail pointer is pointing to a smaller index. So when we want to do tree DP, we can just use the index as the topological order.
Other tricks:

Adding characters from both sides: HDU 5421
To do such thing, we can keep a extra pointer at the begining of the string. Both pointer will be the longest palindrome prefix/suffix. The operations will be symmetric, and, additionally, when the new distinct palindrome we get is the new whole string, we need to update both pointer to keep the property of the pointers.

Series Link

To be continued…

Prüfer Code, Cayley’s Formula, etc

Prufer Code is a way to represent a labeled tree as an array, which is related some tree counting problems

The Prüfer code is a way of encoding a labeled tree with $n$ vertices using a sequence of $n-2$ integers in the interval $[0, n−1]$

Conversion and Bijection between Tree and Prüfer code

We can construct Prüfer Code from a given tree by removing the smallest leaf and add the number of the only vertex adjacent to it to the list until there are only two vertices.

We can also constructing a tree from a given Prüfer Code by the following procedure.
for i from 1 to n – 2:
    Let the smallest leaf not in the current sequence and not used be u
    Let the first element in the sequence be v;
    Mark v as used;
    Connect u and v;
    Delete the first element in the sequence;
Connect the last two unused vertices;

From the construction, we know each Prüfer Code has a tree, and each tree has a Prüfer Code. Let $f: \text{Tree} \rightarrow \text{Prüfer Code}$ be the function we use to convert a tree to a Prüfer Code. We can further prove $f^{-1}(f(t)) = t$ for any tree $t$ by induction, which implies that there is a bijection between labeled trees with $n$ vertices and Prüfer Code with length $n – 2$. The direct result of the bijection is Cayley’s Formula as shown below.

Cayley’s Formula and Generalized Version

Theorem 1 (Cayley’s Formula): For any integer $n > 0$, the number of trees of $n$ labeled vertices is $n^{n-2}$.

Theorem 2 (Generalized Cayley’s Formula): Let $T_{n, k}$ be the number of labelled forests on $n$ vertices with $k$ connected components, such that vertices $1, 2, …, k$ all belong to different connected components. Then $T_{n, k} = k n^{n-k-1}$.

Property of Prüfer Code

1. The vertex with the highest number will not be in the Code
2. Each vertex $i$ appears $d_i – 1$ times in the Code, where $d_i$ is the degree of vertex $i$.

Lemma 3: Given the degree of each vertex in a tree of size $n$, $d_1, d_2, …, d_n$, the number of possible trees is $$\frac{(n – 2)!}{(d_1 – 1)! (d_2 – 1)! … (d_n – 1)!}$$

Related Problems

Prüfer-Code-like Bijection: CF156D
Problem in short: Given a graph, find the number of ways to connect the graph with minimum edges.
Idea: We can construct sequence $P$ and $Q$ that represent the final graph similar to Prüfer Code. Let $k$ be the current number of connected component. $P$ denotes the vertex $v$ that we connect to when we remove the smallest labeled leaf component $C$ one by one. $Q$ denotes the vertex in $C$ that we use to connect $v$. The bijection between the sequences can be proved in a similar manner. Since for each sequence $P$, we will have $\prod_i sz_i$ different $Q$s, where $sz_i$ is the size of the $i$th component, the answer will be $$n^{k-2} \cdot \prod_i sz_i $$.

Generalized Cayley’s Formula: CF1109D
Trivial Application

About Suffix Automaton

Some Properties
  • Each state in the SAM corresponds an equivalent class of substrings of $s$ which appear in the same set of positions.
  • For some state $v$, $len(v) – minlen(v) + 1$ is the number of distinct substring in the corresponding class.
About String Matching

To check if $t$ occurs in $s$, we can keep matching $t$ on the SAM of $s$, go to the next state if the current character is matched, or get through suffix link otherwise.

If we want to know how much a string $t$ can be matched for each state in the SAM of $s$, we can first keep matching it and record whenever the character is matched, and then we can propagate the value through the links.

About Paths in the DAG

Fact: Every distinct path from the root is a distinct substring.

Following from this fact, we can use DP to calculate lots of things, such as number of distinct substrings, sum of distinct integers.
Example: sum of dictinct integers, Morse Code

About Suffix-Link Tree

The suffix links forms a tree in which you can propagate information upwards, like the first/last ocurring positions, number of characters matched in a suffix, etc.
Example: propagation of available indices

Related Problems

Sliding Windows for String Matching: HDU 6583
Analysis in the blog for the original contest.