Problems Notes on Codeforces String Problems

My goal is to solve all the problems related to strings/hashing/suffix structure starting from rating 2600 in Codeforces. I will keep summarizing problem ideas here.

If a problems falls into a specific large category, I will summarize it in those independent blogs:
Palindrome Tree
Suffix Automaton
Hashing

Shortest/Longest Path Cut Vertex Checking1214D650D, MCPC19B
Sometimes, if we want to check if a vertex is neccesary for a shortest path/longest path, one way to do is hashing as mentioned in the hashing blog. Another much easier way is to group the vertices by distance. The vertex is the cut vertex if and only if it is the only one in that “distance level”.

Non-Palindrome Cutting: Timus 2057
Consider that the only “impossible” cases are like $aaaaa$, $aaaabaaaa$, or $abababa$ (not really sure how to prove, though).
Otherwise the minimum answer is either $1$ or $2$. Check it brute-forcely.
For the maximum answer, we have $dp[i] = max\{dp[j] + 1\}$ for all $j$ s.t. $s[j + 1, i]$ is possible to split. Then we do the dp transition by excluding the three impossible cases.

Reference:
1. URAL Palindromic Contest (in Chinese)

Learning Notes on Hashing

I wouldn’t think of summarizing problems on hashing if I didn’t include “hashing” in the tag for my training plan and see those nice problems recently… While hashing can be used in string checking (so-called “polynomial hashing”), it can be extended to do things like set identity checking.

Problems like this is one of the direct application: We can give each number a random number. Then if two set has the same xor sum, the content of the set will likely be the same.

Shortest/Longest Path Cut Vertex Checking: 1214D, 650D
Sometimes, if we want to check if a vertex is neccesary for a shortest path/longest path, we can use hashing to calculate the number of ways and see if $f(s, v) * f(v, t) = f(s, t)$, where $f(u, v)$ is number of ways to go from $u$ to $v$.
P.S. There are simpler ways as talked here.

Permutation Checking: 213E
We want to maintain a data structure which characterizes a permutation, and supports addition and deletion of an element. How to characterize a permutation? Well, we can put the question like this: ” What uniquely determines the permutation? ” One way that can be implemented relatively easily is inversion sequence, where is $i$th term is the number of numbers before the number $i$. We can keep this structure efficiently using a Fenwick Tree.

Symmetry Checking: 452F
In this problem, we want to check if a certain group of number is symmetric. To solve this, we can store two values for a group, giving powers to numbers in reverse order. Say the frequency list is $[1, 1, 1, 0, 0]$, we can give them two string $”11100″$ and $”00111″$. To check whether they are symmetric around, for example, the second element, we can shift them (in hash) and then check whether they are the same.

Learning Notes on Generating Functions

Generating Function is a way to represent a sequence of number using coefficients of a polynomial.

The two most commonly used generating functions is:

1. Ordinary generating function (OGF)
$$G(a_n; x) = \sum_{n = 0}^{\infty}a_n x^n $$

2. Exponential generating function (EGF)
$$EG(a_n; x) = \sum_{n = 0}^{\infty}a_n \frac{ x^n }{n!} $$

The first one is used in combination, while the second is used in permutations.

For the second one, suppose we have two EGFs $f$ and $g$, then $$f(x) g(x) = \sum_{n=0}^{\infty} (\sum_{i=0}^{n} {{n}\choose{i}} f_i g_{n-i}) \frac{x^n}{n!}$$

Related Problems:

1. Representing $(x_1 + x_2 + .. + x_n)^k$: Codechef PSUM, HackerRank A Summation Problem
First of all, we can solve the problem in a knapsack fashion for the No. of the number item and the sum of items we are considering. Then we can keep a EGF $f$ s.t. the $i$-th term is the sum of all possible $(a_1 + a_2 + … )^i$. Then by multiplying $g(i) = \sum_{i = 0}^{k} \frac{{ v_i }^k}{n!}$, we put the $i$-th item in and permute with the terms in the sum.

2. Generating Functions and Knapsack Problem: Asia Shenyang 2018 M
In this problem we try to formulate knapsack problem to a generating function: $$\prod_{i = l}^{r} \frac{1 – x^{(a + 1) b}}{1 – x^b}$$. Also we have $$\frac{1}{1 – x^k} = 1 + x^k + x^{2k} + x^{3k} + …$$. Then we can precompute the prefix product and inverse product. To answer a query, we need to calculate the coefficients with the terms of powers $\leq c$ in the product.
Note:
1. From the generating function, we can also see undoing complete knapsack (where each item can be used many times) is the doing 0/1 knapsack (where each item can be used at most once) with a opposite coefficient. Just like the “add” and “del” function in this submission.
2. In each query, directly computing the product will be too costly. A neat trick is to make each coefficient the prefix sum of the correpsonding position for one of the polynomial beforehand, and then just take the coefficient of $x^c$ in the product. Then each query can be answer in $O(C)$.

3. Exponential Generating Functions: ZOJ4069
First key observation is that if $m < n$ (we also need to handle other cases very carefully), it must be composed by a $n – m$ chains. For one chain, the number ways it can be compsed by $n$ vertices is $\frac{n!}{2}$ if $n \geq 2$ and $1$ if $n = 1$. Then after dividing $n!$ for each term, the EGF is in a nice form $EG = \frac{1}{2}(2x+ x^2 + x^3 + x^4…) = x {(1 + \frac{1}{1 – x})}$. Suppose the coefficient of $x^n$ in $EG^m$ is $k$. The answer will be $\frac{k \cdot n!}{(n – m)!}$ because we didn’t specify the order of the $n – m$ chains. We can brute-forcedly calculate the answer since the form is not complicated.

Reference:
1. Generating Function – Wikipedia

Learning Notes on Multiplicative Functions, Dirichlet Convolution, Mobius Inversion, etc

Common Functions
  • Unit Function: $\epsilon(n)=[n=1]$
  • Constant Function: $1(n)=1$
  • Euler Function: $\varphi(n)=\sum_{i=1}^n [\gcd(i,n)=1]$
  • Mobius Function:$ \mu(n)=\left\{
    \begin{aligned}
    1 & \text{, if $n=1$} \\
    0 & \text{, if $n$ has some square}\\
    (-1)^k & \text{, ($k$ is the number of prime divisors of $n$) otherwise}
    \end{aligned}
    \right.
    $
Dirichlet Convolution

$$ (f*g)(n)=\sum_{d\mid n}f(d)g(\frac{n}{d}) $$

Common Convolutions:
$$ \varepsilon=\mu * 1 \iff \varepsilon(n)=\sum_{d\mid n}\mu(d)$$ $$ d=1 * 1 \iff d(n)=\sum_{d\mid n}1 $$ $$ \sigma=d * 1 \iff \sigma (n)=\sum_{d\mid n}d$$ $$\varphi=\mu * \text{ID} \iff \varphi(n)=\sum_{d\mid n}d\cdot\mu(\frac{n}{d})$$


Mobius Inversion

The first form of Mobius Inversion:
$\displaystyle F(n)=\sum_{d|n}f(d)\implies f(n)=\sum_{d|n}\bigg(\mu(d)\cdot F\Big(\frac{n}{d}\Big)\bigg)$

Proof of the first form:
$\displaystyle \sum_{d|n}\bigg(\mu(d)\cdot F\Big(\frac{n}{d}\Big)\bigg)$ $=\sum_{d|n}\bigg(\mu(d)\cdot \sum_{k|\frac{n}{d}}f(k)\bigg)=\sum_{k|n}\bigg(f(k)\cdot \sum_{d|\frac{n}{k}}\mu(d)\bigg)=f(n)$

Second form:
$\displaystyle F(n)=\sum_{n|d}f(d)\implies f(n)=\sum_{n|d}\bigg(\mu\Big(\frac{d}{n}\Big)\cdot F(d)\bigg)$

Related Problems

Euler’s Functions/ Mobius Inversion: CF645F
Let $F[n]$($f[n]$) be the number of $k$-tuples the g.c.d. of whose numbers is a multiple of k (exactly k ), respectively. First we have $F[n] = {{cnt[n]}\choose{k}}$, where $cnt[n]$ is the number of species that has number divisible by $n$.
By Mobius Inversion, we have $f(d) = \sum_{d|n} \mu(\frac{n}{d}) F(n)$, and the answer for the problem equals $\sum_{d = 1}^{10^6} d \cdot f(d)$
$$= \sum_{d = 1}^{10^6} d \cdot \sum_{d|n} \mu(\frac{n}{d}) F(n) $$
We can change the summation to sum over $n$ instead, and then we have $$= \sum_{n = 1}^{10^6} \sum_{d|n} d \cdot \mu(\frac{n}{d}) F(n)$$ $$= \sum_{n = 1}^{10^6} F(n) \sum_{d|n} d \cdot \mu(\frac{n}{d}) $$ $$= \sum_{n = 1}^{10^6} F(n) \varphi(n)$$.
Therefore, each update can only affect $O(maxdivs)$ values of $F$, which is good enough for this problem.

Changing representation of the sequence: 1097F
In this problem, we can store the frequency of the multiples instead of the number itself. This is essentially Mobius Inversion, which is also a linear transformation, we can use bitwise xor for addition, and bitwise and for Cartisian product.

LCM: SPOJ LCMSUM
Key takeaways:
1. The formula $lcm(a, b) = \frac{a \cdot b}{gcd(a, b)}$ can turn the problem to some gcd-related stuff where we can apply Mobius Inversion.
2. TODO

No. of Divisors: CF235E
Let $d(x)$ be the number of divisors of $x$.
Then $ d(mn)=\sum\limits_{i|m}\sum\limits_{j|n}[(i,j)=1] $
Also, $d(r \cdot s \cdot t)=\sum\limits_{i|r}\sum\limits_{j|st}[(i,j)=1]=\sum\limits_{i|r}\sum\limits_{j|s}\sum\limits_{k|t}[(i,jk)=1][(j,k)=1]$ $=\sum\limits_{i|r}\sum\limits_{j|s}\sum\limits_{k|t}[(i,j)=1][(i,k)=1][(j,k)=1]$
Refer to this blog for more details.

Centroid Decomposition: CF809E
By enumerating the GCD, we need to calculate $$\sum_{i = 1}^{n} \frac{f_i \cdot i}{\varphi(i)}$$ where $$f_i = \sum_i \sum_j [(a_i, b_j) = g] \varphi (a_i) \varphi (b_j) d(i)$$. Instead of calculating $f_i$, we can calculate $g_d = \sum_{d | a_i} \sum_{d | b_j} \varphi (a_i) \varphi (b_j) d(i) $ by using centroid decomposition.

Total number of divisors of all the numbers is $O(n \log n)$ and each of them is used $O(\log n)$ because of centroid decomposition. So the total complexity is $O(n \log^2 n)$

Reference

Notes on Codeforces Problems: Rating 2500

Complexity Analysis for Tree DP: 815C
In this problem, we have a straight-forward DP which has $sz_u \cdot sz_v$ for each children merging. In this case the total running time will be $O(n^2)$ since every pair of vertices will only be merged once.

$O(\sqrt{n})$ possibilities of n / i: 830B
We can express the sum of cut-off as $$\sum_{i} ( d \lceil \frac{a_i}{d}\rceil ) – \text{sum of } a_i$$ $$= d \sum_{i} (\lceil \frac{a_i}{d}\rceil ) – \text{sum of } a_i$$.
The sigma term will only change only $O(n \cdot \sqrt{a_i})$ many times and for each of them, we try to maximize $d$ within the range.

Cycle Under Modulo/ Fermat’s Little Theorem: 311D
It turns out that the length of cycle for a number cubed is always a divisor of 48. This can be explained by the fact that $3^{48} = 1 \bmod ( 95542721 – 1 )$. Therefore, by Fermat’s Little Theorem, we have $a^{3^k} = a^{3^{(k \bmod 48)}} \bmod 95542721$.

Binomial Coefficient/ Pascal’s Triangle: 407C
Don’t forget we have binomial coefficients in Pascal’s triangle! This is the way that we can update those coefficients offline in an array efficiently.

Centroid: 708C
Observation: Find any centroid $v$. We will only cut the edges adjacent to $v$. Otherwise, it will either be not possible (too large) or not optimal (can do better).

Meet-in-the-Middle: 327E, 1257F
When the constraints are really small, always try bit-mask DP or meet-in-the-middle.

MST/ Edge Groups of Kruskal’s Algorithm: 160D, 891C
The process of Krusal’s algorithm can be interpreted as grouping the edges with the same weights and processing them in increasing order of weights. Then for each weight, we want to find a spanning forest. This idea can be used to find the edges that can be/cannot be/might be in the MST.

Dealing with Double Divisions: 138C
If we are frequently doing double multiplicaitons and divisions during sliding window, the error might explode. Instead, either we can take logarithm beforehand and do additions/subtractions instead, or we can use a segment tree to avoid divisions.

Bipartite Graphs over points on 2D plane: 870E, 547D
When we are dealing with configurations of points on 2D plane, we can consider building a bipartite graph where the left (right) part are all possible x (y) coordinates and the points are edges that connects their x and y values.
By doing this construction, 870E becomes directing edges so that $|deg_{in} – deg_{out}| \leq 1$ for each node, and 547D becomes directing edges so that $deg_{in} \geq 1$ for each node.

Doing DFS to enumerate paths in the tree: 490F
We can start DFS from each node and do exactly like the original LIS problem by keeping a DP array. The time for update in each node is $O(1)$ so we can easily roll back when we exit a node.

“Construct optimal answer from any”: 460D
The main difficulty of the problem comes from whether we can choose three integers than have xor sum 0. We can start by any solution and try to tweak it. Then we can see if there are such solution exists, there must be some solution of form “$11000000…, 10111111…, 1111111…$”.

Maximum Pseudoforest/ Greedy: 875F
One can show that this problem is equivelant to finding the maximum weight pseudoforest. Since it is a matroid, we can use greedy to solve it using a way similar to Kruskal’s Algorithm.

Combinatorics / DP: 886E
To solve this problem, we try to use DP and see how to reduce to a smaller problem. Consider $f(n)$ as the number of permutations of $1$ to $n$ that exit early. Then we can consider the position of the largest value: if the largest value is in a early position $i <= n – k – 1$, the algorithm will alway exit early; otherwise the problem will become a smaller one.

Combinatorics: 348D
Let $f(x, y)$ be the number of paths from $x$ to $y$. We can shown the answer is $f((1, 0), (n, m-1)) \cdot f((0, 1), (n – 1, m)) – f((0, 1), (n, m-1)) \cdot f((1, 0), (n – 1, m)) $. The subtracted term is equal to the number of intersecting pairs by consider exchanging the membership of the path at the first time they intersect. The more general form of this problem is discussed here: Lindström–Gessel–Viennot lemma.

Segment Tree: 687D
One can notice that for any certain range of the edges, there will be at most $O(n)$ edges that matters (edges for a spanning tree and an violating edge). Therefore, we can store those edges for each interval in the segment tree, and for each query, we only have $O(n \log n)$ edges to consider.

Maximum Matching: 1198E
First consider covering the grid with unit strip (i.e. $1 \times n$ or $n \times 1$). Then we can reduce the problem into minimum vertex cover, which can be solved by max flow. The number of vertices is large but we can compress the identical ones.

HLD? / Brute Force: 442D
First of all, the answer will be at most $O(\log n)$ because an upper bound is HLD. Then first each newly added vertex, we can just iterative update its parent, and stop when nothing is updated. The amortized complexity is $O(n \log n)$ because each node will be updated at most $O(\log n)$ times.

Subset Sum / Bitset: 356D
While the problem itself is NP-hard, we can use bitset to squeeze the DP solution. To recover the solution, we can record the first item that make sum $j$ available, which can also be done by using bitset.

Constructive Algorithm: 1225F
Sometimes, when dealing with such problems like “finding the maximum answer”, it will be helpful to try to find a upper bound for the answer and show it is always achievable.

Greedy/ We-dont-care-about-everything DP: 822E
While doing brute force transition is too slow, we can just match as long as possible (i.e. longest common prefix) if we do match in a transition. Then the solution will work in $O(n \cdot k)$ after preprocessing LCP and RMQ array.

Cycle under Modulo / Constructive Algorithm: 763C
Consider a (cyclic) sequence where the next number of each number is $(x + d) \text{ mod } m$, where $x$ is the current number and $d, m$ is fixed. Then the answer we want is exactly a segment on this cyclic sequence. Then if $n \leq \frac{m}{2}$, we can see the difference $d$ of a two arbitrary numbers will appear $n – cnt$ times, where $cnt$ is the number of numbers between those two. If $n > \frac{m}{2}$, we can the complement and do the same thing.

Dilworth’s Theorem: 1257G, SPOJ MDOLLS
Dilworth’s Theorem states that the longest anti-chain in a DAG (i.e. the DAG version of “max independent set”) is equal to the size of the smallest path cover. In 1257G, we can prove that the latter is the size of the largest group of divisors that have the same number of prime factors.

Data Structure Keeping the sum of DP Values: 1129D
We try to do fast queries (of sum of values at positions of values of at most $k$) and update (for adding $+1$ or $-1$ to a interval) in a data structure that stores the DP values.
To do this we can do square root decompostion, where in each block we store the count of a each number and a offset to support fast update. This will cost $O(n \sqrt{n})$ for space (which is a little tight) and time.
To save more space, we can observe that the absolute difference in each block can be at most $\sqrt{n}$. Therefore, we can just store a prefix sum of a certain range of values, which reduces the space complexity to $O(n)$.

Learning notes on 2-SAT

Algorithm to solve 2-SAT

First decompose the implication graph into SCCs, and then if $x_i$ and $\neg x_i$ is in the same SCC, then the answer is impossible. Else it suffices to choose the variable that appears later in the topological order to be true.

Problems and Techniques

Enforcing one variable to be true: link
To do that, we can just add an edge from $\neg x_i$ to $x_i$

Enforcing at most one variable among several to be true
This can be done in $O(n)$. Order the nodes arbitrarily. Then we can build prefix nodes for each prefix part of the nodes and establish implication between those prefix nodes and the variables themselves.

Notes on Codeforces Problems: Rating 2400 (2)

Reduction to LIS: 76F
First, we need to understand the LIS problem with more depth. LIS problem, more generally, finds the longest sequence $X = [x_1, x_2,… , x_n]$ where $p(x_i) < p(x_{i+1})$ and $q(x_i) < q(x_{i+1})$ for $1 \leq i \leq n – 1$ for some function $p$ and $q$.
Then we return to the original problem to see how we can reduce it to LIS. After doing transformation to the formula, we see we can reach from $i$ to $j$ if and only if $|x_i – x_j| \leq (t_j – t_i) * V$. Then let $p_i = – x_i + t_i * V$ and $q_i = x_i + t_i * V$, then we can use LIS to solve this problem.

AC Automaton/ DP: 86C
AC Automaton is a good choice for maintaining “where we are when matching several strings”, but we also need to record how many characters are not matched yet when we are doing DP.
Another Note: To check whether a node is the end of a string, we need to check along the fail pointer until we reach the root.

Bottleneck Edge of Paths /Minimum Spanning Tree: 632F
The inequality in the problem looks like triangular inequality, which means we can somehow extend it into paths between two vertices, i.e. for any path between vertex $i$ and $j$, the maximum edge is large than the direct edge between them.
Then the way to solve this is to construct the MST and try to check each direct edge.

Euler’s Formula on Planar Graph: 933C
The solution of this problem utilizes Euler’s Formula, which relates the number of edges, vertices, and regions in a connected planar graph. $v-e+f = 2$, where $v$ is number of vertices, $e$ is number of edges, and $f$ is number of regions.

Reduction to MST / DP: 1120D
The first solution is to reduce this problem to MST. The idea of substituting between different tree nodes is similar to the way we make exchange argument in MST.
The second solution uses DP and the observation that we only need to care the cases where only one or zero vertice is left uncovered.

Almost Tree / Diameter: 835F
The graph is a almost tree, which contains a cycle and a bunch of trees attached to each node of the cycle. Ideally we want to enumerate the edge to delete. To do this, we can pick an arbitrary edge on the cycle, and do case analysis on whether it will be deleted. If yes, we can just run the diameter finding algorithm; If no, we can do some prefix/suffix sum around the cycle.

FFT / Segment Tree / Combinatorics: 1218E
1. While a good way to maintain single point/range update is to use a segment tree, it is still surprising to me that we can keep a frequency/sum list in each node and do convolution when merging nodes.
2. There is a step where we need to calculate array $s$ such that $$s_i = \sum_{j = 0}^{i} {{N – j}\choose{i – j}} \cdot t_j \cdot d^{i-j}$$. We can also use convolution to calculate this by spliting terms to those only related with $j$, $i – j$, $i$ respectively.
3. If we want to merge several intervals with complexity $O((n + m) \log (n + m))$. We can do this by pairing the smallest intervals together each time. This will yield a complexity of $L \log L$, where $L$ is the sum of all the intervals.

Segment Maintaining Max Prefix Sum: 1220F
We can keep the max prefix sums in a segment in two ways. One is using single point update and use a slightly more complicated merge function. Another is using lazy propagation directly on the prefix sum.

Divide-and-Conquer to Fit in Memory: 101E
If there is sufficient memory, we can do a straight-forward $O(n^2)$ DP. However, due to the tight memory constraint, in order to recover the answer, we will need to recursively find the middle point in the optimal answer using meet-in-the-middle tenique and divide the problems into the half, which yields a linear space complexity. In this way, the time complexity will be $O((n + m) ^ 2)$, which is still OK.

Reduction to Query in Segment Tree: 117D
One can see, when trying to caculate the sum, it is exactly like the process of querying sum in a segment tree.

HDU Multi-University Training 2019, Contest 4/5

Brute Force?/ Case Analysis: Contest 4, Problem M
First we can brute-forcely check factorize $n$ with the primes under $x = \sqrt[^5]{n}$. Then if $n$ is still larger than 1, other primes left to check is greater than $x$, which means there can only be at most 4 such primes. Analyzing the cases, we can see the answer will be 1 unless $n = p^2, p^3 \text{ or } p^4$ for some $p$.

XOR Operation/ Greedy?: Contest 5, Problem B
For each $a[i]$, define the favorite number as the best $b$ that $a[i]$ can choose (i.e. $a[i] \oplus b$ is smallest). For each $b[i]$ we define it similarly.
Without loss of generality, we assume there is no duplicate number in $A$ nor $B$. (The solution can be easily extended to the general case.)

Lemma 1: If for some number $a[i]$ and $b[j]$, they are each other’s favorite number, then $a[i] \oplus b[j]$ must be in the optimal solution. (If not, we can repair them together and the answer becomes better.)

Then we see this problem from a graph perspective: See each number as a vertex in a directed graph, and if $b$ is $a$’s favorite number, there is an directed edge from $a$ to $b$.

Lemma 2: There will be not cycle of length $>$ 2.
Proof (by contradiction): Suppose there is some cycle $x_1, x_2, x_3, …, x_n$. Then we have $x_1 \oplus x_2 < x_2 \oplus x_3 …. < x_n \oplus x_1 < x_1 \oplus x_2$. Contradiction.

Therefore, the idea to solve this problem is to effectively finding the cycle of length 2, putting the pair into the answer, and solve the problem again.