Addendum: Rotation Matrices, Amplitude Amplification

Week 6 • October 22, 2025

In the main lesson on Grover's algorithm, I chose to omit the specific calculations showing that the Grover operator is equivalent to a $2\theta$ rotation towards $|A_1\rangle$ in the $|A_0\rangle$-$|A_1\rangle$ plane.

However, I still think it's important to walk through these calculations, since there are some interesting insights to be gleaned along the way. So in this addendum, we will reach the conclusions that we found visually, but through explicit calculation.

The notation for these calculations varies from source to source; here, I will primarily (but not exclusively) follow the IBM Quantum Learning page on Grover's algorithm.

As a reminder, we defined two sets

$$ \begin{align*} A_0 &= \{x \in \{0,1\}^n : f(x)=0\} \\ A_1 &= \{x \in \{0,1\}^n : f(x)=1\} \end{align*} $$

and two corresponding superposition states

$$ \begin{align*} |A_0\rangle &= \frac{1}{\sqrt{|A_0|}} \sum_{x \in A_0} |x\rangle \\ |A_1\rangle &= \frac{1}{\sqrt{|A_1|}} \sum_{x \in A_1} |x\rangle \end{align*} $$

and we defined our starting state $|\psi_0\rangle = H^{\otimes n}|0\rangle$. Let $N=|A_0|+|A_1|=2^n$ be the total number of bitstrings in our search space.

Now, note that our starting state can be written as

$$ |\psi_0\rangle = \sqrt{\frac{|A_0|}{N}}|A_0\rangle + \sqrt{\frac{|A_1|}{N}}|A_1\rangle $$

so let's try applying $G$ to both $|A_0\rangle$ and $|A_1\rangle$.
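Before doing so, the decomposition above is easy to sanity-check numerically. Here is a small sketch; the choice of $n=3$ and the marked set $\{5, 6\}$ are arbitrary illustrative values, not anything from the derivation itself.

```python
from math import sqrt

# Hypothetical example: n = 3 qubits, f marks bitstrings 5 and 6.
n = 3
N = 2 ** n
marked = {5, 6}          # A_1; everything else is A_0
M = len(marked)

# Uniform superposition |psi_0> = H^n |0...0>
psi0 = [1 / sqrt(N)] * N

# |A_0> and |A_1> as amplitude vectors over the computational basis
A0 = [0 if x in marked else 1 / sqrt(N - M) for x in range(N)]
A1 = [1 / sqrt(M) if x in marked else 0 for x in range(N)]

# Reconstruct psi_0 from the two-state decomposition
recon = [sqrt((N - M) / N) * a0 + sqrt(M / N) * a1
         for a0, a1 in zip(A0, A1)]

assert all(abs(p - r) < 1e-12 for p, r in zip(psi0, recon))
```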

Let's begin by computing $G|A_0\rangle$. By definition, we know that

$$ \begin{align*} G|A_0\rangle &= H^{\otimes n} (2|0\rangle\langle 0|-\mathbb{I}) H^{\otimes n} O_f|A_0\rangle \\ &= (2|\psi_0\rangle\langle\psi_0|-\mathbb{I})O_f|A_0\rangle \\ &= (2|\psi_0\rangle\langle\psi_0|-\mathbb{I})|A_0\rangle \end{align*} $$

and since $\langle\psi_0 | A_0\rangle = \sqrt{|A_0|/N}$, we can further simplify this as

$$ \begin{align*} G|A_0\rangle &= 2\sqrt{\frac{|A_0|}{N}}|\psi_0\rangle-|A_0\rangle \\ &= 2\sqrt{\frac{|A_0|}{N}}\left(\sqrt{\frac{|A_0|}{N}}|A_0\rangle + \sqrt{\frac{|A_1|}{N}}|A_1\rangle\right)-|A_0\rangle \\ &= \left( \frac{2|A_0|}{N} - 1 \right) |A_0\rangle + \frac{2\sqrt{|A_0||A_1|}}{N}|A_1\rangle \\ &= \frac{|A_0|-|A_1|}{N}|A_0\rangle + \frac{2\sqrt{|A_0||A_1|}}{N}|A_1\rangle. \end{align*} $$

We can perform a similar process to find $G|A_1\rangle$:

$$ \begin{align*} G|A_1\rangle &= H^{\otimes n} (2|0\rangle\langle 0|-\mathbb{I}) H^{\otimes n} O_f|A_1\rangle \\ &= (2|\psi_0\rangle\langle\psi_0|-\mathbb{I})O_f|A_1\rangle \\ &= -(2|\psi_0\rangle\langle\psi_0|-\mathbb{I})|A_1\rangle \\ &= -2\sqrt{\frac{|A_1|}{N}}\left(\sqrt{\frac{|A_0|}{N}}|A_0\rangle + \sqrt{\frac{|A_1|}{N}}|A_1\rangle\right)+|A_1\rangle \\ &= -\frac{2\sqrt{|A_1||A_0|}}{N}|A_0\rangle + \left( 1-\frac{2|A_1|}{N} \right)|A_1\rangle \\ &= -\frac{2\sqrt{|A_1||A_0|}}{N}|A_0\rangle + \frac{|A_0|-|A_1|}{N}|A_1\rangle. \end{align*} $$
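These two formulas can be verified by building $G$ directly for a small instance and applying it to $|A_0\rangle$ and $|A_1\rangle$. In this sketch, the choice of $n=3$ and the marked set $\{5, 6\}$ are arbitrary illustrative values.

```python
from math import sqrt

# Hypothetical example: n = 3 qubits, f marks bitstrings 5 and 6.
n = 3
N = 2 ** n
marked = {5, 6}
M, K = len(marked), N - len(marked)   # |A_1| and |A_0|

A0 = [0 if x in marked else 1 / sqrt(K) for x in range(N)]
A1 = [1 / sqrt(M) if x in marked else 0 for x in range(N)]

def grover(v):
    """Apply G = (2|psi0><psi0| - I) O_f to an amplitude vector v."""
    v = [-a if x in marked else a for x, a in enumerate(v)]  # oracle O_f
    s = sum(v)                         # <psi0|v> equals sum(v)/sqrt(N)
    return [2 * s / N - a for a in v]  # diffusion 2|psi0><psi0| - I

# Images predicted by the derivation above
GA0_pred = [(K - M) / N * a0 + 2 * sqrt(K * M) / N * a1
            for a0, a1 in zip(A0, A1)]
GA1_pred = [-2 * sqrt(K * M) / N * a0 + (K - M) / N * a1
            for a0, a1 in zip(A0, A1)]

assert all(abs(a - b) < 1e-12 for a, b in zip(grover(A0), GA0_pred))
assert all(abs(a - b) < 1e-12 for a, b in zip(grover(A1), GA1_pred))
```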

There is clearly some symmetry between these two results, and it becomes much more apparent if we express the operator $G$ as a matrix in the $\{|A_0\rangle, |A_1\rangle\}$ basis as follows:

$$ \frac{1}{N}\begin{pmatrix} |A_0|-|A_1| & -2\sqrt{|A_1||A_0|} \\ 2\sqrt{|A_0||A_1|} & |A_0|-|A_1| \end{pmatrix} = \frac{1}{N}\begin{pmatrix} \sqrt{|A_0|} & -\sqrt{|A_1|} \\ \sqrt{|A_1|} & \sqrt{|A_0|} \end{pmatrix}^2 $$

I have pulled out the $1/N$ factor and written the matrix as a square to make the pattern a little clearer. For those who have encountered rotation matrices before, this may look familiar, as we can observe that

$$ \frac{1}{\sqrt{N}}\begin{pmatrix} \sqrt{|A_0|} & -\sqrt{|A_1|} \\ \sqrt{|A_1|} & \sqrt{|A_0|} \end{pmatrix} = \begin{pmatrix} \cos{\theta} & -\sin{\theta} \\ \sin{\theta} & \cos{\theta} \end{pmatrix} $$

where $\cos\theta = \sqrt{|A_0|/N}$ and $\sin\theta = \sqrt{|A_1|/N}$. We have essentially identified an angle

$$ \theta=\arcsin\left(\sqrt{\frac{|A_1|}{N}}\right) $$

intrinsically linked to the $|A_0\rangle$-$|A_1\rangle$ plane we have established. And since

$$ \begin{pmatrix} \cos{\theta} & -\sin{\theta} \\ \sin{\theta} & \cos{\theta} \end{pmatrix}^2 = \begin{pmatrix} \cos(2\theta) & -\sin(2\theta) \\ \sin(2\theta) & \cos(2\theta) \end{pmatrix}, $$

the Grover operator $G$ turns out to correspond exactly to a $2\theta$ counterclockwise rotation in the $|A_0\rangle$-$|A_1\rangle$ plane. Therefore,

$$ |\psi_0\rangle = \sqrt{\frac{|A_0|}{N}}|A_0\rangle + \sqrt{\frac{|A_1|}{N}}|A_1\rangle = \cos(\theta)|A_0\rangle + \sin(\theta)|A_1\rangle $$

and applying the Grover operator $t$ times gives

$$ G^t|\psi_0\rangle = \cos((2t+1)\theta)|A_0\rangle + \sin((2t+1)\theta)|A_1\rangle. $$

We have therefore shown analytically that our operator $G$ corresponds to this rotation towards $|A_1\rangle$, the superposition state of just the terms we want, and away from $|A_0\rangle$, the superposition state of just the terms we don't want.
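As a final sanity check on the rotation picture, we can simulate repeated applications of $G$ on a small instance and compare the amplitudes against the $\cos((2t+1)\theta)$ and $\sin((2t+1)\theta)$ formula. The choice of $n=3$ and the marked set $\{5, 6\}$ are arbitrary illustrative values.

```python
from math import sqrt, asin, cos, sin

# Hypothetical example: n = 3 qubits, f marks bitstrings 5 and 6.
n = 3
N = 2 ** n
marked = {5, 6}
M = len(marked)
theta = asin(sqrt(M / N))

state = [1 / sqrt(N)] * N    # |psi_0>
for t in range(1, 4):
    # One Grover iteration: oracle, then diffusion
    state = [-a if x in marked else a for x, a in enumerate(state)]
    s = sum(state)
    state = [2 * s / N - a for a in state]
    # Amplitudes predicted by the rotation picture after t iterations
    good = sin((2 * t + 1) * theta) / sqrt(M)      # on each marked string
    bad = cos((2 * t + 1) * theta) / sqrt(N - M)   # on each unmarked string
    assert all(abs(a - (good if x in marked else bad)) < 1e-12
               for x, a in enumerate(state))
```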

I want to stress that this "trick" of partitioning elements into "good" and "bad" sets and then applying a rotation matrix in the plane spanned by a superposition over each set is extremely powerful and can be used beyond search.

Specifically, this process is known as amplitude amplification, and it is one of the more powerful and well-researched quantum paradigms. I won't go too deeply into the generalization, since using the technique in general is very similar to using it for the specific purpose of Grover's algorithm, but you can check out the Wikipedia page if you're interested in a more formal description.

I do want to discuss one variation of Grover's algorithm, though: the case where we don't know how many terms we are searching for, i.e. how many inputs our oracle function $f(x)$ maps to $1$.

Recall that if we have $N$ total bitstrings and know that we are searching for $|A_1|=M$ of them, the number of iterations should be

$$ t = \left\lfloor \frac{\pi}{4}\sqrt{\frac{N}{M}} \right\rfloor, $$

but in this case, $M$ is unknown, so we don't know exactly how many iterations to perform.
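To make the known-$M$ formula concrete first, here is a quick numerical sketch; the values $N=1024$ and $M=4$ are an arbitrary example.

```python
from math import sqrt, asin, sin, floor, pi

# Hypothetical example: N = 1024 items, M = 4 of them marked.
N, M = 1024, 4
theta = asin(sqrt(M / N))

# Optimal iteration count from the formula above
t = floor(pi / 4 * sqrt(N / M))

# Probability of measuring a marked item after t iterations
p_success = sin((2 * t + 1) * theta) ** 2

print(t, p_success)   # t = 12; success probability is about 0.9999
```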

There is a rather unintuitive method that finds a solution (assuming one exists) in $O(\sqrt{N/M})$ oracle queries even when $M$ is unknown. We pick a random number of iterations $t$ from the set $\{1,\dots,T\}$, run Grover's algorithm for $t$ iterations, and measure; a single query to $f$ tells us whether the outcome is a solution. If it is, we stop; otherwise, we increase $T$ and try again. We could then find other solutions from there if needed.

We just need to decide how quickly to grow $T$. Doubling $T$ each time turns out to be too aggressive to keep the total cost at $O(\sqrt{N/M})$; multiplying $T$ by $5/4$ each round works well.
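To illustrate the strategy, here is a minimal classical model: rather than simulating circuits, it assumes each Grover run with $t$ iterations succeeds with probability $\sin^2((2t+1)\theta)$, per the rotation result derived above, and counts oracle queries. The function name, parameters, and seed are my own illustrative choices, not a standard API.

```python
import random
from math import sqrt, asin, sin

def grover_unknown_m(N, M, growth=5/4, seed=0):
    """Model of the unknown-M strategy: each simulated Grover run with t
    iterations succeeds with probability sin^2((2t+1)*theta)."""
    rng = random.Random(seed)
    theta = asin(sqrt(M / N))
    T, queries = 1.0, 0
    while True:
        t = rng.randint(1, max(1, int(T)))  # random iteration count in {1..T}
        queries += t                        # each iteration costs one query
        if rng.random() < sin((2 * t + 1) * theta) ** 2:
            return queries                  # measurement produced a solution
        T *= growth                         # grow T gently (5/4 per round)

print(grover_unknown_m(1024, 4))  # total oracle queries for one example run
```

Note that the simulation only uses $\theta$ internally to model measurement outcomes; the strategy itself never needs to know $M$.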