Skip to main content

Suffix Array

This page provides links to solutions that use the Suffix Array data structure.

Overview

A suffix array will contain integers that represent the starting indexes of the all the suffixes of a given string, after the suffixes are sorted.

Let's take an example for a string s = abaab\text{s = abaab} . All suffixes are as follows:

  • abaab\text{abaab} - 0th0^{\text{th}} index
  • baab\text{baab} - 1st1^{\text{st}} index
  • aab\text{aab} - 2nd2^{\text{nd}} index
  • ab\text{ab} - 3rd3^{\text{rd}} index
  • b\text{b} - 4th4^{\text{th}} index

After sorting these strings we get:

  • aab\text{aab} - 2nd2^{\text{nd}} index
  • ab\text{ab} - 3rd3^{\text{rd}} index
  • abaab\text{abaab} - 0th0^{\text{th}} index
  • b\text{b} - 4th4^{\text{th}} index
  • baab\text{baab} - 1st1^{\text{st}} index

Therefore the suffix array for string s\text{s} will be: [2,3,0,4,1][2, 3, 0, 4, 1].

Sorting string directly will take O(n2log n)\text{O}(\text{n}^{2} * \text{log n}) time, algorithm below does it using O(nlog n)\text{O}(\text{n} * \text{log n}) time.

Build Suffix Array

Let's create suffix array for input string s=banana\text{s=banana}. All the suffixes will be as below:

  • banana\text{banana}
  • anana\text{anana}
  • nana\text{nana}
  • ana\text{ana}
  • na\text{na}
  • a\text{a}

To generate the suffix array, we will begin by defining a suffix class that implements the comparable interface to compare two suffixes, as shown below:

// Suffix class to store the originalIndex, rank, and nextRank of a suffix
private static class Suffix implements Comparable<Suffix> {
int originalIndex;
int rank;
int nextRank;

// Constructor to initialize the originalIndex, rank, and nextRank
Suffix(int originalIndex, int rank, int nextRank) {
this.originalIndex = originalIndex;
this.rank = rank;
this.nextRank = nextRank;
}

// Compare two suffixes
@Override
public int compareTo(Suffix other) {
int c = Integer.compare(this.rank, other.rank);
return c == 0 ? Integer.compare(this.nextRank, other.nextRank): c;
}
}

Let's create a suffix object for each suffixes as below:

suffix_array_indexsuffixoriginal_indexranknext_rank
00banana\text{banana}001100
11anana\text{anana}11001313
22nana\text{nana}22131300
33ana\text{ana}33001313
44na\text{na}44131300
55a\text{a}55001-1
info

The original index in the suffix object represents the position of the suffix in the string s\text{s}, while rank and next rank refer to the ranks of the corresponding alphabet.

Once suffix objects are ready, sort them by rank first and then by next rank as below:

suffix_array_indexsuffixoriginal_indexranknext_rank
00a\text{a}55001-1
11anana\text{anana}11001313
22ana\text{ana}33001313
33banana\text{banana}001100
44nana\text{nana}22131300
55na\text{na}44131300
info

After sorting, the suffixes are no longer in their original order. To keep track of their new positions, we will create an array to stores each suffix's old index to new index mapping.

For example, the suffix ana\text{ana} which was initially at index 33 in the original string s\text{s}, is now at index 22 in the sorted suffix array above.

Now, we iterate through the sorted suffix objects and reassign a rank to each suffix based on its comparison with the previous suffix. The goal is to group suffixes with the same first character (or first two characters, or first four characters, and so on) together. Also, we set mapping for each suffix.

suffix_array_indexsuffixoriginal_indexranknext_rankidx_mapping
00a\text{a}550\textbf{0}1-13\textbf{3}
11anana\text{anana}111\textbf{1}13131\textbf{1}
22ana\text{ana}331\textbf{1}13134\textbf{4}
33banana\text{banana}002\textbf{2}002\textbf{2}
44nana\text{nana}223\textbf{3}005\textbf{5}
55na\text{na}443\textbf{3}000\textbf{0}

Next, we assign a next rank to each suffix by offsetting its index by a power of 22 and retrieving the rank of the suffix at that new position, unless the offset exceeds the bounds, in which case we assign 1-1.

suffix_array_indexsuffixoriginal_indexranknext_rankidx_mapping
00a\text{a}5500-1\textbf{-1}33
11anana\text{anana}11111\textbf{1}11
22ana\text{ana}33110\textbf{0}44
33banana\text{banana}00223\textbf{3}22
44nana\text{nana}22333\textbf{3}55
55na\text{na}4433-1\textbf{-1}00

For example:

First suffix in sorted array a\text{a} has index of 55, adding first power of 22 to it gives us 77, which is out of bounds for given string s\text{s}, so we assign next rank as 1-1.

Second suffix in sorted array anana\text{anana} has index of 11, adding first power of 22 to it gives us 33, now to get the new index of suffix which was originally at 33 we lookup idxArray and get 22. suffix at index 22 has a rank of 11, which becomes next rank for suffix anana\text{anana}.

Now, that we have assigned rank and next rank, we sort the suffixes again. We repeat this, doubling the number of characters compared in each iteration, until it becomes greater than or equal to length of the string s\text{s}.

After sorting again, suffixes will look like below:

suffix_array_indexsuffixoriginal_indexranknext_rankidx_mapping
00a\text{a}55001-133
11ana\text{ana}33110044
22anana\text{anana}11111111
33banana\text{banana}00223322
44na\text{na}44331-100
55nana\text{nana}22333355

Next, reassign rank and index mapping to suffix objects as below:

suffix_array_indexsuffixoriginal_indexranknext_rankidx_mapping
00a\text{a}550\textbf{0}1-13\textbf{3}
11ana\text{ana}331\textbf{1}002\textbf{2}
22anana\text{anana}112\textbf{2}115\textbf{5}
33banana\text{banana}003\textbf{3}331\textbf{1}
44na\text{na}444\textbf{4}1-14\textbf{4}
55nana\text{nana}225\textbf{5}330\textbf{0}

Now, we reassign next rank to suffix objects as below:

suffix_array_indexsuffixoriginal_indexranknext_rankidx_mapping
00a\text{a}5500-1\textbf{-1}33
11ana\text{ana}3311-1\textbf{-1}22
22anana\text{anana}11220\textbf{0}55
33banana\text{banana}00334\textbf{4}11
44na\text{na}4444-1\textbf{-1}44
55nana\text{nana}2255-1\textbf{-1}00
info

In this iteration we have used next power of 22 i.e. 44 for assigning next rank.

Finally, we sort suffix object as below:

suffix_array_indexsuffixoriginal_indexranknext_rankidx_mapping
00a\text{a}55001-133
11ana\text{ana}33111-122
22anana\text{anana}11220055
33banana\text{banana}00334411
44na\text{na}44441-144
55nana\text{nana}22551-100

So suffix array for banana\text{banana} will be: [5,3,1,0,4,2][5, 3, 1, 0, 4, 2].

// Function to construct the suffix array
public static int[] buildSuffixArray(char[] s) {
int n = s.length;
Suffix[] suffixes = new Suffix[n];

// Step 1: Create a Suffix object for each character in the input array
for (int i = 0; i < n; i++) {
suffixes[i] = new Suffix(i, s[i] - 'a', (i + 1 < n) ? s[i + 1] - 'a' : -1);
}

// Step 2: Sort the suffixes based on rank and next rank using a custom comparator
Arrays.sort(suffixes);

// An array to store the suffix indices in their sorted order
int[] idxMapping = new int[n];

// Step 3: Assign ranks and create the suffix index array
for (int k = 2; k <= n; k *= 2) {
int rank = 0;
int previousRank = suffixes[0].rank;
suffixes[0].rank = rank;
idxMapping[suffixes[0].idx] = 0;

// Step 3.1: Assign ranks based on suffix comparisons
for (int i = 1; i < n; i++) {
if (suffixes[i].rank != previousRank || suffixes[i - 1].nextRank != suffixes[i].nextRank) {
rank++;
previousRank = suffixes[i].rank;
}
suffixes[i].rank = rank;
idxMapping[suffixes[i].idx] = i;
}

// Step 3.2: Assign nextRank for each suffix
for (int i = 0; i < n; i++) {
int nextIndex = suffixes[i].idx + k;
suffixes[i].nextRank = (nextIndex < n) ? suffixes[idxMapping[nextIndex]].rank : -1;
}

// Step 3.3: Sort the suffixes based on rank and next rank using the custom comparator
Arrays.sort(suffixes);
}

// Step 4: Create the final suffix array and return it
int[] result = new int[n];
for (int i = 0; i < n; i++) {
result[i] = suffixes[i].idx;
}
return result;
}

How to Spot These Problems

You can identify Suffix Array problems if the problem requires you to:

  • Find the longest or shortest suffix, prefix, or substring.