Skip to main content

Suffix Array

This page provides links to solutions that use the Suffix Array data structure.

Overview

A suffix array contains integers that represent the starting indices of the suffixes of a given string, after the suffixes have been sorted.

Let's take an example for a string s = abaab\text{s = abaab} . All suffixes of given string will be as follows:

  • abaab\text{abaab} - 0th0^{\text{th}} index
  • baab\text{baab} - 1st1^{\text{st}} index
  • aab\text{aab} - 2nd2^{\text{nd}} index
  • ab\text{ab} - 3rd3^{\text{rd}} index
  • b\text{b} - 4th4^{\text{th}} index

After sorting these strings we get:

  • aab\text{aab} - 2nd2^{\text{nd}} index
  • ab\text{ab} - 3rd3^{\text{rd}} index
  • abaab\text{abaab} - 0th0^{\text{th}} index
  • b\text{b} - 4th4^{\text{th}} index
  • baab\text{baab} - 1st1^{\text{st}} index

Therefore the suffix array for string s\text{s} will be: [2,3,0,4,1][2, 3, 0, 4, 1].

Build Suffix Array

Let's create suffix array for input string s=banana\text{s=banana}. All the suffixes of input string will be as below:

  • banana\text{banana}
  • anana\text{anana}
  • nana\text{nana}
  • ana\text{ana}
  • na\text{na}
  • a\text{a}

To generate the suffix array, we will begin by defining a suffix class that implements the comparable interface to compare two suffixes, as shown below:

// Suffix class to store the originalIndex, rank, and nextRank of a suffix
private static class Suffix implements Comparable<Suffix> {
int originalIndex;
int rank;
int nextRank;

// Constructor to initialize the originalIndex, rank, and nextRank
Suffix(int originalIndex, int rank, int nextRank) {
this.originalIndex = originalIndex;
this.rank = rank;
this.nextRank = nextRank;
}

// Compare two suffixes
@Override
public int compareTo(Suffix other) {
int c = Integer.compare(this.rank, other.rank);
return c == 0 ? Integer.compare(this.nextRank, other.nextRank): c;
}
}

Let's create a suffix object for each suffixes as below:

suffixoriginal_indexranknext_rank
banana\text{banana}001100
anana\text{anana}11001313
nana\text{nana}22131300
ana\text{ana}33001313
na\text{na}44131300
a\text{a}55001-1
info

The original index in the suffix object above represents the position of the suffix in the string s\text{s}. For example suffix banana\text{banana} starts at index 00 in a string.

Rank and next rank represents ranks of first and second character of suffix in english alphbets. For example if character is "a" rank will be 00, if character is "b" rank will be 11, if character is "c" rank will be 22 and so on.

Once suffix objects are ready, sort them by rank first and then by next rank as below:

suffixoriginal_indexranknext_rank
a\text{a}55001-1
anana\text{anana}11001313
ana\text{ana}33001313
banana\text{banana}001100
nana\text{nana}22131300
na\text{na}44131300

Now, our goal is to group suffixes that have the same first two characters together, by assigning them same rank.

For example, in suffix object table above, suffix a\text{a} and next suffix anana\text{anana} do not have the same first two characters, so we will assign different rank to them, whereas suffix anana\text{anana} and next suffix ana\text{ana} do have same first two characters, so we will assign same rank to them. And so on.

The logic for reassigning rank will be as below:

// first suffix will be given a rank of 0.
int rank = 0;

// previousRank variable is used here because we override existing rank of each suffix.
int previousRank = suffixes[0].rank;

// overriding rank of 0th suffix.
suffixes[0].rank = rank;

// idxMapping stores mapping of suffix's original index to next index.
idxMapping[suffixes[0].originalIndex] = 0;

for (int i = 1; i < n; i++) {
// compare rank and next rank of current and previous suffix.
if (previousRank != suffixes[i].rank || suffixes[i - 1].nextRank != suffixes[i].nextRank) {
// if rank is not same
// store current suffix's old rank as previous rank and
// increment rank
previousRank = suffixes[i].rank;
rank++;
}

// override rank of current suffix
suffixes[i].rank = rank;

// store mapping of current suffix's original index to new index
idxMapping[suffixes[i].originalIndex] = i;
}

For readability, I have retained old rank and old next_rank values in suffix object table. After overriding rank for each suffix, our suffix objects will looks like below:

suffixoriginal_indexold_rankold_next_rankranknext_rankidx_mapping
a\text{a}55001-10\textbf{0}1-13\textbf{3}
anana\text{anana}110013131\textbf{1}13131\textbf{1}
ana\text{ana}330013131\textbf{1}13134\textbf{4}
banana\text{banana}0011002\textbf{2}002\textbf{2}
nana\text{nana}221313003\textbf{3}005\textbf{5}
na\text{na}441313003\textbf{3}000\textbf{0}
info

In above table, idx_mapping indicates current index of each suffix given original index. For example, idx_mapping[5] = 0 indicates suffix a\text{a} which was at index 55 in original string banana\text{banana} is now at 0th0^\text{th} index.

Up until now we have compared first 22 characters of each suffix using rank and next rank. Now, we will compare next two characters of each suffix and to achieve this we override next rank for each suffix.

To calculate next rank we add 22 to suffix's original index and retrieve the rank of the suffix that was originally at that index.

For example, first suffix a\text{a} have original index of 55, adding 22 to it gives us 77, which is out of bounds for given string banana\text{banana}, so we assign next rank as 1-1.

Similarly, second suffix anana\text{anana} have original index of 11, adding 22 to it gives us 33. Originally suffix ana\text{ana} was at index 33 in given string banana\text{banana}, so we retrieve rank of suffix ana\text{ana} which is 11 and assign it to suffix anana\text{anana}.

The logic for reassinging next rank will be as below:

for (int i = 0; i < n; i++) {
// k here will be multiples of 2
int nextIndex = suffixes[i].originalIndex + k;
// new index of suffix array which was original at nextIndex
int newIndex = idxMapping[nextIndex];
// assign next rank
suffixes[i].nextRank = (nextIndex < n) ? suffixes[newIndex].rank : -1;
}

After reassigning next rank for each suffix, our suffix objects will look like below:

suffixoriginal_indexold_rankold_next_rankranknext_rankidx_mapping
a\text{a}55001-100-1\textbf{-1}33
anana\text{anana}11111313111\textbf{1}11
ana\text{ana}33111313110\textbf{0}44
banana\text{banana}002200223\textbf{3}22
nana\text{nana}223300333\textbf{3}55
na\text{na}44330033-1\textbf{-1}00

Now, that we have reassigned rank and next rank, we sort the suffixes again. After sorting again, suffixes will look like below:

suffixoriginal_indexranknext_rankidx_mapping
a\text{a}55001-133
ana\text{ana}33110044
anana\text{anana}11111111
banana\text{banana}00223322
na\text{na}44331-100
nana\text{nana}22333355
info

Upto this point suffixes are sorted by first 44 characters, in the next iteration we will consider first 88 characters. To achive that we calculate next rank by adding 44 to current suffix's original index.

Firstly, we will repeat the process of reassigning the rank for each suffix. After reassigning rank our suffixes objects will look like below:

suffixoriginal_indexold_rankold_new_rankranknext_rankidx_mapping
a\text{a}55001-10\textbf{0}1-13\textbf{3}
ana\text{ana}3311001\textbf{1}002\textbf{2}
anana\text{anana}1111112\textbf{2}115\textbf{5}
banana\text{banana}0022333\textbf{3}331\textbf{1}
na\text{na}44331-14\textbf{4}1-14\textbf{4}
nana\text{nana}2233335\textbf{5}330\textbf{0}

Next we will repeat process of reassigning the next rank for each suffix. After reassigning next rank our suffix objects will look like below:

suffixoriginal_indexold_rankold_new_rankranknext_rankidx_mapping
a\text{a}55001-100-1\textbf{-1}33
ana\text{ana}33110011-1\textbf{-1}22
anana\text{anana}111111220\textbf{0}55
banana\text{banana}002233334\textbf{4}11
na\text{na}44331-144-1\textbf{-1}44
nana\text{nana}22333355-1\textbf{-1}00

Finally, we sort suffix objects as below:

suffix_array_indexsuffixoriginal_indexranknext_rankidx_mapping
00a\text{a}55001-133
11ana\text{ana}33111-122
22anana\text{anana}11220055
33banana\text{banana}00334411
44na\text{na}44441-144
55nana\text{nana}22551-100

At this point, the suffixes are sorted by the first 88 characters and can be considered fully sorted because none of the suffixes has a length greater than 88.

Now, to get the suffix array we just retrieve original index for each suffix and get suffix array as below:

[5,3,1,0,4,2][5, 3, 1, 0, 4, 2]
// function to construct the suffix array
public static int[] buildSuffixArray(char[] s) {
int n = s.length;
Suffix[] suffixes = new Suffix[n];

// Step 1: Create a Suffix object for each character in the input array
for (int i = 0; i < n; i++) {
suffixes[i] = new Suffix(
i, // original index
s[i] - 'a', // rank
(i + 1 < n) ? s[i + 1] - 'a' : -1 // next rank
);
}

// Step 2: Sort the suffixes based on rank and next rank using a custom comparator
Arrays.sort(suffixes);

// An array to store the suffix indices in their sorted order
int[] idxMapping = new int[n];

// Step 3: Assign ranks and create the suffix index array
for (int k = 2; k <= n; k *= 2) {
int rank = 0;
int previousRank = suffixes[0].rank;
suffixes[0].rank = rank;
idxMapping[suffixes[0].originalIndex] = 0;

// Step 3.1: Assign ranks based on suffix comparisons
for (int i = 1; i < n; i++) {
if (previousRank != suffixes[i].rank || suffixes[i - 1].nextRank != suffixes[i].nextRank) {
previousRank = suffixes[i].rank;
rank++;
}
suffixes[i].rank = rank;
idxMapping[suffixes[i].originalIndex] = i;
}

// Step 3.2: Assign nextRank for each suffix
for (int i = 0; i < n; i++) {
int nextIndex = suffixes[i].originalIndex + k;
suffixes[i].nextRank = (nextIndex < n) ? suffixes[idxMapping[nextIndex]].rank : -1;
}

// Step 3.3: Sort the suffixes based on rank and next rank using the custom comparator
Arrays.sort(suffixes);
}

// Step 4: Create the final suffix array and return it
int[] result = new int[n];
for (int i = 0; i < n; i++) {
result[i] = suffixes[i].originalIndex;
}
return result;
}

Complexity

Let's say there are N\text{N} characters in an input string.

Time Complexity

  • Assigning ranks takes O(N)\text{O(N)} time.
  • Sorting ranks of N\text{N} suffixes takes O(N log N)\text{O(N log N)} time.
  • These steps are repeated O(log N)\text{O(log N)} times.

So total time complexity to build suffix array will be:

O(N log2N)\text{O}(\text{N log}^{2} \text{N})
info

Time complexity to build suffix array can be reduced to O(N log N)\text{O}(\text{N log } \text{N}) by using radix sort for sorting ranks.

Space Complexity

Algorithm uses linear space for storing ranks, so space complexit to build suffix array will be:

O(N)\text{O(N)}

How to Spot These Problems

You can identify Suffix Array problems if the problem requires you to:

  • Find the longest or shortest suffix, prefix, or substring.

Leetcode Problem Set