Suffix Array
This page provides links to solutions that use the Suffix Array data structure.
Overview
A suffix array will contain integers that represent the starting indexes of the all the suffixes of a given string, after the suffixes are sorted.
Let's take an example for a string . All suffixes are as follows:
- - index
- - index
- - index
- - index
- - index
After sorting these strings we get:
- - index
- - index
- - index
- - index
- - index
Therefore the suffix array for string will be: .
Sorting string directly will take time, algorithm below does it using time.
Build Suffix Array
Let's create suffix array for input string . All the suffixes will be as below:
To generate the suffix array, we will begin by defining a suffix class that implements the comparable interface to compare two suffixes, as shown below:
- Java
// Suffix class to store the originalIndex, rank, and nextRank of a suffix
private static class Suffix implements Comparable<Suffix> {
int originalIndex;
int rank;
int nextRank;
// Constructor to initialize the originalIndex, rank, and nextRank
Suffix(int originalIndex, int rank, int nextRank) {
this.originalIndex = originalIndex;
this.rank = rank;
this.nextRank = nextRank;
}
// Compare two suffixes
@Override
public int compareTo(Suffix other) {
int c = Integer.compare(this.rank, other.rank);
return c == 0 ? Integer.compare(this.nextRank, other.nextRank): c;
}
}
Let's create a suffix object for each suffixes as below:
suffix_array_index | suffix | original_index | rank | next_rank |
---|---|---|---|---|
The original index in the suffix object represents the position of the suffix in the string , while rank and next rank refer to the ranks of the corresponding alphabet.
Once suffix objects are ready, sort them by rank first and then by next rank as below:
suffix_array_index | suffix | original_index | rank | next_rank |
---|---|---|---|---|
After sorting, the suffixes are no longer in their original order. To keep track of their new positions, we will create an array to stores each suffix's old index to new index mapping.
For example, the suffix which was initially at index in the original string , is now at index in the sorted suffix array above.
Now, we iterate through the sorted suffix objects and reassign a rank to each suffix based on its comparison with the previous suffix. The goal is to group suffixes with the same first character (or first two characters, or first four characters, and so on) together. Also, we set mapping for each suffix.
suffix_array_index | suffix | original_index | rank | next_rank | idx_mapping |
---|---|---|---|---|---|
Next, we assign a next rank to each suffix by offsetting its index by a power of and retrieving the rank of the suffix at that new position, unless the offset exceeds the bounds, in which case we assign .
suffix_array_index | suffix | original_index | rank | next_rank | idx_mapping |
---|---|---|---|---|---|
For example:
First suffix in sorted array has index of , adding first power of to it gives us , which is out of bounds for given string , so we assign next rank as .
Second suffix in sorted array has index of , adding first power of to it gives us , now to get the new index of suffix which was originally at we lookup idxArray and get . suffix at index has a rank of , which becomes next rank for suffix .
Now, that we have assigned rank and next rank, we sort the suffixes again. We repeat this, doubling the number of characters compared in each iteration, until it becomes greater than or equal to length of the string .
After sorting again, suffixes will look like below:
suffix_array_index | suffix | original_index | rank | next_rank | idx_mapping |
---|---|---|---|---|---|
Next, reassign rank and index mapping to suffix objects as below:
suffix_array_index | suffix | original_index | rank | next_rank | idx_mapping |
---|---|---|---|---|---|
Now, we reassign next rank to suffix objects as below:
suffix_array_index | suffix | original_index | rank | next_rank | idx_mapping |
---|---|---|---|---|---|
In this iteration we have used next power of i.e. for assigning next rank.
Finally, we sort suffix object as below:
suffix_array_index | suffix | original_index | rank | next_rank | idx_mapping |
---|---|---|---|---|---|
So suffix array for will be: .
- Java
// Function to construct the suffix array
public static int[] buildSuffixArray(char[] s) {
int n = s.length;
Suffix[] suffixes = new Suffix[n];
// Step 1: Create a Suffix object for each character in the input array
for (int i = 0; i < n; i++) {
suffixes[i] = new Suffix(i, s[i] - 'a', (i + 1 < n) ? s[i + 1] - 'a' : -1);
}
// Step 2: Sort the suffixes based on rank and next rank using a custom comparator
Arrays.sort(suffixes);
// An array to store the suffix indices in their sorted order
int[] idxMapping = new int[n];
// Step 3: Assign ranks and create the suffix index array
for (int k = 2; k <= n; k *= 2) {
int rank = 0;
int previousRank = suffixes[0].rank;
suffixes[0].rank = rank;
idxMapping[suffixes[0].idx] = 0;
// Step 3.1: Assign ranks based on suffix comparisons
for (int i = 1; i < n; i++) {
if (suffixes[i].rank != previousRank || suffixes[i - 1].nextRank != suffixes[i].nextRank) {
rank++;
previousRank = suffixes[i].rank;
}
suffixes[i].rank = rank;
idxMapping[suffixes[i].idx] = i;
}
// Step 3.2: Assign nextRank for each suffix
for (int i = 0; i < n; i++) {
int nextIndex = suffixes[i].idx + k;
suffixes[i].nextRank = (nextIndex < n) ? suffixes[idxMapping[nextIndex]].rank : -1;
}
// Step 3.3: Sort the suffixes based on rank and next rank using the custom comparator
Arrays.sort(suffixes);
}
// Step 4: Create the final suffix array and return it
int[] result = new int[n];
for (int i = 0; i < n; i++) {
result[i] = suffixes[i].idx;
}
return result;
}
How to Spot These Problems
You can identify Suffix Array problems if the problem requires you to:
- Find the longest or shortest suffix, prefix, or substring.