Suffix Array
This page provides links to solutions that use the Suffix Array data structure.
Overview
A suffix array contains integers that represent the starting indices of the suffixes of a given string, after the suffixes have been sorted.
Let's take an example for a string . All suffixes of given string will be as follows:
- - index
- - index
- - index
- - index
- - index
After sorting these strings we get:
- - index
- - index
- - index
- - index
- - index
Therefore the suffix array for string will be: .
Build Suffix Array
Let's create suffix array for input string . All the suffixes of input string will be as below:
To generate the suffix array, we will begin by defining a suffix class that implements the comparable interface to compare two suffixes, as shown below:
- Java
// Suffix class to store the originalIndex, rank, and nextRank of a suffix
private static class Suffix implements Comparable<Suffix> {
int originalIndex;
int rank;
int nextRank;
// Constructor to initialize the originalIndex, rank, and nextRank
Suffix(int originalIndex, int rank, int nextRank) {
this.originalIndex = originalIndex;
this.rank = rank;
this.nextRank = nextRank;
}
// Compare two suffixes
@Override
public int compareTo(Suffix other) {
int c = Integer.compare(this.rank, other.rank);
return c == 0 ? Integer.compare(this.nextRank, other.nextRank): c;
}
}
Let's create a suffix object for each suffixes as below:
suffix | original_index | rank | next_rank |
---|---|---|---|
The original index in the suffix object above represents the position of the suffix in the string . For example suffix starts at index in a string.
Rank and next rank represents ranks of first and second character of suffix in english alphbets. For example if character is "a" rank will be , if character is "b" rank will be , if character is "c" rank will be and so on.
Once suffix objects are ready, sort them by rank first and then by next rank as below:
suffix | original_index | rank | next_rank |
---|---|---|---|
Now, our goal is to group suffixes that have the same first two characters together, by assigning them same rank.
For example, in suffix object table above, suffix and next suffix do not have the same first two characters, so we will assign different rank to them, whereas suffix and next suffix do have same first two characters, so we will assign same rank to them. And so on.
The logic for reassigning rank will be as below:
- Java
// first suffix will be given a rank of 0.
int rank = 0;
// previousRank variable is used here because we override existing rank of each suffix.
int previousRank = suffixes[0].rank;
// overriding rank of 0th suffix.
suffixes[0].rank = rank;
// idxMapping stores mapping of suffix's original index to next index.
idxMapping[suffixes[0].originalIndex] = 0;
for (int i = 1; i < n; i++) {
// compare rank and next rank of current and previous suffix.
if (previousRank != suffixes[i].rank || suffixes[i - 1].nextRank != suffixes[i].nextRank) {
// if rank is not same
// store current suffix's old rank as previous rank and
// increment rank
previousRank = suffixes[i].rank;
rank++;
}
// override rank of current suffix
suffixes[i].rank = rank;
// store mapping of current suffix's original index to new index
idxMapping[suffixes[i].originalIndex] = i;
}
For readability, I have retained old rank and old next_rank values in suffix object table. After overriding rank for each suffix, our suffix objects will looks like below:
suffix | original_index | old_rank | old_next_rank | rank | next_rank | idx_mapping |
---|---|---|---|---|---|---|
In above table, idx_mapping indicates current index of each suffix given original index. For example, idx_mapping[5] = 0 indicates suffix which was at index in original string is now at index.
Up until now we have compared first characters of each suffix using rank and next rank. Now, we will compare next two characters of each suffix and to achieve this we override next rank for each suffix.
To calculate next rank we add to suffix's original index and retrieve the rank of the suffix that was originally at that index.
For example, first suffix have original index of , adding to it gives us , which is out of bounds for given string , so we assign next rank as .
Similarly, second suffix have original index of , adding to it gives us . Originally suffix was at index in given string , so we retrieve rank of suffix which is and assign it to suffix .
The logic for reassinging next rank will be as below:
- Java
for (int i = 0; i < n; i++) {
// k here will be multiples of 2
int nextIndex = suffixes[i].originalIndex + k;
// new index of suffix array which was original at nextIndex
int newIndex = idxMapping[nextIndex];
// assign next rank
suffixes[i].nextRank = (nextIndex < n) ? suffixes[newIndex].rank : -1;
}
After reassigning next rank for each suffix, our suffix objects will look like below:
suffix | original_index | old_rank | old_next_rank | rank | next_rank | idx_mapping |
---|---|---|---|---|---|---|
Now, that we have reassigned rank and next rank, we sort the suffixes again. After sorting again, suffixes will look like below:
suffix | original_index | rank | next_rank | idx_mapping |
---|---|---|---|---|
Upto this point suffixes are sorted by first characters, in the next iteration we will consider first characters. To achive that we calculate next rank by adding to current suffix's original index.
Firstly, we will repeat the process of reassigning the rank for each suffix. After reassigning rank our suffixes objects will look like below:
suffix | original_index | old_rank | old_new_rank | rank | next_rank | idx_mapping |
---|---|---|---|---|---|---|
Next we will repeat process of reassigning the next rank for each suffix. After reassigning next rank our suffix objects will look like below:
suffix | original_index | old_rank | old_new_rank | rank | next_rank | idx_mapping |
---|---|---|---|---|---|---|
Finally, we sort suffix objects as below:
suffix_array_index | suffix | original_index | rank | next_rank | idx_mapping |
---|---|---|---|---|---|
At this point, the suffixes are sorted by the first characters and can be considered fully sorted because none of the suffixes has a length greater than .
Now, to get the suffix array we just retrieve original index for each suffix and get suffix array as below:
- Java
// function to construct the suffix array
public static int[] buildSuffixArray(char[] s) {
int n = s.length;
Suffix[] suffixes = new Suffix[n];
// Step 1: Create a Suffix object for each character in the input array
for (int i = 0; i < n; i++) {
suffixes[i] = new Suffix(
i, // original index
s[i] - 'a', // rank
(i + 1 < n) ? s[i + 1] - 'a' : -1 // next rank
);
}
// Step 2: Sort the suffixes based on rank and next rank using a custom comparator
Arrays.sort(suffixes);
// An array to store the suffix indices in their sorted order
int[] idxMapping = new int[n];
// Step 3: Assign ranks and create the suffix index array
for (int k = 2; k <= n; k *= 2) {
int rank = 0;
int previousRank = suffixes[0].rank;
suffixes[0].rank = rank;
idxMapping[suffixes[0].originalIndex] = 0;
// Step 3.1: Assign ranks based on suffix comparisons
for (int i = 1; i < n; i++) {
if (previousRank != suffixes[i].rank || suffixes[i - 1].nextRank != suffixes[i].nextRank) {
previousRank = suffixes[i].rank;
rank++;
}
suffixes[i].rank = rank;
idxMapping[suffixes[i].originalIndex] = i;
}
// Step 3.2: Assign nextRank for each suffix
for (int i = 0; i < n; i++) {
int nextIndex = suffixes[i].originalIndex + k;
suffixes[i].nextRank = (nextIndex < n) ? suffixes[idxMapping[nextIndex]].rank : -1;
}
// Step 3.3: Sort the suffixes based on rank and next rank using the custom comparator
Arrays.sort(suffixes);
}
// Step 4: Create the final suffix array and return it
int[] result = new int[n];
for (int i = 0; i < n; i++) {
result[i] = suffixes[i].originalIndex;
}
return result;
}
Complexity
Let's say there are characters in an input string.
Time Complexity
- Assigning ranks takes time.
- Sorting ranks of suffixes takes time.
- These steps are repeated times.
So total time complexity to build suffix array will be:
Time complexity to build suffix array can be reduced to by using radix sort for sorting ranks.
Space Complexity
Algorithm uses linear space for storing ranks, so space complexit to build suffix array will be:
How to Spot These Problems
You can identify Suffix Array problems if the problem requires you to:
- Find the longest or shortest suffix, prefix, or substring.