# Intersection code notes
## `WordList` object
What does it represent?
* Suggested words list? No, because that's returned in pointers from the intersection function.
* The Peter Broda, etc. word list? No, becuase that's all contained in the `resource` property.
* A object that handles word list processing? I think so.
* Why is there only one `filter` property?
* A single `WordList` object represents a single filter.
* You only ever need one or two filters.
* For one filter, that is the `MATCH` mode.
* For two filters, that is the `INTERSECTION` mode,
* But the functions for both of these accept `filter` parameters, instead of using the property. How come?
* The match and intersect functions almost seem more like utility functions. Do they make use of the `WordList` object being passed in, beyond the fact that they need the `resource`?
* Are there multiple `WordList` objects running in multiple threads, concurrently? The docs suggest this.
* Are `WordList` objects ever reused by changing the `filter` and `WordListMode`?
## `WordList` resource
### Word List section

### Filter Fragments section

==NOTE: *Word 1*, *Word 2*, etc. are offset pointers to the corresponding word, in the Word List section.==
==CORRECTION: Filter Index section should be from `A? index` to `...???Z index`.==
## `word_list_find_intersection`
### Variables
* Filters:
* `filter1`
* `filter2`
* Example: `C?T?`, `????`
* Filter offsets (of intersecting cell):
* `pos1`
* `pos2`
* Number of non-wildcards in filters (set by validation function):
* `n_chars1`
* `n_chars2`
* 0-1 are simple cases.
* Zero wildcards means `????`.
### Steps
1. Sanity checks:
* At least one filter is non-null.
* Both offsets are smaller than the filter length.
* If both filters exist, intersecting chars must be the same.
1. If intersecting char is a letter (non-wildcard):
* This is a simple case.
* For each filter, use the `MATCH` mode to generate the word suggestions.
1. If one or both filters are invalid:
* Ignore any invalid filters.
* If there's a valid filter:
* ==Phase 2 TODO==
1. If both filters are valid:
* Summary of phases:
* ==Phase 1: Return a charset with all the possible letters the intersecting cell can have, for filter 1.==
* ==Phase 2: Get all the possible words for filter 2.==
* ==Phase 3: Get all the possible words for filter 3.==
* Phase 1:
* If `n_chars1` is 0 (simple case):
* Filter looks like `????`.
* Look for fragments with the same length, and return a charset of all the letters that appear at `pos1`.
* If `n_chars1` is not 0:
* `get_filter_intersect_list`:
* For each letter in the filter:
* Create `FilterFragment` for that letter.
* `word_list_lookup_fragment`: get `FilterFragmentList` from the `FilterFragment`.
* ==NOTE: `FilterFragmentList` = filter fragment index.==
* How can the list have more than 1 element? How could something like `?A?` correspond to multiple fragments?
* `FilterFragmentList.len` refers to the number of words that match that filter fragment.
* It's not a list of filter fragments; it's a list of words that match the filter fragment.
* `FilterFragmentList` is essentially an object that represents a single *filter index* in the resource (i.e., an offset and length for a filter fragment).
* `intersect_list` = `intersect_list ∩ FilterFragmentList`.
* Return `intersect_list`.
* So essentially, this function returns the offsets of the words that match the given filter.
* Why not just use `word_list_get_array_list`? Probably because it's much more expensive. We only need the offsets; we don't need an array of the actual words themselves.
* Create a charset.
* For each word, add the intersecting letter to the charset.
* Return the charset.
* Phase 2:
* Repeat phase 1, but for filter 2. Get a charset of the possible letters the intersecting cell can have.
* Get the final charset: intersection between the filter 1 and filter 2 charsets.
* `word_array` = all the words that match `filter 2` and whose intersecting letter is in the final charset.
* Return `word_array`.
* Phase 3:
* `word_array` = all the words that match `filter 1` and whose intersecting letter is in the final charset.
* Return `word_array`.
# The bug
* Only occurs when the active slot is completely empty.
* The crossing slots may or may not contain letters in them (except for the crossing cell).
* So this bug occurs in this code path:
```
1.
word_list_find_intersection
```
```
2.
calculate_intersect_phase3
```
```
3.
if (n_chars == 0)
calculate_empty_filter_pos_chars
```
## `calculate_empty_filter_pos_chars`
* For each character in the charset:
* Create a `FragmentList` for that character.
* For each word in the `FragmentList`:
* Create a `WordIndex` for that word.
* Append the `WordIndex` to the return array.
So the function adds the words of the first fragment into the return array. Then it adds the words of the second fragment, and so on. And the order of the fragments is based on the order of the characters in the charset.
## Why does the bug occur?
Assume we are one a completely empty 3x3 grid, with the cursor on the top left corner:
```
▄ . .
. . .
. . .
```
This slot is compeltely unrestricted. So the charset is all the letters of the alphabet (*A* to *Z*).
The outer for loop iterates through each character of the charset, and it does this alphabetically. (Not sure if this is enforced by the `IpuzCharset` object itself, or if it's because we always add the letters alphabetically.)
Because of this, the fragments are sorted alphabetically, in the return array:
```
A?? words
B?? words
Z?? words
```
But if we are on the top-middle cell:
```
. ▄ .
. . .
. . .
```
Then the fragments in alphabetical order look like this:
```
?A? words
?B? words
...
?Z? words
```
Which is what causes the bug.
## Intra-fragment word order
Within each fragment subsection of the return array, the words are sorted in standard alphabetical order.
The ordering is based on the ordering of `FragmentList.data`:
```
for (guint i = 0; i < fragment_list.len; i++)
{
word_index.index = (gint) fragment_list.data[i];
g_array_append_val (word_array, word_index);
}
```
And `FragmentList.data` is mmaped to the word list resource.
==But aren't the words for each filter fragment stored in arbitrary order? What's causing the alphabetical order?==
## `word_list_lookup_fragment`
This function creates a `FilterFragmentList` from a `FilterFragment`:
```
static gboolean
word_list_lookup_fragment (WordList *word_list,
FilterFragment fragment,
FilterFragmentList *list)
{
gint offset;
guint frag_offset;
*****************************************************
* Is this the offset for the filter fragment index? *
*****************************************************
offset = word_index_fragment_index (word_list->index->min_length,
ipuz_charset_get_n_chars (word_list->index->charset),
fragment);
***************************************************************************
* How come we multiply the offset by the size of a filter fragment index? *
***************************************************************************
offset *= (sizeof (guint) + sizeof (gushort));
offset += word_list->index->letter_index_offset;
frag_offset = *(guint *) (word_list->data + offset);
list->data = (gushort *)(word_list->data + frag_offset);
list->len = *(gushort *)(word_list->data + offset + sizeof (guint));
return TRUE;
}
```
## The fix
The fix is to sort the list somehow.
The Phase 3 function returns the list of words as a `WordArray`. But apparently, `WordArray` should already be sorted, because it does that automatically?
```
/* WordArrays are a simple list of words represented as
* WordIndex. They are unique: inserting a word multiple times results
* in the word only existing once, ****and the array is always sorted.****
* It is used to keep a list of words we don't want to search through, as
* well as used internally within the word-list.
```
Then how come it isn't sorted in the UI?
## `WordArray` sorting
`WordArray` is not sorted properly, or at least it's not sorted by the `WordIndex` elements' offsets

We also want to sort it by word score first, and then alphabetically. But that is not happening here. The sorting does not consider word score at all.
If we can get `WordArray` to sort itself by offset, then the list would be in the right order. Because the word list resource already has the appropriate sorting (word score, then alphabetical).