Faiss wrapper
FaissIndex
A class for creating and querying a Faiss index.
Attributes:
Name | Type | Description |
---|---|---|
index | faiss.Index | The Faiss index object used for similarity search. |
reverse_index | dict | A dictionary mapping document IDs to their corresponding index in the Faiss index. This allows for quick lookup of document vectors during query time. |
Source code in src/utils/Faiss.py
__init__()
add_vectors(vectors, contents)
Add vectors and their corresponding contents to the Faiss index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vectors | np.ndarray | The vectors to add to the index. Shape should be (num_vectors, vector_dim). | required |
contents | List[str] | The corresponding contents for each vector. Length should be num_vectors. | required |
Raises:
Type | Description |
---|---|
ValueError | If the length of contents does not match the number of vectors. |
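
The length check described above can be sketched as follows. This is a minimal illustration, not the code in src/utils/Faiss.py: `validate_batch` is a hypothetical helper name, and the extra 2-D shape check is an assumption drawn from the documented shape (num_vectors, vector_dim).

```python
from typing import List

import numpy as np


def validate_batch(vectors: np.ndarray, contents: List[str]) -> None:
    """Hypothetical helper mirroring the documented add_vectors contract."""
    # Assumption: reject anything that is not (num_vectors, vector_dim).
    if vectors.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {vectors.shape}")
    # Documented rule: exactly one content string per vector.
    if len(contents) != vectors.shape[0]:
        raise ValueError(
            f"got {vectors.shape[0]} vectors but {len(contents)} contents"
        )


# Three 4-dimensional vectors with three matching content strings.
vectors = np.zeros((3, 4), dtype=np.float32)
validate_batch(vectors, ["first", "second", "third"])  # passes silently
```

Faiss itself expects float32 input, which is why the example array uses that dtype.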
load(index_file)
Load the Faiss index and its corresponding reverse index from disk.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index_file | Path | The path to the Faiss index file. | required |
Raises:
Type | Description |
---|---|
ValueError | If the index file or reverse index file is not found. |
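
The missing-file behaviour above might look roughly like the sketch below. Where the reverse index is actually stored is not documented here, so `reverse_index_path` and its `.reverse.json` suffix are assumptions made purely for illustration, as is the helper name `require_index_files`.

```python
from pathlib import Path


def reverse_index_path(index_file: Path) -> Path:
    """Assumed sidecar location; the real wrapper may use a different name."""
    return index_file.with_name(index_file.name + ".reverse.json")


def require_index_files(index_file: Path) -> None:
    """Raise ValueError if either file is missing, as load() is documented to."""
    for path in (index_file, reverse_index_path(index_file)):
        if not path.is_file():
            raise ValueError(f"file not found: {path}")
```

Checking both files before reading either one keeps the index and the reverse index from being loaded out of step.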
save(index_file)
Save the Faiss index to disk.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index_file | Path | The path to the Faiss index file. | required |
search(query_embedding)
Search the Faiss index for the nearest neighbors of a given query embedding.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query_embedding | np.ndarray | An array of shape (1, D) containing the query embedding, where D is the dimensionality of the embeddings used to build the index. | required |
Returns:
Type | Description |
---|---|
List[str] | A list of strings, each the content of a document among those closest to the query embedding in the embedding space. |
Raises:
Type | Description |
---|---|
AssertionError | If the index has not been loaded yet. |
AssertionError | If the dimensionality of the query embedding does not match that of the index. |
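
To illustrate the search contract, here is a minimal sketch that uses a NumPy brute-force scan as a stand-in for the Faiss index. It is not the wrapper's implementation: `nearest_contents`, its `k` parameter, and the use of squared L2 distance are all assumptions, since the documentation above does not state the metric or how many neighbours are returned.

```python
from typing import List

import numpy as np


def nearest_contents(
    query_embedding: np.ndarray,
    vectors: np.ndarray,
    contents: List[str],
    k: int = 2,  # hypothetical: number of neighbours to return
) -> List[str]:
    """Brute-force stand-in for an index search over stored vectors."""
    # Same preconditions the documentation lists for search().
    assert vectors.shape[0] > 0, "index is empty or not loaded"
    assert query_embedding.shape == (1, vectors.shape[1]), "dimension mismatch"
    # Squared L2 distance from the query to every stored vector.
    distances = np.sum((vectors - query_embedding) ** 2, axis=1)
    # Positions of the k closest vectors, nearest first.
    nearest = np.argsort(distances)[:k]
    # Map index positions back to the stored contents.
    return [contents[i] for i in nearest]


vectors = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]], dtype=np.float32)
contents = ["origin", "unit-x", "far away"]
query = np.array([[0.9, 0.1]], dtype=np.float32)
print(nearest_contents(query, vectors, contents))  # ['unit-x', 'origin']
```

The last step, turning index positions back into content strings, is the part a Faiss-backed version would also need, because Faiss returns integer positions rather than the stored documents.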