IVF-HNSW
 All Classes Functions Variables Typedefs Pages
Public Member Functions | Public Attributes | Protected Attributes | List of all members
ivfhnsw::IndexIVF_HNSW_Grouping Struct Reference
Inheritance diagram for ivfhnsw::IndexIVF_HNSW_Grouping:
ivfhnsw::IndexIVF_HNSW

Public Member Functions

 IndexIVF_HNSW_Grouping (size_t dim, size_t ncentroids, size_t bytes_per_code, size_t nbits_per_idx, size_t nsubcentroids)
 
void add_group (size_t group_idx, size_t group_size, const float *x, const idx_t *ids)
 
void search (size_t k, const float *x, float *distances, long *labels)
 
void write (const char *path_index)
 Write index to the path.
 
void read (const char *path_index)
 Read index from the path.
 
void train_pq (size_t n, const float *x)
 
void compute_inter_centroid_dists ()
 Compute distances between the group centroid and its <subc> nearest neighbors in the HNSW graph.
 
- Public Member Functions inherited from ivfhnsw::IndexIVF_HNSW
 IndexIVF_HNSW (size_t dim, size_t ncentroids, size_t bytes_per_code, size_t nbits_per_idx)
 
void build_quantizer (const char *path_data, const char *path_info, const char *path_edges, size_t M=16, size_t efConstruction=500)
 
void assign (size_t n, const float *x, idx_t *labels, size_t k=1)
 
virtual void add_batch (size_t n, const float *x, const idx_t *xids, const idx_t *precomputed_idx=nullptr)
 
void compute_centroid_norms ()
 Compute norms of the HNSW vertices.
 
void rotate_quantizer ()
 For correct search using OPQ encoding rotate points in the coarse quantizer.
 

Public Attributes

size_t nsubc
 Number of sub-centroids per group.
 
bool do_pruning
 Turn on/off pruning.
 
std::vector< std::vector< idx_t > > nn_centroid_idxs
 Indices of the <nsubc> nearest centroids for each centroid.
 
std::vector< std::vector< idx_t > > subgroup_sizes
 Sizes of sub-groups for each group.
 
std::vector< float > alphas
 Coefficients that determine the location of sub-centroids.
 
- Public Attributes inherited from ivfhnsw::IndexIVF_HNSW
size_t d
 Vector dimension.
 
size_t nc
 Number of centroids.
 
size_t code_size
 Code size per vector in bytes.
 
hnswlib::HierarchicalNSW * quantizer
 Quantizer that maps vectors to inverted lists (HNSW [Y.Malkov])
 
faiss::ProductQuantizer * pq
 Produces the residual codes.
 
faiss::ProductQuantizer * norm_pq
 Produces the norm codes of reconstructed base vectors.
 
faiss::LinearTransform * opq_matrix
 Rotation matrix for OPQ encoding.
 
bool do_opq
 Turn on/off OPQ encoding.
 
size_t nprobe
 Number of probes at search time.
 
size_t max_codes
 Max number of codes to visit to do a query.
 
std::vector< std::vector< idx_t > > ids
 Inverted lists for indexes.
 
std::vector< std::vector
< uint8_t > > 
codes
 PQ codes of residuals.
 
std::vector< std::vector
< uint8_t > > 
norm_codes
 PQ codes of norms of reconstructed base vectors.
 

Protected Attributes

std::vector< float > query_centroid_dists
 Distances to the coarse centroids. Used for distance computation between a query and base points.
 
std::vector< std::vector< float > > inter_centroid_dists
 Distances between coarse centroids and their sub-centroids.
 
- Protected Attributes inherited from ivfhnsw::IndexIVF_HNSW
std::vector< float > norms
 L2 square norms of reconstructed base vectors.
 
std::vector< float > centroid_norms
 L2 square norms of coarse centroids.
 
std::vector< float > precomputed_table
 Size pq.M * pq.ksub.
 

Additional Inherited Members

- Public Types inherited from ivfhnsw::IndexIVF_HNSW
typedef uint32_t idx_t
 all indices are this type
 
- Protected Member Functions inherited from ivfhnsw::IndexIVF_HNSW
float pq_L2sqr (const uint8_t *code)
 L2 sqr distance function for PQ codes.
 

Member Function Documentation

void ivfhnsw::IndexIVF_HNSW_Grouping::add_group ( size_t  group_idx,
size_t  group_size,
const float *  x,
const idx_t ids 
)

Add <group_size> vectors of dimension <d> from the <group_idx>-th group to the index.

Parameters
group_idxindex of the group
group_sizenumber of base vectors in the group
xbase vectors to add (size: group_size * d)
idsids to store for the vectors (size: groups_size)
void ivfhnsw::IndexIVF_HNSW_Grouping::search ( size_t  k,
const float *  x,
float *  distances,
long *  labels 
)
virtual

Search procedure

During the IVF-HNSW-PQ + Grouping search we compute

d = || x - y_S - y_R ||^2

where x is the query vector, y_S the coarse sub-centroid, y_R the refined PQ centroid. The expression can be decomposed as:

d = (1 - α) * (|| x - y_C ||^2 - || y_C ||^2) + α * (|| x - y_N ||^2 - || y_N ||^2) + || y_S + y_R ||^2 - 2 * (x|y_R)


term 1 term 2 term 3 term 4

We use the following decomposition:

  • term 1 is the distance to the coarse centroid, that is computed during the 1st stage search in the HNSW graph, minus the norm of the coarse centroid.
  • term 2 is the distance to y_N one of the <subc> nearest centroids, that is used for the sub-centroid computation, minus the norm of this centroid.
  • term 3 is the L2 norm of the reconstructed base point, that is computed at construction time, quantized using separately trained product quantizer for such norms and stored along with the residual PQ codes.
  • term 4 is the classical non-residual distance table.

Norms of centroids are precomputed and saved without compression, as their memory consumption is negligible. If it is necessary, the norms can be added to the term 3 and compressed to byte together. We do not think that it will lead to considerable decrease in accuracy.

Since y_R defined by a product quantizer, it is split across sub-vectors and stored separately for each sub-vector.

Reimplemented from ivfhnsw::IndexIVF_HNSW.

void ivfhnsw::IndexIVF_HNSW_Grouping::train_pq ( size_t  n,
const float *  x 
)
virtual

Train product quantizers

Parameters
nnumber of training vectors of dimension d
xlearn vectors, size n * d

Reimplemented from ivfhnsw::IndexIVF_HNSW.


The documentation for this struct was generated from the following files: