
params.yaml 2.0 KB

eimp:
  embeddings:
    vector_size: 50,100,200, 300 # comma-separated list of values, or a range, e.g. 50:500:10
    dataset_size: 2000, 5000
    train_size: 0.9 # proportion of the dataset used for training
  all_models_params:
    metric: ['euclidean', 'manhattan', 'chebyshev', 'cosine', 'angular']
  search_params:
    k: [1, 10] # number of nearest neighbors to return
  model:
    faiss:
      venc: [8, 16, 32, 64] # vector encoding
      indexes: ['Flat', 'HNSW32,Flat', 'IVF65536_HNSW32,Flat', 'HNSW32,SQ8', 'IVF65536_HNSW32,SQ8']
      nprobe: [1, 5, 10, 20, 40, 80, 100] # number of cells (out of nlist) visited per query
      nlist: [1, 5, 10, 20, 40, 80, 100] # number of cells
      M: [1, 10, 100, 1000] # number of neighbors used in the graph
    annoy:
      n_trees: [10, 50, 100, 200, 500, 1000]
    postgre:
      indexes: ['gist', 'spgist']
    KDTree:
      leaf_size: [10, 50, 100, 200, 500, 1000]
  estimation: # different views of the graphs to build
    aimed_param_values: # if a parameter below is not varied in a graph, only the dataframe rows matching the value given here are used
      dataset_size: 5000
      metric: 'euclidean'
      vector_size: 300
      k: 10
    x: ['k', "dataset_size", 'vector_size'] # parameters to use on the x axis
    y: # parameters to use on the y axis
      train: ["training_time", "saving_time", "model_size"]
      test: ["search_time", "loading_time"]
    lines: ['k', "metric", "model"] # parameters drawn as separate lines on the graph (hue/color/lines)
    facet: ['k', "dataset_size", 'vector_size', "metric"] # parameters used to build facets
    topn: 3 # show the best topn models in the report file
    relative_graphs: false # show relative data on graphs (e.g. model search time / fullscan search time)
    log10_graphs: true # show results on a semi-log scale
order_size_train:
  max: 10
  min: 2
random_model:
  random_seed: 2019
recommendations:
  n_items: 10
basket_tfidf_perceptron_model:
  random_seed: 10027
  params_grid: {"batch_size": 512, "epoch_count": 20, "lr": 0.001, "momentum": 0.8}
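
The comment on vector_size describes two ways of writing a value set: a comma-separated list, or a range such as 50:500:10. The sketch below shows one way such a spec could be parsed and combined into a grid of runs. It is not taken from the project: the helper names `parse_values` and `expand_grid` are hypothetical, and the range is assumed to mean start:stop:step with the stop value included.

```python
from itertools import product


def parse_values(spec):
    """Parse '50,100,200, 300' or '50:500:10' into a list of ints.

    Assumption: a range is written start:stop:step and includes stop.
    """
    text = str(spec).strip()
    if ":" in text:
        start, stop, step = (int(part) for part in text.split(":"))
        return list(range(start, stop + 1, step))
    return [int(part) for part in text.split(",") if part.strip()]


def expand_grid(grid):
    """Yield one dict per combination of the listed parameter values."""
    names = list(grid)
    for combo in product(*(grid[name] for name in names)):
        yield dict(zip(names, combo))


# Values copied from the embeddings and search_params entries of the file.
grid = {
    "vector_size": parse_values("50,100,200, 300"),
    "dataset_size": parse_values("2000, 5000"),
    "k": [1, 10],
}
runs = list(expand_grid(grid))
print(len(runs))  # 4 vector sizes * 2 dataset sizes * 2 values of k = 16
print(runs[0])
```

Passing the value through `str()` first lets the same helper accept a single number, such as the `dataset_size: 5000` entry under aimed_param_values, which a YAML loader would return as an integer rather than a string.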