recipe_recommendations.py 8.2 KB

import numpy as np
import pandas as pd
import streamlit as st
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt
import networkx as nx
from plot import plot_network_graph

pd.set_option("display.max_colwidth", None)

#### Introduction
st.markdown('<style>h1{color: red;}</style>', unsafe_allow_html=True)
st.markdown('<style>h2{color: blue;}</style>', unsafe_allow_html=True)
st.title("Recipe inspirations")
st.write("Recipe websites often offer filters for finding recipes. But when you look for a recipe, you don't always know exactly which filter you want to apply. Do you want recipes with zucchini or with carrots? Maybe you don't mind which vegetable is used, as long as the dishes taste similar.")
st.write("The goal of this recommendation system is to find recipes that are similar in taste and composition, and to challenge your comfort zone with new recipes.")
st.write("To do that, we use the ingredients of each recipe as features and calculate how similar each recipe is to every other recipe.")
st.write("The foundation of this recommendation system is a network graph, which represents how connected, or similar, recipes are.")
st.write("This example explores the relations between recipes. However, the concept described here can be applied to many other types of relations between entities or actors.")
st.write("Let's continue with an example: try recipe 41995 in the Mexican kitchen.")

# Load the data
df = pd.read_json('../data/train.json')
df = df.head(1000)

#### Choose your recipe of reference
st.header("Recommend recipes")
# Choose a kitchen category
category = st.selectbox(label='Select a kitchen', options=df['cuisine'].unique())
# Choose a recipe
category_subset = df[df['cuisine'] == category]
recipe = st.selectbox(label='Select a recipe', options=sorted(category_subset['id'].unique()))
# get index by recipe ID
RECIPE_INDEX = df[df['id'] == recipe].index.values[0]
THRESHOLD = st.slider('Choose a similarity threshold', 0., 1., .5, .01)

#### Preprocess the data and compute similarities between recipes
if recipe is not None:
    #### Create a document-term-matrix
    vectorizer = CountVectorizer(lowercase=True, min_df=1, analyzer='word', stop_words=None)
    #### one dtm where each full ingredient name is a single token ("olive oil" -> "oliveoil")
    onewordingredients = [["".join(i.split()) for i in inner] for inner in list(df['ingredients'])]
    original_ingredient_corpus = [" ".join(i) for i in onewordingredients]
    dtm_original_ingredient = vectorizer.fit_transform(original_ingredient_corpus)
    #### And another dtm where each word is its own token
    separate_words_corpus = [" ".join(i) for i in list(df['ingredients'])]
    dtm_separate_words = vectorizer.fit_transform(separate_words_corpus)
    # concatenate matrices
    dtm = np.concatenate((dtm_original_ingredient.toarray(), dtm_separate_words.toarray()), axis=1)
    #### Compute similarity between any two recipes
    # dtm is a dense array here, so this returns a dense ndarray despite the _csr name
    similarity_csr = cosine_similarity(dtm, dense_output=False)
    # get similar recipes by index, excluding each recipe's match with itself
    sim_recipes = np.argwhere(similarity_csr > THRESHOLD)
    sim_recipes = sim_recipes[sim_recipes[:, 0] != sim_recipes[:, 1]]
    st.write('When you hit any of the buttons below, it will show you a dataframe with the recipe index, the kitchen of the recipe, and the corresponding ingredients per recipe.')
    #### Return similar recipes
    first_order = [i[1] for i in sim_recipes if i[0] == RECIPE_INDEX]
    second_order = list(set([i[1] for i in sim_recipes if i[0] in first_order]))
    # remove original recipe
    if RECIPE_INDEX in second_order:
        second_order.remove(RECIPE_INDEX)
    second_order = [x for x in second_order if x not in first_order]
    third_order = list(set([i[1] for i in sim_recipes if i[0] in second_order]))
    # remove original recipe
    if RECIPE_INDEX in third_order:
        third_order.remove(RECIPE_INDEX)
    third_order = [x for x in third_order if x not in first_order + second_order]
    first_order_output = st.button('Find most similar recipes')
    if first_order_output:
        st.dataframe(df.loc[first_order].set_index('id'))
    # Find new recipes that are similar to the recipes similar to the reference recipe
    second_order_output = st.button('Show me more recipes')
    if second_order_output:
        st.dataframe(df.loc[second_order].set_index('id'))
    # repeat
    third_order_output = st.button('Let me be inspired')
    if third_order_output:
        st.dataframe(df.loc[third_order].set_index('id'))

    #### Build the network graph
    st.header("Visualization of the recommendation system")
    st.write("If you're curious what lies underneath this recommendation system, hit the button below and it will show you how this recommendation system is structured.")
    model_run = st.button('Visualize the output')
    if model_run:
        # get list of all recommended recipes by index
        all_recommendations = list(set([RECIPE_INDEX] + first_order + second_order + third_order))
        all_recommendations.sort()
        # keep only those recipes of interest
        # - note that a new matrix will change the index number of the recommended recipes
        row_idx = np.array(all_recommendations)
        col_idx = np.array(all_recommendations)
        recommendation_csr = similarity_csr[row_idx[:, None], col_idx]
        # for the connected nodes keep only those pairs that have a similarity > THRESHOLD
        direct_recommendation_csr = (recommendation_csr > THRESHOLD)
        # convert adjacency matrix to graph
        # (nx.from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array replaces it)
        G = nx.from_numpy_array(direct_recommendation_csr)
        # return the new indices of the narrowed matrix containing only the recommendations
        new_indices = list(enumerate(all_recommendations))
        # get the new index of the original recipe
        original_recipe_idx = [i[0] for i in new_indices if i[1] == RECIPE_INDEX][0]
        # get new indices of the recommendations
        first = []
        second = []
        third = []
        for idx, i in new_indices:
            if i in first_order:
                first.append(idx)
            if i in second_order:
                second.append(idx)
            if i in third_order:
                third.append(idx)
        #### Create the visualization
        # map a color to the recommendation level
        d = {}
        d[original_recipe_idx] = 0
        d.update({i: 1 for i in first})
        d.update({j: 2 for j in second})
        d.update({k: 3 for k in third})
        node_colors_by_position = [d[i] for i in sorted(d)]
        node_text_by_position = list(df.loc[all_recommendations]['id'].values)
        fig = plot_network_graph(G, TITLE="Recommended recipes by distance with threshold of {}".format(THRESHOLD), list_of_colors_by_order_of_nodes=node_colors_by_position, list_of_text_by_order_of_nodes=node_text_by_position)
        st.plotly_chart(fig, use_container_width=True, sharing='streamlit')
        st.write("The idea behind this recommendation system is to look for recipes that are most similar in terms of their cosine similarity. Each recipe is converted to a vector through a bag-of-words matrix, and the cosine similarity is calculated between any two vectors. The advantage of cosine similarity is that it compensates for the fact that some recipes contain very few ingredients and others contain many: it considers only the angle between two vectors, not their length. The result is a matrix whose rows and columns represent the recipes and whose values are the similarities, ranging from 0.0 to 1.0. Here, any two recipes with a similarity above the chosen threshold (0.5 by default) are considered similar.")
        st.write("This method is repeated three times. The first iteration returns recipes similar to the reference recipe. The second iteration returns recipes similar to the recipes that are similar to the reference recipe. The third iteration applies the same step to the results of the second.")
        st.write("Next, a network is constructed of only those recipes.")
        st.write("The nodes in the network are colored by the level of similarity. The reference recipe is colored in blue; its most direct recommendations are purple; the most similar recipes to those are orange; and the furthest recommendations are yellow.")
        st.write("Depending on how curious you are to try recipes with a new taste that still show some familiarity, you may grow the network.")
        st.write("Enjoy!")
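
The explanation in the closing `st.write` calls (cosine similarity on ingredient counts, then three rounds of "similar to the similar") can be tried outside Streamlit with a small standalone sketch. It is not the script's own code: it uses plain NumPy instead of scikit-learn, and the helper names `cosine_similarity_matrix` and `neighbours_by_order`, the toy count matrix, and the 0.6 threshold are all assumptions made up for illustration.

```python
import numpy as np


def cosine_similarity_matrix(counts):
    """Pairwise cosine similarity between the rows of a count matrix."""
    counts = np.asarray(counts, dtype=float)
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as zeros instead of dividing by 0
    unit = counts / norms
    return unit @ unit.T


def neighbours_by_order(similarity, start, threshold, max_order=3):
    """Group rows by how many hops it takes to reach them from `start`.

    Only pairs with similarity > threshold count as connected. Rows that
    were already reached at an earlier order (or are `start`) are skipped.
    """
    adjacency = similarity > threshold
    np.fill_diagonal(adjacency, False)  # a recipe is not its own recommendation
    seen = {start}
    frontier = [start]
    orders = []
    for _ in range(max_order):
        reached = set()
        for node in frontier:
            reached.update(int(j) for j in np.flatnonzero(adjacency[node]))
        new = sorted(reached - seen)
        orders.append(new)
        seen.update(new)
        frontier = new
    return orders


# Hypothetical toy data: 5 recipes (rows) x 4 ingredients (columns).
toy_counts = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

similarity = cosine_similarity_matrix(toy_counts)
orders = neighbours_by_order(similarity, start=0, threshold=0.6)
first_order, second_order, third_order = orders

# Doubling every count changes the vector's length but not its angle.
scaled = cosine_similarity_matrix([[1, 1, 0, 0], [2, 2, 0, 0]])
```

With this toy matrix only neighbouring rows exceed the threshold, so starting from recipe 0 each order contains exactly one new recipe, and recipe 4 stays out of reach within three hops. The `scaled` matrix shows why recipe length does not matter: the two rows differ only in magnitude, so their similarity is 1.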