---
title: Optimizers
keywords: fastai
sidebar: home_sidebar
nb_path: "nbs/08_optimizers.ipynb"
---
<!--
#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################
# file to edit: nbs/08_optimizers.ipynb
# command to build the docs after a change: nbdev_build_docs
-->
<div class="container" id="notebook-container">
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h2 id="Adafactor" class="doc_header"><code>class</code> <code>Adafactor</code><a href="https://github.com/arampacha/reformer_fastai/tree/master/reformer_fastai/optimizers.py#L10" class="source_link" style="float:right">[source]</a></h2><blockquote><p><code>Adafactor</code>(<strong><code>params</code></strong>, <strong><code>lr</code></strong>=<em><code>None</code></em>, <strong><code>eps</code></strong>=<em><code>(1e-30, 0.001)</code></em>, <strong><code>clip_threshold</code></strong>=<em><code>1.0</code></em>, <strong><code>decay_rate</code></strong>=<em><code>-0.8</code></em>, <strong><code>mom</code></strong>=<em><code>None</code></em>, <strong><code>weight_decay</code></strong>=<em><code>0.0</code></em>, <strong><code>scale_parameter</code></strong>=<em><code>True</code></em>, <strong><code>relative_step</code></strong>=<em><code>True</code></em>, <strong><code>warmup_init</code></strong>=<em><code>False</code></em>) :: <code>Optimizer</code></p>
</blockquote>
<p>Base class for all optimizers.</p>
<p><strong>Warning:</strong> parameters need to be specified as collections that have a deterministic ordering that is consistent between runs. Examples of objects that don't satisfy those properties are sets and iterators over values of dictionaries.</p>
<p><strong>Arguments:</strong></p>
<ul>
<li><code>params</code> (iterable): an iterable of <code>torch.Tensor</code>s or <code>dict</code>s. Specifies which Tensors should be optimized.</li>
<li><code>defaults</code> (dict): a dict containing default values of optimization options (used when a parameter group doesn't specify them).</li>
</ul>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
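<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>In practice the deterministic-ordering requirement above just means passing parameters as a list (or another ordered collection). A minimal sketch, assuming <code>model</code> is an existing <code>nn.Module</code>:</p>
<pre><code>params = list(model.parameters())   # ordered, stable across runs
opt = Adafactor(params, weight_decay=1e-2)
# Adafactor(set(params))            # a set has no stable order; avoid
</code></pre>
</div>
</div>
</div>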
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
</div>
{% endraw %}
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Implements Adafactor algorithm.</p>
<p>This implementation is based on: <code>Adafactor: Adaptive Learning Rates with Sublinear Memory Cost</code> (see <a href="https://arxiv.org/abs/1804.04235">https://arxiv.org/abs/1804.04235</a>)</p>
<p>Note that this optimizer internally adjusts the learning rate depending on the <em>scale_parameter</em>, <em>relative_step</em> and <em>warmup_init</em> options. To use a manual (external) learning rate schedule you should set <code>scale_parameter=False</code> and <code>relative_step=False</code>.</p>
<p><strong>Arguments</strong></p>
<pre><code>`params` (iterable): iterable of parameters to optimize or dicts defining parameter groups
`lr` (float, optional): external learning rate (default: None)
`eps` (tuple[float, float]): regularization constants for square gradient and parameter scale respectively (default: (1e-30, 1e-3))
`clip_threshold` (float): threshold of root mean square of final gradient update (default: 1.0)
`decay_rate` (float): coefficient used to compute running averages of square gradient (default: -0.8)
`mom` (float): coefficient used for computing running averages of gradient (default: None)
`weight_decay` (float, optional): weight decay (L2 penalty) (default: 0)
`scale_parameter` (bool): if True, learning rate is scaled by root mean square of parameter (default: True)
`relative_step` (bool): if True, time-dependent learning rate is computed instead of external learning rate (default: True)
`warmup_init` (bool): time-dependent learning rate computation depends on whether warm-up initialization is being used (default: False)</code></pre>
</div>
</div>
</div>
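<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The sublinear memory cost in the paper comes from factoring the second-moment accumulator: for a weight matrix of shape <code>(n, m)</code>, Adafactor keeps only per-row and per-column running means of the squared gradients and reconstructs the full matrix as a rank-1 outer product. The sketch below illustrates just that reconstruction; the names <code>row_ema</code>/<code>col_ema</code> are ours, and the actual implementation in <code>optimizers.py</code> adds step-dependent decay, epsilons and update clipping on top of this.</p>
<pre><code>import torch

def factored_second_moment(grad, row_ema, col_ema, beta=0.9):
    # Running means of squared gradients along rows and columns.
    grad_sq = grad.pow(2)
    row_ema.mul_(beta).add_(grad_sq.mean(dim=-1), alpha=1 - beta)  # shape (n,)
    col_ema.mul_(beta).add_(grad_sq.mean(dim=-2), alpha=1 - beta)  # shape (m,)
    # Rank-1 reconstruction: v_ij ~= r_i * c_j / mean(r), so only n + m
    # numbers are stored instead of n * m.
    return torch.outer(row_ema, col_ema) / row_ema.mean()
</code></pre>
<p>As the note above says, the default <code>relative_step=True</code> replaces the external learning rate with a time-dependent one; to drive Adafactor from your own schedule you would construct it along the lines of <code>Adafactor(params, lr=1e-3, scale_parameter=False, relative_step=False)</code>.</p>
</div>
</div>
</div>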
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="output_markdown rendered_html output_subarea ">
<h4 id="adafactor" class="doc_header"><code>adafactor</code><a href="https://github.com/arampacha/reformer_fastai/tree/master/reformer_fastai/optimizers.py#L182" class="source_link" style="float:right">[source]</a></h4><blockquote><p><code>adafactor</code>(<strong><code>param_groups</code></strong>, <strong><code>lr</code></strong>=<em><code>None</code></em>, <strong><code>eps</code></strong>=<em><code>(1e-30, 0.001)</code></em>, <strong><code>clip_threshold</code></strong>=<em><code>1.0</code></em>, <strong><code>decay_rate</code></strong>=<em><code>-0.8</code></em>, <strong><code>mom</code></strong>=<em><code>None</code></em>, <strong><code>weight_decay</code></strong>=<em><code>0.0</code></em>, <strong><code>scale_parameter</code></strong>=<em><code>True</code></em>, <strong><code>relative_step</code></strong>=<em><code>True</code></em>, <strong><code>warmup_init</code></strong>=<em><code>False</code></em>)</p>
</blockquote>
</div>
</div>
</div>
</div>
</div>
{% endraw %}
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="nd">@delegates</span><span class="p">(</span><span class="n">Adafactor</span><span class="o">.</span><span class="fm">__init__</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">adafactor</span><span class="p">(</span><span class="n">param_groups</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">OptimWrapper</span><span class="p">(</span><span class="n">Adafactor</span><span class="p">([{</span><span class="s1">&#39;params&#39;</span><span class="p">:</span> <span class="n">ps</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">}</span> <span class="k">for</span> <span class="n">ps</span> <span class="ow">in</span> <span class="n">param_groups</span><span class="p">]))</span>
</pre></div>
</div>
</div>
</div>
</div>
{% endraw %}
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Wrapping a PyTorch optimizer in fastai's <code>OptimWrapper</code> makes it usable with fastai's training loop.</p>
</div>
</div>
</div>
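<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>A minimal usage sketch, assuming <code>dls</code> and <code>model</code> are an existing <code>DataLoaders</code> and module; <code>scale_parameter=False</code> and <code>relative_step=False</code> ensure the learning rate passed to <code>fit</code> is actually used:</p>
<pre><code>from functools import partial
from fastai.basics import Learner

opt_func = partial(adafactor, scale_parameter=False, relative_step=False)
learn = Learner(dls, model, opt_func=opt_func)  # fastai calls opt_func with param groups
learn.fit(1, lr=1e-3)
</code></pre>
<p>With the defaults (<code>relative_step=True</code>) the optimizer computes its own time-dependent step size, so no external <code>lr</code> is needed.</p>
</div>
</div>
</div>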
{% raw %}
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">ps</span> <span class="o">=</span> <span class="p">[</span><span class="n">tensor</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">])]</span> <span class="c1">#, tensor([4,5,6])]</span>
<span class="n">adaf</span> <span class="o">=</span> <span class="n">Adafactor</span><span class="p">(</span><span class="n">ps</span><span class="p">,</span> <span class="n">mom</span><span class="o">=</span><span class="mf">0.9</span><span class="p">,</span> <span class="n">weight_decay</span><span class="o">=</span><span class="mf">1e-2</span><span class="p">)</span>
<span class="n">test_adaf</span> <span class="o">=</span> <span class="n">adafactor</span><span class="p">(</span><span class="n">param_groups</span><span class="o">=</span><span class="n">ps</span><span class="p">,</span> <span class="n">mom</span><span class="o">=</span><span class="mf">0.9</span><span class="p">,</span> <span class="n">weight_decay</span><span class="o">=</span><span class="mf">1e-2</span><span class="p">)</span>
<span class="c1">#Access to param_groups</span>
<span class="n">test_eq</span><span class="p">(</span><span class="n">test_adaf</span><span class="o">.</span><span class="n">param_lists</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">adaf</span><span class="o">.</span><span class="n">param_groups</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="s1">&#39;params&#39;</span><span class="p">])</span>
<span class="c1">#Set param_groups</span>
<span class="n">test_adaf</span><span class="o">.</span><span class="n">param_lists</span> <span class="o">=</span> <span class="p">[[</span><span class="n">tensor</span><span class="p">([</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">6</span><span class="p">])]]</span>
<span class="n">test_eq</span><span class="p">(</span><span class="n">test_adaf</span><span class="o">.</span><span class="n">opt</span><span class="o">.</span><span class="n">param_groups</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="s1">&#39;params&#39;</span><span class="p">],</span> <span class="p">[</span><span class="n">tensor</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">6</span><span class="p">)])</span>
<span class="c1">#Access to hypers</span>
<span class="c1"># test_eq(test_adaf.hypers, [{**adaf.defaults}])</span>
<span class="c1"># #Set hypers</span>
<span class="n">test_adaf</span><span class="o">.</span><span class="n">set_hyper</span><span class="p">(</span><span class="s1">&#39;mom&#39;</span><span class="p">,</span> <span class="mf">0.95</span><span class="p">)</span>
<span class="n">test_eq</span><span class="p">(</span><span class="n">test_adaf</span><span class="o">.</span><span class="n">opt</span><span class="o">.</span><span class="n">param_groups</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="s1">&#39;mom&#39;</span><span class="p">],</span> <span class="mf">0.95</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</div>
{% endraw %}
</div>