Register
Login
Resources
Docs Blog Datasets Glossary Case Studies Tutorials & Webinars
Product
Data Engine LLMs Platform Enterprise
Pricing Explore
Connect to our Discord channel

gmtregress.rst 11 KB

You have to be logged in to leave a comment. Sign In
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
  1. .. index:: ! gmtregress
  2. .. include:: module_core_purpose.rst_
  3. **********
  4. gmtregress
  5. **********
  6. |gmtregress_purpose|
  7. Synopsis
  8. --------
  9. .. include:: common_SYN_OPTs.rst_
  10. **gmt regress** [ *table* ] [ |-A|\ [*min*\ /*max*\ /*inc*][**+f**\ [**n**\|\ **p**]] ]
  11. [ |-C|\ *level* ]
  12. [ |-E|\ **x**\|\ **y**\|\ **o**\|\ **r** ]
  13. [ |-F|\ *flags* ]
  14. [ |-N|\ **1**\|\ **2**\|\ **r**\|\ **w** ]
  15. [ |-S|\ [**r**] ]
  16. [ |-T|\ [*min/max*\ /]\ *inc*\ [**+n**] \|\ |-T|\ *file*\|\ *list* ]
  17. [ |SYN_OPT-V| ]
  18. [ |-W|\ [**w**]\ [**x**]\ [**y**]\ [**r**] ]
  19. [ |-Z|\ [±]\ *limit* ]
  20. [ |SYN_OPT-a| ]
  21. [ |SYN_OPT-b| ]
  22. [ |SYN_OPT-d| ]
  23. [ |SYN_OPT-e| ]
  24. [ |SYN_OPT-g| ]
  25. [ |SYN_OPT-h| ]
  26. [ |SYN_OPT-i| ]
  27. [ |SYN_OPT-o| ]
  28. [ |SYN_OPT-q| ]
  29. [ |SYN_OPT--| ]
  30. |No-spaces|
  31. Description
  32. -----------
  33. **regress** reads one or more data tables [or *stdin*]
  34. and determines the best linear [weighted] regression model *y* = *a* + *b*\ \* *x* for each segment using the chosen parameters.
  35. The user may specify which data and model components should be reported. By default, the model will be evaluated at the
  36. input points, but alternatively you can specify an equidistant range over which to evaluate
  37. the model, or turn off evaluation completely. Instead of determining the best fit we can
  38. perform a scan of all possible regression lines
  39. (for a range of slope angles) and examine how the chosen misfit measure varies with slope.
  40. This is particularly useful when analyzing data with many outliers. **Note**: If you
  41. actually need to work with log10 of *x* or *y* you can accomplish that transformation during
  42. the read phase by using the **-i** option.
  43. Required Arguments
  44. ------------------
  45. None
  46. Optional Arguments
  47. ------------------
  48. .. |Add_intables| replace:: The first two columns are expected to contain the required *x* and *y* data. Depending on
  49. your **-W** and **-E** settings we may expect an additional 1-3 columns with error estimates
  50. of one of both of the data coordinates, and even their correlation (see **-W** for details).
  51. .. include:: explain_intables.rst_
  52. .. _-A:
  53. **-A**\ [*min*\ /*max*\ /*inc*][**+f**\ [**n**\|\ **p**]]
  54. There are two uses for this setting: (1) Instead of determining a best-fit regression
  55. we explore the full range of regressions. Examine all possible regression lines with slope
  56. angles between *min* and *max*, using steps of *inc* degrees [-90/+90/1]. For each slope,
  57. the optimum intercept is determined based on your regression type (**-E**) and misfit norm
  58. (**-N**) settings. For each data segment we report the four columns *angle*, *E*, *slope*,
  59. *intercept*, for the range of specified angles. The best model parameters within this range
  60. are written into the segment header and reported in verbose information mode (**-Vi**).
  61. (2) Except for **-N2**, append **+f** to force the best regression to
  62. only consider the given restricted range of angles [all angles]. As shortcuts for negative
  63. or positive slopes, just use **+fn** or **+fp**, respectively.
  64. .. figure:: /_images/GMT_slopes.*
  65. :width: 500 px
  66. :align: center
  67. Scanning slopes (**-A**) to see how the misfit for an fully orthogonal regression using the LMS (-Nr) criterion
  68. varies with the line angle. Here we see the best solution gives a line angle of -78.3 degrees
  69. but there is another local minimum for an angle of 78.6 degrees that is almost as good.
  70. .. _-C:
  71. **-C**\ *level*
  72. Set the confidence level (in %) to use for the optional calculation of confidence bands
  73. on the regression [95]. This is only used if **-F** includes the output column **c**.
  74. .. _-E:
  75. **-Ex**\|\ **y**\|\ **o**\|\ **r**
  76. Type of linear regression, i.e., select the type of misfit we should calculate.
  77. Choose from **x** (regress *x* on *y*; i.e., the misfit is measured horizontally from data point to regression line),
  78. **y** (regress *y* on *x*; i.e., the misfit is measured vertically [Default]), **o** (orthogonal regression;
  79. i.e., the misfit is measured from data point orthogonally to nearest point on the line), or **r** (Reduced Major
  80. Axis regression; i.e., the misfit is the product of both vertical and horizontal misfits) [**y**].
  81. .. figure:: /_images/GMT_misfit.*
  82. :width: 600 px
  83. :align: center
  84. The four types of misfit. The sum of the squared lengths of :math:`e_k` is minimized, for k = e, y, or o.
  85. For **-Er** the sum of the green areas is minimized instead.
  86. .. _-F:
  87. **-F**\ *flags*
  88. Append a combination of the columns you wish returned; the output order will match the order specified. Choose from
  89. **x** (observed *x*), **y** (observed *y*), **m** (model prediction), **r** (residual = data minus model),
  90. **c** (symmetrical confidence interval on the regression; see **-C**
  91. for specifying the level), **z** (standardized residuals or so-called *z-scores*) and **w** (outlier weights 0 or 1; for
  92. **-Nw** these are the Reweighted Least Squares weights) [**xymrczw**].
  93. As an alternative to evaluating the model, just give **-Fp** and we instead write a single record with the model
  94. parameters *npoints xmean ymean angle misfit slope intercept sigma_slope sigma_intercept*.
  95. .. _-N:
  96. **-N1**\|\ **2**\|\ **r**\|\ **w**
  97. Selects the norm to use for the misfit calculation. Choose among **1** (L-1 measure; the mean of the
  98. absolute residuals), **2** (Least-squares; the mean of the squared residuals),
  99. **r** (LMS; The least median of the squared residuals), or **w** (RLS; Reweighted Least Squares: the
  100. mean of the squared residuals after outliers identified via LMS have been removed) [Default is **2**].
  101. Traditional regression uses L-2 while L-1 and in particular LMS are more robust in how they handle outliers.
  102. As alluded to, RLS implies an initial LMS regression which is then used to identify outliers in the data,
  103. assign these a zero weight, and then redo the regression using a L-2 norm.
  104. .. _-S:
  105. **-S**\ [**r**]
  106. Restricts which records will be output. By default all data records will be output in the format specified
  107. by **-F**. Use **-S** to exclude data points identified as outliers by the regression. Alternatively,
  108. use **-Sr** to reverse this and only output the outlier records.
  109. .. _-T:
  110. **-T**\ [*min/max*\ /]\ *inc*\ [**+n**] \|\ |-T|\ *file*\|\ *list*
  111. Evaluate the best-fit regression model at the equidistant points implied by the arguments. If only
  112. **-T**\ *inc* is given instead we will reset *min* and *max* to the extreme *x*-values for each segment.
  113. To skip the model evaluation entirely, simply provide **-T**\ 0.
  114. For details on array creation, see `Generate 1D Array`_.
  115. .. _-V:
  116. .. |Add_-V| unicode:: 0x20 .. just an invisible code
  117. .. include:: explain_-V.rst_
  118. .. _-W:
  119. **-W**\ [**w**]\ [**x**]\ [**y**]\ [**r**]
  120. Specifies weighted regression and which weights will be provided.
  121. Append **x** if giving 1-sigma uncertainties in the *x*-observations, **y** if giving 1-sigma uncertainties in *y*, and
  122. **r** if giving correlations between *x* and *y* observations, in the order these columns appear in the input (after the
  123. two required and leading *x*, *y* columns).
  124. Giving both **x** and **y** (and optionally **r**) implies an orthogonal regression, otherwise giving
  125. **x** requires **-Ex** and **y** requires **-Ey**.
  126. We convert uncertainties in *x* and *y* to regression weights via the relationship weight = 1/sigma.
  127. Use **-Ww** if the we should interpret the input columns to have precomputed weights instead. **Note**: Residuals
  128. with respect to the regression line will be scaled by the given weights. Most norms will then square this weighted
  129. residual (**-N1** is the only exception).
  130. .. _-Z:
  131. **-Z**\ [±]\ *limit*
  132. Change the threshold for outlier detection: When **-Nw** is used, residual *z-scores* that exceed this *limit* [±2.5] will
  133. be flagged as outliers. To only consider negative or positive *z-scores* as possible outliers, specify a signed *limit*.
  134. .. include:: explain_-aspatial.rst_
  135. .. |Add_-bi| unicode:: 0x20 .. just an invisible code
  136. .. include:: explain_-bi.rst_
  137. .. |Add_-bo| replace:: [Default is same as input].
  138. .. include:: explain_-bo.rst_
  139. .. |Add_-d| unicode:: 0x20 .. just an invisible code
  140. .. include:: explain_-d.rst_
  141. .. |Add_-e| unicode:: 0x20 .. just an invisible code
  142. .. include:: explain_-e.rst_
  143. .. |Add_-g| unicode:: 0x20 .. just an invisible code
  144. .. include:: explain_-g.rst_
  145. .. |Add_-h| unicode:: 0x20 .. just an invisible code
  146. .. include:: explain_-h.rst_
  147. .. include:: explain_-icols.rst_
  148. .. include:: explain_-ocols.rst_
  149. .. include:: explain_-q.rst_
  150. .. include:: explain_help.rst_
  151. .. include:: explain_precision.rst_
  152. .. include:: explain_array.rst_
  153. Notes:
  154. ------
  155. The output segment header will contain all the various statistics we compute for each segment.
  156. These are in order: *N* (number of points), *x0* (weighted mean x), *y0* (weighted mean y),
  157. *angle* (of line), *E* (misfit), *slope*, *intercept*, *sigma_slope*, and *sigma_intercept*. For the
  158. standard regression (**-Ey**) we also report the Pearsonian correlation (*r*) and
  159. coefficient of determination (*R*). We end with the effective number of measurements, :math:`n_{eff}`.
  160. For weighted data and the calculation of squared regression misfit to minimize (**-N2**), we use
  161. .. math::
  162. E_2(\nu) = \frac{\sum_{i=1}^n w_i e_i^2}{\sum_{i=1}^n w_i} \frac{n_{eff}}{n_{eff}-2},
  163. where the effective number of measurements is given by
  164. .. math::
  165. n_{eff} = \frac{\left (\sum_{i=1}^n w_i\right )^2}{\sum_{i=1}^n w_i^2}.
  166. and hence :math:`\nu = n_{eff} - 2` are the effective degrees of freedom.
  167. Examples
  168. --------
  169. .. include:: explain_example.rst_
  170. To return the coordinates on the best-fit orthogonal regression line through the data in the
  171. remote file hertzsprung-russell.txt, try::
  172. gmt regress @hertzsprung-russell.txt -Eo -Fxm
  173. To do a standard least-squares regression on the *x-y* data in points.txt and return
  174. x, y, and model prediction with 99% confidence intervals, try
  175. ::
  176. gmt regress points.txt -Fxymc -C99 > points_regressed.txt
  177. To just get the slope for the above regression, try
  178. ::
  179. slope=`gmt regress points.txt -Fp -o5`
  180. To do a reweighted least-squares regression on the data rough.txt and return
  181. x, y, model prediction and the RLS weights, try
  182. ::
  183. gmt regress rough.txt -Fxymw > points_regressed.txt
  184. To do an orthogonal least-squares regression on the data crazy.txt but first take
  185. the logarithm of both x and y, then return x, y, model prediction and the normalized
  186. residuals (z-scores), try
  187. ::
  188. gmt regress crazy.txt -Eo -Fxymz -i0-1l > points_regressed.txt
  189. To examine how the orthogonal LMS misfits vary with angle between 0 and 90
  190. in steps of 0.2 degrees for the same file, try
  191. ::
  192. gmt regress points.txt -A0/90/0.2 -Eo -Nr > points_analysis.txt
  193. To force an orthogonal LMS to pick the best solution with a positive slope, try
  194. ::
  195. gmt regress points.txt -A+fp -Eo -Nr > best_pos_slope.txt
  196. References
  197. ----------
  198. Bevington, P. R., 1969, *Data reduction and error analysis for the physical sciences*,
  199. 336 pp., McGraw-Hill, New York.
  200. Draper, N. R., and H. Smith, 1998, *Applied regression analysis*, 3rd ed., 736 pp.,
  201. John Wiley and Sons, New York.
  202. Rousseeuw, P. J., and A. M. Leroy, 1987, *Robust regression and outlier detection*,
  203. 329 pp., John Wiley and Sons, New York.
  204. York, D., N. M. Evensen, M. L. Martinez, and J. De Basebe Delgado, 2004, Unified
  205. equations for the slope, intercept, and standard errors of the best straight line,
  206. *Am. J. Phys.*, 72(3), 367-375.
  207. See Also
  208. --------
  209. :doc:`gmt`,
  210. :doc:`trend1d`,
  211. :doc:`trend2d`
Tip!

Press p or to see the previous file or, n or to see the next file

Comments

Loading...