  1. .. index:: ! batch
  2. .. include:: module_core_purpose.rst_
  3. *****
  4. batch
  5. *****
  6. |batch_purpose|
  7. Synopsis
  8. --------
  9. .. include:: common_SYN_OPTs.rst_
  10. **gmt batch** *mainscript*
  11. |-N|\ *prefix*
  12. |-T|\ *njobs*\|\ *min*/*max*/*inc*\ [**+n**]\|\ *timefile*\ [**+p**\ *width*]\ [**+s**\ *first*]\ [**+w**\ [*str*]]
  13. [ |-I|\ *includefile* ]
  14. [ |-M|\ [*job*] ]
  15. [ |-Q|\ [**s**] ]
  16. [ **-Sb**\ *preflight* ]
  17. [ **-Sf**\ *postflight* ]
  18. [ |SYN_OPT-V| ]
  19. [ |-W|\ [*workdir*] ]
  20. [ |-Z| ]
  21. [ |SYN_OPT-x| ]
  22. [ |SYN_OPT--| ]
  23. |No-spaces|
  24. Description
  25. -----------
  26. The **batch** module can generate GMT processing jobs using a single master script
  27. that is repeated for all jobs, with some variation using specific job variables. The
  28. module simplifies (and hides) most of the steps normally needed to set up a full-blown
  29. processing sequence. Instead, the user can focus on composing the main processing script and let the
  30. parallel execution of jobs be automatic. We can set up required data sets and do one-time calculations
  31. via an optional *preflight* script. After completion we can optionally assemble the data output
  32. and make summary plots or similar in the *postflight* script.
  33. Required Arguments
  34. ------------------
  35. *mainscript*
  36. Name of a stand-alone GMT modern mode processing script that makes the parameter-dependent calculations. The
  37. script may access job variables, such as job number and others defined below, and may be
  38. written using the Bourne shell (.sh), the Bourne again shell (.bash), the csh (.csh)
  39. or DOS batch language (.bat). The script language is inferred from the file extension
  40. and we build hidden batch scripts using the same language. Parameters that can be accessed
  41. are discussed below.
  42. .. _-N:
  43. **-N**\ *prefix*
  44. Determines the prefix of the batch file products and the final sub-directory with all job products.
  45. .. _-T:
  46. **-T**\ *njobs*\|\ *min*/*max*/*inc*\ [**+n**]\|\ *timefile*\ [**+p**\ *width*]\ [**+s**\ *first*]\ [**+w**\ [*str*]]
  47. Either specify how many jobs to make, create a one-column data set with values from
  48. *min* to *max* every *inc* (append **+n** if *inc* is number of jobs instead), or supply a file with
  49. a set of parameters, one record (i.e., row) per job. The values in the columns will be available to the
  50. *mainscript* as named variables **BATCH_COL0**, **BATCH_COL1**, etc., while any trailing text
  51. can be accessed via the variable **BATCH_TEXT**. Append **+w** to split the trailing
  52. string into individual *words* that can be accessed via variables **BATCH_WORD0**, **BATCH_WORD1**,
  53. etc. By default we use any white-space to separate words. Append *str* to select another character(s)
  54. as the valid separator(s). The number of records equals the number of jobs. Note that the *preflight* script is allowed to
  55. create *timefile*, hence we check for its existence both before *and* after the *preflight* script has
  56. completed. Normally, the job numbering starts at 0; you can change this by appending a different starting
  57. job number via **+s**\ *first*. **Note**: All jobs are still included; this modifier only affects
  58. the numbering of the given jobs. Finally, **+p** can be used to set the tag *width* of the format
  59. used in naming jobs. For instance, name_000010.grd has a tag width of 6. By default, this is
  60. automatically set but if you are splitting large jobs across several computers (via **+s**) then you
  61. must use the same tag width for all names.
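To illustrate how one *timefile* record maps to the job variables, the sketch below emulates the splitting in plain shell. It is not how **batch** is implemented; the sample record and the assumption of exactly two leading numerical columns are ours.

```shell
# Hypothetical timefile record: two numerical columns, then trailing text.
record="10 2.5 Gulf of Guinea"

# Emulate the split (assumption: exactly two leading numerical columns).
set -- $record
BATCH_COL0=$1
BATCH_COL1=$2
shift 2
BATCH_TEXT="$*"     # trailing text: everything after the numerical columns

# With +w the trailing text is also split into words on white-space.
set -- $BATCH_TEXT
BATCH_WORD0=$1
BATCH_WORD1=$2
BATCH_WORD2=$3

echo "COL0=$BATCH_COL0 COL1=$BATCH_COL1 TEXT='$BATCH_TEXT' WORD2=$BATCH_WORD2"
```

Without **+w** only **BATCH_TEXT** would be set for the trailing text; the word variables exist only when word-splitting is requested.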
  62. Optional Arguments
  63. ------------------
  64. .. _-I:
  65. **-I**\ *includefile*
  66. Insert the contents of *includefile* into the batch_init.sh script that is accessed by all batch scripts.
  67. This mechanism is used to add information (typically constant variable assignments) that the *mainscript*
  68. and any optional **-S** scripts can rely on.
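A minimal sketch of an *includefile* for Bourne shell scripts follows. The file name include.sh and the variable names REGION and CPT are hypothetical, chosen only for illustration.

```shell
# Hypothetical include file holding constants shared by all scripts.
cat << 'EOF' > include.sh
REGION=-10/20/-10/20     # assumed name: region reused by every job
CPT=hot                  # assumed name: color table for summary plots
EOF
cat include.sh
```

Passing this file via **-I**\ include.sh would let the *mainscript* and any **-S** scripts refer to ${REGION} and ${CPT} without repeating the assignments.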
  69. .. _-M:
  70. **-M**\ [*job*]
  71. Instead of making and launching the full processing sequence, select a single master job [0] for testing.
  72. The master job will be run and its product(s) are placed in the top directory. While any *preflight* script
  73. will be run prior to the master job, the *postflight* script will not be executed (but it will be created).
  74. .. _-Q:
  75. **-Q**\ [**s**]
  76. Debugging: Leave all files and directories we create behind for inspection. Alternatively, append **s** to
  77. only build the batch scripts but *not* perform any executions. One exception involves the optional
  78. *preflight* script derived from **-Sb** which is always executed since it may produce data needed when
  79. building the main batch (or master) scripts.
  80. .. _-Sb:
  81. **-Sb**\ *preflight*
  82. The optional GMT modern mode *preflight* (written in the same scripting language as *mainscript*) can be
  83. used to download or copy data files or create files (such as *timefile*) that will be needed by *mainscript*.
  84. It is always run **b**\ efore the main sequence of batch scripts.
  85. .. _-Sf:
  86. **-Sf**\ *postflight*
  87. The optional *postflight* (written in the same scripting language as *mainscript*) can be
  88. used to perform final processing steps **f**\ ollowing the completion of all the individual jobs, such as
  89. assembling all the products into a single larger file. The script may also make one or more illustrations
  90. using the products or stacked data after the main processing is completed. It does not have to be a GMT
  91. script.
  92. .. _batch-V:
  93. .. |Add_-V| unicode:: 0x20 .. just an invisible code
  94. .. include:: explain_-V.rst_
  95. .. _-W:
  96. **-W**\ [*workdir*]
  97. By default, all temporary files and job products are created in the subdirectory *prefix* set via **-N**.
  98. You can override that selection by giving another *workdir* as a relative or full directory path. If no
  99. path is given then we create a working directory in the system temp folder named *prefix*. The main benefit
  100. of a working directory is to avoid endless syncing by agents like Dropbox or Time Machine, or to avoid
  101. problems related to low space in the main directory. The product files will still be placed in the *prefix*
  102. directory. The *workdir* is removed unless **-Q** is specified for debugging.
  103. .. _-Z:
  104. **-Z**
  105. Erase the *mainscript* and all input scripts given via **-I** and **-S** upon completion. Not compatible
  106. with **-Q**.
  107. .. _-cores:
  108. **-x**\ [[-]\ *n*]
  109. Limit the number of cores used when launching jobs.
  110. By default we try to use all available cores. Append *n* to only use *n* cores
  111. (if too large it will be truncated to the maximum cores available). Finally,
  112. give a negative *n* to select (all - *n*) cores (or at least 1 if *n* equals or exceeds all).
  113. The parallel processing does not depend on OpenMP; new jobs are launched when the previous ones
  114. complete.
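The rules for **-x** can be summarized with a small shell function. This is only an emulation of the documented behavior, not GMT's code; the function name cores_to_use and its arguments are hypothetical.

```shell
# Emulate the documented -x rules; not GMT's actual code.
# Usage: cores_to_use AVAILABLE [N]
cores_to_use() {
    avail=$1
    n=${2:-0}                      # 0 stands for "no n given": use all cores
    if [ "$n" -gt 0 ]; then
        use=$n
        if [ "$use" -gt "$avail" ]; then use=$avail; fi   # truncate to maximum
    elif [ "$n" -lt 0 ]; then
        use=$((avail + n))         # n is negative, so this is all - |n|
        if [ "$use" -lt 1 ]; then use=1; fi               # keep at least 1 core
    else
        use=$avail
    fi
    echo "$use"
}

cores_to_use 8        # prints 8
cores_to_use 8 4      # prints 4
cores_to_use 8 16     # prints 8
cores_to_use 8 -2     # prints 6
cores_to_use 8 -8     # prints 1
```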
  115. .. include:: explain_help.rst_
  116. Parameters
  117. ----------
  118. Several parameters are automatically assigned and can be used when composing the *mainscript* and the
  119. optional *preflight* and *postflight* scripts. There are two sets of parameters: Those that are constants
  120. and those that change with the job number. The constants are accessible by all the scripts:
  121. **BATCH_PREFIX**\ : The common prefix of the batch jobs (it is set with **-N**). **BATCH_NJOBS**\ : The
  122. total number of jobs (given or inferred from **-T**). Also, if **-I** was used then any static parameters
  123. listed therein will be available to all the scripts as well. In addition, the *mainscript* also has access
  124. to parameters that vary with the job counter: **BATCH_JOB**\ : The current job number (an integer, e.g., 136),
  125. **BATCH_TAG**\ : The formatted job number given the precision (a string, e.g., 000136), and **BATCH_NAME**\ :
  126. The name prefix unique to the current job (i.e., *prefix*\ _\ **BATCH_TAG**). Furthermore, if a *timefile*
  127. was given then variables **BATCH_COL0**\ , **BATCH_COL1**\ , etc. are also set, yielding one variable per
  128. column in *timefile*. If *timefile* has trailing text then that text can be accessed via the variable
  129. **BATCH_TEXT**, and if word-splitting was explicitly requested by **+w** modifier to **-T** then the trailing
  130. text is also split into individual word parameters **BATCH_WORD0**\ , **BATCH_WORD1**\ , etc. **Note**: Any
  131. product(s) made by the processing scripts should be named using **BATCH_NAME** as their name prefix as these
  132. will be automatically moved up to the starting directory upon completion.
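The relationship between the job number, the tag, and the job name can be reproduced in shell as follows. The values are illustrative (the width of 6 is assumed here), and **batch** sets these variables itself; a script never needs to compute them.

```shell
BATCH_PREFIX=filter      # as set via -N
BATCH_JOB=136            # current job number
width=6                  # tag width (cf. +p); assumed value for this sketch

# Zero-pad the job number to the tag width, then prepend the prefix.
BATCH_TAG=$(printf "%0${width}d" "$BATCH_JOB")
BATCH_NAME="${BATCH_PREFIX}_${BATCH_TAG}"
echo "${BATCH_NAME}.grd"   # prints filter_000136.grd
```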
  133. Data Files
  134. ----------
  135. The batch scripts will be able to find any files present in the starting directory when **batch** was initiated,
  136. as well as any new files produced by *mainscript* or the optional scripts set via **-S**.
  137. No path specification is needed to access these files. Other files may
  138. require full paths unless their directories were already included in the :term:`DIR_DATA` setting.
  139. Constructing the Main Script
  140. ----------------------------
  141. A batch sequence is not very interesting if nothing changes between calls. For the process to change you need
  142. to have your *mainscript* either access a *different* data set as the job number changes, or you need to access
  143. only a varying *subset* of a data set, or the processing parameters need to change, or all of the above. There
  144. are several strategies you can use to accomplish these effects:
  145. #. Your *timefile* passed to **-T** may list names of specific data files and you simply have your *mainscript*
  146. use the relevant **BATCH_TEXT** or **BATCH_WORD?** to access the job-specific file name.
  147. #. You have a 3-D grid (or a stack of 2-D grids) and you want to interpolate along the axis perpendicular to the
  148. 2-D slices (e.g., time, or it could be depth). In this situation you will use the module :doc:`grdinterpolate`
  149. to have the *mainscript* obtain a slice for the correct time (this may be an interpolation between two different
  150. times or depths) and process this temporary grid file.
  151. #. You may be creating data on the fly using :doc:`gmtmath` or :doc:`grdmath`, or perhaps processing data slightly
  152. differently per job (using parameters in the *timefile*) and computing these or the changes between jobs.
  153. #. Use your imagination to pass whatever arguments are needed via *timefile*.
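As a sketch of the first strategy, the *timefile* below carries one grid name per record as trailing text, and the *mainscript* picks it up via **BATCH_TEXT**. The file names files.txt, north.grd, and south.grd and the 50 km filter width are hypothetical; the grdfilter call mirrors the one used in the Examples section.

```shell
# Hypothetical timefile: a job index followed by a grid name as trailing text.
cat << 'EOF' > files.txt
0 north.grd
1 south.grd
EOF

# Quoting the here-document delimiter ('EOF') keeps ${...} verbatim,
# so no backslash escapes are needed when writing the script.
cat << 'EOF' > main.sh
gmt begin
	gmt grdfilter ${BATCH_TEXT} -Fg50+h -G${BATCH_NAME}.grd -D2
gmt end
EOF
cat main.sh
```

Such a pair would be run along the lines of ``gmt batch main.sh -Tfiles.txt -Nhp``, producing one filtered grid per record.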
  154. Technical Details
  155. -----------------
  156. The **batch** module creates several hidden script files that are used in the generation of the products
  157. (here we have left the script file extension off since it depends on the scripting language used): *batch_init*
  158. (initializes variables related to the overall batch job and includes the contents of the optional *includefile*),
  159. *batch_preflight* (optional since it derives from **-Sb** and computes or prepares needed data files), *batch_postflight*
  160. (optional since it derives from **-Sf** and processes files once all the batch jobs complete), *batch_job*
  161. (accepts a job counter argument and processes data for those parameters), and *batch_cleanup* (removes temporary
  162. files at the end of the process). For each job, there is a separate *batch_params_######* script that provides
  163. job-specific variables (e.g., job number and anything given via **-T**). The *preflight* and *postflight* scripts
  164. have access to the information in *batch_init*, while the *batch_job* script in addition has access to the job-specific
  165. parameter file. Using the **-Q** option will just produce these scripts which you can then examine.
  166. **Note**: The *mainscript* is duplicated per job and many of these are run simultaneously on all available cores.
  167. Multi-threaded GMT modules will therefore be limited to a single core per call. Because we do not know how
  168. many products each batch job makes, we ensure each job creates a unique file when it is finished. Checking for
  169. these special (and empty) files is how **batch** learns that a particular job has completed and it is time to
  170. launch another one.
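The idea of detecting completion through empty marker files can be sketched in plain shell. This is a toy scheduler written for illustration, not the loop **batch** actually runs; the marker names (done_N), the helper functions, and the polling interval are all assumptions.

```shell
# Toy scheduler driven by completion files; names and layout are hypothetical.
njobs=4
cores=2
workdir=$(mktemp -d)

run_job() {
    # Stand-in for one batch job: the real work would go here. When done,
    # create the empty completion file that signals this job has finished.
    : > "$workdir/done_$1"
}

count_done() {
    # Count completion files without parsing ls output.
    set -- "$workdir"/done_*
    if [ -e "$1" ]; then echo $#; else echo 0; fi
}

started=0
while :; do
    finished=$(count_done)
    if [ "$finished" -ge "$njobs" ]; then break; fi
    # Launch new jobs while fewer than $cores are still running.
    while [ "$started" -lt "$njobs" ] && [ $((started - finished)) -lt "$cores" ]; do
        run_job "$started" &
        started=$((started + 1))
    done
    sleep 1                        # poll for new completion files
done
wait
total=$(count_done)
echo "completed $total of $njobs jobs"
rm -rf "$workdir"
```

Because each job signals completion with its own file, the scheduler does not need to know how many products a job writes or what they are called.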
  171. Hints for Batch Makers
  172. ----------------------
  173. Composing batch jobs is relatively simple, but you have to think in terms of *variables*. Examine the examples
  174. we describe. Then, start by making a single script (i.e., your *mainscript*) and identify which
  175. things should change with time (i.e., with the job number). Create variables for these values. If they
  176. are among the listed parameters that **batch** creates automatically then use those names. Unless you only
  177. require the job number you will need to make a file that you can pass via **-T**. This file should
  178. then have all the values you need, per job (i.e., per row), with values across all the columns you need.
  179. If you need to assign various *fixed* variables that do not change with time, then your *mainscript*
  180. will look shorter and cleaner if you offload those assignments to a separate *includefile* (via **-I**).
  181. To test your *mainscript*, start by using options **-Q -M** to ensure that your master job results are correct.
  182. The **-M** option simply runs one job of your batch sequence (you can select which one via the **-M**
  183. arguments [0]). Fix any issues with your use of variables and options until this works. You can then try
  184. to remove **-Q**. We recommend you make a very short (i.e., via **-T**) and small batch sequence so you don't
  185. have to wait very long to see the result. Once things are working you can increase the number of jobs.
  186. Examples
  187. --------
  188. We extract a subset of bathymetry for the Gulf of Guinea from the 2x2 arc minute resolution Earth DEM and compute
  189. Gaussian filtered high-pass grids using filter widths ranging from 10 to 200 km in steps of 10 km. When the grids
  190. are all completed we determine the standard deviation in the result. To replicate our setup, try::
  191. cat << EOF > pre.sh
  192. gmt begin
  193. gmt math -o0 -T10/200/10 T = widths.txt
  194. gmt grdcut -R-10/20/-10/20 @earth_relief_02m -Gdata.grd
  195. gmt end
  196. EOF
  197. cat << EOF > main.sh
  198. gmt begin
  199. gmt grdfilter data.grd -Fg\${BATCH_COL0}+h -G\${BATCH_NAME}.grd -D2
  200. gmt end
  201. EOF
  202. cat << EOF > post.sh
  203. gmt begin \${BATCH_PREFIX} pdf
  204. gmt grdmath \${BATCH_PREFIX}_*.grd -S STD = \${BATCH_PREFIX}_std.grd
  205. gmt grdimage \${BATCH_PREFIX}_std.grd -B -B+t"STD of Gaussians residuals" -Chot
  206. gmt coast -Wthin,white
  207. gmt end show
  208. EOF
  209. gmt batch main.sh -Sbpre.sh -Sfpost.sh -Twidths.txt -Nfilter -V -Z
  210. Of course, the syntax of how variables are used varies according to the scripting language. Here, we actually
  211. build the pre.sh, main.sh, and post.sh scripts on the fly, hence we need to escape any variables (since they
  212. start with a dollar sign that needs to be written verbatim). At the end of the execution we find 20 grids
  213. (e.g., filter_07.grd), as well as the filter_std.grd file obtained by stacking all the individual
  214. grids and computing a standard deviation. The information needed to do all of this is hidden from the user;
  215. the actual batch scripts that we execute are derived from the user-provided main.sh script and **batch**
  216. supplies the extra machinery. The **batch** module automatically manages the parallel execution loop over all
  217. jobs using all available cores and launches new jobs as old ones complete.
  218. As another example, we get a list of all European countries and make a simple coast plot of each of them,
  219. placing their name in the title and the 2-character ISO code in the upper left corner, then in postflight
  220. we combine all the individual PDFs into a single file and delete them::
  221. cat << EOF > pre.sh
  222. gmt begin
  223. gmt coast -E=EU+l > countries.txt
  224. gmt end
  225. EOF
  226. cat << EOF > main.sh
  227. gmt begin \${BATCH_NAME} pdf
  228. gmt coast -R\${BATCH_WORD0}+r2 -JQ10c -Glightgray -Slightblue -B -B+t"\${BATCH_WORD1}" -E\${BATCH_WORD0}+gred+p0.5p
  229. echo \${BATCH_WORD0} | gmt text -F+f16p+jTL+cTL -Gwhite -W1p
  230. gmt end
  231. EOF
  232. cat << EOF > post.sh
  233. gs -dQUIET -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=\${BATCH_PREFIX}.pdf -dBATCH \${BATCH_PREFIX}_*.pdf
  234. rm -f \${BATCH_PREFIX}_*.pdf
  235. EOF
  236. gmt batch main.sh -Sbpre.sh -Sfpost.sh -Tcountries.txt+w"\t" -Ncountries -V -W -Z
  237. Here, the postflight script is not even a GMT script; it simply runs gs and deletes what we don't want.
  238. See Also
  239. --------
  240. :doc:`gmt`,
  241. :doc:`gmtmath`,
  242. :doc:`grdinterpolate`,
  243. :doc:`grdmath`