Register
Login
Resources
Docs Blog Datasets Glossary Case Studies Tutorials & Webinars
Product
Data Engine LLMs Platform Enterprise
Pricing Explore
Connect to our Discord channel

lake_algorithms.py 8.7 KB

You have to be logged in to leave a comment. Sign In
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
  1. """
  2. Custom algorithms for detecting active subglacial lakes and estimating ice
  3. volume displacement over time.
  4. """
  5. try:
  6. import cudf as xpd
  7. except ImportError:
  8. import pandas as xpd
  9. import numpy as np
  10. def find_clusters(
  11. X: xpd.DataFrame,
  12. eps: float = 3000,
  13. min_samples: int = 250,
  14. output_colname: str = "cluster_id",
  15. **kwargs,
  16. ) -> xpd.Series:
  17. """
  18. Classify a point cloud into several groups, with each group being assigned
  19. a positive integer label like 1, 2, 3, etc. Unclassified noise points are
  20. labelled as NaN.
  21. Uses Density-based spatial clustering of applications with noise (DBSCAN).
  22. See also https://www.naftaliharris.com/blog/visualizing-dbscan-clustering
  23. *** ** 111 NN
  24. ** ** * 11 22 N
  25. * **** --> 1 2222
  26. ** ** 33 22
  27. ****** 333333
  28. Parameters
  29. ----------
  30. X : cudf.DataFrame or pandas.DataFrame
  31. A table of X, Y, Z points to run the clustering algorithm on.
  32. eps : float
  33. The maximum distance between 2 points such they reside in the same
  34. neighborhood. Default is 3000 (metres).
  35. min_samples : int
  36. The number of samples in a neighborhood such that this group can be
  37. considered as an important core point (including the point itself).
  38. Default is 250 (sample points).
  39. output_colname : str
  40. The name of the column for the output Series. Default is 'cluster_id'.
  41. kwargs : dict
  42. Extra parameters to pass into the `cuml.cluster.DBSCAN` or
  43. `sklearn.cluster.DBSCAN` function.
  44. Returns
  45. -------
  46. cluster_labels : cudf.Series or pd.Series
  47. Which cluster each datapoint belongs to. Noisy samples are labeled as
  48. NaN.
  49. """
  50. try:
  51. from cuml.cluster import DBSCAN
  52. except ImportError:
  53. from sklearn.cluster import DBSCAN
  54. # Run DBSCAN using {eps} m distance, and minimum of {min_samples} points
  55. dbscan = DBSCAN(eps=eps, min_samples=min_samples, **kwargs)
  56. dbscan.fit(X=X)
  57. cluster_labels = dbscan.labels_ + 1 # noise points -1 becomes 0
  58. if isinstance(cluster_labels, np.ndarray):
  59. cluster_labels = xpd.Series(data=cluster_labels, dtype=xpd.Int32Dtype())
  60. cluster_labels = cluster_labels.mask(cond=cluster_labels == 0) # turn 0 to NaN
  61. cluster_labels.index = X.index # let labels have same index as input data
  62. cluster_labels.name = output_colname
  63. return cluster_labels
  64. def ice_volume_over_time(
  65. df_elev: xpd.DataFrame,
  66. surface_area: float,
  67. rolling_window: int or str = "91d",
  68. elev_col: str = "h_anom",
  69. time_col: str = "utc_time",
  70. outfile: str = None,
  71. ) -> xpd.DataFrame:
  72. """
  73. Generates a time-series of ice volume displacement. Ice volume change (dvol
  74. in m^3) is estimated by multiplying the lake area (in m^2) by the mean
  75. height anomaly (dh in m), following the methodology of Kim et al. 2016 and
  76. Siegfried et al., 2016. Think of it as a multiplying the circle of a
  77. cylinder by it's height:
  78. _~~~~_
  79. _~ ~_ <--- Lake area (m^2)
  80. |~__ __~ | *
  81. | ~~~ | <-- Mean height anomaly (m) from many points inside lake
  82. | _| =
  83. ~__ __~ <--- Ice volume displacement (m^3)
  84. ~~~
  85. More specifically, this function will:
  86. 1) Take a set of height anomaly and time values and calculate a rolling
  87. mean of height change over time.
  88. 2) Multiply the height change over time sequence in (1) by the lake surface
  89. area to obtain an estimate of volume change over time.
  90. Parameters
  91. ----------
  92. df_elev : cudf.DataFrame or pandas.DataFrame
  93. A table of with time and height anomaly columns to run the rolling time
  94. series calculation on. Ensure that the rows are sorted with time in
  95. ascending order from oldest to newest.
  96. surface_area : float
  97. The ice surface area (of the active lake) experiencing a change in
  98. height over time. Recommended to provide in unit metres.
  99. rolling_window : str
  100. Size of the moving window to calculate the rolling mean, given as a
  101. time period. Default is '91d' (91 days = 1 ICESat-2 cycle).
  102. elev_col : str
  103. The elevation anomaly column name to use from the table data, used to
  104. calculate the rolling mean height anomaly at every time interval.
  105. Default is 'h_anom', recommended to provide in unit metres.
  106. time_col : str
  107. The time-dimension column name to use from the table data, used in the
  108. rolling mean algorithm. Default is 'utc_time', ensure that this column
  109. is provided in a datetime64 format.
  110. outfile : str
  111. Optional. Filename to output the time-series data, containing columns
  112. for time, the elevation anomaly (dh in m^2) +/- standard deviation, and
  113. the ice volume displacement (dvol in km^3) +/- standard deviation. Note
  114. that this export requires 'pint' units in the inputs and the
  115. 'uncertainties' package to be installed.
  116. Returns
  117. -------
  118. dvol : cudf.Series or pd.Series
  119. A column of delta volume changes over time. If pint metre units were
  120. provided in df_elev, then this output will be given in cubic metres
  121. with a one standard deviation uncertainty range.
  122. Examples
  123. --------
  124. >>> import pandas as pd
  125. >>> import pint
  126. >>> import pint_pandas
  127. >>> ureg = pint.UnitRegistry()
  128. >>> pint_pandas.PintType.ureg = ureg
  129. >>> h_anom = pd.Series(
  130. ... data=np.random.RandomState(seed=42).rand(100), dtype="pint[metre]"
  131. ... )
  132. >>> utc_time = pd.date_range(
  133. ... start="2018-10-14", end="2020-09-30", periods=100
  134. ... )
  135. >>> df_elev = pd.DataFrame(data={"h_anom": h_anom, "utc_time": utc_time})
  136. >>> dvol = ice_volume_over_time(
  137. ... df_elev=df_elev, surface_area=123 * ureg.metre ** 2
  138. ... )
  139. >>> print(f"{dvol.iloc[len(dvol)//2]:Lx}")
  140. \SI[]{12+/-35}{\meter\cubed}
  141. References:
  142. - Kim, B.-H., Lee, C.-K., Seo, K.-W., Lee, W. S., & Scambos, T. A. (2016).
  143. Active subglacial lakes and channelized water flow beneath the Kamb Ice
  144. Stream. The Cryosphere, 10(6), 2971–2980.
  145. https://doi.org/10.5194/tc-10-2971-2016
  146. - Siegfried, M. R., Fricker, H. A., Carter, S. P., & Tulaczyk, S. (2016).
  147. Episodic ice velocity fluctuations triggered by a subglacial flood in
  148. West Antarctica. Geophysical Research Letters, 43(6), 2640–2648.
  149. https://doi.org/10.1002/2016GL067758
  150. """
  151. # Get just the elevation anomaly and time columns
  152. df_: pd.DataFrame = df_elev[[elev_col, time_col]].copy()
  153. # Detect if pint units are being used
  154. elev_dtype = df_elev[elev_col].dtype
  155. has_pint: bool = "pint" in str(elev_dtype)
  156. if has_pint:
  157. ureg: pint.UnitRegistry = surface_area._REGISTRY
  158. # Calculate rolling mean of elevation
  159. df_roll = df_.rolling(window=rolling_window, on=time_col, min_periods=1)
  160. elev_mean: pd.Series = df_roll[elev_col].mean()
  161. # Calculate elevation anomaly as elevation at time=n minus elevation at time=1
  162. elev_anom: pd.Series = elev_mean - elev_mean[0]
  163. # If pint units are used, add standard deviation uncertainties to mean.
  164. # TO-DO refactor after pint.Measurements overhaul in
  165. # https://github.com/hgrecco/pint/issues/350 is completed so that elev_anom
  166. # can be kept as a pandas.Series instead of having to convert to np.ndarray
  167. if has_pint:
  168. import uncertainties.unumpy
  169. elev_std: pd.Series = df_roll[elev_col].std()
  170. elev_anom: np.ndarray = uncertainties.unumpy.uarray(
  171. nominal_values=elev_anom, std_devs=elev_std
  172. ) * ureg.Unit(elev_dtype.units)
  173. # Calculate ice volume displacement (m^3) = area (m^2) x height (m)
  174. dvol: np.ndarray = surface_area * elev_anom
  175. ice_dvol: pd.Series = df_[elev_col].__class__(
  176. data=dvol,
  177. dtype=f"pint[{dvol.units}]" if has_pint else df_elev[elev_col].dtype,
  178. index=df_.index,
  179. )
  180. # Convert dvol from m**3 to km**3 and save to text file
  181. if outfile and has_pint:
  182. dvol_km3 = ice_dvol.pint.to("kilometre ** 3").pint.magnitude
  183. df_dvol: pd.DataFrame = df_.__class__(
  184. data={
  185. time_col: df_elev[time_col],
  186. "dh": uncertainties.unumpy.nominal_values(arr=elev_anom),
  187. "dh_std": uncertainties.unumpy.std_devs(arr=elev_anom),
  188. "dvol_km3": uncertainties.unumpy.nominal_values(arr=dvol_km3),
  189. "dvol_std": uncertainties.unumpy.std_devs(arr=dvol_km3),
  190. }
  191. )
  192. df_dvol.to_csv(
  193. path_or_buf=outfile,
  194. sep="\t",
  195. index=False,
  196. na_rep="NaN",
  197. date_format="%Y-%m-%dT%H:%M:%S.%fZ",
  198. )
  199. return ice_dvol
Tip!

Press p or to see the previous file or, n or to see the next file

Comments

Loading...