Title: | Threshold Selection and Uncertainty for Extreme Value Analysis |
---|---|
Description: | Provides functions for the selection of thresholds for use in extreme value models, based mainly on the methodology in Northrop, Attalides and Jonathan (2017) <doi:10.1111/rssc.12159>. It also performs predictive inferences about future extreme values, based either on a single threshold or on a weighted average of inferences from multiple thresholds, using the 'revdbayes' package <https://cran.r-project.org/package=revdbayes>. At the moment only the case where the data can be treated as independent identically distributed observations is considered. |
Authors: | Paul J. Northrop [aut, cre, cph], Nicolas Attalides [aut] |
Maintainer: | Paul J. Northrop <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.6 |
Built: | 2024-12-19 05:39:17 UTC |
Source: | https://github.com/paulnorthrop/threshr |
Provides functions for the selection of extreme value threshold. At the moment only the simplest case, where the data can be treated as independent identically distributed observations, is considered. Future releases will tackle more general situations. See the 'threshr' website for more information, documentation and examples.
The main function in the threshr package is ithresh
,
which uses leave-one-out cross-validation in a Bayesian setup to compare
the predictive ability resulting from the use of each of a user-supplied
set of thresholds.
See vignette("threshr-vignette", package = "threshr")
for an
overview of the package.
Maintainer: Paul J. Northrop [email protected] [copyright holder]
Authors:
Nicolas Attalides
Northrop, P. J. (2017). revdbayes: Ratio-of-Uniforms Sampling for Bayesian Extreme Value Analysis. R package version 1.2.1. https://cran.r-project.org/package=revdbayes.
Northrop, P. J., Attalides, N. and Jonathan, P. (2017) Cross-validatory extreme value threshold selection and uncertainty with application to ocean storm severity. Journal of the Royal Statistical Society Series C: Applied Statistics, 66(1), 93-120. doi:10.1111/rssc.12159
The packages revdbayes
and
rust
.
ithresh
for threshold selection in the i.i.d. case
based on leave-one-out cross-validation.
A numeric vector containing 315 hindcasts of storm peak significant wave heights, in metres, from 1900 to 2005 at an unnamed location in the Gulf of Mexico.
gom
gom
A vector containing 315 observations.
Oceanweather Inc. (2005) GOMOS – Gulf of Mexico hindcast study.
Northrop, P. J., N. Attalides, and P. Jonathan. (2017). Cross-Validatory Extreme Value Threshold Selection and Uncertainty with Application to Ocean Storm Severity. Journal of the Royal Statistical Society: Series C (Applied Statistics), 66(1), 93-120. doi:10.1111/rssc.12159.
Produces a diagnostic plot to assist in the selection of an extreme value threshold in the case where the data can be treated as independent and identically distributed (i.i.d.) observations. For example, it could be that these observations are the cluster maxima resulting from the declustering of time series data. The predictive ability of models fitted using each of a user-supplied set of thresholds is assessed using leave-one-out cross-validation in a Bayesian setup. These models are based on a Generalized Pareto (GP) distribution for threshold excesses and a binomial model for the probability of threshold exceedance. See Northrop et al. (2017) for details.
ithresh(data, u_vec, ..., n_v = 1, npy = NULL, use_rcpp = TRUE)
ithresh(data, u_vec, ..., n_v = 1, npy = NULL, use_rcpp = TRUE)
data |
A numeric vector of observations. Any missing values will
be removed. The argument |
u_vec |
A numeric vector. A vector of training thresholds
at which inferences are made from a binomial-GP model. These could be
set at sample quantiles of |
... |
Further (optional) arguments to be passed to the
|
n_v |
A numeric scalar.
Each of the |
npy |
A numeric scalar. The mean number of observations per year
of data, after excluding any missing values, i.e. the number of
non-missing observations divided by total number of years of non-missing
data. May be supplied using as an attribute The value of |
use_rcpp |
A logical scalar. If |
For a given threshold in u_vec
:
the number of values in data
that exceed the threshold,
and the amounts (the threshold excesses) by which these value
exceed the threshold are calculated;
rpost_rcpp
(or rpost
) is used to sample from the posterior
distributions of the parameters of a GP model for the threshold
excesses and a binomial model for the probability of threshold
exceedance;
the ability of this binomial-GP model to predict data
thresholded at the validation threshold(s) specified by n_v
is
assessed using leave-one-out cross-validation (the measure of
this is given in equation (7) of Northrop et al. (2017).
See Northrop et al. (2017) and the introductory threshr vignette for further details and examples.
An object (list) of class "ithresh"
, containing the
components
pred_perf
: A numeric matrix with length(u_vec)
rows and n_v
columns. Each column contains the values of
the measure of predictive performance. Entries corresponding
to cases where the training threshold is above the validation
threshold will be NA
.
u_vec
: The argument u_vec
to ithresh
.
v_vec
: A numeric vector. The validation thresholds
implied by the argument n_v
to ithresh
.
u_ps
: A numeric vector. The approximate levels of the
sample quantiles to which the values in u_vec
correspond,
i.e. the approximate percentage of the data the lie at or below
each element in u_vec
.
v_ps
: A numeric vector. The values in u_ps
that correspond to the validation thresholds.
sim_vals
: A numeric matrix with 4 columns and
n
x length(u_vec)
rows. The th block of
n
rows contains in columns 1-3 the posterior samples of
the threshold exceedance probability, the GP scale
parameter and the GP shape parameter respectively,
based on training threshold u_vec[i]
,
and in column 4 the value of .
n
: A numeric scalar. The value of n
.
npy
: A numeric scalar. The value of npy
.
data
: The argument data
to ithresh
detailed above, with any missing values removed.
use_rcpp
: A logical scalar indicating whether
rpost_rcpp
(use_rcpp = TRUE
) or
rpost
(use_rcpp = FALSE
)
was used for posterior simulation.
for_post
: A list containing arguments with which
rpost_rcpp
(or rpost
) was called, including
any user-supplied arguments to these functions.
call:
The call to ithresh
.
Northrop, P.J. and Attalides, N. (2016) Posterior propriety in Bayesian extreme value analyses using reference priors Statistica Sinica, 26(2), 721–743 doi:10.5705/ss.2014.034.
Northrop, P. J., Attalides, N. and Jonathan, P. (2017) Cross-validatory extreme value threshold selection and uncertainty with application to ocean storm severity. Journal of the Royal Statistical Society Series C: Applied Statistics, 66(1), 93-120. doi:10.1111/rssc.12159
Jonathan, P. and Ewans, K. (2013) Statistical modelling of extreme ocean environments for marine design : a review. Ocean Engineering, 62, 91-109. doi:10.1016/j.oceaneng.2013.01.004
plot.ithresh
for the S3 plot method for objects of
class ithresh
.
summary.ithresh
Summarizing measures of threshold
predictive performance.
predict.ithresh
for predictive inference for the
largest value observed in N years.
rpost
in the
revdbayes
package for details of the arguments
that can be passed to
rpost_rcpp
/rpost
.
set_prior
and
set_bin_prior
in the
revdbayes
package for details of how to set a
prior distributions for GP parameters and for the exceedance probability
.
# Note: # 1. Smoother plots result from making n larger than the default n = 1000. # 2. In some examples below validation thresholds rather higher than is # advisable have been used, with far fewer excesses than the minimum of # 50 suggested by Jonathan and Ewans (2013). ## North Sea significant wave heights, default prior ----------------------- #' # A plot akin to the top left of Figure 7 in Northrop et al. (2017) #' # ... but with fewer training thresholds u_vec_ns <- quantile(ns, probs = seq(0.1, 0.9, by = 0.1)) ns_cv <- ithresh(data = ns, u_vec = u_vec_ns, n_v = 2) plot(ns_cv, lwd = 2, add_legend = TRUE, legend_pos = "topright") mtext("significant wave height / m", side = 3, line = 2.5) ## Gulf of Mexico significant wave heights, default prior ------------------ u_vec_gom <- quantile(gom, probs = seq(0.2, 0.9, by = 0.1)) # Setting a prior using its name and parameter value(s) -------------------- # This example gives the same prior as the default gom_cv <- ithresh(data = gom, u_vec = u_vec_gom, n_v = 2, prior = "mdi", h_prior = list(a = 0.6)) ## Setting a user-defined (log-)prior R function --------------------------- # This example also gives the same prior as the default # (It will take longer to run than the example above because ithresh detects # that the prior is an R function and sets use_rcpp to FALSE.) user_prior <- function(pars, a, min_xi = -1) { if (pars[1] <= 0 | pars[2] < min_xi) { return(-Inf) } return(-log(pars[1]) - a * pars[2]) } user_bin_prior <- function(p, ab) { return(stats::dbeta(p, shape1 = ab[1], shape2 = ab[2], log = TRUE)) } gom_cv <- ithresh(data = gom, u_vec = u_vec_gom, n_v = 2, prior = user_prior, h_prior = list(a = 0.6), bin_prior = user_bin_prior, h_bin_prior = list(ab = c(1 / 2, 1 / 2))) ## Setting a user-defined (log-)prior (pointer to a) C++ function ---------- # We make use of a C++ function and function create_prior_xptr() to create # the required pointer from the revdbayes package prior_ptr <- revdbayes::create_prior_xptr("gp_flat") gom_cv <- ithresh(data = gom, u_vec = u_vec_gom, n_v = 2, prior = prior_ptr, h_prior = list(min_xi = -1))
# Note: # 1. Smoother plots result from making n larger than the default n = 1000. # 2. In some examples below validation thresholds rather higher than is # advisable have been used, with far fewer excesses than the minimum of # 50 suggested by Jonathan and Ewans (2013). ## North Sea significant wave heights, default prior ----------------------- #' # A plot akin to the top left of Figure 7 in Northrop et al. (2017) #' # ... but with fewer training thresholds u_vec_ns <- quantile(ns, probs = seq(0.1, 0.9, by = 0.1)) ns_cv <- ithresh(data = ns, u_vec = u_vec_ns, n_v = 2) plot(ns_cv, lwd = 2, add_legend = TRUE, legend_pos = "topright") mtext("significant wave height / m", side = 3, line = 2.5) ## Gulf of Mexico significant wave heights, default prior ------------------ u_vec_gom <- quantile(gom, probs = seq(0.2, 0.9, by = 0.1)) # Setting a prior using its name and parameter value(s) -------------------- # This example gives the same prior as the default gom_cv <- ithresh(data = gom, u_vec = u_vec_gom, n_v = 2, prior = "mdi", h_prior = list(a = 0.6)) ## Setting a user-defined (log-)prior R function --------------------------- # This example also gives the same prior as the default # (It will take longer to run than the example above because ithresh detects # that the prior is an R function and sets use_rcpp to FALSE.) user_prior <- function(pars, a, min_xi = -1) { if (pars[1] <= 0 | pars[2] < min_xi) { return(-Inf) } return(-log(pars[1]) - a * pars[2]) } user_bin_prior <- function(p, ab) { return(stats::dbeta(p, shape1 = ab[1], shape2 = ab[2], log = TRUE)) } gom_cv <- ithresh(data = gom, u_vec = u_vec_gom, n_v = 2, prior = user_prior, h_prior = list(a = 0.6), bin_prior = user_bin_prior, h_bin_prior = list(ab = c(1 / 2, 1 / 2))) ## Setting a user-defined (log-)prior (pointer to a) C++ function ---------- # We make use of a C++ function and function create_prior_xptr() to create # the required pointer from the revdbayes package prior_ptr <- revdbayes::create_prior_xptr("gp_flat") gom_cv <- ithresh(data = gom, u_vec = u_vec_gom, n_v = 2, prior = prior_ptr, h_prior = list(min_xi = -1))
A numeric vector containing 628 hindcasts of storm peak significant wave heights, in metres, from 1964 to 1995 at an unnamed location in the North Sea.
ns
ns
A vector containing 628 observations.
Oceanweather Inc. (1995) NEXT – North Sea hindcast study.
Northrop, P. J., N. Attalides, and P. Jonathan. (2017). Cross-Validatory Extreme Value Threshold Selection and Uncertainty with Application to Ocean Storm Severity. Journal of the Royal Statistical Society: Series C (Applied Statistics), 66(1), 93-120. doi:10.1111/rssc.12159.
plot
method for class "ithresh"
. Produces an extreme value
threshold diagnostic plot based on an analysis performed by
ithresh
. Can also be used to produce a plot of
the posterior sample generated by ithresh
for a particular
training threshold.
## S3 method for class 'ithresh' plot( x, y, ..., which_v = NULL, prob = TRUE, top_scale = TRUE, add_legend = FALSE, legend_pos = "topleft", which_u = NULL )
## S3 method for class 'ithresh' plot( x, y, ..., which_v = NULL, prob = TRUE, top_scale = TRUE, add_legend = FALSE, legend_pos = "topleft", which_u = NULL )
x |
an object of class |
y |
Not used. |
... |
Additional arguments passed on to |
which_v |
A numeric scalar or vector. If If |
prob |
A logical scalar. If |
top_scale |
A logical scalar indicating Whether or not to add a scale
to the top horizontal axis. If this is added it gives the threshold on
the scale not chosen by |
add_legend |
A logical scalar indicating whether or not to add a
legend to the plot. If |
legend_pos |
The position of the legend (if required) specified using
the argument |
which_u |
Either a character scalar or a numeric scalar.
If If Otherwise, |
Produces plots of the threshold weights, defined in
equation (14) of Northrop et al. (2017) against training threshold. A line
is produced for each of the validation thresholds chosen in which_v
.
The result is a plot like those in the top row of Figure 7 in
Northrop et al. (2017).
It is possible that a curve on the plot may be incomplete. This indicates
that, for a particular threshold level, a measure of predictive
performance is -Inf
. This occurs when an observation in the data
lies above the estimated upper end point of the predictive distribution
produced when this observation is removed.
If which_u
is supplied then the object with which
plot.evpost
was called is returned (invisibly).
Otherwise, a list is returned (again invisibly) with two components.
x
is a vector containing the coordinates plotted on the
(lower) horizontal axis.
y
is an length(u_vec)
by n_v
matrix of
threshold weights obtained by normalising the columns of the
matrix pred_perf
returned by ithresh
.
See equation (14) of Northrop et al. (2017).
ithresh
for threshold selection in the i.i.d. case
based on leave-one-out cross-validation.
summary.ithresh
Summarizing measures of threshold
predictive performance.
print.ithresh
Prints the threshold weights.
predict.ithresh
for predictive inference for the
largest value observed in N years.
# [Smoother plots result from making n larger than the default n = 1000.] # Threshold diagnostic plot u_vec_gom <- quantile(gom, probs = seq(0, 0.9, by = 0.05)) gom_cv <- ithresh(data = gom, u_vec = u_vec_gom, n_v = 3) plot(gom_cv, lwd = 2, add_legend = TRUE, legend_pos = "topleft") mtext("significant wave height / m", side = 3, line = 2.5) # Plot of Generalized Pareto posterior sample at the best threshold # (based on the lowest validation threshold) plot(gom_cv, which_u = "best") # See which threshold was used summary(gom_cv) # Plot of Generalized Pareto posterior sample at the highest threshold n_u <- length(u_vec_gom) plot(gom_cv, which_u = n_u, points_par = list(pch = 20, col = "grey"))
# [Smoother plots result from making n larger than the default n = 1000.] # Threshold diagnostic plot u_vec_gom <- quantile(gom, probs = seq(0, 0.9, by = 0.05)) gom_cv <- ithresh(data = gom, u_vec = u_vec_gom, n_v = 3) plot(gom_cv, lwd = 2, add_legend = TRUE, legend_pos = "topleft") mtext("significant wave height / m", side = 3, line = 2.5) # Plot of Generalized Pareto posterior sample at the best threshold # (based on the lowest validation threshold) plot(gom_cv, which_u = "best") # See which threshold was used summary(gom_cv) # Plot of Generalized Pareto posterior sample at the highest threshold n_u <- length(u_vec_gom) plot(gom_cv, which_u = n_u, points_par = list(pch = 20, col = "grey"))
plot
method for class "ithreshpred"
. Produces plots to
summarise the predictive inferences made by predict.ithresh
.
## S3 method for class 'ithreshpred' plot(x, ..., ave_only = FALSE, add_best = FALSE)
## S3 method for class 'ithreshpred' plot(x, ..., ave_only = FALSE, add_best = FALSE)
x |
an object of class |
... |
Additional arguments passed on to
|
ave_only |
A logical scalar. Only relevant if
|
add_best |
A logical scalar. If |
Single threshold case, where
predict.ithresh
was called with numeric scalar
which_u
or which_u = "best"
.
plot.evpred
is called to produce the plot.
Multiple threshold case, where
predict.ithresh
was called with which_u = "all"
.
Again, plot.evpred
is called but now the
estimated predictive distribution function (type = "p"
used
in the call to predict.ithresh
) or density function
(type = "d"
) is plotted for each of the training thresholds
(grey lines) as is the result of the weighted average over the
different training thresholds (black line).
If graphical parameters, such as lty
, lwd
or col
are passed via ...
then the first element relates to the
weighted average and the remaining length(x$u_vec)
elements to
the respective training thresholds in u_vec
.
A list containing the graphical parameters using in producing the plot including any arguments supplied via ... is returned (invisibly).
ithresh
for threshold selection in the i.i.d. case
based on leave-one-out cross-validation.
predict.ithresh
for predictive inference for the
largest value observed in N years.
plot.ithresh
for the S3 plot method for objects of
class ithresh
.
summary.ithresh
Summarizing measures of threshold
predictive performance.
u_vec_gom <- quantile(gom, probs = seq(0, 0.9, by = 0.05)) gom_cv <- ithresh(data = gom, u_vec = u_vec_gom, n_v = 3) # Note: gom_cv$npy contains the correct value of npy (it was set in the # call to ithresh, via attr(gom, "npy"). # If object$npy doesn't exist then the argument npy must be supplied # in the call to predict(). ### Best training threshold based on the lowest validation threshold # Predictive distribution function npy_gom <- length(gom)/105 best_p <- predict(gom_cv, n_years = c(100, 1000)) plot(best_p) # Predictive density function best_d <- predict(gom_cv, type = "d", n_years = c(100, 1000)) plot(best_d) ### All thresholds plus weighted average of inferences over all thresholds # Predictive distribution function all_p <- predict(gom_cv, which_u = "all") plot(all_p) # Predictive density function all_d <- predict(gom_cv, which_u = "all", type = "d") plot(all_d) ### ... and highlight the best threshold plot(all_p, add_best = TRUE) plot(all_d, add_best = TRUE)
u_vec_gom <- quantile(gom, probs = seq(0, 0.9, by = 0.05)) gom_cv <- ithresh(data = gom, u_vec = u_vec_gom, n_v = 3) # Note: gom_cv$npy contains the correct value of npy (it was set in the # call to ithresh, via attr(gom, "npy"). # If object$npy doesn't exist then the argument npy must be supplied # in the call to predict(). ### Best training threshold based on the lowest validation threshold # Predictive distribution function npy_gom <- length(gom)/105 best_p <- predict(gom_cv, n_years = c(100, 1000)) plot(best_p) # Predictive density function best_d <- predict(gom_cv, type = "d", n_years = c(100, 1000)) plot(best_d) ### All thresholds plus weighted average of inferences over all thresholds # Predictive distribution function all_p <- predict(gom_cv, which_u = "all") plot(all_p) # Predictive density function all_d <- predict(gom_cv, which_u = "all", type = "d") plot(all_d) ### ... and highlight the best threshold plot(all_p, add_best = TRUE) plot(all_d, add_best = TRUE)
plot
method for objects of class "stability"
returned from
stability
## S3 method for class 'stability' plot( x, y, ..., prob = TRUE, top_scale = c("none", "excesses", "opposite"), vertical = TRUE )
## S3 method for class 'stability' plot( x, y, ..., prob = TRUE, top_scale = c("none", "excesses", "opposite"), vertical = TRUE )
x |
an object of class |
y |
Not used. |
... |
Additional arguments passed on to
|
prob |
A logical scalar. If |
top_scale |
A character scalar.
If |
vertical |
A logical scalar. Should the confidence intervals be
depicted using a vertical line for each threshold ( |
Produces a simple threshold diagnostic plot based on the object
returned from stability
.
The MLEs of the GP shape parameter $$ and
approximate
conf
% confidence intervals
for are plotted against the threshold used to fit the GP model.
This plot is used to choose a threshold above which the underlying GP
shape parameter may be approximately constant. See Chapter 4 of
Coles (2001). See also the vignette "Introducing threshr".
as described in .
See also the vignette "Introducing threshr".
In addition to producing the plot a list of the arguments used
by matplot
, axis
is
returned (invisibly).
u_vec_gom <- quantile(gom, probs = seq(0, 0.9, by = 0.05)) gom_stab <- stability(data = gom, u_vec = u_vec_gom) plot(gom_stab)
u_vec_gom <- quantile(gom, probs = seq(0, 0.9, by = 0.05)) gom_stab <- stability(data = gom, u_vec = u_vec_gom) plot(gom_stab)
predict
method for class "ithresh"
. Predictive inferences can
either be based on a single training threshold or using a weighted
average of inferences over multiple training thresholds. A single
threshold may chosen to be the best performing threshold, as judged by the
measure of predictive performance calculated by ithresh
or
chosen by the user. The weights used in the latter case are based on the
measures of predictive performance and prior probabilities assigned to the
training thresholds. By default all thresholds are given the same
prior probability but the user can specify their own prior.
## S3 method for class 'ithresh' predict( object, npy = NULL, n_years = 100, which_u = c("best", "all"), which_v = 1L, u_prior = rep(1, length(object$u_vec)), type = c("p", "d", "q", "i", "r"), hpd = FALSE, x = NULL, ... )
## S3 method for class 'ithresh' predict( object, npy = NULL, n_years = 100, which_u = c("best", "all"), which_v = 1L, u_prior = rep(1, length(object$u_vec)), type = c("p", "d", "q", "i", "r"), hpd = FALSE, x = NULL, ... )
object |
An object of class |
npy |
A numeric scalar. The mean number of observations per year of data, after excluding any missing values, i.e. the number of non-missing observations divided by total number of years of non-missing data. |
n_years |
A numeric vector. Value(s) of N. If |
which_u |
Either a character scalar or a numeric scalar.
If If If Otherwise, |
which_v |
A numeric scalar. Indicates which element of
|
u_prior |
A numeric vector. Prior probabilities for the training
thresholds in Only the first
If |
type |
A character vector.
Passed to
If |
hpd |
A logical scalar. The argument |
x |
A numeric vector. The argument |
... |
Additional arguments to be passed to
|
The function predict.evpost
is used to
perform predictive based on the binomial-GP posterior sample generated
using a given training threshold. For mathematical details of the
single threshold and multiple threshold cases see Sections 2.3 and 3 of
Northrop et al. (2017) respectively.
An list object of class "ithreshpred"
with a similar
structure to an object of class "evpred"
returned from
predict.evpost
is returned invisibly.
In addition, the object contains
u_vec = object$u_vec
and v_vec = object$v_vec
,
which_v
and the index best_u
in
u_vec = object$u_vec
of the best training threshold based on
the value of which_v
.
It also contains the value of the Box-Cox transformation parameter
lambda
. This will always be equal to 1 if object
was
returned from ithresh
.
If which_u = "all"
then
the list also contains the posterior threshold weights
in component post_thresh_wts
the component y
is a matrix with length{x}
rows
and 1 + length(object$u_vec) - length(object$v_vec) + which_v
columns. Column 1 contains the estimated predictive distribution
function (type = "p"
) or density function (type = "d"
)
obtained using a weighted average of the inferences over different
training thresholds. The other columns contain the estimated
functions for each of the training thresholds in u_vec
.
Northrop, P. J., Attalides, N. and Jonathan, P. (2017) Cross-validatory extreme value threshold selection and uncertainty with application to ocean storm severity. Journal of the Royal Statistical Society Series C: Applied Statistics, 66(1), 93-120. doi:10.1111/rssc.12159
ithresh
for threshold selection in the i.i.d. case
based on leave-one-out cross-validation.
plot.ithreshpred
for the S3 plot method for objects
of class ithreshpred
.
# Note: #' In the examples below validation thresholds rather higher than is # advisable have been used, with far fewer excesses than the minimum of # 50 suggested by Jonathan and Ewans (2013). # Gulf of Mexico significant wave heights, default priors. u_vec_gom <- quantile(gom, probs = seq(0, 0.9, by = 0.05)) gom_cv <- ithresh(data = gom, u_vec = u_vec_gom, n_v = 3) # Note: gom_cv$npy contains the correct value of npy (it was set in the # call to ithresh, via attr(gom, "npy"). # If object$npy doesn't exist then the argument npy must be supplied # in the call to predict(). ### Best training threshold based on the lowest validation threshold # Predictive distribution function best_p <- predict(gom_cv, n_years = c(100, 1000)) plot(best_p) # Predictive density function best_d <- predict(gom_cv, type = "d", n_years = c(100, 1000)) plot(best_d) # Predictive intervals best_i <- predict(gom_cv, n_years = c(100, 1000), type = "i", hpd = TRUE, level = c(95, 99)) plot(best_i, which_int = "both") # See which threshold was used summary(gom_cv) ### All thresholds plus weighted average of inferences over all thresholds # Predictive distribution function all_p <- predict(gom_cv, which_u = "all") plot(all_p) # Predictive density function all_d <- predict(gom_cv, which_u = "all", type = "d") plot(all_d)
# Note: #' In the examples below validation thresholds rather higher than is # advisable have been used, with far fewer excesses than the minimum of # 50 suggested by Jonathan and Ewans (2013). # Gulf of Mexico significant wave heights, default priors. u_vec_gom <- quantile(gom, probs = seq(0, 0.9, by = 0.05)) gom_cv <- ithresh(data = gom, u_vec = u_vec_gom, n_v = 3) # Note: gom_cv$npy contains the correct value of npy (it was set in the # call to ithresh, via attr(gom, "npy"). # If object$npy doesn't exist then the argument npy must be supplied # in the call to predict(). ### Best training threshold based on the lowest validation threshold # Predictive distribution function best_p <- predict(gom_cv, n_years = c(100, 1000)) plot(best_p) # Predictive density function best_d <- predict(gom_cv, type = "d", n_years = c(100, 1000)) plot(best_d) # Predictive intervals best_i <- predict(gom_cv, n_years = c(100, 1000), type = "i", hpd = TRUE, level = c(95, 99)) plot(best_i, which_int = "both") # See which threshold was used summary(gom_cv) ### All thresholds plus weighted average of inferences over all thresholds # Predictive distribution function all_p <- predict(gom_cv, which_u = "all") plot(all_p) # Predictive density function all_d <- predict(gom_cv, which_u = "all", type = "d") plot(all_d)
"ithresh"
print
method for class "ithresh"
.
## S3 method for class 'ithresh' print(x, digits = 2, ...)
## S3 method for class 'ithresh' print(x, digits = 2, ...)
x |
an object inheriting from class |
digits |
An integer. Used for number formatting with
|
... |
Additional optional arguments. At present no optional arguments are used. |
Prints a matrix of the estimated threshold weights. Each row gives the weights for each training threshold for a given validation threshold. The row and column names are the approximate quantile levels of the thresholds.
The argument x
, invisibly, as for all
print
methods.
ithresh
for threshold selection in the i.i.d. case
based on leave-one-out cross-validation.
summary.ithresh
Summarizing measures of threshold
predictive performance.
plot.ithresh
for the S3 plot method for objects of
class ithresh
.
predict.ithresh
for predictive inference for the
largest value observed in N years.
Uses maximum likelihood estimation to fit a Generalized Pareto (GP)
model to threshold excesses over a range of thresholds.
The threshold excesses are treated as independent and identically
distributed (i.i.d.) observations.
The resulting estimates and confidence intervals can be plotted,
using plot.stability
,
to produce a crude graphical diagnostic for threshold choice.
stability( data, u_vec, prof = FALSE, conf = 95, mult = 1:2, plot_prof = FALSE, ... )
stability( data, u_vec, prof = FALSE, conf = 95, mult = 1:2, plot_prof = FALSE, ... )
data |
A numeric vector of observations. |
u_vec |
A numeric vector of thresholds to be applied to the data.
Any duplicated values will be removed. These could be set at sample
quantiles of |
prof |
A logical scalar. Whether to calculate confidence intervals
for the GP shape parameter |
conf |
A numeric scalar in (0, 100). Confidence level for the confidence intervals. Default: 95%. |
mult |
A numeric vector of length 2. The range of values over
which the profile log-likelihood for |
plot_prof |
A logical scalar. Only relevant if |
... |
Further (optional) arguments to be passed to the
|
For each threshold in u_vec
a GP model is fitted by maximum
likelihood estimation to the threshold excesses, i.e. the amounts
by which the data exceed that threshold. The MLEs of the GP shape
parameter and approximate
conf
% confidence intervals
for are stored for plotting (by
plot.stability
)
to produce a simple graphical diagnostic to inform threshold selection.
This plot is used to choose a threshold above which the underlying GP
shape parameter may be approximately constant. See Chapter 4 of
Coles (2001). See also the vignette "Introducing threshr".
An object (list) of class "stability"
with components:
ests |
MLEs of the GP shape parameter |
ses |
Estimated SEs of the MLEs of |
lower |
Lower limit of 100 |
upper |
Upper limit of 100 |
nexc |
The number of threshold excesses. |
u_vec |
The thresholds supplied by the user. |
u_ps |
The approximate sample quantiles to which the thresholds
in |
data |
The input |
conf |
The input |
Each of these components is a numeric vector of length
length(u_vec)
.
Coles, S. G. (2001) An Introduction to Statistical Modeling of Extreme Values, Springer-Verlag, London. doi:10.1007/978-1-4471-3675-0_3
ithresh
for threshold selection in the i.i.d. case
based on leave-one-out cross-validation.
plot.stability
for the S3 plot
method for
objects of class stability
.
# Set a vector of thresholds u_vec_gom <- quantile(gom, probs = seq(0, 0.9, by = 0.05)) # Symmetric confidence intervals gom_stab <- stability(data = gom, u_vec = u_vec_gom) plot(gom_stab) # Profile-likelihood-based confidence intervals gom_stab <- stability(data = gom, u_vec = u_vec_gom, prof = TRUE) plot(gom_stab)
# Set a vector of thresholds u_vec_gom <- quantile(gom, probs = seq(0, 0.9, by = 0.05)) # Symmetric confidence intervals gom_stab <- stability(data = gom, u_vec = u_vec_gom) plot(gom_stab) # Profile-likelihood-based confidence intervals gom_stab <- stability(data = gom, u_vec = u_vec_gom, prof = TRUE) plot(gom_stab)
summary
method for class "ithresh"
## S3 method for class 'ithresh' summary(object, ...)
## S3 method for class 'ithresh' summary(object, ...)
object |
an object of class |
... |
Additional optional arguments. At present no optional arguments are used. |
Returns a numeric matrix with 5 columns and n_v
rows,
where n_v
is an argument to ithresh
that
determines how many of the largest training thresholds are used
a validation thresholds. The columns contain:
column 1: the validation threshold v
column 2: the sample quantile to which the validation threshold corresponds
column 3: the best training threshold u judged using the validation threshold v
column 4: the sample quantile to which the best training threshold corresponds
column 5: the index of the vector u_vec
of training
thresholds to which the threshold in column2 corresponds
ithresh
for threshold selection in the i.i.d. case
based on leave-one-out cross-validation.
plot.ithresh
for the S3 plot method for objects of
class ithresh
.
print.ithresh
Prints the threshold weights.
u_vec_gom <- quantile(gom, probs = seq(0, 0.9, by = 0.05)) gom_cv <- ithresh(data = gom, u_vec = u_vec_gom, n_v = 3) summary(gom_cv)
u_vec_gom <- quantile(gom, probs = seq(0, 0.9, by = 0.05)) gom_cv <- ithresh(data = gom, u_vec = u_vec_gom, n_v = 3) summary(gom_cv)