Title: | Generalized Additive Extreme Value Models for Location, Scale and Shape |
---|---|
Description: | Fits generalized additive models for the location, scale and shape parameters of a generalized extreme value response distribution. The methodology is based on Rigby, R.A. and Stasinopoulos, D.M. (2005), <doi:10.1111/j.1467-9876.2005.00510.x> and implemented using functions from the 'gamlss' package <doi:10.32614/CRAN.package.gamlss>. |
Authors: | Paul J. Northrop [aut, cre, cph], Jennifer Ji [aut] |
Maintainer: | Paul J. Northrop <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.1 |
Built: | 2024-11-13 05:14:12 UTC |
Source: | https://github.com/paulnorthrop/gamlssx |
Fits generalized additive models for the location, scale and shape
parameters of a generalized extreme value response distribution. The
methodology is based on Rigby and Stasinopoulos (2005) and implemented using
functions from the gamlss
package doi:10.32614/CRAN.package.gamlss.
The main function in gamlssx
is fitGEV()
, which calls the
function gamlss::gamlss()
.
See the gamlssx package
page on Github for more information.
Maintainer: Paul J. Northrop [email protected] [copyright holder]
Authors:
Jennifer Ji
Rigby R.A. and Stasinopoulos D.M. (2005). Generalized additive models for location, scale and shape (with discussion), Appl. Statist., 54, part 3, pages 507-554. doi:10.1111/j.1467-9876.2005.00510.x
Describe
fitGEV( formula, data, scoring = c("fisher", "quasi"), mu.link = "identity", sigma.link = "log", xi.link = "identity", stepLength = 1, stepAttempts = 2, stepReduce = 2, steps = FALSE, ... )
fitGEV( formula, data, scoring = c("fisher", "quasi"), mu.link = "identity", sigma.link = "log", xi.link = "identity", stepLength = 1, stepAttempts = 2, stepReduce = 2, steps = FALSE, ... )
formula |
a formula object, with the response on the left of an ~ operator, and the terms, separated by |
data |
a data frame containing the variables occurring in the formula, e.g. |
scoring |
A character scalar. If |
mu.link , sigma.link , xi.link
|
Character scalars to set the respective
link functions for the location ( |
stepLength |
A numeric vector of positive values. The initial
values of the step lengths |
stepAttempts |
A non-negative integer. If the first call to
|
stepReduce |
A number greater than 1. The factor by which the step
lengths in |
steps |
A logical scalar. Pass |
... |
Further arguments passed to
|
See gamlss::gamlss()
for information about
the model and the fitting algorithm.
Returns a gamlss
object. See the Value section of
gamlss::gamlss()
. The class of the returned object is
c("gamlssx", "gamlss", "gam", "glm", "lm")
.
GEV
,
gamlss.dist::gamlss.family()
,
gamlss::gamlss()
# Load gamlss, for the function pb() library(gamlss) ##### Simulated data set.seed(17012023) n <- 100 x <- stats::runif(n) mu <- 1 + 2 * x sigma <- 1 xi <- 0.25 y <- nieve::rGEV(n = 1, loc = mu, scale = sigma, shape = xi) data <- data.frame(y = as.numeric(y), x = x) plot(x, y) # Fit model using the default RS method with Fisher's scoring mod <- fitGEV(y ~ gamlss::pb(x), data = data) # Summary of model fit summary(mod) # Residual diagnostic plots plot(mod, xlab = "x", ylab = "y") # Data plus fitted curve plot(data$x, data$y, xlab = "x", ylab = "y") lines(data$x, fitted(mod)) # Fit model using the mixed method and quasi-Newton scoring # Use trace = FALSE to prevent writing the global deviance to the console mod <- fitGEV(y ~ pb(x), data = data, method = mixed(), scoring = "quasi", trace = FALSE) # Fit model using the CG method # The default step length of 1 needs to be reduced to enable convergence # Use steps = TRUE to write the step lengths to the console mod <- fitGEV(y ~ pb(x), data = data, method = CG(), steps = TRUE) ##### Fremantle annual maximum sea levels ##### See also the gamlssx package README file # Transform Year so that it is centred on 0 fremantle <- transform(fremantle, cYear = Year - median(Year)) # Plot sea level against year and against SOI plot(fremantle$Year, fremantle$SeaLevel, xlab = "year", ylab = "sea level (m)") plot(fremantle$SOI, fremantle$SeaLevel, xlab = "SOI", ylab = "sea level (m)") # Fit a model with P-spline effects of cYear and SOI on location and scale # The default links are identity for location and log for scale mod <- fitGEV(SeaLevel ~ pb(cYear) + pb(SOI), sigma.formula = ~ pb(cYear) + pb(SOI), data = fremantle) # Summary of model fit summary(mod) # Model diagnostic plots plot(mod) # Worm plot wp(mod) # Plot of the fitted component smooth functions # Note: gamlss::term.plot() does not include uncertainty about the intercept # Location mu term.plot(mod, rug = TRUE, pages = 1) # Scale sigma term.plot(mod, what = "sigma", rug = TRUE, pages = 1)
# Load gamlss, for the function pb() library(gamlss) ##### Simulated data set.seed(17012023) n <- 100 x <- stats::runif(n) mu <- 1 + 2 * x sigma <- 1 xi <- 0.25 y <- nieve::rGEV(n = 1, loc = mu, scale = sigma, shape = xi) data <- data.frame(y = as.numeric(y), x = x) plot(x, y) # Fit model using the default RS method with Fisher's scoring mod <- fitGEV(y ~ gamlss::pb(x), data = data) # Summary of model fit summary(mod) # Residual diagnostic plots plot(mod, xlab = "x", ylab = "y") # Data plus fitted curve plot(data$x, data$y, xlab = "x", ylab = "y") lines(data$x, fitted(mod)) # Fit model using the mixed method and quasi-Newton scoring # Use trace = FALSE to prevent writing the global deviance to the console mod <- fitGEV(y ~ pb(x), data = data, method = mixed(), scoring = "quasi", trace = FALSE) # Fit model using the CG method # The default step length of 1 needs to be reduced to enable convergence # Use steps = TRUE to write the step lengths to the console mod <- fitGEV(y ~ pb(x), data = data, method = CG(), steps = TRUE) ##### Fremantle annual maximum sea levels ##### See also the gamlssx package README file # Transform Year so that it is centred on 0 fremantle <- transform(fremantle, cYear = Year - median(Year)) # Plot sea level against year and against SOI plot(fremantle$Year, fremantle$SeaLevel, xlab = "year", ylab = "sea level (m)") plot(fremantle$SOI, fremantle$SeaLevel, xlab = "SOI", ylab = "sea level (m)") # Fit a model with P-spline effects of cYear and SOI on location and scale # The default links are identity for location and log for scale mod <- fitGEV(SeaLevel ~ pb(cYear) + pb(SOI), sigma.formula = ~ pb(cYear) + pb(SOI), data = fremantle) # Summary of model fit summary(mod) # Model diagnostic plots plot(mod) # Worm plot wp(mod) # Plot of the fitted component smooth functions # Note: gamlss::term.plot() does not include uncertainty about the intercept # Location mu term.plot(mod, rug = TRUE, pages = 1) # Scale sigma term.plot(mod, what = "sigma", rug = TRUE, pages = 1)
This is a copy of the fremantle
dataset from the ismev
package.
The fremantle
data frame has 86 rows and 3 columns. The second column
gives 86 annual maximum sea levels recorded at Fremantle, Western
Australia, within the period 1897 to 1989. The first column gives the
corresponding years. The third column gives annual mean values of the
Southern Oscillation Index (SOI), which is a proxy for meteorological
volatility.
fremantle
fremantle
This data frame contains the following:
Year: A numeric vector of years.
SeaLevel: A numeric vector of annual sea level maxima.
SOI: A numeric vector of annual mean values of the Southern Oscillation Index.
Coles, S. G. (2001) An Introduction to Statistical Modelling of Extreme Values. London: Springer.
summary(fremantle)
summary(fremantle)
The functions GEVfisher()
and GEVquasi()
each define the generalized
extreme value (GEV) family distribution, a three parameter distribution, for
a gamlss.dist::gamlss.family()
object to
be used in GAMLSS fitting using the function
gamlss::gamlss()
. The only difference
between GEVfisher()
and GEVquasi()
is the form of scoring method used to
define the weights used in the fitting algorithm. Fisher's scoring,
based on the expected Fisher information is used in GEVfisher()
, whereas
a quasi-Newton scoring, based on the cross products of the first derivatives
of the log-likelihood, is used in GEVquasi()
. The functions
dGEV
, pGEV
, qGEV
and rGEV
define the density, distribution function,
quantile function and random generation for the specific parameterization of
the generalized extreme value distribution given in Details below.
GEVfisher(mu.link = "identity", sigma.link = "log", nu.link = "identity") GEVquasi(mu.link = "identity", sigma.link = "log", nu.link = "identity") dGEV(x, mu = 0, sigma = 1, nu = 0, log = FALSE) pGEV(q, mu = 0, sigma = 1, nu = 0, lower.tail = TRUE, log.p = FALSE) qGEV(p, mu = 0, sigma = 1, nu = 0, lower.tail = TRUE, log.p = FALSE) rGEV(n, mu = 0, sigma = 1, nu = 0)
GEVfisher(mu.link = "identity", sigma.link = "log", nu.link = "identity") GEVquasi(mu.link = "identity", sigma.link = "log", nu.link = "identity") dGEV(x, mu = 0, sigma = 1, nu = 0, log = FALSE) pGEV(q, mu = 0, sigma = 1, nu = 0, lower.tail = TRUE, log.p = FALSE) qGEV(p, mu = 0, sigma = 1, nu = 0, lower.tail = TRUE, log.p = FALSE) rGEV(n, mu = 0, sigma = 1, nu = 0)
mu.link |
Defines the |
sigma.link |
Defines the |
nu.link |
Defines the |
x , q
|
Vector of quantiles. |
mu , sigma , nu
|
Vectors of location, scale and shape parameter values. |
log , log.p
|
Logical. If |
lower.tail |
Logical. If |
p |
Vector of probabilities. |
n |
Number of observations. If |
The distribution function of a GEV distribution with parameters
loc
= ,
scale
= and
shape
= (
) is
where . If
the
distribution function is defined as the limit as
tends to zero.
The support of the distribution depends on
: it is
for
;
for
;
and
is unbounded for
.
See
https://en.wikipedia.org/wiki/Generalized_extreme_value_distribution
and/or Chapter 3 of Coles (2001) for further information.
For each observation in the data, the restriction that is
imposed, which is necessary for the usual asymptotic likelihood theory to be
applicable.
GEVfisher()
and GEVquasi()
each return a
gamlss.dist::gamlss.family()
object
which can be used to fit a regression model with a GEV response
distribution using the
gamlss::gamlss()
function. dGEV()
gives the density,
pGEV()
gives the distribution function, qGEV()
gives the quantile
function, and rGEV()
generates random deviates.
See the examples in fitGEV()
.
Coles, S. G. (2001) An Introduction to Statistical Modeling of Extreme Values, Springer-Verlag, London. Chapter 3: doi:10.1007/978-1-4471-3675-0_3
fitGEV
,
gamlss.dist::gamlss.family()
,
gamlss::gamlss()
Calculates the expected information matrix for the GEV distribution.
gev11e(scale, shape) gev22e(scale, shape, eps = 0.003) gev33e(shape, eps = 0.003) gev12e(scale, shape, eps = 0.003) gev13e(scale, shape, eps = 0.003) gev23e(scale, shape, eps = 0.003) gevExpInfo(scale, shape, eps = 0.003)
gev11e(scale, shape) gev22e(scale, shape, eps = 0.003) gev33e(shape, eps = 0.003) gev12e(scale, shape, eps = 0.003) gev13e(scale, shape, eps = 0.003) gev23e(scale, shape, eps = 0.003) gevExpInfo(scale, shape, eps = 0.003)
scale , shape
|
Numeric vectors. Respective values of the GEV parameters
scale parameter |
eps |
A numeric scalar. For values of |
gevExpInfo
calculates, for single pair of values
(scale, shape)
, the expected information matrix for a
single observation from a GEV distribution with distribution function
where .
The GEV expected information is defined only for
and does
not depend on the value of
.
The other functions are vectorized and calculate the individual
contributions to the expected information matrix. For example, gev11e
calculates the expectation of the negated second
derivative of the GEV log-density with respect to
, that is, each
1
indicates one derivative with respect to . Similarly,
2
denotes one derivative with respect to and
3
one derivative
with respect to , so that, for example,
gev23e
calculates the
expectation of the negated GEV log-density after one
taking one derivative with respect to
and one derivative with
respect to
. Note that
, calculated using
gev33e
, depends only on .
The expectation in gev11e
can be calculated in a direct way for all
. For the other components, direct calculation of the
expectation is unstable when
is close to 0. Instead, we use
a quadratic approximation over
(-eps, eps)
, from a Lagrangian
interpolation of the values from the direct calculation for
-eps
, and
eps
.
gevExpInfo
returns a 3 by 3 numeric matrix with row and column
named loc, scale, shape
. The other functions return a numeric vector of
length equal to the maximum of the lengths of the arguments, excluding
eps
.
# Expected information matrices for ... # ... scale = 2 and shape = -0.4 gevExpInfo(2, -0.4) # ... scale = 3 and shape = 0.001 gevExpInfo(3, 0.001) # ... scale = 3 and shape = 0 gevExpInfo(3, 0) # ... scale = 1 and shape = 0.1 gevExpInfo(1, 0.1) # The individual components of the latter matrix gev11e(1, 0.1) gev12e(1, 0.1) gev13e(1, 0.1) gev22e(1, 0.1) gev23e(1, 0.1) gev33e(0.1)
# Expected information matrices for ... # ... scale = 2 and shape = -0.4 gevExpInfo(2, -0.4) # ... scale = 3 and shape = 0.001 gevExpInfo(3, 0.001) # ... scale = 3 and shape = 0 gevExpInfo(3, 0) # ... scale = 1 and shape = 0.1 gevExpInfo(1, 0.1) # The individual components of the latter matrix gev11e(1, 0.1) gev12e(1, 0.1) gev13e(1, 0.1) gev22e(1, 0.1) gev23e(1, 0.1) gev33e(0.1)