| Title: | Extreme Value Analyses with Missing Data |
|---|---|
| Description: | Performs likelihood-based extreme value inferences with adjustment for the presence of missing values based on Simpson and Northrop (2026) <doi:10.1002/env.70075>. A Generalised Extreme Value distribution is fitted to block maxima using maximum likelihood estimation, with the location and scale parameters reflecting the numbers of non-missing raw values in each block. A Bayesian version is also provided. For the purposes of comparison, there are options to make no adjustment for missing values or to discard any block maximum for which greater than a percentage of the underlying raw values are missing. Example datasets containing missing values are provided. |
| Authors: | Paul J. Northrop [aut, cre, cph], Emma S. Simpson [aut, cph] |
| Maintainer: | Paul J. Northrop <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.0.2 |
| Built: | 2026-05-23 09:17:57 UTC |
| Source: | https://github.com/paulnorthrop/evmissing |
Performs likelihood-based extreme value inferences with adjustment for the presence of missing values. A Generalised Extreme Value (GEV) distribution is fitted to block maxima using maximum likelihood estimation, with the GEV location and scale parameters reflecting the numbers of non-missing raw values in each block. A Bayesian version is also provided. For the purposes of comparison, there are options to make no adjustment for missing values or to discard any block maximum for which greater than a percentage of the underlying raw values are missing.
The evmissing package was created to accompany the research
paper Simpson, E. S. and Northrop, P. J. (2026) Accounting for missing data
when modelling block maxima, Environmetrics, 37(2): e70075
doi:10.1002/env.70075.
The main functions are
gev_mle: maximum likelihood inference for block maxima based on a GEV
distribution, with S3 methods including confint.
gev_bayes: Bayesian inference for block maxima based on a GEV
distribution.
For objects returned by gev_mle, inferences about return levels are
performed by gev_return, with with S3 methods
including confint.
The function gev_influence quantifies the influence that individual
extreme (small or large) block maxima have on the maximum likelihood
estimators of GEV parameters.
The following example datasets are provided.
BloomsburyOzoneMaxima: Annual maxima ozone levels at Bloomsbury,
London, UK, 1992-2024.
PlymouthOzoneMaxima: Annual maxima ozone levels at Plymouth, Devon,
UK, 1998-2024.
BrestSurgeMaxima: Annual maxima surge heights at Brest, France,
1846-2007.
Maintainer: Paul J. Northrop [email protected] [copyright holder]
Authors:
Emma S. Simpson [copyright holder]
Simpson, E. S. and Northrop, P. J. (2026) Accounting for Missing Data When Modelling Block Maxima, Environmetrics 37(2): e70075. doi:10.1002/env.70075.
Useful links:
Report bugs at https://github.com/paulnorthrop/evmissing/issues
Extracts disjoint block maxima from a time series of raw data. Can also provide information about the effect of missing value patterns on block maxima via pseudo-maxima created by applying blockwise missing value patterns in partially-observed (partial) blocks to fully-observed (full) blocks.
block_maxima( data, block_length, block, pseudo = FALSE, sliding = FALSE, season = FALSE )block_maxima( data, block_length, block, pseudo = FALSE, sliding = FALSE, season = FALSE )
data |
A numeric vector containing a time series of raw data. If
|
block_length |
A numeric scalar. Used calculate the maxima of
disjoint blocks of |
block |
A numeric vector with the same length as |
pseudo |
A logical scalar. If |
sliding |
A logical scalar. Only relevant if |
season |
A logical scalar. Only relevant if |
Exactly one of the arguments block_length or block must be
supplied.
If block_length is supplied and sliding = TRUE then the maxima are
calculated for all blocks of length block_length present in data,
starting with the first block data[1:block_length] and sliding the block
repeatedly by one observation until reaching the final block
data[(length(data) - block_length + 1):length(data)].
Also explain for block
A list, with class
c("list", "block_maxima", "disjoint", "evmissing"), containing the
following numeric vectors:
maxima: the block maxima.
notNA: the numbers of non-missing observations in each block.
n: the maximal block length, that is, the largest number of values that
could have been observed in each block.
If pseudo = TRUE then the returned list also contains the following:
whereNA: a named list containing, for each block, the positions of any
missing values in the block. For example, if (only) the first and fifth
observations in block 3 are missing then the third component (named
block3) of whereNA is c(1, 5). If a block has no missing values
then its component in whereNA is integer(0).
pseudo_maxima: a numeric matrix containing block maxima created by
applying the missing value patterns from partial blocks to all full
blocks. Each column contains the pseudo-maxima resulting from a
particular partial block. The columns are labelled by the number of the
partial block and the columns by the number of the full block. If a
partial block contains all missing values then its entry in
pseudo_maxima is NA. If there are no full blocks or no partial blocks
then pseudo_maxima is NA.
full_maxima: a numeric vector of maxima from full blocks.
partial_maxima: a numeric vector of maxima from partial blocks.
The input arguments pseudo and sliding are also included.
If a block contains only missing values then its value of maxima is NA
and its value of notNA is 0.
If block is supplied then these vectors are named using the values in
block. Otherwise, these vectors do not have names.
Plot method plot.block_maxima.
## Simulate example data set.seed(7032025) data <- rexp(15) # Create some missing values data[c(5, 7:8)] <- NA # 5 blocks (columns), each with 3 observations matrix(data, ncol = 5) # Supplying block_length, disjoint maxima block_length <- 3 block_maxima(data, block_length = block_length) # Supplying block_length, sliding maxima block_maxima(data, block_length = block_length, sliding = TRUE) # Supplying block block <- rep(1:5, each = 3) block_maxima(data, block = block) ## Data with a partially-observed block data <- c(data, 1:2) # Supplying block_length (the extra 2 observations are ignored) block_length <- 3 block_maxima(data, block_length = block_length) # Supplying block (with an extra group indicator) block <- c(block, 7, 7) block_maxima(data, block = block)## Simulate example data set.seed(7032025) data <- rexp(15) # Create some missing values data[c(5, 7:8)] <- NA # 5 blocks (columns), each with 3 observations matrix(data, ncol = 5) # Supplying block_length, disjoint maxima block_length <- 3 block_maxima(data, block_length = block_length) # Supplying block_length, sliding maxima block_maxima(data, block_length = block_length, sliding = TRUE) # Supplying block block <- rep(1:5, each = 3) block_maxima(data, block = block) ## Data with a partially-observed block data <- c(data, 1:2) # Supplying block_length (the extra 2 observations are ignored) block_length <- 3 block_maxima(data, block_length = block_length) # Supplying block (with an extra group indicator) block <- c(block, 7, 7) block_maxima(data, block = block)
Extracts block maxima and missing value information for each block.
Works like block_maxima but returns extra components, including:
whereNA, the positions of the missing values within each block, and
pseudo_maxima, the maxima created by applying blockwise missing value
patterns in incomplete blocks to full blocks, that is, blocks without any
missing values. To be useful, the input data, data, must contain at least
one full block.
block_maxima_ts(data, block_length, block)block_maxima_ts(data, block_length, block)
data |
A numeric vector containing a time series of raw data. |
block_length |
A numeric scalar. Used calculate the maxima of disjoint
blocks of |
block |
A numeric vector with the same length as |
Exactly one of the arguments block_length or block must be
supplied. If the block sizes implied by block are unequal then an
incomplete block and a full block may have different lengths. If this
occurs when pseudo_maxima are calculated, then the longer block is
trimmed, by discarding trailing values, so that the lengths match.
A list, with class c("list", "block_maxima", "evmissing"),
containing the following components:
maxima: the block maxima.
notNA: the numbers of non-missing observations in each block.
n: the maximal block length, that is, the largest number of values that
could have been observed in each block.
whereNA: a named list containing, for each block, the positions of any
missing values in the block. For example, if (only) the first and fifth
observations in block 3 are missing then the third component (named
block3) of whereNA is c(1, 5). If a block has no missing values
then its component in whereNA is integer(0).
pseudo_maxima: a numeric matrix containing (pseudo) block maxima
created by applying the missing value patterns from incomplete blocks to
all full blocks, that is, blocks without any missing values. Each column
contains the pseudo-maxima resulting from a particular incomplete block.
The columns are labelled by the number of the incomplete block and the
columns by the number of the full block. If an incomplete block contains
all missing values then its entry in pseudo_maxima is NA. If there
are no full blocks or no incomplete blocks then pseudo_maxima is NA.
full_maxima: a numeric vector of maxima from full blocks.
partial_maxima: a numeric vector of maxima from partial blocks.
If a block contains only missing values then its value of maxima is NA,
its value of notNA is 0 and whereNA contains the positions of all the
observations in the block.
If block is supplied then these vectors are named using the values in
block. Otherwise, these vectors do not have names.
## Simulate example data set.seed(7032025) data <- rexp(15) # Create some missing values data[c(5, 7:8)] <- NA # 5 blocks (columns), each with 3 observations matrix(data, ncol = 5) # Supplying block_length block_length <- 3 block_maxima_ts(data, block_length = block_length) # Supplying block block <- rep(1:5, each = 3) block_maxima_ts(data, block = block) ## Data with an incomplete block data <- c(data, 1:2) # Supplying block_length (the extra 2 observations are ignored) block_length <- 3 block_maxima_ts(data, block_length = block_length) # Supplying block (with an extra group indicator) block <- c(block, 7, 7) block_maxima_ts(data, block = block)## Simulate example data set.seed(7032025) data <- rexp(15) # Create some missing values data[c(5, 7:8)] <- NA # 5 blocks (columns), each with 3 observations matrix(data, ncol = 5) # Supplying block_length block_length <- 3 block_maxima_ts(data, block_length = block_length) # Supplying block block <- rep(1:5, each = 3) block_maxima_ts(data, block = block) ## Data with an incomplete block data <- c(data, 1:2) # Supplying block_length (the extra 2 observations are ignored) block_length <- 3 block_maxima_ts(data, block_length = block_length) # Supplying block (with an extra group indicator) block <- c(block, 7, 7) block_maxima_ts(data, block = block)
Daily maximum ozone levels at Bloomsbury in London (UK) for the years 1992-2024 inclusive.
BloomsburyOzoneBloomsburyOzone
BloomsburyOzone is a data frame with 12054 rows and the 3 variables:
Date: with class "Date" in the format YYYY-MM-DD.
Year: Values in 1992-2024.
Ozone: daily maximum ozone level in g/m.
The Department for Environment Food and Rural Affair (DEFRA). The London Bloomsbury monitoring site at the UK-AIR database Data Selector.
BloomsburyOzoneMaxima for the annual maxima and numbers of
missing values per year.
head(BloomsburyOzone) # Time series plot of annual maxima ozone levels plot(BloomsburyOzone$Date, BloomsburyOzone$Ozone, xlab = "year", ylab = "ozone (micrograms / metre cubed)", pch = 16)head(BloomsburyOzone) # Time series plot of annual maxima ozone levels plot(BloomsburyOzone$Date, BloomsburyOzone$Ozone, xlab = "year", ylab = "ozone (micrograms / metre cubed)", pch = 16)
Annual maxima of daily maximum ozone levels at Bloomsbury in London (UK) for the years 1992-2024 inclusive.
BloomsburyOzoneMaximaBloomsburyOzoneMaxima
BloomsburyOzoneMaxima is a data frame with 33 rows (years 1992 to
2024) and the 4 variables:
maxima: annual maximum ozone level in g/m.
notNA : the number of days of the year for which raw data were available.
n : the number of days in the year (365 or 366).
block : a block number of 1 for year 1992 through to 33 for year 2024.
The row names of BloomsburyOzoneMaxima are the years 1992:2024.
The raw data are missing for approximately of the days.
The Department for Environment Food and Rural Affair (DEFRA). The London Bloomsbury monitoring site at the UK-AIR database Data Selector.
BloomsburyOzone for the raw time series.
head(BloomsburyOzoneMaxima) # Time series plot of annual maxima ozone levels plot(rownames(BloomsburyOzoneMaxima), BloomsburyOzoneMaxima$maxima, ylab = "ozone (micrograms / metre cubed)", xlab = "year", pch = 16) # Time series plot of proportion of non-missing days plot(rownames(BloomsburyOzoneMaxima), BloomsburyOzoneMaxima$notNA / BloomsburyOzoneMaxima$n, ylab = "proportion of non-missing days", xlab = "year", pch = 16) # Plot ozone levels against the proportion of non-missing days plot(BloomsburyOzoneMaxima$notNA / BloomsburyOzoneMaxima$n, BloomsburyOzoneMaxima$maxima, ylab = "ozone (micrograms / metre cubed)", xlab = "proportion of non-missing days", pch = 16)head(BloomsburyOzoneMaxima) # Time series plot of annual maxima ozone levels plot(rownames(BloomsburyOzoneMaxima), BloomsburyOzoneMaxima$maxima, ylab = "ozone (micrograms / metre cubed)", xlab = "year", pch = 16) # Time series plot of proportion of non-missing days plot(rownames(BloomsburyOzoneMaxima), BloomsburyOzoneMaxima$notNA / BloomsburyOzoneMaxima$n, ylab = "proportion of non-missing days", xlab = "year", pch = 16) # Plot ozone levels against the proportion of non-missing days plot(BloomsburyOzoneMaxima$notNA / BloomsburyOzoneMaxima$n, BloomsburyOzoneMaxima$maxima, ylab = "ozone (micrograms / metre cubed)", xlab = "proportion of non-missing days", pch = 16)
Number of days in each month relevant to the Brest sea surge heights data
BrestSurgeMaxima.
BrestSurgeDaysBrestSurgeDays
BrestSurgeDays is a data frame with 162 rows (years 1846 to
2007) and the 12 variables (one for each month of the year). Each value
in the data frame gives the number of days in the month in question.
The row names of BrestSurgeMaxima are the years 1946:2007 and the column
names are the abbreviated names of the months.
BrestSurgeMaxima: Annual maxima surge heights at Brest, France.
BrestSurgeMissing: numbers of missing values in each month.
head(BrestSurgeDays)head(BrestSurgeDays)
Annual maxima of sea surge heights near high tide at Brest tide gauge station (France) for the years 1846-2007 inclusive.
BrestSurgeMaximaBrestSurgeMaxima
BrestSurgeMaxima is a data frame with 162 rows (years 1846 to
2007) and the 4 variables:
maxima: annual maximum surge height at high tide in cm.
notNA : the number of days of the year for which raw data were available.
n : the number of days in the year (365 or 366).
block : a block number of 1 for year 1846 through to 162 for year 2007.
The row names of BrestSurgeMaxima are the years 1946:2007.
The raw data are missing for approximately of the days.
The data were declustered by the original providers in order to provide a
series of independent surge heights at high tide. Specifically, these
surge heights are separated by at least two days. A correction was applied
to account for trend in the sea-level over the observation period.
Although the declustering of the data means that the effective block size is
smaller than n, it may be reasonable to suppose that the proportion
notNA/n of non-missing values provides a useful measure of the extent to
which the size of an annual maximum is likely to be affected by missingness.
The dataset Brest in the Renext R package, specifically
Brest$OTdata and Brest$OTmissing. Originally, the source was
https://data.shom.fr/.
Deville Y. and Bardet L. (2023). Renext: Renewal Method for Extreme Values Extrapolation. R package version 3.1-4. doi:10.32614/CRAN.package.Renext
BrestSurgeMissing: numbers of missing values in each month.
BrestSurgeDays: Number of days per month in 1846-2007.
head(BrestSurgeMaxima) # Time series plot of annual maxima surges plot(rownames(BrestSurgeMaxima), BrestSurgeMaxima$maxima, ylab = "surge (cm)", xlab = "year", pch = 16) # Time series plot of proportion of non-missing days plot(rownames(BrestSurgeMaxima), BrestSurgeMaxima$notNA / BrestSurgeMaxima$n, ylab = "proportion of non-missing days", xlab = "year", pch = 16) # Plot surges against the proportion of non-missing days plot(BrestSurgeMaxima$notNA / BrestSurgeMaxima$n, BrestSurgeMaxima$maxima, ylab = "surge (cm)", xlab = "proportion of non-missing days", pch = 16)head(BrestSurgeMaxima) # Time series plot of annual maxima surges plot(rownames(BrestSurgeMaxima), BrestSurgeMaxima$maxima, ylab = "surge (cm)", xlab = "year", pch = 16) # Time series plot of proportion of non-missing days plot(rownames(BrestSurgeMaxima), BrestSurgeMaxima$notNA / BrestSurgeMaxima$n, ylab = "proportion of non-missing days", xlab = "year", pch = 16) # Plot surges against the proportion of non-missing days plot(BrestSurgeMaxima$notNA / BrestSurgeMaxima$n, BrestSurgeMaxima$maxima, ylab = "surge (cm)", xlab = "proportion of non-missing days", pch = 16)
Numbers of missing values in each month of the Brest sea surge heights data
BrestSurgeMaxima.
BrestSurgeMissingBrestSurgeMissing
BrestSurgeMissing is a data frame with 162 rows (years 1846 to
2007) and the 12 variables (one for each month of the year). Each value
in the data frame gives the number of days for which the surge height data
were missing in the month in question.
The row names of BrestSurgeMaxima are the years 1946:2007 and the column
names are the abbreviated names of the months.
The dataset Brest in the Renext R package, specifically
Brest$OTmissing. Originally, the source was
https://data.shom.fr/.
Deville Y. and Bardet L. (2023). Renext: Renewal Method for Extreme Values Extrapolation. R package version 3.1-4. doi:10.32614/CRAN.package.Renext
BrestSurgeMaxima: Annual maxima surge heights at Brest, France.
BrestSurgeDays: Number of days per month in 1846-2007.
head(BrestSurgeMissing) # Proportion of missing values by year propn_year <- rowSums(BrestSurgeMissing) / days_in_year(rownames(BrestSurgeMissing)) plot(rownames(BrestSurgeMissing), propn_year, ylab = "proportion of missing values", xlab = "year", pch = 16) # Proportion of missing values by year and month propn_year_month <- BrestSurgeMissing / BrestSurgeDays # Proportion of missing values by month plot(1:12, colMeans(propn_year_month), axes = FALSE, ylab = "proportion of missing values", xlab = "month", pch = 16) axis(1, at = 1:12, labels = 1:12) axis(2) box()head(BrestSurgeMissing) # Proportion of missing values by year propn_year <- rowSums(BrestSurgeMissing) / days_in_year(rownames(BrestSurgeMissing)) plot(rownames(BrestSurgeMissing), propn_year, ylab = "proportion of missing values", xlab = "year", pch = 16) # Proportion of missing values by year and month propn_year_month <- BrestSurgeMissing / BrestSurgeDays # Proportion of missing values by month plot(1:12, colMeans(propn_year_month), axes = FALSE, ylab = "proportion of missing values", xlab = "month", pch = 16) axis(1, at = 1:12, labels = 1:12) axis(2) box()
"confint_gev"
Methods for objects of class "confint_gev" returned from
confint.evmissing.
## S3 method for class 'confint_gev' print(x, ...) ## S3 method for class 'confint_gev' plot(x, parm = c("mu", "sigma", "xi"), add = TRUE, digits = 2, ...)## S3 method for class 'confint_gev' print(x, ...) ## S3 method for class 'confint_gev' plot(x, parm = c("mu", "sigma", "xi"), add = TRUE, digits = 2, ...)
x |
An object inheriting from class |
... |
Further arguments. For |
parm |
A character scalar specifying the parameter for which
a profile log-likelihood is plotted. Must be a single component of
|
add |
A logical scalar. If |
digits |
An integer. Passed to |
print.confint_gev. A numeric matrix with 2 columns giving the
lower and upper confidence limits for the parameters specified by the
argument parm in confint.evmissing. These columns are labelled as
(1-level)/2 and 1-(1-level)/2, expressed as a percentage, by default
2.5% and 97.5%.
plot.confint_gev. A plot is produced of the profile log-likelihood for
the parameter chosen by parm. Only the parameter values used to profile
the log-likelihood in the call to confint.evmissing are included, so
if faster = TRUE was used then the plot will not be of a smooth curve
but will be triangular in the middle.
print.confint_gev: the argument x is returned, invisibly.
plot.confint_gev: a numeric vector containing the confidence limits for
the parameter requested in parm is returned invisibly.
See evmissing_methods.
gev_mle and evmissing_methods.
"confint_return_level"
Methods for objects of class "confint_return_level" returned from
confint.return_level.
## S3 method for class 'confint_return_level' print(x, ...) ## S3 method for class 'confint_return_level' plot(x, parm = 1, add = TRUE, digits = 2, ...)## S3 method for class 'confint_return_level' print(x, ...) ## S3 method for class 'confint_return_level' plot(x, parm = 1, add = TRUE, digits = 2, ...)
x |
An object inheriting from class |
... |
Further arguments. For |
parm |
An integer scalar. For which component, that is, which return
level, in |
add |
A logical scalar. If |
digits |
An integer. Passed to |
print.confint_return_level. A numeric matrix with 2 columns
giving the lower and upper confidence limits for the parameters specified
by the argument parm in confint.return_level. These columns are
labelled as (1-level)/2 and 1-(1-level)/2, expressed as a percentage,
by default 2.5% and 97.5%.
plot.confint.return_level. A plot is produced of the profile log-likelihood for
the parameter chosen by parm.
print.confint_return_level: the argument x is returned, invisibly.
plot.confint_return_level: a numeric vector containing the confidence
interval for the return level chosen for the plot.
See return_level_methods.
gev_mle, gev_return and return_level_methods.
Returns the number of days in each of a vector of years or months.
days_in_year(year) days_in_month(year, month)days_in_year(year) days_in_month(year, month)
year |
An integer vector. The years of interest. |
month |
An integer vector. A subset of |
The length of the output vector is equal to the length of month.
The argument year is recycled to the length of the output vector if
necessary.
A numeric vector of the numbers of days in each of the years in
year or the months specified by year and month.
days_in_year(1999:2025) days_in_month(2024, 1:12) days_in_month(2025, 1:12) days_in_month(2024:2025, 1:3)days_in_year(1999:2025) days_in_month(2024, 1:12) days_in_month(2025, 1:12) days_in_month(2024:2025, 1:3)
"evmissing"
Methods for objects of class "evmissing" returned from gev_mle.
## S3 method for class 'evmissing' coef(object, ...) ## S3 method for class 'evmissing' vcov(object, ...) ## S3 method for class 'evmissing' nobs(object, ...) ## S3 method for class 'evmissing' logLik(object, ...) ## S3 method for class 'evmissing' summary(object, digits = max(3, getOption("digits") - 3L), ...) ## S3 method for class 'summary.evmissing' print(x, ...) ## S3 method for class 'evmissing' confint( object, parm = "all", level = 0.95, profile = FALSE, mult = 2, faster = FALSE, epsilon = 1e-04, ... ) ## S3 method for class 'evmissing' plot( x, adjust = TRUE, which = c("pp", "qq", "return", "density"), m = c(2, 10, 100, 1000), level = 0.95, profile = TRUE, num, npy = 1, ... )## S3 method for class 'evmissing' coef(object, ...) ## S3 method for class 'evmissing' vcov(object, ...) ## S3 method for class 'evmissing' nobs(object, ...) ## S3 method for class 'evmissing' logLik(object, ...) ## S3 method for class 'evmissing' summary(object, digits = max(3, getOption("digits") - 3L), ...) ## S3 method for class 'summary.evmissing' print(x, ...) ## S3 method for class 'evmissing' confint( object, parm = "all", level = 0.95, profile = FALSE, mult = 2, faster = FALSE, epsilon = 1e-04, ... ) ## S3 method for class 'evmissing' plot( x, adjust = TRUE, which = c("pp", "qq", "return", "density"), m = c(2, 10, 100, 1000), level = 0.95, profile = TRUE, num, npy = 1, ... )
object |
An object inheriting from class |
... |
Further arguments. Only used in the following cases.
|
digits |
An integer. Passed to |
x |
An object returned by |
parm |
A character vector specifying the parameters for which
confidence intervals are to be calculated. The default, |
level |
The confidence level required. A numeric scalar in (0, 1). |
profile |
A logical scalar. If |
mult |
A positive numeric scalar. Controls the increment by which the
parameter of interest is increased/decreased when profiling above/below
its MLE. The increment is |
faster |
A logical scalar. If |
epsilon |
Only relevant if
|
adjust |
If |
which |
If supplied, this must either be a character scalar, one of
|
m |
A numeric vector of return periods to label on the
horizontal axis of the return level plot. Along with the data, the
smallest and largest return period values in |
num |
An integer scalar. The number of return level estimates to
calculate to produce the return level curve and pointwise confidence
limits in the return level plot. The default setting is approximately
5 times |
npy |
A numeric scalar. The number |
The plots produced by plot.evmissing are of a similar form to the
visual diagnostics is the ismev package and described in Coles (2001),
that is, a probability plot (which = "pp" or which = 1), a quantile
plot (which ="qq" of which = 2), a return level plot
(which = "return" or which = 3) and a histogram of block maxima with a
fitted GEV density superimposed (which = "density" or which = 4).
Pointwise confidence bands of level level are added to the probability
plot and quantile plot.
The default setting for confidence intervals for a return level plot
produced by plot.evmissing is profile = TRUE, which uses gev_return
and confint.return_level. The plot takes longer to produce, but it
avoids the unrealistic feature of the lower confidence limits decreasing
as we extrapolate to long return periods.
If adjust = TRUE then the empirical values based on the observed block
maxima are adjusted for the number of non-missing raw observations in
each block based on the fitted GEV parameter values for reduced block
sizes. Passing adjust = FALSE is not sensible, but, if there are missing
data, then it can serve to show that making the adjustment is necessary to
give the correct impression of how well the model has fitted the data.
For confint.evmissing, the default, epsilon = -1, should work well
enough in most circumstances, but to achieve a specific accuracy set
epsilon to be a small positive value, for example, epsilon = 1e-4.
coef.evmissing: a numeric vector of length 3 with names
c("mu", "sigma", "xi"). The MLEs of the parameters ,
and .
vcov.evmissing: a matrix with row and column
names c("mu", "sigma", "xi"). The estimated variance-covariance matrix
for the model parameters , and .
nobs.evmissing: a numeric scalar. The number of maxima used in the model
fit.
logLik.evmissing: an object of class "logLik": a numeric scalar
with value equal to the maximised log-likelihood. The returned object
also has attributes nobs, the number of maxima used in the model fit
and "df" (degrees of freedom), which is equal to the number of total
number of parameters estimated (3).
summary.evmissing: an object with class "summary.evmissing" containing
the original function call and a matrix of estimates and estimated
standard errors with row names c("mu", "sigma", "xi"). The object is
printed by print.summary.evmissing.
print.summary.evmissing: the argument x is returned, invisibly.
confint.evmissing: an object of class c("confint_gev", "evmissing").
A numeric matrix with 2 columns giving the lower and upper confidence
limits for each parameter. These columns are labelled as (1-level)/2 and
1-(1-level)/2, expressed as a percentage, by default 2.5% and 97.5%.
The row names are the names of the parameters supplied in parm.
The ordering "mu", "sigma", "xi" is retained regardless of the
ordering of the parameters in parm.
If profile = TRUE then the returned object has extra attributes crit,
level and for_plot. The latter is a named list of length 3 with
components mu, sigma and xi. Each components is a 2-column numeric
matrix. The first column (named mu_values etc) contains values of the
parameter and the second column the corresponding values of the profile
log-likelihood. The profile log-likelihood is equal to the attribute
crit at the limits of the confidence interval. The attribute level is
the input argument level.
plot.evmissing: if a return level plot has been requested, a 3-column
matrix containing the values plotted in the return level plot. Column 2
contains the estimated return levels and columns 1 and 3 the lower and
upper confidence limits.
Coles, S. G. (2001) An Introduction to Statistical Modeling of Extreme Values, Springer-Verlag, London. doi:10.1007/978-1-4471-3675-0_3
Heffernan, J. E. and Stephenson, A. G. (2018). ismev: An Introduction to Statistical Modeling of Extreme Values. R package version 1.42. doi:10.32614/CRAN.package.ismev
gev_mle and confint_gev_methods.
## Plymouth ozone data # Make adjustment for the numbers of non-missing values per block fit <- gev_mle(PlymouthOzoneMaxima) coef(fit) vcov(fit) nobs(fit) logLik(fit) summary(fit) ## Model diagnostic plots # When profile = FALSE the return confidence limits are unrealistic # for long return periods plot(fit, profile = FALSE) # Create the return level plot only # When profile = TRUE (the default) the confidence limits are fine # but the plot takes longer # For speed, we reduce the number, num, of points used to plot the curves plot(fit, which = 3, num = 8) # If we do not reflect the adjustment in the plot then it gives a false # impression of how well the model has fitted the data plot(fit, adjust = FALSE, profile = FALSE) ## Confidence intervals # Confidence limits that are symmetric about the respective MLEs confint(fit) # Calling confint to produce a smooth profile log-likelihood plot x <- confint(fit, profile = TRUE) x plot(x, parm = "xi") # Doing this more quickly when we only want the approximate confidence limits x <- confint(fit, profile = TRUE, mult = 32, faster = TRUE) x plot(x, parm = "xi", type = "b")## Plymouth ozone data # Make adjustment for the numbers of non-missing values per block fit <- gev_mle(PlymouthOzoneMaxima) coef(fit) vcov(fit) nobs(fit) logLik(fit) summary(fit) ## Model diagnostic plots # When profile = FALSE the return confidence limits are unrealistic # for long return periods plot(fit, profile = FALSE) # Create the return level plot only # When profile = TRUE (the default) the confidence limits are fine # but the plot takes longer # For speed, we reduce the number, num, of points used to plot the curves plot(fit, which = 3, num = 8) # If we do not reflect the adjustment in the plot then it gives a false # impression of how well the model has fitted the data plot(fit, adjust = FALSE, profile = FALSE) ## Confidence intervals # Confidence limits that are symmetric about the respective MLEs confint(fit) # Calling confint to produce a smooth profile log-likelihood plot x <- confint(fit, profile = TRUE) x plot(x, parm = "xi") # Doing this more quickly when we only want the approximate confidence limits x <- confint(fit, profile = TRUE, mult = 32, faster = TRUE) x plot(x, parm = "xi", type = "b")
Performs Bayesian inference using a GEV distribution using block maxima, with the option to make an adjustment for the numbers of non-missing raw values in each block.
gev_bayes( data, block_length, block, adjust = TRUE, discard = 0, init = "quartiles", prior = revdbayes::set_prior(prior = "flat", model = "gev"), n = 1000, ... )gev_bayes( data, block_length, block, adjust = TRUE, discard = 0, init = "quartiles", prior = revdbayes::set_prior(prior = "flat", model = "gev"), n = 1000, ... )
data |
Either
|
block_length |
A numeric scalar. Used calculate the maxima of disjoint
blocks of |
block |
A numeric vector with the same length as |
adjust |
A logical scalar or a numeric scalar in
|
discard |
A numeric scalar. Any block maximum for which greater than
|
init |
Either a character scalar, one of |
prior |
Specifies a prior distribution for the GEV parameters. This is
most easily set using |
n |
A non-negative integer. The number of values to simulate from the
posterior distribution for |
... |
Further arguments to be passed to |
The likelihood described in gev_mle is combined with the prior
density provided by prior to produce, up to proportionality, a
posterior density for .
A function to evaluate the log-posterior is passed to rust::ru to
simulate a random sample from this posterior distribution using the
generalised ratio-of-uniforms method, using relocation of the mode of the
density to the origin to increase efficiency. The value of init is used
as an initial estimate in a search for the posterior mode. Arguments to
rust::ru can be passed via .... The default setting is
trans = "none", that is, no transformation of the margins, and
rotate = TRUE, rotation of the parameter axes to improve isotropy
with a view to increasing efficiency.
An object returned from rust::ru. The following components are
added to this list
model: = "gev".
data,prior: the inputs data and prior.
call: the call to gev_bayes.
maxima: the vector of block maxima used to fit the model.
notNA: the number of non-missing raw values on which the maxima in
maxima are based.
n: the maximal block length, that is, the largest number of values that
could have been observed in each of these blocks.
adjust: a logical scalar indicating whether or not the adjustment in
the Details section of gev_mle was performed. This is TRUE
only if the input argument adjust was TRUE.
adjust,discard : the values of these input arguments.
The class of the returned object is
c("evpost", "ru", "bayes", "evmissing").
Objects of class "evpost" have print,
summary and plot
S3 methods.
## Simulate data with missing values set.seed(24032025) blocks <- 50 block_length <- 365 # Simulate raw data from an exponential distribution sdata <- sim_data(blocks = blocks, block_length = block_length) block_length <- sdata$block_length # Sample from the posterior based on block maxima from full data post1 <- gev_bayes(sdata$data_full, block_length = block_length) summary(post1) # Sample with adjustment for the number of non-missing values per block post2 <- gev_bayes(sdata$data_miss, block_length = block_length) summary(post2) # Do not make the adjustment post3 <- gev_bayes(sdata$data_miss, block_length = block_length, adjust = FALSE) summary(post3) # Remove all block maxima with greater than 25% missing values and # do not make the adjustment post4 <- gev_bayes(sdata$data_miss, block_length = block_length, adjust = FALSE, discard = 25) summary(post4) ## Brest sea surge data post <- gev_bayes(BrestSurgeMaxima) summary(post) plot(post)## Simulate data with missing values set.seed(24032025) blocks <- 50 block_length <- 365 # Simulate raw data from an exponential distribution sdata <- sim_data(blocks = blocks, block_length = block_length) block_length <- sdata$block_length # Sample from the posterior based on block maxima from full data post1 <- gev_bayes(sdata$data_full, block_length = block_length) summary(post1) # Sample with adjustment for the number of non-missing values per block post2 <- gev_bayes(sdata$data_miss, block_length = block_length) summary(post2) # Do not make the adjustment post3 <- gev_bayes(sdata$data_miss, block_length = block_length, adjust = FALSE) summary(post3) # Remove all block maxima with greater than 25% missing values and # do not make the adjustment post4 <- gev_bayes(sdata$data_miss, block_length = block_length, adjust = FALSE, discard = 25) summary(post4) ## Brest sea surge data post <- gev_bayes(BrestSurgeMaxima) summary(post) plot(post)
Calculates influence function curves for maximum likelihood estimators of Generalised Extreme Value (GEV) parameters.
gev_influence(z, mu = 0, sigma = 1, xi = 0) ## S3 method for class 'gev_influence' plot(x, xvar = c("z", "y"), sep_xi = TRUE, vlines, ...)gev_influence(z, mu = 0, sigma = 1, xi = 0) ## S3 method for class 'gev_influence' plot(x, xvar = c("z", "y"), sep_xi = TRUE, vlines, ...)
z |
A numeric vector. Values of normal quantiles |
mu, sigma, xi
|
Numeric scalars supplying the values of the GEV
parameters |
x |
An object inheriting from class |
xvar |
A logical scalar.
If |
sep_xi |
A logical scalar. If |
vlines |
A numeric vector. If |
... |
For |
An influence function measures the effect on a parameter estimator
of changing one observation in a sample. See Hampel (2005) and, in the
context of extreme value analyses of threshold exceedances,
Davison and Smith (1990).
Let . The GEV influence curve is defined,
as a function of an observation , by
,
where is the GEV log-likelihood function and
is the GEV expected information matrix.
The value of is related to by
, where and are the
distribution functions of a standard normal and
GEV() distribution, respectively. Plotting influence
curves on the standard normal z scale can help to aid interpretation.
The value of is invariant to the value of .
For a given value of , the values of and
scale with the scale parameter , that is,
doubling doubles and
Typically, interest focuses on the shape parameter , but if the
input scale parameter is large then this may hide the
influence of on . Therefore, the default setting of
plot.gev_influence, sep_xi = TRUE, separates the plotting of the
influence curve for from those of and .
The example in Examples shows that extremely large block maxima have a
strong positive influence on the MLE and also that
extremely small block maxima have a strong negative influence on
,
gev_influence: an object with class
c("gev_influence", "matrix", "array"), a length(z) by 5 numeric
matrix. The first two columns contain the input values in z and the
corresponding values of y. Columns 3-5 contain the values of the GEV
influence function for , and respectively
at the values of z.
plot.gev_influence: a list of the graphical parameters used in producing
the plot, either the defaults or supplied via ..., is returned
invisibly.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. (2005). Robust Statistics. Wiley-Interscience, New York. doi:10.1002/9781118186435
Davison, A. C. and Smith, R. L. (1990). Models for exceedances over high thresholds. Journal of the Royal Statistical Society: Series B (Methodological), 52(3):393–425. doi:10.1111/j.2517-6161.1990.tb01796.x
# Influence curve for the default mu = 0, sigma = 1, xi = 0 case z <- seq(from = -3, to = 3, by = 0.01) inf <- gev_influence(z = z) plot(inf) # Influence curves based on the adjusted fit to the Plymouth ozone data fit <- gev_mle(PlymouthOzoneMaxima) pars <- coef(fit) infp <- gev_influence(z = z, mu = pars[1], sigma = pars[2], xi = pars[3]) plot(infp)# Influence curve for the default mu = 0, sigma = 1, xi = 0 case z <- seq(from = -3, to = 3, by = 0.01) inf <- gev_influence(z = z) plot(inf) # Influence curves based on the adjusted fit to the Plymouth ozone data fit <- gev_mle(PlymouthOzoneMaxima) pars <- coef(fit) infp <- gev_influence(z = z, mu = pars[1], sigma = pars[2], xi = pars[3]) plot(infp)
Calculates influence function curves for maximum likelihood estimators of 3 return levels based on Generalised Extreme Value (GEV) parameters.
gev_influence_rl(z, mu = 0, sigma = 1, xi = 0, m, npy = 1) ## S3 method for class 'gev_influence_rl' plot(x, xvar = c("z", "y"), vlines, ...)gev_influence_rl(z, mu = 0, sigma = 1, xi = 0, m, npy = 1) ## S3 method for class 'gev_influence_rl' plot(x, xvar = c("z", "y"), vlines, ...)
z |
A numeric vector. Values of normal quantiles |
mu, sigma, xi
|
Numeric scalars supplying the values of the GEV
parameters |
m |
A numeric vector of length 3 containing 3 unique return periods
in years. All entries in |
npy |
A numeric scalar. The number |
x |
An object inheriting from class |
xvar |
A logical scalar.
If |
vlines |
A numeric vector. If |
... |
For |
See gev_influence for information about influence functions in
general and influence curves for the parameters of a GEV distribution in
particular. The GEV influence curves are reparameterised from
to the required return levels.
gev_influence_rl: an object with class
c("gev_influence_rl", "matrix", "array"), a length(z) by 5 numeric
matrix. The first two columns contain the input values in z and the
corresponding values of y. Columns 3-5 contain the values of the GEV
influence function for the return levels in m respectively at the values
of z.
plot.gev_influence_rl: a list of the graphical parameters used in producing
the plot, either the defaults or supplied via ..., is returned
invisibly.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. (2005). Robust Statistics. Wiley-Interscience, New York. doi:10.1002/9781118186435
Davison, A. C. and Smith, R. L. (1990). Models for exceedances over high thresholds. Journal of the Royal Statistical Society: Series B (Methodological), 52(3):393–425. doi:10.1111/j.2517-6161.1990.tb01796.x
# Influence curves based on the adjusted fit to the Plymouth ozone data z <- seq(from = -3, to = 3, by = 0.01) fit <- gev_mle(PlymouthOzoneMaxima) pars <- coef(fit) m <- c(25, 50, 100) infp <- gev_influence_rl(z = z, mu = pars[1], sigma = pars[2], xi = pars[3], m = m) plot(infp)# Influence curves based on the adjusted fit to the Plymouth ozone data z <- seq(from = -3, to = 3, by = 0.01) fit <- gev_mle(PlymouthOzoneMaxima) pars <- coef(fit) m <- c(25, 50, 100) infp <- gev_influence_rl(z = z, mu = pars[1], sigma = pars[2], xi = pars[3], m = m) plot(infp)
Fits a GEV distribution to block maxima using maximum likelihood estimation, with the option to make an adjustment for the numbers of non-missing raw values in each block. The GEV location and scale parameters are adjusted to reflect the proportion of raw values that are missing.
gev_mle( data, block_length, block, adjust = TRUE, discard = 0, init = "quartiles", ... )gev_mle( data, block_length, block, adjust = TRUE, discard = 0, init = "quartiles", ... )
data |
Either
|
block_length |
A numeric scalar. Used calculate the maxima of disjoint
blocks of |
block |
A numeric vector with the same length as |
adjust |
A logical scalar or a numeric scalar in
|
discard |
A numeric scalar. Any block maximum for which greater than
|
init |
Either a character scalar, one of |
... |
Further arguments to be passed to |
If data is a numeric vector then exactly one of the arguments
block_length or block must be supplied. The parameters are fitted
using maximum likelihood estimation.
The adjustment for the numbers of non-missing values underlying the block
maxima is based on the strong assumptions that missing values occur
completely at random and that the raw data are independent and identically
distributed. We suppose that a block maximum based on
a full (no missing raw values) block of length has a
distribution, with distribution function
. Let be the number of non-missing values in block
and let denote the block maximum of such a block. We suppose
that has a
distribution, where
These expressions are based on inferring the parameters of an approximate
GEV distribution for from its approximate distribution function
.
A likelihood is constructed as the product of contributions from the maxima
from distinct blocks, under the assumption that these maxima are
independent. Let and let
denote the usual, unadjusted, GEV
log-likelihood for the full-data case where there are no missing values.
It can be shown that our adjusted log-likelihood
is given by
where is the proportion of missing values in block
.
The negated log-likelihood is minimised using a call to
stats::optim with hessian = TRUE. If stats::optim throws an error
then a warning is produced and the returned object has NA values for
the components par, loglik, vcov and se and an extra component
optim_error containing the error message. If the estimated observed
information matrix is singular then a warning is produced and the returned
object has NA values for the components vcov and se.
A list, returned from stats::optim (the MLEs are in the
component par), with the additional components:
loglik: value of the maximised log-likelihood.
vcov: estimated variance-covariance matrix of the parameters.
se: estimated standard errors of the parameters.
maxima: the vector of block maxima used to fit the model.
notNA: the number of non-missing raw values on which the maxima in
maxima are based.
n: the maximal block length, that is, the largest number of values that
could have been observed in each of these blocks.
adjust,discard : the values of these input arguments.
The call to gev_mle is provided in the attribute "call".
The class of the returned object is c("evmissing", "mle", "list").
Objects inheriting from class "evmissing" have coef, logLik, nobs,
summary, vcov and confint methods. See evmissing_methods.
Simpson, E. S. and Northrop, P. J. (2026) Accounting for Missing Data When Modelling Block Maxima, Environmetrics 37(2): e70075. doi:10.1002/env.70075.
## Simulate raw data from an exponential distribution set.seed(13032025) blocks <- 50 block_length <- 365 sdata <- sim_data(blocks = blocks, block_length = block_length) # sdata$data_full have no missing values # sdata$data_miss have had missing values created artificially # Fit a GEV distribution to block maxima from the full data fit1 <- gev_mle(sdata$data_full, block_length = sdata$block_length) summary(fit1) # An identical fit supplying the block indicator instead of block_length fit1b <- gev_mle(sdata$data_full, block = sdata$block) summary(fit1b) # Make adjustment for the numbers of non-missing values per block fit2 <- gev_mle(sdata$data_miss, block_length = sdata$block_length) summary(fit2) # Do not make the adjustment fit3 <- gev_mle(sdata$data_miss, block_length = sdata$block_length, adjust = FALSE) summary(fit3) # Remove all block maxima with greater than 25% missing values and # do not make the adjustment fit4 <- gev_mle(sdata$data_miss, block_length = sdata$block_length, adjust = FALSE, discard = 25) summary(fit4) ## Plymouth ozone data # Calculate the values in Table 4 of Simpson and Northrop (2026) # discard = 50 is chosen to discard data from 2001 and 2006 fit1 <- gev_mle(PlymouthOzoneMaxima) fit2 <- gev_mle(PlymouthOzoneMaxima, adjust = FALSE) fit3 <- gev_mle(PlymouthOzoneMaxima, discard = 50) fit4 <- gev_mle(PlymouthOzoneMaxima, adjust = FALSE, discard = 50) se <- function(x) return(sqrt(diag(vcov(x)))) MLEs <- cbind(coef(fit1), coef(fit2), coef(fit3), coef(fit4)) SEs <- cbind(se(fit1), se(fit2), se(fit3), se(fit4)) round(MLEs, 2) round(SEs, 2)## Simulate raw data from an exponential distribution set.seed(13032025) blocks <- 50 block_length <- 365 sdata <- sim_data(blocks = blocks, block_length = block_length) # sdata$data_full have no missing values # sdata$data_miss have had missing values created artificially # Fit a GEV distribution to block maxima from the full data fit1 <- gev_mle(sdata$data_full, block_length = sdata$block_length) summary(fit1) # An identical fit supplying the block indicator instead of block_length fit1b <- gev_mle(sdata$data_full, block = sdata$block) summary(fit1b) # Make adjustment for the numbers of non-missing values per block fit2 <- gev_mle(sdata$data_miss, block_length = sdata$block_length) summary(fit2) # Do not make the adjustment fit3 <- gev_mle(sdata$data_miss, block_length = sdata$block_length, adjust = FALSE) summary(fit3) # Remove all block maxima with greater than 25% missing values and # do not make the adjustment fit4 <- gev_mle(sdata$data_miss, block_length = sdata$block_length, adjust = FALSE, discard = 25) summary(fit4) ## Plymouth ozone data # Calculate the values in Table 4 of Simpson and Northrop (2026) # discard = 50 is chosen to discard data from 2001 and 2006 fit1 <- gev_mle(PlymouthOzoneMaxima) fit2 <- gev_mle(PlymouthOzoneMaxima, adjust = FALSE) fit3 <- gev_mle(PlymouthOzoneMaxima, discard = 50) fit4 <- gev_mle(PlymouthOzoneMaxima, adjust = FALSE, discard = 50) se <- function(x) return(sqrt(diag(vcov(x)))) MLEs <- cbind(coef(fit1), coef(fit2), coef(fit3), coef(fit4)) SEs <- cbind(se(fit1), se(fit2), se(fit3), se(fit4)) round(MLEs, 2) round(SEs, 2)
Calculates point estimates of -year return levels for fitted model
objects returned from gev_mle.
gev_return(x, m = 100, npy = 1)gev_return(x, m = 100, npy = 1)
x |
An object inheriting from class |
m |
A numeric vector. Values of |
npy |
A numeric scalar. The number |
For , the -year return level is given by
, where
and .
For , .
Equivalently, we could note that ,
where is a Box-Cox transformation.
An object with class c("return_level", "numeric", "evmissing").
A numeric vector containing the MLEs of the required return levels, with
names indicating the return period. The fitted model object returned
from gev_mle is included as an attribute called "gev_mle".
The input arguments m and npy are also included as attributes as is
the call to gev_return.
Coles, S. G. (2001) An Introduction to Statistical Modeling of Extreme Values, Springer-Verlag, London. doi:10.1007/978-1-4471-3675-0_3
return_level_methods for print, summary, coef, vcov and
confint methods.
## Simulate raw data from an exponential distribution set.seed(13032025) blocks <- 50 block_length <- 365 sdata <- sim_data(blocks = blocks, block_length = block_length) # sdata$data_full have no missing values # sdata$data_miss have had missing values created artificially # Fit a GEV distribution to block maxima from the full data fit1 <- gev_mle(sdata$data_full, block_length = sdata$block_length) summary(fit1) # Make adjustment for the numbers of non-missing values per block fit2 <- gev_mle(sdata$data_miss, block_length = sdata$block_length) summary(fit2) gev_return(fit1, m = c(100, 1000)) gev_return(fit2, m = c(100, 1000)) ## Plymouth ozone data fit <- gev_mle(PlymouthOzoneMaxima) rl <- gev_return(fit, m = c(100, 200)) # Symmetric confidence intervals sym <- confint(rl) # Profile-based confidence intervals prof <- confint(rl, profile = TRUE) prof plot(prof, digits = 4) plot(prof, parm = 2, digits = 3) # Doing this more quickly when we only care about the confidence limits prof <- confint(rl, profile = TRUE, mult = 32, faster = TRUE) plot(prof, digits = 3, type = "b") plot(prof, parm = 2, digits = 3, type = "b")## Simulate raw data from an exponential distribution set.seed(13032025) blocks <- 50 block_length <- 365 sdata <- sim_data(blocks = blocks, block_length = block_length) # sdata$data_full have no missing values # sdata$data_miss have had missing values created artificially # Fit a GEV distribution to block maxima from the full data fit1 <- gev_mle(sdata$data_full, block_length = sdata$block_length) summary(fit1) # Make adjustment for the numbers of non-missing values per block fit2 <- gev_mle(sdata$data_miss, block_length = sdata$block_length) summary(fit2) gev_return(fit1, m = c(100, 1000)) gev_return(fit2, m = c(100, 1000)) ## Plymouth ozone data fit <- gev_mle(PlymouthOzoneMaxima) rl <- gev_return(fit, m = c(100, 200)) # Symmetric confidence intervals sym <- confint(rl) # Profile-based confidence intervals prof <- confint(rl, profile = TRUE) prof plot(prof, digits = 4) plot(prof, parm = 2, digits = 3) # Doing this more quickly when we only care about the confidence limits prof <- confint(rl, profile = TRUE, mult = 32, faster = TRUE) plot(prof, digits = 3, type = "b") plot(prof, parm = 2, digits = 3, type = "b")
Fits a GEV distribution to block maxima using maximum likelihood estimation, making an adjustment for the locations of missing raw values in each block. The GEV location and scale parameters are adjusted to reflect the proportion of raw values that are missing and the time series dependence in the data.
gev_ts( data, block_length, block, pseudo = TRUE, sliding = FALSE, init = "quartiles", ... )gev_ts( data, block_length, block, pseudo = TRUE, sliding = FALSE, init = "quartiles", ... )
data |
Either
There must be at least one full block of data, that is, at least one block for which no data are missing. |
block_length |
A numeric scalar. Used calculate the maxima of disjoint
blocks of |
block |
A numeric vector with the same length as |
pseudo |
A logical scalar. If |
sliding |
A logical scalar. If |
init |
Either a character scalar, one of |
... |
Further arguments to be passed to |
If data is a numeric vector then exactly one of the arguments
block_length or block must be supplied if sliding = FALSE and only
block_length can be supplied if sliding = TRUE. The parameters are
fitted using maximum likelihood estimation.
The adjustment for the numbers of non-missing values underlying the block
maxima is based on the strong assumption that missing values occur
completely at random. We suppose that a block maximum based on
a full (no missing raw values) block of length has a
distribution, with distribution function
. Let be the number of non-missing values in block
and let denote the block maximum of such a block. We suppose
that, has a
distribution, where
for some .
These expressions are based on the having approximately a
GEV distribution with distribution function .
For a full block, . If pseudo = TRUE, then, for an
incomplete, partially-observed block, the value of is estimated
using the pseudo-maxima returned from block_maxima_ts and the GEV
distribution function based on the current value of
in the optimisation routine. Suppose that we have a vector of
pseudo-maxima resulting from a particular incomplete block . It can
be shown that the components of each have an
exponential distribution with mean . We estimate using
the reciprocal of the mean of the values in .
The negated log-likelihood is minimised using a call to
stats::optim with hessian = TRUE. If stats::optim throws an error
then a warning is produced and the returned object has NA values for
the components par, loglik, vcov and se and an extra component
optim_error containing the error message. If the estimated observed
information matrix is singular then a warning is produced and the returned
object has NA values for the components vcov and se.
A list, returned from stats::optim (the MLEs are in the
component par), with the additional components:
loglik: value of the maximised log-likelihood.
vcov: estimated variance-covariance matrix of the parameters.
se: estimated standard errors of the parameters.
maxima: the vector of block maxima used to fit the model.
notNA: the number of non-missing raw values on which the maxima in
maxima are based.
n: the maximal block length, that is, the largest number of values that
could have been observed in each of these blocks.
rvec: a vector of the values used for , where
is the number of blocks. The content depends on the argument pseudo.
rhats: if pseudo = TRUE, a vector of the subset of rvec for
partially-observed blocks only. The attributes "propn_notNA" and
"unconstrained" give, respectively, the values of for these
blocks and the estimates of before they are constrained to lie
in the interval .
sliding: the input argument sliding.
The call to gev_ts is provided in the attribute "call".
gev_mle provides an adjustment for missing data in the
case where the raw data can be assumed to be independent and identically
distributed.
set.seed(1632026) blocks <- 50 block_length <- 90 missing_args <- list(p0miss = 0.5, min = 0, max = 0.4) sdata <- sim_data(blocks = blocks, block_length = block_length, missing_args = missing_args) # Using disjoint blocks pt <- gev_ts(sdata$data_miss, block_length = 90, pseudo = TRUE) pf <- gev_ts(sdata$data_miss, block_length = 90, pseudo = FALSE) pf2 <- gev_mle(sdata$data_miss, block_length = 90) # Using sliding blocks ## Not run: pts <- gev_ts(sdata$data_miss, block_length = 90, pseudo = TRUE, sliding = TRUE) pfs <- gev_ts(sdata$data_miss, block_length = 90, pseudo = FALSE, sliding = TRUE) ## End(Not run)set.seed(1632026) blocks <- 50 block_length <- 90 missing_args <- list(p0miss = 0.5, min = 0, max = 0.4) sdata <- sim_data(blocks = blocks, block_length = block_length, missing_args = missing_args) # Using disjoint blocks pt <- gev_ts(sdata$data_miss, block_length = 90, pseudo = TRUE) pf <- gev_ts(sdata$data_miss, block_length = 90, pseudo = FALSE) pf2 <- gev_mle(sdata$data_miss, block_length = 90) # Using sliding blocks ## Not run: pts <- gev_ts(sdata$data_miss, block_length = 90, pseudo = TRUE, sliding = TRUE) pfs <- gev_ts(sdata$data_miss, block_length = 90, pseudo = FALSE, sliding = TRUE) ## End(Not run)
Fits a GEV distribution to block maxima using maximum likelihood estimation, with the option to make an adjustment for the numbers of non-missing raw values in each block using one of the two weighting schemes proposed in McVittie and Murphy (2025).
gev_weighted(data, scheme = 1, block_length, block, init = "quartiles", ...)gev_weighted(data, scheme = 1, block_length, block, init = "quartiles", ...)
data |
Either
|
scheme |
A numeric scalar. Pass |
block_length |
A numeric scalar. Used calculate the maxima of disjoint
blocks of |
block |
A numeric vector with the same length as |
init |
Either a character scalar, one of |
... |
Further arguments to be passed to |
Suppose that a full (no missing values) block will contain
raw values. Let be the number of non-missing values, and
the observed block maximum, in block .
The contribution of block maximum to the GEV log-likelihood is
weighted (multiplied) by the weight . The set of weights depends
on the weighting scheme chosen by scheme as follows.
If scheme = 1 then , for .
If scheme = 2 then , for
, where is the empirical distribution
function of .
For any other value of scheme all weights are set to 1, that is, an
unweighted fit is performed.
For each weighting scheme, the larger the number of missing
values the smaller the weight.
See McVittie and Murphy (2025) for further details.
A list, returned from stats::optim (the MLEs are in the
component par), with the additional components:
loglik: value of the maximised log-likelihood.
vcov: estimated variance-covariance matrix of the parameters.
se: estimated standard errors of the parameters.
maxima: the vector of block maxima used to fit the model.
notNA: the number of non-missing raw values on which the maxima in
maxima are based.
n: the maximal block length, that is, the largest number of values that
could have been observed in each of these blocks.
weights: the weights used in the fitting.
The call to gev_mle is provided in the attribute "call".
The class of the returned object is
c("evmissing", "weighted_mle", "list").
Objects inheriting from class "evmissing" have coef, logLik, nobs,
summary, vcov and confint methods. See evmissing_methods.
McVittie, J. H. and Murphy, O. A. (2025) Weighted Parameter Estimators of the Generalized Extreme Value Distribution in the Presence of Missing Observations. doi:10.48550/arXiv.2506.15964
## Simulate raw data from an exponential distribution set.seed(13032025) blocks <- 50 block_length <- 365 sdata <- sim_data(blocks = blocks, block_length = block_length) # sdata$data_full have no missing values # sdata$data_miss have had missing values created artificially ## Fits to full data: fit0, fit 1 and fit2 should give the same results # Fit a GEV distribution to block maxima from the full data fit0 <- gev_mle(sdata$data_full, block_length = sdata$block_length) summary(fit0) # Fit to the full data using weighting scheme 1 fit1 <- gev_weighted(sdata$data_full, scheme = 1, block_length = sdata$block_length) summary(fit1) # Fit to the full data using weighting scheme 2 fit2 <- gev_weighted(sdata$data_full, scheme = 2, block_length = sdata$block_length) summary(fit2) ## Fits to the data with missings # Make adjustment for the numbers of non-missing values per block fit0 <- gev_mle(sdata$data_miss, block_length = sdata$block_length) summary(fit0) # Make adjustment using weighting scheme 1 fit1 <- gev_weighted(sdata$data_miss, scheme = 1, block_length = sdata$block_length) summary(fit1) # Make adjustment using weighting scheme 2 fit2 <- gev_weighted(sdata$data_miss, scheme = 2, block_length = sdata$block_length) summary(fit2)## Simulate raw data from an exponential distribution set.seed(13032025) blocks <- 50 block_length <- 365 sdata <- sim_data(blocks = blocks, block_length = block_length) # sdata$data_full have no missing values # sdata$data_miss have had missing values created artificially ## Fits to full data: fit0, fit 1 and fit2 should give the same results # Fit a GEV distribution to block maxima from the full data fit0 <- gev_mle(sdata$data_full, block_length = sdata$block_length) summary(fit0) # Fit to the full data using weighting scheme 1 fit1 <- gev_weighted(sdata$data_full, scheme = 1, block_length = sdata$block_length) summary(fit1) # Fit to the full data using weighting scheme 2 fit2 <- gev_weighted(sdata$data_full, scheme = 2, block_length = sdata$block_length) summary(fit2) ## Fits to the data with missings # Make adjustment for the numbers of non-missing values per block fit0 <- gev_mle(sdata$data_miss, block_length = sdata$block_length) summary(fit0) # Make adjustment using weighting scheme 1 fit1 <- gev_weighted(sdata$data_miss, scheme = 1, block_length = sdata$block_length) summary(fit1) # Make adjustment using weighting scheme 2 fit2 <- gev_weighted(sdata$data_miss, scheme = 2, block_length = sdata$block_length) summary(fit2)
"block_maxima"
Plot method for objects inheriting from "block_maxima" returned from
block_maxima, block_maxima_ts or sliding_block_maxima_ts.
## S3 method for class 'block_maxima' plot(x, which = 1, ...)## S3 method for class 'block_maxima' plot(x, which = 1, ...)
x |
An object inheriting from class |
which |
If |
... |
Further arguments to |
When which = 1 we obtain a time series plot in which there are
periods where the value does not change. When which = 2 we expect to
see that the sliding block maximum tends to be smaller for blocks with a
larger proportion of missing values.
Nothing is returned.
### Plymouth Ozone Data ## Sliding maxima # Time series plots of sliding block maxima plot(PlymouthOzoneSlidingMaxima) # Plot maxima against the proportion of non-missing daily values plot(PlymouthOzoneSlidingMaxima, which = 2) ## Disjoint maxima bm <- block_maxima(PlymouthOzone$Ozone, block = PlymouthOzone$Year) # Time series plots of block maxima plot(bm) # Plot maxima against the proportion of non-missing daily values plot(bm, which = 2)### Plymouth Ozone Data ## Sliding maxima # Time series plots of sliding block maxima plot(PlymouthOzoneSlidingMaxima) # Plot maxima against the proportion of non-missing daily values plot(PlymouthOzoneSlidingMaxima, which = 2) ## Disjoint maxima bm <- block_maxima(PlymouthOzone$Ozone, block = PlymouthOzone$Year) # Time series plots of block maxima plot(bm) # Plot maxima against the proportion of non-missing daily values plot(bm, which = 2)
Daily maximum ozone levels at Plymouth in London (UK) for the years 1998-2024 inclusive.
PlymouthOzonePlymouthOzone
PlymouthOzone is a data frame with 9862 rows and the 3 variables:
Date: with class "Date" in the format YYYY-MM-DD.
Year: Values in 1998-2024.
Ozone: daily maximum ozone level in g/m.
The Department for Environment Food and Rural Affair (DEFRA). The Plymouth Centre monitoring site at the UK-AIR database Data Selector.
PlymouthOzoneMaxima for the annual maxima and numbers of
missing values per year.
head(PlymouthOzone) # Time series plot of annual maxima ozone levels plot(PlymouthOzone$Date, PlymouthOzone$Ozone, xlab = "year", ylab = "ozone (micrograms / metre cubed)", pch = 16)head(PlymouthOzone) # Time series plot of annual maxima ozone levels plot(PlymouthOzone$Date, PlymouthOzone$Ozone, xlab = "year", ylab = "ozone (micrograms / metre cubed)", pch = 16)
Annual maxima of daily maximum ozone levels at Plymouth in Devon (UK) for the years 1998-2024 inclusive.
PlymouthOzoneMaximaPlymouthOzoneMaxima
PlymouthOzoneMaxima is a data frame with 27 rows (years 1998 to
2024) and the 4 variables:
maxima: annual maximum ozone level in g/m.
notNA : the number of days of the year for which raw data were available.
n : the number of days in the year (365 or 366).
block : a block number of 1 for year 1998 through to 27 for year 2024.
The row names of PlymouthOzoneMaxima are the years 1998:2024.
The raw data are missing for approximately of the days.
The Department for Environment Food and Rural Affair (DEFRA). The Plymouth Centre monitoring site at the UK-AIR database Data Selector.
PlymouthOzone for the raw time series.
head(PlymouthOzoneMaxima) # Time series plot of annual maxima ozone levels plot(rownames(PlymouthOzoneMaxima), PlymouthOzoneMaxima$maxima, ylab = "ozone (micrograms / metre cubed)", xlab = "year", pch = 16) # Time series plot of proportion of non-missing days plot(rownames(PlymouthOzoneMaxima), PlymouthOzoneMaxima$notNA / PlymouthOzoneMaxima$n, ylab = "proportion of non-missing days", xlab = "year", pch = 16) # Plot ozone levels against the proportion of non-missing days plot(PlymouthOzoneMaxima$notNA / PlymouthOzoneMaxima$n, PlymouthOzoneMaxima$maxima, ylab = "ozone (micrograms / metre cubed)", xlab = "proportion of non-missing days", pch = 16)head(PlymouthOzoneMaxima) # Time series plot of annual maxima ozone levels plot(rownames(PlymouthOzoneMaxima), PlymouthOzoneMaxima$maxima, ylab = "ozone (micrograms / metre cubed)", xlab = "year", pch = 16) # Time series plot of proportion of non-missing days plot(rownames(PlymouthOzoneMaxima), PlymouthOzoneMaxima$notNA / PlymouthOzoneMaxima$n, ylab = "proportion of non-missing days", xlab = "year", pch = 16) # Plot ozone levels against the proportion of non-missing days plot(PlymouthOzoneMaxima$notNA / PlymouthOzoneMaxima$n, PlymouthOzoneMaxima$maxima, ylab = "ozone (micrograms / metre cubed)", xlab = "proportion of non-missing days", pch = 16)
Maxima of sliding (overlapping) block of length 365 days of daily maximum ozone levels at Plymouth in Devon (UK) for the years 1998-2024 inclusive.
PlymouthOzoneSlidingMaximaPlymouthOzoneSlidingMaxima
PlymouthOzoneSlidingMaxima is an object returned from
sliding_block_maxima_ts, a list inheriting from class
sliding_block_maxima_ts. It was created using
sliding_block_maxima_ts(PlymouthOzone$Ozone, block_length = 365).
This includes components
full_maxima: a numeric vector of 130 full-block maxima, all equal to 143
g/m.
partial_maxima: a numeric vector of 9368 partial-block maxima.
maxima: a numeric vector of all maxima.
See sliding_block_maxima_ts for a description of the other components.
The Department for Environment Food and Rural Affair (DEFRA). The Plymouth Centre monitoring site at the UK-AIR database Data Selector.
PlymouthOzone for the raw time series.
# Time series plots of sliding block maxima plot(PlymouthOzoneSlidingMaxima) # Plot maxima against the proportion of non-missing daily values plot(PlymouthOzoneSlidingMaxima, which = 2)# Time series plots of sliding block maxima plot(PlymouthOzoneSlidingMaxima) # Plot maxima against the proportion of non-missing daily values plot(PlymouthOzoneSlidingMaxima, which = 2)
"return_level"
Methods for objects of class "return_level" returned from gev_return.
## S3 method for class 'return_level' coef(object, ...) ## S3 method for class 'return_level' print(x, ...) ## S3 method for class 'return_level' vcov(object, ...) ## S3 method for class 'return_level' summary(object, digits = max(3, getOption("digits") - 3L), ...) ## S3 method for class 'summary.return_level' print(x, ...) ## S3 method for class 'return_level' confint( object, parm = 1:length(object), level = 0.95, profile = FALSE, mult = 2, faster = FALSE, epsilon = 1e-04, ... )## S3 method for class 'return_level' coef(object, ...) ## S3 method for class 'return_level' print(x, ...) ## S3 method for class 'return_level' vcov(object, ...) ## S3 method for class 'return_level' summary(object, digits = max(3, getOption("digits") - 3L), ...) ## S3 method for class 'summary.return_level' print(x, ...) ## S3 method for class 'return_level' confint( object, parm = 1:length(object), level = 0.95, profile = FALSE, mult = 2, faster = FALSE, epsilon = 1e-04, ... )
object, x
|
An object inheriting from class For |
... |
Further arguments. Only used for |
digits |
An integer. Passed to |
parm |
A numeric vector. For which components, that is, which return
levels, in |
level |
The confidence level required. A numeric scalar in (0, 1). |
profile |
A logical scalar. If |
mult |
A positive numeric scalar. Controls the increment by which the
parameter of interest is increased/decreased when profiling above/below
its MLE. The increment is |
faster |
A logical scalar. If |
epsilon |
Only relevant if
|
For confint.return_level, the default, epsilon = -1, should
work well enough in most circumstances, but to achieve a specific accuracy
set epsilon to be a small positive value, for example, epsilon = 1e-4.
print.return_level and coef.return_level: a numeric vector
containing the MLEs of return return levels.
vcov.return_level: a length(object) by length(object) matrix with
row and column names indicating the return periods of the return levels.
The estimated variance-covariance matrix for the return levels in
object. The diagonal elements give the estimated variances associated
with the individual return level estimates.
summary.return_level: an object containing the original function call
and a matrix of estimates of return levels and associated estimated
standard errors with row names indicating the respective return periods.
The object is printed by print.summary.return_level.
print.summary.return_level: the argument x is returned, invisibly.
confint.return_level: an object of class
c("confint_return_level", "evmissing"). A numeric matrix with 2 columns
giving the lower and upper confidence limits for each return level. These
columns are labelled as (1-level)/2 and 1-(1-level)/2, expressed as a
percentage, by default 2.5% and 97.5%. The row names indicate the
return levels.
If profile = TRUE then the returned object has extra attributes crit,
level and for_plot. The latter is a named list of length parm with
components named after the return periods. Each components is a 2-column
numeric matrix. The first column contains values of the return level and
the second column the corresponding values of the profile log-likelihood.
The profile log-likelihood is equal to the attribute crit at the limits
of the confidence interval. The attribute level is the input argument
level.
gev_mle and gev_return, for examples of the use of
confint.return_level.
## Plymouth ozone data # See ?gev_return for confidence intervals for return levels fit <- gev_mle(PlymouthOzoneMaxima) rl <- gev_return(fit, m = c(100, 200)) rl vcov(rl) summary(rl)## Plymouth ozone data # See ?gev_return for confidence intervals for return levels fit <- gev_mle(PlymouthOzoneMaxima) rl <- gev_return(fit, m = c(100, 200)) rl vcov(rl) summary(rl)
Simulates data from a user-supplied distribution and creates missing values
artificially. Functions mcar and mcar2 provides an example mechanisms
for doing this based on a Missing Completely At Random (MCAR) assumption.
sim_data( blocks = 50, block_length = 365, distn = "exp", missing_fn = mcar, missing_args = formals(missing_fn)$missing_args, ... ) mcar( sim_data, blocks, block_length, missing_args = list(p0miss = 0, min = 0, max = 0.5) ) mcar2(sim_data, missing_args = list(pmiss = 0.5))sim_data( blocks = 50, block_length = 365, distn = "exp", missing_fn = mcar, missing_args = formals(missing_fn)$missing_args, ... ) mcar( sim_data, blocks, block_length, missing_args = list(p0miss = 0, min = 0, max = 0.5) ) mcar2(sim_data, missing_args = list(pmiss = 0.5))
blocks |
A numeric scalar. The number of blocks of data required.
Usually, this will be a positive integer, but |
block_length |
A numeric scalar. The number of raw observations per block. |
distn |
A character scalar. Specifies the distribution from which raw
data are simulated. The name in the |
missing_fn |
A function to simulate the positions of the missing values within each block year. See Details. |
missing_args |
Arguments to be passed to |
... |
Further arguments to the function |
sim_data |
A numeric vector of raw observations into, some of which will be made missing. |
The function missing_fn must return a, possibly empty,
subset of c(1, 2, ..., block_length). This function is applied within
each simulated block, independently of other blocks.
The default function mcar simulates the numbers of missing values in the
blocks as follows.
A proportion p0miss of the blocks have no missing values.
In the other blocks, the number of missing values is
ceiling(prop_miss * block_length), where prob_miss is a value
simulated from a Uniform(min, max) distribution. The positions of
these missing values within the block is random.
The function mcar2 identifies at random a proportion pmiss of the
simulated raw observations to become missing.
Care may need to be taken if these simulated data are used as input to
gev_mle using an approach that discards block maxima based on more
than a certain percentage of missing values, that is, with discard > 0.
For example, using the default argument blocks = 50 and
missing_fn = mcar, with its default missing_args, may result
in a sample size of retained block maxima that contains insufficient
information to make reliable inferences, leading to difficulties finding
an appropriate MLE for the shape parameter and/or a singular
observed information matrix.
If blocks > 0, a list with the following components:
data_full: simulated raw data with no missing values.
data_miss: simulated data after missing values have been created.
blocks, block_length: the respective input values of blocks and
block_length.
block: a block indicator vector, suitable as an argument to gev_mle.
distn: the input argument distn.
distn_args: further arguments to stats::rxxx supplied via ....
If blocks = 0, a list containing all the inputs arguments.
# Using missing_fn = mcar sdata <- sim_data() # Using missing_fn = mcar2 sdata2 <- sim_data(missing_fn = mcar2)# Using missing_fn = mcar sdata <- sim_data() # Using missing_fn = mcar2 sdata2 <- sim_data(missing_fn = mcar2)
Extracts sliding block maxima and missing value information for each block.
Works like block_maxima_ts but returns information for sliding
(overlapping) blocks rather than disjoint blocks and defines blocks using
only block length, that is, there is no block argument.
sliding_block_maxima_ts(data, block_length)sliding_block_maxima_ts(data, block_length)
data |
A numeric vector containing a time series of raw data. |
block_length |
A numeric scalar. Used calculate the maxima of sliding
blocks of |
The maxima are calculated for all blocks of length
block_length present in data, starting with the first block
data[1:block_length] and sliding the block repeatedly by one observation
until reaching the final block
data[(length(data) - block_length + 1):length(data)].
A list, with class
c("list", "block_maxima", "sliding", "evmissing"), containing the
following components:
maxima: the block maxima.
notNA: the numbers of non-missing observations in each block.
n: the maximal block length, that is, the largest number of values that
could have been observed in each block.
whereNA: a named list containing, for each block, the positions of any
missing values in the block. For example, if (only) the first and fifth
observations in block 3 are missing then the third component (named
block3) of whereNA is c(1, 5). If a block has no missing values
then its component in whereNA is integer(0).
pseudo_maxima: a numeric matrix containing (pseudo) block maxima
created by applying the missing value patterns from incomplete blocks to
all full blocks, that is, blocks without any missing values. Each column
contains the pseudo-maxima resulting from a particular incomplete block.
The columns are labelled by the number of the incomplete block and the
columns by the number of the full block. If an incomplete block contains
all missing values then its entry in pseudo_maxima is NA. If there
are no full blocks or no incomplete blocks then pseudo_maxima is NA.
full_maxima: a numeric vector of maxima from full blocks.
partial_maxima: a numeric vector of maxima from partial blocks.
If a block contains only missing values then its value of maxima is NA,
its value of notNA is 0 and whereNA contains the positions of all the
observations in the block.
If block is supplied then these vectors are named using the values in
block. Otherwise, these vectors do not have names.
## Simulate example data set.seed(7032025) data <- rexp(15) # Create some missing values data[c(5, 7:8)] <- NA # 5 blocks (columns), each with 3 observations matrix(data, ncol = 5) block_length <- 3 x <- sliding_block_maxima_ts(data, block_length = block_length)## Simulate example data set.seed(7032025) data <- rexp(15) # Create some missing values data[c(5, 7:8)] <- NA # 5 blocks (columns), each with 3 observations matrix(data, ncol = 5) block_length <- 3 x <- sliding_block_maxima_ts(data, block_length = block_length)