Package 'smovie'

Title: Some Movies to Illustrate Concepts in Statistics
Description: Provides movies to help students to understand statistical concepts. The 'rpanel' package <https://cran.r-project.org/package=rpanel> is used to create interactive plots that move to illustrate key statistical ideas and methods. There are movies to: visualise probability distributions (including user-supplied ones); illustrate sampling distributions of the sample mean (central limit theorem), the median, the sample maximum (extremal types theorem) and (the Fisher transformation of the) product moment correlation coefficient; examine the influence of an individual observation in simple linear regression; illustrate key concepts in statistical hypothesis testing. Also provided are dpqr functions for the distribution of the Fisher transformation of the correlation coefficient under sampling from a bivariate normal distribution.
Authors: Paul J. Northrop [aut, cre, cph]
Maintainer: Paul J. Northrop <[email protected]>
License: GPL (>= 2)
Version: 1.1.6
Built: 2025-01-27 05:26:12 UTC
Source: https://github.com/paulnorthrop/smovie

Help Index


smovie: some movies to illustrate concepts in statistics

Description

These movies are animations used to illustrate key statistical ideas. They are produced using the rpanel package.

Details

When one of these functions is called, R opens a small parameter window containing clickable buttons that change the parameters underlying the plot. For the effects of these buttons, see the documentation of the individual functions.

See vignette("smovie-vignette", package = "smovie") for an overview of the package and the user-friendly menu panel.

There are movies on the following topics.

Probability distributions

Sampling distributions

Regression

Hypothesis testing

Author(s)

Maintainer: Paul J. Northrop [email protected] [copyright holder]

References

Bowman, A., Crawford, E., Alexander, G. and Bowman, R. W. (2007). rpanel: Simple Interactive Controls for R Functions Using the tcltk Package. Journal of Statistical Software, 17(9), 1-18. doi:10.18637/jss.v017.i09.

See Also

Useful links: https://github.com/paulnorthrop/smovie


Central Limit Theorem (CLT)

Description

A movie to illustrate the ideas of the sampling distribution of a mean and the central limit theorem.

Usage

clt(
  n = 20,
  distn,
  params = list(),
  panel_plot = TRUE,
  hscale = NA,
  vscale = hscale,
  n_add = 1,
  delta_n = 1,
  arrow = TRUE,
  leg_cex = 1.25,
  ...
)

Arguments

n

An integer scalar. The size of the samples drawn from the distribution chosen using distn.

distn

A character scalar specifying the distribution from which observations are sampled. Distributions "beta", "binomial", "chisq", "chi-squared", "exponential", "f", "gamma", "geometric", "gev", "gp", "hypergeometric", "lognormal", "log-normal", "negative binomial", "normal", "poisson", "t", "uniform" and "weibull" are recognised, case being ignored.

If distn is not supplied then distn = "exponential" is used.

The "gev" and "gp" cases use the gev and gp distributional functions in the revdbayes package.

The other cases use the distributional functions in the stats-package. If distn = "gamma" then the (shape, rate) parameterisation is used. If scale is supplied via params then rate is inferred from this. If distn = "negative binomial" then the (size, prob) parameterisation is used. If mu is supplied via params then prob is inferred from this (and size). If distn = "beta" then ncp is forced to be zero.
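The two re-parameterisations described above can be checked directly with the stats functions. This is a minimal sketch; the parameter values are arbitrary illustrations, not package defaults.

```r
# Gamma: a scale parameter corresponds to rate = 1 / scale
x <- c(0.5, 1, 2, 4)
shape <- 2
scale <- 3
d_scale <- dgamma(x, shape = shape, scale = scale)
d_rate <- dgamma(x, shape = shape, rate = 1 / scale)
gamma_ok <- isTRUE(all.equal(d_scale, d_rate))

# Negative binomial: a mean mu corresponds to prob = size / (size + mu)
k <- 0:5
size <- 10
mu <- 4
d_mu <- dnbinom(k, size = size, mu = mu)
d_prob <- dnbinom(k, size = size, prob = size / (size + mu))
nbinom_ok <- isTRUE(all.equal(d_mu, d_prob))

c(gamma = gamma_ok, nbinom = nbinom_ok)
```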

params

A named list of additional arguments to be passed to the density function associated with distribution distn. The (shape, rate) parameterisation is used for the gamma distribution (see GammaDist) even if the value of the scale parameter is set using params.

If a parameter value is not supplied then the default values in the relevant distributional function set using distn are used, except for "beta" (shape1 = 2, shape2 = 2), "chisq" (df = 4), "f" (df1 = 4, df2 = 8), "gev" (shape = 0.2), "gamma" (shape = 2), "gp" (shape = 0.1), "poisson" (lambda = 5), "t" (df = 4) and "weibull" (shape = 2).

panel_plot

A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale

Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

n_add

An integer scalar. The number of simulated datasets to add to each new frame of the movie.

delta_n

A numeric scalar. The amount by which n is increased (or decreased) after one click of the + (or -) button in the parameter window.

arrow

A logical scalar. Should an arrow be included to show the simulated sample mean from the top plot being placed into the bottom plot?

leg_cex

The argument cex to legend. Allows the size of the legend to be controlled manually.

...

Additional arguments to the rpanel functions rp.button and rp.doublebutton, not including panel, variable, title, step, action, initval, range.

Details

Loosely speaking, a consequence of the Central Limit Theorem is that the mean of a large number of independent and identically distributed random variables, each with mean μ and finite standard deviation σ, has approximately a normal distribution, even if these original variables are not normally distributed.

This movie considers examples where this limiting result holds and illustrates graphically the closeness of the limiting approximation provided by the relevant normal limit to the true finite-n distribution. Of course, when distn = "normal" this result is exact.

Samples of size n are repeatedly simulated from the distribution chosen using distn. These samples are summarized using a plot that appears at the top of the movie screen. For each sample the mean of these n values is calculated, stored and added to another plot, situated below the first plot. This plot is either a histogram or an empirical c.d.f., chosen using a radio button. A rug is added to a histogram provided that it contains no more than 1000 points.

The p.d.f. (for a continuous variable) or p.m.f. (for a discrete variable) of the original variables is added to the top plot.

Once it starts, four aspects of this movie are controlled by the user.

  • There are buttons to increase (+) or decrease (-) the sample size, that is, the number of values over which a mean is calculated.

  • Each time the button labelled "simulate another n_add samples of size n" is clicked, n_add new samples are simulated and their sample means are added to the bottom histogram.

  • There is a button to switch the bottom plot from displaying a histogram of the simulated means and the limiting normal p.d.f. to the empirical c.d.f. of the simulated data and the limiting normal c.d.f.

  • There is a checkbox to add to the bottom plot the approximate (large n) normal p.d.f./c.d.f. (with mean μ and standard deviation σ / √n), implied by the CLT.
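The simulation that the movie animates can be sketched non-interactively in base R. This is only an illustration under stated assumptions (exponential data with rate 1, 5000 simulated samples, an arbitrary comparison grid); it does not call clt() and is not the package's internal code.

```r
set.seed(1)
n <- 20          # sample size
n_sim <- 5000    # number of simulated samples (illustrative choice)
mu <- 1          # mean of the exponential(1) distribution
sigma <- 1       # standard deviation of the exponential(1) distribution

# Each column is one sample of size n; colMeans() gives the sample means
samples <- matrix(rexp(n * n_sim, rate = 1), nrow = n)
means <- colMeans(samples)

# Simulated moments of the sample mean next to the CLT standard deviation
moments <- c(mean = mean(means), sd = sd(means), clt_sd = sigma / sqrt(n))

# Largest gap on a grid between the empirical c.d.f. of the means
# and the limiting normal c.d.f.
grid <- seq(0.5, 1.6, by = 0.1)
cdf_gap <- max(abs(ecdf(means)(grid) -
                   pnorm(grid, mean = mu, sd = sigma / sqrt(n))))

moments
cdf_gap
```

Increasing n in this sketch should shrink cdf_gap, which is the effect that the + button shows in the movie.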

Value

Nothing is returned, only the animation is produced.

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.

cltq: Central Limit Theorem for sample quantiles.

Examples

# Exponential data
clt()

# Uniform data
clt(distn = "uniform")

# Poisson data
clt(distn = "poisson")

Central Limit Theorem (CLT) for sample quantiles

Description

A movie to illustrate the ideas of the sampling distribution of the sample 100p% quantile and the central limit theorem for sample quantiles.

Usage

cltq(
  n = 20,
  p = 0.5,
  distn,
  params = list(),
  type = 7,
  panel_plot = TRUE,
  hscale = NA,
  vscale = hscale,
  n_add = 1,
  delta_n = 1,
  arrow = TRUE,
  leg_cex = 1.25,
  ...
)

Arguments

n

An integer scalar. The size of the samples drawn from the distribution chosen using distn.

p

A numeric scalar in (0, 1). The value of p.

distn

A character scalar specifying the (continuous) distribution from which observations are sampled. Distributions "beta", "chisq", "chi-squared", "exponential", "f", "gamma", "gev", "gp", "lognormal", "log-normal", "normal", "t", "uniform" and "weibull" are recognised, case being ignored.

If distn is not supplied then distn = "exponential" is used.

The "gev" and "gp" cases use the gev and gp distributional functions in the revdbayes package.

The other cases use the distributional functions in the stats-package. If distn = "gamma" then the (shape, rate) parameterisation is used. If scale is supplied via params then rate is inferred from this. If distn = "beta" then ncp is forced to be zero.

params

A named list of additional arguments to be passed to the density function associated with distribution distn. The (shape, rate) parameterisation is used for the gamma distribution (see GammaDist) even if the value of the scale parameter is set using params.

If a parameter value is not supplied then the default values in the relevant distributional function set using distn are used, except for "beta" (shape1 = 2, shape2 = 2), "chisq" (df = 4), "f" (df1 = 4, df2 = 8), "gev" (shape = 0.2), "gamma" (shape = 2), "gp" (shape = 0.1), "t" (df = 4) and "weibull" (shape = 2).

type

An integer between 1 and 9. The value of the argument type to be passed to quantile when calculating a sample quantile.
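The choice of type can matter for small samples. As a small illustration using stats::quantile directly (the data vector is an arbitrary example), the default type 7 interpolates linearly, whereas type 1 inverts the empirical c.d.f., so the two sample medians of 1:4 differ (2.5 and 2 respectively).

```r
x <- 1:4
q7 <- quantile(x, probs = 0.5, type = 7)  # default: linear interpolation
q1 <- quantile(x, probs = 0.5, type = 1)  # inverse of the empirical c.d.f.
c(type7 = unname(q7), type1 = unname(q1))
```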

panel_plot

A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale

Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

n_add

An integer scalar. The number of simulated datasets to add to each new frame of the movie.

delta_n

A numeric scalar. The amount by which n is increased (or decreased) after one click of the + (or -) button in the parameter window.

arrow

A logical scalar. Should an arrow be included to show the simulated sample quantile from the top plot being placed into the bottom plot?

leg_cex

The argument cex to legend. Allows the size of the legend to be controlled manually.

...

Additional arguments to the rpanel functions rp.button and rp.doublebutton, not including panel, variable, title, step, action, initval, range.

Details

Loosely speaking, a consequence of the CLT for sample quantiles is that the 100p% sample quantile of a large number of identically distributed random variables, each with probability density function f and 100p% quantile ξ(p), has approximately a normal distribution. See, for example, Lehmann (1999) for a precise statement and conditions.

This movie considers examples where this limiting result holds and illustrates graphically the closeness of the limiting approximation provided by the relevant normal limit to the true finite-n distribution.

Samples of size n are repeatedly simulated from the distribution chosen using distn. These samples are summarized using a plot that appears at the top of the movie screen. For each sample the 100p% sample quantile of these n values is calculated, stored and added to another plot, situated below the first plot. This plot is either a histogram or an empirical c.d.f., chosen using a radio button. A rug is added to a histogram provided that it contains no more than 1000 points.

The p.d.f. of the original variables is added to the top plot.

Once it starts, four aspects of this movie are controlled by the user.

  • There are buttons to increase (+) or decrease (-) the sample size, that is, the number of values for which a sample quantile is calculated.

  • Each time the button labelled "simulate another n_add samples of size n" is clicked, n_add new samples are simulated and their sample quantiles are added to the bottom histogram.

  • There is a button to switch the bottom plot from displaying a histogram of the simulated sample quantiles and the limiting normal p.d.f. to the empirical c.d.f. of the simulated data and the limiting normal c.d.f.

  • There is a checkbox to add to the bottom plot the approximate (large n) normal p.d.f./c.d.f. implied by the CLT for sample quantiles: the mean is equal to ξ(p) and the standard deviation is equal to √(pq) / (√n f(ξ(p))), where q = 1 - p.
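The normal approximation for a sample quantile can be checked numerically in base R. The sketch below assumes exponential(1) data, p = 0.5, n = 200 and 5000 simulated samples, all chosen purely for illustration; it uses the standard large-sample standard deviation √(p(1 - p)) / (√n f(ξ(p))) and does not call cltq().

```r
set.seed(1)
n <- 200         # sample size
p <- 0.5         # the median
n_sim <- 5000    # number of simulated samples (illustrative choice)

# One sample per column, then the type 7 sample quantile of each column
samples <- matrix(rexp(n * n_sim), nrow = n)
q_hat <- apply(samples, 2, quantile, probs = p, type = 7)

xi <- qexp(p)                                        # true 100p% quantile
clt_sd <- sqrt(p * (1 - p)) / (sqrt(n) * dexp(xi))   # large-n standard deviation

c(mean = mean(q_hat), xi = xi, sd = sd(q_hat), clt_sd = clt_sd)
```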

Value

Nothing is returned, only the animation is produced.

References

Lehmann, E. L. (1999) Elements of Large-Sample Theory, Springer-Verlag, London. doi:10.1007/b98855

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.

clt: Central Limit Theorem.

Examples

# Exponential data
cltq()

# Student's t data with 2 degrees of freedom
cltq(distn = "t", params = list(df = 2))

Univariate Continuous Distributions: p.d.f. and c.d.f.

Description

A movie to illustrate how the probability density function (p.d.f.) and cumulative distribution function (c.d.f.) of a continuous random variable depend on the values of its parameters.

Usage

continuous(
  distn,
  var_range = NULL,
  params = list(),
  param_step = list(),
  param_range = list(),
  p_vec = NULL,
  smallest = 0.01,
  plot_par = list(),
  panel_plot = TRUE,
  hscale = NA,
  vscale = hscale,
  ...
)

Arguments

distn

Either a character string or a function to choose the continuous random variable.

Strings "beta", "cauchy", "chisq", "chi-squared", "exponential", "f", "gamma", "gev", "gp", "lognormal", "log-normal", "normal", "t", "uniform" and "weibull" are recognised, case being ignored. The relevant distributional functions dxxx and pxxx in the stats-package are used. The abbreviations xxx are also recognised. The "gev" and "gp" cases use the gev and gp distributional functions in the revdbayes package. If distn = "gamma" then the (shape, rate) parameterisation is used, unless a value for scale is provided via the argument params, in which case the (shape, scale) parameterisation is used.

Valid functions are set up like a standard distributional function dxxx, with first argument x, last argument log and with arguments to set the parameters of the distribution in between. See the CRAN task view on distributions.

If distn is not supplied then distn = "normal" is used.
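As an illustration of the function layout described above, here is a hypothetical Laplace density written in the dxxx style: x first, the parameters in the middle and log last. The name dlaplace and its parameters are invented for this sketch. Whether continuous() also needs matching pxxx and qxxx functions for a user-supplied density is not stated here, so the final call is left commented out as an untested assumption.

```r
# Hypothetical Laplace density, laid out like a stats dxxx function
dlaplace <- function(x, location = 0, scale = 1, log = FALSE) {
  log_dens <- -abs(x - location) / scale - log(2 * scale)
  if (log) log_dens else exp(log_dens)
}

# Sanity check: the density should integrate to 1.
# The integral is split at the kink at x = location.
left <- integrate(dlaplace, lower = -Inf, upper = 1,
                  location = 1, scale = 2)$value
right <- integrate(dlaplace, lower = 1, upper = Inf,
                   location = 1, scale = 2)$value
total <- left + right
total

# Possible use (assumption, not verified here):
# continuous(distn = dlaplace, params = list(location = 0, scale = 1))
```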

var_range

A numeric vector of length 2. Can be used to set a fixed range of values over which to plot the p.d.f. and c.d.f., in order better to see the effects of changing the parameter values. If var_range is set then it overrides p_vec (see below).

params

A named list of initial parameter values with which to start the movie. If distn is a string and a particular parameter value is not supplied then the following values are used. "beta": shape1 = 2, shape2 = 2, ncp = 0; "cauchy": location = 0, scale = 1; "chi-squared": df = 4, ncp = 0; "exponential": rate = 1; "f": df1 = 4, df2 = 8, ncp = 0; "gamma": shape = 2, rate = 1; "gev": loc = 0, scale = 1, shape = 0.1; "gp": loc = 0, scale = 1, shape = 0.1; "lognormal": meanlog = 0, sdlog = 1; "normal": mean = 0, sd = 1; "t": df = 4, ncp = 0; "uniform": min = 0, max = 1; "weibull": shape = 2, scale = 1.

If distn is a function then params must set any required parameters.

If a parameter value is outside the corresponding range specified by param_range then it is set to the closest limit of the range.

param_step

A named list of the amounts by which the respective parameters in params are increased/decreased after one click of the +/- button. If distn is a function then the default is 0.1 for all parameters. If distn is a string then a sensible distribution-specific default is set internally.

param_range

A named list of the ranges over which the respective parameters in params are allowed to vary. Each element of the list should be a vector of length 2: the first element gives the lower limit of the range, the second element the upper limit. Use NA to impose no limit. If distn is a function then all parameters are unconstrained.
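The range convention (a length-2 vector, with NA meaning no limit) and the rule that an out-of-range value is moved to the closest limit can be sketched as follows. The helper clamp_to_range is hypothetical and written only for this illustration; it is not the package's internal code.

```r
# Hypothetical helper: clamp a value to c(lower, upper), NA meaning no limit
clamp_to_range <- function(value, range) {
  if (!is.na(range[1])) value <- max(value, range[1])
  if (!is.na(range[2])) value <- min(value, range[2])
  value
}

below <- clamp_to_range(-0.5, c(0, NA))  # below the lower limit: moved up to 0
inside <- clamp_to_range(3, c(0, NA))    # no upper limit: left unchanged
above <- clamp_to_range(1.2, c(0, 1))    # above the upper limit: moved down to 1
c(below = below, inside = inside, above = above)
```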

p_vec

A numeric vector of length 2. The p.d.f. and c.d.f. are plotted between the 100p_vec[1]% and 100p_vec[2]% quantiles of the distribution. If p_vec is not supplied then a sensible distribution-specific default is used. If distn is a function then the default is p_vec = c(0.001, 0.999).

smallest

A positive numeric scalar. The smallest value to be used for any strictly positive parameters when distn is a string.

plot_par

A named list of graphical parameters (see par) to be passed to plot. This may be used to alter the appearance of the plots of the p.d.f. and c.d.f.

panel_plot

A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale

Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

...

Additional arguments to be passed to rp.doublebutton, not including panel, variable, title, step, action, initval, range.

Details

The movie starts with a plot of the p.d.f. of the distribution for the initial values of the parameters. Buttons increase (+) or decrease (-) each parameter. There are radio buttons to switch the plot from the p.d.f. to the c.d.f. and back.

Value

Nothing is returned, only the animation is produced.

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.

Examples

# Normal example
continuous()
# Fix the range of values over which to plot
continuous(var_range = c(-10, 10))

# The same example, but using a user-supplied function and setting manually
# the initial parameters, parameter step size and range
continuous(distn = dnorm, params = list(mean = 0, sd = 1),
           param_step = list(mean = 1, sd = 1),
           param_range = list(sd = c(0, NA)))

# Gamma distribution. Show the use of var_range
continuous(distn = "gamma", var_range = c(0, 15))

Sampling distribution of the correlation coefficient movie

Description

A movie to illustrate how the sampling distribution of the product moment sample correlation coefficient r depends on the sample size n and on the true correlation ρ.

Usage

correlation(
  n = 30,
  rho = 0,
  panel_plot = TRUE,
  hscale = NA,
  vscale = hscale,
  delta_n = 1,
  delta_rho = 0.1,
  ...
)

Arguments

n

An integer scalar. The initial value of the sample size. Must not be less than 2.

rho

A numeric scalar. The initial value of the true correlation ρ. Must be in [-1, 1].

panel_plot

A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale

Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

delta_n

An integer scalar. The amount by which the value of the sample size is increased/decreased after one click of the +/- button.

delta_rho

A numeric scalar. The amount by which the value of rho is increased/decreased after one click of the +/- button.

...

Additional arguments to the rpanel functions rp.button and rp.doublebutton, not including panel, variable, title, step, action, initval, range.

Details

Random samples of size n are simulated from a bivariate normal distribution in which each of the variables has a mean of 0 and a variance of 1 and the correlation ρ between the variables is chosen by the user.

The movie contains two plots. On the top is a scatter plot of the simulated sample, illustrating the strength of the association between the simulated values of the variables. A new sample is produced by clicking "simulate another sample". For each simulated sample the sample (product moment) correlation coefficient r is calculated and displayed in the title of the top plot.

The values of the sample correlation coefficients are stored and are plotted in a histogram in the bottom plot. A rug displays the individual values, with the most recent value coloured red. As we accumulate a large number of values in this histogram the shape of the sampling distribution of r emerges. The exact p.d.f. of r is superimposed on this histogram, as is the value of ρ.

The bottom plot can be changed in two ways: (i) a radio button can be pressed to replace the histogram and p.d.f. with a plot of the empirical c.d.f. and exact c.d.f.; (ii) the variable can be changed from r to Fisher's z-transformation F(r) = arctanh(r) = [ln(1 + r) - ln(1 - r)]/2. For sufficiently large values of n, F(r) has approximately a normal distribution with mean F(ρ) and variance 1 / (n - 3).

The values of the sample size n or true correlation coefficient ρ can be changed using the respective +/- buttons. If one of these is changed then the bottom plot is reset using the sample correlation coefficient of the first sample simulated using the new combination of n and ρ.
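The sampling experiment behind this movie can be sketched in base R. The helper sim_r and the settings (n = 30, ρ = 0.8, 5000 replications) are assumptions made for illustration only; the sketch relies on the standard large-sample result that arctanh(r) is approximately normal with mean arctanh(ρ) and variance 1 / (n - 3), and it does not call correlation().

```r
set.seed(1)
n <- 30          # sample size
rho <- 0.8       # true correlation
n_sim <- 5000    # number of simulated samples (illustrative choice)

# Hypothetical helper: simulate one bivariate normal sample (means 0,
# variances 1, correlation rho) and return its sample correlation
sim_r <- function(n, rho) {
  z1 <- rnorm(n)
  z2 <- rnorm(n)
  x <- z1
  y <- rho * z1 + sqrt(1 - rho^2) * z2
  cor(x, y)
}

r <- replicate(n_sim, sim_r(n, rho))
z <- atanh(r)    # Fisher's z-transformation of each sample correlation

c(mean_z = mean(z), target_mean = atanh(rho),
  var_z = var(z), target_var = 1 / (n - 3))
```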

Value

Nothing is returned, only the animation is produced.

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.

Examples

correlation(rho = 0.8)
correlation(n = 10)

Univariate Discrete Distributions: p.m.f. and c.d.f.

Description

A movie to illustrate how the probability mass function (p.m.f.) and cumulative distribution function (c.d.f.) of a discrete random variable depend on the values of its parameters.

Usage

discrete(
  distn,
  var_support = NULL,
  params = list(),
  param_step = list(),
  param_range = list(),
  p_vec = NULL,
  smallest = 0.01,
  plot_par = list(),
  panel_plot = TRUE,
  hscale = NA,
  vscale = hscale,
  observed_value = NA,
  ...
)

Arguments

distn

Either a character string or a function to choose the discrete random variable.

Strings "binomial", "geometric", "hypergeometric", "negative binomial" and "poisson" are recognised, case being ignored. The relevant distributional functions dxxx and pxxx in the stats-package are used. The abbreviations xxx are also recognised. If distn = "negative binomial" then the (size, prob) parameterisation is used, unless a value for mu is provided via the argument params when the (size, mu) parameterisation is used.

Valid functions are set up like a standard distributional function dxxx, with first argument x, last argument log and with arguments to set the parameters of the distribution in between. See the CRAN task view on distributions. It is assumed that the support of the random variable is a subset of the integers, unless var_support is set to the contrary.

If distn is not supplied then distn = "binomial" is used.

var_support

A numeric vector. Can be used to set a fixed set of values for which to plot the p.m.f. and c.d.f., in order better to see the effects of changing the parameter values or to set a support that isn't a subset of the integers. If var_support is set then it overrides p_vec (see below).

params

A named list of initial parameter values with which to start the movie. If distn is a string and a particular parameter value is not supplied then the following values are used. "binomial": size = 10, prob = 0.5; "geometric": prob = 0.5; "hypergeometric": m = 10, n = 7, k = 8; "negative binomial": size = 10, prob = 0.5; "poisson": lambda = 5.

If distn is a function then params must set any required parameters.

If a parameter value is outside the corresponding range specified by param_range then it is set to the closest limit of the range.

param_step

A named list of the amounts by which the respective parameters in params are increased/decreased after one click of the +/- button. If distn is a function then the default is 0.1 for all parameters. If distn is a string then a sensible distribution-specific default is set internally.

param_range

A named list of the ranges over which the respective parameters in params are allowed to vary. Each element of the list should be a vector of length 2: the first element gives the lower limit of the range, the second element the upper limit. Use NA to impose no limit. If distn is a function then all parameters are unconstrained.

p_vec

A numeric vector of length 2. The p.m.f. and c.d.f. are plotted between the 100p_vec[1]% and 100p_vec[2]% quantiles of the distribution. If p_vec is not supplied then a sensible distribution-specific default is used. If distn is a function then the default is p_vec = c(0.001, 0.999).

smallest

A positive numeric scalar. The smallest value to be used for any strictly positive parameters when distn is a string.

plot_par

A named list of graphical parameters (see par) to be passed to plot. This may be used to alter the appearance of the plots of the p.m.f. and c.d.f.

panel_plot

A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale

Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

observed_value

A non-negative integer. If observed_value is supplied then the corresponding line in the plot of the p.m.f. is coloured in red.

...

Additional arguments to be passed to rp.doublebutton, not including panel, variable, title, step, action, initval, range.

Details

The movie starts with a plot of the p.m.f. of the distribution for the initial values of the parameters. Buttons increase (+) or decrease (-) each parameter. There are radio buttons to switch the plot from the p.m.f. to the c.d.f. and back.

If distn = "geometric" then there are radio buttons to switch between the version of the geometric distribution based on the number of trials up to and including the first success and the version based on the number of failures before the first success.
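The two versions of the geometric distribution differ only by a shift of one. A short check using stats::dgeom, which counts failures before the first success (prob = 0.5 is an arbitrary illustrative value):

```r
prob <- 0.5

# Number of failures before the first success: support 0, 1, 2, ...
failures <- 0:5
p_failures <- dgeom(failures, prob = prob)

# Number of trials up to and including the first success: support 1, 2, 3, ...
trials <- failures + 1
p_trials <- dgeom(trials - 1, prob = prob)

# The trials version written out explicitly
p_trials_explicit <- (1 - prob)^(trials - 1) * prob
same <- isTRUE(all.equal(p_trials, p_trials_explicit))
same
```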

Owing to a conflict with the argument size of the function rp.control the parameter size of, for example, the binomial and negative binomial distributions, is labelled as n.

Value

Nothing is returned, only the animation is produced.

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.

Examples

# Binomial example
discrete()

# The same example, but using a user-supplied function and setting manually
# the initial parameters, parameter step size and range
discrete(distn = dbinom, params = list(size = 10, prob = 0.5),
         param_step = list(size = 1),
         param_range = list(size = c(1, NA), prob = c(0, 1)))

# Poisson distribution. Show the use of var_support
discrete(distn = "poisson", var_support = 0:20)

Extremal Types Theorem (ETT)

Description

A movie to illustrate the extremal types theorem, that is, convergence of the distribution of the maximum of a random sample of size n from certain distributions to a member of the Generalized Extreme Value (GEV) family, as n tends to infinity. Samples of size n are simulated repeatedly from the chosen distribution. The distributions (simulated empirical and true) of the sample maxima are compared to the relevant GEV limit.

Usage

ett(
  n = 20,
  distn,
  params = list(),
  panel_plot = TRUE,
  hscale = NA,
  vscale = hscale,
  n_add = 1,
  delta_n = 1,
  arrow = TRUE,
  leg_cex = 1.25,
  ...
)

Arguments

n

An integer scalar. The size of the samples drawn from the distribution chosen using distn. n must be no smaller than 2.

distn

A character scalar specifying the distribution from which observations are sampled. Distributions "beta", "cauchy", "chisq", "chi-squared", "exponential", "f", "gamma", "gp", "lognormal", "log-normal", "ngev", "normal", "t", "uniform" and "weibull" are recognised, case being ignored.

If distn is not supplied then distn = "exponential" is used.

The "gp" case uses the gp distributional functions in the revdbayes package.

The "ngev" case is a negated GEV(1 / ξ, 1, ξ) distribution, for ξ > 0, and uses the gev distributional functions in the revdbayes package. If ξ = 1 then this coincides with Example 1.7.5 in Leadbetter, Lindgren and Rootzen (1983).

The other cases use the distributional functions in the stats-package. If distn = "gamma" then the (shape, rate) parameterisation is used. If scale is supplied via params then rate is inferred from this. If distn = "beta" then ncp is forced to be zero.

params

A named list of additional arguments to be passed to the density function associated with distribution distn. The (shape, rate) parameterisation is used for the gamma distribution (see GammaDist) even if the value of the scale parameter is set using params.

If a parameter value is not supplied then the default values in the relevant distributional function set using distn are used, except for "beta" (shape1 = 2, shape2 = 2), "chisq" (df = 4), "f" (df1 = 4, df2 = 8), "ngev" (shape = 0.2), "gamma" (shape = 2), "gp" (shape = 0.1), "t" (df = 4) and "weibull" (shape = 2).

panel_plot

A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale

Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

n_add

An integer scalar. The number of simulated datasets to add to each new frame of the movie.

delta_n

A numeric scalar. The amount by which n is increased (or decreased) after one click of the + (or -) button in the parameter window.

arrow

A logical scalar. Should an arrow be included to show the simulated sample maximum from the top plot being placed into the bottom plot?

leg_cex

The argument cex to legend. Allows the size of the legend to be controlled manually.

...

Additional arguments to the rpanel functions rp.button and rp.doublebutton, not including panel, variable, title, step, action, initval, range.

Details

Loosely speaking, a consequence of the Extremal Types Theorem is that, in many situations, the maximum of a large number n of independent random variables has approximately a GEV(μ, σ, ξ) distribution, where μ is a location parameter, σ is a scale parameter and ξ is a shape parameter. See Coles (2001) for an introductory account and Leadbetter et al. (1983) for greater detail and more examples. The Extremal Types Theorem is an asymptotic result that considers the possible limiting distribution of linearly normalised maxima as n tends to infinity. This movie considers examples where this limiting result holds and illustrates graphically the closeness of the limiting approximation provided by the relevant GEV limit to the true finite-n distribution.

Samples of size n are repeatedly simulated from the distribution chosen using distn. These samples are summarized using a histogram that appears at the top of the movie screen. For each sample the maximum of these n values is calculated, stored and added to another plot, situated below the first plot. This plot is either a histogram or an empirical c.d.f., chosen using a radio button. A rug is added to a histogram provided that it contains no more than 1000 points.

The probability density function (p.d.f.) of the original variables is superimposed on the top histogram. There is a checkbox to add to the bottom plot the exact p.d.f./c.d.f. of the sample maxima and an approximate (large n) GEV p.d.f./c.d.f. implied by the ETT. The GEV shape parameter ξ that applies in the limiting case is used. The GEV location μ and scale σ are set based on constants used to normalise the maxima to achieve the GEV limit. Specifically, μ is set at the 100(1 - 1/n)% quantile of the distribution distn and σ at (1 / n) / f(μ), where f is the density function of the distribution distn.
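The choice of μ and σ described above can be illustrated for the unit exponential distribution, where the limiting shape parameter is 0 (the Gumbel case). This is a base R sketch with n = 50 and an arbitrary evaluation grid; it writes the shape-0 GEV c.d.f. out by hand rather than using the revdbayes functions.

```r
n <- 50
mu <- qexp(1 - 1 / n)          # 100(1 - 1/n)% quantile, equal to log(n) here
sigma <- (1 / n) / dexp(mu)    # equal to 1 for the unit exponential

x <- seq(mu - 2, mu + 6, by = 0.5)
exact_cdf <- pexp(x)^n                     # exact c.d.f. of the sample maximum
gev_cdf <- exp(-exp(-(x - mu) / sigma))    # GEV c.d.f. with shape 0 (Gumbel)

max_abs_diff <- max(abs(exact_cdf - gev_cdf))
c(mu = mu, sigma = sigma, max_abs_diff = max_abs_diff)
```

Re-running the sketch with a larger n should reduce max_abs_diff, mirroring what the movie shows as n is increased.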

Once it starts, four aspects of this movie are controlled by the user.

  • There are buttons to increase (+) or decrease (-) the sample size, that is, the number of values over which a maximum is calculated.

  • Each time the button labelled "simulate another n_add samples of size n" is clicked n_add new samples are simulated and their sample maxima are added to the bottom histogram.

  • There is a button to switch the bottom plot from displaying a histogram of the simulated maxima, the exact p.d.f. and the limiting GEV p.d.f. to the empirical c.d.f. of the simulated data, the exact c.d.f. and the limiting GEV c.d.f.

  • There is a box that can be used to display only the bottom plot. This option is selected automatically if the sample size n exceeds 100000.

For further detail about the examples specified by distn see Chapter 1 of Leadbetter et al. (1983) and Chapter 3 of Coles (2001). In many of these examples ("exponential", "normal", "gamma", "lognormal", "chi-squared", "weibull", "ngev") the limiting GEV distribution has a shape parameter that is equal to 0. In the "uniform" case the limiting shape parameter is -1 and in the "beta" case it is -1 / shape2, where shape2 is the second parameter of the Beta distribution. In the other cases the limiting shape parameter is positive, with respective values shape ("gp", see gp), 1 / df ("t", see TDist), 1 ("cauchy", see Cauchy), 2 / df2 ("f", see FDist).
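
A case with a non-zero limiting shape parameter can be checked numerically in the same spirit. The sketch below assumes U(0,1) data, for which the limiting shape parameter is -1, and uses a hypothetical helper gev_cdf_sketch written for this illustration (it is not a function from smovie).

```r
# Hypothetical helper: GEV(mu, sigma, xi) c.d.f., treating xi = 0 as Gumbel
gev_cdf_sketch <- function(x, mu, sigma, xi) {
  z <- (x - mu) / sigma
  if (xi == 0) {
    return(exp(-exp(-z)))
  }
  # pmax() deals with values of x outside the support of the GEV distribution
  exp(-pmax(1 + xi * z, 0) ^ (-1 / xi))
}

n <- 100
mu <- qunif(1 - 1 / n)          # the 100(1 - 1/n)% quantile of U(0,1)
sigma <- (1 / n) / dunif(mu)    # (1/n) / f(mu)

x <- seq(0.9, 1, length.out = 201)
exact_cdf <- punif(x) ^ n                        # exact c.d.f. of the maximum
approx_cdf <- gev_cdf_sketch(x, mu, sigma, xi = -1)

max_abs_diff <- max(abs(exact_cdf - approx_cdf))
max_abs_diff
```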

Value

Nothing is returned, only the animation is produced.

References

Coles, S. G. (2001) An Introduction to Statistical Modeling of Extreme Values, Springer-Verlag, London. doi:10.1007/978-1-4471-3675-0_3

Leadbetter, M., Lindgren, G. and Rootzen, H. (1983) Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag, New York. doi:10.1007/978-1-4612-5449-2

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.

Examples

# Exponential data: xi = 0
ett()

# Uniform data: xi = -1
ett(distn = "uniform")

# Student t data: xi = 1 / df
ett(distn = "t", params = list(df = 5))

Fisher's transformation of the product moment correlation coefficient

Description

Density, distribution function, quantile function and random generator for the distribution of Fisher's transformation of the product moment correlation coefficient, based on a random sample from a bivariate normal distribution.

Usage

dFcorr(x, N, rho = 0, log = FALSE)

pFcorr(q, N, rho = 0, lower.tail = TRUE, log.p = FALSE)

qFcorr(p, N, rho = 0, lower.tail = TRUE, log.p = FALSE)

rFcorr(n, N, rho = 0, lower.tail = TRUE, log.p = FALSE)

Arguments

x, q

Numeric vectors of quantiles.

N

Numeric vector. Number of observations (N > 3).

rho

Numeric vector. Population correlations (-1 < rho < 1).

log, log.p

A logical scalar; if TRUE, probabilities p are given as log(p).

lower.tail

A logical scalar. If TRUE (default), probabilities are P[X <= x], otherwise, P[X > x].

p

A numeric vector of probabilities in [0,1].

n

Numeric scalar. The number of observations to be simulated. If length(n) > 1 then length(n) is taken to be the number required.

Details

These functions rely on the correlation coefficient functions in the SuppDists package. SuppDists must be installed in order for these functions to work.
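
For intuition about the distribution involved, the following base R sketch simulates Fisher's transformation z = atanh(r) of the sample correlation r and compares its mean and standard deviation with the familiar large-N normal approximation (mean atanh(rho), standard deviation 1 / sqrt(N - 3)). It does not use SuppDists and is not how dFcorr and related functions are computed. The helper sim_r and the settings below are assumptions made for illustration.

```r
set.seed(1)
N <- 30         # number of observations per sample
rho <- 0.6      # population correlation
n_sims <- 2000  # number of simulated samples

# Hypothetical helper: one sample correlation from a bivariate normal
# distribution with correlation rho
sim_r <- function(N, rho) {
  x <- rnorm(N)
  y <- rho * x + sqrt(1 - rho ^ 2) * rnorm(N)
  cor(x, y)
}

r <- replicate(n_sims, sim_r(N, rho))
z <- atanh(r)   # Fisher's transformation: 0.5 * log((1 + r) / (1 - r))

# Simulated moments alongside the large-N normal approximation
c(mean_z = mean(z), approx_mean = atanh(rho),
  sd_z = sd(z), approx_sd = 1 / sqrt(N - 3))
```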

References

Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples of an indefinitely large population. Biometrika, 10(4), 507-521.

Fisher, R. A. (1921). On the "probable error" of a coefficient of correlation deduced from a small sample. Metron, 1, 3-32. https://digital.library.adelaide.edu.au/dspace/bitstream/2440/15169/1/14.pdf

See Also

correlation coefficient in the SuppDists package for dpqr functions for the untransformed product moment correlation coefficient.

correlation: correlation sampling distribution movie.

Examples

got_SuppDists <- requireNamespace("SuppDists", quietly = TRUE)

if (got_SuppDists) {
  dFcorr(-1:1, N = 10)
  dFcorr(0, N = 11:20)

  pFcorr(0.5, N = 10)
  pFcorr(0.5, N = 10, rho = c(0, 0.3))

  qFcorr((1:9)/10, N = 10, rho = 0.2)
  qFcorr(0.5, N = c(10, 20), rho = c(0, 0.3))

  rFcorr(6, N = 10, rho = 0.6)
}

Leverage and influence in simple linear regression movie

Description

A movie to examine the influence of a single outlying observation on a least squares regression line.

Usage

lev_inf(
  association = c("positive", "negative", "none"),
  n = 25,
  panel_plot = TRUE,
  hscale = NA,
  vscale = hscale
)

Arguments

association

A character scalar. Determines the type of association between (non-outlying) observations: "positive" for positive linear association; "negative" for negative linear association; "none" for no association.

n

An integer scalar. The size of the sample of (non-outlying) observations.

panel_plot

A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale

Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

Details

n pairs of observations are simulated with the property that the mean of the response variable y is a linear function of the values of the explanatory variable x. These pairs of observations are plotted using filled black circles. An extra observation is plotted using a filled red circle. Initially this observation is placed in the middle of the plot.

Superimposed on the plot are two least squares regression lines: one based on all the data ('with observation') and one in which the 'red' observation has been removed ('without observation'). Initially these lines coincide.

The location of the ‘red’ observation can be changed using the +/- buttons so that the effect of the position of this observation on the ‘with observation’ line can be seen.

We see that if the red observation is outlying, that is, it is far from the least squares line fitted to the other observations, then its influence on the least squares regression line depends on its x-coordinate. If its x-coordinate is much larger or smaller than the x-coordinates of the other observations (high leverage) then the influence is higher than if it has a similar x-coordinate to the other observations (low leverage). An observation with high leverage does not necessarily have high influence: if its y-coordinate falls very close to the regression line fitted to the other observations then its influence will be low.
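
The same conclusions can be checked numerically with lm and hatvalues from base R. The sketch below is not the movie's code: the simulated data, the helper add_point and the chosen positions of the extra observation are assumptions made for illustration.

```r
set.seed(1)
n <- 25
x <- seq(0, 10, length.out = n)
y <- 1 + 2 * x + rnorm(n)
fit_without <- lm(y ~ x)    # the 'without observation' line

# Hypothetical helper: add one extra point at x0, placed `offset` above the
# 'without observation' line, then report its leverage and the change in slope
add_point <- function(x0, offset) {
  y0 <- sum(coef(fit_without) * c(1, x0)) + offset
  xx <- c(x, x0)
  yy <- c(y, y0)
  fit_with <- lm(yy ~ xx)   # the 'with observation' line
  c(leverage = unname(hatvalues(fit_with)[n + 1]),
    slope_change = unname(coef(fit_with)[2] - coef(fit_without)[2]))
}

low_lev <- add_point(x0 = 5, offset = 10)    # outlying, low leverage
high_lev <- add_point(x0 = 20, offset = 10)  # outlying, high leverage
on_line <- add_point(x0 = 20, offset = 0)    # high leverage, not outlying

rbind(low_lev, high_lev, on_line)
```

The third case places the extra observation exactly on the 'without observation' line, so the fitted line does not change even though the leverage is high.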

Value

Nothing is returned, only the animation is produced.

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.

Examples

# Positive association
lev_inf()

# No association
lev_inf(association = "none")

Sample mean vs sample median

Description

A movie to compare the sampling distributions of the sample mean and sample median based on a random sample of size n from either a standard normal distribution or a standard Student's t distribution. An interesting comparison is between the normal and Student t with 2 degrees of freedom cases (see Examples).

Usage

mean_vs_median(
  n = 10,
  t_df = NULL,
  panel_plot = TRUE,
  hscale = NA,
  vscale = hscale,
  n_add = 1,
  delta_n = 1,
  arrow = TRUE,
  leg_cex = 1.75,
  ...
)

Arguments

n

An integer scalar. The size of the samples drawn from a standard normal distribution or, if t_df is supplied, from a standard Student t distribution.

t_df

A positive scalar. The degrees of freedom df of a Student t distribution, as in TDist. If t_df is not supplied then data are simulated from a standard normal distribution.

panel_plot

A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale

Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

n_add

An integer scalar. The number of simulated datasets to add to each new frame of the movie.

delta_n

A numeric scalar. The amount by which n is increased (or decreased) after one click of the + (or -) button in the parameter window.

arrow

A logical scalar. Should arrows be included to show the most recently simulated sample mean and sample median from the top plot being placed into the plots below?

leg_cex

The argument cex to legend. Allows the size of the legend to be controlled manually.

...

Additional arguments to the rpanel functions rp.button and rp.doublebutton, not including panel, variable, title, step, action, initval, range.

Details

The movie is based on repeatedly simulating samples of size n from either a standard normal N(0,1) distribution or a standard Student t distribution. The latter is selected by supplying the degrees of freedom of this distribution, using t_df. The movie contains three plots. The top plot contains a histogram of the most recently simulated dataset, with the relevant probability density function (p.d.f.) superimposed. A rug is added to a histogram provided that it contains no more than 1000 points.

Each time a sample is simulated the sample mean and sample median are calculated. These values are indicated on the top plot using an arrow (if arrow = TRUE) or a vertical (rug) line on the horizontal axis (arrow = FALSE), coloured red for the sample mean and blue for the sample median. If arrow = TRUE then the arrows show the positions of the most recent mean and median in the two plots below. If arrow = FALSE then the rug lines are replicated in these plots.

The plot in the middle contains a histogram of the sample means of all the simulated samples. The plot on the bottom contains a histogram of the sample medians of all the simulated samples. A rug is added to these histograms provided that they contain no more than 1000 points.

Once it starts, three aspects of this movie are controlled by the user.

  • There are buttons to increase (+) or decrease (-) the sample size, that is, the number of values over which the sample mean and sample median are calculated.

  • Each time the button labelled "simulate another n_add samples of size n" is clicked n_add new samples are simulated and their sample means and sample medians are added to the middle and bottom histograms, respectively.

  • For the N(0,1) case only, there is a checkbox to add to the bottom plot the p.d.f.s of the distribution of the sample mean and the (approximate, large n) distribution of the sample median.
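
The comparison made by the movie in the N(0,1) case can be sketched numerically in base R. The variance of the sample mean is 1/n, and the usual large-n approximation to the variance of the sample median is pi / (2n). The settings below are assumptions chosen for illustration and the code is not taken from the package.

```r
set.seed(1)
n <- 101        # odd, so that the median is a single order statistic
n_sims <- 5000  # number of simulated samples

# One simulated N(0,1) sample of size n per row
samples <- matrix(rnorm(n * n_sims), nrow = n_sims)
means <- rowMeans(samples)
medians <- apply(samples, 1, median)

# Simulated variances alongside the theoretical (mean) and large-n
# approximate (median) values
var_ratio <- var(medians) / var(means)
c(var_mean = var(means), theory_mean = 1 / n,
  var_median = var(medians), approx_median = pi / (2 * n),
  ratio = var_ratio)
```

Replacing rnorm(n * n_sims) with rt(n * n_sims, df = 2) gives the Student t comparison, where the sample mean behaves much less stably than the sample median.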

Value

Nothing is returned, only the animation is produced.

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.

Examples

# Sampling from a standard normal distribution
mean_vs_median()

# Sampling from a standard t(2) distribution
mean_vs_median(t_df = 2)

Main menu for smovie movies

Description

Uses the template rp.cartoons function to produce a menu panel from which any of the movies in the smovie package can be launched. For greater control of an individual example call the relevant function directly.

Usage

movies(fixed_range = TRUE, hscale = NA, vscale = hscale)

Arguments

fixed_range

A logical scalar. Only relevant to the Discrete and Continuous menus. If TRUE then in the call to discrete or continuous the argument var_support (discrete) or var_range (continuous) is set so that the values on the horizontal axes are fixed at values that enable the movie to show the effects of changing the parameters of the distribution, at least locally to the default initial values for the parameters. For greater control call discrete or continuous directly.

hscale, vscale

Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

See Also

discrete, continuous, clt, cltq, ett, mean_vs_median, correlation, lev_inf, wws, shypo.

smovie: general information about smovie.

Examples

movies()

Testing simple hypotheses

Description

A movie to illustrate statistical concepts involved in the testing of one simple hypothesis against another. The example used is a random sample from a normal distribution whose variance is assumed to be known. The simple hypotheses relate to the value of the mean μ.

Usage

shypo(
  mu0 = 0,
  sd = 6,
  eff = sd,
  n = 10,
  a = mu0 + eff/2,
  target_alpha = 0.05,
  target_beta = 0.1,
  panel_plot = TRUE,
  hscale = NA,
  vscale = hscale,
  delta_n = 1,
  delta_a = sd/(10 * sqrt(n)),
  delta_eff = sd,
  delta_mu0 = 1,
  delta_sd = 1
)

Arguments

mu0

A numeric scalar. The value of μ under the null hypothesis H0 with which to start the movie.

sd

A positive numeric scalar. The (common) standard deviation σ of the normal distributions of the data under the two hypotheses.

eff

A numeric scalar. The effect size. The amount by which the value of μ under the alternative hypothesis is greater than the value mu0 under the null hypothesis. That is, mu1 = eff + mu0. eff must be non-negative.

n

A positive integer scalar. The sample size with which to start the movie.

a

A numeric scalar. The critical value of the test with which to start the movie. H0 is rejected if the sample mean is greater than a.

target_alpha

A numeric scalar in (0,1). The target value of the type I error probability to be achieved by setting a and/or n if the user asks for this using a radio button.

target_beta

A numeric scalar in (0,1). The target value of the type II error probability to be achieved by setting a and/or n if the user asks for this using a radio button.

panel_plot

A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale

Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

delta_mu0, delta_eff, delta_a, delta_n, delta_sd

Numeric scalars. The respective amounts by which the values of mu0, eff, a, n and sd are increased (or decreased) after one click of the + (or -) button in the parameter window.

Details

The movie is based on two plots.

The top plot shows the (normal) probability density functions of the sample mean under the null hypothesis H0 (mean mu0) and the alternative hypothesis H1 (mean mu1, where mu1 > mu0), with the values of mu0 and mu1 indicated by vertical dashed lines. H0 is rejected if the sample mean exceeds the critical value a, which is indicated by a vertical black line.

The bottom plot shows how the probabilities of making a type I or type II error (alpha and beta respectively) depend on the value of a, by plotting these probabilities against a.

A parameter window enables the user to change the values of n, a, mu0, eff = mu1 - mu0 or sd by clicking the +/- buttons.

Radio buttons can be used either to:

  • set a to achieve the target type I error probability target_alpha, based on the current value of n;

  • set a and (integer) n to achieve (or better) the respective target type I and type II error probabilities of target_alpha and target_beta.

If eff = 0 then a plot will be produced even though this case is not practically meaningful. In the "set a and n to achieve target alpha and beta" case, the plot will be the same as in the "set a and n by hand" case.
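
The quantities behind the plots and radio buttons can be sketched with pnorm and qnorm. The code below is an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, sigma plays the role of the argument sd, and in the second calculation a is assumed to be set so that alpha equals target_alpha exactly.

```r
mu0 <- 0
sigma <- 6            # plays the role of the argument sd
eff <- 5
mu1 <- mu0 + eff
target_alpha <- 0.05
target_beta <- 0.1

# Type I and type II error probabilities for critical value a and sample size n
alpha_fn <- function(a, n) {
  pnorm(a, mean = mu0, sd = sigma / sqrt(n), lower.tail = FALSE)
}
beta_fn <- function(a, n) {
  pnorm(a, mean = mu1, sd = sigma / sqrt(n))
}

# (1) Critical value that achieves target_alpha for a fixed n
a_for_alpha <- function(n) {
  mu0 + qnorm(1 - target_alpha) * sigma / sqrt(n)
}

# (2) Smallest integer n for which alpha = target_alpha and beta <= target_beta
z_sum <- qnorm(1 - target_alpha) + qnorm(1 - target_beta)
n_req <- ceiling((z_sum * sigma / eff) ^ 2)
a_req <- a_for_alpha(n_req)

c(n = n_req, a = a_req,
  alpha = alpha_fn(a_req, n_req), beta = beta_fn(a_req, n_req))
```

Plotting alpha_fn(a, n) and beta_fn(a, n) against a grid of values of a, for fixed n, gives curves of the kind shown in the bottom plot.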

Value

Nothing is returned, only the animation is produced.

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.

Examples

# 1. Change a (for fixed n) to achieve alpha = 0.05
# 2. Change a and n to achieve alpha <= 0.05 and beta <= 0.1
shypo(mu0 = 0, eff = 5, n = 16, a = 2.3, delta_a = 0.01)

Wald, Wilks and Score tests

Description

A movie to illustrate the nature of the Wald, Wilks and score likelihood-based test statistics, for a model with a scalar unknown parameter θ. The user can change the value of the parameter under a simple null hypothesis and observe the effect on the test statistics and (approximate) p-values associated with the tests of this hypothesis against the general alternative. The user can specify their own log-likelihood or use one of two in-built examples.

Usage

wws(
  model = c("norm", "binom"),
  theta_range = NULL,
  ...,
  mult = 3,
  theta0 = if (!is.null(theta_range)) sum(c(0.25, 0.75) * theta_range) else NULL,
  panel_plot = TRUE,
  hscale = NA,
  vscale = hscale,
  delta_theta0 = if (!is.null(theta_range)) abs(diff(theta_range))/20 else NULL,
  theta_mle = NULL,
  loglik = NULL,
  alg_score = NULL,
  alg_obs_info = NULL,
  digits = 3
)

Arguments

model

A character scalar. Name of the distribution on which one of the two in-built examples is based.

If model = "norm" then the setting is a random sample of size n from a normal distribution with unknown mean mu = θ and known standard deviation sigma.

If model = "binom" then the setting is a random sample from a Bernoulli distribution with unknown success probability θ.

The behaviour of these examples can be controlled using arguments supplied via .... In particular, the data can be supplied using data. If model = "norm" then n, mu, and sigma can also be chosen. The default cases for these examples are:

  • model = "norm": n = 10, mu = 0, sigma = 1 and data contains a sample of size n simulated, using Normal, from a normal distribution with mean mu and standard deviation sigma.

  • model = "binom": data = c(7, 13), that is, 7 successes and 13 failures observed in 20 trials. For the purposes of this movie there must be at least one success and at least one failure.

theta_range

A numeric vector of length 2. The range of values of θ over which to plot the log-likelihood. If theta_range is not supplied then the argument mult is used to set the range automatically.

...

Additional arguments to be passed to loglik, alg_score and alg_obs_info if loglik is supplied, or to functions relating to the in-built examples otherwise. See the description of model above for details.

mult

A positive numeric scalar. If theta_range is not supplied then an interval of width 2 x mult standard errors centred on theta_mle is used. If model = "binom" then theta_range is truncated to (0,1) if necessary.

theta0

A numeric scalar. The value of θ under the null hypothesis to use at the start of the movie.

panel_plot

A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale

Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

delta_theta0

A numeric scalar. The amount by which the value of theta0 is increased (or decreased) after one click of the + (or -) button in the parameter window.

theta_mle

A numeric scalar. The user may use this to supply the value of the maximum likelihood estimate (MLE) of θ. Otherwise, optim is used to search for the MLE, using theta0 as the initial value and theta_range as bounds within which to search.

loglik

An R function, vectorised with respect to its first argument, that returns the value of the log-likelihood (up to an additive constant). The movie will not work if the observed information is not finite at the maximum likelihood estimate.

alg_score

An R function that returns the score function, that is, the derivative of loglik with respect to θ.

alg_obs_info

An R function that returns the observed information, that is, the negated second derivative of loglik with respect to θ.

digits

An integer indicating the number of significant digits to be used in the displayed values of the test statistics and p-values. See signif.

Details

The Wald, Wilks (or likelihood ratio) and Score tests are asymptotically equivalent tests of a simple hypothesis that a parameter of interest θ is equal to a particular value θ0. The test statistics are all based on the log-likelihood l(θ) for θ but they differ in the way that they measure the distance between the maximum likelihood estimate (MLE) of θ and θ0. The Wilks statistic is based on the amount by which the log-likelihood evaluated at θ0 is smaller than the log-likelihood evaluated at the MLE. The Wald statistic is based on the absolute difference between the MLE and θ0. The score test is based on the gradient of the log-likelihood (the score function) at θ0. For details see Azzalini (1996).

This movie illustrates the differences between the test statistics for simple models with a single scalar parameter. In the (default) normal example the three test statistics coincide. This is not true in general, as shown by the other in-built example (model = "binom").
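
To make the three statistics concrete, the following sketch computes them for binomial data with 7 successes and 13 failures and θ0 = 0.5. It uses common textbook forms, which are assumptions here and may differ in detail from what the movie displays: the Wilks statistic is twice the drop in log-likelihood, the Wald statistic uses the observed information at the MLE, and the score statistic uses the observed information at θ0. Each statistic is referred to a chi-squared distribution with 1 degree of freedom.

```r
n_success <- 7
n_failure <- 13
theta0 <- 0.5
theta_mle <- n_success / (n_success + n_failure)

# Log-likelihood (up to an additive constant), score and observed information
loglik <- function(p) n_success * log(p) + n_failure * log(1 - p)
score <- function(p) n_success / p - n_failure / (1 - p)
obs_info <- function(p) n_success / p ^ 2 + n_failure / (1 - p) ^ 2

wilks <- 2 * (loglik(theta_mle) - loglik(theta0))        # likelihood ratio
wald <- (theta_mle - theta0) ^ 2 * obs_info(theta_mle)   # distance in theta
score_stat <- score(theta0) ^ 2 / obs_info(theta0)       # gradient at theta0

stats <- c(wald = wald, wilks = wilks, score = score_stat)
p_values <- pchisq(stats, df = 1, lower.tail = FALSE)
rbind(statistic = stats, p_value = p_values)
```

For these data the three statistics are similar but not equal, in contrast to the normal example with known standard deviation, where the log-likelihood is quadratic in θ.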

A user-supplied log-likelihood can be provided via loglik.
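
As an illustration of what such a function might look like, the sketch below defines a Poisson log-likelihood that is vectorised with respect to its first argument and checks that the observed information is finite at the MLE. The data y and the function pois_loglik are hypothetical, and optimize is used here only as a convenient way to check the MLE numerically.

```r
y <- c(2, 0, 3, 1, 4, 2, 1, 3)   # hypothetical count data

# Poisson log-likelihood (up to an additive constant), vectorised in lambda
pois_loglik <- function(lambda, y) {
  sum(y) * log(lambda) - length(y) * lambda
}

# Check vectorisation: one log-likelihood value per value of lambda
ll_values <- pois_loglik(c(1, 2, 3), y = y)

# Numerical MLE over a search range, for comparison with the sample mean
theta_range <- c(0.5, 5)
mle <- optimize(pois_loglik, interval = theta_range, y = y,
                maximum = TRUE, tol = 1e-8)$maximum

# The observed information at the MLE should be finite
obs_info_mle <- sum(y) / mle ^ 2

c(mle = mle, sample_mean = mean(y), obs_info = obs_info_mle)

# With smovie installed, a call of the following form could then be tried:
# wws(loglik = pois_loglik, theta0 = 1.5, theta_range = theta_range, y = y)
```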

Value

Nothing is returned, only the animation is produced.

References

Azzalini, A. (1996) Statistical Inference Based on the Likelihood, Chapman & Hall / CRC, London.

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.

Examples

# N(theta, 1) example, test statistics equivalent
wws(theta0 = 0.8)

# binomial(20, theta) example, test statistics similar
wws(theta0 = 0.5, model = "binom")

# binomial(20, theta) example, test statistics rather different
# for theta0 distant from theta_mle
wws(theta0 = 0.9, model = "binom", data = c(19, 1), theta_range = c(0.1, 0.99))

# binomial(2000, theta) example, test statistics very similar
wws(theta0 = 0.5, model = "binom", data = c(1000, 1000))

# N(theta, 1) example with user-supplied data
set.seed(47)
x <- rnorm(10)
wws(theta0 = 0.2, model = "norm", theta_range = c(-1, 1), data = x)

# Log-likelihood for a binomial experiment (up to an additive constant)
bin_loglik <- function(p, n_success, n_failure) {
  return(n_success * log(p) + n_failure * log(1 - p))
}

wws(loglik = bin_loglik, theta0 = 0.5, theta_range = c(0.1, 0.7),
    theta_mle = 7 / 20, n_success = 7, n_failure = 13)

bin_alg_score <- function(p, n_success, n_failure) {
  return(n_success / p - n_failure / (1 - p))
}
bin_alg_obs_info <- function(p, n_success, n_failure) {
  return(n_success / p ^ 2 + n_failure / (1 - p) ^ 2)
}
wws(loglik = bin_loglik, theta0 = 0.5, theta_range = c(0.1, 0.7),
    theta_mle = 7 / 20, n_success = 7, n_failure = 13,
    alg_score = bin_alg_score, alg_obs_info = bin_alg_obs_info)