Title: | Some Movies to Illustrate Concepts in Statistics |
---|---|
Description: | Provides movies to help students to understand statistical concepts. The 'rpanel' package <https://cran.r-project.org/package=rpanel> is used to create interactive plots that move to illustrate key statistical ideas and methods. There are movies to: visualise probability distributions (including user-supplied ones); illustrate sampling distributions of the sample mean (central limit theorem), the median, the sample maximum (extremal types theorem) and (the Fisher transformation of the) product moment correlation coefficient; examine the influence of an individual observation in simple linear regression; illustrate key concepts in statistical hypothesis testing. Also provided are dpqr functions for the distribution of the Fisher transformation of the correlation coefficient under sampling from a bivariate normal distribution. |
Authors: | Paul J. Northrop [aut, cre, cph] |
Maintainer: | Paul J. Northrop <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1.6 |
Built: | 2025-01-27 05:26:12 UTC |
Source: | https://github.com/paulnorthrop/smovie |
These movies are animations used to illustrate key statistical ideas. They are produced using the rpanel package.
When one of these functions is called, R opens a small parameter window containing clickable buttons that can be used to change parameters underlying the plot. For the effects of these buttons see the documentation of the individual functions.
See vignette("smovie-vignette", package = "smovie") for an overview of the package and the user-friendly menu panel.
There are movies on the following topics.
Central Limit Theorem: sampling distribution of a sample mean
Central Limit Theorem for sample quantiles: sampling distribution of the 100p% sample quantile
Extremal Types Theorem: sampling distribution of a sample maximum
Maintainer: Paul J. Northrop [email protected] [copyright holder]
Bowman, A., Crawford, E., Alexander, G. and Bowman, R. W. (2007). rpanel: Simple Interactive Controls for R Functions Using the tcltk Package. Journal of Statistical Software, 17(9), 1-18. doi:10.18637/jss.v017.i09.
Useful links:
Report bugs at https://github.com/paulnorthrop/smovie/issues
A movie to illustrate the ideas of the sampling distribution of a mean and the central limit theorem.
clt( n = 20, distn, params = list(), panel_plot = TRUE, hscale = NA, vscale = hscale, n_add = 1, delta_n = 1, arrow = TRUE, leg_cex = 1.25, ... )
n |
An integer scalar. The size of the samples drawn from the distribution chosen using distn. |
distn |
A character scalar specifying the distribution from which
observations are sampled. Distributions If The The other cases use the distributional functions in the
|
params |
A named list of additional arguments to be passed to the density function associated with distribution distn. If a parameter value is not supplied then the default values in the relevant distributional function set using distn are used. |
panel_plot |
A logical parameter that determines whether the plot
is placed inside the panel ( |
hscale , vscale
|
Numeric scalars. Scaling parameters for the size
of the plot when |
n_add |
An integer scalar. The number of simulated datasets to add to each new frame of the movie. |
delta_n |
A numeric scalar. The amount by which n is increased (or decreased) after one click of the + (or -) button in the parameter window. |
arrow |
A logical scalar. Should an arrow be included to show the simulated sample mean from the top plot being placed into the bottom plot? |
leg_cex |
The argument |
... |
Additional arguments to the rpanel functions
|
Loosely speaking, a consequence of the Central Limit Theorem is that the mean of a large number of independent and identically distributed random variables, each with mean $\mu$ and finite standard deviation $\sigma$, has approximately a normal distribution, even if these original variables are not normally distributed.
This movie considers examples where this limiting result holds and illustrates graphically the closeness of the limiting approximation provided by the relevant normal limit to the true finite-$n$ distribution. Of course, when distn = "normal" this result is exact.
Samples of size n are repeatedly simulated from the distribution chosen using distn. These samples are summarized using a plot that appears at the top of the movie screen. For each sample the mean of these n values is calculated, stored and added to another plot, situated below the first plot. This plot is either a histogram or an empirical c.d.f., chosen using a radio button. A rug is added to a histogram provided that it contains no more than 1000 points.
The p.d.f. (for a continuous variable) or p.m.f. (for a discrete variable) of the original variables is added to the top plot.
Once it starts, four aspects of this movie are controlled by the user.
There are buttons to increase (+) or decrease (-) the sample size, that is, the number of values over which a mean is calculated.
Each time the button labelled "simulate another n_add samples of size n" is clicked, n_add new samples are simulated and their sample means are added to the bottom plot.
There is a button to switch the bottom plot from displaying a histogram of the simulated means and the limiting normal p.d.f. to the empirical c.d.f. of the simulated data and the limiting normal c.d.f.
There is a checkbox to add to the bottom plot the approximate (large n) normal p.d.f./c.d.f. (with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$), implied by the CLT.
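The comparison that this movie makes can also be sketched without the interactive panel. The following is a minimal base R sketch, not the code used by clt(): it assumes exponential data with rate 1, so that $\mu = \sigma = 1$, and the names n_sim, sample_means and clt_sd are chosen here purely for illustration.

```r
# Sketch only: simulate sample means and compare with the CLT normal limit.
set.seed(1)
n <- 20           # sample size, as in the default clt(n = 20)
n_sim <- 10000    # number of simulated samples (illustrative choice)
rate <- 1         # assumed exponential rate
mu <- 1 / rate    # mean of the exponential distribution
sigma <- 1 / rate # standard deviation of the exponential distribution

# One sample mean per simulated sample of size n
sample_means <- replicate(n_sim, mean(rexp(n, rate = rate)))

# Standard deviation of the sample mean implied by the CLT
clt_sd <- sigma / sqrt(n)
print(c(sim_mean = mean(sample_means), clt_mean = mu,
        sim_sd = sd(sample_means), clt_sd = clt_sd))

# Optional plot, analogous to the bottom plot of the movie
if (interactive()) {
  hist(sample_means, probability = TRUE, breaks = 40, main = "")
  curve(dnorm(x, mean = mu, sd = clt_sd), add = TRUE, col = "red")
}
```

For n = 20 the simulated means are still visibly right-skewed relative to the normal curve; increasing n in this sketch plays the role of the + button in the movie.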
Nothing is returned, only the animation is produced.
movies: a user-friendly menu panel.
smovie: general information about smovie.
cltq: Central Limit Theorem for sample quantiles.
# Exponential data
clt()
# Uniform data
clt(distn = "uniform")
# Poisson data
clt(distn = "poisson")
A movie to illustrate the ideas of the sampling distribution of the 100p% sample quantile and the central limit theorem for sample quantiles.
cltq( n = 20, p = 0.5, distn, params = list(), type = 7, panel_plot = TRUE, hscale = NA, vscale = hscale, n_add = 1, delta_n = 1, arrow = TRUE, leg_cex = 1.25, ... )
n |
An integer scalar. The size of the samples drawn from the distribution chosen using distn. |
p |
A numeric scalar in (0, 1). The value of p. |
distn |
A character scalar specifying the (continuous) distribution
from which observations are sampled. Distributions If The The other cases use the distributional functions in the
|
params |
A named list of additional arguments to be passed to the density function associated with distribution distn. If a parameter value is not supplied then the default values in the relevant distributional function set using distn are used. |
type |
An integer between 1 and 9. The value of the argument type to be passed to quantile when calculating the sample quantile. |
panel_plot |
A logical parameter that determines whether the plot
is placed inside the panel ( |
hscale , vscale
|
Numeric scalars. Scaling parameters for the size
of the plot when |
n_add |
An integer scalar. The number of simulated datasets to add to each new frame of the movie. |
delta_n |
A numeric scalar. The amount by which n is increased (or decreased) after one click of the + (or -) button in the parameter window. |
arrow |
A logical scalar. Should an arrow be included to show the simulated sample quantile from the top plot being placed into the bottom plot? |
leg_cex |
The argument |
... |
Additional arguments to the rpanel functions
|
Loosely speaking, a consequence of the CLT for sample quantiles is that the 100p% sample quantile of a large number of identically distributed random variables, each with probability density function $f$ and 100p% quantile $\xi(p)$, has approximately a normal distribution. See, for example, Lehmann (1999) for a precise statement and conditions.
This movie considers examples where this limiting result holds and illustrates graphically the closeness of the limiting approximation provided by the relevant normal limit to the true finite-$n$ distribution.
Samples of size n are repeatedly simulated from the distribution chosen using distn. These samples are summarized using a plot that appears at the top of the movie screen. For each sample the 100p% sample quantile of these n values is calculated, stored and added to another plot, situated below the first plot. This plot is either a histogram or an empirical c.d.f., chosen using a radio button. A rug is added to a histogram provided that it contains no more than 1000 points.
The p.d.f. of the original variables is added to the top plot.
Once it starts, four aspects of this movie are controlled by the user.
There are buttons to increase (+) or decrease (-) the sample size, that is, the number of values for which a sample quantile is calculated.
Each time the button labelled "simulate another n_add samples of size n" is clicked, n_add new samples are simulated and their sample quantiles are added to the bottom plot.
There is a button to switch the bottom plot from displaying a histogram of the simulated sample quantiles and the limiting normal p.d.f. to the empirical c.d.f. of the simulated data and the limiting normal c.d.f.
There is a checkbox to add to the bottom plot the approximate (large $n$) normal p.d.f./c.d.f. implied by the CLT for sample quantiles: the mean is equal to $\xi(p)$ and the standard deviation is equal to $\sqrt{p q} / (\sqrt{n} \, f(\xi(p)))$, where $q = 1 - p$.
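The normal limit for a sample quantile can be checked numerically in the same spirit. This is a hedged base R sketch, not the implementation in cltq(): it assumes exponential data with rate 1 and the median (p = 0.5), and the object names (sample_q, xi_p, clt_sd) are illustrative only.

```r
# Sketch only: sampling distribution of the sample median of exponential data.
set.seed(1)
n <- 100        # sample size (illustrative choice)
p <- 0.5        # the 100p% quantile of interest
n_sim <- 10000  # number of simulated samples (illustrative choice)

# One sample quantile per simulated sample, using quantile()'s default type 7
sample_q <- replicate(
  n_sim,
  quantile(rexp(n), probs = p, type = 7, names = FALSE)
)

# Mean and standard deviation of the limiting normal distribution
xi_p <- qexp(p)                                   # true 100p% quantile
q <- 1 - p
clt_sd <- sqrt(p * q) / (sqrt(n) * dexp(xi_p))    # sqrt(pq) / (sqrt(n) f(xi(p)))

print(c(sim_mean = mean(sample_q), clt_mean = xi_p,
        sim_sd = sd(sample_q), clt_sd = clt_sd))
```

The limiting standard deviation depends on the density at the quantile: where $f(\xi(p))$ is small, the sample quantile is more variable.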
Nothing is returned, only the animation is produced.
Lehmann, E. L. (1999) Elements of Large-Sample Theory, Springer-Verlag, London. doi:10.1007/b98855
movies: a user-friendly menu panel.
smovie: general information about smovie.
clt: Central Limit Theorem.
# Exponential data
cltq()
# Student t data, with 2 degrees of freedom
cltq(distn = "t", params = list(df = 2))
A movie to illustrate how the probability density function (p.d.f.) and cumulative distribution function (c.d.f.) of a continuous random variable depend on the values of its parameters.
continuous( distn, var_range = NULL, params = list(), param_step = list(), param_range = list(), p_vec = NULL, smallest = 0.01, plot_par = list(), panel_plot = TRUE, hscale = NA, vscale = hscale, ... )
distn |
Either a character string or a function to choose the continuous random variable. Strings Valid functions are set up like a standard distributional function
If |
var_range |
A numeric vector of length 2. Can be used to set a fixed
range of values over which to plot the p.d.f. and c.d.f., in order better
to see the effects of changing the parameter values.
If |
params |
A named list of initial parameter values with which to start
the movie. If If If parameter value is outside the corresponding range specified by
|
param_step |
A named list of the amounts by which the respective
parameters in |
param_range |
A named list of the ranges over which the respective
parameters in |
p_vec |
A numeric vector of length 2. The p.d.f. and c.d.f. are
plotted between the 100 |
smallest |
A positive numeric scalar. The smallest value to be
used for any strictly positive parameters when |
plot_par |
A named list of graphical parameters
(see |
panel_plot |
A logical parameter that determines whether the plot
is placed inside the panel ( |
hscale , vscale
|
Numeric scalars. Scaling parameters for the size
of the plot when |
... |
Additional arguments to be passed to
|
The movie starts with a plot of the p.d.f. of the distribution for the initial values of the parameters. Buttons increase (+) or decrease (-) each parameter. There are radio buttons to switch the plot from the p.d.f. to the c.d.f. and back.
Nothing is returned, only the animation is produced.
movies: a user-friendly menu panel.
smovie: general information about smovie.
# Normal example
continuous()
# Fix the range of values over which to plot
continuous(var_range = c(-10, 10))
# The same example, but using a user-supplied function and setting manually
# the initial parameters, parameter step size and range
continuous(distn = dnorm, params = list(mean = 0, sd = 1),
           param_step = list(mean = 1, sd = 1),
           param_range = list(sd = c(0, NA)))
# Gamma distribution. Show the use of var_range
continuous(distn = "gamma", var_range = c(0, 15))
A movie to illustrate how the sampling distribution of the product moment sample correlation coefficient depends on the sample size $n$ and on the true correlation $\rho$.
correlation( n = 30, rho = 0, panel_plot = TRUE, hscale = NA, vscale = hscale, delta_n = 1, delta_rho = 0.1, ... )
n |
An integer scalar. The initial value of the sample size. Must not be less than 2. |
rho |
A numeric scalar. The initial value of the true correlation $\rho$. |
panel_plot |
A logical parameter that determines whether the plot
is placed inside the panel ( |
hscale , vscale
|
Numeric scalars. Scaling parameters for the size
of the plot when |
delta_n |
An integer scalar. The amount by which the value of the sample size is increased/decreased after one click of the +/- button. |
delta_rho |
A numeric scalar. The amount by which the value of rho is increased/decreased after one click of the +/- button. |
... |
Additional arguments to the rpanel functions
|
Random samples of size $n$ are simulated from a bivariate normal distribution in which each of the variables has a mean of 0 and a variance of 1 and the correlation $\rho$ between the variables is chosen by the user.
The movie contains two plots. On the top is a scatter plot of the simulated sample, illustrating the strength of the association between the simulated values of the variables. A new sample is produced by clicking "simulate another sample". For each simulated sample the sample (product moment) correlation coefficient $r$ is calculated and displayed in the title of the top plot.
The values of the sample correlation coefficients are stored and are plotted in a histogram in the bottom plot. A rug displays the individual values, with the most recent value coloured red. As we accumulate a large number of values in this histogram the shape of the sampling distribution of $r$ emerges. The exact p.d.f. of $r$ is superimposed on this histogram, as is the value of $\rho$.
The bottom plot can be changed in two ways: (i) a radio button can be pressed to replace the histogram and p.d.f. with a plot of the empirical c.d.f. and exact c.d.f.; (ii) the variable can be changed from $r$ to Fisher's z-transformation $F(r) = [\ln(1 + r) - \ln(1 - r)] / 2$. For sufficiently large values of $n$, $F(r)$ has approximately a normal distribution with mean $F(\rho)$ and variance $1 / (n - 3)$.
The values of the sample size $n$ or true correlation coefficient $\rho$ can be changed using the respective +/- buttons. If one of these is changed then the bottom plot is reset using the sample correlation coefficient of the first sample simulated using the new combination of $n$ and $\rho$.
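The normal approximation for Fisher's z-transformation can be illustrated with a short simulation. This is a sketch under stated assumptions rather than the code behind correlation(): the helper sim_r is hypothetical, and the bivariate normal pairs are built from two independent standard normal vectors instead of a dedicated multivariate generator.

```r
# Sketch only: sampling distribution of Fisher's z-transformation of r.
set.seed(1)
n <- 30        # sample size, as in the default correlation(n = 30)
rho <- 0.6     # true correlation (illustrative choice)
n_sim <- 5000  # number of simulated samples (illustrative choice)

# Hypothetical helper: one sample correlation from a standard bivariate normal
sim_r <- function(n, rho) {
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)  # cor(x, y) = rho, var(y) = 1
  cor(x, y)
}

r <- replicate(n_sim, sim_r(n, rho))
z <- atanh(r)  # Fisher's z-transformation, [log(1 + r) - log(1 - r)] / 2

print(c(mean_z = mean(z), approx_mean = atanh(rho),
        var_z = var(z), approx_var = 1 / (n - 3)))
```

On the $r$ scale the distribution is skewed when $\rho$ is far from 0; on the transformed scale it is much closer to symmetric, which is what option (ii) of the movie displays.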
Nothing is returned, only the animation is produced.
movies: a user-friendly menu panel.
smovie: general information about smovie.
correlation(rho = 0.8)
correlation(n = 10)
A movie to illustrate how the probability mass function (p.m.f.) and cumulative distribution function (c.d.f.) of a discrete random variable depend on the values of its parameters.
discrete( distn, var_support = NULL, params = list(), param_step = list(), param_range = list(), p_vec = NULL, smallest = 0.01, plot_par = list(), panel_plot = TRUE, hscale = NA, vscale = hscale, observed_value = NA, ... )
distn |
Either a character string or a function to choose the discrete random variable. Strings Valid functions are set up like a standard distributional function
If |
var_support |
A numeric vector. Can be used to set a fixed set of
values for which to plot the p.m.f. and c.d.f., in order better
to see the effects of changing the parameter values or to set a support
that isn't a subset of the integers.
If |
params |
A named list of initial parameter values with which to start
the movie. If If If parameter value is outside the corresponding range specified by
|
param_step |
A named list of the amounts by which the respective
parameters in |
param_range |
A named list of the ranges over which the respective
parameters in |
p_vec |
A numeric vector of length 2. The p.m.f. and c.d.f. are
plotted between the 100 |
smallest |
A positive numeric scalar. The smallest value to be
used for any strictly positive parameters when |
plot_par |
A named list of graphical parameters
(see |
panel_plot |
A logical parameter that determines whether the plot
is placed inside the panel ( |
hscale , vscale
|
Numeric scalars. Scaling parameters for the size
of the plot when |
observed_value |
A non-negative integer. If |
... |
Additional arguments to be passed to
|
The movie starts with a plot of the p.m.f. of the distribution for the initial values of the parameters. Buttons increase (+) or decrease (-) each parameter. There are radio buttons to switch the plot from the p.m.f. to the c.d.f. and back.
If distn == "geometric" then there are radio buttons to switch between the version of the geometric distribution based on the number of trials up to and including the first success and the version based on the number of failures until the first success.
Owing to a conflict with the argument size of the function rp.control, the parameter size of, for example, the binomial and negative binomial distributions is labelled as n.
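The two versions of the geometric distribution mentioned above differ only by a shift of 1 in the support. The following base R sketch illustrates this using stats::dgeom, which is parameterised by the number of failures before the first success; the object names are illustrative and this is not the code used by discrete().

```r
# Sketch only: the two parameterisations of the geometric distribution.
prob <- 0.3     # success probability (illustrative choice)
failures <- 0:10

# Number of failures before the first success: support 0, 1, 2, ...
pmf_failures <- dgeom(failures, prob = prob)

# Number of trials up to and including the first success: support 1, 2, 3, ...
trials <- failures + 1
pmf_trials <- dgeom(trials - 1, prob = prob)

# The trials version has p.m.f. (1 - prob)^(t - 1) * prob
print(cbind(trials, pmf_trials, direct = (1 - prob)^(trials - 1) * prob))
```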
Nothing is returned, only the animation is produced.
movies: a user-friendly menu panel.
smovie: general information about smovie.
# Binomial example
discrete()
# The same example, but using a user-supplied function and setting manually
# the initial parameters, parameter step size and range
discrete(distn = dbinom, params = list(size = 10, prob = 0.5),
         param_step = list(size = 1),
         param_range = list(size = c(1, NA), prob = c(0, 1)))
# Poisson distribution. Show the use of var_support
discrete(distn = "poisson", var_support = 0:20)
A movie to illustrate the extremal types theorem, that is, convergence of the distribution of the maximum of a random sample of size $n$ from certain distributions to a member of the Generalized Extreme Value (GEV) family, as $n$ tends to infinity. Samples of size $n$ are simulated repeatedly from the chosen distribution. The distributions (simulated empirical and true) of the sample maxima are compared to the relevant GEV limit.
ett( n = 20, distn, params = list(), panel_plot = TRUE, hscale = NA, vscale = hscale, n_add = 1, delta_n = 1, arrow = TRUE, leg_cex = 1.25, ... )
n |
An integer scalar. The size of the samples drawn from the distribution chosen using distn. |
distn |
A character scalar specifying the distribution from which
observations are sampled. Distributions If The The The other cases use the distributional functions in the
|
params |
A named list of additional arguments to be passed to the density function associated with distribution distn. If a parameter value is not supplied then the default values in the relevant distributional function set using distn are used. |
panel_plot |
A logical parameter that determines whether the plot
is placed inside the panel ( |
hscale , vscale
|
Numeric scalars. Scaling parameters for the size
of the plot when |
n_add |
An integer scalar. The number of simulated datasets to add to each new frame of the movie. |
delta_n |
A numeric scalar. The amount by which n is increased (or decreased) after one click of the + (or -) button in the parameter window. |
arrow |
A logical scalar. Should an arrow be included to show the simulated sample maximum from the top plot being placed into the bottom plot? |
leg_cex |
The argument |
... |
Additional arguments to the rpanel functions
|
Loosely speaking, a consequence of the Extremal Types Theorem is that, in many situations, the maximum of a large number $n$ of independent random variables has approximately a GEV($\mu, \sigma, \xi$) distribution, where $\mu$ is a location parameter, $\sigma$ is a scale parameter and $\xi$ is a shape parameter. See Coles (2001) for an introductory account and Leadbetter et al. (1983) for greater detail and more examples.
The Extremal Types Theorem is an asymptotic result that considers the possible limiting distribution of linearly normalised maxima as $n$ tends to infinity. This movie considers examples where this limiting result holds and illustrates graphically the closeness of the limiting approximation provided by the relevant GEV limit to the true finite-$n$ distribution.
Samples of size n are repeatedly simulated from the distribution chosen using distn. These samples are summarized using a histogram that appears at the top of the movie screen. For each sample the maximum of these n values is calculated, stored and added to another plot, situated below the first plot. This plot is either a histogram or an empirical c.d.f., chosen using a radio button. A rug is added to a histogram provided that it contains no more than 1000 points.
The probability density function (p.d.f.) of the original variables is superimposed on the top histogram. There is a checkbox to add to the bottom plot the exact p.d.f./c.d.f. of the sample maxima and an approximate (large n) GEV p.d.f./c.d.f. implied by the ETT.
The GEV shape parameter $\xi$ that applies in the limiting case is used. The GEV location $\mu$ and scale $\sigma$ are set based on constants used to normalise the maxima to achieve the GEV limit. Specifically, $\mu$ is set at the 100(1 - 1/$n$)% quantile of the distribution distn and $\sigma$ at (1 / $n$) / $f(\mu)$, where $f$ is the density function of the distribution distn.
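These normalising constants can be computed directly for a simple case. The sketch below assumes exponential data with rate 1, for which the limiting shape parameter is 0, so the GEV limit is the Gumbel distribution with c.d.f. $\exp\{-\exp[-(x - \mu)/\sigma]\}$. It is an illustration with names chosen here, not the code used by ett().

```r
# Sketch only: exact c.d.f. of the sample maximum versus its GEV (xi = 0) limit.
n <- 100  # sample size (illustrative choice)

# Location: the 100(1 - 1/n)% quantile; scale: (1 / n) / f(mu)
mu <- qexp(1 - 1 / n)        # equals log(n) for the unit exponential
sigma <- (1 / n) / dexp(mu)  # equals 1 for the unit exponential

x <- seq(mu - 2, mu + 6, by = 0.5)
exact_cdf <- pexp(x)^n                   # P(max <= x) for n independent values
gev_cdf <- exp(-exp(-(x - mu) / sigma))  # Gumbel c.d.f., the xi = 0 GEV case

print(round(cbind(x, exact_cdf, gev_cdf), 4))
print(max(abs(exact_cdf - gev_cdf)))     # small for large n
```

Repeating the comparison with a smaller n gives a larger maximum difference, which is the finite-$n$ discrepancy that the movie displays graphically.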
Once it starts, four aspects of this movie are controlled by the user.
There are buttons to increase (+) or decrease (-) the sample size, that is, the number of values over which a maximum is calculated.
Each time the button labelled "simulate another n_add samples of size n" is clicked, n_add new samples are simulated and their sample maxima are added to the bottom histogram.
There is a button to switch the bottom plot from displaying a histogram of the simulated maxima, the exact p.d.f. and the limiting GEV p.d.f. to the empirical c.d.f. of the simulated data, the exact c.d.f. and the limiting GEV c.d.f.
There is a box that can be used to display only the bottom plot. This option is selected automatically if the sample size exceeds 100000.
For further detail about the examples specified by distn see Chapter 1 of Leadbetter et al. (1983) and Chapter 3 of Coles (2001). In many of these examples ("exponential", "normal", "gamma", "lognormal", "chi-squared", "weibull", "ngev") the limiting GEV distribution has a shape parameter that is equal to 0. In the "uniform" case the limiting shape parameter is -1 and in the "beta" case it is -1 / shape2, where shape2 is the second parameter of the Beta distribution. In the other cases the limiting shape parameter is positive, with respective values shape ("gp", see gp), 1 / df ("t", see TDist), 1 ("cauchy", see Cauchy), 2 / df2 ("f", see FDist).
Nothing is returned, only the animation is produced.
Coles, S. G. (2001) An Introduction to Statistical Modeling of Extreme Values, Springer-Verlag, London. doi:10.1007/978-1-4471-3675-0_3
Leadbetter, M., Lindgren, G. and Rootzen, H. (1983) Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag, New York. doi:10.1007/978-1-4612-5449-2
movies: a user-friendly menu panel.
smovie: general information about smovie.
# Exponential data: xi = 0
ett()
# Uniform data: xi = -1
ett(distn = "uniform")
# Student t data: xi = 1 / df
ett(distn = "t", params = list(df = 5))
Density, distribution function, quantile function and random generator for the distribution of Fisher's transformation of the product moment correlation coefficient, based on a random sample from a bivariate normal distribution.
dFcorr(x, N, rho = 0, log = FALSE)
pFcorr(q, N, rho = 0, lower.tail = TRUE, log.p = FALSE)
qFcorr(p, N, rho = 0, lower.tail = TRUE, log.p = FALSE)
rFcorr(n, N, rho = 0, lower.tail = TRUE, log.p = FALSE)
x , q
|
Numeric vectors of quantiles. |
N |
Numeric vector. Number of observations, (N > 3). |
rho |
Numeric vector. Population correlations, (-1 < rho < 1). |
log , log.p
|
A logical scalar; if TRUE, probabilities p are given as log(p). |
lower.tail |
A logical scalar. If TRUE (default), probabilities are P[X <= x], otherwise, P[X > x]. |
p |
A numeric vector of probabilities in [0,1]. |
n |
Numeric scalar. The number of observations to be simulated.
If |
These functions rely on the correlation coefficient functions in the SuppDists package. SuppDists must be installed in order for these functions to work.
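The density of the transformed variable follows from a change of variables applied to the density of the correlation coefficient. The sketch below works this through only for the special case rho = 0, where the density of $r$ has the simple closed form $(1 - r^2)^{(N - 4)/2} / B(1/2, (N - 2)/2)$. It does not use SuppDists and is not the implementation of dFcorr(); the function names d_r0 and d_z0 are hypothetical.

```r
# Sketch only: density of Fisher's transformation z = atanh(r) when rho = 0.
N <- 10  # number of observations (illustrative choice, N > 3)

# Density of the sample correlation coefficient r when rho = 0
d_r0 <- function(r, N) {
  (1 - r^2)^((N - 4) / 2) / beta(1 / 2, (N - 2) / 2)
}

# Change of variables: r = tanh(z), with dr/dz = 1 - tanh(z)^2
d_z0 <- function(z, N) {
  d_r0(tanh(z), N) * (1 - tanh(z)^2)
}

# The transformed density should integrate to 1 over the real line
total <- integrate(d_z0, -Inf, Inf, N = N)$value

# Variance of z (its mean is 0 by symmetry), compared with 1 / (N - 3)
var_z <- integrate(function(z) z^2 * d_z0(z, N), -Inf, Inf)$value
print(c(total = total, var_z = var_z, approx_var = 1 / (N - 3)))
```

If SuppDists is installed, evaluating dFcorr at the same points with rho = 0 would be a natural cross-check of this sketch.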
Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples of an indefinitely large population. Biometrika, 10(4), 507-521.
Fisher, R. A. (1921). On the "probable error" of a coefficient of correlation deduced from a small sample. Metron, 1, 3-32. https://digital.library.adelaide.edu.au/dspace/bitstream/2440/15169/1/14.pdf
correlation coefficient in the SuppDists package for dpqr functions for the untransformed product moment correlation coefficient.
correlation: correlation sampling distribution movie.
got_SuppDists <- requireNamespace("SuppDists", quietly = TRUE)
if (got_SuppDists) {
  dFcorr(-1:1, N = 10)
  dFcorr(0, N = 11:20)
  pFcorr(0.5, N = 10)
  pFcorr(0.5, N = 10, rho = c(0, 0.3))
  qFcorr((1:9)/10, N = 10, rho = 0.2)
  qFcorr(0.5, N = c(10, 20), rho = c(0, 0.3))
  rFcorr(6, N = 10, rho = 0.6)
}
A movie to examine the influence of a single outlying observation on a least squares regression line.
lev_inf( association = c("positive", "negative", "none"), n = 25, panel_plot = TRUE, hscale = NA, vscale = hscale )
association |
A character scalar. Determines the type of association between (non-outlying) observations: "positive" for positive linear association; "negative" for negative linear association; "none" for no association. |
n |
An integer scalar. The size of the sample of (non-outlying) observations. |
panel_plot |
A logical parameter that determines whether the plot
is placed inside the panel ( |
hscale , vscale
|
Numeric scalars. Scaling parameters for the size
of the plot when |
n pairs of observations are simulated with the property that the mean of the response variable is a linear function of the values of the explanatory variable $x$. These pairs of observations are plotted using filled black circles. An extra observation is plotted using a filled red circle. Initially this observation is placed in the middle of the plot.
Superimposed on the plot are two least squares regression lines: one based on all the data ('with observation') and one in which the 'red' observation has been removed ('without observation'). Initially these lines coincide.
The location of the ‘red’ observation can be changed using the +/- buttons so that the effect of the position of this observation on the ‘with observation’ line can be seen.
We see that if the red observation is outlying, that is, it is far from the least squares line fitted to the other observations, then its influence on the least squares regression line depends on its x-coordinate. If its x-coordinate is much larger or smaller than the x-coordinate of the other observations (high leverage) then the influence is higher than if it has a similar x-coordinate to the other observations (low leverage). An observation with high leverage does not necessarily have high influence: if its y-coordinate falls very close to the regression line fitted to the other observations then its influence will be low.
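The relationship between leverage and influence described above can be reproduced numerically. This is a minimal sketch with simulated data and a hypothetical helper, slope_with_extra, which refits the line after adding one extra observation; it is not the code used by lev_inf().

```r
# Sketch only: effect of one extra observation on a least squares slope.
set.seed(1)
n <- 25
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.2)  # positive linear association

b_without <- coef(lm(y ~ x))         # fit 'without observation'

# Hypothetical helper: slope after adding a point at x0 whose y-value lies
# 'offset' units above the line fitted to the other observations
slope_with_extra <- function(x0, offset) {
  y0 <- b_without[["(Intercept)"]] + b_without[["x"]] * x0 + offset
  extra <- data.frame(x = c(x, x0), y = c(y, y0))
  coef(lm(y ~ x, data = extra))[["x"]]
}

slopes <- c(
  without = b_without[["x"]],
  outlier_low_leverage = slope_with_extra(mean(x), offset = -3),
  outlier_high_leverage = slope_with_extra(3, offset = -3),
  on_line_high_leverage = slope_with_extra(3, offset = 0)
)
print(slopes)
```

An outlier placed at the mean of the x-values changes the intercept but leaves the slope unchanged, whereas the same vertical offset at x = 3 changes the slope substantially. A high-leverage point lying on the original line changes nothing.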
Nothing is returned, only the animation is produced.
movies: a user-friendly menu panel.
smovie: general information about smovie.
# Positive association
lev_inf()
# No association
lev_inf(association = "none")
A movie to compare the sampling distributions of the sample mean and sample median based on a random sample of size $n$ from either a standard normal distribution or a standard Student's $t$ distribution. An interesting comparison is between the normal and Student t with 2 degrees of freedom cases (see Examples).
mean_vs_median( n = 10, t_df = NULL, panel_plot = TRUE, hscale = NA, vscale = hscale, n_add = 1, delta_n = 1, arrow = TRUE, leg_cex = 1.75, ... )
n |
An integer scalar. The size of the samples drawn from a standard normal distribution. |
t_df |
A positive scalar. The degrees of freedom |
panel_plot |
A logical parameter that determines whether the plot
is placed inside the panel ( |
hscale , vscale
|
Numeric scalars. Scaling parameters for the size
of the plot when |
n_add |
An integer scalar. The number of simulated datasets to add to each new frame of the movie. |
delta_n |
A numeric scalar. The amount by which n is increased (or decreased) after one click of the + (or -) button in the parameter window. |
arrow |
A logical scalar. Should arrows be included to show the simulated sample mean and sample median from the top plot being placed into the plots below? |
leg_cex |
The argument |
... |
Additional arguments to the rpanel functions
|
The movie is based on repeatedly simulating samples of size n
from either a standard normal N(0,1) distribution or a standard
Student t distribution. The latter is selected by supplying the degrees
of freedom of this distribution, using t_df. The movie contains
three plots. The top plot contains a histogram of the most recently
simulated dataset, with the relevant probability density function (p.d.f.)
superimposed. A rug is added to this histogram
provided that it contains no more than 1000 points.
Each time a sample is simulated the sample mean and sample median are
calculated. These values are indicated on the top plot using an
arrow (if arrow = TRUE) or a vertical (rug) line on the horizontal
axis (arrow = FALSE), coloured red for the sample mean and blue for
the sample median.
If arrow = TRUE then the arrows show the positions of the most
recent mean and median in the two plots below. If arrow = FALSE
then the rug lines are replicated in these plots.
The plot in the middle contains a histogram of
the sample means of all the simulated samples.
The plot on the bottom contains a histogram of
the sample medians of all the simulated samples.
A rug is added to these histograms
provided that they contain no more than 1000 points.
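The simulation behind the three histograms can be imitated outside the movie. The following is a minimal base-R sketch, not the smovie implementation: the helper sim_mean_median and its arguments n_sim, n and t_df are names chosen here for illustration only.

```r
# Illustrative base-R sketch (not smovie's own code).
# Simulate n_sim samples of size n and record each sample's mean and median.
sim_mean_median <- function(n_sim, n, t_df = NULL) {
  # Draw all values at once: N(0, 1) if t_df is NULL, otherwise Student t
  values <- if (is.null(t_df)) rnorm(n * n_sim) else rt(n * n_sim, df = t_df)
  samples <- matrix(values, nrow = n)  # one simulated sample per column
  data.frame(mean = colMeans(samples), median = apply(samples, 2, median))
}

set.seed(1)
norm_sims <- sim_mean_median(n_sim = 2000, n = 10)
t2_sims <- sim_mean_median(n_sim = 2000, n = 10, t_df = 2)

# Spread (interquartile range) of the two estimators in each case
spread <- rbind(normal = sapply(norm_sims, IQR), t2 = sapply(t2_sims, IQR))
print(spread)
```

Calling hist(t2_sims$mean) and hist(t2_sims$median) gives static counterparts of the middle and bottom plots.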
Once it starts, three aspects of this movie are controlled by the user.
There are buttons to increase (+) or decrease (-) the sample size n, that is, the number of values over which each sample mean and sample median is calculated.
Each time the button labelled "simulate another n_add samples of size n" is clicked, n_add new samples are simulated and their sample means and sample medians are added to the middle and bottom histograms, respectively.
For the N(0,1) case only, there is a checkbox to add to the
bottom plot the p.d.f.s of the distribution of the sample mean and
the (approximate, large n) distribution of the sample median.
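For the N(0,1) case the two densities mentioned in the last item can be written down directly. The sketch below assumes the standard results: the sample mean is exactly N(0, 1/n) and, for large n, the sample median is approximately N(0, pi/(2n)). Whether smovie uses exactly this approximation is an assumption; the variable names are illustrative.

```r
# Illustrative sketch; assumes the standard large-n result for the median.
n <- 10
sd_mean <- 1 / sqrt(n)            # exact: sample mean ~ N(0, 1/n)
sd_median <- sqrt(pi / (2 * n))   # approximation: sample median ~ N(0, pi/(2n))

# Evaluate both p.d.f.s on a grid, ready to overlay on a histogram
x <- seq(-1.5, 1.5, length.out = 301)
pdf_mean <- dnorm(x, mean = 0, sd = sd_mean)
pdf_median <- dnorm(x, mean = 0, sd = sd_median)

# Ratio of variances: asymptotic efficiency of the median relative to the mean
efficiency <- sd_mean^2 / sd_median^2
print(efficiency)
```

The ratio does not depend on n, which is why the median's histogram stays wider than the mean's as the sample size changes.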
Nothing is returned, only the animation is produced.
movies: a user-friendly menu panel.
smovie: general information about smovie.
# Sampling from a standard normal distribution
mean_vs_median()

# Sampling from a standard t(2) distribution
mean_vs_median(t_df = 2)
Uses the template rp.cartoons function to produce
a menu panel from which any of the movies in the
smovie package can be launched. For greater control
of an individual example, call the relevant function directly.
movies(fixed_range = TRUE, hscale = NA, vscale = hscale)
fixed_range |
A logical scalar. Only relevant to the Discrete
and Continuous menus. If |
hscale , vscale
|
Numeric scalars. Scaling parameters for the size
of the plot when |
discrete, continuous, clt, cltq, ett, mean_vs_median, correlation, lev_inf, wws, shypo.
smovie: general information about smovie.
movies()
A movie to illustrate statistical concepts involved in the testing
of one simple hypothesis against another. The example used is a
random sample from a normal distribution whose variance is assumed
to be known. The simple hypotheses relate to the value of the mean.
shypo(
  mu0 = 0,
  sd = 6,
  eff = sd,
  n = 10,
  a = mu0 + eff/2,
  target_alpha = 0.05,
  target_beta = 0.1,
  panel_plot = TRUE,
  hscale = NA,
  vscale = hscale,
  delta_n = 1,
  delta_a = sd/(10 * sqrt(n)),
  delta_eff = sd,
  delta_mu0 = 1,
  delta_sd = 1
)
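mu0 |
A numeric scalar. The value of the mean under the null hypothesis H0 with which to start the movie. |
sd |
A positive numeric scalar. The (common) standard deviation of the normal distribution of the data under the two hypotheses. |
eff |
A numeric scalar. The effect size. The amount by which the value of the mean under the alternative hypothesis H1 exceeds mu0, that is, eff = mu1 - mu0. |
n |
A positive integer scalar. The sample size with which to start the movie. |
a |
A numeric scalar. The critical value of the test with which to start the movie. H0 is rejected if the sample mean is greater than a. |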
target_alpha |
A numeric scalar in (0,1). The target value of the type I error probability to be achieved by setting a, or a and n, using the radio buttons. |
target_beta |
A numeric scalar in (0,1). The target value of the type II error probability to be achieved by setting a and n using the radio buttons. |
panel_plot |
A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). |
hscale, vscale |
Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. |
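delta_mu0, delta_eff, delta_a, delta_n, delta_sd |
Numeric scalars. The respective amounts by which the values of mu0, eff, a, n and sd are increased (or decreased) after one click of the + (or -) button in the parameter window. |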
The movie is based on two plots.
The top plot shows the (normal)
probability density functions of the sample mean under the null
hypothesis H0 (mean mu0) and the alternative hypothesis H1
(mean mu1, where mu1 > mu0), with the values
of mu0 and mu1 indicated by vertical dashed lines.
H0 is rejected if the sample mean exceeds the critical value a,
which is indicated by a vertical black line.
The bottom plot shows how the probabilities of making a type I or type II
error (alpha and beta respectively) depend on the value of a,
by plotting these probabilities against a.
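The two curves in the bottom plot follow from the normal distribution of the sample mean, which has standard deviation sd / sqrt(n) under both hypotheses. The sketch below is an illustration in base R, not smovie's own code; error_probs is a hypothetical helper whose defaults mirror those of shypo.

```r
# Illustrative sketch of the type I and type II error probabilities.
error_probs <- function(a, mu0 = 0, sd = 6, eff = sd, n = 10) {
  se <- sd / sqrt(n)   # standard deviation of the sample mean
  mu1 <- mu0 + eff     # mean under the alternative hypothesis H1
  # alpha: probability that the sample mean exceeds a when H0 is true
  alpha <- pnorm(a, mean = mu0, sd = se, lower.tail = FALSE)
  # beta: probability that the sample mean does not exceed a when H1 is true
  beta <- pnorm(a, mean = mu1, sd = se)
  data.frame(a = a, alpha = alpha, beta = beta)
}

a_grid <- seq(0, 6, by = 1)
probs <- error_probs(a_grid)
print(round(probs, 3))
```

Increasing a lowers alpha and raises beta; at a = mu0 + eff/2 (the default starting value) the two probabilities are equal by symmetry.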
A parameter window enables the user to change the values of n,
a, mu0, eff = mu1 - mu0 or sd
by clicking the +/- buttons.
Radio buttons can be used either to:
set a to achieve the target type I error probability target_alpha, based on the current value of n;
set a and (integer) n to achieve (or better) the respective target type I and type II error probabilities of target_alpha and target_beta.
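Both radio-button calculations have closed forms under the normal model. The following is a sketch using textbook formulas, not smovie's own code: critical_value and design are hypothetical helpers, and the choice to hit target_alpha exactly once n has been rounded up to an integer is an assumption about how the tie between the two targets is resolved.

```r
# Illustrative sketch (assumed textbook formulas, not smovie's own code).

# Critical value a giving type I error probability alpha for a given n
critical_value <- function(alpha, n, mu0 = 0, sd = 6) {
  mu0 + qnorm(1 - alpha) * sd / sqrt(n)
}

# Smallest integer n, with a matching a, that meets both targets
design <- function(alpha = 0.05, beta = 0.1, mu0 = 0, sd = 6, eff = sd) {
  n <- ceiling(((qnorm(1 - alpha) + qnorm(1 - beta)) * sd / eff)^2)
  a <- critical_value(alpha, n, mu0, sd)
  se <- sd / sqrt(n)
  list(
    n = n,
    a = a,
    alpha = pnorm(a, mean = mu0, sd = se, lower.tail = FALSE),
    beta = pnorm(a, mean = mu0 + eff, sd = se)
  )
}

res <- design(eff = 5)
str(res)
```

Because n must be an integer, the achieved type II error probability is generally a little below target_beta rather than equal to it.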
If eff = 0 then a plot will be produced even though this case is
not practically meaningful. If "set a and n to achieve target alpha
and beta" is selected then the plot will be the same as in the
"set a and n by hand" case.
Nothing is returned, only the animation is produced.
movies: a user-friendly menu panel.
smovie: general information about smovie.
# 1. Change a (for fixed n) to achieve alpha = 0.05
# 2. Change a and n to achieve alpha <= 0.05 and beta <= 0.1
shypo(mu0 = 0, eff = 5, n = 16, a = 2.3, delta_a = 0.01)
A movie to illustrate the nature of the Wald, Wilks and score
likelihood-based test statistics, for a model with a scalar unknown
parameter theta. The user can change the value of theta
under a simple null hypothesis and observe the effect on the test
statistics and (approximate) p-values associated with the tests of
this hypothesis against the general alternative. The user can
specify their own log-likelihood or use one of two in-built examples.
wws(
  model = c("norm", "binom"),
  theta_range = NULL,
  ...,
  mult = 3,
  theta0 = if (!is.null(theta_range)) sum(c(0.25, 0.75) * theta_range) else NULL,
  panel_plot = TRUE,
  hscale = NA,
  vscale = hscale,
  delta_theta0 = if (!is.null(theta_range)) abs(diff(theta_range))/20 else NULL,
  theta_mle = NULL,
  loglik = NULL,
  alg_score = NULL,
  alg_obs_info = NULL,
  digits = 3
)
model |
A character scalar. Name of the distribution on which one of two in-built examples is based: "norm" (normal) or "binom" (binomial). The behaviour of these examples can be controlled using arguments supplied via ... . |
theta_range |
A numeric vector of length 2. The range of values of theta over which the log-likelihood is plotted. |
... |
Additional arguments to be passed to |
mult |
A positive numeric scalar. If |
theta0 |
A numeric scalar. The value of theta under the null hypothesis with which to start the movie. |
panel_plot |
A logical parameter that determines whether the plot
is placed inside the panel ( |
hscale , vscale
|
Numeric scalars. Scaling parameters for the size
of the plot when |
delta_theta0 |
A numeric scalar. The amount by which the value of theta0 is increased (or decreased) after one click of the + (or -) button in the parameter window. |
theta_mle |
A numeric scalar. The user may use this to supply the value of the maximum likelihood estimate (MLE) of theta. |
loglik |
An R function, vectorised with respect to its first argument, that returns the value of the log-likelihood (up to an additive constant). The movie will not work if the observed information is not finite at the maximum likelihood estimate. |
alg_score |
An R function that returns the score function, that is, the derivative of loglik with respect to theta. |
alg_obs_info |
An R function that returns the observed information, that is, the negated second derivative of loglik with respect to theta. |
digits |
An integer indicating the number of significant digits to
be used in the displayed values of the test statistics and
p-values. See |
The Wald, Wilks (or likelihood ratio) and score tests are
asymptotically equivalent tests of a simple hypothesis that a parameter
of interest theta is equal to a particular value theta0.
The test statistics are all based on the log-likelihood for theta
but they differ in the way that they measure the
distance between the maximum likelihood estimate (MLE) of theta
and theta0. The Wilks statistic is twice the amount by which the
log-likelihood evaluated at theta0 is smaller than the log-likelihood
evaluated at the MLE. The Wald statistic is based on the absolute
difference between the MLE and theta0. The score test is
based on the gradient of the log-likelihood (the score function)
at theta0.
For details see Azzalini (1996).
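The three distance measures can be compared numerically. The sketch below uses the usual textbook forms of the statistics, with the observed information as the scaling in the Wald and score statistics. That smovie uses exactly these forms is an assumption, and the function and variable names are illustrative. The data (7 successes, 13 failures) match the binomial example in the Examples section.

```r
# Illustrative sketch using textbook definitions (not smovie's own code).
loglik <- function(p, s, f) s * log(p) + f * log(1 - p)   # log-likelihood
score_fn <- function(p, s, f) s / p - f / (1 - p)         # first derivative
obs_info <- function(p, s, f) s / p^2 + f / (1 - p)^2     # minus second derivative

s <- 7                # number of successes
f <- 13               # number of failures
theta0 <- 0.5         # value of theta under the null hypothesis
mle <- s / (s + f)    # maximum likelihood estimate of theta

test_stats <- c(
  wald = (mle - theta0)^2 * obs_info(mle, s, f),
  wilks = 2 * (loglik(mle, s, f) - loglik(theta0, s, f)),
  score = score_fn(theta0, s, f)^2 / obs_info(theta0, s, f)
)

# Approximate p-values from the chi-squared distribution with 1 degree of freedom
p_values <- pchisq(test_stats, df = 1, lower.tail = FALSE)
print(rbind(statistic = test_stats, p_value = p_values))
```

The three statistics are close but not identical here, because the binomial log-likelihood is not exactly quadratic in theta.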
This movie illustrates the differences between the test
statistics for simple models with a single scalar parameter.
In the (default) normal example the three test statistics coincide.
This is not true in general, as shown by the other in-built example
(model = "binom").
A user-supplied log-likelihood can be provided via loglik.
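The coincidence in the normal case arises because the N(theta, 1) log-likelihood is exactly quadratic in theta. A short check in base R, again using the textbook forms of the statistics as an assumption and with illustrative variable names:

```r
# Illustrative sketch: for a N(theta, 1) sample the three statistics agree.
set.seed(47)
x <- rnorm(10)
n <- length(x)
theta0 <- 0.2
mle <- mean(x)   # MLE of theta is the sample mean

# Log-likelihood up to an additive constant
loglik <- function(theta) -sum((x - theta)^2) / 2

wald <- n * (mle - theta0)^2                  # observed information equals n
wilks <- 2 * (loglik(mle) - loglik(theta0))   # twice the drop in log-likelihood
score <- sum(x - theta0)^2 / n                # squared score at theta0, scaled by n

print(c(wald = wald, wilks = wilks, score = score))
```

All three values reduce algebraically to n * (mean(x) - theta0)^2, so any difference in the printed output is only floating-point rounding.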
Nothing is returned, only the animation is produced.
Azzalini, A. (1996) Statistical Inference Based on the Likelihood, Chapman & Hall / CRC, London.
movies: a user-friendly menu panel.
smovie: general information about smovie.
# N(theta, 1) example, test statistics equivalent
wws(theta0 = 0.8)

# binomial(20, theta) example, test statistics similar
wws(theta0 = 0.5, model = "binom")

# binomial(20, theta) example, test statistic rather different
# for theta0 distant from theta_mle
wws(theta0 = 0.9, model = "binom", data = c(19, 1), theta_range = c(0.1, 0.99))

# binomial(2000, theta) example, test statistics very similar
wws(theta0 = 0.5, model = "binom", data = c(1000, 1000))

set.seed(47)
x <- rnorm(10)
wws(theta0 = 0.2, model = "norm", theta_range = c(-1, 1))

# Log-likelihood for a binomial experiment (up to an additive constant)
bin_loglik <- function(p, n_success, n_failure) {
  return(n_success * log(p) + n_failure * log(1 - p))
}

wws(loglik = bin_loglik, theta0 = 0.5, theta_range = c(0.1, 0.7),
    theta_mle = 7 / 20, n_success = 7, n_failure = 13)

bin_alg_score <- function(p, n_success, n_failure) {
  return(n_success / p - n_failure / (1 - p))
}
bin_alg_obs_info <- function(p, n_success, n_failure) {
  return(n_success / p ^ 2 + n_failure / (1 - p) ^ 2)
}
wws(loglik = bin_loglik, theta0 = 0.5, theta_range = c(0.1, 0.7),
    theta_mle = 7 / 20, n_success = 7, n_failure = 13,
    alg_score = bin_alg_score, alg_obs_info = bin_alg_obs_info)