The purpose of this page is to collect various material related to the
Skew-Normal (SN) probability distribution and related distributions.
The SN distribution is an extension of the normal (Gaussian) probability
distribution, allowing for the presence of skewness.
In a similar way, a
skew-t (ST) distribution has been developed, which makes it possible
to regulate both the skewness and the kurtosis of the fitted model.
This distribution is obtained by introducing a skewness parameter
into the usual t density.
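As a concrete reference for the shape of the SN density, the following minimal Python sketch evaluates the standard form 2 phi(z) Phi(alpha z), with z = (x - xi)/omega, using only the standard library. The function and argument names (sn_pdf, xi, omega, alpha) are chosen here for illustration and are not the interface of the 'sn' package; the ST density is not covered.

```python
import math


def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sn_pdf(x, xi=0.0, omega=1.0, alpha=0.0):
    """Skew-normal density with location xi, scale omega > 0 and slant alpha.

    Illustrative helper (hypothetical name), not part of the 'sn' package.
    """
    if omega <= 0:
        raise ValueError("omega must be positive")
    z = (x - xi) / omega
    return (2.0 / omega) * norm_pdf(z) * norm_cdf(alpha * z)
```

With alpha = 0 the factor Phi(alpha z) equals 1/2 and the density reduces to the normal one; positive values of alpha shift mass to the right, negative values to the left.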
Introduction
If you have never read about the skew-normal probability distribution,
you may want to look at a
very brief account.
To view the shape of the density function, here are some
graphical demonstration programs:
A detailed bibliography is available, although it was last updated on
2008-07-22. The list includes only published material, papers accepted
for publication, and other material having `firm form', such as a
PhD thesis. No further update of this list is planned at the moment.
IV Skew Workshop,
Pontificia Universidad Católica de Chile, 16th to 19th May 2011
A pioneer
In 1908, Fernando de Helguero
presented
a paper
which examines a selection mechanism of a normal population as a
model of departure from normality. This construction essentially
perturbs the normal density via a uniform distribution function,
leading to a form of skew-normal density. Although mathematically
somewhat different from the above-described form of skew-normal
density, the underlying stochastic mechanism is intimately
related. (2004-12-13)
The 'sn' package (or library; here the two terms are used as synonyms)
is a suite of functions for handling skew-normal and
skew-t distributions, in the univariate and the multivariate case.
The available facilities include various standard operations
(density function, random number generation, etc.), data fitting via MLE,
plotting of log-likelihood surfaces, and others. For data fitting,
simple random samples and regression models are dealt with.
Current development takes place in the R environment.
Some ports to other languages are available, but they are not actively
maintained: if you want the most recent version, use the one for R.
Note that the existing ports to other environments were carried out
before version 0.3-0, and therefore they do not include
any facilities for the skew-t distribution.
The key edition of the library is the one for R.
Its version 1.1-2 is now available (2014-11-30).
With the upgrade from version 0.4-18 to the 1.x-y series, the
new version is only partly backward compatible with the 0.4-xx versions.
For those who want to keep both versions, a modified package called `sn0'
has been created. This is simply a renamed copy of the previous version
(with some internal references renamed accordingly), available from
here.
It has been packaged for Unix, but it should work with MS Windows as well,
since no binaries are used. (2014-01-07).
Note that the R
versions of 'sn' for various platforms can also be obtained directly from
CRAN,
and they are installed simply by
using the install.packages("sn") command, provided that
your installation is suitably configured (and that you are actually
connected to the Internet!).
The library has been ported to Matlab by
Nicola Sartori.
However, this port corresponds to update 0.21 of the package, that is,
to the year 2000! Hence many facilities are not included, notably those
for the skew-t distribution. A subset of the
facilities for the skew-t distribution is available
via a set of Matlab functions
written and made available by
Enrique Batiz (Enrique.Batiz [at] postgrad.mbs.ac.uk).
Please see the
note for Matlab users
(2013-12-11).
An MS-DOS executable program is available which implements a
small portion of the `sn library', namely maximum likelihood estimation
for random samples and for regression models with errors having a
scalar SN distribution.
This executable program is based on some Fortran90 code
which is a port of a few basic routines from R. About 10
years after writing it, prompted by a user request, I thought that
this Fortran code might, after all, be of use to other people, so it is
now available. (Written in 1998, online since 2008-09-25)
Excel users can make use of
VBA routines kindly made
available by Stephen H. Gersuk (2008-09-22);
a Perl
module has been provided by Jiri Vaclavik (added 2011-10-21).
On-line procedures
Data fitting
You can fit a skew-normal distribution to your data using
this form. This procedure also serves as
a demonstration of the functionality of the sn library,
although only in a simple case. If you have a
more complex problem (large data set, data with covariates,
multivariate data, etc.), then you must download the full library
and run it yourself. (Created on 2003-02-17,
updated 2003-04-22, 2008-12-02, 2011-08-03).
Random number generation
You can generate random numbers with SN or ST distribution
in 1 or 2 dimensions using this form
(2003-11-12). See also the FAQ below.
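One well-known way to draw univariate SN variates is the additive representation Z = delta |U0| + sqrt(1 - delta^2) U1, where U0 and U1 are independent standard normal variables and delta = alpha / sqrt(1 + alpha^2). The Python sketch below illustrates it; it is not claimed to be the method used by the on-line form or by the 'sn' package, and the name rsn_sketch is hypothetical.

```python
import math
import random


def rsn_sketch(n, xi=0.0, omega=1.0, alpha=0.0, rng=None):
    """Draw n skew-normal variates via the additive representation.

    Illustrative helper (hypothetical name), not the 'sn' package routine.
    rng is an optional random.Random instance, useful for reproducibility.
    """
    rng = rng or random.Random()
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    scale = math.sqrt(1.0 - delta * delta)
    draws = []
    for _ in range(n):
        u0 = abs(rng.gauss(0.0, 1.0))   # half-normal component
        u1 = rng.gauss(0.0, 1.0)        # independent normal component
        draws.append(xi + omega * (delta * u0 + scale * u1))
    return draws
```

For xi = 0 and omega = 1, the mean of the generated values should be close to delta * sqrt(2/pi), which gives a quick sanity check on a large sample.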
In the multivariate case, the feasible region for the set of
correlations and the indices of skewness of the individual components
is not easy to picture. To help visualize this region
in the bivariate case, you can run the R program
feasible-CP2.R; besides R, it requires
the R package 'rgl'. To run it, save this file locally,
then start R and type source('feasible-CP2.R').
(2009-05-27)
The program displays two plots in sequence.
The first plot adopts delta as the shape parameter;
the connection between delta and
gamma1 is described in various articles, including
this one. The second plot uses
gamma1.
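For a univariate SN component, the link between the two shape parameters can be written in closed form. The Python sketch below assumes that delta denotes the usual quantity alpha / sqrt(1 + alpha^2), with values in [-1, 1], and that gamma1 is the standardized third cumulant (index of skewness); the function name is hypothetical and the code is unrelated to feasible-CP2.R.

```python
import math


def gamma1_from_delta(delta):
    """Index of skewness gamma1 of a univariate SN variable with shape delta.

    Assumes delta = alpha / sqrt(1 + alpha^2), so that -1 <= delta <= 1.
    """
    if not -1.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [-1, 1]")
    mu_z = delta * math.sqrt(2.0 / math.pi)   # mean of the standardized SN
    return 0.5 * (4.0 - math.pi) * mu_z ** 3 / (1.0 - mu_z ** 2) ** 1.5


# Largest attainable skewness, reached in the limit delta -> 1 (about 0.9953).
GAMMA1_MAX = gamma1_from_delta(1.0)
```

Because gamma1 is bounded in absolute value by GAMMA1_MAX, a plot in terms of gamma1 covers a bounded range, just as delta does.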
Miscellanea
Real SN distribution
The skew-normal distribution is not a mere mathematical abstraction:
it is real life! (2013-06-06)
Translations of the term "skew-normal distribution"
available at ISI
A research problem
The paper Statistical applications of the multivariate
skew-normal distribution includes the discussion of an
apparently innocuous dataset
whose MLE nevertheless lies on the frontier of the parameter space.
Can you suggest an explanation of the phenomenon, and/or
propose an alternative, `reasonable' estimate?
The proposal should work with this dataset as well as with more regular ones.
Hence, the obvious answer (the method of moments) is not acceptable,
since it would work here but not with other datasets
having the sample index of skewness outside the feasible region.
Various solutions to the problem have been put forward, both in
the classical and in the Bayesian approach.
You can get the `frontier' data,
and try out your own method.
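The limitation of the method of moments mentioned above can be made concrete with a small Python sketch. It assumes a univariate SN model, the usual shape parameter delta = alpha / sqrt(1 + alpha^2), and the moment-type sample skewness m3 / m2^(3/2); the function names are hypothetical. When the sample skewness exceeds the largest value attainable by an SN variable (about 0.9953 in absolute value), no moment estimate of delta exists and the function returns None.

```python
import math

_C = 0.5 * (4.0 - math.pi)


def sample_skewness(data):
    """Moment-type sample index of skewness, m3 / m2^(3/2)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


def delta_from_gamma1(g1):
    """Invert the delta-to-gamma1 map of a univariate SN variable.

    Returns None when g1 is outside the feasible range, i.e. when the
    method of moments has no solution.
    """
    r = (abs(g1) / _C) ** (2.0 / 3.0)
    delta_sq = 0.5 * math.pi * r / (1.0 + r)
    if delta_sq >= 1.0:
        return None
    return math.copysign(math.sqrt(delta_sq), g1)
```

For instance, the toy sample [0, 0, 0, 10] has sample skewness of roughly 1.15, so delta_from_gamma1(sample_skewness([0, 0, 0, 10])) returns None.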