GAS Manual
{ Analysis Modules v2.3 }
(c) Alan Young, 1993-1998.
Introduction
This is the analysis manual for the Genetic Analysis System version 2.3.
Before using the modules described in this manual,
you should be familiar with the contents of the companion document
General User Guide
which explains how to use gas and
the data input/output formats required.
Contents
 Introduction
 Parameters
 Statistical Analysis
 Sibpair Analysis
 Sibpair Interval Mapping
 Likelihood Calculations
 Association Analysis
 Haplotyping
Parameters
This chapter discusses some of the parameters that are common to a large
number of the routines.
General parameters for functions are shown in italics.
To use the gas analysis routines you enter them in the following
format:
call anal( parameters );
where anal is the name of the routine, and parameters
is a list of one or more items to be used in the analysis.
The descriptions of the function parameters use the convention:
type code  meaning

compulsory  this parameter must always be present

optional  this parameter is optional and may be used to modify the
behaviour of the routine or request additional information

graphics  output as postscript graphics is available

The output from routines is written to the file specified by
set outfile,
or to gas.out if this has not been done.
To obtain graphical output you must use set psfile
before giving the program; command.
Locus
Most of the routines require the names of loci to analyze. These are
entered after the locus keyword.
Theta recombination Values
Some of the routines require values for the recombination
fractions between loci, while certain others can use them if
available. These are entered after the theta keyword,
and may be given in 3 formats:
No sex-difference
The most basic case is that in which the recombination fractions between
markers are the same for both male and female chromosomes, and this is
entered into the relevant gas functions using the keyword theta
followed by a list of the recombination fractions in the same order
as any loci specified.
For example, with six loci, five theta values are required and these are
specified as
theta theta_{1} theta_{2}
theta_{3} theta_{4} theta_{5}
where each value must be in the range
0 < theta_{i} <= 1/2.
Constant sex-difference
It is possible to specify that the male and female recombination fractions
differ by a fixed constant multiple. This is done by placing the letter
c after the male recombination fractions, followed by the constant
multiplier. Hence, for three loci with male recombinations 0.2 and 0.3,
and female recombinations 0.24 and 0.36, the parameters should be
theta 0.2 0.3 c 1.2
Arbitrary sex-difference
If there is no simple relationship between male and female recombination
values, both can be specified by separating them with the letter f.
Thus for four loci, male recombinations 0.11 0.23 0.17 and female
recombinations 0.15 0.19 0.22 (in the same order) enter the parameter
list
theta 0.11 0.23 0.17 f 0.15 0.19 0.22
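The three theta formats can be sketched in Python as below. This is only an illustration of how the male and female values relate to each other; parse_theta is a hypothetical helper and not a gas routine, and applying the range check to the derived female values is an assumption.

```python
def parse_theta(tokens):
    """Expand the items following the theta keyword into (male, female) lists.

    tokens: e.g. ["0.2", "0.3", "c", "1.2"].  Hypothetical helper, not part of gas.
    """
    if "c" in tokens:                       # constant sex-difference
        i = tokens.index("c")
        male = [float(t) for t in tokens[:i]]
        ratio = float(tokens[i + 1])
        female = [m * ratio for m in male]
    elif "f" in tokens:                     # arbitrary sex-difference
        i = tokens.index("f")
        male = [float(t) for t in tokens[:i]]
        female = [float(t) for t in tokens[i + 1:]]
        if len(male) != len(female):
            raise ValueError("male and female lists differ in length")
    else:                                   # no sex-difference
        male = [float(t) for t in tokens]
        female = list(male)
    # Assumption: every value, male or female, must lie in (0, 1/2].
    for value in male + female:
        if not 0.0 < value <= 0.5:
            raise ValueError(f"theta {value} outside (0, 1/2]")
    return male, female
```

For the constant sex-difference example above, parse_theta("0.2 0.3 c 1.2".split()) gives male values 0.2 and 0.3 and female values of approximately 0.24 and 0.36.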
Interference
Interference occurs when the probability of crossovers occurring between
adjacent loci is not independent. The current version of gas does not
support this option.
Showraw
Many of the routines are able to produce extra output showing intermediate
stages in their calculations. This is requested by adding the showraw
parameter to the function brackets. (You can also try increasing the
overall verbosity level.)
Signif
Most of the routines will mark a significant result with the
symbol `<+'
if a p-value or lod-score passes a particular threshold (the default size
of which depends on the actual routine).
The signif parameter may be used to change this threshold:
signif thresholdvalue
Mapping Functions
A mapping function is used to convert from recombination fractions
to genetic (map) distances along a chromatid. Some of the routines are
able to display their results in terms of map distance
using the mapfunc option:
mapfunc nameofmap
The map functions implemented in gas are
 carfal (the Carter-Falconer map)
 haldane
 kosambi
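As a rough guide to what these three options compute, the usual textbook forms of the map functions are sketched below in Python. The function names simply mirror the option names, the distances are in Morgans, and it is an assumption that gas uses exactly these forms.

```python
import math

def haldane(theta):
    """Haldane map distance (no interference), for 0 <= theta < 1/2."""
    return -0.5 * math.log(1.0 - 2.0 * theta)

def kosambi(theta):
    """Kosambi map distance, for 0 <= theta < 1/2."""
    return 0.25 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))

def carfal(theta):
    """Carter-Falconer map distance, for 0 <= theta < 1/2."""
    return 0.25 * (math.atanh(2.0 * theta) + math.atan(2.0 * theta))
```

For a recombination fraction of 0.2 these give roughly 0.255, 0.212 and 0.201 Morgans respectively; all three approach theta itself when theta is small.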
Help
Most of the analysis routines accept help as a parameter, and will
respond by showing the full
list of parameters available with them. Thus
call sibdes( help );
will show all the parameters available with the sibdes routine.
Statistical Analysis
The gas program provides facilities to perform some basic
statistical analyses of the input data. The routines available
allow the dependencies between members of the same family to
be studied.
References

"Numerical Recipes in C" W.H.Press, B.P.Flannery,
S.A.Teukolsky & W.T.Vetterling, Cambridge (2nd Ed.)
Routine: DISSECT
The dissect routine performs general statistical analysis on
the input data, the format being:
call dissect( options... );
Where the options are:
type  parameter  description

optional  pedigree  analyze properties of whole dataset

family  examine individual families

locus  analyze loci singly and in pairs

graphical  psgraphics  display of statistical results

Pedigree
The pedigree option gives data on the total number of active
individuals in the pedigree and their family structures, numbers of
offspring, matings and siblings.
To perform this type of analysis, give the command:
call dissect( pedigree );
Family
The family option displays data on one or more families,
including their size and numbers of male/female members,
crosses (i.e. matings) and generations.
The format is:
call dissect( family family_name(s)... );
If no families are named, then every family in the pedigree is
analysed sequentially.
Locus
The locus option analyses the data for a particular locus.
The general format is:
call dissect( locus locus_name(s)... );
To run the
analysis for the locus height include the following
line in your gasfile program:
call dissect( locus height );
The type of analysis performed depends on the locus (see below).
If several loci of the same type are listed within the brackets, gas
performs a pairwise analysis to show correspondences between their
distributions.
Affection Locus Algorithms
A variety of contingency tables are constructed
showing the affection relations between relatives, in particular
the correspondences between the affection statuses
of parents and of their children. If two or more loci are
listed then contingency tables are created showing their
joint pairwise affectedness distributions and giving the p-value for
the Null Hypothesis that the statuses at the loci are independent.
Binary Locus Algorithms
The numbers of positive, negative and unknown values for each factor are
displayed. There is no paired-locus analysis.
Named Locus Algorithms
The input gene frequencies are compared with observed population
frequencies, the latter also being broken down into parent and
child categories (parents being defined as those subjects with no
parents in the pedigree and children being defined as those
with no children). If two or more loci are listed then a table
is constructed showing how often the alleles at each locus are
present in the same subject (note that a fully typed individual
will generate 4 entries in this table).
Quantitative Locus Algorithms
The mean, median, extrema, standard deviation and various correlations
between relatives are calculated and p-values assigned.
If two or more loci are present then the correlations between
their values are computed.
References
None.
* Example *
The gasfile dis.gas loads pedigree data from dis.ped
and uses dissect to perform several types of analysis on it.
The results are written to the file dis.out with
postscript graphical output sent to the file dis.ps
SibPair Analysis
These gas modules perform sibpair analyses on sets of
loci to determine the degree of allele sharing between full siblings,
and hence to indicate the chances of linkage between the loci.
Both identity by descent
(IBD) and identity by state (IBS) methods are implemented.
Weights
Sibships in which there are more than two siblings may have a
disproportionate effect (see Reference 1 below)
on the results of a sibpair analysis,
and various weighting strategies have been developed in an
attempt to compensate for this.
Weighting Categorical Traits
Some of the categorical sibpair tests use the weight parameter
to compensate for multipair sibships.
The strict option weights the sharing information from a family
of n siblings by a factor of
2/n,
and the hodge option weights it by a factor of
4(2n-3+(1/2)^{n-1})/(3n(n-1)).
For instance, the command
call sibdes( locus dis1 mk1 weight strict );
performs an IBD sibpair analysis for the affection locus dis1
versus the marker locus mk1.
Any multiple sibships are given a strict weighting
as described above.
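The two weighting factors can be evaluated with a few lines of Python. This is a sketch that reads the strict factor as 2/n and the hodge factor as 4(2n-3+(1/2)^{n-1})/(3n(n-1)); the function names are illustrative and are not gas identifiers.

```python
def strict_weight(n):
    """Weight applied to a sibship of n siblings under the strict option."""
    return 2.0 / n

def hodge_weight(n):
    """Weight applied to a sibship of n siblings under the hodge option."""
    return 4.0 * (2 * n - 3 + 0.5 ** (n - 1)) / (3.0 * n * (n - 1))

# Both factors equal 1 for a single sib pair and fall as the sibship grows.
for n in range(2, 6):
    print(n, round(strict_weight(n), 4), round(hodge_weight(n), 4))
```

Under this reading the hodge factor falls more slowly than the strict factor as sibship size increases, so larger sibships retain more of their information.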
Weighting Quantitative Traits
The Haseman-Elston routines involve performing a least-squares fit
through a set of points (one point being generated by each pair).
The dfweight parameter compensates for multipair sibships
by reducing the number of degrees of freedom
used when the significance of the best-fit line is assessed.
References
 "The Information Contained in Multiple Sibling Pairs",
S.E.Hodge, Genet.Epid. 1:109-122 (1984).
Routine: SIBDES
This routine performs basic IBD analysis on sibpairs
categorized according to affection status.
call sibdes( locus
locus_names...
options... );
The routine lists the various types of matings, the degree of
allele sharing between sibs in each (and parental source), the
2-1-0 t_{2} and chi^{2}
scores and associated probabilities, together with
the exact 1-0 binomial probabilities.
type  parameter  description

compulsory
 locus  list the affection and marker loci to be analyzed

optional
 alltypes  show sharing for not-affected, concordant and discordant pairs

halfsib  show common-parent sharing for half-siblings

summary  only a short summary of the results is given

weight  options are strict and hodge

Algorithm Notes
The IBD sharing is shown in three ways:
a 2:1:0 distribution using sibs for which both parents are fully
informative, excluding intercrosses
(gas defines a fully informative
parent as one for which it is possible to tell exactly what
happened to their alleles during mating);
a 2:1:0 distribution using the expected sharing of all pairs which
are at least partly informative; and a 1:0 distribution for
each parental sex. Intercrosses are treated differently from
the fully informative pairs to avoid biasing the results
towards 1-sharing, as would happen if the intercrosses in which
pairs definitely shared 2, 1 or 0 alleles were included but those in which
sharing was undecidable between 2 and 0 were excluded.
Note that the chi^{2}
probability is 2-sided, whereas the others are
1-sided.
If you have allele data for more than one named locus on a chromosome
then, provided the recombination fraction between adjacent loci is less
than 0.3, you will benefit from using the interval map version
(sibides)
of this routine.
References
 "The general purpose sibpair linkage test",
L.S.Penrose, Ann.Eugen. 18, 120-124 (1953).
Routine: SIBMLS
The sibmls routine calculates
the maximum-likelihood 2-1-0 IBD sharing distribution
of markers (i.e. named loci). In addition to the data used by sibdes
it also utilizes partial information from cases in which the sharing
cannot be unambiguously determined.
call sibmls( locus locus_names...
options... );
type  parameter  description

compulsory
 locus  list the affection and marker loci to be analyzed

optional
 alltypes  show nonaffected and concordant pairs

Algorithm Notes
The maximum-likelihood estimate is restricted so that
2z_{2}> z_{1}
and
z_{1}> 2z_{0}
(the `possible triangle' method, see
Reference 1 below),
where
z_{i} is the fraction of siblings sharing
i alleles IBD.
The maximum is located in a two-phase search, using simulated
annealing to explore the function range, then Brent's algorithm
to refine convergence about the highest point found.
References
 "Asymptotic Properties of affected sibpair linkage analysis",
P.Holmans, Am.Jou.Hum.Gen. 52:362-374 (1993).
Routine: SIB2MLS
The sib2mls routine calculates the
joint maximum-likelihood 2-1-0 IBD sharing
distribution of pairs across two named loci simultaneously.
In addition to the data used by sibdes
it also utilizes partial information from cases in which the sharing
cannot be unambiguously determined.
call sib2mls( locus locus_names...
options... );
type  parameter  description

compulsory
 locus  list the affection and marker loci to be analyzed

optional
 alltypes  show nonaffected and concordant pairs

compare  show MLS sharing for alternative submodels

showraw  display raw sharing data

Algorithm Notes
The region of maximization is restricted according to the type of
model being considered.
The present version of gas compares MLS values for the single-locus,
multiplicative and general models.
The maximum is located in a two-phase search, using simulated
annealing to explore the function domain, then Powell's algorithm
(using Brent for the 1-dimensional substages)
to refine convergence about the highest point found.
References

"Two-Locus Maximum Lod-Score Analysis of a Multifactorial Trait: Joint
Consideration of IDDM2 and IDDM4 with IDDM1 in Type I Diabetes",
H.J.Cordell et al., Am.J.Hum.Genet. 57:920-934 (1995).
Routine: SIBSTATE
The sibstate routine performs Identity By State analysis
on sibpair data.
call sibstate( locus locus_names...
options... );
type  parameter  description

compulsory
 locus  list the affection and marker loci to be analyzed

optional
 alltypes  show nonaffected and concordant pairs

showraw  display raw statistical information

weight  options are strict and hodge

For instance, the command
call sibstate( locus dis1 mk1 );
performs an IBS sibpair analysis for the affection locus dis1
versus the marker locus mk1.
Algorithm Notes
For the sibstate analysis (and any other IBS technique)
it is absolutely essential that the allele frequencies
of the marker loci are correctly set, otherwise the computed
probabilities will be meaningless.
This means that global binning (preferably using fixed bin sizes)
must be used if data is read using the
alsize
option.
Note that even if parental information is available on some of the
pairs, it will not be used in the analyses.
Two methods are used to calculate p-values. The first uses a 2-sided
chi^{2}
test to compare the overall observed IBS sharing distribution
with that predicted from the allele frequencies; note that this can
produce spurious significant results when an excess of 0 sharers is
present. The second method
uses Lange's Z-statistic (which automatically takes into account
multiple sibships) and produces a 1-sided p-value.
Note that the weight parameter only affects the
chi^{2} results
by reducing the effective contribution of multiple sibships; the
Z-statistic does not require weighting.
References

"The Affected Sibpair Method using Identity by State Relations",
K.Lange, Am.Jou. Hum.Genet., 148-150 (1986).

"A Test Statistic for the Affected Sibset Method",
K.Lange, Ann.Hum.Genet., 50:283-290 (1986).
Routine: SIBMAP
This routine gives a graphical display of how sharing between siblings
varies along the length of a chromosome,
with options to estimate recombination fraction and namedlocus
order. The syntax is:
call sibmap( locus locus_names...
options... );
type  parameter  description

compulsory
 locus  list the affection and marker loci to be analyzed

optional
 bestorder  attempt to order loci using sharing data

halfsib  show half-siblings with paternal/maternal options

mapfunc  select map function for distance estimates

maternal  show map for maternally-derived chromosomes

maxprob  show pairs having crossover probability
above this threshold

minchanges  show pairs in which there are at least
n changes of sharing

mindefinite  show pairs in which
at least n loci can be categorised definitely

paternal  show map for paternally-derived chromosomes

sortleft  sort pairs by first recombination position from left

sortright  sort pairs by first recombination position from right

theta  recombination values to use with maxprob

For instance, the command
call sibmap( locus dis1 dis2 mk1 mk2 mk3 mk4 maternal sortleft );
gives a map of the allele sharing in maternally-derived chromosomes
for the marker loci mk1, mk2, mk3 and mk4,
sorted to show pairs with the most sharing at the left end of
the chromosome first.
Algorithm Notes
The number of possible arrangements of loci grows as the factorial
of the number of loci, so the time required to compute a metric for
every arrangement grows at the same rate. Even for modest numbers
of loci (above 8) calculating all possible orders is computationally
impractical, and instead gas uses a simulated annealing search
based on the Metropolis algorithm. Once a sufficiently good order
has been produced, all sets of 3 adjacent loci are permuted to
indicate regions where the order is least certain.
bestorder can take a numeric parameter n in
which case the first n equivalently good orders (as produced
by the triplet permutations described above) are listed in full.
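To give a feel for why exhaustive ordering becomes impractical, the short Python sketch below counts the distinct orders of n loci. Counting an order and its reverse only once is an assumption made here for illustration; the function name is not a gas identifier.

```python
import math

def distinct_orders(n_loci):
    """Number of locus orders, counting an order and its reverse once."""
    if n_loci < 2:
        return 1
    return math.factorial(n_loci) // 2

for n in (4, 8, 10, 12):
    print(n, distinct_orders(n))
```

Eight loci already give 20160 orders and ten loci give 1814400, each of which would need its own evaluation in an exhaustive search.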
References
None.
* Example *
The gasfile sib.gas reads gformat locus data from sib.loc,
and gformat pedigree data from sib.ped.
The sibdes routine is used to perform an IBD analysis,
with results sent to sibp.out.
The sibstate routine is used to perform an IBS analysis,
with results sent to sibs.out.
The sibmap routine is used twice, firstly to
show the sharing of paternal chromosomes for the sibpairs in which at least
three of the named markers are unambiguously determined (results
in sibm1.out), and then
to show only those pairs in which there are at least two changes in
sharing status (results in sibm2.out).
The latter analysis may be used to indicate the possibility of
double recombinants.
* Example *
The gasfile bo.gas reads gformat locus data from bo.loc,
and gformat pedigree data from bo.ped.
The sibmap routine is used to determine the most probable order
of the named loci 1-10 (which should be 1,2,3,...,10) with
the results being written out to the file bo.out.
The dataset contains 20 nuclear families with each locus
simulated as having 6 equally
frequent alleles and a recombination fraction of 0.04 between adjacent loci.
The results show that there are two equally good orders in which
loci 8 and 9 are interchanged
(20 sibpairs is too small a dataset to expect sufficient crossovers between
each locus to produce a unique best ordering).
Routine: SIBMWU
The sibmwu routine performs a nonparametric IBD analysis on a trait
which is specified in terms of a quantitative locus.
call sibmwu( locus locus_names...
options... );
type  parameter  description

compulsory
 locus  list the affection and marker loci to be analyzed

exact  sizes of dataset below which p-values are calculated exactly

signif  significance level for linkage

For instance, the command
call sibmwu( locus humour mker );
assesses whether sibling pairs sharing more alleles
at named locus `mker' are
significantly more similar at quantitative locus `humour'
than pairs sharing fewer alleles at `mker'.
Algorithm Notes
The sibmwu routine first ranks all sibling pairs according to
the absolute difference in their value at a quantitative locus,
then uses the Mann-Whitney U-test to compare the distributions of
these values within subsets of the sibpair population, categorized
according to the amount of IBD sharing at a named locus. A result may
indicate linkage if the average rank of pairs decreases as the number
of alleles shared IBD increases, and the p-values are 1-sided towards
this direction.
The exact parameter controls the threshold above which
the U-statistic is calculated approximately.
For more details see the entry on the
assmwu
routine.
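The ranking step and the U-statistic can be sketched in plain Python as follows. This is a minimal illustration of the standard Mann-Whitney U calculation with average ranks for ties; it is not the gas implementation, and the function names are hypothetical. Here group_a might hold the absolute trait differences of pairs sharing 2 alleles IBD and group_b those of pairs sharing 0.

```python
def midranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """U statistic for group_a; a small U means group_a tends to rank lower."""
    ranks = midranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:len(group_a)])
    return rank_sum_a - len(group_a) * (len(group_a) + 1) / 2.0
```

A U value near zero for the high-sharing group means its trait differences are consistently the smallest, which is the direction that may indicate linkage.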
References
None.
Routine: SIBHE
This routine implements the Haseman-Elston algorithm
for analyzing a quantitative trait using IBD sibpair information.
call sibhe( locus locus_names...
options... );
type  parameter  description

compulsory
 locus  list the affection and marker loci to be analyzed

optional
 absolute  use absolute difference of values rather than square

dfweight  compensate for multipair sibships

empiricalpv  compute empirical p-values

graph  draw graphs of regression plots

knownonly  only pairs with unambiguous sharing are used

sexual  do separate analyses for paternal and maternal sharing

showallp  show p-values for +ve slope regressions

signif  the value at which significant results are marked

useall  pairs with no genetic sharing information are used

graphical
 psgraphics  regression plots with graph option

Algorithm Notes
The basic assumption of this method is that siblings sharing marker alleles
near the quantitative trait locus will be more likely to have similar
quantitative values than non-sharing siblings. Thus the mean value of
the difference between siblings should decrease as the fraction of alleles
shared increases. The sibhe routine performs a least-squares fit
using allele sharing as the independent variable, and trait difference
as the dependent variable. A significantly negative slope may be taken
to indicate linkage.
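The regression itself can be sketched in a few lines of Python. This is a minimal ordinary least-squares fit of the squared sib trait difference on the fraction of alleles shared; it ignores the significance test and the weighting options, and the function name is illustrative only.

```python
def haseman_elston_fit(sharing, trait_a, trait_b):
    """Least-squares fit of squared sib trait difference on IBD sharing.

    sharing: fraction of alleles shared IBD for each pair (0, 0.5 or 1,
             or an expected value in between).
    trait_a, trait_b: quantitative values of the two sibs in each pair.
    Returns (slope, intercept).
    """
    y = [(a - b) ** 2 for a, b in zip(trait_a, trait_b)]
    n = len(sharing)
    mean_x = sum(sharing) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in sharing)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(sharing, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

With three pairs sharing 0, 0.5 and 1 of their alleles and trait differences of 2, 1 and 0, the fitted slope is negative, which is the pattern expected under linkage.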
sibhe implements 3 versions of the Haseman-Elston algorithm. The
default is to use all pairs for which there is definite sharing
information for either the paternal or maternal alleles (or their sum).
The knownonly parameter causes gas to use only the pairs for
which there is definite sharing information for both paternal and
maternal alleles
(this was the algorithm used by sibdreg in gas1.4).
The useall parameter means that all pairs in a dataset
with known quantitative values are
used and if no sharing information is available for a pair then their
expected IBD sharing is taken to be 1. This happens
under 3 circumstances:
 missing genotypes
 two homozygous parents
 an intercross mating
(producing siblings who share either 0 or 2 alleles but it cannot be
determined which).
If you have allele data for more than one named locus on a chromosome
then, provided the recombination fraction between adjacent loci is less
than 0.2, you will benefit from using the interval map version
(sibihe) of this routine.
References
 "The investigation of linkage between a quantitative trait and a
marker locus",
J.K.Haseman and R.C.Elston, Behaviour Genetics 2:3-19 (1972).
* Example *
The gasfile qsib.gas reads locus data from qt_mk1.loc
and qt_level.loc with pedigree data from qtrait.ped.
It calls sibhe and writes the results to qsibhe.out,
then calls sibmwu and writes these results to qsibmwu.out.
Plots of the points and best-fit lines for the sibhe regression
are written to the file qsibhe.ps.
Note the use of fprintf to add comments to the screen and output files.
Routine: SIBTABLE
This routine displays simultaneously the sibpair sharing across a
number of affection and marker loci.
call sibtable( locus locus_names...
options... );
type  parameter  description

compulsory
 locus  list the affection and marker loci to be analyzed

optional
 halfsib  show halfsibling data

For instance, the command
call sibtable( locus dis1 dis2 mk1 mk2 mk3 mk4 paternal );
gives a table of the allele sharing in paternallyderived
chromosomes for the marker loci
mk1, mk2, mk3 and mk4.
Algorithm Notes
sibtable has no analytic functions; its only purpose is to display
the observed sharing in the pedigree data.
References
None.
SibPair Interval Mapping
Sibpair interval mapping is a multipoint method in which information
from adjacent markers is used to infer missing or ambiguous allele sharing.
Calculation of Sharing Probabilities
The calculation of sharing probabilities is carried out in 4 steps:

 [1] Loci at which the sharing status is definitely known are stored.

 [2] Estimated sharing at ambiguous intercrosses is calculated (see below).

 [3] Unknown-sharing loci are interpolated (see below) from the results of
the above 2 steps.

 [4] Any inter-locus values are interpolated using the results of the above 3 steps.
Note that when the sharing at a particular locus (for a particular pair)
cannot be assigned due to missing parental data, the algorithm in
gas calculates the expected sharing purely
from the known sharing at adjacent loci rather than attempting to infer
parental genotypes. This strategy was adopted to prevent incorrect
results being caused by wrongly specified allele frequencies,
which is a particular problem with highly polymorphic markers.
Ambiguous Intercrosses
With some intercrosses it can be observed that, while the actual
sharing is unobservable,
either

both paternal and maternal sharing statuses must be the same
(e.g. all parents and children have genotype 1 2,)
or
 paternal and maternal sharing statuses must be opposite
(e.g. all parents and one child have genotype
1 2, other child is 1 1.)
The expected sharing at such a locus is calculated using Bayes'
formula by conditioning on the nearest adjacent loci at which sharing can be
definitely assigned.
To illustrate consider 3 consecutive loci
X, I and Y with recombination fractions
theta_{XI}
and theta_{IY}.
Suppose that paternal-maternal sharing is
1-0 at X,
1-1 at Y,
and that I is an intercross at which the pair must either
share 2 or 0 alleles IBD. If paternal sharing at I is 1 (so maternal
sharing is also 1), the paternal status is unchanged on both sides of I,
while the maternal status changes between X and I and is unchanged
between I and Y. If it is 0, the paternal status changes on both sides,
while the maternal status is unchanged between X and I and changes
between I and Y. The expected paternal sharing at
locus I is then

P(pat sharing=1 at I | data)

= P(data | pat sharing=1 at I)
  / [ P(data | pat sharing=1 at I) + P(data | pat sharing=0 at I) ]

= V_{XIm}V_{IYm}(1-V_{XIf})V_{IYf}
  / [ V_{XIm}V_{IYm}(1-V_{XIf})V_{IYf}
      + (1-V_{XIm})(1-V_{IYm})V_{XIf}(1-V_{IYf}) ]

writing
V_{12s}=theta_{12s}^{2}+(1-theta_{12s})^{2},
where theta_{12s} is the recombination fraction between
loci 1 and 2 along the chromatid of sex s
(so V_{12s} is the probability that the sharing status is the same at both loci).
If several intercross loci interact
then Bayes' formula is extended over all the possible cases
(n intercrosses generate 2^{n} cases)
simultaneously.
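The Bayes calculation for a single ambiguous intercross can be sketched in Python by enumerating the two possible cases at locus I and normalising. This is an illustration under stated assumptions: the two cases are taken to be equally likely a priori, V is the probability that sharing status is unchanged between two loci, and all function names are hypothetical rather than gas identifiers.

```python
def v_same(theta):
    """Probability that the sharing status is the same at two loci."""
    return theta ** 2 + (1.0 - theta) ** 2

def step(status_from, status_to, theta):
    """Probability of moving between two sharing statuses (0 or 1)."""
    v = v_same(theta)
    return v if status_from == status_to else 1.0 - v

def expected_paternal_sharing(x, y, theta_m, theta_f, same=True):
    """Expected paternal sharing at an ambiguous intercross locus I.

    x, y:     (paternal, maternal) sharing at the flanking loci X and Y.
    theta_m:  (theta_XI, theta_IY) along the male chromatid.
    theta_f:  (theta_XI, theta_IY) along the female chromatid.
    same:     True if paternal and maternal sharing at I must be equal,
              False if they must be opposite.
    """
    likelihood = {}
    for pat in (1, 0):
        mat = pat if same else 1 - pat
        likelihood[pat] = (step(x[0], pat, theta_m[0]) * step(pat, y[0], theta_m[1])
                           * step(x[1], mat, theta_f[0]) * step(mat, y[1], theta_f[1]))
    return likelihood[1] / (likelihood[1] + likelihood[0])

# Sharing 1-0 at X and 1-1 at Y, with different male and female thetas.
print(expected_paternal_sharing((1, 0), (1, 1), (0.1, 0.2), (0.15, 0.25)))
```

When every recombination fraction is 0.5 the flanking loci carry no information and the function returns 0.5, as expected.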
Interpolation
Once any ambiguous intercrosses have been resolved, the paternal and
maternal sharing calculations are effectively decoupled. Suppose that
a, b and c are 3 adjacent loci, and
that the sharing (S_{a},
S_{c})
is definitely known at the outer loci a and c, but not
at b in the centre, then S_{b} is calculated
using the formula
S_{b}
=
[ (1-V_{ab})(1-V_{bc})(1-V_{ac})
- S_{a}V_{bc}(1-V_{bc})(1-2V_{ab})
- S_{c}V_{ab}(1-V_{ab})(1-2V_{bc}) ]
/ [ V_{ac}(1-V_{ac}) ] ,
where V_{ij} is the sex-specific `V' value between
loci i and j
for the chromatid pair under consideration. If b is at an
end point of the region, so (for instance) a does not exist,
then
V_{ab}=V_{ac}=0.5
and the value of S_{a} is
redundant. The same formula is used to interpolate intermediate
values between the loci for stage [4] above.
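The interpolation formula can be checked numerically with the Python sketch below. It reads the formula as S_{b} = [ (1-V_{ab})(1-V_{bc})(1-V_{ac}) - S_{a}V_{bc}(1-V_{bc})(1-2V_{ab}) - S_{c}V_{ab}(1-V_{ab})(1-2V_{bc}) ] / [ V_{ac}(1-V_{ac}) ], and it derives V_{ac} from V_{ab} and V_{bc} on the assumption of no interference. The function name is illustrative only.

```python
def interpolate_sharing(s_a, s_c, v_ab, v_bc):
    """Expected sharing at locus b given sharing s_a and s_c at its neighbours.

    v_ab, v_bc: probabilities that sharing status is unchanged across a-b and b-c.
    For an end point with no locus a, pass v_ab = 0.5 (s_a is then irrelevant).
    """
    # Assumption: no interference, so status is unchanged across a-c when it is
    # unchanged in both intervals or changed in both.
    v_ac = v_ab * v_bc + (1.0 - v_ab) * (1.0 - v_bc)
    numerator = ((1.0 - v_ab) * (1.0 - v_bc) * (1.0 - v_ac)
                 - s_a * v_bc * (1.0 - v_bc) * (1.0 - 2.0 * v_ab)
                 - s_c * v_ab * (1.0 - v_ab) * (1.0 - 2.0 * v_bc))
    return numerator / (v_ac * (1.0 - v_ac))

# Both neighbours shared: the result equals V_ab * V_bc / V_ac.
print(interpolate_sharing(1, 1, 0.8, 0.7))
```

With both neighbours shared the value reduces to V_{ab}V_{bc}/V_{ac}, and with v_ab = 0.5 the result depends only on S_{c} and V_{bc}, matching the end-point rule described above.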
Sibpairs for which there is no IBD sharing information at any locus are
not used by the interval mapping routines.
N.B. In most references the symbols V and S are generally
denoted by Greek `psi' and `pi', however it wasn't possible to duplicate
this using transportable html.
Parameters
Interval
By default the interval mapping routines infer sharing only at the
actual loci listed. The interval parameter may be used to
request that the expected sharing is calculated at points between
the loci, thus adding the command option
call routine( ... interval 0.03 );
will generate extra points between the loci so that there is no region
larger than theta=0.03 without such an interpolated value.
Since recombination fractions cannot be added linearly (for
instance twice 0.2 is 0.32) the steps taken will be smaller
than the value specified after interval.
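The arithmetic behind the 0.32 figure can be sketched in Python as follows, assuming no interference so that two adjacent recombination fractions combine as theta_1 + theta_2 - 2 theta_1 theta_2. The function names are illustrative, and how gas actually places its interpolation points is not specified here; split_evenly only shows why equal steps come out smaller than a naive division would suggest.

```python
def combine(theta_1, theta_2):
    """Recombination fraction across two adjacent intervals (no interference)."""
    return theta_1 + theta_2 - 2.0 * theta_1 * theta_2

def split_evenly(theta_total, pieces):
    """Per-step recombination fraction when an interval is cut into equal steps.

    Inverts (1 - 2*theta_total) = (1 - 2*step) ** pieces.
    """
    return 0.5 * (1.0 - (1.0 - 2.0 * theta_total) ** (1.0 / pieces))

print(combine(0.2, 0.2))        # approximately 0.32
print(split_evenly(0.32, 2))    # approximately 0.2
```

Cutting an interval of 0.09 into three equal steps under this model gives steps slightly larger than 0.03, so a fourth step would be needed to keep every step at or below a requested 0.03.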
Showraw
The showraw parameter will display the results on a pair-by-pair basis
after stage [3] of the sharing calculation described above.
References

"Robust Multipoint Linkage Analysis:
An Extension of the Haseman-Elston Method",
J.M.Olson, 177-193 (1995).
Routine: SIBIDES
This combines interval mapping with the t_{2}
analysis in the
sibdes
routine.
call sibides( locus locus_names...
theta recombination_fractions
options... );
type  parameter  description

compulsory
 locus  list the affection and marker loci to be analyzed

theta  list of recombination fractions between marker loci

optional
 alltypes  evaluate for nonaffecteds also

interval  use interval map with specified recombination

mapfunc  select distance mapping function

sexual  do separate analyses for paternal and maternal sharing

showraw  display raw sharing data

weight  options are strict and hodge

graphical
 psgraphics  displays t_{2}
and log_{10}(p-value) along chromosome

Algorithm Notes
The expected sharing is calculated as described earlier in this
chapter, and the 1-sided t_{2} test applied to the results.
Do not confuse the graph of log_{10}(p-value) with a lod-score
statistic; the use of the logarithm is purely to allow the
full range of data to be displayed on a sensible scale.
References
None.
Routine: SIBIHE
This routine combines interval mapping with the Haseman-Elston algorithm
described in
sibhe.
call sibihe( locus locus_names...
theta recombination_fractions
options... );
type  parameter  description

compulsory
 locus  list the affection and marker loci to be analyzed

theta  list of recombination fractions between marker loci

optional
 absolute  use absolute difference of values rather than square

dfweight  compensate for multipair sibships

empiricalpv  compute empirical p-values

graph  draw graphs of regression plots

interval  use interval map with specified recombination

mapfunc  select distance mapping function

sexual  do separate analyses for paternal and maternal sharing

showraw  display raw sharing data

signif  the value at which significant results are marked

graphical
 psgraphics  displays log_{10}(p-value),
also regression plots with graph option

Algorithm Notes
The expected sharing is calculated as described earlier in this
chapter, and the Haseman-Elston test is then applied as described
in routine sibhe.
Do not confuse the graph of log_{10}(p-value) with a lod-score
statistic; the use of the logarithm is purely to allow the
full range of data to be displayed on a sensible scale.
The empiricalpv option computes empirical p-values for each
dataset, and may be given a numeric parameter to control the number
of simulations used to estimate these. Hence
empiricalpv 10
will compute 10 thousand replicates; if no number is given then the default
value of 5 (giving 5000 replicates per calculation) is assumed.
References
None.
* Example *
The gasfile iplot.gas performs the sibides test on
the affection locus a1 and the sibihe test on the
quantitative locus `q1' using genotype data from 8 named
loci labelled 1,2,...,8. The recombination values are
different along the male and female chromatids.
Results are written to the
files iplot.out and iplot.ps.
Likelihood Calculations
The routines in this chapter are designed to perform `traditional' linkage
analysis in which alternate hypotheses about genotype/phenotype interactions
are tested by computing lod-scores.
All of the `lik' routines use the Vitesse likelihood engine, which was
devised and implemented by Jeff O'Connell.
Vitesse is the fastest
likelihood program currently extant (1996), capable of computing multipoint
lod-scores with highly polymorphic markers (see below for further details).
Vitesse
Jeff O'Connell's Vitesse program incorporates many new computational
techniques which enable it to perform calculations impossible for
other programs. In particular it is able to handle up to 8 loci
simultaneously in multipoint lod-scores and
isn't slowed by highly polymorphic marker alleles. A more optimized
(i.e. faster) version of the likelihood engine is under construction
and will be incorporated into gas as soon as it is fully tested.
Vitesse is undergoing continuous improvement, and while we believe that
all the results produced are correct, there are restrictions on the types
of data it can currently handle. These are:
 [1] Only a single trait locus per calculation
 [2] No sex-linked loci
 [3] No inbreeding or consanguinity loops
 [4] Only one `founding' nuclear unit per pedigree
Condition [4] means that there can only be one mating in any
family in which all four grandparents are unknown (i.e. not listed
in the pedigree).
Datasets which violate these conditions will cause the program to exit.
Vitesse will eventually be available as a standalone program
with a `Linkage-like' interface via anonymous ftp.
The data and control formats are compatible with version 5.1/5.2
of LINKAGE and version 2.3P of the FASTLINK program.
Email
jeff@sherlock.hgen.pitt.edu
for more details on this.
Parameters
Several of the routines use a common syntax for performing particular
tasks, and some of this is described below. Refer to the individual
routine descriptions to see which features are available for each of them.
Support
Some functions are able to calculate support intervals about the location
of a maximum lodscore (ie. the adjacent region where the lodscore is within
a certain amount of its highest value). Hence if a lodscore has a peak of 6.3
at location X, then the support interval of height 1.5 is the adjacent region
of the chromosome surrounding X on which the lodscore is continuously
above 4.8. The support parameter requests this calculation:
support value
If no value is supplied, a default of 1 is assumed.
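As a rough illustration of this definition, the following Python sketch finds a support interval on a lodscore curve sampled at discrete positions. The function name support_interval and the list-based inputs are assumptions made for this example and are not part of gas. The sketch does not interpolate between sample points, which gas may well do.

```python
def support_interval(positions, lods, height=1.0):
    """Return (left, right) positions bounding the support interval.

    Starting at the peak, walk outwards while the lodscore stays
    above (peak - height). Only sampled points are considered.
    """
    peak = max(range(len(lods)), key=lambda i: lods[i])
    threshold = lods[peak] - height
    left = peak
    while left > 0 and lods[left - 1] > threshold:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] > threshold:
        right += 1
    return positions[left], positions[right]


# Hypothetical lodscore curve with a peak of 6.3 at position 30.
positions = [0, 10, 20, 30, 40, 50, 60]
lods = [1.0, 4.0, 5.0, 6.3, 5.5, 4.9, 3.0]
interval = support_interval(positions, lods, height=1.5)
```

With a height of 1.5 the threshold is 4.8, so the interval runs from the first to the last sampled position around the peak whose lodscore exceeds 4.8.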
Exclusion
An exclusion map shows regions of a chromosome where linkage is unlikely
because the lodscore is significantly below zero. The exclude
parameter is used to scan for such areas:
exclude value
If no value is supplied, a default of -2 is assumed, so
that any region with a lodscore of -2 or lower is marked as being excluded.
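The scan for excluded regions can be pictured with a short Python sketch. It assumes the lodscore has been sampled at a list of positions and takes the exclusion level to be a lodscore of -2; the function name exclusion_regions is hypothetical and the code is not taken from gas.

```python
def exclusion_regions(positions, lods, level=-2.0):
    """Return a list of (start, end) positions where lods <= level.

    Consecutive sampled points at or below the level are merged
    into a single region.
    """
    regions = []
    start = None
    for i, lod in enumerate(lods):
        if lod <= level:
            if start is None:
                start = i          # a new excluded run begins here
        elif start is not None:
            regions.append((positions[start], positions[i - 1]))
            start = None
    if start is not None:          # run extends to the last sample
        regions.append((positions[start], positions[-1]))
    return regions


regions = exclusion_regions([0, 1, 2, 3, 4, 5],
                            [-3.0, -2.5, -1.0, 0.5, -2.0, -4.0])
```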
References
 "The VITESSE algorithm for rapid exact multilocus linkage analysis
via genotype set-recoding and fuzzy inheritance",
J.R.O'Connell and D.E.Weeks, Nature Genetics 11:402-408 (1995).
Routine: LIK2POINT
The lik2point routine performs a series of two-locus optimizations
to determine the most probable recombination fractions between pairs
of adjacent loci.
call lik2point( locus locus_names...
options... );
type        parameter   description

compulsory  locus       list of loci to be analyzed
optional    allorders   all possible pairs of loci are examined
            exclude     identify exclusion region
            findmax     find maximum lodscores in each interval between fixed loci
            mapfunc     mapping function to use
            signif      level to declare linkage to be significantly probable
            support     calculate support interval
graphics    psgraphics  plots of lodscores over range 0 < theta <= 1/2

Algorithm Notes
The maximization (for the findmax option) is carried out using
Brent's algorithm, taking as starting point the highest value found
during the initial scan of the range
0 < theta <= 0.5.
Further parameters are available to tune the internal performance of this
maximization algorithm:
type        parameter   description

optional    initstep    number of steps in initial scan of interval 0 < theta <= 0.5, default 5
            maxtol      maximum tolerance in optimization, default 10^{-5}
            maxiter     maximum optimization iterations to attempt, default 100

Under the vast majority of circumstances the default options will produce
good results; however, for `difficult' datasets you may try
increasing initstep and maxiter.
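The two-stage search (a coarse scan followed by a local refinement) can be sketched in Python as follows. Several things here are assumptions made for illustration: the lodscore is the textbook formula for phase-known meioses rather than a Vitesse likelihood, a golden-section search stands in for Brent's algorithm, and the argument names initstep, maxtol and maxiter are borrowed from the table above only for readability.

```python
import math


def lod_two_point(theta, recombinants, meioses):
    """Lodscore for phase-known meioses: log10 L(theta) - log10 L(0.5)."""
    nonrec = meioses - recombinants
    return (recombinants * math.log10(theta)
            + nonrec * math.log10(1.0 - theta)
            + meioses * math.log10(2.0))


def find_max(f, lo=0.0, hi=0.5, initstep=5, maxtol=1e-5, maxiter=100):
    """Coarse scan of (lo, hi], then golden-section refinement."""
    width = (hi - lo) / initstep
    grid = [lo + width * (i + 1) for i in range(initstep)]
    best = max(grid, key=f)                 # best point of the initial scan
    a = max(best - width, lo + 1e-9)        # bracket around the best point
    b = min(best + width, hi)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(maxiter):
        if b - a < maxtol:
            break
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d                           # maximum lies in [a, d]
        else:
            a = c                           # maximum lies in [c, b]
    x = (a + b) / 2.0
    return x, f(x)


# 2 recombinants in 10 meioses: the maximum is expected near theta = 0.2.
theta_hat, lod_hat = find_max(lambda t: lod_two_point(t, 2, 10))
```

A finer initial scan (larger initstep) makes it less likely that the refinement starts in the wrong bracket when the curve has more than one peak, which is the situation the `difficult' datasets remark refers to.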
References
None.
Routine: LIKMAP
The likmap routine generates a series of likelihoods giving the
probabilities that a particular locus (called `movable') lies in various
locations with respect to one or more other loci whose positions
are specified (called `fixed').
call likmap( locfix locus_names...
locmov locus_names...
theta recombination_fractions
options... );
type        parameter   description

compulsory  locfix      list of ordered fixed loci to analyze
            locmov      list of movable loci to analyze
            theta       list of recombination fractions between fixed loci
optional    doall       calculate all values in subsets
            dosets      use subsets of fixed loci of this size
            exclude     level to indicate linkage is excluded
            findmax     find maximum likelihood position in each interval
            mapfunc     mapping function to use
            margin      the minimum distance between fixed and movable loci
            showraw     display `actual' likelihoods
            signif      level to declare linkage to be significantly probable
            step        the number of steps to take between adjacent fixed loci
graphics    psgraphics  lodscore map across the interval

Algorithm Notes
It is essential to set the dosets parameter if more than 8 fixed
loci are used, otherwise the computation time and space are likely
to be prohibitive. For optimal performance dosets should be an
even number, with a value of 4 (the default) or 6 generally
producing good results. Graphical output will also be messy for
odd numbers of fixed loci, since many points will have two values
plotted.
The text output displays only the two recombination fractions to either
side of the movable locus, since for each order the others
are fixed by the input parameters.
The maximization is carried out using Brent's algorithm, taking as starting
point the highest value found whilst constructing the map (the resolution
of which may be changed using the step parameter).
References
None.
* Example *
The gasfile twop.gas uses lik2point to calculate
the most probable recombination
fractions between a series
of 4 adjacent marker loci, mka to mkd. Distances are given in
Morgans (M) using the Kosambi map.
* Example *
The gasfile map.gas demonstrates the use of likmap to
create a table showing the likelihood of
the loci try1 and try2 being at various locations along a
chromosome on which the five markers
mka-e have previously
been positioned. Distances are shown using the Haldane map.
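The Haldane and Kosambi maps mentioned in these examples convert between recombination fractions and map distances in Morgans. The Python sketch below uses the standard textbook formulas; the function names are chosen for this illustration and do not correspond to gas keywords.

```python
import math


def haldane_distance(theta):
    """Map distance in Morgans from theta (0 <= theta < 0.5), Haldane map."""
    return -0.5 * math.log(1.0 - 2.0 * theta)


def haldane_theta(distance):
    """Recombination fraction from a distance in Morgans, Haldane map."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance))


def kosambi_distance(theta):
    """Map distance in Morgans from theta (0 <= theta < 0.5), Kosambi map."""
    return 0.25 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


def kosambi_theta(distance):
    """Recombination fraction from a distance in Morgans, Kosambi map."""
    return 0.5 * math.tanh(2.0 * distance)


# The same recombination fraction maps to different distances.
d_haldane = haldane_distance(0.2)
d_kosambi = kosambi_distance(0.2)
```

For theta = 0.2 the Haldane map gives a longer distance than the Kosambi map, because Kosambi's function allows for interference between crossovers.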
Routine: LIKSINGLE
The liksingle routine performs a single likelihood calculation
for a fixed set of loci and recombination fractions.
call liksingle( locus locus_names...
theta recombination_fractions
options... );
type        parameter   description

compulsory  locus       list of ordered loci to be analyzed
            theta       list of recombination fractions between fixed loci
optional    genlod      computes Ott's generalized lodscore

Algorithm Notes
Because of internal variations in algorithms, the likelihoods calculated
by two programs for the same dataset may vary enormously.
However the ratio of two likelihoods (as computed by the same program)
should be invariant between programs, and thus the lodscores
produced by such programs should be very similar.
The generalized lodscore compares the likelihood against the value
when all the recombination fractions are set to 1/2.
For a single pair of loci it is identical to the normal lodscore.
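The point about likelihood ratios can be made concrete with a small Python sketch. It assumes each program reports a natural-log likelihood that differs from another program's only by a constant scaling factor; the numbers are invented for the illustration and are not gas output.

```python
import math


def lodscore(loglik_theta, loglik_unlinked):
    """log10 of the likelihood ratio, from two natural-log likelihoods.

    loglik_unlinked is the value with all recombination fractions at 1/2.
    """
    return (loglik_theta - loglik_unlinked) / math.log(10.0)


# Two hypothetical programs whose raw likelihoods differ by a factor e^40.
program_a = lodscore(-120.0, -125.0)
program_b = lodscore(-120.0 + 40.0, -125.0 + 40.0)
```

The raw values differ enormously, but the constant cancels in the ratio and both programs give the same lodscore.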
References
None.
* Example *
The gasfile sin.gas demonstrates the use of liksingle to
show the likelihood of mka, mkb and mkc lying on the
same chromatid separated by recombination fractions
theta=0.35 and 0.31.
Association Analysis
The routines for association analysis look for correspondences
between the occurrences of particular alleles of named loci
and the values of traits in the population.
To perform association tests it is essential that the names of alleles
be the same in different families (eg. named allele `1' must
represent the same physical marker in the whole population).
This means that global binning (preferably with fixed bin sizes)
must be used if data is read using the alsize option.
Routine: ASSTDT
The asstdt routine performs association analysis between a
marker and an affection locus using the Transmission Disequilibrium
Test.
call asstdt( locus locus_names...
options... );
type        parameter   description

compulsory  locus       list of affection and marker loci to analyze
optional    sexual      show separate analysis for paternal and maternal alleles
            signif      set significance criteria
            weight      reduce contribution of multiple sibships

Algorithm Notes
The standard algorithm follows Spielman's advice of treating all children
as independent observations, summing their transmitted and non-transmitted
alleles, and calculating the significance using the exact 1-sided binomial
distribution.
Some authors suggest that only one child should be used from each mating,
and that this child be selected according to fixed ascertainment criteria.
To employ this strategy you need to remove the other
children from the pedigree file before running asstdt.
If there are several children within a family which satisfy the
ascertainment criteria equally well (so that selecting a particular one
would be arbitrary), then the weight option
will calculate the average contribution from each
of these `equivalent' children and treat this as being
the contribution due to a single child.
Since the weight option may result in non-integer totals,
the chi^{2}
distribution (with 1 degree of freedom) is used to calculate
the significance.
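The two significance calculations can be sketched in Python. The code assumes that the transmitted and non-transmitted counts for one allele have already been totalled, that the one-sided test asks whether the allele is transmitted more often than expected, and that the chi^{2} statistic is the usual (b-c)^{2}/(b+c) form of the TDT. The function names are hypothetical.

```python
import math


def tdt_exact_one_sided(transmitted, not_transmitted):
    """Exact binomial P(X >= transmitted) with p = 1/2 (integer counts)."""
    n = transmitted + not_transmitted
    tail = sum(math.comb(n, k) for k in range(transmitted, n + 1))
    return tail / 2 ** n


def tdt_chi2(transmitted, not_transmitted):
    """TDT chi-square statistic and its 1 d.f. p-value.

    Accepts non-integer totals, such as those produced by weighting.
    """
    diff = transmitted - not_transmitted
    stat = diff * diff / (transmitted + not_transmitted)
    p_value = math.erfc(math.sqrt(stat / 2.0))   # upper tail, 1 d.f.
    return stat, p_value


p_exact = tdt_exact_one_sided(15, 5)
stat, p_chi2 = tdt_chi2(15, 5)
```

With 15 transmissions against 5 non-transmissions the exact one-sided p-value is about 0.021, and the chi^{2} statistic of 5.0 gives a p-value of about 0.025.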
References
 "Transmission Test for Linkage Disequilibrium: The Insulin Gene
Region and Insulin-Dependent Diabetes Mellitus",
R.S.Spielman, R.E.McGinnis, W.J.Ewens, Am.J.Hum.Genet. 52:506-516 (1993).
Routine: ASSCOMPARE
The asscompare routine compares the allele frequencies between
two groups of subjects denoted by y/n values at an
affection locus.
call asscompare( locus locus_names...
options... );
type        parameter   description

compulsory  locus       list of affection and marker loci to analyze
optional    sexual      show males and females separately
            signif      set significance criteria
            useall      use half-known genotypes

Algorithm Notes
For a locus with n alleles, gas constructs
a 2xn contingency
table showing how often each allele occurs in the two populations
(ie. the sets of people labelled y
and n).
A chi^{2}
test is used to assess to what extent the allele frequencies differ
between the populations.
Each of the alleles is also tested individually by grouping all of the
other alleles into a single bin and performing chi^{2}
tests on the
2x2 contingency tables produced.
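A minimal Python sketch of the two calculations follows. It assumes the allele counts for the y and n groups are already available as two rows of a table and that every allele occurs at least once, so that no expected count is zero. It returns the Pearson chi^{2} statistic and degrees of freedom only; converting these to a p-value is left out. The function names are hypothetical.

```python
def chi2_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


def collapse(table, allele):
    """Reduce a 2xn table to 2x2: the chosen allele versus all others."""
    return [[row[allele], sum(row) - row[allele]] for row in table]


# Hypothetical allele counts: row 0 = group y, row 1 = group n.
counts = [[10, 20, 30],
          [20, 20, 20]]
overall = chi2_statistic(counts)
first_allele = chi2_statistic(collapse(counts, 0))
```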
References
None.
Routine: ASSMWU
The assmwu routine performs association analysis between a
marker and a quantitative locus, using the Mann-Whitney
U-Test (equivalent to the Wilcoxon Rank-Sum test).
call assmwu( locus locus_names...
options... );
type        parameter   description

compulsory  locus       list of quantitative and marker loci to analyze
optional    allinfo     give extra information
            cutoff      cutoff for displaying p-values
            exact       sizes of dataset below which p-values are calculated exactly
            sexual      show separate results by subject sex
            signif      set significance criteria

Algorithm Notes
Two tests are performed. The first treats each allele as a separate
observation, so that a subject with genotype `1 3' appears
in both the `1' and `3' allele categories, and a homozygous subject
appears twice in the same category. Subjects with half-known genotypes
(eg. `1 x') contribute a single observation.
The alleles are ranked one at a time (ie. each
allele versus all the others) according to the quantitative
values associated with them.
The second test categorizes subjects according to whether they do or
do not have a particular allele. The ranks of the subjects (according
to the quantitative trait) who have
each allele are compared with those who do not have the allele to
indicate if the allele tends to be associated with subjects who
are biased in a particular direction away from the mean. Subjects with
half-known genotypes are not used.
Note that exact calculation of p-values requires a large amount of time
and memory
(RAM is approximately proportional to
N^{2}M^{2}/4 where N and M
are the sizes of the datasets being compared)
and the optimal values for exact will depend on
your computer. If gas halts with an out-of-memory message, reduce one or
both of the exact values.
For example, the command
call assmwu( locus weight mar1 exact 20 50 );
performs the Mann-Whitney U-test on the quantitative locus weight
against the marker mar1. P-values are calculated by a
Gaussian approximation unless there are fewer than 20 instances of
a particular allele versus a set of 50 instances of other alleles.
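The first (allele-based) test and the Gaussian approximation can be sketched in Python as follows. This is an illustration under stated assumptions, not the gas implementation: genotypes are given as pairs of allele names with `x' marking an unknown allele, ties receive average ranks, and the normal approximation omits the tie correction. The function names are hypothetical.

```python
import math


def split_by_allele(genotypes, values, allele):
    """One observation per known allele copy: the allele versus the rest."""
    with_allele, others = [], []
    for (a, b), value in zip(genotypes, values):
        for copy in (a, b):
            if copy == "x":            # unknown allele contributes nothing
                continue
            (with_allele if copy == allele else others).append(value)
    return with_allele, others


def mann_whitney_u(x, y):
    """U statistic for x and a two-sided Gaussian-approximation p-value."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0      # mean of ranks i+1 .. j
        rank_sum_x += average_rank * sum(
            1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n, m = len(x), len(y)
    u = rank_sum_x - n * (n + 1) / 2.0
    mean = n * m / 2.0
    sd = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean) / sd
    return u, math.erfc(abs(z) / math.sqrt(2.0))


genotypes = [("1", "3"), ("1", "1"), ("2", "x")]
values = [5.0, 7.0, 9.0]
group, rest = split_by_allele(genotypes, values, "1")
u, p = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

For samples as small as these an exact calculation would be preferable, which is the situation the exact parameter addresses.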
References
None.
* Example *
The gasfile assoc.gas reads pedigree data from the file
assoc.ped. The asstdt
routine is used on the loci disease and marker1,
and the assmwu test is used on response
and marker1.
The results are written to the files
tdt.out and mwu.out respectively.
Routine: ASSRELPREF
The assrelpref routine performs association analysis between a
marker and an affection locus, using the Relative Predispositional
Effect technique.
call assrelpref( locus locus_names...
options... );
type        parameter   description

compulsory  locus       list of affection and marker loci to analyze
optional    alltypes    results are shown for non-affected subjects
            signif      set significance criteria
            sexual      males and females are analyzed separately

For example, the command
call assrelpref( locus spotty mar1 );
performs the RPE analysis on the affection locus spotty
in terms of the alleles of the marker locus mar1.
Algorithm Notes
The RPE method calculates a p-value for each allele individually according
to the formula
chi^{2} = (O_{i} - E_{i})^{2} / E_{i}
where O_{i} is the observed number of occurrences of
allele i in the dataset,
and E_{i} is the number expected
according to the input allele frequencies. The total p-value for
the dataset is calculated by adding together the chi^{2} values of the
non-zero alleles and using a chi^{2} test with degrees of freedom one less than
the number of non-zero alleles.
If the total p-value is less than the significance criterion (which may
be altered with the signif parameter) then the allele with the
smallest p-value is removed from the dataset and the expected frequencies
are recalculated as though that allele did not exist. This
procedure is then repeated until the total p-value becomes
non-significant.
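The sequential procedure can be sketched in Python. This is an interpretation of the description above rather than the gas code: `non-zero alleles' is taken to mean alleles with a non-zero observed count, expected counts are obtained by rescaling the input frequencies of the remaining alleles so that they sum to one, and the chi^{2} tail probability uses the closed-form series for integer degrees of freedom. All function and variable names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of chi-square with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2.0) / r
            total += term
        return math.exp(-x / 2.0) * total
    tail = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for r in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * r + 1)
    return tail


def rpe(observed, freqs, signif=0.05):
    """Return the alleles removed, in order, with their 1 d.f. p-values."""
    alleles = [a for a in observed if observed[a] > 0]
    removed = []
    while len(alleles) > 1:
        n = sum(observed[a] for a in alleles)
        fsum = sum(freqs[a] for a in alleles)
        chi = {}
        for a in alleles:
            expected = n * freqs[a] / fsum     # rescaled expectation
            chi[a] = (observed[a] - expected) ** 2 / expected
        if chi2_sf(sum(chi.values()), len(alleles) - 1) >= signif:
            break
        worst = max(chi, key=chi.get)          # largest chi2 = smallest p
        removed.append((worst, chi2_sf(chi[worst], 1)))
        alleles.remove(worst)
    return removed


result = rpe({"1": 40, "2": 20, "3": 20, "4": 20},
             {"1": 0.25, "2": 0.25, "3": 0.25, "4": 0.25})
```

In this invented dataset allele 1 is over-represented; once it is removed the remaining alleles match their expected counts and the procedure stops.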
References
 "RPEs of Marker Alleles with Disease: HLA-DR Alleles and Graves Disease",
Payami et al., Am.J.Hum.Genet. 541-546 (1989).
Routine: ASSGENORR
The assgenorr routine performs association analysis between a
marker and a condition using the Genotype Relative Risk method,
in which the observed distribution of named alleles in a subset of
the pedigree is compared to that predicted from the input allele
frequencies (entered earlier using
the set locus command) under the assumption of
Hardy-Weinberg equilibrium.
call assgenorr( locus locus_names...
allele allele_names...
options... );
type        parameter   description

compulsory  locus       list of marker (and optionally affection) loci to analyze
            allele      list of alleles of the marker loci to analyze
optional    inpairs     compare two single genotypes
            incommon    compare genotype against others with allele in common
            allother    compare genotype against all others
            signif      set significance criteria

If no affection loci are listed, then the whole of the pedigree is compared
to the input allele frequencies, and the risks computed refer to the
probability of a random member of the population being selected to form
part of the dataset (for optimum performance the members of the
pedigree should not be related).
The parameters inpairs, incommon, and
allother may be combined in a single command.
Inpairs
The inpairs option tests listed pairs of alleles against each other.
For example, the command
call assgenorr( locus spotty mar1 allele 1 2 3 4 inpairs );
calculates the relative risk of subjects having genotype
1 2
at locus mar1 of being affected at locus spotty, compared to
subjects with genotype 3 4.
More than two pairs of alleles can be listed.
Incommon
The incommon option tests specific allele pairs against all the
haplotypes sharing a particular allele in common with them.
The command
call assgenorr( locus spotty mar2 allele a1 a2 incommon );
calculates (separately) the relative risks of subjects having genotype
a1 a1 ,
a1 a2 and a2 a2
at locus mar2 of being affected at locus spotty,
compared to subjects sharing one of these alleles.
Allother
The allother option tests specific allele pairs against all
other allele pairs simultaneously.
The command
call assgenorr( locus hairy mar3 allele alpha beta allother );
calculates (separately) the relative risks of subjects having genotype
alpha alpha, alpha beta
and beta beta
at locus mar3 of being affected at locus hairy,
compared to subjects having any other combination of alleles.
Algorithm Notes
The genotype relative risk (R_{AB}) of genotype
set A individuals compared to
genotype set B individuals is calculated according to the formula
R_{AB} = [ (O_{A}+0.5) F_{B} ] / [ (O_{B}+0.5) F_{A} ],
where O_{A} and O_{B} are
the respective observed counts of the
genotypes in
the dataset, and F_{A}
and F_{B} are the expected counts based
on the input allele frequencies under the assumption of
Hardy-Weinberg equilibrium.
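As a worked illustration of this formula, the Python sketch below computes the expected counts from Hardy-Weinberg proportions (p^{2} for a homozygote, 2pq for a heterozygote) and then the relative risk. The allele frequencies, sample size and observed counts are invented for the example, and the function names are hypothetical.

```python
def hw_expected_count(genotype, freqs, n_subjects):
    """Expected genotype count under Hardy-Weinberg equilibrium."""
    a, b = genotype
    p, q = freqs[a], freqs[b]
    proportion = p * p if a == b else 2.0 * p * q
    return n_subjects * proportion


def genotype_relative_risk(obs_a, obs_b, exp_a, exp_b):
    """R_AB = (O_A + 0.5) F_B / ((O_B + 0.5) F_A)."""
    return ((obs_a + 0.5) * exp_b) / ((obs_b + 0.5) * exp_a)


# Hypothetical input allele frequencies and a sample of 100 subjects.
freqs = {"1": 0.5, "2": 0.3, "3": 0.1, "4": 0.1}
exp_12 = hw_expected_count(("1", "2"), freqs, 100)   # 2 * 0.5 * 0.3 * 100
exp_34 = hw_expected_count(("3", "4"), freqs, 100)   # 2 * 0.1 * 0.1 * 100
risk = genotype_relative_risk(40, 1, exp_12, exp_34)
```

The 0.5 added to each observed count keeps the ratio finite when a genotype is not observed at all.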
References
 "Estimating Genotype Relative Risks",
M.Lathrop, Tissue Antigens 22, 160-166 (1983).
* Example *
The gasfile grr.gas reads pedigree data from assoc.ped
and uses the assgenorr method to compare the
haplotype 2 3
(at locus marker)
against 1 3 (inpairs results in grrpair.out),
to compare the haplotype 1 2 against all other haplotypes
having allele 1, then against all other haplotypes having
allele 2 (incommon results in grrcom.out),
and lastly to compare haplotype 3 3 against every other
possibility for the subjects affected at locus disease
(allother results in grrall.out).
Routine: ASSEMPLOG
The assemplog routine performs association analysis between a
pair of named or affection loci
and a condition using the Empirical Logistic method.
call assemplog( locus locus_names...
options... );
type        parameter   description

compulsory  locus       list of marker and affection loci to analyze
optional    showraw     display raw Z terms and variances
            signif      set significance criteria

At least three loci (one of which must be an affection status locus)
must be listed. If named loci are listed then all their alleles are
tested in a pairwise fashion.
Algorithm Notes
In cases where there are no subjects possessing a particular
genotype/phenotype combination, the variance of the statistic becomes
infinite and a p-value of 1 is returned.
References
 "Association between HLA Antigens and the Presence of Certain Diseases",
J.R.Green et al., Statistics in Medicine Vol.2, 79-85 (1983).
* Example *
The file elm.gas reads data from elm.loc and
elm.ped and performs the assemplog analysis.
The subjects are all `singletons' and gas will generate
warnings about this; press `c' to continue at each stage
(if you set maxwarnings to a large value, most of
the repeated questioning will cease).
Results are written to the files
elm.1a, elm.1b and elm.1c.
Routine: ASSHAPRR
The asshaprr routine performs association analysis between a
marker and an affection locus using the Haplotype Relative Risk
Test.
call asshaprr( locus locus_names...
options... );
type        parameter   description

compulsory  locus       list of marker and affection loci to analyze

Algorithm Notes
None.
References
None.
Haplotyping
Haplotyping is the process of determining which alleles in an
unordered genotype are descended from each of a subject's parents,
and thus (when this is done for several linked loci)
reconstructing segments of the chromatids within each subject.
Routine: HAPCHILD
The hapchild routine determines the allelic phase of the genotypes
of children within the input population. For those children for which
the phase can definitely be decided for all of their alleles, the
observed haplotypes are ordered in decreasing frequency. The syntax is:
call hapchild( locus
locus_names...
options... );
type        parameter   description

compulsory  locus       list of affection and marker loci to analyze
optional    sexual      show separate analysis for paternal and maternal chromatids

Algorithm Notes
The hapchild routine only uses the alleles for which the parental
origin can be definitely determined; there is no attempt to assign
probabilities to ambiguous cases (which are marked x and ignored
when counting haplotype frequencies).
The haplotypes are listed in order of decreasing frequency.
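The idea of using only unambiguous parental origins can be sketched in Python. The code is an illustration under simplifying assumptions (both parents fully genotyped, one genotype per locus given as a pair of allele names) and is not the gas implementation; the function names are hypothetical.

```python
from collections import Counter


def phase_locus(child, father, mother):
    """Return (paternal, maternal) alleles, or ('x', 'x') if ambiguous."""
    a, b = child
    options = set()
    if a in father and b in mother:
        options.add((a, b))
    if b in father and a in mother:
        options.add((b, a))
    return options.pop() if len(options) == 1 else ("x", "x")


def child_haplotypes(child, father, mother):
    """Phase every locus; return None if any locus is ambiguous."""
    phased = [phase_locus(c, f, m) for c, f, m in zip(child, father, mother)]
    if any(p == ("x", "x") for p in phased):
        return None
    paternal = tuple(p[0] for p in phased)
    maternal = tuple(p[1] for p in phased)
    return paternal, maternal


def haplotype_frequencies(trios):
    """Count haplotypes of fully phased children, most frequent first."""
    counts = Counter()
    for child, father, mother in trios:
        haplotypes = child_haplotypes(child, father, mother)
        if haplotypes is not None:
            counts.update(haplotypes)      # adds paternal and maternal
    return counts.most_common()


father = [("1", "5"), ("3", "6")]
trios = [
    ([("1", "2"), ("3", "4")], father, [("2", "7"), ("4", "8")]),
    ([("1", "2"), ("3", "9")], father, [("2", "7"), ("9", "8")]),
]
table = haplotype_frequencies(trios)
```

A child who is heterozygous at a locus where both parents carry the same two alleles cannot be phased there, so that child is left out of the counts in this sketch.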
References
None.
* Example *
The gasfile chap.gas loads pedigree data from chap.ped
and uses hapchild to calculate the most frequently occurring
haplotypes in the children. Results are sent to the
file chap.out.
End of Gas Manual v2.3