GAS Manual

--{- Analysis Modules v2.3 -}--

(c) Alan Young, 1993-1998.


Introduction

This is the analysis manual for the Genetic Analysis System version 2.3. Before using the modules described in this manual, you should be familiar with the contents of the companion document General User Guide, which explains how to use gas and the data input/output formats required.

Contents

  1. Introduction
  2. Parameters
  3. Statistical Analysis
  4. Sib-pair Analysis
  5. Sib-pair Interval Mapping
  6. Likelihood Calculations
  7. Association Analysis
  8. Haplotyping

Parameters

This chapter discusses some of the parameters that are common to a large number of the routines. General parameters for functions are shown in italics. To use the gas analysis routines you enter them in the following format:

call anal( parameters );

where anal is the name of the routine, and parameters is a list of one or more items to be used in the analysis. The descriptions of the function parameters use the convention:

type code   meaning
compulsory  this parameter must always be present
optional    this parameter is optional and may be used to modify the behaviour of the routine or request additional information
graphics    output as postscript graphics is available

The output from routines is written to the file specified by set outfile, or to gas.out if this has not been done. To obtain graphical output you must use set psfile before giving the program; command.

Locus

Most of the routines require the names of loci to analyze. These are entered after the locus keyword.

Theta recombination Values

Some of the routines require values for the recombination fractions between loci, while certain others can use them if available. These are entered after the theta keyword, and may be given in 3 formats:

No sex-difference

The most basic case is that in which the recombination fractions between markers are the same for both male and female chromosomes, and this is entered into the relevant gas functions using the keyword theta followed by a list of the recombination fractions in the same order as any loci specified. For example, with six loci, five theta values are required and these are specified as

theta theta1 theta2 theta3 theta4 theta5

where each value theta_i must be in the range 0 < theta_i <= 1/2.

Constant sex-difference

It is possible to specify that the male and female recombination fractions differ by a fixed constant multiple. This is done by placing the letter c after the male recombination fractions, followed by the constant multiplier. Hence, for three loci with male recombinations 0.2 and 0.3, and female recombinations 0.24 and 0.36, the parameters should be

theta 0.2 0.3 c 1.2
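
The arithmetic behind the constant multiplier can be checked with a few lines of Python. This is only an illustrative sketch and not part of gas: the function name expand_constant_theta is hypothetical, and the range check on the female values assumes that the limit 0 < theta <= 1/2 applies to them as well.

```python
def expand_constant_theta(male_thetas, multiplier):
    """Female recombination fractions implied by 'theta <male values> c <multiplier>'."""
    male = list(male_thetas)
    # Round to suppress floating-point noise such as 0.36000000000000004.
    female = [round(t * multiplier, 10) for t in male]
    for t in male + female:
        # Assumption: both sexes must satisfy 0 < theta <= 1/2.
        if not 0.0 < t <= 0.5:
            raise ValueError("recombination fraction out of range: %r" % t)
    return female


# The example from the text: male 0.2 and 0.3 with multiplier 1.2.
print(expand_constant_theta([0.2, 0.3], 1.2))
```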

Arbitrary sex-difference

If there is no simple relationship between male and female recombination values, both can be specified by separating them with the letter f. Thus for four loci, with male recombinations 0.11 0.23 0.17 and female recombinations 0.15 0.19 0.22 (in the same order), enter the parameter list

theta 0.11 0.23 0.17 f 0.15 0.19 0.22

Interference

Interference occurs when the probability of cross-overs occurring between adjacent loci is not independent. The current version of gas does not support this option.

Showraw

Many of the routines are able to produce extra output showing intermediate stages in their calculations. This is requested by adding the showraw parameter to the function brackets. (You can also try increasing the overall verbosity level.)

Signif

Most of the routines will mark a significant result with the symbol `<+' if a p-value or lodscore passes a particular threshold (the default size of which depends on the actual routine). The signif parameter may be used to change this threshold:

signif threshold-value

Mapping Functions

A mapping function is used to convert from recombination fractions to physical distances along a chromatid. Some of the routines are able to display their results in terms of physical distance using the mapfunc option:

mapfunc name-of-map

The map functions implemented in gas are

Help

Most of the analysis routines accept help as a parameter, and will respond by showing the full list of parameters available with them. Thus

call sibdes( help );

will show all the parameters available with the sibdes routine.


Statistical Analysis

The gas program provides facilities to perform some basic statistical analyses of the input data. The routines available allow the dependencies between members of the same family to be studied.

References

  1. "Numerical Recipes in C" W.H.Press, B.P.Flannery, S.A.Teukolsky & W.T.Vetterling, Cambridge (2nd Ed.)

Routine: DISSECT

The dissect routine performs general statistical analysis on the input data, the format being:

call dissect( options... );

Where the options are:

type       parameter   description
optional   pedigree    analyze properties of whole dataset
           family      examine individual families
           locus       analyze loci singly and in pairs
graphical  psgraphics  display of statistical results

Pedigree

The pedigree option gives data on the total number of active individuals in the pedigree and their family structures, numbers of offspring, matings and siblings. To perform this type of analysis, give the command:

call dissect( pedigree );

Family

The family option displays data on one or more families, including their size and numbers of male/female members, crosses (i.e. matings) and generations. The format is:

call dissect( family family_name(s)... );

If no families are named, then every family in the pedigree is analysed sequentially.

Locus

The locus option analyses the data for a particular locus. The general format is:

call dissect( locus locus_name(s)... );

To run the analysis for the locus height include the following line in your gasfile program:

call dissect( locus height );

The type of analysis performed depends on the locus (see below). If several loci of the same type are listed within the brackets, gas performs a pairwise analysis to show correspondences between their distributions.

Affection Locus Algorithms

A variety of contingency tables are constructed showing the affection relations between relatives --- in particular the correspondences between the affection statuses of parents and of their children. If two or more loci are listed then contingency tables are created showing their joint pair-wise affectedness distributions and giving the p-value for the Null Hypothesis that the statuses at the loci are independent.

Binary Locus Algorithms

The numbers of positive, negative and unknown values for each factor are displayed. There is no paired-locus analysis.

Named Locus Algorithms

The input gene frequencies are compared with observed population frequencies, the latter also being broken down into parent and child categories (parents being defined as those subjects with no children in the pedigree and children being defined as those with no parents). If two or more loci are listed then a table is constructed showing how often the alleles at each locus are present in the same subject (note that a fully typed individual will generate 4 entries in this table).

Quantitative Locus Algorithms

The mean, median, extrema, standard deviation and various correlations between relatives are calculated and p-values assigned. If two or more loci are present then the correlations between their values are computed.

References

None.

* Example *

The gasfile dis.gas loads pedigree data from dis.ped and uses dissect to perform several types of analysis on it. The results are written to the file dis.out with postscript graphical output sent to the file dis.ps.

Sib-Pair Analysis

These gas modules perform sib-pair analyses on sets of loci to determine the degree of allele sharing between full siblings, and hence to indicate the chances of linkage between the loci. Both identity by descent (IBD) and identity by state (IBS) methods are implemented.

Weights

Sib-ships in which there are more than two siblings may have a disproportionate effect (see Reference 1 below) on the results of a sib-pair analysis, and various weighting strategies have been developed in an attempt to compensate for this.

Weighting Categorical Traits

Some of the categorical sib-pair tests use the weight parameter to compensate for multi-pair sibships. With strict weights, the sharing information from a family of n siblings is weighted by 2/n; with hodge weights, it is weighted by a factor of 4(2n-3+(1/2)^(n-1))/(3n(n-1)). For instance, the command

call sibdes( locus dis1 mk1 weight strict );

performs an IBD sib-pair analysis for the affection locus dis1 versus the marker locus mk1. Any multiple sibships are given a strict weighting as described above.
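
To see how the two weighting schemes compare, the factors can be tabulated for a few sibship sizes. The following Python sketch is illustrative only and is not taken from gas; the function names are hypothetical, and the Hodge factor is read as 4(2n-3+(1/2)^(n-1))/(3n(n-1)), an interpretation that gives a weight of exactly 1 for a single pair (n=2).

```python
def strict_weight(n):
    """Weight 2/n applied to the sharing information from n siblings."""
    if n < 2:
        raise ValueError("a sibship needs at least two siblings")
    return 2.0 / n


def hodge_weight(n):
    """Weight 4(2n-3+(1/2)^(n-1)) / (3n(n-1)), as read from the text."""
    if n < 2:
        raise ValueError("a sibship needs at least two siblings")
    return 4.0 * (2 * n - 3 + 0.5 ** (n - 1)) / (3.0 * n * (n - 1))


# Both weights equal 1 for a single pair and fall as the sibship grows.
for n in (2, 3, 4, 5):
    print(n, round(strict_weight(n), 4), round(hodge_weight(n), 4))
```

For every sibship size above 2 the hodge weight is larger than the strict weight, so strict is the more conservative of the two choices.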

Weighting Quantitative Traits

The Haseman-Elston routines involve performing a least-squares fit through a set of points (one point being generated by each pair). The dfweight parameter compensates for multi-pair sibships by reducing the number of degrees of freedom used when the significance of the best-fit line is assessed.

References

  1. "The Information Contained in Multiple Sibling Pairs", S.E.Hodge, Genet.Epid. 1:109-122 (1984).

Routine: SIBDES

This routine performs basic IBD analysis on sib-pairs categorized according to affection status.

call sibdes( locus locus_names... options... );

The routine lists the various types of matings, the degree of allele sharing between sibs in each (and parental source), the 2-1-0 t2 and chi2 scores and associated probabilities, together with the exact 1-0 binomial probabilities.

type        parameter  description
compulsory  locus      list the affection and marker loci to be analyzed
optional    alltypes   show sharing for not-affected, concordant and discordant pairs
            halfsib    show common-parent sharing for half-siblings
            summary    only a short summary of the results is given
            weight     options are strict and hodge

Algorithm Notes

The IBD sharing is shown in three ways: a 2:1:0 distribution using sibs for which both parents are fully informative but excluding intercrosses (gas defines a fully informative parent as one for which it is possible to tell exactly what happened to their alleles during mating); a 2:1:0 distribution using the expected sharing in all pairs which are at least partly informative; and a 1:0 distribution for each parental sex. Intercrosses are treated differently from the fully informative pairs to avoid biasing the results towards 1-sharing, as would happen if the intercrosses in which pairs definitely shared 2, 1 or 0 alleles were included but those in which sharing was undecidable between 2 and 0 were excluded. Note that the chi2 probability is 2-sided, whereas the others are 1-sided.

If you have allele data for more than one named locus on a chromosome then, provided the recombination fraction between adjacent loci is less than 0.3, you will benefit from using the interval map version (sibides) of this routine.

References

  1. "The general purpose sib-pair linkage test", L.S.Penrose, Ann.Eugen. 18, 120-124 (1953).

Routine: SIBMLS

The sibmls routine calculates the maximum-likelihood 2-1-0 IBD sharing distribution of markers (i.e. named loci). In addition to the data used by sibdes it also utilizes partial information from cases in which the sharing cannot be unambiguously determined.

call sibmls( locus locus_names... options... );

type        parameter  description
compulsory  locus      list the affection and marker loci to be analyzed
optional    alltypes   show non-affected and concordant pairs

Algorithm Notes

The maximum-likelihood estimate is restricted so that 2 z_2 > z_1 and z_1 > 2 z_0 (the `possible triangle' method - see Reference 1 below), where z_i is the fraction of sibling pairs sharing i alleles IBD. The maximum is located in a two-phase search, using simulated annealing to explore the function range, then Brent's algorithm to refine convergence about the highest point found.

References

  1. "Asymptotic Properties of affected sib-pair linkage analysis", P.Holmans, Am.Jou.Hum.Gen. 52:362-374 (1993).

Routine: SIB2MLS

The sib2mls routine calculates the joint maximum-likelihood 2-1-0 IBD sharing distribution of pairs across two named loci simultaneously. In addition to the data used by sibdes it also utilizes partial information from cases in which the sharing cannot be unambiguously determined.

call sib2mls( locus locus_names... options... );

type        parameter  description
compulsory  locus      list the affection and marker loci to be analyzed
optional    alltypes   show non-affected and concordant pairs
            compare    show MLS sharing for alternative sub-models
            showraw    display raw sharing data

Algorithm Notes

The region of maximization is restricted according to the type of model being considered. The present version of gas compares mls values for the single-locus, multiplicative and general models.

The maximum is located in a two-phase search, using simulated annealing to explore the function domain, then Powell's algorithm (using Brent for the 1-dimensional sub-stages) to refine convergence about the highest point found.

References

  1. "Two-Locus Maximum LodScore Analysis of a Multifactorial Trait: Joint Consideration of IDDM2 and IDDM4 with IDDM1 in Type I Diabetes", H.J.Cordell et al., Am.J.Hum.Genet. 57:920-934 (1995).

Routine: SIBSTATE

The sibstate routine performs Identity By State analysis on sib-pair data.

call sibstate( locus locus_names... options... );

type        parameter  description
compulsory  locus      list the affection and marker loci to be analyzed
optional    alltypes   show non-affected and concordant pairs
            showraw    display raw statistical information
            weight     options are strict and hodge

For instance, the command

call sibstate( locus dis1 mk1 );

performs an IBS sib-pair analysis for the affection locus dis1 versus the marker locus mk1.

Algorithm Notes

For the sibstate analysis (and any other IBS technique) it is absolutely essential that the allele frequencies of the marker loci are correctly set, otherwise the computed probabilities will be meaningless. This means that global binning (preferably using fixed bin sizes) must be used if data is read using the alsize option.

Note that even if parental information is available on some of the pairs, it will not be used in the analyses.

Two methods are used to calculate p-values. The first uses a 2-sided chi2 test to compare the overall observed IBS sharing distribution with that predicted from the allele frequencies - note that this can produce spurious significant results when an excess of 0 sharers are present. The second method uses Lange's Z-statistic (which automatically takes into account multiple sibships) and produces a 1-sided p-value.

Note that the weight parameter only affects the chi2 results by reducing the effective contribution of multiple sibships - the Z-statistic does not require weighting.

References

  1. "The Affected Sib-pair Method using Identity by State Relations", K.Lange, Am.Jou. Hum.Genet., 148-150 (1986).
  2. "A Test Statistic for the Affected Sib-set Method", K.Lange, Ann.Hum.Genet., 50 283-290 (1986).

Routine: SIBMAP

This routine gives a graphical display of how sharing between siblings varies along the length of a chromosome, with options to estimate recombination fraction and named-locus order. The syntax is:

call sibmap( locus locus_names... options... );

type        parameter    description
compulsory  locus        list the affection and marker loci to be analyzed
optional    bestorder    attempt to order loci using sharing data
            halfsib      show half-siblings with paternal/maternal options
            mapfunc      select map-function for distance estimates
            maternal     show map for maternally-derived chromosomes
            maxprob      show pairs having crossover probability above this threshold
            minchanges   show pairs in which there are at least n changes of sharing
            mindefinite  show pairs in which at least n loci can be categorised definitely
            paternal     show map for paternally-derived chromosomes
            sortleft     sort pairs by first recombination position from left
            sortright    sort pairs by first recombination position from right
            theta        recombination values to use with maxprob

For instance, the command

call sibmap( locus dis1 dis2 mk1 mk2 mk3 mk4 maternal sortleft );

gives a map of the allele sharing in maternally-derived chromosomes for the marker loci mk1, mk2, mk3 and mk4, sorted to show pairs with the most sharing at the left end of the chromosome first.

Algorithm Notes

Finding the best order of a set of loci is a combinatorially hard problem (often described as NP-complete): an exhaustive search must score every possible order, and the number of orders grows as the factorial of the number of loci. For even modest numbers of loci (above 8) evaluating all possible orders is computationally impractical, and instead gas uses a version of the Metropolis algorithm based on simulated annealing. Once a sufficiently good order has been produced, all sets of 3 adjacent loci are permuted to indicate regions where the order is least certain.
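
The growth in the number of candidate orders is easy to tabulate. The short Python sketch below is purely illustrative (the function name count_orders is hypothetical) and simply counts the arrangements of n loci as n factorial, without treating an order and its mirror image as equivalent.

```python
import math


def count_orders(n_loci):
    """Number of ways to arrange n_loci loci in a line (n factorial)."""
    return math.factorial(n_loci)


# The count passes 40 thousand at 8 loci and 3.6 million at 10 loci.
for n in (4, 6, 8, 10, 12):
    print(n, count_orders(n))
```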

bestorder can take a numeric parameter n in which case the first n equivalently good orders (as produced by the triplet permutations described above) are listed in full.

References

None.

* Example *

The gasfile sib.gas reads g-format locus data from sib.loc, and g-format pedigree data from sib.ped. The sibdes routine is used to perform an IBD analysis, with results sent to sibp.out. The sibstate routine is used to perform an IBS analysis, with results sent to sibs.out. The sibmap routine is used twice, firstly to show the sharing of paternal chromosomes for the sib-pairs in which at least three of the named markers are unambiguously determined (results in sibm1.out), and then to show only those pairs in which there are at least two changes in sharing status (results in sibm2.out). The latter analysis may be used to indicate the possibility of double recombinants.

* Example *

The gasfile bo.gas reads g-format locus data from bo.loc, and g-format pedigree data from bo.ped. The sibmap routine is used to determine the most probable order of the named loci 1-10 (which should be 1,2,3,...,10) with the results being written out to the file bo.out.

The dataset contains 20 nuclear families with each locus simulated as having 6 equally frequent alleles and a recombination fraction of 0.04 between adjacent loci. The results show that there are two equally good orders in which loci 8 and 9 are interchanged (20 sib-pairs is too small a dataset to expect sufficient crossovers between each locus to produce a unique best ordering).


Routine: SIBMWU

The sibmwu routine performs a non-parametric IBD analysis on a trait which is specified in terms of a quantitative locus.

call sibmwu( locus locus_names... options... );

type        parameter  description
compulsory  locus      list the affection and marker loci to be analyzed
optional    exact      sizes of dataset below which p-values calculated exactly
            signif     significance level for linkage

For instance, the command

call sibmwu( locus humour mker );

assesses whether sibling pairs sharing more alleles at named locus `mker' are significantly more similar at quantitative locus `humour' than pairs sharing fewer alleles at `mker'.

Algorithm Notes

The sibmwu routine first ranks all sibling pairs according to the absolute difference in their value at a quantitative locus, then uses the Mann-Whitney U-test to compare the distributions of these values within subsets of the sib-pair population, categorized according to the amount of IBD sharing at a named locus. A result may indicate linkage if the average rank of pairs decreases as the number of alleles shared IBD increases, and the p-values are 1-sided towards this direction.

The exact parameter controls the threshold above which the U-statistic is calculated approximately. For more details see the entry on the assmwu routine.
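
The core of the U statistic can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not the code used by sibmwu: the function name, the convention of counting pairs in which the first group has the smaller value, and the sample values are all assumptions made for this example, and no p-value is computed.

```python
def mann_whitney_u(group_a, group_b):
    """Count (a, b) pairs with a < b; tied pairs contribute 1/2 each."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a < b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical absolute trait differences for sib-pairs, grouped by IBD sharing.
diffs_sharing_2 = [0.1, 0.4, 0.3]
diffs_sharing_0 = [0.9, 0.5, 0.2]

u = mann_whitney_u(diffs_sharing_2, diffs_sharing_0)
n_comparisons = len(diffs_sharing_2) * len(diffs_sharing_0)
print(u, "of", n_comparisons, "comparisons favour smaller differences with more sharing")
```

A value of U close to the total number of comparisons means that pairs sharing more alleles tend to have smaller trait differences, which is the direction that suggests linkage.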

References

None.

Routine: SIBHE

This routine implements the Haseman-Elston algorithm for analyzing a quantitative trait using IBD sib-pair information.

call sibhe( locus locus_names... options... );

type        parameter    description
compulsory  locus        list the affection and marker loci to be analyzed
optional    absolute     use absolute difference of values rather than square
            dfweight     compensate for multi-pair sibships
            empiricalpv  compute empirical p-values
            graph        draw graphs of regression plots
            knownonly    only pairs with unambiguous sharing are used
            sexual       do separate analyses for paternal and maternal sharing
            showallp     show p-values for +ve slope regressions
            signif       the value at which significant results are marked
            useall       pairs with no genetic sharing information are used
graphical   psgraphics   regression plots with graph option

Algorithm Notes

The basic assumption of this method is that siblings sharing marker alleles near the quantitative trait locus will be more likely to have similar quantitative values than non-sharing siblings. Thus the mean value of the difference between siblings should decrease as the fraction of alleles shared increases. The sibhe routine performs a least-squares fit using allele sharing as the independent variable, and trait difference as the dependent variable. A significantly negative slope may be taken to indicate linkage.
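
The regression step can be illustrated with an ordinary least-squares fit written in plain Python. This is a sketch of the general Haseman-Elston idea under simplifying assumptions, not the sibhe implementation: the function name and data are hypothetical, sharing is expressed as the fraction of alleles shared IBD, and no significance test or weighting is applied.

```python
def haseman_elston_fit(sharing, trait_pairs, absolute=False):
    """Least-squares line of trait difference against IBD sharing.

    sharing     -- fraction of alleles shared IBD for each sib-pair
    trait_pairs -- (value_sib1, value_sib2) for each sib-pair
    absolute    -- use |difference| instead of the squared difference
    Returns (slope, intercept).
    """
    if absolute:
        ys = [abs(a - b) for a, b in trait_pairs]
    else:
        ys = [(a - b) ** 2 for a, b in trait_pairs]
    n = len(sharing)
    mean_x = sum(sharing) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in sharing)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sharing, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: trait differences shrink as sharing increases.
slope, intercept = haseman_elston_fit([0.0, 0.5, 1.0], [(0, 2), (1, 2), (3, 3)])
print("slope", slope, "intercept", intercept)
```

A negative slope, as in this made-up dataset, is the pattern the text describes as a possible indication of linkage; whether it is significant has to be assessed separately.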

sibhe implements 3 versions of the Haseman-Elston algorithm. The default is to use all pairs for which there is definite sharing information for either the paternal or maternal alleles (or their sum). The knownonly parameter causes gas to use only the pairs for which there is definite sharing information for both paternal and maternal alleles (this was the algorithm used by sibdreg in gas1.4). The useall parameter means that all pairs in a dataset with known quantitative values are used, and if no sharing information is available for a pair then their expected IBD sharing is taken to be 1. This happens under 3 circumstances:

  1. missing genotypes
  2. two homozygous parents
  3. an intercross mating (producing siblings who share either 0 or 2 alleles but it cannot be determined which).

If you have allele data for more than one named locus on a chromosome then, provided the recombination fraction between adjacent loci is less than 0.2, you will benefit from using the interval map version (sibihe) of this routine.

References

  1. "The investigation of linkage between a quantitative trait and a marker locus", J.K.Haseman and R.C.Elston, Behaviour Genetics 2, 3-19 (1972).

* Example *

The gasfile qsib.gas reads locus data from qt_mk1.loc and qt_level.loc with pedigree data from qtrait.ped. It calls sibhe and writes the results to qsibhe.out, then calls sibmwu and writes these results to qsibmwu.out. Plots of the points and best-fit lines for the sibhe regression are written to the file qsibhe.ps. Note the use of fprintf to add comments to the screen and output files.

Routine: SIBTABLE

This routine displays simultaneously the sib-pair sharing across a number of affection and marker loci.

call sibtable( locus locus_names... options... );

type        parameter  description
compulsory  locus      list the affection and marker loci to be analyzed
optional    halfsib    show half-sibling data

For instance, the command

call sibtable( locus dis1 dis2 mk1 mk2 mk3 mk4 paternal );

gives a table of the allele sharing in paternally-derived chromosomes for the marker loci mk1, mk2, mk3 and mk4.

Algorithm Notes

sibtable has no analytic functions; its only purpose is to display the observed sharing in the pedigree data.

References

None.

Sib-Pair Interval Mapping

Sib-pair interval mapping is a multi-point method in which information from adjacent markers is used to infer missing or ambiguous allele sharing.

Calculation of Sharing Probabilities

The calculation of sharing probabilities is carried out in 4 steps:

  1. Loci at which the sharing status is definitely known are stored.
  2. Estimated sharing at ambiguous intercrosses is calculated (see below).
  3. Unknown-sharing loci are interpolated (see below) from the results of the above 2 steps.
  4. Any inter-locus values are interpolated using the results of the above 3 steps.

Note that when the sharing at a particular locus (for a particular pair) cannot be assigned due to missing parental data, the algorithm in gas calculates the expected sharing purely from the known sharing at adjacent loci rather than attempting to infer parental genotypes. This strategy was adopted to prevent incorrect results being caused by wrongly specified allele frequencies, which is a particular problem with highly polymorphic markers.

Ambiguous Intercrosses

With some intercrosses the actual sharing cannot be observed directly, but it can be established that the pair must share either 2 or 0 alleles IBD. The expected sharing at such a locus is calculated using Bayes' formula by conditioning on the nearest adjacent loci at which sharing can be definitely assigned.

To illustrate, consider 3 consecutive loci X, I and Y with recombination fractions theta_XI and theta_IY. Suppose that the paternal sharing is 1 at X and 0 at Y, that the maternal sharing is 1 at both X and Y, and that I is an intercross at which the pair must share either 2 or 0 alleles IBD. Then the expected paternal sharing at locus I is

\[
P(\text{pat sharing}=1 \text{ at } I \mid \text{data})
= \frac{P(\text{data} \mid \text{pat sharing}=1 \text{ at } I)}
       {P(\text{data} \mid \text{pat sharing}=1 \text{ at } I) + P(\text{data} \mid \text{pat sharing}=0 \text{ at } I)}
\]

\[
= \frac{V_{XI}^{m}\,(1-V_{IY}^{m})\,V_{XI}^{f}\,V_{IY}^{f}}
       {V_{XI}^{m}\,(1-V_{IY}^{m})\,V_{XI}^{f}\,V_{IY}^{f} + (1-V_{XI}^{m})\,V_{IY}^{m}\,(1-V_{XI}^{f})\,(1-V_{IY}^{f})}
\]

writing $V_{12}^{s} = (\theta_{12}^{s})^2 + (1-\theta_{12}^{s})^2$, where $\theta_{12}^{s}$ is the recombination fraction between loci 1 and 2 along the chromatid of sex s. If several intercross loci interact then Bayes' formula is extended over all the possible cases (n intercrosses generate 2^n cases) simultaneously.
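
The expression above can be evaluated numerically as a check. The Python sketch below simply codes the two products as printed, with V computed from each recombination fraction; it is illustrative only, the function names are hypothetical, and it covers just this single-intercross example rather than the general 2^n-case extension.

```python
def v_value(theta):
    """V = theta^2 + (1 - theta)^2 for one interval and one sex."""
    return theta ** 2 + (1.0 - theta) ** 2


def expected_paternal_sharing(theta_xi_m, theta_iy_m, theta_xi_f, theta_iy_f):
    """Evaluate the example formula; all thetas must satisfy 0 < theta <= 1/2."""
    v_xi_m, v_iy_m = v_value(theta_xi_m), v_value(theta_iy_m)
    v_xi_f, v_iy_f = v_value(theta_xi_f), v_value(theta_iy_f)
    # Likelihood of the flanking data if the pair shares paternally at I.
    share_1 = v_xi_m * (1.0 - v_iy_m) * v_xi_f * v_iy_f
    # Likelihood of the flanking data if the pair does not share at I.
    share_0 = (1.0 - v_xi_m) * v_iy_m * (1.0 - v_xi_f) * (1.0 - v_iy_f)
    return share_1 / (share_1 + share_0)


# Unlinked flanking loci carry no information, so the result is 1/2.
print(expected_paternal_sharing(0.5, 0.5, 0.5, 0.5))
# Tightly linked flanking loci pull the estimate strongly towards sharing.
print(expected_paternal_sharing(0.1, 0.1, 0.1, 0.1))
```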

Interpolation

Once any ambiguous intercrosses have been resolved, the paternal and maternal sharing calculations are effectively decoupled. Suppose that a, b and c are 3 adjacent loci, and that the sharing (Sa, Sc) is definitely known at the outer loci a and c, but not at b in the centre, then Sb is calculated using the formula

\[
S_b = \frac{(1-V_{ab})(1-V_{bc})(1-V_{ac}) \;-\; S_a\,V_{bc}(1-V_{bc})(1-2V_{ab}) \;-\; S_c\,V_{ab}(1-V_{ab})(1-2V_{bc})}{V_{ac}(1-V_{ac})} ,
\]

where Vij is the sex-specific `V' value between loci i and j for the chromatid pair under consideration. If b is at an end point of the region, so (for instance) a does not exist, then Vab=Vac=0.5 and the value of Sa is redundant. The same formula is used to interpolate intermediate values between the loci for stage [4] above.
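
The interpolation formula can be checked against a direct conditional-probability calculation. In the Python sketch below the third term of the numerator is taken to involve the sharing at locus c, and V between a and c is derived from the two inner intervals by assuming independent crossovers (no interference); both points, and the function name, are assumptions of this illustration rather than statements about the gas source code.

```python
def interpolate_sharing(s_a, s_c, v_ab, v_bc):
    """Expected sharing at b given sharing s_a, s_c at the flanking loci.

    Requires 1/2 <= V < 1 on both intervals so the denominator is non-zero.
    """
    # Assumption: sharing is unchanged across a-c if it is unchanged on both
    # sub-intervals or changed on both (no interference).
    v_ac = v_ab * v_bc + (1.0 - v_ab) * (1.0 - v_bc)
    numerator = ((1.0 - v_ab) * (1.0 - v_bc) * (1.0 - v_ac)
                 - s_a * v_bc * (1.0 - v_bc) * (1.0 - 2.0 * v_ab)
                 - s_c * v_ab * (1.0 - v_ab) * (1.0 - 2.0 * v_bc))
    return numerator / (v_ac * (1.0 - v_ac))


# With V_ab = 0.9 and V_bc = 0.8, compare with the direct probabilities.
print(interpolate_sharing(1, 1, 0.9, 0.8), 0.72 / 0.74)
print(interpolate_sharing(1, 0, 0.9, 0.8), 0.18 / 0.26)
print(interpolate_sharing(0, 0, 0.9, 0.8), 0.02 / 0.74)
```

In each printed pair the second number is the probability of sharing at b obtained directly from the probabilities of the sharing status staying the same or changing on each sub-interval, and the two agree.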

Sib-pairs for which there is no IBD sharing information at any locus are not used by the interval mapping routines.

N.B. In most references the symbols V and S are generally denoted by Greek `psi' and `pi', however it wasn't possible to duplicate this using transportable html.

Parameters

Interval

By default the interval mapping routines infer sharing only at the actual loci listed. The interval parameter may be used to request that the expected sharing is calculated at points between the loci, thus adding the command option

call routine( ... interval 0.03 );

will generate extra points between the loci so that there is no region larger than theta=0.03 without such an interpolated value. Since recombination fractions cannot be added linearly (for instance twice 0.2 is 0.32) the steps taken will be smaller than the value specified after interval.
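
The non-additivity mentioned above follows from the fact that a recombinant across two intervals needs a crossover in exactly one of them. A minimal Python sketch, assuming no interference (the function name combine_theta is hypothetical):

```python
def combine_theta(theta_1, theta_2):
    """Recombination fraction across two adjacent intervals, no interference."""
    # Recombinant overall = recombinant in exactly one of the two intervals.
    return theta_1 * (1.0 - theta_2) + theta_2 * (1.0 - theta_1)


# Two intervals of 0.2 combine to 0.32 rather than 0.4.
print(round(combine_theta(0.2, 0.2), 10))
```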

Showraw

The showraw parameter will display the results on a pair-by-pair basis after stage [3] of the sharing calculation described above.

References

  1. "Robust Multipoint Linkage Analysis: An Extension of the Haseman-Elston Method", J.M.Olsen, 177-193 (1995).

Routine: SIBIDES

This combines interval mapping with the t2 analysis in the sibdes routine.

call sibides( locus locus_names... theta recombination_fractions options... );

type        parameter   description
compulsory  locus       list the affection and marker loci to be analyzed
            theta       list of recombination fractions between marker loci
optional    alltypes    evaluate for non-affecteds also
            interval    use interval map with specified recombination
            mapfunc     select distance mapping function
            sexual      do separate analyses for paternal and maternal sharing
            showraw     display raw sharing data
            weight      options are strict and hodge
graphical   psgraphics  displays t2 and -log10(pvalue) along chromosome

Algorithm Notes

The expected sharing is calculated as described earlier in this chapter, and the 1-sided t2 test applied to the results. Do not confuse the graph of -log10(pvalue) with a lodscore statistic - the use of the logarithm is purely to allow the full range of data to be displayed on a sensible scale.

References

None.

Routine: SIBIHE

This routine combines interval mapping with the Haseman-Elston algorithm described in sibhe.

call sibihe( locus locus_names... theta recombination_fractions options... );

type        parameter    description
compulsory  locus        list the affection and marker loci to be analyzed
            theta        list of recombination fractions between marker loci
optional    absolute     use absolute difference of values rather than square
            dfweight     compensate for multi-pair sibships
            empiricalpv  compute empirical p-values
            graph        draw graphs of regression plots
            interval     use interval map with specified recombination
            mapfunc      select distance mapping function
            sexual       do separate analyses for paternal and maternal sharing
            showraw      display raw sharing data
            signif       the value at which significant results are marked
graphical   psgraphics   displays -log10(pvalue), also regression plots with graph option

Algorithm Notes

The expected sharing is calculated as described earlier in this chapter, and the Haseman-Elston test is then applied as described in routine sibhe. Do not confuse the graph of -log10(pvalue) with a lodscore statistic - the use of the logarithm is purely to allow the full range of data to be displayed on a sensible scale.

The empiricalpv option computes empirical p-values for each dataset, and may be given a numeric parameter to control the number of simulations used to estimate these. Hence

empiricalpv 10

will compute 10 thousand replicates - if no number is given then the default value of 5 (giving 5000 replicates per calculation) is assumed.

References

None.

* Example *

The gasfile iplot.gas performs the sibides test on the affection locus a1 and the sibihe test on the quantitative locus `q1' using genotype data from 8 named loci labelled 1,2,...,8. The recombination values are different along the male and female chromatids. Results are written to the files iplot.out and iplot.ps.

Likelihood Calculations

The routines in this chapter are designed to perform `traditional' linkage analysis in which alternate hypotheses about genotype/phenotype interactions are tested by computing lodscores.

All of the `lik' routines use the Vitesse likelihood engine, which was devised and implemented by Jeff O'Connell. Vitesse is the fastest likelihood program currently extant (1996), capable of computing multipoint lodscores with highly polymorphic markers - see below for further details.

Vitesse

Jeff O'Connell's Vitesse program incorporates many new computational techniques which enable it to perform calculations impossible for other programs. In particular it is able to handle up to 8 loci simultaneously in multi-point lodscores and isn't slowed by highly polymorphic marker alleles. A more optimized (i.e. faster) version of the likelihood engine is under construction and will be incorporated into gas as soon as it is fully tested.

Vitesse is undergoing continuous improvement, and while we believe that all the results produced are correct, there are restrictions on the types of data it can currently handle. These are:

  1. Only a single trait locus per calculation
  2. No sex-linked loci
  3. No inbreeding or consanguinity loops
  4. Only one `founding' nuclear unit per pedigree

Condition [4] means that there can only be one mating in any family in which all four grandparents are unknown (i.e. not listed in the pedigree). Datasets which violate these conditions will cause the program to exit. Vitesse will eventually be available as a stand-alone program with a `Linkage-like' interface via anonymous ftp. The data and control formats are compatible with version 5.1/5.2 of LINKAGE and version 2.3P of the FASTLINK program. Email jeff@sherlock.hgen.pitt.edu for more details on this.

Parameters

Several of the routines use a common syntax for performing particular tasks, and some of this is described below. Refer to the individual routine descriptions to see which features are available for each of them.

Support

Some functions are able to calculate support intervals about the location of a maximum lodscore (ie. the adjacent region where the lodscore is within a certain amount of its highest value). Hence if a lodscore has a peak of 6.3 at location X, then the support interval of height 1.5 is the adjacent region of the chromosome surrounding X on which the lodscore is continuously above 4.8.

support value

If no value is supplied, a default of 1 is assumed.
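The definition above can be sketched in a few lines of Python. This is only an illustration on a sampled lodscore curve, not the gas implementation: the function name support_interval and the sample data are hypothetical, and the interval edges are reported at grid points without interpolation.

```python
def support_interval(positions, lods, height=1.0):
    """Return (left, right) grid positions bounding the support interval.

    The interval is the contiguous run of grid points around the maximum
    whose lodscore stays above (peak - height).
    """
    peak = max(range(len(lods)), key=lambda i: lods[i])
    threshold = lods[peak] - height
    left = peak
    while left > 0 and lods[left - 1] > threshold:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] > threshold:
        right += 1
    return positions[left], positions[right]


# Hypothetical lodscore curve with a peak of 6.3 at position 0.20
positions = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35]
lods = [1.0, 3.9, 5.0, 5.9, 6.3, 5.5, 4.7, 2.0]
interval = support_interval(positions, lods, height=1.5)
```

With a height of 1.5 the threshold is 4.8, so the interval covers the grid points from 0.10 to 0.25.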

Exclusion

An exclusion map shows regions of a chromosome where linkage is unlikely because the lodscore is significantly below zero. The exclude parameter is used to scan for such areas:

exclude value

If no value is supplied, a default of -2 is assumed, so that any region with lodscore of -2 or lower is marked as being excluded.
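As a rough illustration of the exclusion rule (again a sketch, not the gas code), the following Python fragment scans a sampled lodscore curve and reports the runs of grid points at or below the exclusion level. The function name excluded_regions and the data are hypothetical.

```python
def excluded_regions(positions, lods, level=-2.0):
    """Return a list of (start, end) grid ranges where lodscore <= level."""
    regions = []
    start = None
    end = None
    for pos, lod in zip(positions, lods):
        if lod <= level:
            if start is None:
                start = pos          # a new excluded run begins here
            end = pos                # extend the current run
        elif start is not None:
            regions.append((start, end))
            start = None
    if start is not None:            # close a run that reaches the last point
        regions.append((start, end))
    return regions


# Hypothetical lodscores sampled along a chromosome
positions = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
lods = [-5.1, -2.4, -1.0, 0.8, -2.0, -3.3]
regions = excluded_regions(positions, lods)
```

Here two regions are excluded at the default level of -2: positions 0.0 to 0.1 and 0.4 to 0.5.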

References

  1. "The VITESSE algorithm for rapid exact multilocus linkage analysis via genotype set-recoding and fuzzy inheritance", J.R.O'Connell and D.E.Weeks, Nature Genetics 11:402-408 (1995).

Routine: LIK2POINT

The lik2point routine performs a series of two-locus optimizations to determine the most probable recombination fractions between pairs of adjacent loci.

call lik2point( locus locus_names... options... );

type        parameter   description
compulsory  locus       list of loci to be analyzed
optional    allorders   all possible pairs of loci are examined
            exclude     identify exclusion region
            findmax     find maximum lodscores in each interval between fixed loci
            mapfunc     mapping function to use
            signif      level to declare linkage to be significantly probable
            support     calculate support interval
graphics    psgraphics  plots of lodscores over range 0 < theta <= 1/2

Algorithm Notes

The maximization (for the findmax option) is carried out using Brent's algorithm, taking as starting point the highest value found during the initial scan of the range 0 < theta <= 0.5. Further parameters are available to tune the internal performance of this maximization algorithm:

type        parameter   description
optional    initstep    number of steps in initial scan of interval 0 < theta <= 0.5, default 5
            maxtol      maximum tolerance in optimization, default 10^-5
            maxiter     maximum optimization iterations to attempt, default 100

Under the vast majority of circumstances the default options will produce good results, however for `difficult' datasets you may try increasing initstep and maxiter.
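The scan-then-refine strategy can be illustrated with a small Python sketch. Several things here are assumptions made for the illustration: the lodscore model is the simple phase-known two-point case (recombinants counted directly), the refinement uses a golden-section search rather than Brent's algorithm, and the argument names initstep, maxtol and maxiter merely mirror the gas parameters without claiming to reproduce their exact behaviour.

```python
import math


def lod(theta, recombinants, meioses):
    """Two-point lodscore for phase-known meioses (illustrative model)."""
    k, n = recombinants, meioses
    return (k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
            + n * math.log10(2.0))


def find_max(f, initstep=5, maxtol=1e-5, maxiter=100):
    """Coarse scan of 0 < theta <= 0.5, then golden-section refinement."""
    grid = [0.5 * (i + 1) / initstep for i in range(initstep)]
    best = max(range(initstep), key=lambda i: f(grid[i]))
    # Bracket the best grid point with its neighbours (clamped to the range).
    lo = grid[best - 1] if best > 0 else 1e-9
    hi = grid[best + 1] if best < initstep - 1 else 0.5
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(maxiter):
        if hi - lo < maxtol:
            break
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if f(a) < f(b):
            lo = a                   # maximum lies to the right of a
        else:
            hi = b                   # maximum lies to the left of b
    theta = 0.5 * (lo + hi)
    return theta, f(theta)


# Hypothetical data: 3 recombinants observed in 20 informative meioses
theta_hat, lod_max = find_max(lambda t: lod(t, recombinants=3, meioses=20))
```

For this model the maximum lies at theta = 3/20 = 0.15. A finer initial scan (larger initstep) makes it less likely that the refinement starts in the wrong interval when the curve has several local maxima.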

References

None.

Routine: LIKMAP

The likmap routine generates a series of likelihoods giving the probabilities that a particular locus (called `movable') lies in various locations with respect to one or more other loci whose positions are specified (called `fixed').

call likmap( locfix locus_names... locmov locus_names... theta recombination_fractions options... );

type        parameter   description
compulsory  locfix      list of ordered fixed loci to analyze
            locmov      list of movable loci to analyze
            theta       list of recombination fractions between fixed loci
optional    doall       calculate all values in subsets
            dosets      use subsets of fixed loci of this size
            exclude     level to indicate linkage is excluded
            findmax     find maximum likelihood position in each interval
            mapfunc     mapping function to use
            margin      the minimum distance between fixed and movable loci
            showraw     display `actual' likelihoods
            signif      level to declare linkage to be significantly probable
            step        the number of steps to take between adjacent fixed loci
graphics    psgraphics  lodscore map across the interval

Algorithm Notes

It is essential to set the dosets parameter if more than 8 fixed loci are used - otherwise the computation time and space are likely to be prohibitive. For optimal performance dosets should be an even number, with a value of 4 (the default) or 6 generally producing good results (also, graphical output will be messy for odd numbers of fixed loci since many points will have two values plotted). The text output displays only the two recombination fractions to either side of the movable locus, since for each order the others are fixed by the input parameters.

The maximization is carried out using Brent's algorithm, taking as starting point the highest value found whilst constructing the map (the resolution of which may be changed using the step parameter).

References

None.

* Example *

The gasfile twop.gas uses lik2point to calculate the most-probable recombination fractions between a series mka - mkd of 4 adjacent marker loci. Distances are given in Morgans (M) using the Kosambi map.

* Example *

The gasfile map.gas demonstrates the use of likmap to create a table showing the likelihood of the loci try1 and try2 being at various locations along a chromosome on which the five markers mka - e have previously been positioned. Distances are shown using the Haldane map.
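The two examples above report distances under the Kosambi and Haldane maps. For reference, the standard textbook forms of these two mapping functions are sketched below in Python; the function names are hypothetical, and how the gas mapfunc option selects between them is not shown here. Both formulas are valid for 0 <= theta < 1/2.

```python
import math


def haldane_distance(theta):
    """Map distance in Morgans under Haldane's function (no interference)."""
    return -0.5 * math.log(1.0 - 2.0 * theta)


def kosambi_distance(theta):
    """Map distance in Morgans under Kosambi's function."""
    return 0.25 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


theta = 0.1
d_haldane = haldane_distance(theta)
d_kosambi = kosambi_distance(theta)
```

For small recombination fractions both distances are close to theta itself; they diverge as theta approaches 1/2.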

Routine: LIKSINGLE

The liksingle routine performs a single likelihood calculation for a fixed set of loci and recombination fractions.

call liksingle( locus locus_names... theta recombination_fractions options... );

type        parameter   description
compulsory  locus       list of ordered loci to be analyzed
            theta       list of recombination fractions between fixed loci
optional    genlod      computes Ott's generalized lodscore

Algorithm Notes

Because of internal variations in algorithms, the likelihoods calculated by two programs for the same dataset may vary enormously. However the ratio of two likelihoods (as computed by the same program) should be invariant between programs, and thus the lodscores produced by such programs should be very similar.

The generalized lodscore compares the likelihood against the value when all the recombination fractions are set to 1/2. For a single pair of loci it is identical to the normal lodscore.
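The invariance of the lodscore can be seen in a short Python sketch. The likelihood values are invented for the illustration, and the difference between the two programs is assumed to be a single constant factor applied to every likelihood.

```python
import math


def lodscore(lik_theta, lik_null):
    """log10 of the likelihood at the tested thetas over that at theta = 1/2."""
    return math.log10(lik_theta / lik_null)


# Hypothetical likelihoods reported by "program A"
lik_a_theta, lik_a_null = 3.2e-41, 4.0e-44
# "Program B" is assumed to scale every likelihood by the same constant
scale = 1.0e12
lik_b_theta, lik_b_null = lik_a_theta * scale, lik_a_null * scale

lod_a = lodscore(lik_a_theta, lik_a_null)
lod_b = lodscore(lik_b_theta, lik_b_null)
```

Although the raw likelihoods differ by twelve orders of magnitude, both programs give the same lodscore, because the common factor cancels in the ratio.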

References

None.

* Example *

The gasfile sin.gas demonstrates the use of liksingle to show the likelihood of mka, mkb and mkc lying on the same chromatid separated by recombination fractions theta=0.35 and 0.31.

Association Analysis

The routines for association analysis look for correspondences between the occurrences of particular alleles of named loci and the values of traits in the population.

To perform association tests it is essential that the names of alleles be the same in different families (eg. named allele `1' must represent the same physical marker in the whole population). This means that global binning (preferably with fixed bin sizes) must be used if data is read using the alsize option.


Routine: ASSTDT

The asstdt routine performs association analysis between a marker and an affection locus using the Transmission Disequilibrium Test.

call asstdt( locus locus_names... options... );

type        parameter   description
compulsory  locus       list of affection and marker loci to analyze
optional    sexual      show separate analysis for paternal and maternal alleles
            signif      set significance criteria
            weight      reduce contribution of multiple sibships

Algorithm Notes

The standard algorithm follows Spielman's advice of treating all children as independent observations, summing their transmitted and non-transmitted alleles, and calculating the significance using the exact 1-sided binomial distribution.

Some authors suggest that only one child should be used from each mating, and that this child be selected according to fixed ascertainment criteria. To employ this strategy you need to remove the other children from the pedigree file before running asstdt. If there are several children within a family which satisfy the ascertainment criteria equally well (so that selecting a particular one would be arbitrary), then the weight option will calculate the average contribution from each of these `equivalent' children and treat this as being the contribution due to a single child. Since the weight option may result in non-integer totals, the chi^2 distribution (with 1 degree of freedom) is used to calculate the significance.
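The two significance calculations can be sketched in Python as follows. This is an illustration under stated assumptions rather than the gas code: the one-sided test is taken to be for excess transmission of the allele, the chi^2 statistic is assumed to be the usual (b - c)^2 / (b + c) form of the TDT, and the counts are invented.

```python
import math


def tdt_exact_p(transmitted, not_transmitted):
    """Exact one-sided binomial p-value: P(X >= transmitted), X ~ Bin(n, 1/2)."""
    n = transmitted + not_transmitted
    tail = sum(math.comb(n, k) for k in range(transmitted, n + 1))
    return tail / 2 ** n


def tdt_chi2_p(transmitted, not_transmitted):
    """Chi^2 statistic (1 d.f.) and p-value; accepts non-integer totals."""
    b, c = transmitted, not_transmitted
    chi2 = (b - c) ** 2 / (b + c)
    # The upper tail of chi^2 with 1 d.f. equals erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical integer counts (all children treated as independent)
p_exact = tdt_exact_p(15, 5)
# Hypothetical weighted totals, which need not be integers
chi2, p_chi2 = tdt_chi2_p(12.5, 4.5)
```

The exact calculation needs integer counts, which is why a chi^2 approximation is the natural choice once weighting produces fractional totals.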

References

  1. "Transmission Test for Linkage Disequilibrium: The Insulin Gene Region and Insulin-Dependent Diabetes Mellitus", R.S.Spielman, R.E.McGinnis, W.J.Ewens, Am.Jou.Hum.Gen. 52:506-516 (1993).

Routine: ASSCOMPARE

The asscompare routine compares the allele frequencies between two groups of subjects denoted by y/n values at an affection locus.

call asscompare( locus locus_names... options... );

type        parameter   description
compulsory  locus       list of affection and marker loci to analyze
optional    sexual      show males and females separately
            signif      set significance criteria
            useall      use half-known genotypes

Algorithm Notes

For a locus with n alleles, gas constructs a 2 x n contingency table showing how often each allele occurs in the two populations (ie. the sets of people labelled y and n). A chi^2 test is used to assess to what extent the allele frequencies differ between the populations. Each of the alleles is also tested individually by grouping all of the other alleles into a single bin and performing chi^2 tests on the 2 x 2 contingency tables produced.
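A minimal Python sketch of this procedure is given below. It assumes the ordinary Pearson chi^2 statistic without continuity correction, which the text above does not specify; the function names and counts are hypothetical, and every column is assumed to have a non-zero total.

```python
import math


def chi2_statistic(table):
    """Pearson chi^2 statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


def collapse(table, allele):
    """2 x 2 table: the chosen allele versus all other alleles pooled."""
    return [[row[allele], sum(row) - row[allele]] for row in table]


# Hypothetical allele counts: rows are groups y and n, columns are 3 alleles
counts = [[30, 50, 20],
          [20, 50, 30]]
overall, dof = chi2_statistic(counts)
single, _ = chi2_statistic(collapse(counts, 0))
p_single = math.erfc(math.sqrt(single / 2.0))  # valid for 1 d.f. only
```

The overall test has n - 1 degrees of freedom, while each collapsed 2 x 2 test has one.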

References

None.

Routine: ASSMWU

The assmwu routine performs association analysis between a marker and a quantitative locus, using the Mann-Whitney U-Test (equivalent to the Wilcoxon Rank-Sum test).

call assmwu( locus locus_names... options... );

type        parameter   description
compulsory  locus       list of quantitative and marker loci to analyze
optional    allinfo     give extra information
            cutoff      cutoff for displaying p-values
            exact       sizes of dataset below which p-values calculated exactly
            sexual      show separate results by subject sex
            signif      set significance criteria

Algorithm Notes

Two tests are performed. The first treats each allele as a separate observation, so that a subject with genotype `1 3' appears in both the `1' and `3' allele categories, and a homozygous subject appears twice in the same category. Subjects with half-known genotypes (eg. `1 x') contribute a single observation. The alleles are ranked one at a time (ie. each allele versus all the others) according to the quantitative values associated with them.

The second test categorizes subjects according to whether they do or do not have a particular allele. The ranks of the subjects (according to the quantitative trait) who have each allele are compared with those who do not have the allele to indicate if the allele tends to be associated with subjects who are biased in a particular direction away from the mean. Subjects with half-known genotypes are not used.

Note that exact calculation of p-values requires a large amount of time and memory (RAM is approximately proportional to N^2 M^2 / 4 where N and M are the sizes of the datasets being compared) and the optimal values for exact will depend on your computer. If gas halts with an out-of-memory message, reduce one or both of the exact values. For example, the command

call assmwu( locus weight mar1 exact 20 50 );

performs the Mann-Whitney U-test on the quantitative locus weight against the marker mar1. P-values are calculated by a Gaussian approximation unless there are fewer than 20 instances of a particular allele versus a set of fewer than 50 instances of other alleles, in which case they are calculated exactly.
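The Gaussian approximation can be illustrated with the following Python sketch. It is not the gas implementation: the p-value is assumed to be two-sided, ties receive mid-ranks but no tie or continuity correction is applied, and the trait values are invented.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for sample x, its z-score and a two-sided Gaussian p-value."""
    pooled = sorted((value, group) for group, sample in enumerate((x, y))
                    for value in sample)
    ranks_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2.0      # average of ranks i+1 .. j
        ranks_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n, m = len(x), len(y)
    u = ranks_x - n * (n + 1) / 2.0
    mean = n * m / 2.0
    sd = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, z, p


# Hypothetical trait values for carriers of an allele versus everyone else
carriers = [71.2, 80.5, 77.0, 84.3]
others = [65.0, 70.1, 68.4, 72.9, 66.6]
u, z, p = mann_whitney_u(carriers, others)
```

Samples as small as these are exactly the case where the approximation is least reliable, which is the reason for the exact option.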

References

None.

* Example *

The gasfile assoc.gas reads pedigree data from the file assoc.ped. The asstdt routine is used on the loci disease and marker1, and the assmwu test is used on response and marker1. The results are written to the files tdt.out and mwu.out respectively.

Routine: ASSRELPREF

The assrelpref routine performs association analysis between a marker and an affection locus, using the Relative Predispositional Effect technique.

call assrelpref( locus locus_names... options... );

type        parameter   description
compulsory  locus       list of affection and marker loci to analyze
optional    alltypes    results are shown for non-affected subjects
            signif      set significance criteria
            sexual      males and females are analyzed separately

For example, the command

call assrelpref( locus spotty mar1 );

performs the RPE analysis on the affection locus spotty in terms of the alleles of the marker locus mar1.

Algorithm Notes

The RPE method calculates a p-value for each allele individually according to the formula chi^2_i = (O_i - E_i)^2 / E_i, where O_i is the observed number of occurrences of allele i in the dataset, and E_i is the number expected according to the input allele frequencies. The total p-value for the dataset is calculated by adding together the chi^2_i values of the non-zero alleles and using a chi^2 test with degrees of freedom one less than the number of non-zero alleles.

If the total p-value is less than the significance criteria (which may be altered with the signif parameter) then the allele with the smallest p-value is removed from the dataset and the expected frequencies are re-calculated as though that allele did not exist. This procedure is then repeated until the total p-value becomes non-significant.
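The iterative procedure can be sketched in Python as follows. Several details are assumptions made for the illustration and may differ from gas: `non-zero alleles' is taken to mean alleles with a non-zero input frequency, the allele with the smallest individual p-value is taken to be the one with the largest chi^2 contribution (each has 1 degree of freedom), and expected counts are re-calculated by renormalizing the frequencies of the remaining alleles. The helper chi2_sf uses the closed-form series for integer degrees of freedom.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability of chi^2 with integer dof >= 1."""
    if x <= 0:
        return 1.0
    if dof % 2 == 0:
        term = math.exp(-x / 2.0)
        total = term
        for k in range(1, dof // 2):
            term *= (x / 2.0) / k
            total += term
        return total
    total = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for k in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def rpe(observed, freqs, signif=0.05):
    """Return the alleles removed, in order, with their chi^2 contributions."""
    alleles = [a for a in observed if freqs[a] > 0]
    removed = []
    while len(alleles) >= 2:
        n = sum(observed[a] for a in alleles)
        fsum = sum(freqs[a] for a in alleles)
        contrib = {}
        for a in alleles:
            expected = n * freqs[a] / fsum   # renormalized expectation
            contrib[a] = (observed[a] - expected) ** 2 / expected
        total_p = chi2_sf(sum(contrib.values()), len(alleles) - 1)
        if total_p >= signif:
            break                            # remaining alleles fit
        worst = max(alleles, key=lambda a: contrib[a])
        removed.append((worst, contrib[worst]))
        alleles.remove(worst)
    return removed


# Hypothetical counts among affected subjects and input allele frequencies
observed = {"1": 60, "2": 20, "3": 20}
freqs = {"1": 0.30, "2": 0.35, "3": 0.35}
removed = rpe(observed, freqs)
```

In this example allele 1 is removed in the first round; once it is gone, alleles 2 and 3 match their renormalized expectations and the procedure stops.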

References

  1. "RPEs of Marker Alleles with Disease: HLA-DR Alleles and Graves Disease", Payami et al., Am.J.Hum.Genet 541-546 (1989).

Routine: ASSGENORR

The assgenorr routine performs association analysis between a marker and a condition using the Genotype Relative Risk method, in which the observed distribution of named alleles in a subset of the pedigree is compared to that predicted from the input allele frequencies (entered earlier using the set locus command) under the assumption of Hardy-Weinberg equilibrium.

call assgenorr( locus locus_names... allele allele_names... options... );

type        parameter   description
compulsory  locus       list of marker (and optionally affection) loci to analyze
            allele      list of alleles of the marker loci to analyze
optional    inpairs     compare two single genotypes
            incommon    compare genotype against others with allele in common
            allother    compare genotype against all others
            signif      set significance criteria

If no affection loci are listed, then the whole of the pedigree is compared to the input allele frequencies, and the risks computed refer to the probability of a random member of the population being selected to form part of the dataset (for optimum performance the members of the pedigree should not be related).

The parameters inpairs, incommon, and allother may be combined in a single command.

Inpairs

The inpairs option tests listed pairs of alleles against each other. For example, the command

call assgenorr( locus spotty mar1 allele 1 2 3 4 inpairs );

calculates the relative risk of subjects having genotype 1 2 at locus mar1 of being affected at locus spotty, compared to subjects with genotype 3 4. More than two pairs of alleles can be listed.

Incommon

The incommon option tests specific allele pairs against all the haplotypes sharing a particular allele in common with them. The command

call assgenorr( locus spotty mar2 allele a1 a2 incommon );

calculates (separately) the relative risks of subjects having genotype a1 a1, a1 a2 and a2 a2 at locus mar2 of being affected at locus spotty, compared to subjects sharing one of these alleles.

Allother

The allother option tests specific allele pairs against all other allele pairs simultaneously. The command

call assgenorr( locus hairy mar3 allele alpha beta allother );

calculates (separately) the relative risks of subjects having genotype alpha alpha, alpha beta and beta beta at locus mar3 of being affected at locus hairy, compared to subjects having any other combination of alleles.

Algorithm Notes

The genotype relative risk (R_AB) of genotype set A individuals compared to genotype set B individuals is calculated according to the formula R_AB = ((O_A + 0.5) F_B) / ((O_B + 0.5) F_A), where O_A and O_B are the respective observed counts of the genotypes in the dataset, and F_A and F_B are the expected counts based on the input allele frequencies under the assumption of Hardy-Weinberg equilibrium.
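The formula can be worked through with a small Python sketch. The function names, allele frequencies and observed counts are hypothetical; each genotype set here contains a single genotype, whereas gas may pool several.

```python
def hw_expected(genotype, freqs, n):
    """Expected count of an unordered genotype under Hardy-Weinberg."""
    a, b = genotype
    p = freqs[a] * freqs[b]
    return n * (p if a == b else 2.0 * p)


def genotype_relative_risk(obs_a, obs_b, exp_a, exp_b):
    """R_AB = ((O_A + 0.5) * F_B) / ((O_B + 0.5) * F_A)."""
    return ((obs_a + 0.5) * exp_b) / ((obs_b + 0.5) * exp_a)


# Hypothetical allele frequencies and a dataset of 80 subjects
freqs = {"1": 0.5, "2": 0.25, "3": 0.25}
n = 80
f_a = hw_expected(("1", "2"), freqs, n)   # 2 * 0.5 * 0.25 * 80 = 20
f_b = hw_expected(("3", "3"), freqs, n)   # 0.25 * 0.25 * 80 = 5
risk = genotype_relative_risk(obs_a=29, obs_b=2, exp_a=f_a, exp_b=f_b)
```

The 0.5 added to each observed count keeps the ratio finite when one of the genotypes is not observed at all.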

References

  1. "Estimating Genotype Relative Risks", M.Lathrop, Tissue Antigens 22, 160-166 (1983).

* Example *

The gasfile grr.gas reads pedigree data from assoc.ped and uses the assgenorr method to compare the haplotype 2 3 (at locus marker) against 1 3 (inpairs results in grrpair.out), to compare the haplotype 1 2 against all other haplotypes having allele 1 and then against all other haplotypes having allele 2 (incommon results in grrcom.out), and lastly to compare haplotype 3 3 against every other possibility for the subjects affected at locus disease (allother results in grrall.out).

Routine: ASSEMPLOG

The assemplog routine performs association analysis between a pair of named or affection loci and a condition using the Empirical Logistic method.

call assemplog( locus locus_names... options... );

type        parameter   description
compulsory  locus       list of marker and affection loci to analyze
optional    showraw     display raw Z terms and variances
            signif      set significance criteria

At least three loci (one of which must be an affection status) must be listed. If named loci are listed then all their alleles are tested in a pairwise fashion.

Algorithm Notes

In cases where there are no subjects possessing a particular genotype/phenotype combination, the variance of the statistic becomes infinite and a p-value of 1 is returned.

References

  1. "Association between HLA Antigens and the Presence of Certain Diseases", J.R.Green et al., Statistics in Medicine Vol.2, 79-85 (1983).

* Example *

The file elm.gas reads data from elm.loc and elm.ped and performs the assemplog analysis. The subjects are all `singletons' and gas will generate warnings about this: press `c' to continue at each stage (if you set maxwarnings to a large value, most of the repeated questioning will cease). Results are written to the files elm.1a, elm.1b and elm.1c.

Routine: ASSHAPRR

The asshaprr routine performs association analysis between a marker and an affection locus using the Haplotype Relative Risk Test.

call asshaprr( locus locus_names... options... );

type        parameter   description
compulsory  locus       list of marker and affection loci to analyze

Algorithm Notes

None.

References

None.

Haplotyping

Haplotyping is the process of determining which alleles in an un-ordered genotype are descended from each of a subject's parents, and thus (when this is done for several linked loci) re-constructing segments of the chromatids within each subject.


Routine: HAPCHILD

The hapchild routine determines the allelic phase of the genotypes of children within the input population. For those children for which the phase can definitely be decided for all of their alleles, the observed haplotypes are ordered in decreasing frequency. The syntax is:

call hapchild( locus locus_names... options... );

type        parameter   description
compulsory  locus       list of affection and marker loci to analyze
optional    sexual      show separate analysis for paternal and maternal chromatids

Algorithm Notes

The hapchild routine only uses the alleles for which the parental origin can be definitely determined - there is no attempt to assign probabilities to ambiguous cases (which are marked x and ignored when counting haplotype frequencies). The haplotypes are listed in order of decreasing frequency.
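The idea of using only alleles whose parental origin is certain can be illustrated for a single locus with the Python sketch below. It is a simplified illustration, not the hapchild algorithm: both parents are assumed to be fully genotyped, and the function name parental_origin is hypothetical.

```python
def parental_origin(child, father, mother):
    """Return (paternal, maternal) alleles, or None if the phase is ambiguous.

    Genotypes are unordered pairs of allele names at a single locus.
    """
    a, b = child
    candidates = set()
    for paternal, maternal in ((a, b), (b, a)):
        if paternal in father and maternal in mother:
            candidates.add((paternal, maternal))
    if len(candidates) == 1:
        return candidates.pop()
    return None   # ambiguous, or inconsistent with the parental genotypes


known = parental_origin(("1", "2"), father=("1", "3"), mother=("2", "2"))
homozygous = parental_origin(("1", "1"), father=("1", "2"), mother=("1", "3"))
ambiguous = parental_origin(("1", "2"), father=("1", "2"), mother=("1", "2"))
```

In the first case allele 1 can only have come from the father; in the last case all three genotypes are 1 2, so the origin cannot be decided and the locus would be marked x.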

References

None.

* Example *

The gasfile chap.gas loads pedigree data from chap.ped and uses hapchild to calculate the most frequently occurring haplotypes in the children. Results are sent to the file chap.out.
End of Gas Manual v2.3