The NEXUS format is a format for storing phylogenetic information, including trees, morphological characters and nucleotide sequences.
RadCon can input trees stored in the NEXUS format.
The NEXUS format is formally described by Maddison et al. (1997). Further information is also available from the NEXUS webpage.
The essential feature of the NEXUS format is the modularisation of the phylogenetic information.
A NEXUS file is divided into one or more blocks which house different kinds of phylogenetic information. The blocks are composed of a series of commands and the commands composed of a series of tokens.
Modularisation facilitates both the sharing of files between programs and the extension of the format as the skipping of irrelevant or unrecognised blocks, commands or tokens is a simple operation.
Another feature of the NEXUS format is case insensitivity, which means that all tokens, except names, with the same sequence of characters are equivalent irrespective of case. Thus, for example, the BEGIN token can be written BEGIN or begin.
An example NEXUS tree file
#NEXUS
[This is a comment. It is for you not RadCon]
begin taxa;
dimensions ntax = 11;
taxlabels leaf_a leaf_b leaf_c leaf_d leaf_e leaf_f leaf_g leaf_h leaf_i leaf_j leaf_k;
endblock;
BEGIN TREES;
TREE tree1 = [&R] [&W 1] ((leaf_e,(leaf_d,(leaf_c,(leaf_b,leaf_a)))),(leaf_f,(leaf_g,(leaf_h,(leaf_i,(leaf_j,leaf_k))))));
END;
BEGIN TREES;
TRANSLATE
1 leaf_a,
2 leaf_b,
3 leaf_c,
4 leaf_d,
5 leaf_e,
6 leaf_f,
7 leaf_g,
8 leaf_h,
9 leaf_i,
ten leaf_j,
eleven leaf_k;
TREE tree1 = ((leaf_e,(leaf_d,(leaf_c,(leaf_b,leaf_a)))),(leaf_f,(leaf_g,(leaf_h,(leaf_i,(leaf_j,leaf_k))))));
TREE tree1 = [&R] [&W 1] ((5,(4,(3,(2,1)))),(6,(7,(8,(9,(ten,eleven))))));
END;
This NEXUS file consists of a comment, a TAXA block (written in lower case) that predefines 11 leaves, and two TREES blocks (written in upper case) that between them contain three trees (each tree is defined by a single TREE command). The three trees are identical: they all have the name 'tree1', are rooted, have a weight of 1 and posit the same set of cladistic relationships for the 11 leaves (the tree without command comments is rooted and has a weight of 1 by default). The tree descriptions differ in appearance because the second tree in the second TREES block refers to the leaves by the arbitrary tokens defined in the TRANSLATE command, while the other two tree descriptions refer to the leaves by name.
When presented with multiple TREES blocks, phylogenetic programs typically read in only the trees stored in one of the TREES blocks. RadCon reads in all the trees, treating them as constituting either a single set of trees or multiple sets of trees.
The NEXUS standard is for the leaves to be predefined in a TAXA or DATA block and for the trees in succeeding TREES blocks to contain all and only these leaves.
In RadCon this standard can be relaxed and trees permitted to have missing or undefined leaves.
RadCon is unable to read DATA blocks. A TREES block with no TAXA block can be input to RadCon if undefined leaves are permitted.
RadCon does not require trees to have the complete complement of predefined leaves.
This allows trees with non-identical leaf sets to be input and allows the production of supertrees.
If any predefined leaves are missing from all the trees they are listed in the Log window and deleted from memory.
If trees have leaves that were not predefined RadCon either 1) treats this as an input error or 2) redefines the leaf set 'on the fly' to include them.
The adopted protocol for handling undefined leaves can be set using the Preferences option in the Edit menu.
Allowing the leaf set to be redefined 'on the fly' makes the TAXA or DATA block unnecessary (although the presence of the TAXA block may affect the ordering of leaves in memory) and facilitates the production of supertrees, as it allows TREE commands from different files to be pasted into a single TREES block.
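The two protocols for undefined leaves, together with the removal of predefined leaves that no tree uses, can be illustrated with a minimal Python sketch. This is not RadCon's code: the function name reconcile_leaves and its arguments are hypothetical, and each tree is reduced to the set of leaf names it contains.

```python
def reconcile_leaves(predefined, trees, allow_undefined):
    """Return (kept, unused) leaf names.

    predefined      -- leaf names from the TAXA block, in order (may be empty)
    trees           -- one set of leaf names per tree
    allow_undefined -- True: extend the leaf set 'on the fly';
                       False: treat an undefined leaf as an input error
    """
    leaf_set = list(predefined)
    for leaves in trees:
        for leaf in sorted(leaves):          # sorted only to make the order repeatable
            if leaf not in leaf_set:
                if not allow_undefined:
                    raise ValueError(f"undefined leaf: {leaf}")
                leaf_set.append(leaf)
    used = set().union(*trees)
    # Predefined leaves missing from every tree are reported and dropped.
    unused = [leaf for leaf in leaf_set if leaf not in used]
    kept = [leaf for leaf in leaf_set if leaf in used]
    return kept, unused
```

With an empty predefined list and allow_undefined set to True, the sketch behaves like a file that has no TAXA block at all.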
Tree descriptions describe the structure of trees.
The NEXUS format adopts the Newick format for storing tree descriptions.
The second token in the TREE command defines the tree's name.
The presence of the [&R] or [&U] rooting command comment in the TREE command indicates if the tree is rooted or unrooted, respectively.
The rooting command comment must directly precede the tree description.
If no rooting command comment is present, the tree is treated as rooted if the first token in the TREE command is TREE and as unrooted if it is UTREE.
The weight of a tree is specified by a command comment of the form [&W 1] in the TREE command.
The weighting command comment must directly precede the tree description.
If no weighting command comment is present the tree is assigned the default weight of 1.
The '*' character in the TREE command designates a tree as the default tree.
The default tree is the tree viewed in the Tree Window.
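The conventions above for a tree's name, rooting, weight and default status can be summarised in a short Python sketch. It is an illustration only, not RadCon's parser: the function name parse_tree_command is hypothetical, the '*' is assumed to sit between the TREE token and the name, and ordinary comments are assumed to have been removed already.

```python
import re
from fractions import Fraction

TREE_COMMAND = re.compile(
    r"^\s*(?P<keyword>U?TREE)\s+(?P<star>\*\s*)?(?P<name>[^\s=]+)\s*=\s*"
    r"(?P<comments>(?:\[&[^\]]*\]\s*)*)(?P<newick>.*?)\s*;\s*$",
    re.IGNORECASE | re.DOTALL,
)

def parse_tree_command(command):
    m = TREE_COMMAND.match(command)
    if m is None:
        raise ValueError("not a TREE command")
    rooted = m.group("keyword").upper() == "TREE"  # TREE = rooted, UTREE = unrooted
    weight = Fraction(1)                            # default weight of 1
    for body in re.findall(r"\[&([^\]]*)\]", m.group("comments")):
        parts = body.split()
        tag = parts[0].upper() if parts else ""
        if tag == "R":
            rooted = True                           # [&R] overrides the keyword
        elif tag == "U":
            rooted = False                          # [&U] overrides the keyword
        elif tag == "W" and len(parts) == 2:
            weight = Fraction(parts[1])             # accepts "1", "1/5" or "0.2"
    return {
        "name": m.group("name"),
        "default": m.group("star") is not None,
        "rooted": rooted,
        "weight": weight,
        "newick": m.group("newick"),
    }
```

Keeping the weight as a Fraction preserves a value such as 1/5 exactly instead of rounding it to a float.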
Multiple TREES blocks are treated as constituting either separate sets of trees or a single set of trees.
The adopted protocol can be set using the Preferences option in the Edit menu.
If multiple TREES blocks are treated as constituting a single set of trees then each time a TAXA block is encountered the leaf set is redefined to include any new leaves.
If multiple TREES blocks are taken to constitute separate sets of trees then each time a TAXA block is encountered the current set of leaves is reset.
RadCon's response to the presence of undefined leaves is independent of its treatment of multiple TREES blocks. If undefined leaves are added to the current leaf set 'on the fly' then the presence of TAXA blocks is effectively irrelevant (although it may affect the ordering of leaf names in memory). Although undefined leaves in a TRANSLATE command can be added to the current leaf set 'on the fly', the correspondence between arbitrary labels and leaves defined by each TRANSLATE command is only valid in the TREES block containing the TRANSLATE command.
The first token in a NEXUS file must be
#NEXUS
A NEXUS file consists of a series of tokens.
A token is either a word, i.e. a series of characters delimited by whitespace or punctuation, or a single punctuation character.
Examples include the BEGIN and semicolon tokens.
Tokens are arranged according to a predefined syntax (see Maddison et al., 1997) to produce a series of commands.
There are 10 special characters ';,():=[]&*' which are treated as tokens and provide punctuation in NEXUS files.
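As an illustration of how words and punctuation characters separate into tokens, here is a minimal Python sketch. The function name tokenize is hypothetical, and the sketch deliberately ignores refinements such as quoted names; it only splits on whitespace and on the ten punctuation characters listed above.

```python
PUNCTUATION = set(";,():=[]&*")  # the 10 special characters

def tokenize(text):
    """Split text into words and single-character punctuation tokens."""
    tokens = []
    word = []
    for ch in text:
        if ch.isspace() or ch in PUNCTUATION:
            if word:                       # a word ends at whitespace or punctuation
                tokens.append("".join(word))
                word = []
            if ch in PUNCTUATION:          # punctuation is a token in its own right
                tokens.append(ch)
        else:
            word.append(ch)
    if word:
        tokens.append("".join(word))
    return tokens
```

For example, the text DIMENSIONS NTAX=4; yields the five tokens DIMENSIONS, NTAX, =, 4 and ;.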
A command begins with a token specifying the name of the command and is terminated by the semicolon token.
Examples include the BEGIN, ENDBLOCK, TRANSLATE and TREE commands.
Commands are arranged according to a predefined syntax (see Maddison et al., 1997) to produce one or more blocks.
The start of a block is indicated by the BEGIN command.
The termination of a block is indicated by the ENDBLOCK command.
Examples include the TAXA, TREES, DATA and CHARACTERS blocks.
The blocks store different kinds of phylogenetic information.
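The modular structure makes it straightforward to pick out the blocks of a file and ignore the ones a program does not understand. The Python sketch below is one hedged way to do this with regular expressions. The names find_blocks and count_trees are hypothetical, and the sketch assumes ordinary comments have already been stripped and that no name is written as a bare END immediately before a semicolon.

```python
import re

BLOCK = re.compile(
    r"\bBEGIN\s+(\w+)\s*;(.*?)\bEND(?:BLOCK)?\s*;",
    re.IGNORECASE | re.DOTALL,
)

def find_blocks(text):
    """Return a list of (BLOCK_NAME, body) pairs in file order."""
    return [(name.upper(), body) for name, body in BLOCK.findall(text)]

def count_trees(text):
    """Count TREE and UTREE commands across all TREES blocks."""
    total = 0
    for name, body in find_blocks(text):
        if name != "TREES":
            continue                      # skipping other blocks is trivial
        total += len(re.findall(r"\bU?TREE\b", body, re.IGNORECASE))
    return total
```

Block names are upper-cased before comparison because tokens other than names are case insensitive.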
Comments can be added to NEXUS files by enclosing text within the punctuation characters [ and ].
Comments, with the exception of command comments, provide information for the user not the program.
Command comments are comments which provide phylogenetic information for programs.
Command comments have the & punctuation character at the start of the commented text, i.e., [& commented text].
Examples include the tree rooting and tree weighting command comments.
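The distinction between ordinary comments and command comments can be sketched in Python as follows. The function name split_comments is hypothetical, and the sketch assumes that comments are not nested.

```python
import re

COMMENT = re.compile(r"\[([^\[\]]*)\]")

def split_comments(text):
    """Return (text without ordinary comments, list of command comment bodies)."""
    command_comments = []

    def keep_or_drop(match):
        body = match.group(1)
        if body.startswith("&"):
            # Command comment: record it and leave it in place for the program.
            command_comments.append(body[1:].strip())
            return match.group(0)
        # Ordinary comment: meant for the user, so remove it.
        return ""

    cleaned = COMMENT.sub(keep_or_drop, text)
    return cleaned, command_comments
```

Applied to the text [note] TREE t = [&R] [&W 1] (a,b); the sketch removes [note] and reports the command comments R and W 1.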
The BEGIN token
BEGIN
indicates the start of a BEGIN command.
The semicolon token
;
indicates the end of a command. The semicolon token is a punctuation character.
The BEGIN command indicates the start of a block.
A BEGIN command is of the form
BEGIN NAME_OF_BLOCK;
where the second token specifies the type of the block.
Examples include the TAXA, TREES, DATA and CHARACTERS blocks.
The ENDBLOCK command
ENDBLOCK;
which can be shortened to
END;
indicates the termination of a block.
The DIMENSIONS command is a TAXA block command that specifies the number of leaves defined in the TAXA block.
An example DIMENSIONS command
DIMENSIONS NTAX=10;
This DIMENSIONS command indicates that the TAXA block will define 10 leaves.
The TAXLABELS command is a TAXA block command that specifies the names of the leaves defined in the TAXA block.
The leaf names must be valid and unique.
An example TAXLABELS command
TAXLABELS Fulmarus_glacialis Calonectris_diomedea Puffinus_puffinus Puffinus_yelkouan;
This TAXLABELS command specifies the names of four leaves.
The TREE command is a TREES block command that specifies the name, rooting, weight and structure of one tree.
An example TREE command
TREE tree1 = [&R] [&W 1/5]
(((Puffinus_puffinus,Puffinus_yelkouan),Calonectris_diomedea),Fulmarus_glacialis);
This TREE command specifies that the tree structure defined by the tree description is rooted, has a weight of 0.2 and has the name tree1.
The TRANSLATE command is an optional TREES block command. It informs a program of arbitrary labels that may be used to represent leaves in the tree descriptions. These labels are often integers and this standard was introduced to allow a reduction in the size of tree descriptions.
An example TRANSLATE command
TRANSLATE
1 Fulmarus_glacialis,
2 Calonectris_diomedea,
5 Puffinus_puffinus,
yel Puffinus_yelkouan;
The TAXA block stores information about leaves (taxa). A TAXA block must contain one DIMENSIONS command and one TAXLABELS command in that order. The number of leaves specified by the DIMENSIONS command must correspond to the number of leaf names in the TAXLABELS command.
An example TAXA block
BEGIN TAXA;
DIMENSIONS NTAX = 4;
TAXLABELS Fulmarus_glacialis Calonectris_diomedea Puffinus_puffinus
Puffinus_yelkouan;
ENDBLOCK;
This TAXA block defines four leaves.
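The consistency rules for a TAXA block (the NTAX value must match the number of leaf names, and the names must be unique) can be checked with a short Python sketch. The function name check_taxa_block is hypothetical, and the sketch assumes whitespace-separated, unquoted leaf names with comments already removed.

```python
import re

def check_taxa_block(block):
    """Return the leaf names of a TAXA block, or raise ValueError."""
    dims = re.search(r"DIMENSIONS\s+NTAX\s*=\s*(\d+)\s*;", block, re.IGNORECASE)
    labels = re.search(r"TAXLABELS\s+([^;]*);", block, re.IGNORECASE)
    if dims is None or labels is None:
        raise ValueError("TAXA block needs DIMENSIONS and TAXLABELS commands")
    if dims.start() > labels.start():
        raise ValueError("DIMENSIONS must precede TAXLABELS")
    ntax = int(dims.group(1))
    names = labels.group(1).split()
    if len(names) != ntax:
        raise ValueError(f"NTAX is {ntax} but {len(names)} leaf names were given")
    if len(set(names)) != len(names):     # names are compared exactly, case included
        raise ValueError("leaf names must be unique")
    return names
```

Passing the example TAXA block above to the sketch returns its four leaf names.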
The TREES block stores information about trees. A TREES block must contain at least one TREE command and can contain a single optional TRANSLATE command which must precede the TREE commands.
An example TREES block
BEGIN TREES;
TRANSLATE
1 Fulmarus_glacialis,
2 Calonectris_diomedea,
5 Puffinus_puffinus,
yel Puffinus_yelkouan;
TREE tree1 = [&R] [&W 1/5]
(((Puffinus_puffinus,Puffinus_yelkouan),Calonectris_diomedea),Fulmarus_glacialis);
TREE tree2 = (((5,yel),2),1);
ENDBLOCK;
This TREES block specifies two trees. Structurally the trees are identical, although the tree descriptions differ. This is because the first tree description refers to the leaves by their actual names whilst the second tree description refers to the leaves by their corresponding labels defined by the TRANSLATE command.
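That the two tree descriptions are equivalent can be checked by expanding the labels of the second one. The Python sketch below is a rough illustration under stated assumptions: the function names parse_translate and translate_tree are hypothetical, leaf names are unquoted, and a leaf label is taken to be any word that directly follows an opening parenthesis or a comma.

```python
import re

def parse_translate(command):
    """Map each arbitrary label in a TRANSLATE command to its leaf name."""
    body = command.strip().rstrip(";")
    body = re.sub(r"^\s*TRANSLATE\s+", "", body, flags=re.IGNORECASE)
    table = {}
    for pair in body.split(","):
        label, name = pair.split()
        table[label] = name
    return table

def translate_tree(newick, table):
    """Replace labels with leaf names; unknown words pass through unchanged."""
    def swap(match):
        prefix, word = match.group(1), match.group(2)
        return prefix + table.get(word, word)
    # Only words after "(" or "," are leaves, so values after ":" are untouched.
    return re.sub(r"([(,]\s*)([^\s(),:;]+)", swap, newick)
```

Translating the description of tree2 with the table from the example TRANSLATE command reproduces the description of tree1 exactly.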
This page is maintained by joe@poissonconsulting.ca
© Copyright 1999, 2000, 2001, Joseph L. Thorley and Mark Wilkinson. All rights reserved.