seqaln(aln, id=NULL, profile=NULL, exefile="muscle", outfile="aln.fa", protein=TRUE, seqgroup=FALSE, refine=FALSE, extra.args="", verbose=FALSE, web.args = list())
seqbind
, or an alignment list object as obtained from
read.fasta
. read.fasta
). The alignment
aln
will be added to the profile. Create multiple alignments of amino acid or nucleotide sequences according to the method of Edgar.
Sequence alignment attempts to arrange the sequences of protein, DNA or RNA, to highlight regions of shared similarity that may reflect functional, structural, and/or evolutionary relationships between the sequences.
Aligned sequences are represented as rows within a matrix. Gaps (-) are inserted between the aminoacids or nucleotides so that equivalent characters are positioned in the same column.
This function calls the MUSCLE program to perform a multiple sequence alignment, which must be installed on your system and in the search path for executables. If local MUSCLE can not be found, alignment can still be performed via online web services (see below) with limited features.
If you have a large number of input sequences (a few thousand), or they are
very long, the default settings may be too slow for practical
use. A good compromise between speed and accuracy is to run just the
first two iterations of the MUSCLE algorithm by setting the
extra.args
argument to -maxiters 2.
You can set MUSCLE to improve an existing alignment by setting
refine
to TRUE.
To inspect the sequence clustering used by MUSCLE to produce
alignments, include -tree2 tree.out in the extra.args
argument. You can then load the tree.out file with the
read.tree function from the ape package.
CLUSTALO can be used as an alternative to MUSCLE by
specifiying exefile='clustalo'
. This might be useful e.g. when
adding several sequences to a profile alignment.
If local MUSCLE or CLUSTALO program is unavailable,
the function will attempt to perform alignment via the EMBL-EBI Web Services
(See http://www.ebi.ac.uk). In this case, the argument web.args
cannot be empty and must contain at least user's E-Mail address.
Note that as stated by EBI, a fake email address may result
in your jobs being killed and your IP, organisation or entire domain being
black-listed (See http://www.ebi.ac.uk/Tools/webservices/help/faq).
Possible parameters to be passed via web.args
include:
An example of usage is web.args=list(email='user_id@email.provider')
.
"fasta"
with the following components:
Grant, B.J. et al. (2006) Bioinformatics 22, 2695--2696.
MUSCLE is the work of Edgar: Edgar (2004) Nuc. Acid. Res. 32, 1792--1797.
Full details of the MUSCLE algorithm, along with download and installation instructions can be obtained from: http://www.drive5.com/muscle.
A system call is made to the MUSCLE program, which must be installed on your system and in the search path for executables. See http://thegrantlab.org/bio3d/tutorials/installing-bio3d for instructions of how to install this program.
##-- Basic sequence alignemnt seqs <- get.seq(c("4q21_A", "1ftn_A"))Warning message: Removing existing file: seqs.fasta|======================================================================| 100%aln <- seqaln(seqs) ##-- add a sequence to the (profile) alignment seq <- get.seq("1tnd_A")Warning message: Removing existing file: seqs.fastaaln <- seqaln(seq, profile=aln) ##-- Read a folder/directory of PDB files #pdb.path <- "my_dir_of_pdbs" #files <- list.files(path=pdb.path , # pattern=".pdb", # full.names=TRUE) ##-- Use online files files <- get.pdb(c("4q21","1ftn"), URLonly=TRUE) ##-- Extract and store sequences raw <- NULL for(i in 1:length(files)) { pdb <- read.pdb(files[i]) raw <- seqbind(raw, pdbseq(pdb) ) } ##-- Align these sequences aln <- seqaln(raw, id=files, outfile="seqaln.fa") ##-- Read Aligned PDBs storing coordinate data pdbs <- read.fasta.pdb(aln)pdb/seq: 1 name: http://www.rcsb.org/pdb/files/4q21.pdb pdb/seq: 2 name: http://www.rcsb.org/pdb/files/1ftn.pdb## Sequence identity seqidentity(aln)http://www.rcsb.org/pdb/files/4q21.pdb http://www.rcsb.org/pdb/files/4q21.pdb 1.000 http://www.rcsb.org/pdb/files/1ftn.pdb 0.352 http://www.rcsb.org/pdb/files/1ftn.pdb http://www.rcsb.org/pdb/files/4q21.pdb 0.352 http://www.rcsb.org/pdb/files/1ftn.pdb 1.000## Note that all the above can be done with the pdbaln() function: #pdbs <- pdbaln(files) ##-- For identical sequences with masking use a custom matrix aa <- seqbind(c("X","C","X","X","A","G","K"), c("C","-","A","X","G","X","X","K")) aln <- seqaln(aln=aln, id=c("a","b"), outfile="temp.fas", protein=TRUE, extra.args= paste("-matrix", system.file("matrices/custom.mat", package="bio3d"), "-gapopen -3.0 ", "-gapextend -0.5", "-center 0.0") )
read.fasta
, read.fasta.pdb
,
get.seq
, seqbind
,
pdbaln
, plot.fasta
,
blast.pdb