complex

 

Function

Find the linguistic complexity in nucleotide sequences

Description
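
As a rough illustration of what a "linguistic complexity" value measures, the sketch below uses a commonly cited definition: for each word length j from jmin to jmax, the vocabulary usage Uj is the number of distinct words of length j found in a window divided by the largest number that could be found there, and the window's complexity is the product of the Uj values. Whether complex uses exactly this formula, how it combines windows (a plain mean is assumed here), and how it treats ambiguous bases such as 'n' are assumptions, so this sketch is not expected to reproduce the values in the example output. All function names are hypothetical.

```python
def vocabulary_usage(window: str, j: int) -> float:
    """U_j: distinct words of length j seen, divided by the most that could be seen."""
    positions = len(window) - j + 1
    if positions <= 0:
        return 0.0
    observed = {window[i:i + j] for i in range(positions)}
    # A window cannot hold more than 4**j different words, nor more than `positions`.
    return len(observed) / min(4 ** j, positions)


def window_complexity(window: str, jmin: int = 4, jmax: int = 6) -> float:
    """Product of U_j for every word length j in [jmin, jmax]."""
    result = 1.0
    for j in range(jmin, jmax + 1):
        result *= vocabulary_usage(window, j)
    return result


def sequence_complexity(seq: str, lwin: int = 100, step: int = 5,
                        jmin: int = 4, jmax: int = 6) -> float:
    """Mean window complexity (assumption: windows are combined by a plain mean)."""
    seq = seq.lower()
    if len(seq) <= lwin:
        return window_complexity(seq, jmin, jmax)
    starts = range(0, len(seq) - lwin + 1, step)
    values = [window_complexity(seq[s:s + lwin], jmin, jmax) for s in starts]
    return sum(values) / len(values)
```

Under this definition a highly repetitive window scores close to 0 and a window in which nearly every word is different scores close to 1.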

Usage

Here is a sample session with complex


% complex -omnia 
Find the linguistic complexity in nucleotide sequences
Input nucleotide sequence(s): tembl:*
Window length [100]: 
Step size [5]: 
Minimum word length [4]: 
Maximum word length [6]: 
Output file [hs989235.complex]: 
output sequence(s) [hs989235.fasta]: 


Command line arguments

   Standard (Mandatory) qualifiers (* if not always prompted):
  [-sequence]          seqall     Nucleotide sequence(s) filename and optional
                                  format, or reference (input USA)
   -lwin               integer    [100] Window length (Any integer value)
   -step               integer    [5] Displacement of the window over the
                                  sequence (Any integer value)
   -jmin               integer    [4] Minimum word length (Integer from 2 to
                                  20)
   -jmax               integer    [6] Maximum word length (Integer from 2 to
                                  50)
  [-outfile]           outfile    [*.complex] Output file name
*  -outseq             seqoutall  [<sequence>.<format>] Sequence set(s)
                                  filename and optional format (output USA)

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers:
   -omnia              toggle     [N] Calculate over a set of sequences
   -sim                integer    [0] Calculate the linguistic complexity by
                                  comparison with a number of simulations
                                  having a uniform distribution of bases (Any
                                  integer value)
   -freq               boolean    [N] Execute the simulation of a sequence
                                  based on the base frequency of the original
                                  sequence
   -print              boolean    [N] Generate a file named UjTable containing
                                  the values of Uj for each word j in the
                                  real sequence(s) and in any simulated
                                  sequences
   -ujtablefile        outfile    [complex.ujtable] Program complex temporary
                                  output file

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory

   "-ujtablefile" associated qualifiers
   -odirectory         string     Output directory

   "-outseq" associated qualifiers
   -osformat           string     Output seq format
   -osextension        string     File name extension
   -osname             string     Base file name
   -osdirectory        string     Output directory
   -osdbname           string     Database name to add
   -ossingle           boolean    Separate file for each entry
   -oufo               string     UFO features
   -offormat           string     Features format
   -ofname             string     Features file name
   -ofdirectory        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Standard (Mandatory) qualifiers

Qualifier | Description | Allowed values | Default
[-sequence] (Parameter 1) | Nucleotide sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required
-lwin | Window length | Any integer value | 100
-step | Displacement of the window over the sequence | Any integer value | 5
-jmin | Minimum word length | Integer from 2 to 20 | 4
-jmax | Maximum word length | Integer from 2 to 50 | 6
[-outfile] (Parameter 2) | Output file name | Output file | *.complex
-outseq | Sequence set(s) filename and optional format (output USA) | Writeable sequence(s) | <sequence>.<format>

Additional (Optional) qualifiers

(none)

Advanced (Unprompted) qualifiers

Qualifier | Description | Allowed values | Default
-omnia | Calculate over a set of sequences | Toggle value Yes/No | No
-sim | Calculate the linguistic complexity by comparison with a number of simulations having a uniform distribution of bases | Any integer value | 0
-freq | Execute the simulation of a sequence based on the base frequency of the original sequence | Boolean value Yes/No | No
-print | Generate a file named UjTable containing the values of Uj for each word j in the real sequence(s) and in any simulated sequences | Boolean value Yes/No | No
-ujtablefile | Program complex temporary output file | Output file | complex.ujtable
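
The qualifiers and ranges listed above can be assembled into a non-interactive command line. The sketch below is one way to do that from Python; the helper name build_complex_command is hypothetical, the check that jmin does not exceed jmax is an added sanity check rather than a documented rule, and passing -outseq only together with -omnia is an assumption based on the "*" marker and the sample session. The block only prints the command; running it requires an EMBOSS installation that provides complex and the example database.

```python
import shlex


def build_complex_command(sequence, outfile, outseq=None, lwin=100, step=5,
                          jmin=4, jmax=6, omnia=False):
    """Return an argv list for complex using only qualifiers from the help text."""
    # Ranges taken from the qualifier table: jmin 2..20, jmax 2..50.
    if not 2 <= jmin <= 20:
        raise ValueError("jmin must be an integer from 2 to 20")
    if not 2 <= jmax <= 50:
        raise ValueError("jmax must be an integer from 2 to 50")
    # Assumption: a minimum above the maximum is never meaningful.
    if jmin > jmax:
        raise ValueError("jmin must not exceed jmax")
    cmd = [
        "complex",
        "-sequence", sequence,
        "-lwin", str(lwin),
        "-step", str(step),
        "-jmin", str(jmin),
        "-jmax", str(jmax),
        "-outfile", outfile,
        "-auto",  # general qualifier: turn off prompts
    ]
    if omnia:
        cmd.append("-omnia")
        if outseq is not None:
            cmd += ["-outseq", outseq]
    return cmd


cmd = build_complex_command("tembl:*", "hs989235.complex",
                            outseq="hs989235.fasta", omnia=True)
# Show the command in a form that can be pasted into a shell.
print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) once complex is known to be on the PATH.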

Input file format

Input files for usage example

'tembl:*' is a wildcard that matches every sequence entry in the example nucleic acid database 'tembl' (45 entries in this example).

Output file format

Sequence TEMBL:HHTETRA contains repeats and is included in the test database for repeat analysis; in the example output it has the lowest complexity value (0.3114).

Output files for usage example

File: complex.ujtable


File: hs989235.complex

Length of window : 100 
jmin : 4 
jmax : 6 
step : 5 
Execution without simulation 
----------------------------------------------------------------------------
|                  |                  |                  |                  |
|     number of    |      name of     |     length of    |      value of    |
|     sequence     |     sequence     |     sequence     |     complexity   |
|                  |                  |                  |                  |
----------------------------------------------------------------------------
         1                    HS989235            495             0.7210 
         2                    AB009602            561             0.6688 
         3                      HSCAD5           3170             0.6921 
         4                         HSD            781             0.6991 
         5                      HSEGL1           3919             0.6618 
         6                       HSFAU            518             0.6739 
         7                      HSFAU1           2016             0.7105 
         8                       HSFOS           6210             0.6681 
         9                       HSEF2           3075             0.6925 
        10                        HSHT           1658             0.7314 
        11                       HSTS1          18596             0.6668 
        12                      HSNFG9          33760             0.6661 
        13                    AB000095           2399             0.6569 
        14                    AB009062            532             0.6465 
        15                     HSFERG1            512             0.5609 
        16                     HSFERG2           1132             0.7217 
        17                    AC004629         116019             0.6478 
        18                    AP000504         100000             0.6611 
        19                    AF129756         184666             0.6562 
        20                    AB000360           2582             0.6710 
        21                       HSHBB          73308             0.6544 
        22                     CEZK637          40699             0.6307 
        23                      PDRHOD           1675             0.6201 
        24                    AAHSP70B           4712             0.7110 
        25                      GMGL01           3400             0.3981 
        26                    GMLLBPS1            852             0.5986 
        27                    GMLLBPS2           1698             0.5225 
        28                       ECLAC           7477             0.7137 
        29                      ECLACA           1832             0.6916 
        30                      ECLACI           1113             0.7480 
        31                      ECLACY           1500             0.6801 
        32                      ECLACZ           3078             0.7278 
        33                      PAAMIB           1212             0.6596 
        34                      PAAMIE           1065             0.6418 
        35                      PAAMIR           2167             0.6562 
        36                      PAAMIS           1130             0.6989 
        37                        MMAM            366             0.7163 
        38                       RNOPS           1493             0.6571 
        39                    RNU68037           1218             0.6381 
        40                   HSA203YC1            389             0.4327 
        41                     HHTETRA           1272             0.3114 
        42                    XLRHODOP           1684             0.7193 
        43                     XL23808           4734             0.7180 
        44                    AF123456           1510             0.5913 
        45                    AF123457           1634             0.7254 
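
A report like the one above can be read back into records for sorting or filtering. The following is a minimal sketch that assumes every data row consists of exactly four whitespace-separated fields (number, name, length, complexity), as in this example; the names ComplexityRow and parse_complex_report are hypothetical, and the sample text is a three-row excerpt of the listing.

```python
from typing import Iterable, List, NamedTuple


class ComplexityRow(NamedTuple):
    number: int
    name: str
    length: int
    complexity: float


def parse_complex_report(lines: Iterable[str]) -> List[ComplexityRow]:
    """Collect the data rows of a .complex report, skipping header and box lines."""
    rows = []
    for line in lines:
        fields = line.split()
        # Data rows have four fields with integers in the first and third columns.
        if len(fields) != 4 or not (fields[0].isdigit() and fields[2].isdigit()):
            continue
        try:
            value = float(fields[3])
        except ValueError:
            continue
        rows.append(ComplexityRow(int(fields[0]), fields[1], int(fields[2]), value))
    return rows


SAMPLE = """\
Length of window : 100
jmin : 4
jmax : 6
step : 5
Execution without simulation
----------------------------------------------------------------------------
|     number of    |      name of     |     length of    |      value of    |
|     sequence     |     sequence     |     sequence     |     complexity   |
----------------------------------------------------------------------------
         1                    HS989235            495             0.7210
        25                      GMGL01           3400             0.3981
        41                     HHTETRA           1272             0.3114
"""

rows = parse_complex_report(SAMPLE.splitlines())
lowest = min(rows, key=lambda row: row.complexity)
print(lowest.name, lowest.complexity)
```

For a real report, pass an open file object instead of SAMPLE.splitlines().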

File: hs989235.fasta

>HS989235 H45989.1 yo13c02.s1 Soares adult brain N2b5HB55Y Homo sapiens cDNA clone IMAGE:177794 3', mRNA sequence.
ccggnaagctcancttggaccaccgactctcgantgnntcgccgcgggagccggntggan
aacctgagcgggactggnagaaggagcagagggaggcagcacccggcgtgacggnagtgt
gtggggcactcaggccttccgcagtgtcatctgccacacggaaggcacggccacgggcag
gggggtctatgatcttctgcatgcccagctggcatggccccacgtagagtggnntggcgt
ctcggtgctggtcagcgacacgttgtcctggctgggcaggtccagctcccggaggacctg
gggcttcagcttcccgtagcgctggctgcagtgacggatgctcttgcgctgccatttctg
ggtgctgtcactgtccttgctcactccaaaccagttcggcggtccccctgcggatggtct
gtgttgatggacgtttgggctttgcagcaccggccgccgagttcatggtngggtnaagag
atttgggttttttcn
>AB009602 AB009602.1 Schizosaccharomyces pombe mRNA for MET1 homolog, partial cds.
gttcgatgcctaaaataccttcttttgtccctacacagaccacagttttcctaatggctt
tacaccgactagaaattcttgtgcaagcactaattgaaagcggttggcctagagtgttac
cggtttgtatagctgagcgcgtctcttgccctgatcaaaggttcattttctctactttgg
aagacgttgtggaagaatacaacaagtacgagtctctcccccctggtttgctgattactg
gatacagttgtaatacccttcgcaacaccgcgtaactatctatatgaattattttccctt
tattatatgtagtaggttcgtctttaatcttcctttagcaagtcttttactgttttcgac
ctcaatgttcatgttcttaggttgttttggataatatgcggtcagtttaatcttcgttgt
ttcttcttaaaatatttattcatggtttaatttttggtttgtacttgttcaggggccagt
tcattatttactctgtttgtatacagcagttcttttatttttagtatgattttaatttaa
aacaattctaatggtcaaaaa
>HSCAD5 X59796.1 H.sapiens mRNA for cadherin-5
ctccactcacgctcagccctggacggacaggcagtccaacggaacagaaacatccctcag
cccacaggcacgatctgttcctcctgggaagatgcagaggctcatgatgctcctcgccac
atcgggcgcctgcctgggcctgctggcagtggcagcagtggcagcagcaggtgctaaccc
tgcccaacgggacacccacagcctgctgcccacccaccggcgccaaaagagagattggat
ttggaaccagatgcacattgatgaagagaaaaacacctcacttccccatcatgtaggcaa
gatcaagtcaagcgtgagtcgcaagaatgccaagtacctgctcaaaggagaatatgtggg
caaggtcttccgggtcgatgcagagacaggagacgtgttcgccattgagaggctggaccg
ggagaatatctcagagtaccacctcactgctgtcattgtggacaaggacactggcgaaaa
cctggagactccttccagcttcaccatcaaagttcatgacgtgaacgacaactggcctgt
gttcacgcatcggttgttcaatgcgtccgtgcctgagtcgtcggctgtggggacctcagt
catctctgtgacagcagtggatgcagacgaccccactgtgggagaccacgcctctgtcat
gtaccaaatcctgaaggggaaagagtattttgccatcgataattctggacgtattatcac
aataacgaaaagcttggaccgagagaagcaggccaggtatgagatcgtggtggaagcgcg
agatgcccagggcctccggggggactcgggcacggccaccgtgctggtcactctgcaaga
catcaatgacaacttccccttcttcacccagaccaagtacacatttgtcgtgcctgaaga
cacccgtgtgggcacctctgtgggctctctgtttgttgaggacccagatgagccccagaa
ccggatgaccaagtacagcatcttgcggggcgactaccaggacgctttcaccattgagac
aaaccccgcccacaacgagggcatcatcaagcccatgaagcctctggattatgaatacat
ccagcaatacagcttcatagtcgaggccacagaccccaccatcgacctccgatacatgag
ccctcccgcgggaaacagagcccaggtcattatcaacatcacagatgtggacgagccccc
cattttccagcagcctttctaccacttccagctgaaggaaaaccagaagaagcctctgat
tggcacagtgctggccatggaccctgatgcggctaggcatagcattggatactccatccg
caggaccagtgacaagggccagttcttccgagtcacaaaaaagggggacatttacaatga
gaaagaactggacagagaagtctacccctggtataacctgactgtggaggccaaagaact
ggattccactggaacccccacaggaaaagaatccattgtgcaagtccacattgaagtttt
ggatgagaatgacaatgccccggagtttgccaagccctaccagcccaaagtgtgtgagaa
cgctgtccatggccagctggtcctgcagatctccgcaatagacaaggacataacaccacg
aaacgtgaagttcaaattcatcttgaatactgagaacaactttaccctcacggataatca


  [Part of this file has been deleted for brevity]

gagccagttgtcaagaagagcagcagcagcagctcctgtctcctgcaggacagcagcagc
cctgctcactccacgagcacggtggcagcagcagcagcgagcgcaccaccagagggacgg
atgctcattcaggacatcccttccatccccagcagagggcacttggagagcacgtctgat
ttggttgtggactccacctactacagcagtttttaccagccatccctgtatccttactat
aacaacctgtacaactactcccagtaccaaatggcagtggccactgagtcttcctcaagt
gagacagggggtacgtttgtagggtcagccatgaaaaacagccttcgaagcctcccagca
acatacatgtcaagccagtcaggaaaacagtggcagatgaagggaatggagaaccgccat
gccatgagctcccagtaccggatgtgctcctactacccgcccacctcatacctgggccag
ggggttggcagtcccacctgcgtcacacagatactggcctcggaggacaccccctcctac
tcagagtcgaaagcgagagtgttttcgccgcccagcagccaggactcgggcctggggtgc
ctgtcgagcagcgagagcaccaagggagacctggagtgcgagccccaccaagagcccggc
gccttcgcggtgagcccggttcttgagggcgagtaggcgcggcgtcgggcggctgctgcg
cggcgttcactgttgccttgttctgttggggttgcgggggggcgttgggtttcttctttc
cggggcggggggggcacggcggggccgcggccgggccggcggggcggggcggggcgggac
ggggcggggcggagccgcgcgggggccgcagtccgggccggggccgccgtcgggtctcgg
cccgctcccgtcggggcggagcgtccgacgatcggcctccacgaaacgcggtgccgtgat
gtgtttgtagtggttcctcgtaggctccagacgttttctcctcgtatcgccaaattaacg
cgttttgcatattacagttgagtgcctcgacttagattgcaatataagcggccagcaaac
aagtctcaaaaaaaagttacgtgcgtttctgcgagtgttattttgttaagaacggctcac
agtgtcctcttcctgtgttacagaagccaacctgaaatgaaactagtctggaaaaattca
ttgttctctgtagttgcagctgtacctgaaataaaaatgttattgatgactgaaaaaaaa
aaaaaaaaaa
>AF123457 AF123457.1 Toxoplasma gondii enolase (ENO2) mRNA, complete cds.
tctaccgttactcaacttccaacaaaatggtggccatcaaggacatcactgctcgtcaga
tcctcgactcccgaggaaacccgaccgtcgaggttgacttgttgaccgatggcggctgct
tccgtgccgctgtccccagcggcgcatccactggcatctacgaggcgcttgagctccgtg
acaaggaccaaactaagttcatgggcaagggtgtgatgaaggccgtggagaacatccaca
agattatcaagccggcgcttattggcaaggacccgtgcgaccagaagggtattgacaagc
tgatggtcgaggagctcgatggaactaagaacgagtggggctggtgcaagtcgaagctcg
gcgcgaacgcgatcctggccgtctcgatggcttgctgccgcgccggcgctgctgccaagg
gcatgcccctgtacaagtacattgccactttggctggaaacccgacagacaagatggtaa
tgcccgtcccgttcttcaacgtcatcaacggcggctcccacgcaggcaacaaggtcgcga
tgcaggagttcatgatcgcccccgtcggcgcctccacaatccaagaggcgatccagatcg
gcgcggaagtgtaccagcacctgaaggtcgtcattaagaagaagtatggcctcgacgcca
cgaacgtcggcgacgagggtggcttcgcccccaacatcagcggcgccacggaggccctcg
acttgctgatggaggccatcaaggtgtctggtcacgaaggcaaggtcaagattgccgccg
acgtcgccgcttccgagttcttcctccaggacgacaaagtctatgacctagacttcaaga
ctccgaacaacgacaagtcgcaacgcaagactggcgaagagcttcgcaacctgtacaagg
acctgtgccagaagtatcccttcgtgtccatcgaggacccgttcgaccaggacgacttcc
acagctacgctcagctcaccaacgaggttggcgagaaggtccaaatcgtcggcgacgacc
tcctggtcaccaacccgacgcgcattgagaaggccgttcaggagaaagcgtgcaacggcc
tgcttctcaaggtgaaccagattggcacagtcagcgagtctatcgaggcctgccagcttg
cccagaagaacaagtggggcgttatggtttctcaccgctccggtgagactgaggactcct
tcatcgctgacctcgtcgtcggtctccgcaccgggcaaatcaagactggcgccccgtgca
gatccgagcgtctctgcaagtacaaccagctgatgcgtatcgaagagtcgctcggctccg
actgtcagtacgccggcgctggcttccgccatcccaactaagtggaaacggagtttcgac
tacccaactgctcaattggggctgggtggtttgtccactctgcaacaagggcgtgacgag
atcgttgcacatgcaactgccttttttgtgcttggtgggaaggagcactttcgcaggtgc
agcaccgagttgcggttgatgggaatttcggaactgatttgtttcttgcatgccatcacc
gaaggaacgagcagtttcgttgataatattggaaagtcttttgaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaa
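
The sequence output is plain FASTA, so it can be read with a few lines of code. This is a minimal sketch, not a full FASTA parser; the function name read_fasta and the two toy records are hypothetical and are not taken from hs989235.fasta. Note that the listing above is abridged, and the "deleted for brevity" marker is not part of the real file.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, description, sequence) for each FASTA record."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                ident, _, desc = header.partition(" ")
                yield ident, desc, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        ident, _, desc = header.partition(" ")
        yield ident, desc, "".join(chunks)


# Toy input for illustration only.
TOY = """\
>demo1 toy record
acgtn
acg
>demo2
ttga
"""

for ident, desc, seq in read_fasta(TOY.splitlines()):
    # Count ambiguous 'n' bases, which occur in entries such as HS989235.
    print(ident, len(seq), seq.count("n"))
```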

Data files

None

Notes

None

References

None

Warnings

None

Diagnostic Error Messages

None

Exit status

Always exits with status 0

Known bugs

None

See also

Program name   Description
banana         Bending and curvature plot in B-DNA
btwisted       Calculates the twisting in a B-DNA sequence
chaos          Create a chaos game representation plot for a sequence
compseq        Count composition of dimer/trimer/etc words in a sequence
dan            Calculates DNA RNA/DNA melting temperature
freak          Residue/base frequency table or plot
isochore       Plots isochores in large DNA sequences
sirna          Finds siRNA duplexes in mRNA
wordcount      Counts words of a specified size in a DNA sequence

Author(s)

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None