GeneMark Manual















GeneMark Options (gm)

You may specify options to the GeneMark program either on the command line or by setting defaults with the DEFMAT and GMARGS environment variables.

Every option starts with the '-' symbol, followed by a letter that identifies the option. Most options must be followed by an argument. For example, the -m option selects a sequence matrix file, like so: '-m ecoli_4.mat'.
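
When GeneMark is driven from a script, the option syntax described above maps naturally onto an argument list. The following is a minimal sketch, not part of the GeneMark distribution: build_gm_command is a hypothetical helper, and it assumes that the executable is named gm and that the sequence name comes last, as in the examples in this manual.

```python
def build_gm_command(sequence, options):
    """Return an argument list such as ['gm', '-m', 'ecoli_4.mat', 'cyaY'].

    `options` maps a single option letter to its argument, or to None
    for an option that takes no argument (for example 'v').
    This helper is hypothetical; it is not shipped with GeneMark.
    """
    command = ["gm"]  # assumed executable name, taken from the manual's examples
    for letter, argument in options.items():
        if len(letter) != 1:
            raise ValueError(f"option names are single letters: {letter!r}")
        command.append("-" + letter)
        if argument is not None:
            command.append(str(argument))
    command.append(sequence)  # sequence name last, as in the manual's examples
    return command


command = build_gm_command("cyaY", {"m": "ecoli_4.mat", "v": None})
print(" ".join(command))  # gm -m ecoli_4.mat -v cyaY
```

The resulting list can be handed to subprocess.run(command, check=True) on a machine where GeneMark is installed.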

Analysis Options | Graphic Options | Listing Options | ORF-related Options | ROI-related Options

Analysis Options:

-a number A priori probability of coding. This is the probability that a randomly selected sequence fragment is coding. The default value is 0.5 and should work well in most circumstances. Values between 0.01 and 0.99 are permitted.
-c filename If the organism under study does not use the standard codon translation table, an alternative can be specified here. 'filename' names a file that lists, on each line, a codon, its single-letter IUPAC protein translation (use '*' for stop codons), and, optionally, the word 'start' or 'rare_start' to indicate start codons.
-m filename Matrix file. 'filename' specifies the GeneMark model to use when calculating the coding potential function. The program searches a default location and also searches the path specified by the MATPATH environment variable.
-s number Step size (in nucleotides). This is the distance by which the sliding evaluation window advances along the sequence. The default value of 12 is adequate. Large values may produce strange results. Adjusting this value changes the number of data points evaluated by the program but does not affect the accuracy of prediction.
-t number Threshold value. Regions and ORFs with a mean coding potential higher than this value are identified as coding signals. The default value of 0.5 should be adequate. Values between 0.01 and 0.99 are permitted.
-w number Window size (in nucleotides). This sets the size of the scanning window. Larger values may cause the program to underpredict small coding regions. Smaller values result in diminished prediction accuracy. The default value of 96 should be adequate in most circumstances.
-v Verbose. Print out a confirmation at each stage of the program's execution.
-D Data. When this option is specified, GeneMark provides machine-readable versions of the .lst and .ps files generated by the -l and -g options (described below). These files end with the suffixes .ldata and .gdata, respectively.
-R filename Ribosome binding site pattern file. If provided, the program will score ribosome binding sites near putative gene starts according to parameters provided in an RBS pattern file.
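
The relationship between window size, step size, and the number of data points can be illustrated with a short calculation. This is a sketch under a stated assumption: it counts only windows that fit entirely inside the sequence, which may differ from how GeneMark treats the sequence ends. The function name window_starts is hypothetical.

```python
def window_starts(sequence_length, window=96, step=12):
    """Return the 0-based start position of every full window.

    Defaults mirror the -w (96) and -s (12) defaults in this manual.
    Assumption: a window is counted only if it lies entirely inside
    the sequence; GeneMark's own edge handling is not documented here.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = sequence_length - window
    return list(range(0, last_start + 1, step))


for step in (12, 24):
    points = len(window_starts(1200, window=96, step=step))
    print(f"step={step}: {points} data points")
# step=12: 93 data points
# step=24: 47 data points
```

Doubling the step roughly halves the number of evaluated points, while the window size, and therefore the amount of sequence behind each point, stays the same.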

Graphic Options:

The GeneMark program can generate a PostScript file containing a graphical depiction of the coding potential function in all 6 reading frames. This graphic can be very useful for visualizing the data. The PostScript graphic is placed in a file with the suffix '.ps'.

Graphic options are set by specifying '-g' followed by any combination of the letters in the table below. For example:

gm -gnos -m ecoli_4.mat cyaY

Creates a graph containing start codon, stop codon, and open-reading frame indicators (in addition to the coding potential function) and places it in the file cyaY.ps.

0 (zero) Cancel all previous graphic options. If no options are selected, no graphical output is made.
f Frameshift. Indicate possible frameshift errors with a vertical arrow.
k Use an alternative scale labeling scheme that labels the scale in nice round units.
l Landscape. Print the graph in landscape orientation rather than portrait (the default).
n Ends. Indicate stop codon positions with a descending tick.
o ORF. Indicate open-reading frames with a horizontal line.
r Region. Indicate regions between stop codons where significant coding potential is indicated. These regions are indicated with a grey bar.
s Start. Indicate start and rare start codons with upward and small upward ticks respectively.
x Exon. Indicate possible exon boundaries based solely on coding potential information. Boundaries are indicated using angular brackets.
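
For scripts that assemble the '-g' option, the letters in the table above can be kept in a lookup table. The sketch below is illustrative only: the descriptive names on the left are invented for this example, and only the single letters come from the table.

```python
# Letters are taken from the table above; the descriptive names are hypothetical.
GRAPHIC_LETTERS = {
    "frameshift": "f",
    "round_scale": "k",
    "landscape": "l",
    "ends": "n",
    "orf": "o",
    "region": "r",
    "start": "s",
    "exon": "x",
}


def graphic_option(features):
    """Build a '-g...' option string from descriptive feature names."""
    letters = []
    for feature in features:
        letter = GRAPHIC_LETTERS[feature]  # raises KeyError for an unknown name
        if letter not in letters:          # skip repeated features
            letters.append(letter)
    if not letters:
        raise ValueError("select at least one graphic feature")
    return "-g" + "".join(letters)


print(graphic_option(["ends", "orf", "start"]))  # -gnos
```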

You may also specify a "zoom level" using the '-z' option. The number of data points graphed per page is divided by this number. So, to view the graphical output at 0.5x zoom (twice as many data points per page):

gm -gnos -z 0.5 -m ecoli_5.mat cyaY

GeneMark PostScript output can be sent to a printer or viewed interactively on your computer.
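
The effect of the zoom level on page count follows directly from the rule that the data points per page are divided by the zoom value. The arithmetic below is a sketch: the figure of 500 data points per page at 1x zoom is a made-up placeholder, not a documented GeneMark value, and pages_needed is a hypothetical helper.

```python
import math


def pages_needed(total_points, zoom, points_per_page_at_1x=500):
    """Estimate the page count for a graph at a given zoom level.

    points_per_page_at_1x is a placeholder value chosen for this
    example; the real figure used by GeneMark is not stated here.
    """
    if zoom <= 0:
        raise ValueError("zoom must be positive")
    points_per_page = points_per_page_at_1x / zoom  # rule stated in the manual
    return math.ceil(total_points / points_per_page)


print(pages_needed(4000, 1.0))  # 8
print(pages_needed(4000, 0.5))  # 4
print(pages_needed(4000, 2.0))  # 16
```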

Listing Options:

The GeneMark program can generate a text file containing summary results of the program's analysis. Summary information is placed in a file with the suffix '.lst'. The listing options are selected using the option '-l' followed by any combination of the letters from the table below. For example:

gm -lo -m ecoli_5.mat cyaY

... would generate a list, cyaY.lst, of open reading frames with a mean coding potential greater than the threshold value.

0 (zero) Cancel all previous listing options. If no options are selected, no summary output is made.
o ORF. List open reading frames. If an RBS pattern file is specified (see the -R option), evaluations of RBS sites near putative gene starts are provided.
r Region. List regions between stop codons where there is a significant coding potential.
q Quiet. Suppress comments and header information (makes output easier to use with scripts and other programs).
x Exon. List regions between putative acceptor/donor sites with significant coding potential.

ORF-related Options:

The GeneMark program can automatically write open reading frames with high coding potential to a file, in FASTA format, as either nucleotide sequences or amino acid translations. The results are placed in a file with the suffix '.orf'.

The ORF-related options are specified with the option '-o' followed by any combination of letters from the table below. For example:

gm -op -m ecoli_4.mat cyaY

... creates protein translations of high-scoring ORFs and places them in cyaY.orf.

0 (zero) Cancel all previous ORF-related options. If no options are selected, no ORF-related output is made.
n Nucleotides. Write out the nucleotide sequences of the high-scoring ORFs.
p Protein. Write out the amino acid translations of the high-scoring ORFs.
q Quiet. Suppress comments and header information (makes output easier to use with scripts and other programs).
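
Because the .orf file is written in FASTA format, it can be read with a few lines of code. The reader below is a generic FASTA sketch, not a GeneMark-specific parser: it assumes that each record starts with a '>' header line, and the header text in the example is made up, since the exact header layout written by GeneMark is not described here.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Made-up records standing in for the contents of a file such as cyaY.orf.
example = [
    ">orf_1 (made-up header)",
    "MKTAYIAK",
    "QRQISFVK",
    ">orf_2 (made-up header)",
    "MSDNE",
]
for header, sequence in read_fasta(example):
    print(header, len(sequence))
```

To read a real file, pass an open file handle instead of the list, for example: with open("cyaY.orf") as handle: records = list(read_fasta(handle)).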

ROI-related Options:

The GeneMark program defines a unit called a "region of interest" (ROI): a region between two stop codons in the same reading frame that has a significant coding potential. Such regions need not lie within an open reading frame, and they may indicate coding regions where start and stop codons have been masked by errors in the sequence or by other circumstances.
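
The structural half of this definition, a stretch between two stop codons in the same reading frame, can be sketched as follows. This is an illustration only and not GeneMark's algorithm: it does not compute coding potential, it scans a single forward-strand frame, and it assumes the standard stop codons TAA, TAG, and TGA. The function name stop_to_stop_regions is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code; an assumption here


def stop_to_stop_regions(sequence, frame=0, min_codons=1):
    """Return (start, end) pairs, 0-based and half-open, for stretches
    lying between consecutive in-frame stop codons on the forward strand.

    Coding potential is NOT evaluated; GeneMark additionally requires
    a significant coding potential before reporting a region.
    """
    sequence = sequence.upper()
    regions = []
    previous_stop_end = None  # index just past the most recent stop codon
    for i in range(frame, len(sequence) - 2, 3):
        if sequence[i:i + 3] in STOP_CODONS:
            if previous_stop_end is not None:
                if (i - previous_stop_end) // 3 >= min_codons:
                    regions.append((previous_stop_end, i))
            previous_stop_end = i + 3
    return regions


# Codons in frame 0: TAA ATG GCC TGA GGG TAG
print(stop_to_stop_regions("TAAATGGCCTGAGGGTAG"))  # [(3, 9), (12, 15)]
```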

GeneMark can automatically write regions of interest with high coding potential to a file, in FASTA format, as either nucleotide sequences or amino acid translations. The results are placed in a file with the suffix '.rgn'.

The ROI-related options are specified with the option '-r' followed by any combination of letters from the table below. For example:

gm -rp -m ecoli_4.mat cyaY

... creates protein translations of high-scoring regions of interest and places them in cyaY.rgn.

0 (zero) Cancel all previous ROI-related options. If no options are selected, no ROI-related output is made.
n Nucleotides. Write out the nucleotide sequences of the high-scoring ROIs.
p Protein. Write out the amino acid translations of the high-scoring ROIs.
q Quiet. Suppress comments and header information (makes output easier to use with scripts and other programs).
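
The output options described on this page each write to a file with its own suffix. The sketch below collects those suffixes in one place so that a script can predict which files to look for after a run. It assumes that the suffix is appended to the sequence name exactly as given (as in cyaY.ps), and that the .gdata and .ldata files written with -D use the same base name in addition to the .ps and .lst files. Both points are assumptions, and expected_outputs is a hypothetical helper.

```python
# Suffixes as documented on this page for the -g, -l, -o and -r options.
OUTPUT_SUFFIXES = {"g": ".ps", "l": ".lst", "o": ".orf", "r": ".rgn"}
# Machine-readable companions produced when -D is given.
DATA_SUFFIXES = {"g": ".gdata", "l": ".ldata"}


def expected_outputs(sequence_name, option_letters, data=False):
    """List the output files expected for the selected output options.

    Assumption: each suffix is appended to the sequence name as given,
    and -D adds the data files alongside the regular ones.
    """
    files = []
    for letter in option_letters:
        files.append(sequence_name + OUTPUT_SUFFIXES[letter])
        if data and letter in DATA_SUFFIXES:
            files.append(sequence_name + DATA_SUFFIXES[letter])
    return files


print(expected_outputs("cyaY", "gl", data=True))
# ['cyaY.ps', 'cyaY.gdata', 'cyaY.lst', 'cyaY.ldata']
```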



Gene Probe, Inc.
1106 Wrights Mill Court
Atlanta, GA 30324

PH: +1 (404) 579 - 2975
FX: +1 (404) 255 - 2067
