From 60254aaf72d3f63693f8a360314900a6fa8f3840 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?K=C3=A2muran=20=C4=B0mran?= <73625486+kamurani@users.noreply.github.com>
Date: Wed, 15 May 2024 17:30:09 +1000
Subject: [PATCH] compseq: add page (#12713)

---
 pages/linux/compseq.md | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
 create mode 100644 pages/linux/compseq.md

diff --git a/pages/linux/compseq.md b/pages/linux/compseq.md
new file mode 100644
index 000000000..e9c519c6a
--- /dev/null
+++ b/pages/linux/compseq.md
@@ -0,0 +1,36 @@
+# compseq
+
+> Calculate the composition of unique words in sequences.
+> More information: .
+
+- Count observed frequencies of words in a FASTA file, providing parameter values via the interactive prompt:
+
+`compseq {{path/to/file.fasta}}`
+
+- Count observed frequencies of amino acid pairs in a FASTA file and save the output to a text file:
+
+`compseq {{path/to/input_protein.fasta}} -word 2 {{path/to/output_file.comp}}`
+
+- Count observed frequencies of hexanucleotides in a FASTA file, save the output to a text file, and ignore zero counts:
+
+`compseq {{path/to/input_dna.fasta}} -word 6 {{path/to/output_file.comp}} -nozero`
+
+- Count observed frequencies of codons in a particular reading frame, ignoring any overlapping counts (i.e. move the window across by the word length, 3):
+
+`compseq -sequence {{path/to/input_rna.fasta}} -word 3 {{path/to/output_file.comp}} -nozero -frame {{1}}`
+
+- Count observed frequencies of codons frame-shifted by 3 positions, ignoring any overlapping counts (should report all codons except the first one):
+
+`compseq -sequence {{path/to/input_rna.fasta}} -word 3 {{path/to/output_file.comp}} -nozero -frame 3`
+
+- Count amino acid triplets in a FASTA file and compare to a previous run of `compseq` to calculate expected and normalised frequency values:
+
+`compseq -sequence {{path/to/human_proteome.fasta}} -word 3 {{path/to/output_file1.comp}} -nozero -infile {{path/to/output_file2.comp}}`
+
+- Approximate the above command without a previously prepared file, by calculating expected frequencies using the single base/residue frequencies in the supplied input sequence(s):
+
+`compseq -sequence {{path/to/human_proteome.fasta}} -word 3 {{path/to/output_file.comp}} -nozero -calcfreq`
+
+- Display help (use `-help -verbose` for more information on associated and general qualifiers):
+
+`compseq -help`
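
Reviewer note: the difference between overlapping and in-frame counting described in the codon examples can be illustrated with a small Python sketch. This is a simplified model, not `compseq`'s implementation: the function `count_words` and its `step` and `offset` parameters are hypothetical, and the sketch takes an explicit 0-based start offset rather than assuming how `-frame` values map to start positions.

```python
from collections import Counter


def count_words(seq, word, step=1, offset=0):
    """Count words of length `word` in `seq`.

    Hypothetical helper for illustration only. The window starts at the
    0-based `offset` and advances by `step` characters each time.
    """
    counts = Counter()
    for i in range(offset, len(seq) - word + 1, step):
        counts[seq[i:i + word]] += 1
    return counts


seq = "AUGGCCAUG"

# Window moves by 1: every overlapping triplet is counted.
overlapping = count_words(seq, 3)

# Window moves by the word length: only non-overlapping codons are counted.
in_frame = count_words(seq, 3, step=3)

# Same reading frame, but starting 3 positions in: the first codon is skipped.
shifted = count_words(seq, 3, step=3, offset=3)

print(overlapping)
print(in_frame)
print(shifted)
```

With the 9-base toy sequence, the overlapping count covers 7 windows, the in-frame count covers 3 codons, and the shifted count covers the last 2 codons only.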