Parameters:)

usage: ganon [-h] [-v]
             {build,build-custom,update,classify,reassign,report,table} ...

- - - - - - - - - -
   _  _  _  _  _   
  (_|(_|| |(_)| |  
   _|   v. 2.3.0
- - - - - - - - - -

positional arguments:
  {build,build-custom,update,classify,reassign,report,table}
    build               Download and build ganon default databases
                        (refseq/genbank)
    build-custom        Build custom ganon databases
    update              Update ganon default databases
    classify            Classify reads against built databases
    reassign            Reassign reads with multiple matches with an EM
                        algorithm
    report              Generate reports from classification results
    table               Generate table from reports

options:
  -h, --help            show this help message and exit
  -v, --version         Show program's version number and exit.

ganon build

usage: ganon build [-h] [-g [ ...]] [-a [ ...]] [-l ] [-b [ ...]] [-o ] [-c] [-r] [-u ] [-m [ ...]] [-z [ ...]]
                   [--skip-genome-size] -d DB_PREFIX [-x ] [-t ] [-p ] [-k ] [-w ] [-s ] [-f ] [-j ] [-y ] [-v ]
                   [--restart] [--verbose] [--quiet] [--write-info-file]

options:
  -h, --help            show this help message and exit

required arguments:
  -g, --organism-group [ ...]
                        One or more organism groups to download [archaea, bacteria, fungi, human, invertebrate,
                        metagenomes, other, plant, protozoa, vertebrate_mammalian, vertebrate_other, viral]. Mutually
                        exclusive --taxid (default: None)
  -a, --taxid [ ...]    One or more taxonomic identifiers to download. e.g. 562 (-x ncbi) or 's__Escherichia coli' (-x
                        gtdb). Mutually exclusive --organism-group (default: None)
  -d, --db-prefix DB_PREFIX
                        Database output prefix

database arguments:
  -l, --level           Highest level to build the database. Options: any available taxonomic rank [species, genus,
                        ...], 'leaves' for taxonomic leaves or 'assembly' for a assembly/strain based analysis (default:
                        species)

download arguments:
  -b, --source [ ...]   Source to download [refseq, genbank] (default: ['refseq'])
  -o, --top             Download limited assemblies for each taxa. 0 for all. (default: 0)
  -c, --complete-genomes
                        Download only sub-set of complete genomes (default: False)
  -r, --reference-genomes
                        Download only sub-set of reference genomes (default: False)
  -u, --genome-updater 
                        Additional genome_updater parameters (https://github.com/pirovc/genome_updater) (default: None)
  -m, --taxonomy-files [ ...]
                        Specific files for taxonomy - otherwise files will be downloaded (default: None)
  -z, --genome-size-files [ ...]
                        Specific files for genome size estimation - otherwise files will be downloaded (default: None)
  --skip-genome-size    Do not attempt to get genome sizes. Activate this option when using sequences not representing
                        full genomes. (default: False)

important arguments:
  -x, --taxonomy        Set taxonomy to enable taxonomic classification, lca and reports [ncbi, gtdb, skip] (default:
                        ncbi)
  -t, --threads 

advanced arguments:
  -p, --max-fp          Max. false positive for bloom filters. Mutually exclusive --filter-size. Defaults to 0.001 with
                        --filter-type hibf or 0.05 with --filter-type ibf. (default: None)
  -k, --kmer-size       The k-mer size to split sequences. (default: 19)
  -w, --window-size     The window-size to build filter with minimizers. (default: 31)
  -s, --hash-functions 
                        The number of hash functions for the interleaved bloom filter [1-5]. With --filter-type ibf, 0
                        will try to set optimal value. (default: 4)
  -f, --filter-size     Fixed size for filter in Megabytes (MB). Mutually exclusive --max-fp. Only valid for --filter-
                        type ibf. (default: 0)
  -j, --mode            Create smaller or faster filters at the cost of classification speed or database size,
                        respectively [avg, smaller, smallest, faster, fastest]. If --filter-size is used,
                        smaller/smallest refers to the false positive rate. By default, an average value is calculated
                        to balance classification speed and database size. Only valid for --filter-type ibf. (default:
                        avg)
  -y, --min-length      Skip sequences smaller then value defined. 0 to not skip any sequence. Only valid for --filter-
                        type ibf. (default: 0)
  -v, --filter-type     Variant of bloom filter to use [hibf, ibf]. hibf requires raptor >= v3.0.1 installed or binary
                        path set with --raptor-path. --mode, --filter-size and --min-length will be ignored with hibf.
                        hibf will set --max-fp 0.001 as default. (default: hibf)

optional arguments:
  --restart             Restart build/update from scratch, do not try to resume from the latest possible step.
                        {db_prefix}_files/ will be deleted if present. (default: False)
  --verbose             Verbose output mode (default: False)
  --quiet               Quiet output mode (default: False)
  --write-info-file     Save copy of target info generated to {db_prefix}.info.tsv. Can be re-used as --input-file for
                        further attempts. (default: False)

ganon build-custom

usage: ganon build-custom [-h] [-i [ ...]] [-e ] [-c] [-n ] [-a ] [-l ] [-m [ ...]] [-z [ ...]] [--skip-genome-size]
                          [-r [ ...]] [-q [ ...]] -d DB_PREFIX [-x ] [-t ] [-p ] [-k ] [-w ] [-s ] [-f ] [-j ] [-y ]
                          [-v ] [--restart] [--verbose] [--quiet] [--write-info-file]

options:
  -h, --help            show this help message and exit

required arguments:
  -i, --input [ ...]    Input file(s) and/or folder(s). Mutually exclusive --input-file. (default: None)
  -e, --input-extension 
                        Required if --input contains folder(s). Wildcards/Shell Expansions not supported (e.g. *).
                        (default: fna.gz)
  -c, --input-recursive
                        Look for files recursively in folder(s) provided with --input (default: False)
  -d, --db-prefix DB_PREFIX
                        Database output prefix

custom arguments:
  -n, --input-file      Tab-separated file with all necessary file/sequence information. Fields: file [<tab> target
                        <tab> node <tab> specialization <tab> specialization name]. For details:
                        https://pirovc.github.io/ganon/custom_databases/. Mutually exclusive --input (default: None)
  -a, --input-target    Target to use [file, sequence]. Parse input by file or by sequence. Using 'file' is recommended
                        and will speed-up the building process (default: file)
  -l, --level           Max. level to build the database. By default, --level is the --input-target. Options: any
                        available taxonomic rank [species, genus, ...] or 'leaves' (requires --taxonomy). Further
                        specialization options [assembly, custom]. assembly will retrieve and use the assembly accession
                        and name. custom requires and uses the specialization field in the --input-file. (default: None)
  -m, --taxonomy-files [ ...]
                        Specific files for taxonomy - otherwise files will be downloaded (default: None)
  -z, --genome-size-files [ ...]
                        Specific files for genome size estimation - otherwise files will be downloaded (default: None)
  --skip-genome-size    Do not attempt to get genome sizes. Activate this option when using sequences not representing
                        full genomes. (default: False)

ncbi arguments:
  -r, --ncbi-sequence-info [ ...]
                        Uses NCBI e-utils webservices or downloads accession2taxid files to extract target information.
                        [eutils, nucl_gb, nucl_wgs, nucl_est, nucl_gss, pdb, prot, dead_nucl, dead_wgs, dead_prot or one
                        or more accession2taxid files from https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/].
                        By default uses e-utils up-to 50000 sequences or downloads nucl_gb nucl_wgs otherwise. (default:
                        [])
  -q, --ncbi-file-info [ ...]
                        Downloads assembly_summary files to extract target information. [refseq, genbank,
                        refseq_historical, genbank_historical or one or more assembly_summary files from
                        https://ftp.ncbi.nlm.nih.gov/genomes/] (default: ['refseq', 'genbank'])

important arguments:
  -x, --taxonomy        Set taxonomy to enable taxonomic classification, lca and reports [ncbi, gtdb, skip] (default:
                        ncbi)
  -t, --threads 

advanced arguments:
  -p, --max-fp          Max. false positive for bloom filters. Mutually exclusive --filter-size. Defaults to 0.001 with
                        --filter-type hibf or 0.05 with --filter-type ibf. (default: None)
  -k, --kmer-size       The k-mer size to split sequences. (default: 19)
  -w, --window-size     The window-size to build filter with minimizers. (default: 31)
  -s, --hash-functions 
                        The number of hash functions for the interleaved bloom filter [1-5]. With --filter-type ibf, 0
                        will try to set optimal value. (default: 4)
  -f, --filter-size     Fixed size for filter in Megabytes (MB). Mutually exclusive --max-fp. Only valid for --filter-
                        type ibf. (default: 0)
  -j, --mode            Create smaller or faster filters at the cost of classification speed or database size,
                        respectively [avg, smaller, smallest, faster, fastest]. If --filter-size is used,
                        smaller/smallest refers to the false positive rate. By default, an average value is calculated
                        to balance classification speed and database size. Only valid for --filter-type ibf. (default:
                        avg)
  -y, --min-length      Skip sequences smaller then value defined. 0 to not skip any sequence. Only valid for --filter-
                        type ibf. (default: 0)
  -v, --filter-type     Variant of bloom filter to use [hibf, ibf]. hibf requires raptor >= v3.0.1 installed or binary
                        path set with --raptor-path. --mode, --filter-size and --min-length will be ignored with hibf.
                        hibf will set --max-fp 0.001 as default. (default: hibf)

optional arguments:
  --restart             Restart build/update from scratch, do not try to resume from the latest possible step.
                        {db_prefix}_files/ will be deleted if present. (default: False)
  --verbose             Verbose output mode (default: False)
  --quiet               Quiet output mode (default: False)
  --write-info-file     Save copy of target info generated to {db_prefix}.info.tsv. Can be re-used as --input-file for
                        further attempts. (default: False)

ganon update

usage: ganon update [-h] -d DB_PREFIX [-o ] [-t ] [--restart] [--verbose] [--quiet] [--write-info-file]

options:
  -h, --help            show this help message and exit

required arguments:
  -d, --db-prefix DB_PREFIX
                        Existing database input prefix

important arguments:
  -o, --output-db-prefix 
                        Output database prefix. By default will be the same as --db-prefix and overwrite files (default:
                        None)
  -t, --threads 

optional arguments:
  --restart             Restart build/update from scratch, do not try to resume from the latest possible step.
                        {db_prefix}_files/ will be deleted if present. (default: False)
  --verbose             Verbose output mode (default: False)
  --quiet               Quiet output mode (default: False)
  --write-info-file     Save copy of target info generated to {db_prefix}.info.tsv. Can be re-used as --input-file for
                        further attempts. (default: False)

ganon classify

usage: ganon classify [-h] -d [DB_PREFIX ...] -o OUTPUT_PREFIX [-s [reads.fq[.gz] ...]]
                      [-p [reads.1.fq[.gz] reads.2.fq[.gz] ...]] [-a [file.tsv ...]] [-c [ ...]] [-e [ ...]] [-m ]
                      [--ranks [ ...]] [--min-count ] [--report-type ] [--skip-report] [--output-one] [--output-all]
                      [--output-unclassified] [--output-single] [-t ] [-b] [-f [ ...]] [-l [ ...]] [--verbose] [--quiet]

options:
  -h, --help            show this help message and exit

required arguments:
  -d, --db-prefix [DB_PREFIX ...]
                        Database input prefix[es]
  -o, --output-prefix OUTPUT_PREFIX
                        Output prefix for base report (.rep) and tree-like report (.tre).
  -s, --single-reads [reads.fq[.gz] ...]
                        Multi-fastq[.gz] file[s] to classify (default: None)
  -p, --paired-reads [reads.1.fq[.gz] reads.2.fq[.gz] ...]
                        Multi-fastq[.gz] pairs of file[s] to classify (default: None)
  -a, --batch-reads [file.tsv ...]
                        File with single- or paired-end reads to be processed in one run. Prefix can be repeated.
                        Example: prefix <tab> file1 [<tab> file2] (default: None)

cutoff/filter arguments:
  -c, --rel-cutoff [ ...]
                        Min. percentage of a read (set of k-mers) shared with a reference necessary to consider a match.
                        Generally used to remove low similarity matches. Single value or one per database (e.g. 0.7 1
                        0.25). 0 for no cutoff (default: [0.75])
  -e, --rel-filter [ ...]
                        Additional relative percentage of matches (relative to the best match) to keep. Generally used
                        to keep top matches above cutoff. Single value or one per hierarchy (e.g. 0.1 0). 1 for no
                        filter (default: [0.1])

post-processing/report arguments:
  -m, --multiple-matches 
                        Method to solve reads with multiple matches [em, lca, skip]. em -> expectation maximization
                        algorithm based on unique matches. lca -> lowest common ancestor based on taxonomy. The EM
                        algorithm can be executed later with 'ganon reassign' using the .all file (--output-all).
                        (default: em)
  --ranks [ ...]        Ranks to report taxonomic abundances (.tre). empty will report default ranks [domain phylum
                        class order family genus species assembly]. (default: [])
  --min-count           Minimum percentage/counts to report an taxa (.tre) [use values between 0-1 for percentage, >1
                        for counts] (default: 5e-05)
  --report-type         Type of report (.tre) [abundance, reads, matches, dist, corr]. More info in 'ganon report'.
                        (default: abundance)
  --skip-report         Disable tree-like report (.tre) at the end of classification. Can be done later with 'ganon
                        report'. (default: False)

output arguments:
  --output-one          Output a file with one match for each read (.one) either an unique match or a result from the EM
                        or a LCA algorithm (--multiple-matches) (default: False)
  --output-all          Output a file with all unique and multiple matches (.all) (default: False)
  --output-unclassified
                        Output a file with unclassified read headers (.unc) (default: False)
  --output-single       When using multiple hierarchical levels, output everything in one file instead of one per
                        hierarchy (default: False)

other arguments:
  -t, --threads         Number of sub-processes/threads to use (default: 1)
  -b, --binning         Optimized parameters for binning (--rel-cutoff 0.25 --rel-filter 0 --min-count 0 --report-type
                        reads). Will report sequence abundances (.tre) instead of tax. abundance. (default: False)
  -f, --fpr-query [ ...]
                        Max. false positive of a query to accept a match. Applied after --rel-cutoff and --rel-filter.
                        Generally used to remove false positives matches querying a database build with large --max-fp.
                        Single value or one per hierarchy (e.g. 0.1 0). 1 for no filter (default: [1e-05])
  -l, --hierarchy-labels [ ...]
                        Hierarchy definition of --db-prefix files to be classified. Can also be a string, but input will
                        be sorted to define order (e.g. 1 1 2 3). The default value reported without hierarchy is 'H1'
                        (default: None)
  --verbose             Verbose output mode (default: False)
  --quiet               Quiet output mode (default: False)

ganon reassign

usage: ganon reassign [-h] -i [ ...] [-o OUTPUT_PREFIX] [-e ] [-s ] [--remove-all] [--skip-one] [--skip-rep] [--verbose]
                      [--quiet]

options:
  -h, --help            show this help message and exit

required arguments:
  -i, --input-prefix [ ...]
                        Input prefix to find files from ganon classify (.rep and .all)
  -o, --output-prefix OUTPUT_PREFIX
                        Alternative output prefix for reassigned files. If not provided, will use same path of input
                        files (will overwrite .rep). In case of multiple files, the output will be the suffix. Example:
                        {output_prefix}{filename}.one (default: )

EM arguments:
  -e, --max-iter        Max. number of iterations for the EM algorithm. If 0, will run until convergence (check
                        --threshold) (default: 10)
  -s, --threshold       Convergence threshold limit to stop the EM algorithm. (default: 0)

other arguments:
  --remove-all          Remove input file (.all) after processing. (default: False)
  --skip-one            Do not write output file (.one) after processing. (default: False)
  --skip-rep            Do not write report file (.rep) after processing. (default: False)
  --verbose             Verbose output mode (default: False)
  --quiet               Quiet output mode (default: False)

ganon report

usage: ganon report [-h] -i [ ...] [-e INPUT_EXTENSION] [-d [ ...]] [-x ] [-m [ ...]] [-z [ ...]] [--skip-genome-size]
                    [-o OUTPUT_PREFIX] [-f ] [-t ] [-r [ ...]] [-s ] [-a] [-y] [-p [ ...]] [-k [ ...]] [-c ] [-n]
                    [--verbose] [--quiet] [--min-count ] [--max-count ] [--names [ ...]] [--names-with [ ...]]
                    [--taxids [ ...]]

options:
  -h, --help            show this help message and exit

required arguments:
  -i, --input [ ...]    Input file(s) and/or folder(s). '.rep' file(s) from ganon classify.
  -e, --input-extension INPUT_EXTENSION
                        Required if --input contains folder(s). Wildcards/Shell Expansions not supported (e.g. *).
                        (default: rep)

db/tax arguments:
  -d, --db-prefix [ ...]
                        Database prefix(es) used for classification. Only '.tax' file(s) are required. If not provided,
                        new taxonomy will be downloaded. Mutually exclusive with --taxonomy. (default: [])
  -x, --taxonomy        Taxonomy database to use [ncbi, gtdb, skip]. Mutually exclusive with --db-prefix. (default:
                        ncbi)
  -m, --taxonomy-files [ ...]
                        Specific files for taxonomy - otherwise files will be downloaded (default: None)
  -z, --genome-size-files [ ...]
                        Specific files for genome size estimation - otherwise files will be downloaded (default: None)
  --skip-genome-size    Do not attempt to get genome sizes. Valid only without --db-prefix. Activate this option when
                        using sequences not representing full genomes. (default: False)

output arguments:
  -o, --output-prefix OUTPUT_PREFIX
                        Output prefix for report file 'output_prefix.tre'. In case of multiple files, the base input
                        filename will be appended at the end of the output file 'output_prefix + FILENAME.tre' (default:
                        )
  -f, --output-format   Output format [text, tsv, csv, bioboxes]. text outputs a tabulated formatted text file for
                        better visualization. bioboxes is the the CAMI challenge profiling format (only
                        percentage/abundances are reported). (default: tsv)
  -t, --report-type     Type of report [abundance, reads, matches, dist, corr]. 'abundance' -> tax. abundance (re-
                        distribute read counts and correct by genome size), 'reads' -> sequence abundance, 'matches' ->
                        report all unique and shared matches, 'dist' -> like reads with re-distribution of shared read
                        counts only, 'corr' -> like abundance without re-distribution of shared read counts (default:
                        abundance)
  -r, --ranks [ ...]    Ranks to report ['', 'all', custom list]. 'all' for all possible ranks. empty for default ranks
                        [domain phylum class order family genus species assembly]. (default: [])
  -s, --sort            Sort report by [rank, lineage, count, unique]. Default: rank (with custom --ranks) or lineage
                        (with --ranks all) (default: )
  -a, --no-orphan       Omit orphan nodes from the final report. Otherwise, orphan nodes (= nodes not found in the
                        db/tax) are reported as 'na' with root as direct parent. (default: False)
  -y, --split-hierarchy
                        Split output reports by hierarchy (from ganon classify --hierarchy-labels). If activated, the
                        output files will be named as '{output_prefix}.{hierarchy}.tre' (default: False)
  -p, --skip-hierarchy [ ...]
                        One or more hierarchies to skip in the report (from ganon classify --hierarchy-labels) (default:
                        [])
  -k, --keep-hierarchy [ ...]
                        One or more hierarchies to keep in the report (from ganon classify --hierarchy-labels) (default:
                        [])
  -c, --top-percentile 
                        Top percentile filter, based on percentage/relative abundance. Applied only at default ranks
                        [domain phylum class order family genus species assembly] (default: 0)
  -n, --normalize       Ignore the number of unclassified reads, normalizing the output to 100%. Use with caution, can
                        drastically change abundance estimations. (default: False)

optional arguments:
  --verbose             Verbose output mode (default: False)
  --quiet               Quiet output mode (default: False)

filter arguments:
  --min-count           Minimum number/percentage of counts to keep an taxa [values between 0-1 for percentage, >1
                        specific number] (default: 0)
  --max-count           Maximum number/percentage of counts to keep an taxa [values between 0-1 for percentage, >1
                        specific number] (default: 0)
  --names [ ...]        Show only entries matching exact names of the provided list (default: [])
  --names-with [ ...]   Show entries containing full or partial names of the provided list (default: [])
  --taxids [ ...]       One or more taxids to report (including children taxa) (default: [])

ganon table

usage: ganon table [-h] -i [ ...] [-e ] -o OUTPUT_FILE [-l ] [-f ] [-t ] [-a ] [-m ] [-r ] [-n] [--header ]
                   [--unclassified-label ] [--filtered-label ] [--skip-zeros] [--transpose] [--verbose] [--quiet]
                   [--min-count ] [--max-count ] [--names [ ...]] [--names-with [ ...]] [--taxids [ ...]]

options:
  -h, --help            show this help message and exit

required arguments:
  -i, --input [ ...]    Input file(s) and/or folder(s). '.tre' file(s) from ganon report.
  -e, --input-extension 
                        Required if --input contains folder(s). Wildcards/Shell Expansions not supported (e.g. *).
                        (default: tre)
  -o, --output-file OUTPUT_FILE
                        Output filename for the table

output arguments:
  -l, --output-value    Output value on the table [percentage, counts]. percentage values are reported between [0-1]
                        (default: counts)
  -f, --output-format   Output format [tsv, csv] (default: tsv)
  -t, --top-sample      Top hits of each sample individually (default: 0)
  -a, --top-all         Top hits of all samples (ranked by percentage) (default: 0)
  -m, --min-frequency   Minimum number/percentage of files containing an taxa to keep the taxa [values between 0-1 for
                        percentage, >1 specific number] (default: 0)
  -r, --rank            Define specific rank to report. Empty will report all ranks. (default: None)
  -n, --no-root         Do not report root node entry and lineage. Direct and shared matches to root will be accounted
                        as unclassified (default: False)
  --header              Header information [name, taxid, lineage] (default: name)
  --unclassified-label 
                        Add column with unclassified count/percentage with the chosen label. May be the same as
                        --filtered-label (e.g. unassigned) (default: None)
  --filtered-label      Add column with filtered count/percentage with the chosen label. May be the same as
                        --unclassified-label (e.g. unassigned) (default: None)
  --skip-zeros          Do not print lines with only zero count/percentage (default: False)
  --transpose           Transpose output table (taxa as cols and files as rows) (default: False)

optional arguments:
  --verbose             Verbose output mode (default: False)
  --quiet               Quiet output mode (default: False)

filter arguments:
  --min-count           Minimum number/percentage of counts to keep an taxa [values between 0-1 for percentage, >1
                        specific number] (default: 0)
  --max-count           Maximum number/percentage of counts to keep an taxa [values between 0-1 for percentage, >1
                        specific number] (default: 0)
  --names [ ...]        Show only entries matching exact names of the provided list (default: [])
  --names-with [ ...]   Show entries containing full or partial names of the provided list (default: [])
  --taxids [ ...]       One or more taxids to report (including children taxa) (default: [])