Parameters

A full list of parameters can be found in the table at the bottom of this page. In practice, however, only a few of them are relevant to most users of EUKulele. The required parameters are:

mets_or_mags: Whether the analysis is run on metatranscriptomic samples (“mets”) or metagenomic samples (“mags”).

sample_dir: The directory where the input samples are located (`samples` in the configuration file).
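The table below marks both `-m/--mets_or_mags` and `-s/--sample_dir` as required. The following minimal sketch assembles those two flags into a command line from Python. It assumes the tool is installed as an `EUKulele` executable on the `PATH` and that no extra subcommand is needed, so check `EUKulele --help` for the exact form on your installation. The helper `build_required_command` is hypothetical and is not part of EUKulele.

```python
import shlex
from typing import List


def build_required_command(mets_or_mags: str, sample_dir: str) -> List[str]:
    """Hypothetical helper: build an argv list with EUKulele's two required flags."""
    # Only the two documented values are accepted.
    if mets_or_mags not in ("mets", "mags"):
        raise ValueError('mets_or_mags must be "mets" or "mags"')
    # Assumption: the executable is called "EUKulele" and needs no subcommand.
    return ["EUKulele", "-m", mets_or_mags, "-s", sample_dir]


if __name__ == "__main__":
    cmd = build_required_command("mets", "samples/")
    # Show a shell-quoted version; subprocess.run(cmd, check=True) would execute it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string lets the arguments be passed to `subprocess.run` without any shell quoting issues.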

Full list of `EUKulele` parameters
| Flag | Configuration File Entry | Meaning |
| --- | --- | --- |
| `--config` | N/A | The path to the configuration file from which the equivalents of the command-line arguments are read. |
| `-m/--mets_or_mags` | `mets_or_mags` | Required. Whether metatranscriptomic (“mets”) or metagenomic (“mags”) samples are used as input. |
| `-s/--sample_dir` | `samples` | Required. The directory where the samples (metagenomic or metatranscriptomic, depending on `mets_or_mags`) are located. |
| `-o/--out_dir` | `output` | The directory where output will be stored. Defaults to a folder called `output` in the present working directory. |
| `--reference_dir` | `reference` | The directory where the reference FASTA is stored, or a keyword naming a dataset to be downloaded and used. A directory path is only used if the reference is not downloaded automatically. |
| `--ref_fasta` | `ref_fasta` | The name of the reference FASTA file in `reference_dir`. Defaults to `reference.pep.fa` if not specified, or is set according to the downloaded file if a keyword is used. |
| `--database` | `database` | An optional argument specifying the database name. If it is one of the supported databases (currently “mmetsp”, “eukprot”, or “phylodb”), it is downloaded automatically. Otherwise, MMETSP is used as the default. |
| `--run_transdecoder` | `run_transdecoder` (set to 0 or 1) | Whether `TransDecoder` should translate input nucleotide sequences before `blastp` (i.e., protein-protein alignment with the chosen aligner) is run. If the flag is included on the command line or set to 1 in the configuration file, `TransDecoder` is run. Otherwise, `blastp` is run if protein files are found (files in the sample directory ending in the `--p_ext` extension, below), or `blastx` is run if only nucleotide files are found. |
| `--nucleotide_extension/--n_ext` | `nucleotide_extension` | The file extension for samples in nucleotide format (metatranscriptomes). Defaults to `.fasta`. |
| `--protein_extension/--p_ext` | `protein_extension` | The file extension for samples in protein format (metatranscriptomes). Defaults to `.faa`. |
| `-f/--force_rerun` | `force_rerun` | If included on the command line or set to 1 in the configuration file, all steps are re-run, regardless of whether output is already present. |
| `--use_salmon_counts` | `use_salmon_counts` | If included on the command line or set to 1 in the configuration file, classifications are based both on the number of classified transcripts and on `salmon` read counts. |
| `--salmon_dir` | `salmon_dir` | The directory containing the `salmon` output/quantification files. Must be specified if `--use_salmon_counts` is set. |
| `--names_to_reads` | `names_to_reads` | A file mapping each transcript name to its number of `salmon`-quantified reads. It can be generated manually with the `names_to_reads.py` script, or is generated automatically if it does not exist. |
| `--transdecoder_orfsize` | `transdecoder_orfsize` | The minimum size of an open reading frame (ORF) detected by `TransDecoder`. Only relevant if `--run_transdecoder` is specified. |
| `--alignment_choice` | `alignment_choice` | The aligner to use, currently `BLAST` or `DIAMOND`. |
| `--cutoff_file` | `cutoff_file` | A `YAML` file containing the percent identity cutoffs for the various taxonomic levels. A default file is provided in `src/EUKulele/static/`; the path to a user-specified file may be given instead. |
| `--filter_metric` | `filter_metric` | The metric used to filter hits by quality before taxonomic estimation: `evalue`, `pid`, or `bitscore`. Defaults to `evalue`. |
| `--consensus_cutoff` | `consensus_cutoff` | The fraction of a contig's taxonomic matches that must agree for a discrepancy among its hits to be overlooked when classifying it. Defaults to 0.75 (75%). |
| `--busco_file` | `busco_file` | A tab-separated file listing each organism/group of interest and the taxonomic level of the query. Overrides `--organisms` and `--taxonomy_organisms` (the next two entries). |
| `--organisms` | `organisms` | A list of organisms/groups for which the BUSCO completeness of matching contigs is tested. |
| `--taxonomy_organisms` | `taxonomy_organisms` | The taxonomic levels of the groups listed in `--organisms`; also a list. |
| `-i/--individual_or_summary` | `individual_or_summary` | Whether BUSCO assessment is performed only for the top organism matches (summary, the default) or for the organisms and taxonomic levels given by `--organisms`/`--taxonomy_organisms` or `--busco_file` (individual). Specifying `-i` selects individual mode. |
| `--busco_threshold` | `busco_threshold` | The BUSCO completeness threshold for a set of contigs to be considered reasonably BUSCO-complete. |
| `--tax_table` | `tax_table` | The name of the formatted taxonomy table. Defaults to `tax-table.txt`. If this file is not found, it can be generated from the reference FASTA and the original taxonomy file with the provided script `create_protein_file.py`. Alternatively, if the specified database is one of the supported databases, the file is downloaded automatically. |
| `--protein_map` | `protein_map` | The name of the JSON file containing protein correspondences. Defaults to `protein-map.json`. If this file is not found, it can be generated from the reference FASTA and the original taxonomy file with the provided script `create_protein_file.py`. Alternatively, if the specified database is one of the supported databases, the file is downloaded automatically. |
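The first two columns of the table describe the same settings in two forms. As a hedged illustration of that correspondence, the sketch below converts a configuration dictionary (for example, the result of loading the configuration file, assumed here to be YAML) into the equivalent command-line flags. The function `config_to_argv` and the two mapping dictionaries are hypothetical and cover only a subset of the table. Entries documented as "set to 0 or 1" are treated as switches that take no value. EUKulele itself reads the file directly through `--config`, so this is only a way to see which flag each entry stands for.

```python
from typing import Any, Dict, List

# Subset of the table: configuration file entry -> long command-line flag.
VALUE_FLAGS = {
    "mets_or_mags": "--mets_or_mags",
    "samples": "--sample_dir",
    "output": "--out_dir",
    "reference": "--reference_dir",
    "database": "--database",
    "alignment_choice": "--alignment_choice",
    "filter_metric": "--filter_metric",
    "consensus_cutoff": "--consensus_cutoff",
}

# Entries set to 0 or 1 in the file whose flags are simply present or absent.
SWITCH_FLAGS = {
    "run_transdecoder": "--run_transdecoder",
    "force_rerun": "--force_rerun",
    "use_salmon_counts": "--use_salmon_counts",
}


def config_to_argv(config: Dict[str, Any]) -> List[str]:
    """Hypothetical: translate configuration entries into equivalent flags."""
    argv: List[str] = []
    for entry, value in config.items():
        if entry in SWITCH_FLAGS:
            # 1 means "include the flag"; 0 means leave it out.
            if int(value) == 1:
                argv.append(SWITCH_FLAGS[entry])
        elif entry in VALUE_FLAGS:
            argv.extend([VALUE_FLAGS[entry], str(value)])
        else:
            raise KeyError(f"entry not covered by this sketch: {entry}")
    return argv


example = {"mets_or_mags": "mets", "samples": "samples", "run_transdecoder": 1, "force_rerun": 0}
print(config_to_argv(example))
```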
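The `--run_transdecoder` entry describes a three-way decision. The function below restates that documented rule in code, using the default `--n_ext` (`.fasta`) and `--p_ext` (`.faa`) extensions. It is a hypothetical sketch of the rule as worded in the table, not EUKulele's source. When `DIAMOND` is the chosen aligner, "blastp" and "blastx" refer to its equivalent protein and translated search modes.

```python
from typing import Iterable


def choose_translation_step(run_transdecoder: bool, sample_files: Iterable[str],
                            n_ext: str = ".fasta", p_ext: str = ".faa") -> str:
    """Hypothetical restatement of the documented --run_transdecoder rule."""
    files = list(sample_files)
    has_protein = any(name.endswith(p_ext) for name in files)
    has_nucleotide = any(name.endswith(n_ext) for name in files)
    if not (has_protein or has_nucleotide):
        raise FileNotFoundError("no sample files with the nucleotide or protein extension")
    if run_transdecoder:
        # Nucleotide samples are translated first, then aligned protein-to-protein.
        return "TransDecoder, then blastp"
    if has_protein:
        # Protein-format samples are aligned directly (protein vs. protein).
        return "blastp"
    # Only nucleotide-format samples remain: translated search against proteins.
    return "blastx"


print(choose_translation_step(False, ["sample1.faa", "sample2.fasta"]))
print(choose_translation_step(False, ["sample1.fasta"]))
```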
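The `--names_to_reads` file pairs each transcript with its `salmon`-quantified read count. The sketch below shows where such counts come from. It reads a `salmon` `quant.sf` table, whose tab-separated columns include `Name` and `NumReads`, into a name-to-reads dictionary. The exact layout of the file written by `names_to_reads.py` is not reproduced here. The function name `transcript_read_counts` and the two example rows are made up for illustration.

```python
import csv
import io
from typing import Dict, TextIO


def transcript_read_counts(quant_sf: TextIO) -> Dict[str, float]:
    """Hypothetical: map transcript name -> NumReads from a salmon quant.sf table."""
    reader = csv.DictReader(quant_sf, delimiter="\t")
    # NumReads is salmon's estimated read count, so it may be fractional.
    return {row["Name"]: float(row["NumReads"]) for row in reader}


# Made-up example rows in quant.sf column order.
example = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "contig_1\t1200\t1050.0\t35.2\t410.0\n"
    "contig_2\t800\t650.0\t12.7\t92.5\n"
)
print(transcript_read_counts(example))
```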
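The `--cutoff_file` entry holds percent identity cutoffs for taxonomic levels. One plausible reading is that a hit is classified to the most specific level whose cutoff its percent identity meets, and the sketch below follows that reading. This interpretation is an assumption, as is the idea that a higher cutoff means a more specific level. The numbers in `EXAMPLE_CUTOFFS` are placeholders, not the values shipped in `src/EUKulele/static/`, and `deepest_supported_level` is a hypothetical name.

```python
from typing import Dict, Optional

# Placeholder cutoffs (percent identity); NOT the values in EUKulele's static file.
EXAMPLE_CUTOFFS = {"species": 95.0, "genus": 80.0, "family": 65.0, "order": 50.0}


def deepest_supported_level(pid: float, cutoffs: Dict[str, float]) -> Optional[str]:
    """Hypothetical: most specific level whose percent-identity cutoff pid meets."""
    # Assumption: a higher cutoff corresponds to a more specific level, so check
    # levels from the highest cutoff down and stop at the first one satisfied.
    for level, cutoff in sorted(cutoffs.items(), key=lambda kv: kv[1], reverse=True):
        if pid >= cutoff:
            return level
    return None  # identity too low for any listed level


print(deepest_supported_level(85.0, EXAMPLE_CUTOFFS))
```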
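The three `--filter_metric` choices differ in which direction counts as "better". A lower e-value indicates a stronger hit, while a higher percent identity or bit score does. The sketch below encodes that difference with one simple, assumed filtering rule: keep only the hits tied for the best value. The table does not state EUKulele's actual filtering rule, and `best_hits` is a hypothetical name.

```python
from typing import Dict, List

# Direction of "better" for each documented metric.
HIGHER_IS_BETTER = {"evalue": False, "pid": True, "bitscore": True}


def best_hits(hits: List[Dict[str, float]], metric: str = "evalue") -> List[Dict[str, float]]:
    """Hypothetical: keep only the hit(s) tied for the best value of `metric`."""
    if metric not in HIGHER_IS_BETTER:
        raise ValueError("metric must be evalue, pid, or bitscore")
    if not hits:
        return []
    values = [hit[metric] for hit in hits]
    best = max(values) if HIGHER_IS_BETTER[metric] else min(values)
    return [hit for hit in hits if hit[metric] == best]


hits = [
    {"evalue": 1e-30, "pid": 80.0, "bitscore": 200.0},
    {"evalue": 1e-10, "pid": 90.0, "bitscore": 150.0},
]
print(best_hits(hits, "evalue"))
print(best_hits(hits, "pid"))
```

Because the metrics point in opposite directions, the same set of hits can yield different survivors depending on which metric is chosen.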
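The `--consensus_cutoff` idea can be illustrated with a majority vote. If the most common taxonomic label among a contig's hits makes up at least the cutoff fraction of those hits, the disagreeing hits are overlooked and that label is kept. Otherwise, no consensus label is assigned. This is a minimal sketch under that assumption. Whether EUKulele compares with `>=` or `>` is not stated in the table, and `consensus_label` is a hypothetical name.

```python
from collections import Counter
from typing import List, Optional


def consensus_label(labels: List[str], consensus_cutoff: float = 0.75) -> Optional[str]:
    """Hypothetical: majority label if its share of hits reaches the cutoff, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    # Assumption: reaching the cutoff exactly counts as agreement.
    return label if count / len(labels) >= consensus_cutoff else None


print(consensus_label(["Bacillariophyta", "Bacillariophyta", "Bacillariophyta", "Dinophyceae"]))
print(consensus_label(["Bacillariophyta", "Dinophyceae"]))
```

With the default of 0.75, three agreeing hits out of four are enough, while an even split leaves the contig without a consensus label.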
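A `--busco_file` replaces the paired `--organisms` and `--taxonomy_organisms` lists with one tab-separated file. The sketch below splits such a file back into those two lists. It assumes two columns (organism/group first, taxonomic level second) with no header row, and these layout details are assumptions rather than EUKulele's documented format. `read_busco_file` and the example rows are hypothetical.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_busco_file(handle: TextIO) -> Tuple[List[str], List[str]]:
    """Hypothetical: split a two-column TSV into (organisms, taxonomy_organisms)."""
    organisms: List[str] = []
    levels: List[str] = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected organism and taxonomic level, got: {row}")
        organisms.append(row[0].strip())
        levels.append(row[1].strip())
    return organisms, levels


example = io.StringIO("Phaeodactylum\tgenus\n\nBacillariophyceae\tclass\n")
print(read_busco_file(example))
```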