Auxiliary Scripts

A few scripts are included with EUKulele that can be used outside of the software, or before invoking the program to obtain the required configuration.

Create Protein Table

create_protein_table.py is used to generate a taxonomy table and protein map JSON file from a provided taxonomy table tab-delimited file, and a FASTA file containing database protein sequences. The file is invoked using the following arguments:

  • --infile_peptide: the FASTA file containing the database peptide sequences
  • --infile_taxonomy: the tab-separated file containing the taxonomy of each peptide sequence in the database
  • --outfile_json: the protein map JSON file to be written
  • --output: the formatted taxonomy table to be written
  • --delim: the delimiter that separates tokens in the FASTA headers in the peptide sequence file
  • --col_source_id: the column in your tab-separated taxonomy file containing the name of the strain
  • --taxonomy_col_id: the column containing the strain taxonomy in the tab-separated taxonomy file
  • --column: either a character name for the token in the FASTA headers of the peptide sequence file containing the strain name (matching what is in the taxonomy file), or a numeric value indicating the order of the token
  • --reformat_tax: if included, indicates that the taxonomy table should be reformatted instead of left as-is
  • --euk-prot: if included, means we are using input from the EukProt database, which has different formatting features.

Download Database

Used within EUKulele, download_database.sh may also be used independently of EUKulele to download one of the available databases provided with EUKulele (see Section databases). Invoked via:

download_database.sh <DATABASE> <REF_FASTA> <REF_TABLE> <REF_FASTA_URL> <REF_TABLE_URL> <REFERENCE_DIR>

Where <DATABASE> is the name of the database, <REF_FASTA> is the name of the reference FASTA file to be generated/downloaded, REF_TABLE is the same but for the tab-delimited taxonomy table, <REF_FASTA_URL> and <REF_TABLE_URL> are the URLs to download said databases from (provided in reference_url.yaml), and <REFERENCE_DIR> is where to store the resulting database.