Upload (or delete) annotated genome

Select a proteome/genome file and upload it

This service takes several minutes to an hour to complete

Options
Input file (see the supported formats below)
Short name for the genome (alphanumeric characters and underscores only)
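
The short-name restriction can be checked locally before submitting the form. The following is a minimal sketch, not the service's own validation code: the function name `is_valid_short_name` is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits and that an empty name is rejected.

```python
import re

# Assumption: "alphanumeric" is taken to mean ASCII letters and digits only.
SHORT_NAME_RE = re.compile(r"[A-Za-z0-9_]+")


def is_valid_short_name(name):
    """Return True if name contains only letters, digits and underscores."""
    # fullmatch requires the whole string to match, so an empty string fails.
    return SHORT_NAME_RE.fullmatch(name) is not None
```

For example, `is_valid_short_name("Ecoli_K12")` is accepted, whereas a name containing a space or a dot, such as `"E. coli"`, is rejected.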

Instructions
What	Comment
Principle	The build a model (BUILD) method expects an annotated genome as one of its arguments. Two input formats are supported: EMBL and GenBank. If a genome is made of one or several entities (i.e. contigs, scaffolds, chromosomes or plasmids), a single file containing the concatenated entries can be supplied. Uploading zipped or gzipped genome files is supported, but without any guarantee.
Data sources	Genomes from the following public resources have been tested and include numerous species: (1) EMBL-formatted genomes can be obtained from the Genomes Pages of ENA for many species of bacteria and archaea (column: Sequence / Plain); (2) EMBL-formatted genomes can be obtained from the Ensembl Genomes FTP site; (3) GenBank-formatted genomes can be obtained from the Assembly pages of NCBI for many species of bacteria and archaea (follow the link Download the GenBank assembly or Download the RefSeq assembly and choose the file with the _genomic.gbff.gz extension).
Extracted fields	The following fields are extracted from the uploaded genome: (1) the ID of every entry is read from the ID (EMBL) or LOCUS (GenBank) field; (2) the protein sequences are obtained from the /translation sub-field of the CDS field; (3) the locations and strand of genes are retrieved from the gene field; (4) a protein identifier is created from the CDS field by looking for one of the /locus_tag, /gene and /protein_id qualifiers. All other information is currently ignored.
Nota Bene	Many software tools produce *pseudo*-EMBL or -GenBank formats: these nonstandard formats may not be accepted here, and no support is provided for them. For a given gene, the translated CDS sequence from an entry and the protein sequence from the corresponding UniProt entry are not necessarily identical! The EMBL and GenBank databases act as archives that preserve the genome annotations as originally deposited, while UniProt is a curated resource that is regularly updated.
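
Before uploading, it can help to check that a file exposes the fields described in the table. The following is a minimal sketch using only the Python standard library; it is not the parser used by the service. All function names are hypothetical, and the order in which the /locus_tag, /gene and /protein_id qualifiers are tried is an assumption, since the page does not state a priority.

```python
import gzip


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file, based on the gzip magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def entry_ids(lines):
    """Yield (format, entry_id) for every entry header line.

    EMBL entries start with an 'ID   <id>; ...' line,
    GenBank entries with a 'LOCUS  <id> ...' line.
    """
    for line in lines:
        if line.startswith("ID   "):
            yield "embl", line[5:].split(";")[0].strip()
        elif line.startswith("LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                yield "genbank", fields[1]


def protein_identifier(qualifiers, order=("locus_tag", "gene", "protein_id")):
    """Pick an identifier from a dict of CDS qualifiers.

    Assumption: qualifiers are tried in the given order; the actual
    priority used by the service is not documented on this page.
    """
    for key in order:
        value = qualifiers.get(key)
        if value:
            return value
    return None
```

A typical check would be `with open_maybe_gzip("genome.gbff.gz") as fh: print(list(entry_ids(fh)))`, where the file name is only a placeholder. A concatenated multi-entry file should yield one identifier per contig, scaffold, chromosome or plasmid.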