CPC - Coding Potential Calculator

This guide gives a quick look at how to use the CPC online web tools. If you want to run CPC on your own computer, please refer to our Installation Guide.

FASTA: "A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column." (NCBI) For more information about the FASTA format, refer to the Wikipedia FASTA format description or the NCBI FASTA format description.
Email: Because calculating coding potential is computationally complex, a task may take several minutes or more to finish. If you provide an email address here, you can receive the results of long-running calculations by email. The email address is used only to send the results; we will not keep it or disclose it to others.
Task ID: The online Coding Potential Calculator automatically assigns a unique ID to each calculation task. Users can use the Task ID to retrieve their results. For more details, please refer to our Quick Guide.

NOTES: A Task ID may expire 7 days after it is created. The results of an expired task are removed from the CPC server and can no longer be retrieved.
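The FASTA layout quoted in the FASTA entry above can be sketched in Python. This is only an illustration of the format, not the parser CPC itself uses; the function name `parse_fasta` and the example records are made up for this sketch.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each description line (without the leading '>') to its sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A '>' in the first column starts a new record.
            header = line[1:].strip()
            if header in records:
                raise ValueError(f"duplicate description line: {header!r}")
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before any '>' description line")
        else:
            # Sequence data may span several lines; join them.
            records[header] += line
    return records


# Hypothetical input with two records.
example = """>transcript_1 hypothetical example
ATGGCCATTGTAATGGGCCGC
TGAAAGGGTGCCCGATAG
>transcript_2
GGGAAACCC
"""

records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, len(seq))
```

Checking the input this way before submitting can catch a missing description line, which is a common reason for a sequence file to be rejected.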
Hit Num: The number of BLAST hits found for the input sequence.
Hit Score: For a true protein-coding transcript, the hits are also likely to be of higher quality; i.e., the HSPs (High-scoring Segment Pairs) overall tend to have lower E-values. Thus we define the feature HIT SCORE as follows:

$$\text{HIT SCORE} = \operatorname{mean}_{i \in \{0,1,2\}} \{S_i\} = \frac{\sum_{i=0}^{2} S_i}{3}, \qquad S_i = \operatorname{mean}_{j} \{-\log_{10} E_{ij}\}$$

where $E_{ij}$ is the E-value of the $j$th HSP in frame $i$, $S_i$ measures the average quality of the HSPs in frame $i$, and HIT SCORE is the average of $S_i$ across the three frames. The higher the HIT SCORE, the better the overall quality of the hits and the more likely the transcript is protein-coding.
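The definition above can be sketched in Python. This is a minimal illustration, not CPC's own code. It assumes that the quality of an HSP is taken as -log10 of its E-value, that a frame without HSPs contributes a quality of 0, and that E-values are clamped to a hypothetical floor (`E_VALUE_FLOOR`) so that a reported E-value of 0 stays finite. The function names and the example E-values are made up.

```python
import math
from typing import Dict, List

E_VALUE_FLOOR = 1e-250  # hypothetical floor; keeps -log10(0.0) finite


def frame_quality(e_values: List[float]) -> float:
    """S_i: mean of -log10(E_ij) over the HSPs of one frame (0.0 if no HSPs)."""
    if not e_values:
        return 0.0  # assumption: a frame without hits has quality 0
    return sum(-math.log10(max(e, E_VALUE_FLOOR)) for e in e_values) / len(e_values)


def hit_score(hsps_by_frame: Dict[int, List[float]]) -> float:
    """HIT SCORE: average of S_i across the three reading frames 0, 1, 2."""
    return sum(frame_quality(hsps_by_frame.get(i, [])) for i in range(3)) / 3


# Made-up E-values: strong hits in one frame vs. weak hits in every frame.
coding_like = {0: [1e-50, 1e-30], 1: [1e-2], 2: []}
noncoding_like = {0: [1e-1], 1: [1e-2], 2: [1e-1]}

print(hit_score(coding_like), hit_score(noncoding_like))
```

With these made-up inputs, the transcript with strong hits scores much higher than the one with only weak chance hits, which is the behaviour the feature is meant to capture.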
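Frame Score: For a true protein-coding transcript, most of the hits are likely to reside within one frame, whereas for a true noncoding transcript, even if it matches certain known protein sequence segments by chance, these chance hits are likely to be scattered across all three frames. Thus we define the feature FRAME SCORE to measure the distribution of the HSPs among the three reading frames:

$$\text{FRAME SCORE} = \operatorname{variance}_{i \in \{0,1,2\}} \{S_i\} = \frac{\sum_{i=0}^{2} (S_i - \bar{S})^2}{2}$$

where $S_i$ is the average quality of the HSPs in frame $i$, as defined for HIT SCORE, and $\bar{S}$ is the mean of $S_i$ over the three frames. The higher the FRAME SCORE, the more concentrated the hits are and the more likely the transcript is protein-coding.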
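As a rough sketch of this feature, the snippet below measures how unevenly the hit quality is spread over the three frames. It is not CPC's own code. It assumes that FRAME SCORE is the sample variance of the per-frame qualities, that each HSP's quality is -log10 of its E-value, and that a frame without HSPs has quality 0. `E_VALUE_FLOOR`, the function names, and the example E-values are hypothetical.

```python
import math
import statistics
from typing import Dict, List

E_VALUE_FLOOR = 1e-250  # hypothetical floor; keeps -log10(0.0) finite


def frame_quality(e_values: List[float]) -> float:
    """Mean of -log10(E-value) over the HSPs of one frame (0.0 if no HSPs)."""
    if not e_values:
        return 0.0  # assumption: a frame without hits has quality 0
    return sum(-math.log10(max(e, E_VALUE_FLOOR)) for e in e_values) / len(e_values)


def frame_score(hsps_by_frame: Dict[int, List[float]]) -> float:
    """Sample variance (divisor n - 1 = 2) of the three per-frame qualities."""
    qualities = [frame_quality(hsps_by_frame.get(i, [])) for i in range(3)]
    return statistics.variance(qualities)


# Made-up E-values: hits concentrated in frame 0 vs. scattered over all frames.
concentrated = {0: [1e-50, 1e-30], 1: [1e-2], 2: []}
scattered = {0: [1e-1], 1: [1e-2], 2: [1e-1]}

print(frame_score(concentrated), frame_score(scattered))
```

Hits concentrated in a single frame give a large variance, while chance hits of similar quality in all three frames give a variance close to zero.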
FrameFinder ORF Coverage: The coverage of the ORF predicted by FrameFinder. A large COVERAGE OF THE PREDICTED ORF is an indicator of good ORF quality (Slater, G.S.C. (2000) Algorithms for the Analysis of Expressed Sequence Tags, University of Cambridge, Cambridge). For more information, refer to the Pasteur FrameFinder Man Page.
FrameFinder LOG-ODDS SCORE: As suggested by FrameFinder's author, the LOG-ODDS SCORE is an indicator of the quality of a predicted ORF: the higher the score, the higher the quality. For more information, refer to the Pasteur FrameFinder Man Page.
FrameFinder ORF Type: The ORF Type is the INTEGRITY OF THE PREDICTED ORF, which indicates whether the ORF begins with a start codon and ends with an in-frame stop codon.
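The coverage and integrity features above can be illustrated with a short Python sketch. It does not reproduce FrameFinder's actual procedure. It assumes the standard genetic code (start codon ATG, stop codons TAA, TAG, TGA) and assumes that coverage means the ORF length divided by the transcript length. The function names and the example transcript are hypothetical.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def orf_is_complete(orf: str) -> bool:
    """True if the ORF starts with ATG and ends with an in-frame stop codon."""
    orf = orf.upper()
    if len(orf) < 6 or len(orf) % 3 != 0:
        return False  # the last codon would not be in frame with the first
    return orf.startswith(START_CODON) and orf[-3:] in STOP_CODONS


def orf_coverage(orf_start: int, orf_end: int, transcript_length: int) -> float:
    """Fraction of the transcript covered by the ORF (0-based, end-exclusive)."""
    if not 0 <= orf_start < orf_end <= transcript_length:
        raise ValueError("ORF coordinates must lie within the transcript")
    return (orf_end - orf_start) / transcript_length


# Made-up transcript: 2 nt upstream, a 24 nt ORF, 3 nt downstream.
transcript = "GG" + "ATGGCCATTGTAATGGGCCGCTGA" + "AAG"
orf_start, orf_end = 2, 26

print(orf_is_complete(transcript[orf_start:orf_end]))
print(orf_coverage(orf_start, orf_end, len(transcript)))
```

An ORF that lacks either the start codon or the in-frame stop codon would be reported as incomplete by this check, which corresponds to a lower-integrity ORF Type.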
HTML View: The HTML view of the online Coding Potential Calculator results.
UTRdb: UTResource-DB (UTRdb) is a specialized database of 5' and 3' UnTRanslated sequences of eukaryotic mRNAs, cleaned and annotated based on RefSeq. For more details, refer to the UTResource-DB Web Site.
RNAdb: A comprehensive mammalian noncoding RNA database that includes >800 unique experimentally studied ncRNAs, >1100 putative antisense ncRNAs, and almost 20000 putative ncRNAs identified in high-quality murine and human cDNA libraries. For more details, refer to the RNAdb Web Site.