PMID- 27249328
OWN - NLM
STAT- MEDLINE
DCOM- 20171017
LR  - 20220317
IS  - 1557-8666 (Electronic)
IS  - 1066-5277 (Linking)
VI  - 23
IP  - 8
DP  - 2016 Aug
TI  - A Complex Prime Numerical Representation of Amino Acids for Protein Function Comparison.
PG  - 669-77
LID - 10.1089/cmb.2015.0178 [doi]
AB  - Computationally assessing the functional similarity between proteins is an important task of bioinformatics research. It can help molecular biologists transfer knowledge on certain proteins to others and hence reduce the amount of tedious and costly benchwork. Representation of amino acids, the building blocks of proteins, plays an important role in achieving this goal. Compared with symbolic representation, representing amino acids numerically can expand our ability to analyze proteins, including comparing their functional similarity. Among the state-of-the-art methods, electro-ion interaction pseudopotential (EIIP) is widely adopted for the numerical representation of amino acids. However, because of its design, EIIP can suffer from degeneracy, in which two different amino acid sequences have the same numerical representation. In light of this challenge, we propose a complex prime numerical representation (CPNR) of amino acids, inspired by the similarity between a pattern among prime numbers and the number of codons of amino acids. To empirically assess the effectiveness of the proposed method, we compare CPNR against EIIP. Experimental results demonstrate that the proposed method CPNR always achieves better performance than EIIP. We also develop a framework to combine the advantages of CPNR and EIIP, which enables us to improve the performance and study the unique characteristics of different representations.
FAU - Chen, Duo
AU  - Chen D
AD  - School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.
FAU - Wang, Jiasong
AU  - Wang J
AD  - Department of Mathematics, Nanjing University, Nanjing, China.
FAU - Yan, Ming
AU  - Yan M
AD  - Department of Biotechnology & Pharmaceutical Engineering, Nanjing Tech University, Nanjing, China.
FAU - Bao, Forrest Sheng
AU  - Bao FS
AD  - Department of Electrical and Computer Engineering, University of Akron, Akron, Ohio.
LA  - eng
PT  - Journal Article
DEP - 20160601
PL  - United States
TA  - J Comput Biol
JT  - Journal of computational biology : a journal of computational molecular cell biology
JID - 9433358
RN  - 0 (Amino Acids)
RN  - 0 (Proteins)
SB  - IM
MH  - *Algorithms
MH  - Amino Acids/*chemistry
MH  - Computational Biology/*methods
MH  - Databases, Protein
MH  - Humans
MH  - Models, Molecular
MH  - Proteins/*chemistry/*metabolism
MH  - Sequence Analysis, Protein/*methods
OTO - NOTNLM
OT  - CPNR
OT  - EIIP
OT  - numerical representation
OT  - protein
OT  - sequence comparison
EDAT- 2016/06/02 06:00
MHDA- 2017/10/19 06:00
CRDT- 2016/06/02 06:00
PHST- 2016/06/02 06:00 [entrez]
PHST- 2016/06/02 06:00 [pubmed]
PHST- 2017/10/19 06:00 [medline]
AID - 10.1089/cmb.2015.0178 [doi]
PST - ppublish
SO  - J Comput Biol. 2016 Aug;23(8):669-77. doi: 10.1089/cmb.2015.0178. Epub 2016 Jun 1.