PMID- 23331707
OWN - NLM
STAT- MEDLINE
DCOM- 20130918
LR  - 20211021
IS  - 1471-2105 (Electronic)
IS  - 1471-2105 (Linking)
VI  - 14
DP  - 2013 Jan 19
TI  - Extending the BEAGLE library to a multi-FPGA platform.
PG  - 25
LID - 10.1186/1471-2105-14-25 [doi]
AB  - BACKGROUND: Maximum Likelihood (ML)-based phylogenetic inference using Felsenstein's pruning algorithm is a standard method for estimating the evolutionary relationships amongst a set of species based on DNA sequence data, and is used in popular applications such as RAxML, PHYLIP, GARLI, BEAST, and MrBayes. The Phylogenetic Likelihood Function (PLF) and its associated scaling and normalization steps comprise the computational kernel for these tools. These computations are data intensive but contain fine grain parallelism that can be exploited by coprocessor architectures such as FPGAs and GPUs. A general purpose API called BEAGLE has recently been developed that includes optimized implementations of Felsenstein's pruning algorithm for various data parallel architectures. In this paper, we extend the BEAGLE API to a multiple Field Programmable Gate Array (FPGA)-based platform called the Convey HC-1. RESULTS: The core calculation of our implementation, which includes both the phylogenetic likelihood function (PLF) and the tree likelihood calculation, has an arithmetic intensity of 130 floating-point operations per 64 bytes of I/O, or 2.03 ops/byte. Its performance can thus be calculated as a function of the host platform's peak memory bandwidth and the implementation's memory efficiency, as 2.03 x peak bandwidth x memory efficiency. Our FPGA-based platform has a peak bandwidth of 76.8 GB/s and our implementation achieves a memory efficiency of approximately 50%, which gives an average throughput of 78 Gflops. This represents a ~40X speedup when compared with BEAGLE's CPU implementation on a dual Xeon 5520 and 3X speedup versus BEAGLE's GPU implementation on a Tesla T10 GPU for very large data sizes. The power consumption is 92 W, yielding a power efficiency of 1.7 Gflops per Watt. CONCLUSIONS: The use of data parallel architectures to achieve high performance for likelihood-based phylogenetic inference requires high memory bandwidth and a design methodology that emphasizes high memory efficiency. To achieve this objective, we integrated 32 pipelined processing elements (PEs) across four FPGAs. For the design of each PE, we developed a specialized synthesis tool to generate a floating-point pipeline with resource and throughput constraints to match the target platform. We have found that using low-latency floating-point operators can significantly reduce FPGA area and still meet timing requirements on the target platform. We found that this design methodology can achieve performance that exceeds that of a GPU-based coprocessor.
FAU - Jin, Zheming
AU  - Jin Z
AD  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, USA.
FAU - Bakos, Jason D
AU  - Bakos JD
LA  - eng
PT  - Journal Article
PT  - Research Support, Non-U.S. Gov't
DEP - 20130119
PL  - England
TA  - BMC Bioinformatics
JT  - BMC bioinformatics
JID - 100965194
SB  - IM
MH  - *Algorithms
MH  - Likelihood Functions
MH  - *Phylogeny
MH  - *Software
PMC - PMC3599256
EDAT- 2013/01/22 06:00
MHDA- 2013/09/21 06:00
PMCR- 2013/01/19
CRDT- 2013/01/22 06:00
PHST- 2012/05/29 00:00 [received]
PHST- 2013/01/04 00:00 [accepted]
PHST- 2013/01/22 06:00 [entrez]
PHST- 2013/01/22 06:00 [pubmed]
PHST- 2013/09/21 06:00 [medline]
PHST- 2013/01/19 00:00 [pmc-release]
AID - 1471-2105-14-25 [pii]
AID - 10.1186/1471-2105-14-25 [doi]
PST - epublish
SO  - BMC Bioinformatics. 2013 Jan 19;14:25. doi: 10.1186/1471-2105-14-25.
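The abstract's roofline-style performance model (throughput = arithmetic intensity x peak bandwidth x memory efficiency) can be illustrated with a short sketch. This is a minimal illustration using only the figures stated in the abstract; the function name is my own, not from the paper's code.

```python
def estimated_throughput_gflops(ops_per_byte: float,
                                peak_bandwidth_gb_s: float,
                                memory_efficiency: float) -> float:
    """Bandwidth-bound roofline estimate: sustained throughput is
    arithmetic intensity times achieved bandwidth (peak x efficiency)."""
    return ops_per_byte * peak_bandwidth_gb_s * memory_efficiency

# Figures from the abstract: 130 flops per 64 bytes of I/O on the
# Convey HC-1, 76.8 GB/s peak bandwidth, ~50% memory efficiency.
intensity = 130 / 64  # ~2.03 ops/byte
gflops = estimated_throughput_gflops(intensity, 76.8, 0.5)
print(round(gflops))  # 78, matching the reported average throughput
```

Because the kernel is memory-bound at ~2 ops/byte, raising memory efficiency (not adding more arithmetic units) is what moves the achieved Gflops, which is why the conclusions emphasize a design methodology centered on memory efficiency.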