|
PROTEIN SECONDARY STRUCTURE PREDICTION USING NEURAL NETWORKS AND SUPPORT VECTOR MACHINES
Abstract
Predicting the secondary structure of proteins is important in biochemistry because the 3D
structure can be determined from the local folds that are found in secondary structures.
Moreover, knowing the tertiary structure of proteins can assist in determining their functions.
The objective of this thesis is to compare the performance of Neural Networks (NN) and
Support Vector Machines (SVM) in predicting the secondary structure of 62 globular proteins
from their primary sequence. For each of NN and SVM, we created six binary classifiers to
distinguish between the classes helix (H), strand (E), and coil (C). For NN we use Resilient
Backpropagation training with and without early stopping. We use NN with either no hidden
layer or with one hidden layer with 1, 2, ..., 40 hidden neurons. For SVM we use a Gaussian
kernel with its kernel parameter fixed at 0.1 and varying cost parameters C in the range [0.1, 5].
10-fold cross-validation is used to obtain overall estimates for the probability of making a correct
prediction. Our experiments indicate for NN and SVM that the different binary classifiers
have varying accuracies: from 69% correct predictions for coil vs. non-coil up to 80% correct
predictions for strand vs. non-strand. It is further demonstrated that NN with no hidden layer
or with not more than 2 hidden neurons in the hidden layer are sufficient for better predictions. For
SVM we show that the estimated accuracies do not depend on the value of the cost parameter.
As a major result, we demonstrate that the accuracy estimates of NN and SVM binary
classifiers cannot be distinguished. This contradicts a modern belief in bioinformatics that SVM
outperforms other predictors.
Keywords: Neural Networks, Support Vector Machines, Protein Secondary Structure Prediction |
|
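The evaluation setup described in the abstract (six binary classifiers over H, E and C, scored by 10-fold cross-validation) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis code. The abstract names coil vs. non-coil and strand vs. non-strand but does not list all six tasks, so the split into three one-vs-rest and three one-vs-one tasks is an assumption. The function names and the `fit_predict` callback are hypothetical, and any NN or SVM trainer could be plugged in there.

```python
from itertools import combinations

CLASSES = ("H", "E", "C")  # helix, strand, coil, as named in the abstract


def binary_tasks(labels):
    """Build six binary target lists from per-residue H/E/C labels.

    ASSUMPTION: the six tasks are three one-vs-rest plus three one-vs-one
    problems; the abstract does not spell out the full list.
    """
    tasks = {}
    # One-vs-rest, e.g. "C/~C" is coil vs. non-coil.
    for c in CLASSES:
        tasks[f"{c}/~{c}"] = [1 if y == c else -1 for y in labels]
    # One-vs-one, e.g. "H/E"; residues of the third class are marked None.
    for a, b in combinations(CLASSES, 2):
        tasks[f"{a}/{b}"] = [1 if y == a else -1 if y == b else None
                             for y in labels]
    return tasks


def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds of near-equal size.

    Real data should normally be shuffled (or split per protein) first.
    """
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_accuracy(targets, fit_predict, k=10):
    """Pooled fraction of correct held-out predictions over k folds.

    fit_predict(train_idx, test_idx) is a hypothetical callback that trains
    a binary classifier on train_idx and returns predictions for test_idx.
    """
    folds = k_fold_indices(len(targets), k)
    correct = 0
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        preds = fit_predict(train_idx, test_idx)
        correct += sum(p == targets[j] for p, j in zip(preds, test_idx))
    return correct / len(targets)
```

A trivial baseline that always predicts the negative class, for example `cv_accuracy(tasks["C/~C"], lambda train, test: [-1] * len(test))`, gives the non-coil fraction of the data. That is a useful floor to compare reported accuracies such as 69% for coil vs. non-coil against.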