Data Science

Protein Structure Prediction: CF & GOR Methods

티에스윤 2023. 9. 16. 21:24

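The Chou and Fasman (CF) method is one of the earliest approaches to protein secondary structure prediction.

Published in 1974, it uses statistics from proteins of known structure to give each amino acid a propensity for each secondary-structure state, and then predicts the secondary structure of a sequence from those propensities.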

https://pubs.acs.org/doi/10.1021/bi00699a002
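
To make the idea concrete, here is a minimal Python sketch of a Chou-Fasman-style scan. It is a simplification, not the original algorithm: instead of the paper's nucleation and extension rules, it averages propensities over sliding windows (6 residues for helix, 5 for sheet) and applies the thresholds usually quoted for CF (mean P(α) > 1.03, mean P(β) > 1.05, scaled ×100 here). The parameter values are the CF propensities as commonly tabulated in textbooks, so check them against the paper before relying on them; the neutral value of 100 for non-standard letters and the "helix wins over sheet" rule are assumptions of this sketch.

```python
# Chou-Fasman propensities (x100) as commonly tabulated in textbooks.
# Verify against the original paper before relying on these numbers.
P_ALPHA = {
    "A": 142, "R": 98, "N": 67, "D": 101, "C": 70, "Q": 111, "E": 151,
    "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
    "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106,
}
P_BETA = {
    "A": 83, "R": 93, "N": 89, "D": 54, "C": 119, "Q": 110, "E": 37,
    "G": 75, "H": 87, "I": 160, "L": 130, "K": 74, "M": 105, "F": 138,
    "P": 55, "S": 75, "T": 119, "W": 137, "Y": 147, "V": 170,
}


def window_mean(table, seq, start, width):
    # Assumption: unknown letters (e.g. X) get a neutral value of 100.
    return sum(table.get(aa, 100) for aa in seq[start:start + width]) / width


def predict_cf_simplified(seq, helix_win=6, sheet_win=5):
    """Return one label per residue: 'H' helix, 'E' sheet, '-' other.

    Simplified sliding-window version of the CF idea, not the full
    nucleation/extension procedure from the paper.
    """
    seq = seq.upper()
    labels = ["-"] * len(seq)
    # Helix: 6-residue windows with mean P(alpha) > 103 and > mean P(beta).
    for i in range(len(seq) - helix_win + 1):
        pa = window_mean(P_ALPHA, seq, i, helix_win)
        pb = window_mean(P_BETA, seq, i, helix_win)
        if pa > 103 and pa > pb:
            for j in range(i, i + helix_win):
                labels[j] = "H"
    # Sheet: 5-residue windows with mean P(beta) > 105 and > mean P(alpha).
    # Assumption: residues already labelled H keep that label.
    for i in range(len(seq) - sheet_win + 1):
        pa = window_mean(P_ALPHA, seq, i, sheet_win)
        pb = window_mean(P_BETA, seq, i, sheet_win)
        if pb > 105 and pb > pa:
            for j in range(i, i + sheet_win):
                if labels[j] == "-":
                    labels[j] = "E"
    return "".join(labels)


print(predict_cf_simplified("MFIFLLFLTLTSGSDLDRCTTFDDVQ"))
```

Its calls will generally differ from the web server's; the point is only to show how per-residue propensities turn into H/E labels.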

The University of Virginia offers a web page that takes a sequence in FASTA format and shows the CF prediction:

https://fasta.bioch.virginia.edu/fasta_www2/fasta_www.cgi?rm=misc1 

Let's enter the following FASTA example into the site above:

>AYV99761.1 spike [SARS coronavirus Urbani]
MFIFLLFLTLTSGSDLDRCTTFDDVQAPNYTQHTSSMRGVYYPDEIFRSDTLYLTQDLFLPFYSNVTGFH
TINHTFGNPVIPFKDGIYFAATEKSNVVRGWVFGSTMNNKSQSVIIINNSTNVVIRACNFELCDNPFFAV
SKPMGTQTHTMIFDNAFNCTFEYISDAFSLDVSEKSGNFKHLREFVFKNKDGFLYVYKGYQPIDVVRDLP
SGFNTLKPIFKLPLGINITNFRAILTAFSPAQDIWGTSAAAYFVGYLKPTTFMLKYDENGTITDAVDCSQ
NPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKFPSVYAWERKKISNCVA
DYSVLYNSTFFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQIAPGQTGVIADYNYKLPDDFMGCV
LAWNTRNIDATSTGNHNYKYRYLRHGKLRPFERDISNVPFSPDGKPCTPPALNCYWPLNDYGFYTTTGIG
YQPYRVVVLSFELLNAPATVCGPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRDVSDFTD
SVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTAIHADQLTPAWRIYSTGNNVFQ
TQAGCLIGAEHVDTSYECDIPIGAGICASYHTVSLLRSTSQKSIVAYTMSLGADSSIAYSNNTIAIPTNF
SISITTEVMPVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQM
YKTPTLKYFGGFNFSQILPDPLKPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDINARDLICAQKFNGL
TVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQFN
KAISQIQESLTTTSTALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLIT
GRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYV
PSQERNFTTAPAICHEGKAYFPREGVFVFNGTSWFITQRNFFSPQIITTDNTFVSGNCDVVIGIINNTVY
DPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQ
YIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHYT
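
If you want to check a record before pasting it into the form, a small FASTA reader is enough. This Python sketch (the function name `parse_fasta` is only illustrative) splits FASTA text into header/sequence pairs and joins the wrapped sequence lines; any lines before the first `>` header are ignored.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []  # drop the leading '>'
        elif header is not None:
            chunks.append(line)  # sequence lines are wrapped; join them later
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# First two sequence lines of the record above.
example = """>AYV99761.1 spike [SARS coronavirus Urbani]
MFIFLLFLTLTSGSDLDRCTTFDDVQAPNYTQHTSSMRGVYYPDEIFRSDTLYLTQDLFLPFYSNVTGFH
TINHTFGNPVIPFKDGIYFAATEKSNVVRGWVFGSTMNNKSQSVIIINNSTNVVIRACNFELCDNPFFAV
"""
for header, seq in parse_fasta(example):
    print(header, len(seq))
```

Run on the full record, the sequence should come out at 1,255 residues, which matches the last position numbered in the GOR output further down.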

Clicking Submit Sequence displays the result:

                .         .         .         .         .         .
       MFIFLLFLTLTSGSDLDRCTTFDDVQAPNYTQHTSSMRGVYYPDEIFRSDTLYLTQDLFL
 helix <------->        <------->                    <-------------
 sheet EEEEEEEEEE     EEEEEEEEEE  EEEEEE   EEEEE       EEEEEEEEEEEE
 turns            TTT             T      T        T    TT     T    

                .         .         .         .         .         .
       PFYSNVTGFHTINHTFGNPVIPFKDGIYFAATEKSNVVRGWVFGSTMNNKSQSVIIINNS
 helix >                  <--------------------->       <----->    
 sheet EEEEEEEEEEEEEEE           EEE       EEEEEEEEEE     EEEEEEEEE
 turns                  T      T        TT         T   T  T      T 

                .         .         .         .         .         .
       TNVVIRACNFELCDNPFFAVSKPMGTQTHTMIFDNAFNCTFEYISDAFSLDVSEKSGNFK
 helix   <------->     <----->    <------->      <-----------------
 sheet EEEEEEEEEEE       EEEEEEEEEEEEEEEEEEEEEEEEE                 
 turns               T      T            T          T       T  T   

                .         .         .         .         .         .
       HLREFVFKNKDGFLYVYKGYQPIDVVRDLPSGFNTLKPIFKLPLGINITNFRAILTAFSP
 helix ---------------->   <------>       <-------->    <----------
 sheet  EEEEE      EEEEEEEEEEEEEE        EEEEEEEEEEEEEEEEEEEEEEE   
 turns         T T                   T                            T

                .         .         .         .         .         .
       AQDIWGTSAAAYFVGYLKPTTFMLKYDENGTITDAVDCSQNPLAELKCSVKSFEIDKGIY
 helix ---------------->   <----->    <------------------------->  
 sheet  EEEE      EEEEEEEEEEEEEE     EEE                         EE
 turns       T           T        TTT         TT               T   

                .         .         .         .         .         .
       QTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKFPSVYAWERKKISNCVADYSVLYNSTF
 helix                         <------------------>               <
 sheet EE            EEEEEEEE                          EEEEEEEEEEEE
 turns  TT      TT                              TT              T  

                .         .         .         .         .         .
       FSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQIAPGQTGVIADYNYKLPDDFMGCV
 helix ------------->        <--------------------->     <---------
 sheet EEEEEEEEEEEEEEEEEEEEEE              EEEEEEEEE          EEEEE
 turns                         T     TT       T             T      

                .         .         .         .         .         .
       LAWNTRNIDATSTGNHNYKYRYLRHGKLRPFERDISNVPFSPDGKPCTPPALNCYWPLND
 helix ->                        <----->                           
 sheet EEEEE            EEEEE                             EEEEEEEEE
 turns     TT                          T        T  T    T          

                .         .         .         .         .         .
       YGFYTTTGIGYQPYRVVVLSFELLNAPATVCGPKLSTDLIKNQCVNFNFNGLTGTGVLTP
 helix                <----------->        <------>                
 sheet EEEEEEEEEEEEEEEEEEEEEEE       EEEEEEEEEEEEEEEEEEEEEEEEEEEE  
 turns                                 T   T    T                 T

                .         .         .         .         .         .
       SSKRFQPFQQFGRDVSDFTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQD
 helix   <------>      <-------------->                 <--------->
 sheet     EEEEEE                             EEEEE         EEEEEEE
 turns T T         T      T   TT  T                 T     T        

                .         .         .         .         .         .
       VNCTDVSTAIHADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVDTSYECDIPIGAGICASY
 helix       <---------->         <------------->                  
 sheet EEEEEEEEEEEEEEE EEEEEE     EEEEEEEEE                     EEE
 turns             T   T      T T    T            T         T      

                .         .         .         .         .         .
       HTVSLLRSTSQKSIVAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVMPVSMAKTSVDC
 helix     <------------->                       <------------->   
 sheet EEEEEE       EEEEEEE            EEEEEEEEEEEEEEEEEEE      EEE
 turns        T  TT            T     TT     T                      

                .         .         .         .         .         .
       NMYICGDSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQMYKTPTLKYFG
 helix          <----->       <----------->     <-------->         
 sheet EEEE      EEEEEEEEEEEEEEEE               EEEEEEEEEEEEEEEEEEE
 turns       T                   T        T T              T       

                .         .         .         .         .         .
       GFNFSQILPDPLKPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDINARDLICAQKFNGL
 helix  <---->        <----------------------->        <---------->
 sheet EEEEEEE           EEEEEEEEEE      EEEEEEE        EEEEEEEEEEE
 turns          T   T  T               T       T             T     

                .         .         .         .         .         .
       TVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYE
 helix  <---------------->       <------------------->     <-------
 sheet EEEEEEEEEEEEEEEEEEEEEEEEEEEEE      EEEEEEEEEEEEEEEEEEEEEEE  
 turns         T            T         T                      T     

                .         .         .         .         .         .
       NQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLN
 helix ---------------------------------------------->      <------
 sheet    EEEEEEEEEEEEEEEEEEEEEE  EEEEEEEEEEEEEEEEEEEE          EEE
 turns T T   T   T      T                 T             T          

                .         .         .         .
       DILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEI
 helix --------------------------------------> 
 sheet EEE         EEEEEEEEEEEEEEEEEEEEEEE     
 turns           T         T          T        

 Residue totals: H:606   E:592   T:105
        percent: H: 60.6 E: 59.2 T: 10.5


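With the CF method, 60.6% of residues are predicted as helix (H), 59.2% as β-sheet (E), and 10.5% as turns (T). These figures add up to more than 100% because this output marks helix and sheet independently, so the same residue can be counted in both rows. Note also that the CF output, and therefore these totals, covers only the first 1,000 residues shown above, not the full 1,255-residue sequence.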
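
The double counting is easy to see if you turn the text tracks back into per-residue flags. The Python sketch below assumes each row (helix, sheet, turns) has been taken from every block with its 7-character label column removed, padded to the block width, and concatenated, and that any non-space character (`<`, `-`, `>` in the helix row, `E`, `T`) marks that residue. The function names and the short toy tracks are made up for illustration.

```python
def join_blocks(rows, width=60):
    """Concatenate one state row taken from several output blocks.

    Each row must already have its 7-character label (e.g. ' helix ')
    removed. Rows are padded with spaces because trailing blanks are
    often lost when text is copied.
    """
    return "".join(row.ljust(width) for row in rows)


def flags(track, length):
    # Any non-space character means the residue is assigned to this state.
    return [ch != " " for ch in track.ljust(length)[:length]]


def summarize(length, tracks):
    """Return {state: (count, percent)} and the helix/sheet overlap count."""
    masks = {state: flags(track, length) for state, track in tracks.items()}
    summary = {s: (sum(m), 100.0 * sum(m) / length) for s, m in masks.items()}
    both = sum(h and e for h, e in zip(masks["H"], masks["E"]))
    return summary, both


# Toy 10-residue example (hypothetical tracks, not taken from the output above).
toy = {"H": "<------>", "E": "   EEEEEEE", "T": "T"}
summary, both = summarize(10, toy)
print(summary)
print("residues in both H and E:", both)
```

With the toy tracks the three percentages sum to 160%, and 5 of the 10 residues are counted as both helix and sheet, the same kind of double counting seen in the CF totals.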

Back on the menu, this time select the Garnier algorithm.

The Garnier algorithm is the GOR method. GOR stands for the initials of its co-authors, Garnier, Osguthorpe, and Robson.

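GOR has been revised several times (GOR III, and later GOR IV and GOR V). Where the CF algorithm's prediction accuracy tops out at roughly 50–60%, later GOR versions do better; GOR V, which also uses multiple sequence alignments, is reported to exceed 70%.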

https://www.sciencedirect.com/science/article/abs/pii/0022283678902978
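
The core idea of GOR (in its original GOR I form) comes from information theory rather than raw propensities: for each residue j and each state S, it adds up how much information every residue in a 17-residue window (positions j-8 to j+8) carries about S, that is, the sum over offsets m of log[P(S | residue at j+m) / P(S)], and predicts the state with the largest total. The Python sketch below shows that scheme. The real GOR tables were estimated from proteins of known structure; here they are estimated from a tiny made-up training set, with add-one smoothing (an assumption of this sketch), purely so the code runs. Later versions such as GOR III add residue-pair terms, which are omitted here.

```python
import math
from collections import Counter, defaultdict

STATES = "HETC"  # helix, extended (sheet), turn, coil: the four rows the server shows
HALF = 8         # window of 2 * HALF + 1 = 17 residues


def train_gor(examples, half=HALF, pseudo=1.0):
    """Estimate info[(m, aa)][S] = log(P(S | aa at offset m) / P(S)).

    `examples` is a list of (sequence, states) pairs of equal length.
    Add-one (pseudo) smoothing is an assumption of this sketch.
    """
    state_counts = Counter()
    pair_counts = defaultdict(Counter)  # (offset, residue) -> state counts
    for seq, states in examples:
        for j, s in enumerate(states):
            state_counts[s] += 1
            for m in range(-half, half + 1):
                k = j + m
                if 0 <= k < len(seq):
                    pair_counts[(m, seq[k])][s] += 1
    total = sum(state_counts.values())
    n_states = len(STATES)
    p_state = {s: (state_counts[s] + pseudo) / (total + pseudo * n_states) for s in STATES}
    info = {}
    for key, counts in pair_counts.items():
        n = sum(counts.values())
        info[key] = {
            s: math.log((counts[s] + pseudo) / (n + pseudo * n_states) / p_state[s])
            for s in STATES
        }
    return info


def predict_gor(seq, info, half=HALF):
    """Pick, for each residue, the state with the largest summed information."""
    result = []
    for j in range(len(seq)):
        score = dict.fromkeys(STATES, 0.0)
        for m in range(-half, half + 1):
            k = j + m
            if 0 <= k < len(seq) and (m, seq[k]) in info:
                for s in STATES:
                    score[s] += info[(m, seq[k])][s]
        result.append(max(STATES, key=score.get))
    return "".join(result)


# Made-up training data: one residue type per state, just to exercise the code.
toy_training = [
    ("AAAAAAAAAA", "HHHHHHHHHH"),
    ("VVVVVVVVVV", "EEEEEEEEEE"),
    ("PPPPPPPPPP", "TTTTTTTTTT"),
    ("GGGGGGGGGG", "CCCCCCCCCC"),
]
info = train_gor(toy_training)
print(predict_gor("AAAAAA", info), predict_gor("VVVV", info))
```

Unlike the CF-style scan, this picks exactly one state per residue, which is why the GOR output has a separate coil row and no overlaps.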

The results of running the GOR method are as follows:

           .   10    .   20    .   30    .   40    .   50    .   60
       MFIFLLFLTLTSGSDLDRCTTFDDVQAPNYTQHTSSMRGVYYPDEIFRSDTLYLTQDLFL
 helix HHHHHHHH                                                    
 sheet                  EE  E             E     EE EEE      EEEEEEE
 turns           TTTTT T  TT TTT   TTTT    TTTT   T   TTT  T       
 coil          CC     C         CCC    CCC     C         CC        

           .   70    .   80    .   90    .  100    .  110    .  120
       PFYSNVTGFHTINHTFGNPVIPFKDGIYFAATEKSNVVRGWVFGSTMNNKSQSVIIINNS
 helix                            HHHHHH                           
 sheet          EEE        EEEE            EEEEEEE        EEEEEE   
 turns TTT T TTT   TTTTT       TT       TTT            TT          
 coil     C C           CCC      C                CCCCC  C      CCC

           .  130    .  140    .  150    .  160    .  170    .  180
       TNVVIRACNFELCDNPFFAVSKPMGTQTHTMIFDNAFNCTFEYISDAFSLDVSEKSGNFK
 helix                HHHH      H                 H HHHHHHHHHH  HHH
 sheet   EEEEE            E      EEEEEEE        EE                 
 turns TT     TTTTTTT         T         TTTTTTTT   T          TT   
 coil                C     CCC C                                   

           .  190    .  200    .  210    .  220    .  230    .  240
       HLREFVFKNKDGFLYVYKGYQPIDVVRDLPSGFNTLKPIFKLPLGINITNFRAILTAFSP
 helix HHHHHHHH                                                    
 sheet             EEEE    EEEEEEEE       EEEEEEEE   EEEEEEEEE     
 turns         TTTT    TTTT          TTT          TT               
 coil                              CC   CC          C         CCCCC

           .  250    .  260    .  270    .  280    .  290    .  300
       AQDIWGTSAAAYFVGYLKPTTFMLKYDENGTITDAVDCSQNPLAELKCSVKSFEIDKGIY
 helix                                           HHHHHHHHHHHHH     
 sheet           EEEEE     EEEEE      EEEEE                        
 turns TTTTT          T   T      TTTTT     TTT                TTT  
 coil       CCCCC      CCC      C             CCC                CC

           .  310    .  320    .  330    .  340    .  350    .  360
       QTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKFPSVYAWERKKISNCVADYSVLYNSTF
 helix                           HHHH      HHHHHH                  
 sheet    EEEEE    EEEE                              EEEEEEEEEE    
 turns TTT      TTT    TTTTT  TT     TT          TTTT          T TT
 coil          C            CC  C      CCCC                     C  

           .  370    .  380    .  390    .  400    .  410    .  420
       FSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQIAPGQTGVIADYNYKLPDDFMGCV
 helix                        HH                                   
 sheet         E  EEEE EEEEEE   EEEE    EEEE    EEEEEE        EEEEE
 turns TTTTTTTT  T    T      T      TTTT      T       TTTT TTT     
 coil           C                           CC C          C        

           .  430    .  440    .  450    .  460    .  470    .  480
       LAWNTRNIDATSTGNHNYKYRYLRHGKLRPFERDISNVPFSPDGKPCTPPALNCYWPLND
 helix HHH                                                         
 sheet                    EEEEE     EEE               EE   E EEE   
 turns      TT T   TTT TTT     TT      TTTT  TT TTT TT  TTT T   TTT
 coil     CC  C CCC   C          CCC       CC  C   C               

           .  490    .  500    .  510    .  520    .  530    .  540
       YGFYTTTGIGYQPYRVVVLSFELLNAPATVCGPKLSTDLIKNQCVNFNFNGLTGTGVLTP
 helix                      HHH  H                                 
 sheet   EEEE      EEEEEEEEE       EEEE   EEEE EEEEEEE       EEEE  
 turns TT    TTTT                 T    TT     T       TTTT         
 coil            CC            CC        C                CCC    CC

           .  550    .  560    .  570    .  580    .  590    .  600
       SSKRFQPFQQFGRDVSDFTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQD
 helix                                                             
 sheet      EE                    EEEE E       EEEEE       EEEEEE  
 turns TTTTT  TTTTTTT  TTTT T   T     T  TTTTTT                  TT
 coil                CC    C CCC C      C           CCCCCCC        

           .  610    .  620    .  630    .  640    .  650    .  660
       VNCTDVSTAIHADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVDTSYECDIPIGAGICASY
 helix          HHH                           HH                   
 sheet E     EEE         EEE       EEEE     EE         EEE    EEEEE
 turns  TTTTT      TT   T   TTTT       TTT      TTTTTTT   TTTT     
 coil                CCC        CCC       CC                       

           .  670    .  680    .  690    .  700    .  710    .  720
       HTVSLLRSTSQKSIVAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVMPVSMAKTSVDC
 helix                                                  HHHHHHHH   
 sheet EEEEEE      EEEEEEEE    EEEE     EEEE       EEEEE           
 turns       T T T                 TTTT     T T                 TTT
 coil         C C C        CCCC        C     C CCCC                

           .  730    .  740    .  750    .  760    .  770    .  780
       NMYICGDSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQMYKTPTLKYFG
 helix          HHHHHH                  HHH    HHHHHHHHH           
 sheet EEE                EEEEEE      EE                  EEEEEEE  
 turns    TTTT        TTTT      TT         TT           TT       TT
 coil         CC                  CCCC       CC                    

           .  790    .  800    .  810    .  820    .  830    .  840
       GFNFSQILPDPLKPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDINARDLICAQKFNGL
 helix                     HHHHHHHHHHHHHHHHH        HHHHHHHH       
 sheet       E         EEEE                                 EE    E
 turns TT TTT        T                      TTTTTTTT          TTTT 
 coil    C    CCCCCCC C                                            

           .  850    .  860    .  870    .  880    .  890    .  900
       TVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYE
 helix       HHHHHHHHHHHH                HHHHHHHHHHHH             H
 sheet EEEEEE            EEE                                EEEE   
 turns                           T                   TTTT          
 coil                       CCCCC CCCCCCC                CCC    CC 

           .  910    .  920    .  930    .  940    .  950    .  960
       NQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLN
 helix HHHHHHHH                             HHHH                   
 sheet              EEEEE          EEEEEEEE     EEEEEEE     EEEEEEE
 turns                     T      T                       T        
 coil          CCCCC     CC CCCCCC         C           CCC C       

           .  970    .  980    .  990    . 1000    . 1010    . 1020
       DILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSK
 helix H HHHHHHHHHHHHHH                   HHHHHHHHHHHHHHHHHHHHHH   
 sheet  E              EEE       EEEEEEEEE                         
 turns                     TT T                                 TTT
 coil                     C  C CC                                  

           . 1030    . 1040    . 1050    . 1060    . 1070    . 1080
       RVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPAICHEGKAYFPREGVFVFN
 helix                                         HHHHH               
 sheet EEE       EEE        EEEEEE                          E EEEE 
 turns    TTTTTT      T   T       T     TTTTT       TTTT   T T    T
 coil           C   CC CCC C       CCCCC     CC         CCC        

           . 1090    . 1100    . 1110    . 1120    . 1130    . 1140
       GTSWFITQRNFFSPQIITTDNTFVSGNCDVVIGIINNTVYDPLQPELDSFKEELDKYFKN
 helix                                             HHHHHHHHHHHHHH  
 sheet     EEE        EEEE         EEEEEEEE      EE                
 turns        TTTT   T    TTTTTTTTT        TTTT                  TT
 coil  CCCC       CCC                          CC                  

           . 1150    . 1160    . 1170    . 1180    . 1190    . 1200
       HTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIKWPWYVWL
 helix                     H HHHHHHHHHHHHHHHHHHHHHHH HH            
 sheet      EEEEEE      EEE E                           E      E   
 turns T   T                                        T  T TTT TT TT 
 coil   CCC       CCCCCC                                    C     C

           . 1210    . 1220    . 1230    . 1240    . 1250
       GFIAGLIAIVMVTILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHYT
 helix    HHHHH     H                           HHHHHHHHHH    
 sheet         EEEEE EE                    E              EEEE
 turns TT              TTTTTTTTTTTTTTTTTTTT TTTT              
 coil    C                                                    

 Residue totals: H:264   E:393   T:374   C:224
        percent: H: 21.3 E: 31.7 T: 30.2 C: 18.1
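
Putting the two "Residue totals" lines side by side is a quick arithmetic check. The CF counts add up to more than the 1,000 residues that output covers, while the four GOR counts add up to exactly 1,255, the length of the sequence, which is consistent with GOR giving every residue a single state. The numbers below are copied from the two outputs; only the sums are computed.

```python
# Residue totals copied from the CF and GOR outputs above.
cf_totals = {"H": 606, "E": 592, "T": 105}
gor_totals = {"H": 264, "E": 393, "T": 374, "C": 224}

cf_length = 1000    # the CF output ends at residue 1000
gor_length = 1255   # the GOR output covers the whole sequence

print(sum(cf_totals.values()), "assignments for", cf_length, "residues")
print(sum(gor_totals.values()), "assignments for", gor_length, "residues")
```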