서열분석 - Needleman Wunsch algorithm

Notice

Recent Posts

Recent Comments

Link

« 2026/02 »
일	월	화	수	목	금	토
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28

Tags more

Archives

Today

Total

관리 메뉴

데이터 과학

서열분석 - Needleman Wunsch algorithm 본문

생명정보학 & 화학정보학/NCBI와 블라스트

서열분석 - Needleman Wunsch algorithm

티에스윤 2021. 9. 27. 15:23

유사한 유전자를 서열분석을 하면 진화적인 관계를 통해 비슷한 생물학적 유사성을 찾아낼 수 가 있습니다.

서열분석은 유사성 검색 - 상동성 검색과 모티브 검색이 있습니다.

유사성 검색- 상동성 검색을 위해서는 Score Function 을 나타내어 최적화 시키는 방법으로 공백의 문자수를 최소화 하거나 일치되는 문자를 최대화 하는 방법을 사용합니다. Score Function을 위해 유사성 측정값에 대한 가점과 감점을 부여하여 계산을 합니다.

서열 정렬법으로는 Global, Local 정렬법, Pairwise , Multiple 정렬 방법이 있습니다.

- Global 정렬법 < Needleman-Wunsch algorithm >

D(i; j) = max D(i-1, j -1) + s(xi,yj)

D(i -1 1; j) + gap

D(i; j -,1) + gap

1. 스코어값을 초기화 합니다. (Initialisation of the score matrix)

2. 일치가점 +1 (Calculation of scores and filling the traceback matrix )

3. 공백감점 -1, 불일치감점 0점 (Deducing the alignment from the traceback matrix)

CAGAT와 CACTGCT에 대한 비교로 Cell을 아래 좌표로 나눠 볼 수 있습니다.

		C	A	G	A	T
	C(1,1)	C(1,2)	C(1,3)	C(1,4)	C(1,5)	C(1,6)
C	C(2,1)
A	C(3,1)
C	C(4,1)
T	C(5,1)
G	C(6,1)
C	C(7,1)
T	C(8,1)

		C	A	G	A	T
	0 (done)	-1 (left)	-2 (left)	-3 (left)	-4 (left)	-5 (left)
C	-1 (up)
A	-2 (up)
C	-3 (up)
T	-4 (up)
G	-5 (up)
C	-6 (up)
T	-7 (up)

The score matrix cells are filled by row starting from the cell C(2; 2)

The score of any cell C(I,j) is the maximum of:

qdiag = C(i-1, j-1) + S(I, j)

qup = C(i–1, j) + gap

qleft = C(i, j-1) + gap

where S(i; j) is the substitution score for Letters i and j, and g is the gap penalty

The calculation for the cell C(2; 2):

qdiag = C(1, 1) + S(S, A) = 0 + 1 = 1

qup = C(1, 2) + g (공백감점)= -1+ (-1) = -2

qleft = C(2, 1) + g (공백감점) = -1 + (-1) = -2

Where C(1,1), C(1, 2), and C(2,1) are read from the score matrix, and S(S,A) is the score for the S <-> A taken from the BLOSUM62 matrix.

For the score matrix C(2, 2) = qdiag which is 1. The corresponding cell of the traceback matrix is "diag":

		C	A	G	A	T
	0 (done)	-1 (left)	-2 (left)	-3 (left)	-4 (left)	-5 (left)
C	-1 (up)	1
A	-2 (up)
C	-3 (up)
T	-4 (up)
G	-5 (up)
C	-6 (up)
T	-7 (up)

For the score matrix C(2, 3)

qdiag = C(1, 2) + 불일치감점 = -1 + 0 = -1

qup = C(1, 3) + 공백감점= -2+ (-1) = -3

qleft = C(2, 2) + 공백감점 = 1 + (-1) = 0

For the score matrix C(2, 4)

qdiag = C(1, 3) + 불일치감점 = -2 + 0 = -2

qup = C(1, 4) + 공백감점= -3+ (-1) = -4

qleft = C(2, 3) + 공백감점 = 0 + (-1) = -1

....

For the score matrix C(3, 3)

qdiag = C(2, 2) + 일치가점 = 1 + 1 = 2

qup = C(2, 3) + 일치가점= 0+ 1 = 1

qleft = C(3, 2) + 일치가점 = 0 +1 = 1

		C	A	G	A	T
	0 (done)	-1 (left)	-2 (left)	-3 (left)	-4 (left)	-5 (left)
C	-1 (up)	1	0	-1	-2	-3
A	-2 (up)	0	2	1	0	-1
C	-3 (up)	-1	1	2	1	0
T	-4 (up)	-2	0	1	2	2
G	-5 (up)	-3	-1	1	1	2
C	-6 (up)	-4	-2	0	1	1
T	-7 (up)	-5	-3	-1	0	2

이런 방법으로 대각선, 가로, 세로 형식으로 값을 전부 계산합니다.

그 중에서 가장 큰 값을 표시하고, 일치하는 것은 뒤에서 부터 붙여봅니다.

GAT

GCT

-GAT

TGCT

--GAT

CTGCT

CA - - GAT

CACTGCT

The alignment is over the entire length of two sequences: the traceback starts from the lower right corner of the traceback matrix, and completes in the upper left cell of the matrix.

The Needleman-Wunsch algorithm works in the same Way regardless of the length or complexity of sequences, and guarantees to find the best alignment.

The Needleman-Wunsch algorithm is appropriate for finding the best alignment of two sequences which are (i) of the similar length; (ii) similar across their Entire lengths

https://enjoybioinfo.blogspot.com/2020/10/needleman-wunsch-algorithm.html

https://en.wikipedia.org/wiki/Needleman-Wunsch_algorithm

- Local 정렬법 < Smith-Waterman algorithm >

주어진 치환행렬과 갭 감점을 기반으로 두서열의 유사도가 높은 영역을 정렬하는 방법입니다.

최적의점수를 기록한 서열의 정렬을 찾아냅니다. 최적의 부분 정렬을 찾아내는 방법입니다.

아래 그림은 서열 정렬을 나타내는 알고리즘입니다.

지역적인 일치성에 대해 연산만 나타나기에 전역정렬 보다는 계산 수식이 간단하게 나타납니다.

http://www.incodom.kr/Smith%E2%80%93Waterman_algorithm

https://freshrimpsushi.github.io/posts/smith-waterman-alignment/

참고 교재 - 생명정보학 (조재창 역, 월드사이언스)

저작자표시 비영리 동일조건 (새창열림)

'생명정보학 & 화학정보학 > NCBI와 블라스트' 카테고리의 다른 글

바이오인포매틱스 개요 (0)	2022.08.29
H1N1 시리즈 분석 (0)	2022.08.01
PAM과 BLOSUM (0)	2021.11.01
블라스트와 MEGA-X 염기서열 분석 비교 관련 (0)	2021.06.11
COVID19 는 어디에서 왔을까? (feat. NCBI와 블라스트 ) (0)	2021.05.23

'생명정보학 & 화학정보학/NCBI와 블라스트' Related Articles

데이터 과학

서열분석 - Needleman Wunsch algorithm 본문

서열분석 - Needleman Wunsch algorithm

'생명정보학 & 화학정보학 > NCBI와 블라스트' 카테고리의 다른 글

티스토리툴바