Working example WPGMA




1 working example

1.1 first step
1.2 second step
1.3 third step
1.4 final step
1.5 wpgma dendrogram





working example
first step

first clustering

let assume have 5 elements



(
a
,
b
,
c
,
d
,
e
)


{\displaystyle (a,b,c,d,e)}

, following matrix




d

1




{\displaystyle d_{1}}

of pairwise distances between them :



in example,




d

1


(
a
,
b
)
=
17


{\displaystyle d_{1}(a,b)=17}

smallest value of




d

1




{\displaystyle d_{1}}

, join elements



a


{\displaystyle a}

,



b


{\displaystyle b}

.



first branch length estimation

let



u


{\displaystyle u}

denote node



a


{\displaystyle a}

,



b


{\displaystyle b}

connected. setting



δ
(
a
,
u
)
=
δ
(
b
,
u
)
=

d

1


(
a
,
b
)

/

2


{\displaystyle \delta (a,u)=\delta (b,u)=d_{1}(a,b)/2}

ensures elements



a


{\displaystyle a}

,



b


{\displaystyle b}

equidistant



u


{\displaystyle u}

. corresponds expectation of ultrametricity hypothesis. branches joining



a


{\displaystyle a}

,



b


{\displaystyle b}





u


{\displaystyle u}

have lengths



δ
(
a
,
u
)
=
δ
(
b
,
u
)
=
17

/

2
=
8.5


{\displaystyle \delta (a,u)=\delta (b,u)=17/2=8.5}

(see final dendrogram)



first distance matrix update

we proceed update initial distance matrix




d

1




{\displaystyle d_{1}}

new distance matrix




d

2




{\displaystyle d_{2}}

(see below), reduced in size 1 row , 1 column because of clustering of



a


{\displaystyle a}





b


{\displaystyle b}

. bold values in




d

2




{\displaystyle d_{2}}

correspond new distances, calculated averaging distances between first cluster



(
a
,
b
)


{\displaystyle (a,b)}

, each of remaining elements:







d

2


(
(
a
,
b
)
,
c
)
=
(

d

1


(
a
,
c
)
+

d

1


(
b
,
c
)
)

/

2
=
(
21
+
30
)

/

2
=
25.5


{\displaystyle d_{2}((a,b),c)=(d_{1}(a,c)+d_{1}(b,c))/2=(21+30)/2=25.5}







d

2


(
(
a
,
b
)
,
d
)
=
(

d

1


(
a
,
d
)
+

d

1


(
b
,
d
)
)

/

2
=
(
31
+
34
)

/

2
=
32.5


{\displaystyle d_{2}((a,b),d)=(d_{1}(a,d)+d_{1}(b,d))/2=(31+34)/2=32.5}







d

2


(
(
a
,
b
)
,
e
)
=
(

d

1


(
a
,
e
)
+

d

1


(
b
,
e
)
)

/

2
=
(
23
+
21
)

/

2
=
22


{\displaystyle d_{2}((a,b),e)=(d_{1}(a,e)+d_{1}(b,e))/2=(23+21)/2=22}


italicized values in




d

2




{\displaystyle d_{2}}

not affected matrix update correspond distances between elements not involved in first cluster.


second step

second clustering

we reiterate 3 previous steps, starting new distance matrix




d

2




{\displaystyle d_{2}}

 :



here,




d

2


(
(
a
,
b
)
,
e
)
=
22


{\displaystyle d_{2}((a,b),e)=22}

smallest value of




d

2




{\displaystyle d_{2}}

, join cluster



(
a
,
b
)


{\displaystyle (a,b)}

, element



e


{\displaystyle e}

.



second branch length estimation

let



v


{\displaystyle v}

denote node



(
a
,
b
)


{\displaystyle (a,b)}

,



e


{\displaystyle e}

connected. because of ultrametricity constraint, branches joining



a


{\displaystyle a}

or



b


{\displaystyle b}





v


{\displaystyle v}

, ,



e


{\displaystyle e}





v


{\displaystyle v}

equal , have following length:



δ
(
a
,
v
)
=
δ
(
b
,
v
)
=
δ
(
e
,
v
)
=
22

/

2
=
11


{\displaystyle \delta (a,v)=\delta (b,v)=\delta (e,v)=22/2=11}


we deduce missing branch length:



δ
(
u
,
v
)
=
δ
(
e
,
v
)

δ
(
a
,
u
)
=
δ
(
e
,
v
)

δ
(
b
,
u
)
=
11

8.5
=
2.5


{\displaystyle \delta (u,v)=\delta (e,v)-\delta (a,u)=\delta (e,v)-\delta (b,u)=11-8.5=2.5}

(see final dendrogram)



second distance matrix update

we proceed update




d

2




{\displaystyle d_{2}}

matrix new distance matrix




d

3




{\displaystyle d_{3}}

(see below), reduced in size 1 row , 1 column because of clustering of



(
a
,
b
)


{\displaystyle (a,b)}





e


{\displaystyle e}

 :




d

3


(
(
(
a
,
b
)
,
e
)
,
c
)
=
(

d

2


(
(
a
,
b
)
,
c
)
+

d

2


(
e
,
c
)
)

/

2
=
(
25.5
+
39
)

/

2
=
32.25


{\displaystyle d_{3}(((a,b),e),c)=(d_{2}((a,b),c)+d_{2}(e,c))/2=(25.5+39)/2=32.25}


of note, average calculation of new distance not account larger size of



(
a
,
b
)


{\displaystyle (a,b)}

cluster (two elements) respect



e


{\displaystyle e}

(one element). similarly:







d

3


(
(
(
a
,
b
)
,
e
)
,
d
)
=
(

d

2


(
(
a
,
b
)
,
d
)
+

d

2


(
e
,
d
)
)

/

2
=
(
32.5
+
43
)

/

2
=
37.75


{\displaystyle d_{3}(((a,b),e),d)=(d_{2}((a,b),d)+d_{2}(e,d))/2=(32.5+43)/2=37.75}


the averaging procedure therefore gives differential weight initial distances of matrix




d

1




{\displaystyle d_{1}}

. reason why method weighted, not respect mathematical procedure respect initial distances.


third step

third clustering

we again reiterate 3 previous steps, starting updated distance matrix




d

3




{\displaystyle d_{3}}

.



here,




d

3


(
c
,
d
)
=
28


{\displaystyle d_{3}(c,d)=28}

smallest value of




d

3




{\displaystyle d_{3}}

, join elements



c


{\displaystyle c}

,



d


{\displaystyle d}

.



third branch length estimation

let



w


{\displaystyle w}

denote node



c


{\displaystyle c}

,



d


{\displaystyle d}

connected. branches joining



c


{\displaystyle c}

,



d


{\displaystyle d}





w


{\displaystyle w}

have lengths



δ
(
c
,
w
)
=
δ
(
d
,
w
)
=
28

/

2
=
14


{\displaystyle \delta (c,w)=\delta (d,w)=28/2=14}

(see final dendrogram)



third distance matrix update

there single entry update:




d

4


(
(
c
,
d
)
,
(
(
a
,
b
)
,
e
)
)
=
(

d

3


(
c
,
(
(
a
,
b
)
,
e
)
)
+

d

3


(
d
,
(
(
a
,
b
)
,
e
)
)
)

/

2
=
(
32.25
+
37.75
)

/

2
=
35


{\displaystyle d_{4}((c,d),((a,b),e))=(d_{3}(c,((a,b),e))+d_{3}(d,((a,b),e)))/2=(32.25+37.75)/2=35}


final step

the final




d

4




{\displaystyle d_{4}}

matrix is:



so join clusters



(
(
a
,
b
)
,
e
)


{\displaystyle ((a,b),e)}

,



(
c
,
d
)


{\displaystyle (c,d)}

.


let



r


{\displaystyle r}

denote (root) node



(
(
a
,
b
)
,
e
)


{\displaystyle ((a,b),e)}

,



(
c
,
d
)


{\displaystyle (c,d)}

connected. branches joining



(
(
a
,
b
)
,
e
)


{\displaystyle ((a,b),e)}

,



(
c
,
d
)


{\displaystyle (c,d)}





r


{\displaystyle r}

have lengths:






δ
(
(
(
a
,
b
)
,
e
)
,
r
)
=
δ
(
(
c
,
d
)
,
r
)
=
35

/

2
=
17.5


{\displaystyle \delta (((a,b),e),r)=\delta ((c,d),r)=35/2=17.5}


we deduce 2 remaining branch lengths:






δ
(
v
,
r
)
=
δ
(
(
(
a
,
b
)
,
e
)
,
r
)

δ
(
e
,
v
)
=
17.5

11
=
6.5


{\displaystyle \delta (v,r)=\delta (((a,b),e),r)-\delta (e,v)=17.5-11=6.5}






δ
(
w
,
r
)
=
δ
(
(
c
,
d
)
,
r
)

δ
(
c
,
w
)
=
17.5

14
=
3.5


{\displaystyle \delta (w,r)=\delta ((c,d),r)-\delta (c,w)=17.5-14=3.5}


the wpgma dendrogram




the dendrogram complete. ultrametric because tips (



a


{\displaystyle a}





e


{\displaystyle e}

) equidistant



r


{\displaystyle r}

 :






δ
(
a
,
r
)
=
δ
(
b
,
r
)
=
δ
(
e
,
r
)
=
δ
(
c
,
r
)
=
δ
(
d
,
r
)
=
17.5


{\displaystyle \delta (a,r)=\delta (b,r)=\delta (e,r)=\delta (c,r)=\delta (d,r)=17.5}


the dendrogram therefore rooted



r


{\displaystyle r}

, deepest node.







Comments

Popular posts from this blog

History Arab separatism in Khuzestan

Cyberspace as an Internet metaphor Cyberspace

Discography Little Pattie