Working example WPGMA
1 working example
1.1 first step
1.2 second step
1.3 third step
1.4 final step
1.5 wpgma dendrogram
working example
first step
first clustering
let assume have 5 elements
(
a
,
b
,
c
,
d
,
e
)
{\displaystyle (a,b,c,d,e)}
, following matrix
d
1
{\displaystyle d_{1}}
of pairwise distances between them :
in example,
d
1
(
a
,
b
)
=
17
{\displaystyle d_{1}(a,b)=17}
smallest value of
d
1
{\displaystyle d_{1}}
, join elements
a
{\displaystyle a}
,
b
{\displaystyle b}
.
first branch length estimation
let
u
{\displaystyle u}
denote node
a
{\displaystyle a}
,
b
{\displaystyle b}
connected. setting
δ
(
a
,
u
)
=
δ
(
b
,
u
)
=
d
1
(
a
,
b
)
/
2
{\displaystyle \delta (a,u)=\delta (b,u)=d_{1}(a,b)/2}
ensures elements
a
{\displaystyle a}
,
b
{\displaystyle b}
equidistant
u
{\displaystyle u}
. corresponds expectation of ultrametricity hypothesis. branches joining
a
{\displaystyle a}
,
b
{\displaystyle b}
u
{\displaystyle u}
have lengths
δ
(
a
,
u
)
=
δ
(
b
,
u
)
=
17
/
2
=
8.5
{\displaystyle \delta (a,u)=\delta (b,u)=17/2=8.5}
(see final dendrogram)
first distance matrix update
we proceed update initial distance matrix
d
1
{\displaystyle d_{1}}
new distance matrix
d
2
{\displaystyle d_{2}}
(see below), reduced in size 1 row , 1 column because of clustering of
a
{\displaystyle a}
b
{\displaystyle b}
. bold values in
d
2
{\displaystyle d_{2}}
correspond new distances, calculated averaging distances between first cluster
(
a
,
b
)
{\displaystyle (a,b)}
, each of remaining elements:
d
2
(
(
a
,
b
)
,
c
)
=
(
d
1
(
a
,
c
)
+
d
1
(
b
,
c
)
)
/
2
=
(
21
+
30
)
/
2
=
25.5
{\displaystyle d_{2}((a,b),c)=(d_{1}(a,c)+d_{1}(b,c))/2=(21+30)/2=25.5}
d
2
(
(
a
,
b
)
,
d
)
=
(
d
1
(
a
,
d
)
+
d
1
(
b
,
d
)
)
/
2
=
(
31
+
34
)
/
2
=
32.5
{\displaystyle d_{2}((a,b),d)=(d_{1}(a,d)+d_{1}(b,d))/2=(31+34)/2=32.5}
d
2
(
(
a
,
b
)
,
e
)
=
(
d
1
(
a
,
e
)
+
d
1
(
b
,
e
)
)
/
2
=
(
23
+
21
)
/
2
=
22
{\displaystyle d_{2}((a,b),e)=(d_{1}(a,e)+d_{1}(b,e))/2=(23+21)/2=22}
italicized values in
d
2
{\displaystyle d_{2}}
not affected matrix update correspond distances between elements not involved in first cluster.
second step
second clustering
we reiterate 3 previous steps, starting new distance matrix
d
2
{\displaystyle d_{2}}
:
here,
d
2
(
(
a
,
b
)
,
e
)
=
22
{\displaystyle d_{2}((a,b),e)=22}
smallest value of
d
2
{\displaystyle d_{2}}
, join cluster
(
a
,
b
)
{\displaystyle (a,b)}
, element
e
{\displaystyle e}
.
second branch length estimation
let
v
{\displaystyle v}
denote node
(
a
,
b
)
{\displaystyle (a,b)}
,
e
{\displaystyle e}
connected. because of ultrametricity constraint, branches joining
a
{\displaystyle a}
or
b
{\displaystyle b}
v
{\displaystyle v}
, ,
e
{\displaystyle e}
v
{\displaystyle v}
equal , have following length:
δ
(
a
,
v
)
=
δ
(
b
,
v
)
=
δ
(
e
,
v
)
=
22
/
2
=
11
{\displaystyle \delta (a,v)=\delta (b,v)=\delta (e,v)=22/2=11}
we deduce missing branch length:
δ
(
u
,
v
)
=
δ
(
e
,
v
)
−
δ
(
a
,
u
)
=
δ
(
e
,
v
)
−
δ
(
b
,
u
)
=
11
−
8.5
=
2.5
{\displaystyle \delta (u,v)=\delta (e,v)-\delta (a,u)=\delta (e,v)-\delta (b,u)=11-8.5=2.5}
(see final dendrogram)
second distance matrix update
we proceed update
d
2
{\displaystyle d_{2}}
matrix new distance matrix
d
3
{\displaystyle d_{3}}
(see below), reduced in size 1 row , 1 column because of clustering of
(
a
,
b
)
{\displaystyle (a,b)}
e
{\displaystyle e}
:
d
3
(
(
(
a
,
b
)
,
e
)
,
c
)
=
(
d
2
(
(
a
,
b
)
,
c
)
+
d
2
(
e
,
c
)
)
/
2
=
(
25.5
+
39
)
/
2
=
32.25
{\displaystyle d_{3}(((a,b),e),c)=(d_{2}((a,b),c)+d_{2}(e,c))/2=(25.5+39)/2=32.25}
of note, average calculation of new distance not account larger size of
(
a
,
b
)
{\displaystyle (a,b)}
cluster (two elements) respect
e
{\displaystyle e}
(one element). similarly:
d
3
(
(
(
a
,
b
)
,
e
)
,
d
)
=
(
d
2
(
(
a
,
b
)
,
d
)
+
d
2
(
e
,
d
)
)
/
2
=
(
32.5
+
43
)
/
2
=
37.75
{\displaystyle d_{3}(((a,b),e),d)=(d_{2}((a,b),d)+d_{2}(e,d))/2=(32.5+43)/2=37.75}
the averaging procedure therefore gives differential weight initial distances of matrix
d
1
{\displaystyle d_{1}}
. reason why method weighted, not respect mathematical procedure respect initial distances.
third step
third clustering
we again reiterate 3 previous steps, starting updated distance matrix
d
3
{\displaystyle d_{3}}
.
here,
d
3
(
c
,
d
)
=
28
{\displaystyle d_{3}(c,d)=28}
smallest value of
d
3
{\displaystyle d_{3}}
, join elements
c
{\displaystyle c}
,
d
{\displaystyle d}
.
third branch length estimation
let
w
{\displaystyle w}
denote node
c
{\displaystyle c}
,
d
{\displaystyle d}
connected. branches joining
c
{\displaystyle c}
,
d
{\displaystyle d}
w
{\displaystyle w}
have lengths
δ
(
c
,
w
)
=
δ
(
d
,
w
)
=
28
/
2
=
14
{\displaystyle \delta (c,w)=\delta (d,w)=28/2=14}
(see final dendrogram)
third distance matrix update
there single entry update:
d
4
(
(
c
,
d
)
,
(
(
a
,
b
)
,
e
)
)
=
(
d
3
(
c
,
(
(
a
,
b
)
,
e
)
)
+
d
3
(
d
,
(
(
a
,
b
)
,
e
)
)
)
/
2
=
(
32.25
+
37.75
)
/
2
=
35
{\displaystyle d_{4}((c,d),((a,b),e))=(d_{3}(c,((a,b),e))+d_{3}(d,((a,b),e)))/2=(32.25+37.75)/2=35}
final step
the final
d
4
{\displaystyle d_{4}}
matrix is:
so join clusters
(
(
a
,
b
)
,
e
)
{\displaystyle ((a,b),e)}
,
(
c
,
d
)
{\displaystyle (c,d)}
.
let
r
{\displaystyle r}
denote (root) node
(
(
a
,
b
)
,
e
)
{\displaystyle ((a,b),e)}
,
(
c
,
d
)
{\displaystyle (c,d)}
connected. branches joining
(
(
a
,
b
)
,
e
)
{\displaystyle ((a,b),e)}
,
(
c
,
d
)
{\displaystyle (c,d)}
r
{\displaystyle r}
have lengths:
δ
(
(
(
a
,
b
)
,
e
)
,
r
)
=
δ
(
(
c
,
d
)
,
r
)
=
35
/
2
=
17.5
{\displaystyle \delta (((a,b),e),r)=\delta ((c,d),r)=35/2=17.5}
we deduce 2 remaining branch lengths:
δ
(
v
,
r
)
=
δ
(
(
(
a
,
b
)
,
e
)
,
r
)
−
δ
(
e
,
v
)
=
17.5
−
11
=
6.5
{\displaystyle \delta (v,r)=\delta (((a,b),e),r)-\delta (e,v)=17.5-11=6.5}
δ
(
w
,
r
)
=
δ
(
(
c
,
d
)
,
r
)
−
δ
(
c
,
w
)
=
17.5
−
14
=
3.5
{\displaystyle \delta (w,r)=\delta ((c,d),r)-\delta (c,w)=17.5-14=3.5}
the wpgma dendrogram
the dendrogram complete. ultrametric because tips (
a
{\displaystyle a}
e
{\displaystyle e}
) equidistant
r
{\displaystyle r}
:
δ
(
a
,
r
)
=
δ
(
b
,
r
)
=
δ
(
e
,
r
)
=
δ
(
c
,
r
)
=
δ
(
d
,
r
)
=
17.5
{\displaystyle \delta (a,r)=\delta (b,r)=\delta (e,r)=\delta (c,r)=\delta (d,r)=17.5}
the dendrogram therefore rooted
r
{\displaystyle r}
, deepest node.
Comments
Post a Comment