Monday, July 20, 2009

CCK09: Validating Aneesha Bakharia's SNA Moodle Tool

I am not aware of any validated tool against which to compare the data output of Bakharia's SNA Moodle Tool, so I have to verify its reliability and validity manually. Here I only show the method, using a single discussion thread as an example.

My approach involves using the Firefox extension Web Developer to expose the message IDs and link details.

In Web Developer, Information->Display Id & Class Details and Information->Display Link Details are enabled.

Figure 1: A CCK08 discussion thread opened in Firefox with Web Developer enabled

The figure above shows that Moodle forums identify each post with an ID. The initial post has an id #p3199. The same ID is displayed in the "Reply" link in the lower right corner of the message box.

Each reply identifies the parent message it responds to, which in the figure is circled in green as #p3199; beside it is the link to the reply's own ID, e.g. #p3204 for the first reply.

This is how an automated method of data gathering is able to link a reply post to a parent post.
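A minimal sketch of that linking step in Python. The HTML fragment is a simplified stand-in that I made up for illustration, not the real Moodle markup: it assumes each post starts at an element whose id is p plus a number, and that a reply carries a link whose href ends in #p plus the parent's number. The names ThreadParser and reply_ties are likewise hypothetical.

```python
import re
from html.parser import HTMLParser

# Simplified, hypothetical fragment: three posts, two of them replies.
SAMPLE_HTML = """
<a id="p3199"></a><div class="author">O</div>
<a id="p3204"></a><div class="author">U</div>
<a href="discuss.php?d=1#p3199">Show parent</a>
<a id="p3271"></a><div class="author">O</div>
<a href="discuss.php?d=1#p3204">Show parent</a>
"""

POST_ID = re.compile(r"^p(\d+)$")      # id="p3204" marks the start of a post
PARENT_REF = re.compile(r"#p(\d+)$")   # href="...#p3199" points at the parent


class ThreadParser(HTMLParser):
    """Collects (reply id, parent id) pairs while walking the page."""

    def __init__(self):
        super().__init__()
        self.current = None  # id of the post we are currently inside
        self.ties = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        post = POST_ID.match(attrs.get("id") or "")
        if post:
            self.current = int(post.group(1))
            return
        parent = PARENT_REF.search(attrs.get("href") or "")
        if tag == "a" and parent and self.current is not None:
            self.ties.append((self.current, int(parent.group(1))))


def reply_ties(html):
    parser = ThreadParser()
    parser.feed(html)
    return parser.ties


print(reply_ties(SAMPLE_HTML))
```

On the real page the "Reply" link also carries a post ID, so an actual scraper would need a stricter rule for telling the parent link apart from the other links.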


The reliability issue with the SNA Moodle Tool is whether it captures this data accurately: first, whether it accurately counts the frequency of posts by each actor, and second, whether the tool's name-pair ties are congruent with the HTML ID-pair ties.

Table 1: Name pairs versus ID pairs in the discussion thread


*Node data

actor   posts in tool   posts in HTML   match   post ids in HTML
A       1               1               yes     3284
B       1               1               yes     3491
C       1               1               yes     3327
D       8               8               yes     3710, 3713, 3879, 3880, 3887, 3901, 4032, 4037
E       1               1               yes     3969
F       1               1               yes     3470
G       1               1               yes     5518
H       5               5               yes     3443, 3535, 3742, 3871, 4024
I       1               1               yes     3679
J       1               1               yes     3652
K       1               1               yes     3300
L       1               1               yes     3302
M       1               1               yes     3515
N       1               1               yes     3650
O       12              12              yes     3199, 3271, 3384, 3595, 3674, 3695, 3726, 3744, 3746, 3747, 3753, 4034
P       1               1               yes     3768
Q       3               3               yes     3205, 3635, 3668
R       1               1               yes     3605
S       2               2               yes     4061, 4064
T       12              12              yes     3730, 3737, 3752, 3754, 3755, 3885, 3886, 3893, 3911, 4001, 4042, 4089
U       1               1               yes     3204
V       1               1               yes     4178
W       4               4               yes     3381, 3455, 3734, 3750
X       1               1               yes     3337
Y       4               4               yes     3285, 3335, 3678, 3705
Z       3               3               yes     3924, 3926, 3927
AA      2               2               yes     3293, 3365
BB      4               4               yes     3545, 3604, 3609, 3774
CC      3               3               yes     3222, 3612, 3873

Table 1 shows 100% agreement between the post frequencies reported by the tool and the posts counted in the HTML.
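The frequency check can be sketched in a few lines of Python. The three actors and their post IDs are taken from Table 1; the data layout and the function name count_mismatches are my own assumptions, not the tool's output format.

```python
from collections import Counter

# Post counts as reported by the SNA tool for three actors (from Table 1).
tool_counts = {"Q": 3, "S": 2, "U": 1}

# (actor, post id) pairs read off the HTML page for the same actors.
html_posts = [("Q", 3205), ("Q", 3635), ("Q", 3668),
              ("S", 4061), ("S", 4064), ("U", 3204)]


def count_mismatches(tool_counts, html_posts):
    """Return {actor: (tool count, HTML count)} for actors whose counts differ."""
    html_counts = Counter(actor for actor, _ in html_posts)
    actors = set(tool_counts) | set(html_counts)
    return {a: (tool_counts.get(a, 0), html_counts.get(a, 0))
            for a in actors
            if tool_counts.get(a, 0) != html_counts.get(a, 0)}


mismatches = count_mismatches(tool_counts, html_posts)
print(mismatches or "all counts agree")
```

An empty result corresponds to a "yes" in every row of the table.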

By manually listing in a table the ID pairs associated with the names, I am able to test the reliability of the tool.

Table 2: Tie data and post IDs; names in content

*Tie data

from     to       Is it correct id/name pair?   ties in pair   explicitly addressed   Is it the right person?
A3284    CC3222   yes                           1              -                      -
B3491    W3381    yes                           1              -                      -
C3327    O3199    yes                           1              -                      -
D4032    H4024    yes                           2              H                      yes
D3879    H3871    yes                           2              -                      -
D4037    T3911    yes                           3              -                      -
D3887    T3886    yes                           3              -                      -
D3901    T3885    yes                           3              -                      -
D3710    BB3609   yes                           1              -                      -
D3880    CC3873   yes                           2              -                      -
D3713    CC3612   yes                           2              CC                     yes
E3969    T3754    yes                           1              T                      yes
F3470    AA3293   yes                           1              A                      yes
G5518    O3199    yes                           1              -                      -
H4024    D3710    yes                           1              D                      yes
H3535    M3515    yes                           1              M                      yes
H3871    T3754    yes                           1              T                      yes
H3443    W3381    yes                           1              W                      yes
H3742    BB3609   yes                           1              -                      -
I3679    CC3612   yes                           1              -                      -
J3652    Q3635    yes                           1              ???                    no (name not in discussion)
K3300    Y3285    yes                           1              Y                      yes
L3302    AA3293   yes                           1              -                      -
M3515    H3443    yes                           1              -                      -
N3650    R3605    yes                           1              R                      yes
O4034    D3710    yes                           1              D                      yes
O3384    K3300    yes                           1              K                      yes
O3747    O3746    yes                           1              -                      -
O3674    Q3635    yes                           1              Q                      yes
O3746    T3737    yes                           1              T                      yes
O3271    U3204    yes                           1              U                      yes
O3753    W3750    yes                           1              W                      yes
O3726    Y3705    yes                           2              -                      -
O3695    Y3678    yes                           2              Y                      yes
O3595    BB3545   yes                           1              BB                     yes
O3744    CC3612   yes                           1              CC                     yes
P3768    O3744    yes                           1              -                      -
Q3668    J3652    yes                           1              J                      yes
Q3635    O3384    yes                           2              -                      -
Q3205    O3199    yes                           2              -                      -
R3605    Y3285    yes                           1              -                      -
S4061    D4037    yes                           1              D                      yes
S4064    T3730    yes                           1              T                      yes
T3730    D3710    yes                           5              -                      -
T3893    D3887    yes                           5              -                      -
T3911    D3901    yes                           5              -                      -
T3886    D3879    yes                           5              -                      -
T4042    D4037    yes                           5              -                      -
T4001    E3969    yes                           1              -                      -
T3885    H3871    yes                           1              -                      -
T3755    O3753    yes                           2              -                      -
T3754    O3747    yes                           2              -                      -
T4089    S4064    yes                           1              -                      -
T3737    W3734    yes                           2              -                      -
T3752    W3750    yes                           2              -                      -
U3204    O3199    yes                           1              -                      -
V4178    T3893    yes                           1              -                      -
W3455    H3443    yes                           1              H                      yes
W3750    O3746    yes                           2              O                      yes
W3734    O3674    yes                           2              O                      yes
W3381    Y3335    yes                           1              -                      -
X3337    Y3335    yes                           1              -                      -
Y3335    C3327    yes                           1              -                      -
Y3705    O3695    yes                           1              -                      -
Y3678    R3605    yes                           1              -                      -
Y3285    U3204    yes                           1              -                      -
Z3924    O3384    yes                           2              O                      yes
Z3927    O3199    yes                           2              O                      yes
Z3926    Q3635    yes                           1              Q                      yes
AA3293   A3284    yes                           1              -                      -
AA3365   Y3335    yes                           1              -                      -
BB3609   H3443    yes                           2              H                      yes
BB3774   H3742    yes                           2              H                      yes
BB3604   O3595    yes                           2              O                      yes
BB3545   O3199    yes                           2              -                      -
CC3873   D3713    yes                           1              -                      -
CC3612   O3199    yes                           2              -                      -
CC3222   O3199    yes                           2              -                      -

Totals for Table 2 (78 ties):
id/name pairs: 100% reliability
total explicitly named: 32 (41.03%)
total right name: 31 (39.74%); % right name/32: 96.88%
total wrong person: 1 (1.28%); % wrong name/32: 3.13%
total other named: 4 (5.13%); % other/32: 12.50%

The "Is it correct id/name pair?" column of Table 2 shows 100% reliability: every name pair in the SNA tool is congruent with the corresponding ID pair in the HTML.
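Here is a rough Python sketch of that congruence check on three ties from Table 2. It assumes the tool's name pairs can be lined up one-to-one with the ID pairs, which is how I matched them by hand; the variable names and the function check_ties are hypothetical.

```python
# Author of each post id, as read from the HTML (subset of Table 1).
post_author = {3199: "O", 3204: "U", 3271: "O", 3285: "Y"}

# (reply id, parent id) pairs from the HTML (subset of Table 2).
id_ties = [(3204, 3199), (3271, 3204), (3285, 3204)]

# (from, to) name pairs as listed by the tool, in the same order (assumed layout).
tool_ties = [("U", "O"), ("O", "U"), ("Y", "U")]


def check_ties(id_ties, tool_ties, post_author):
    """Return (reply id, parent id, congruent?) for each tie."""
    results = []
    for (reply_id, parent_id), names in zip(id_ties, tool_ties):
        expected = (post_author[reply_id], post_author[parent_id])
        results.append((reply_id, parent_id, expected == names))
    return results


results = check_ties(id_ties, tool_ties, post_author)
reliability = 100 * sum(ok for _, _, ok in results) / len(results)
print(f"{reliability:.0f}% of name pairs match the id pairs")
```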


How sure are we that the person being replied to, as indicated by the HTML ID of the parent post, is the actual person addressed in the content?

In the "explicitly addressed" column of Table 2, 32 (41.03%) of the 78 replies explicitly name the person being addressed. Of these, 31 (39.74% of 78) are congruent with the parent post ID, while 1 (1.28%) addresses a name that is not in the discussion. Looking at the context, though, the person addressed ignored the wrong name and replied, so I interpret it as an honest mistake.

If the explicitly named replies (n = 32) are taken as the base, 96.88% of the names are correct and 3.13% are incorrect.
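The percentages above can be reproduced with a short Python sketch. The helper pct is my own name. I use the decimal module with half-up rounding because Python's built-in round() rounds halves to even and would turn 3.125 into 3.12 rather than 3.13.

```python
from decimal import Decimal, ROUND_HALF_UP


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded half up to two decimals."""
    value = Decimal(part) * 100 / Decimal(whole)
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


replies, named, right, wrong = 78, 32, 31, 1  # counts from Table 2

print("explicitly named, of all replies:", pct(named, replies))
print("right name, of all replies:      ", pct(right, replies))
print("wrong name, of all replies:      ", pct(wrong, replies))
print("right name, of named replies:    ", pct(right, named))
print("wrong name, of named replies:    ", pct(wrong, named))
```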

Lost Information

What about people who were addressed in a post but are not reflected in the parent ID? According to the "other named" total for Table 2, 4 (5.13%) of the 78 posts addressed two persons in the same post. This is 12.5% of all explicitly named replies (32) in this sample thread.

Another piece of lost information is which messages are replies to which particular messages. Since the vertices in the SNA Tool represent actors, the tables show that all similar ties, and all posts by the same actor, are aggregated.

The SNA Tool has actors as vertices and would produce the following graph for this thread.

Figure 2: Pajek graph of SNA Moodle tool data

Making the individual posts the nodes, with the actor ID as the partition, produces this graph instead.

Figure 3: Pajek output of posts from html discussion page

This last one looks to me like a communication graph, not a social network.
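For anyone who wants to rebuild the post-level graph, here is a rough Python sketch that writes Pajek network (.net) and partition (.clu) text from (reply, parent) labels like those in Table 2. It is not how the dataset for the figures was actually produced; the function to_pajek is hypothetical, and it assumes every label is an actor letter followed by the post number.

```python
def to_pajek(ties):
    """Build Pajek .net and .clu text from (reply, parent) labels such as 'U3204'."""
    vertices = []
    for reply, parent in ties:
        for label in (reply, parent):
            if label not in vertices:
                vertices.append(label)
    index = {label: i + 1 for i, label in enumerate(vertices)}  # Pajek counts from 1

    # Network file: one vertex per post, one arc per reply.
    net = [f"*Vertices {len(vertices)}"]
    net += [f'{index[v]} "{v}"' for v in vertices]
    net.append("*Arcs")
    net += [f"{index[r]} {index[p]} 1" for r, p in ties]

    # Partition file: one class number per vertex, one class per actor.
    actors = []
    for v in vertices:
        actor = v.rstrip("0123456789")  # 'AA3293' -> 'AA'
        if actor not in actors:
            actors.append(actor)
    clu = [f"*Vertices {len(vertices)}"]
    clu += [str(actors.index(v.rstrip("0123456789")) + 1) for v in vertices]
    return "\n".join(net), "\n".join(clu)


net_text, clu_text = to_pajek([("U3204", "O3199"), ("O3271", "U3204")])
print(net_text)
print(clu_text)
```

Dropping the post numbers from the labels before building the arcs would collapse this back into the actor-level graph.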

Both graphs are less rich in data than what has been tried here: (x28x28de, 2008). That one appears to me to be an overlay of an SNA graph (head icons) on a communication graph (vertices), and it also has a time element. But with this approach it is difficult to see the connections at a given instant. Perhaps it would be better in 3D.

Dataset for graphs (33.09 kb) :


This only outlines the approach. To generalize, I am going to use stratified random sampling (Best & Kahn, 1989). I would need 99 sample discussion threads (groan) to be subjected to this approach.
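A sketch of how the 99 threads might be drawn, in Python. I have not yet decided what to stratify on, so the strata (three forums) and their sizes are invented placeholders, and stratified_sample is a hypothetical helper. It allocates the sample in proportion to stratum size and gives the threads lost to rounding to the strata with the largest remainders.

```python
import random


def stratified_sample(strata, total, seed=0):
    """Draw `total` items across strata, proportionally to stratum size."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    population = sum(len(ids) for ids in strata.values())
    quotas = {name: total * len(ids) / population for name, ids in strata.items()}
    alloc = {name: int(q) for name, q in quotas.items()}
    # Hand out what rounding down left over, largest fractional part first.
    leftover = total - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda n: quotas[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return {name: rng.sample(ids, alloc[name]) for name, ids in strata.items()}


# Hypothetical thread ids grouped by forum; the sizes are made up.
strata = {
    "forum_a": list(range(0, 120)),
    "forum_b": list(range(120, 200)),
    "forum_c": list(range(200, 230)),
}
sample = stratified_sample(strata, 99)
print({name: len(ids) for name, ids in sample.items()})
```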

I will check the number of posts and the name pairs. Also, the distribution of the proportions of verified names in the content, wrong names, and lost ties (others addressed in the content) will be analyzed across the sample.

My concern is how to test these distributions of proportions for statistical significance. I wonder whether a z-test will do.
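One possible reading of the z-test idea is a one-sample test of a proportion against a benchmark value. The Python sketch below applies it to the 31 correct names out of 32 from this thread; the benchmark p0 = 0.90 is an arbitrary placeholder, not a value from the literature, and the function name is my own.

```python
import math


def one_sample_proportion_z(successes, n, p0):
    """Two-sided z-test of an observed proportion against a benchmark p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under the null hypothesis
    z = (p_hat - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# 31 of 32 explicitly named replies matched the parent post; p0 is hypothetical.
z, p_value = one_sample_proportion_z(31, 32, 0.90)
print(f"z = {z:.3f}, two-sided p = {p_value:.3f}")
```

With only 32 cases and a benchmark this close to 1, the expected number of wrong names is about 3, so the normal approximation behind the z-test is shaky; an exact binomial test may be the safer choice for a single thread.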


Bakharia, A., & Dawson, S. Moodle SNA Analysis [Javascript program]. In Blackboard and WebCT - forum social network Analysis Tool. Random Syntax [Blog]. Retrieved July 12, 2009, from

Best, J.W., & Kahn, J.V. (1989). Research in education (6th ed.). New Jersey: Prentice-Hall, p. 14.

x28x28de. (2008, September 13). A centralized forum discussion. Retrieved July 20, 2009 from

Pederick, C. Web Developer 1.1.8, Firefox browser extension. [Software]. Available at Developer homepage at

1 comment:

  1. Table 2 seems to have been cut by the blog layout. If you're using firefox, just click Menu->View->Page style->No style to see the whole table.


Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.