Open Educational Tools: July 2009

Monday, July 27, 2009

*ORA: Unifying Pajek net files has never been easier

Legal stuff of *ORA

*ORA is NOT free software, just like many of the network visualization tools out there.

According to its "About *ORA" dialog box, "Permission to use, copy and modify this version of the software or any parts of it and its documentation is hereby granted for RESEARCH ONLY purposes and provided that the above copyright notice and this permission notice appear intact in all copies of the software that, that you do not sell the software, nor include the software in a commercial package."

*ORA is copyright (2001-2008 for the version I'm using) Kathleen M. Carley, Center for Computational Analysis of Social and Organizational Systems (CASOS), Institute for Software Research International (ISRI), School of Computer Science (SCS), and Carnegie Mellon University (CMU).

I'm reviewing version 1.9.5.29 (build date July 2008) on Ubuntu and version 1.9.5.4.3 (build date June 2009) on Vista.

*ORA is available at http://www.casos.cs.cmu.edu/projects/ora/

I am not connected to the copyright holders.

Why is the union of networks in Pajek wrong?

When you hit Menu->Union of vertices in Pajek, it doesn't seem to take labels into account. It simply appends all the vertices, so a vertex that appears in both networks ends up duplicated under the same label.
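To make the difference concrete, here is a minimal Python sketch of a union keyed on labels rather than on vertex numbers. It is not how Pajek or *ORA work internally; it only assumes the basic Pajek .net layout (a *Vertices section with numbered, quoted labels followed by *Arcs or *Edges lines). The labels and the function names parse_pajek and union_by_label are made up for illustration.

```python
import shlex
from collections import defaultdict

def parse_pajek(text):
    """Parse a minimal Pajek .net string into (labels, edges).

    labels maps vertex number -> label; edges is a list of
    (source_label, target_label, weight). Only *Vertices, *Arcs and
    *Edges sections with one item per line are handled.
    """
    labels, edges, section = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        if line.startswith("*"):
            section = line.split()[0].lower()
            continue
        parts = shlex.split(line)  # keeps "quoted labels" together
        if section == "*vertices":
            labels[int(parts[0])] = parts[1] if len(parts) > 1 else parts[0]
        elif section in ("*arcs", "*edges"):
            weight = float(parts[2]) if len(parts) > 2 else 1.0
            edges.append((labels[int(parts[0])], labels[int(parts[1])], weight))
    return labels, edges

def union_by_label(*networks):
    """Merge networks on labels, summing the weights of repeated ties."""
    nodes = set()
    weights = defaultdict(float)
    for labels, edges in networks:
        nodes.update(labels.values())
        for src, dst, weight in edges:
            weights[(src, dst)] += weight
    return nodes, dict(weights)

# Two tiny made-up networks that share the labels "bob" and "cy".
NET_A = '*Vertices 3\n1 "ann"\n2 "bob"\n3 "cy"\n*Arcs\n1 2 1\n2 3 1\n'
NET_B = '*Vertices 2\n1 "bob"\n2 "cy"\n*Arcs\n1 2 2\n'

net_a, net_b = parse_pajek(NET_A), parse_pajek(NET_B)
nodes, weights = union_by_label(net_a, net_b)

print(len(net_a[0]) + len(net_b[0]))  # vertices if simply appended
print(len(nodes))                     # vertices when merged by label
```

Appending gives five vertices, while merging by label gives three, and the bob-to-cy tie that occurs in both files ends up with the summed weight.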

What is *ORA?

According to its homepage at the Center for Computational Analysis of Social and Organizational Systems (CASOS), "*ORA is a risk assessment tool for locating individuals or groups that are potential risks given social, knowledge and task network information." That description may mislead people into thinking that this is only a very specialized network visualization and analysis tool. I found that it is probably the most advanced and user-friendly free network analysis tool out there, and that it can be used for all kinds of networks. It is particularly powerful because of its application of metamatrices, and it appears to be able to analyze semantic, social, organizational, communication, and other networks at the same time.

Although I haven't explored its full potential yet, one particular application is the union of Pajek networks. I can't seem to import NetDraw files yet, or include the vector files associated with the Pajek networks. But perhaps there is a way I haven't found yet.

How to make a union of networks in *ORA

1. In *ORA click menu->File->Data import wizard... or Ctrl-O

2. A dialogue box will appear, select "Create a meta-network from separate network files" and click "Next"

3. In the next dialogue box, I just left the default value of "Create a new meta-network with ID: Meta Network" then I hit Next.

4. A dialogue box for choosing the networks to load in *ORA will appear. Just Browse for your file and hit OK.

If you have more than one network file, just click the plus (+) button and another browsing textbox will appear.

I've had *ORA complain about network filenames like forum1.net and forum2.net; it interpreted them as the same network. So I imported one network at a time. I don't know whether renaming the second file would have made a difference.

You can select the type of metamatrix from the drop-down boxes. Since mine is composed of actors, I selected "Agent" as Source type and Target type. I also set "Source node labels: Yes".

Click Finish.

5. A small dialogue box will appear asking "Would you like to anonymize the data?" I hit "No" because I need the labels for the union network. You can always anonymize the data later with menu->Data Management->Meta-Network Anonymize...

6. All your loaded networks will appear in the left panel. This should be familiar to people using the Network Workbench Tool.

7. To make a union of the two networks (in my example one is size 83 and the other is size 61) click menu->Data Management->Meta-Network Union...

A dialogue box will appear. I kept the default "Currently loaded in ORA" and set "Select how to combine link weights" to Sum.

Then hit Compute.

8. It will create a new network in the left panel with the name Union. Mine has 108 vertices, built from the networks of size 83 and 61, whereas Pajek creates a network of size 144.

9. To anonymize the network, select the Union network in the panel and click menu->Data Management->Meta-Network Anonymize..., or simply right-click the Union network in the panel and select Anonymize...
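The arithmetic behind those sizes is simple inclusion-exclusion. A small sketch, assuming each label occurs at most once within each of the two networks (something I have not verified separately):

```python
# Sizes reported in this walkthrough.
size_a, size_b = 83, 61
union_size = 108  # vertices in the Union network produced by *ORA

naive_size = size_a + size_b           # every vertex appended, duplicates kept
shared = size_a + size_b - union_size  # labels present in both networks

print(naive_size)  # 144
print(shared)      # 36
```

So the 144 vertices Pajek reports are just 83 + 61, and under that assumption 36 actors appear in both forums.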

It will show a dialogue box wherein I checked "Anonymize node titles". You can even create decoder files. Then hit the Anonymize button.

To examine your labels, just select Agent in the panel like in the figure. You can see the Node Titles are A-1, A-2 etc.

10. You can examine your Union network with menu->Visualization->Network visualizer->View Entire Meta-Network. Its visualizer is still sluggish and not as impressive as Pajek's, but that could be because I'm using the wrong OS platform. See below. It seems to use the JUNG (http://jung.sourceforge.net) library for visualization.
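For readers who want to see what this kind of anonymization amounts to, here is a rough Python sketch. It is not *ORA's implementation: the numbering order, the CSV layout of the decoder, and the function names are my own assumptions, and the labels are made up. Only the A-1, A-2 naming pattern is taken from what *ORA shows.

```python
import csv
import io

def anonymize(labels, prefix="A"):
    """Return {original_label: anonymous_id}, numbered in first-seen order."""
    decoder = {}
    for label in labels:
        if label not in decoder:
            decoder[label] = f"{prefix}-{len(decoder) + 1}"
    return decoder

def decoder_csv(decoder):
    """Serialise the mapping so the anonymization can be reversed later."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["anonymous_id", "original_label"])
    for label, anon in decoder.items():
        writer.writerow([anon, label])
    return buf.getvalue()

# Made-up labels and ties, for illustration only.
decoder = anonymize(["ann", "bob", "ann", "cy"])
edges = [("ann", "bob"), ("bob", "cy")]
anon_edges = [(decoder[src], decoder[dst]) for src, dst in edges]

print(anon_edges)
print(decoder_csv(decoder))
```

Keeping the decoder separate from the published network is the point: the network can be shared while the mapping back to names stays private.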

Then click ORA Network Visualizer->menu->Save Image To File...

You can save the graph in PNG, JPG, SVG, and PDF formats.

And here's the output.

11. Converting from *ORA's DyNetML to NetDraw's VNA

*ORA's Network Format Converter will convert from DyNetML, ASSAMML, UCINET, CSV, and DL to DyNetML, UCINET, DL, CSV, and VNA.

Click on File->Network Format Converter and a dialogue box will appear. In the Load tab, select Format DYNETML and Load your Union file that was saved as .xml.

Then go to the Save tab, and select VNA and save.

Here's a graph render of the VNA output in NetDraw.
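To give an idea of what the VNA target looks like, here is a small Python sketch that writes a simplified VNA text: a *Node data section and a *Tie data section, each with a header row. This is not *ORA's converter, it covers only a fraction of what the format allows, and the attribute names "posts" and "strength", the function names, and the sample ids are my own choices for illustration.

```python
def quote(value):
    """Wrap values containing whitespace in double quotes."""
    text = str(value)
    return f'"{text}"' if any(ch.isspace() for ch in text) else text

def to_vna(posts_by_actor, ties):
    """Build VNA text with a 'posts' node attribute and a 'strength' tie value.

    posts_by_actor: {actor_id: number_of_posts}
    ties: {(from_id, to_id): strength}
    """
    lines = ["*Node data", "ID posts"]
    for actor, posts in posts_by_actor.items():
        lines.append(f"{quote(actor)} {posts}")
    lines += ["*Tie data", "from to strength"]
    for (src, dst), strength in ties.items():
        lines.append(f"{quote(src)} {quote(dst)} {strength}")
    return "\n".join(lines) + "\n"

# Made-up anonymized actors and ties.
vna_text = to_vna({"A-1": 2, "A-2": 1}, {("A-1", "A-2"): 1, ("A-2", "A-1"): 2})
print(vna_text)
```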

Problems encountered

I installed the Windows version on a Vista machine, but I could not import any network files. The attempt resulted in Java errors.

I was able to import graphs in the Ubuntu Linux Jaunty installation. But *ORA seems 32 bit and I had 64 bit installed. It had a lot of GTK related errors, particularly with the canberra-gtk-module. So it was a bit buggy and I couldn't use the Network Format Converter because it would not open.

I had to shift to the Windows version with the saved DyNetML xml file, and convert it there. I don't think it converts back to Pajek net file though.

Perhaps if I can resolve the 32 bit GTK problem in my installation or install a 32 bit Ubuntu things will be better. But I managed to make a proper union of the networks without resorting to cumbersome spreadsheet techniques.

Unfortunately, I haven't been able to unify the vectors yet.

Aside from that, I don't know how *ORA will handle very large networks.

I think that if the developers continue to improve *ORA, particularly its visualizer, it will become even better than Pajek in the future.

But remember that *ORA is NOT free software and is available only for research. Pajek too is proprietary and according to its "About Pajek" dialogue box is free (as in free beer) for noncommercial use only. I personally haven't found a free (beer) tool that equals Pajek in speed, visualization, and analysis.

The Network Workbench Tool (version 12.0.0 beta), a copyright of Indiana University is licensed under Apache License Version 2.0. Gephi version 0.5 Beta is GPL 3.0 (gephi.org) but it is only a graph editor. Perhaps Gephi can be partnered with the sna 2.0 R package (http://erzuli.ss.uci.edu/R.stuff) from Carter Butts which is GPL 2.0 and later.

References

Kathleen M. Carley, Center for Computational Analysis of Social and Organizational Systems (CASOS), Institute for Software Research International (ISRI), School of Computer Science (SCS), and Carnegie Mellon University (CMU). *ORA [Software]. Available at http://www.casos.cs.cmu.edu/projects/ora/

Butts, C.T. The SNA Package, v2.0 [Software]. Available at http://erzuli.ss.uci.edu/R.stuff

Indiana University. The Network Workbench Tool [Software]. Available at http://nwb.slis.indiana.edu/download.html.

R Development Core Team (2009). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

V. Batagelj, A. Mrvar: Pajek - Program for Large Network Analysis. Home page: http://vlado.fmf.uni-lj.si/pub/networks/pajek/

Sunday, July 26, 2009

Union of all CCK08 Moodle Forums Social Network

Let me first state, as an erratum, that I have NOT been given any formal approval to study the CCK08 Moodle forums by Prof. Siemens or Prof. Downes, but was merely informed that they are publicly available. The data, graphs and research plan here are NOT connected with nor endorsed by the said professors or their institution. All liability and errors are mine. I am doing this as a private individual and in good faith.

(EDIT: Removed grumbling about research ethics and replaced it with dataset- July 27, 2009)
I've been thinking all night about my ethical dilemma about getting consent from 536 CCK08 Moodle forums participants. I've decided to make a 180-degree turnaround. I've decided to release the dataset, because the graphs are totally useless without it. So here it is.

Dataset of anonymized union of all forums in Pajek format (14.85 KB) http://www.mediafire.com/file/m0n3oritnof/cck08allforums_pajek.zip

Here is my justification: the SNA analysis processed two pieces of information from the interface of the Moodle forums, which is publicly available. The first is the names of the posters, which were used as ids to identify vertices across the forums. The second is the message id that was autogenerated by Moodle when the poster clicked reply or post. The body of the message was not touched at all. Gathering the participants' email addresses on top of their names seems to me to exacerbate my ethical problem. It's as if I need to gather more information about them without their consent so that I can ask their consent to get information that I already have. I have anonymized the Moodle participants' names, and that is all I can do. I can only claim "in the interest of science" as a defense, i.e. so that discussion about the theory will have hard data to stand on.

Anyone who reads this blog and wants me to delete the dataset, just post it in the comments and I'll consider it.

(END of edit-July 27, 2009)

Caveat emptor: this social network of the CCK08 Moodle forums cannot tell us the whole story of CCK08. There are only 560 registered users. The others probably had their centrality in the blogs, wiki, Facebook, Twitter etc.

Even some of the isolates and peripheral participants in the Moodle network may be in the other networks. I know a peripheral participant in the Moodle network who was an active blogger. I myself posted only 7 times in the forum and centered my learning in the wiki resources.

Another source of uncertainty is what I have reported before, the type I error (wrong ties) and type II error (unreported ties).

Let me start with some basic parameters of the social network.

Table 1: Network Parameters of CCK08 Moodle forums

Forums	Total initial posts	Total replies	Total posts	Participants	% of total unique actors (n = 537)
1 (Introduction)	494	798	1292	501	93.3
2 (General)	122	1283	1405	165	30.73
3 (forum 1)	11	301	312	83	15.46
4 (forum 2)	25	291	316	61	11.36
5 (forum 3)	26	427	453	48	8.94
6 (forum 4)	7	131	138	28	5.21
7 (forum 5)	19	252	271	42	7.82
8 (forum 6)	16	96	112	30	5.59
9 (forum 7)	15	117	132	25	4.66
10 (forum 8)	17	244	261	85	15.83
11 (forum 9)	13	133	146	28	5.21
12 (forum 10)	11	77	88	24	4.47
13 (forum 11)	10	143	153	35	6.52
14 (forum 12)	14	105	119	29	5.4
grand total	800	4398	5198	537 (unique actors)
				560 (registered users)
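
The figures in Table 1 can be cross-checked with a few lines of Python. This is only a consistency check on the numbers as typed in the table (initial posts plus replies should equal total posts, and participants divided by 537 should give the reported percentage); the variable names are my own.

```python
UNIQUE_ACTORS = 537

# (forum, initial posts, replies, total posts, participants, reported %)
ROWS = [
    ("Introduction", 494, 798, 1292, 501, 93.3),
    ("General", 122, 1283, 1405, 165, 30.73),
    ("forum 1", 11, 301, 312, 83, 15.46),
    ("forum 2", 25, 291, 316, 61, 11.36),
    ("forum 3", 26, 427, 453, 48, 8.94),
    ("forum 4", 7, 131, 138, 28, 5.21),
    ("forum 5", 19, 252, 271, 42, 7.82),
    ("forum 6", 16, 96, 112, 30, 5.59),
    ("forum 7", 15, 117, 132, 25, 4.66),
    ("forum 8", 17, 244, 261, 85, 15.83),
    ("forum 9", 13, 133, 146, 28, 5.21),
    ("forum 10", 11, 77, 88, 24, 4.47),
    ("forum 11", 10, 143, 153, 35, 6.52),
    ("forum 12", 14, 105, 119, 29, 5.4),
]

problems = []
for forum, initial, replies, total, participants, reported in ROWS:
    if initial + replies != total:
        problems.append((forum, "total posts"))
    if abs(100 * participants / UNIQUE_ACTORS - reported) >= 0.01:
        problems.append((forum, "percentage"))

# Column sums for initial posts, replies and total posts.
grand_totals = tuple(sum(row[i] for row in ROWS) for i in (1, 2, 3))

print(problems)      # []
print(grand_totals)  # (800, 4398, 5198)
```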

Figure 1: Frequency of initial posts in the forums

Figure 2: Frequency of total replies in the forums

Figure 3: Frequency of total posts in the forums

Figure 4: Frequency of participants in the forums

All line graphs were generated with R.

CCK08 Moodle Forums Social Network

Dataset: http://www.mediafire.com/file/m0n3oritnof/cck08allforums_pajek.zip

Pajek renders (Fruchterman-Reingold 2D layout)

Figure 5: Union of all forums without tie strength

Figure 6: Union of all forums without tie strength and frequency of posts as vector (size of vertices)

Figure 7: Union of forums with tie strength

Figure 8: Union of forums with tie strengths and frequency of posts as vector (size of vertices)

NetDraw renders (spring embedding layout)

Figure 9: No tie strength

Figure 10: With tie strength

Figure 11: Scale 1:1 of total posts attribute represented as sizes of nodes

Figure 12: Scaled to minimum size 4 : maximum size 50 of total posts attribute represented as sizes of nodes

One way to connect the isolates in the Moodle network with the bloggers is to look at the Moodle activity logs (which are only available to the site admin), and probably to mine them with Weka. Then find out which of the core Moodle participants they are following. Afterwards, compare the semantic network of their blogs with the semantic networks of the said participants.

An interested researcher may do a semantic network analysis of the CCK08 Moodle core participants' posts, probably with Kathleen Carley's Automap or with Carter Butts' metamatrix package for organizational analysis, a package for R.

(Edited July 27, 2009)
Despite the research ethics issue and my aching clicking arm, eyes and back, I've learned a lot from this attempt. I'll probably use the method I've learned here in a dataset where I have taken all necessary ethical requirements. All in all it was fun.
(Edited July 27, 2009)

References:

Borgatti, S. Netdraw [Software]. Available at http://www.analytictech.com/Netdraw/netdraw.htm.

R Development Core Team (2009). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

V. Batagelj, A. Mrvar: Pajek - Program for Large Network Analysis. Home page: http://vlado.fmf.uni-lj.si/pub/networks/pajek/

Monday, July 20, 2009

CCK09: Validating Aneesha Bakharia's SNA Moodle Tool

Since I am unaware of any other validated tool against which to compare the data output of Bakharia's SNA Moodle Tool, I have to verify its reliability and validity manually. I am only going to show the method here and use a single discussion thread as an example.

My approach involves using the Firefox extension Web Developer to expose the message IDs and link details.

In Web Developer, Information->Display Id & Class Details and Information->Display link details are enabled.

Figure 1: A CCK08 discussion thread opened in Firefox with Web Developer enabled

The figure above shows that Moodle forums identify each post with an ID. The initial post has an id #p3199. The same ID is displayed in the "Reply" link in the lower right corner of the message box.

Each reply identifies the parent message it replies to, which in the figure is encircled in green as #p3199. Beside it is the link to the id of the reply itself, e.g. #p3204 for the 1st reply.

This is how an automated method of data gathering is able to link a reply post to a parent post.
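Here is a minimal Python sketch of that linking step. It is not Bakharia's tool. The post ids and single-letter authors are taken from the thread examined here, but the "discuss.php?d=1" part of each link is a made-up placeholder, and the function name is my own; only the "#p" plus digits fragment is what the page actually shows.

```python
import re

def post_id_from_href(href):
    """Extract the numeric post id from a link ending in a '#p<digits>' fragment."""
    match = re.search(r"#p(\d+)$", href)
    return int(match.group(1)) if match else None

# (post id, author, link to the parent post, or None for the initial post)
POSTS = [
    (3199, "O", None),
    (3204, "U", "discuss.php?d=1#p3199"),
    (3205, "Q", "discuss.php?d=1#p3199"),
    (3271, "O", "discuss.php?d=1#p3204"),
]

author_of = {post_id: author for post_id, author, _ in POSTS}

# Turn each reply into a (replier, person replied to) name pair.
ties = []
for post_id, author, parent_href in POSTS:
    parent_id = post_id_from_href(parent_href) if parent_href else None
    if parent_id is not None:
        ties.append((author, author_of[parent_id]))

print(ties)  # [('U', 'O'), ('Q', 'O'), ('O', 'U')]
```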

Reliability

The reliability issue with the SNA Moodle Tool is whether it is able to capture this data accurately. The first question is whether it accurately counts the frequency of posts by each actor. The second is whether the tool's name-pair ties are congruent with the HTML id-pair ties.

Table 1: Name pairs versus ID pairs in the discussion thread

VNA DATA		HTML DATA
*Node	data
ID	posts	POST NAME	POST ID	NO. OF POSTS	EQUAL?
A	1	A	3284	1	yes
B	1	B	3491	1	yes
C	1	C	3327	1	yes
		D	3710	1
		D	3713	1
		D	3879	1
		D	3880	1
		D	3887	1
		D	3901	1
		D	4032	1
		D	4037	1
D	8			8	yes
E	1	E	3969	1	yes
F	1	F	3470	1	yes
G	1	G	5518	1	yes
		H	3443	1
		H	3535	1
		H	3742	1
		H	3871	1
		H	4024	1
H	5			5	yes
I	1	I	3679	1	yes
J	1	J	3652	1	yes
K	1	K	3300	1	yes
L	1	L	3302	1	yes
M	1	M	3515	1	yes
N	1	N	3650	1	yes
		O	3199	1
		O	3271	1
		O	3384	1
		O	3595	1
		O	3674	1
		O	3695	1
		O	3726	1
		O	3744	1
		O	3746	1
		O	3747	1
		O	3753	1
		O	4034	1
O	12			12	yes
P	1	P	3768	1	yes
		Q	3205	1
		Q	3635	1
		Q	3668	1
Q	3			3	yes
R	1	R	3605	1	yes
		S	4061	1
		S	4064	1
S	2			2	yes
		T	3730	1
		T	3737	1
		T	3752	1
		T	3754	1
		T	3755	1
		T	3885	1
		T	3886	1
		T	3893	1
		T	3911	1
		T	4001	1
		T	4042	1
		T	4089	1
T	12			12	yes
U	1	U	3204	1	yes
V	1	V	4178	1	yes
		W	3381	1
		W	3455	1
		W	3734	1
		W	3750	1
W	4			4	yes
X	1	X	3337	1	yes
		Y	3285	1
		Y	3335	1
		Y	3678	1
		Y	3705	1
Y	4			4	yes
		Z	3924	1
		Z	3926	1
		Z	3927	1
Z	3			3	yes
		AA	3293	1
		AA	3365	1
AA	2			2	yes
		BB	3545	1
		BB	3604	1
		BB	3609	1
		BB	3774	1
BB	4			4	yes
		CC	3222	1
		CC	3612	1
		CC	3873	1
CC	3			3	yes

Table 1 shows that there is 100% equivalence between the frequency of posts in the tool, and that of the html posts.
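The comparison I did by hand can be sketched in Python as below. Only three actors from the table are included to keep it short, and the variable names are mine.

```python
from collections import Counter

# (author, post id) pairs as read off the HTML page; a subset of the thread.
HTML_POSTS = [
    ("Q", 3205), ("Q", 3635), ("Q", 3668),
    ("S", 4061), ("S", 4064),
    ("U", 3204),
]

# Post counts for the same actors as reported in the tool's node data.
TOOL_COUNTS = {"Q": 3, "S": 2, "U": 1}

html_counts = Counter(author for author, _ in HTML_POSTS)

# Collect every actor whose two counts disagree.
mismatches = {
    actor: (TOOL_COUNTS.get(actor), html_counts.get(actor))
    for actor in set(TOOL_COUNTS) | set(html_counts)
    if TOOL_COUNTS.get(actor) != html_counts.get(actor)
}

print(mismatches)  # {}
```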

By manually listing in a table the id pairs associated with the names, I am able to test the reliability of the tool.

Table 2: Tie data and posts ids; Names in content.

*Tie	data	Is it correct id/name pair?		explicitly addressed	Is it the right person?		other
from	to		strength
A3284	CC3222	yes	1
B3491	W3381	yes	1
C3327	O3199	yes	1
D4032	H4024	yes		H	yes
D3879	H3871	yes	2
D4037	T3911	yes
D3887	T3886	yes	3
D3901	T3885	yes
D3710	BB3609	yes	1
D3880	CC3873	yes
D3713	CC3612	yes	2	CC	yes
E3969	T3754	yes	1	T	yes
F3470	AA3293	yes	1	A	yes
G5518	O3199	yes	1
H4024	D3710	yes	1	D	yes
H3535	M3515	yes	1	M	yes
H3871	T3754	yes	1	T	yes
H3443	W3381	yes	1	W	yes
H3742	BB3609	yes	1
I3679	CC3612	yes	1
J3652	Q3635	yes	1	???	no (name not in discussion)
K3300	Y3285	yes	1	Y	yes		H
L3302	AA3293	yes	1
M3515	H3443	yes	1
N3650	R3605	yes	1	R	yes
O4034	D3710	yes	1	D	yes
O3384	K3300	yes	1	K	yes
O3747	O3746	yes	1
O3674	Q3635	yes	1	Q	yes
O3746	T3737	yes	1	T	yes		W
O3271	U3204	yes	1	U	yes
O3753	W3750	yes	1	W	yes		T
O3726	Y3705	yes
O3695	Y3678	yes	2	Y	yes
O3595	BB3545	yes	1	BB	yes
O3744	CC3612	yes	1	CC	yes
P3768	O3744	yes	1
Q3668	J3652	yes	1	J	yes
Q3635	O3384	yes	2
Q3205	O3199	yes
R3605	Y3285	yes	1
S4061	D4037	yes	1	D	yes
S4064	T3730	yes	1	T	yes
T3730	D3710	yes
T3893	D3887	yes
T3911	D3901	yes
T3886	D3879	yes	5
T4042	D4037	yes
T4001	E3969	yes	1
T3885	H3871	yes	1
T3755	O3753	yes
T3754	O3747	yes	2
T4089	S4064	yes	1
T3737	W3734	yes	2
T3752	W3750	yes
U3204	O3199	yes	1
V4178	T3893	yes	1
W3455	H3443	yes	1	H	yes
W3750	O3746	yes		O	yes
W3734	O3674	yes	2	O	yes		Q
W3381	Y3335	yes	1
X3337	Y3335	yes	1
Y3335	C3327	yes	1
Y3705	O3695	yes	1
Y3678	R3605	yes	1
Y3285	U3204	yes	1
Z3924	O3384	yes	2	O	yes
Z3927	O3199	yes		O	yes
Z3926	Q3635	yes	1	Q	yes
AA3293	A3284	yes	1
AA3365	Y3335	yes	1
BB3609	H3443	yes	2	H	yes
BB3774	H3742	yes		H	yes
BB3604	O3595	yes		O	yes
BB3545	O3199	yes	2
CC3873	D3713	yes	1
CC3612	O3199	yes
CC3222	O3199	yes	2
		100% reliability	78	total explicitly named	32 (41.03%)	total other named	4 (5.13 %)
				total right name	31 (39.74%)	% other/32	12.50%
				%right name/32	96.88%
				total wrong person	1 (1.28%)
				% wrong name/32	3.13%

Column 3 of Table 2 shows that there is 100% reliability in the congruence of name pairs in the SNA tool and the id pairs in the html.

Validity

How sure are we that the person being replied to, as indicated in the html ids of the parent post, is the actual person addressed in the content?

In Column 4 of Table 2, 32 (41.03%) of the 78 replies explicitly named the person being addressed. Of these, 31 (39.74% of all replies) are congruent with the parent post id, while 1 (1.28%) addressed a name that is not in the discussion. But looking at the context, the person who was addressed ignored the wrong name and replied. So I interpret it as an honest mistake.

If the sample n (32) is used then 96.88% of the names are correct while 3.13% is incorrect.
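For anyone who wants to recompute these percentages, here is a short Python sketch. It uses Decimal with half-up rounding because plain float rounding in Python rounds 3.125 to 3.12 rather than the 3.13 shown in the table. The counts come from Table 2; the function name pct is my own.

```python
from decimal import Decimal, ROUND_HALF_UP

def pct(part, whole):
    """Percentage rounded half-up to two decimals, as a Decimal."""
    value = Decimal(part) * 100 / Decimal(whole)
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

replies = 78  # all reply ties in the thread
named = 32    # replies that explicitly name the addressee
right = 31    # named addressee matches the parent post id
wrong = 1     # named addressee is not in the discussion
other = 4     # replies that also address a second person

print(pct(named, replies))  # 41.03
print(pct(right, replies))  # 39.74
print(pct(wrong, replies))  # 1.28
print(pct(right, named))    # 96.88
print(pct(wrong, named))    # 3.13
print(pct(other, replies))  # 5.13
print(pct(other, named))    # 12.50
```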

Lost Information

What about people who were addressed in the same post but were not reflected in the parent id? In Column 6 of Table 2, 4 (5.13%) of the 78 posts had two persons addressed in the same post. This is 12.5% of all explicitly named replies (32) for this sample thread.

Another piece of lost information is which messages are replies to which particular messages. Since the vertices in the SNA Tool represent actors, we can see from the tables that all similar ties, and all posts by the same actor, are aggregated.

The SNA Tool has actors as vertices and would produce the following graph for this thread.

Figure 2: Pajek graph of SNA Moodle tool data

Making the individual posts the nodes and the actor id the partition instead produces this graph.

Figure 3: Pajek output of posts from html discussion page

This last one looks to me like a communication graph, and not a social network.

Both graphs are less rich in data than what has been tried here: http://www.flickr.com/photos/37794987@N00/2854506800/ (x28x28de, 2008). This appears to me to be an overlay of an SNA graph (head icons) and a communication graph (vertices). It also has a time element. But it is difficult to see the connections at an instant with this approach. Perhaps if it were in 3D it would be better.
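The difference between the two graphs can be shown with a small Python sketch. The post-level edges below are the T and D exchanges from Table 2; collapsing them onto their authors is my own illustration of the aggregation, not the tool's code.

```python
from collections import Counter

# Post-level replies (replying post -> parent post), written as
# (author, post id) pairs taken from the tie table for this thread.
POST_EDGES = [
    (("T", 3730), ("D", 3710)),
    (("T", 3893), ("D", 3887)),
    (("T", 3911), ("D", 3901)),
    (("T", 3886), ("D", 3879)),
    (("T", 4042), ("D", 4037)),
    (("D", 3887), ("T", 3886)),
]

# Drop the post ids and count how often each actor pair occurs.
actor_ties = Counter((src[0], dst[0]) for src, dst in POST_EDGES)

print(actor_ties[("T", "D")])            # 5
print(len(POST_EDGES), len(actor_ties))  # 6 2
```

Six post-level edges collapse into two actor ties, one of them with strength 5, which is exactly the information about who replied to which message that the actor graph no longer carries.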

Dataset for graphs (33.09 kb) : http://www.mediafire.com/file/dygmojkmqyn/verifySNAtool1.zip

Plan

This only outlines the approach. To generalize, I am going to use stratified random sampling (Best & Kahn, 1989). I would need 99 sample discussion threads (groan) to be subjected to this approach.

I will check the number of posts and the name pairs. Also, the distribution of the proportions of verified names in the content, wrong names, and lost ties (others addressed in the content) will be analyzed across the sample.

My concern is how to test the statistical significance of these distributions of proportions. I wonder if a z-test will do.
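As one possible starting point, and not a settled answer to that question, here is a Python sketch of a Wilson score interval for a single proportion. It is built on the same z value as a z-test but tends to behave better than the plain normal approximation when the proportion is close to 100% and n is small, as with 31 correct names out of 32. Using it here is my assumption, and the function name is my own.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# 31 of the 32 explicitly named replies matched the parent post id.
low, high = wilson_interval(31, 32)
print(f"{low:.3f} to {high:.3f}")
```

With 99 sampled threads the same function could be applied per stratum, but whether that is the right test for the full design is still an open question for me.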

References

Bakharia, A. & Dawson, Shane. Moodle SNA Analysis [Javascript program]. In Blackboard and WebCT - forum social network Analysis Tool. Random Syntax [Blog]. Retrieved July 12, 2009, from http://www.randomsyntax.com/blackboard-forum-social-network-analysis/.

Best, J.W., & Kahn, J.V. (1989). Research in education (6th ed.). New Jersey: Prentice-Hall, p. 14.

x28x28de. (2008, September 13). A centralized forum discussion. Retrieved July 20, 2009 from http://www.flickr.com/photos/37794987@N00/2854506800/.

Pederick, C. Web Developer 1.1.8, Firefox browser extension. [Software]. Available at https://addons.mozilla.org/en-US/firefox/addon/60. Developer homepage at http://chrispederick.com/work/web-developer/.