Friday, August 28, 2009

Paul Rosenbloom explains SOAR

If you can ignore the trial-version text overlaid on this video, it is a good introduction to Soar.



And here is another video, in which John Laird talks about Soar's extensions, e.g., declarative memory, emotions and reinforcement learning.

Wednesday, August 26, 2009

ORA report on the Union of All Social Networks in CCK08 Moodle Forums

ORA ALL-MEASURES REPORT ON THE UNION OF ALL SOCIAL NETWORKS IN THE CCK08 MOODLE FORUMS

I can't really interpret all of these values because I haven't gotten around to reading the ORA manuals yet. Readers should also note that ORA is a "risk assessment tool for locating individuals or groups that are potential risks given social, knowledge and task network information" (CASOS, http://www.casos.cs.cmu.edu/projects/ora/).

Download report in html & text format: http://www.mediafire.com/file/ifxy2zmmizm/cck08_ora_report.zip

Input data: Meta Network

Start time: Tue Aug 25 21:03:04 2009

COMMUNICATION RISK



The risk based on the level of communication and the authority structure of the organization. Are agents able to effectively communicate to accurately complete tasks? Is communication too centralized or decentralized? Do agents have recourse to managers to settle disputes?

Network Level Measure | Value | Network
Average Distance | 3.18015 | Agent x Agent
Clustering Coefficient/Watts-Strogatz | 0.151525 | Agent x Agent
Component Count/Strong | 253 | Agent x Agent
Component Count/Weak | 128 | Agent x Agent
Connectedness | 0.568498 | Agent x Agent
Density | 0.00808805 | Agent x Agent
Diameter | 537 | Agent x Agent
Efficiency | 0.98408 | Agent x Agent
Efficiency/Global | 0.208385 | Agent x Agent
Efficiency/Local | 0.263409 | Agent x Agent
Fragmentation | 0.431502 | Agent x Agent
Hierarchy | 0.48668 | Agent x Agent
Link Count/Lateral | 1.02105 | Agent x Agent
Link Count/Skip | 0.901203 | Agent x Agent
Network Centralization/Betweenness | 0.0695023 | Agent x Agent
Network Centralization/Closeness | 0.00309231 | Agent x Agent
Network Centralization/In Degree | 0.53284 | Agent x Agent
Network Centralization/Out Degree | 0.523494 | Agent x Agent
Network Centralization/Total Degree | 0.529156 | Agent x Agent
Network Levels | 7 | Agent x Agent
Span Of Control | 12.7104 | Agent x Agent
Speed/Average | 0.31445 | Agent x Agent
Speed/Minimum | 0.142857 | Agent x Agent
Transitivity | 0.217969 | Agent x Agent
Upper Boundedness | 0.977557 | Agent x Agent

Node Level Measure | Avg | Stddev | Min | Min Nodes | Max | Max Nodes | Network
Centrality/Betweenness | 0.00161891 | 0.00634868 | 0 | 278 nodes (51%) | 0.070992 | v481 | AA:Agent x Agent
Centrality/Closeness | 0.0040285 | 0.00177681 | 0.0018622 | 202 nodes (37%) | 0.00557034 | v409 | AA:Agent x Agent
Centrality/Eigenvector | 0.0018622 | 0.00743711 | 0 | 132 nodes (24%) | 0.0811172 | v67 | AA:Agent x Agent
Centrality/In Degree | 0.0147934 | 0.0500608 | 0 | 161 nodes (29%) | 0.546642 | v67 | AA:Agent x Agent
Centrality/Information | 0.00186219 | 0.00147853 | 0 | 202 nodes (37%) | 0.00303777 | v80 | AA:Agent x Agent
Centrality/Inverse Closeness | 0.138362 | 0.117661 | 0 | 202 nodes (37%) | 0.374534 | v67 | AA:Agent x Agent
Centrality/Out Degree | 0.0147934 | 0.0483358 | 0 | 202 nodes (37%) | 0.537313 | v67 | AA:Agent x Agent
Centrality/Total Degree | 0.0147934 | 0.0485165 | 0 | 123 nodes (22%) | 0.541978 | v67 | AA:Agent x Agent
Clique Count | 8.04655 | 30.9861 | 0 | 292 nodes (54%) | 329 | v481 | AA:Agent x Agent
Clustering Coefficient/Watts-Strogatz | 0.151525 | 0.225344 | 0 | 289 nodes (53%) | 1 | 14 nodes (2%) | AA:Agent x Agent
Constraint/Burt | 0.212882 | 0.291654 | 0 | 202 nodes (37%) | 1 | 48 nodes (8%) | AA:Agent x Agent
Effective Network Size/Burt | 7.76709 | 26.0472 | 0 | 200 nodes (37%) | 293.096 | v67 | AA:Agent x Agent
Node Levels | 3.42831 | 2.8018 | 0 | 202 nodes (37%) | 7 | 30 nodes (5%) | AA:Agent x Agent
Simmelian Ties | 0.00293921 | 0.00969906 | 0 | 431 nodes (80%) | 0.0783582 | v67 | AA:Agent x Agent


CRITICAL EMPLOYEE RISK



The risk based on employees having exclusive knowledge, resources access, or task assignments. Would the removal of one or two employees from the organization greatly affect its ability to complete tasks? Do employees tend to have exclusive access to knowledge or resources?

Network Level Measure | Value | Network
Fragmentation | 0.431502 | Agent x Agent

Node Level Measure | Avg | Stddev | Min | Min Nodes | Max | Max Nodes | Network
Boundary Spanner | 0.106145 | 0.308023 | 0 | 480 nodes (89%) | 1 | 57 nodes (10%) | AA:Agent x Agent
Cognitive Demand | 0.00807299 | 0.0170032 | 0 | 202 nodes (37%) | 0.143389 | v67 | -
Constraint/Burt | 0.212882 | 0.291654 | 0 | 202 nodes (37%) | 1 | 48 nodes (8%) | AA:Agent x Agent
Effective Network Size/Burt | 7.76709 | 26.0472 | 0 | 200 nodes (37%) | 293.096 | v67 | AA:Agent x Agent
Interlockers | 0.0577281 | 0.233229 | 0 | 506 nodes (94%) | 1 | 31 nodes (5%) | AA:Agent x Agent
Radials | 0 | 0 | 0 | all nodes have equal value | 0 | - | AA:Agent x Agent
Shared Situation Awareness | 0.0018622 | 0.0124332 | 0 | 241 nodes (44%) | 0.164438 | v391 | -
Triad Count | 25.0708 | 92.8483 | 0 | 281 nodes (52%) | 844 | v391 | AA:Agent x Agent


PERFORMANCE RISK



The risk based on ability to complete tasks accurately. Is the organization able to complete all tasks? How well does the organization build consensus? How many tasks would be left undone if a single employee were selected for removal?

Network Level Measure | Value | Network
(none)

Node Level Measure | Avg | Stddev | Min | Min Nodes | Max | Max Nodes | Network
(none)


PERSONNEL INTERACTION RISK



The risk based on agent communication, either agents communicating who should not be, or vice-versa. Are agents with similar skills interacting? Are agents with complementary skills interacting? Are there groups of agents communicating in unexpected ways?

Network Level Measure | Value | Network
Fragmentation | 0.431502 | Agent x Agent

Node Level Measure | Avg | Stddev | Min | Min Nodes | Max | Max Nodes | Network
Centrality/Betweenness | 0.00161891 | 0.00634868 | 0 | 278 nodes (51%) | 0.070992 | v481 | AA:Agent x Agent
Centrality/Closeness | 0.0040285 | 0.00177681 | 0.0018622 | 202 nodes (37%) | 0.00557034 | v409 | AA:Agent x Agent
Centrality/Total Degree | 0.0147934 | 0.0485165 | 0 | 123 nodes (22%) | 0.541978 | v67 | AA:Agent x Agent
Component Members/Weak | 16.4693 | 32.5698 | 1 | 405 nodes (75%) | 128 | v98 | AA:Agent x Agent


REDUNDANCY RISK



The risk based on redundancy in task assignments, resource access, and knowledge access. An organization with little redundancy is more adversely affected by an agent or resource no longer being available. On the other hand, too much redundancy makes an organization inefficient.

Network Level Measure | Value | Network
(none)

Node Level Measure | Avg | Stddev | Min | Min Nodes | Max | Max Nodes | Network
Constraint/Burt | 0.212882 | 0.291654 | 0 | 202 nodes (37%) | 1 | 48 nodes (8%) | AA:Agent x Agent
Effective Network Size/Burt | 7.76709 | 26.0472 | 0 | 200 nodes (37%) | 293.096 | v67 | AA:Agent x Agent


RESOURCE ALLOCATION RISK



The risk based on resource allocation on the organization's ability to complete tasks. Is agent workload evenly distributed? Do agents have access to the resources they need to complete tasks? Do agents have access to resources they do not use?

Network Level Measure | Value | Network
(none)

Node Level Measure | Avg | Stddev | Min | Min Nodes | Max | Max Nodes | Network
(none)


SLOW MEASURES



These are measures that are computationally and/or memory intensive.

Network Level Measure | Value | Network
Average Distance | 3.18015 | Agent x Agent
Hierarchy | 0.48668 | Agent x Agent
Network Centralization/Betweenness | 0.0695023 | Agent x Agent
Upper Boundedness | 0.977557 | Agent x Agent

Node Level Measure | Avg | Stddev | Min | Min Nodes | Max | Max Nodes | Network
Centrality/Betweenness | 0.00161891 | 0.00634868 | 0 | 278 nodes (51%) | 0.070992 | v481 | AA:Agent x Agent
Centrality/Eigenvector | 0.0018622 | 0.00743711 | 0 | 132 nodes (24%) | 0.0811172 | v67 | AA:Agent x Agent
Clique Count | 8.04655 | 30.9861 | 0 | 292 nodes (54%) | 329 | v481 | AA:Agent x Agent
Cognitive Distinctiveness | 0.015441 | 0.015743 | 0.00808805 | 202 nodes (37%) | 0.143837 | v67 | -
Cognitive Expertise | 0.00777381 | 0.00060425 | 0.00425421 | v391 | 0.00811253 | v60 | -
Cognitive Resemblance | 0.984559 | 0.015743 | 0.856163 | v67 | 0.991912 | 202 nodes (37%) | -
Cognitive Similarity | 0.0117415 | 0.0145184 | 0 | 212 nodes (39%) | 0.0556525 | v476 | -
Correlation/Distinctiveness | 0.015441 | 0.015743 | 0.00808805 | 202 nodes (37%) | 0.143837 | v67 | AA:Agent x Agent
Correlation/Expertise | 0.00777381 | 0.00060425 | 0.00425421 | v391 | 0.00811253 | v60 | AA:Agent x Agent
Correlation/Resemblance | 0.984559 | 0.015743 | 0.856163 | v67 | 0.991912 | 202 nodes (37%) | AA:Agent x Agent
Correlation/Similarity | 0.0117415 | 0.0145184 | 0 | 212 nodes (39%) | 0.0556525 | v476 | AA:Agent x Agent
Shared Situation Awareness | 0.0018622 | 0.0124332 | 0 | 241 nodes (44%) | 0.164438 | v391 | -

TASK RISK



The risk based on task precedence and task assignment. Do agents have the resources to complete their tasks? Are tasks highly interdependent so that the inability to perform one task prevents many other tasks from being done?

Network Level Measure | Value | Network
(none)

Node Level Measure | Avg | Stddev | Min | Min Nodes | Max | Max Nodes | Network
(none)




Node-level Tables




For each node class a file is made with tables for the measures. If there are no node-level measures of a particular type, then no file is generated.

Agent-level measures saved to: /root/All Measures_Agent-level-measures.html

Produced by ORA developed at CASOS - Carnegie Mellon University
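
Two of the network-level values in this report can be checked against each other: Connectedness (0.568498) and Fragmentation (0.431502) add up to exactly 1, and Component Count/Weak (128) counts the groups of participants who are linked when reply direction is ignored. Below is a minimal sketch, assuming that ORA computes connectedness as the share of ordered node pairs that can reach each other in the weak (direction-ignored) network, and fragmentation as its complement. The toy forum and the function names are hypothetical; this is not ORA's code.

```python
from collections import defaultdict


def weak_component_sizes(nodes, arcs):
    """Sizes of the weak components, i.e. groups linked when direction is ignored."""
    neighbours = defaultdict(set)
    for u, v in arcs:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search from an unvisited node collects one component.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        sizes.append(size)
    return sizes


def connectedness_and_fragmentation(nodes, arcs):
    """Share of ordered node pairs that can / cannot reach each other."""
    n = len(nodes)
    # Inside a component of size s, s*(s-1) ordered pairs are mutually reachable.
    reachable_pairs = sum(s * (s - 1) for s in weak_component_sizes(nodes, arcs))
    connectedness = reachable_pairs / (n * (n - 1))
    return connectedness, 1 - connectedness


# Hypothetical toy forum: v1-v3 reply to each other, v4 and v5 form a pair,
# and v6 never interacts with anyone.
nodes = ["v1", "v2", "v3", "v4", "v5", "v6"]
arcs = [("v1", "v2"), ("v2", "v3"), ("v4", "v5")]
print(connectedness_and_fragmentation(nodes, arcs))
```

If the assumption about ORA's definitions holds, running the same functions on the full CCK08 reply network should reproduce the two values in the report.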

Tuesday, August 25, 2009

Simulated Students: Pinocchio goes to school.

What is simulation?

Simulation is a way of understanding the world by building a simpler model of the complex structure or system being studied (Gilbert & Troitzsch, 2005). It is well suited to exploring and developing theories about social processes, including education.

What are simulated students?

Simulated students are machine learning systems whose behavior is consistent with data from human students. Teachers can use them to practice their tutoring skills, they can act as collaborative learners that shift from novice to expert as the need arises, and instructional designers can use them to test their instruction (VanLehn, Ohlsson, & Nason, 1994).

There is some research on each of these three uses; examples are noted in the following section.

Research on simulated students

Simulated students for teacher training

Zibit and Gibson (2005) reported on an ongoing project called simSchool, which aims to train novice teachers to teach students in grades 7 to 12. Their simulated students appear to rely on a database of data gathered from real students rather than on a learning algorithm. The teacher interacts with the simulated students through a 2-D animated interface, choosing what to say to them from a menu.

Simulated students as a collaborative learner

Vizcaino (2005) created a simulated student that can chat with two other students studying computer programming. The real students did not know which participant was the simulated one. What the simulated student says is taken from a database rather than generated through natural language processing. The aim was for the simulated student to intervene in the discussion to address off-topic conversation, passive behaviour, and problems with the students' learning.


Simulated students for instructional design

Mertz (1997) used a simulated student built with Soar and trained it to assemble a circuit board. The training was repeated, and after each round the lesson was refined so that the simulated student learned better. In this way the lesson improved with each iteration.

These studies show that using simulated students to improve teaching and learning is not science fiction. We can expect the approach to develop further and possibly even to become mainstream.


References:

Gilbert, N., & Troitzsch, K. G. (2005). Simulation for the social scientist (2nd ed.). England: Open University Press.

Mertz, J. S. (1997). Using a simulated student for instructional design. International Journal of Artificial Intelligence in Education, 8, 116-141 [Electronic version]. Retrieved August 7, 2009, from http://hal.archives-ouvertes.fr/docs/00/19/73/84/PDF/mertz97.pdf.

VanLehn, K., Ohlsson, S., & Nason, R. (1994). Applications of simulated students: An exploration. Journal of Artificial Intelligence in Education, 5(2), 135-175. Retrieved August 8, 2009, from http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=2850F00F49F6DCE6F5E8F657D6EAE9C6?doi=10.1.1.4.6200&rep=rep1&type=pdf.

Vizcaino, A. (2005). A simulated student can improve collaborative learning. International Journal of Artificial Intelligence in Education, 15, 3-40 [Electronic version]. Retrieved August 7, 2009, from http://ihelp.usask.ca/iaied/ijaied/members05/archive/Vol_15/Vizcaino/Vizcaino05.pdf.

Zibit, M., & Gibson, D. (2005). simSchool: The game of teaching. Innovate, 1(6). Retrieved August 17, 2009, from http://www.innovateonline.info/index.php?view=article&id=173.

Monday, August 24, 2009

Simulation of July 22, 2009 Solar Eclipse with Stellarium

I should have done this on July 22 but did not have the time. Since we in the Philippines were only going to see a partial eclipse, I wanted to find out how Stellarium would simulate the solar eclipse.

I tried it with Stellarium 0.9.1 on Ubuntu Intrepid. (Stellarium 0.10.2 has an eclipse simulation! I should try that when I can find an Ubuntu package.) The location is set to 89 degrees 40 minutes 53 seconds east longitude and 25 degrees 45 minutes north latitude. I could not change the seconds setting, so this is only an approximation of Kurigram, Bangladesh, where this Wikipedia photo of the solar eclipse was taken.
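
Many tools expect coordinates in decimal degrees rather than degrees, minutes and seconds. A small sketch of the conversion for the location above: a minute is 1/60 of a degree and a second is 1/3600. The helper name is hypothetical and not part of Stellarium.

```python
def dms_to_decimal(degrees, minutes=0, seconds=0, hemisphere="N"):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South latitudes and west longitudes come out negative.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# Location used in the simulation (approximately Kurigram, Bangladesh).
longitude = dms_to_decimal(89, 40, 53, "E")
latitude = dms_to_decimal(25, 45, 0, "N")
print(f"{latitude:.4f} N, {longitude:.4f} E")  # 25.7500 N, 89.6814 E
```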



The time is set to between 7:00 A.M. and 11:00 A.M. Philippine time, since that is my locale.

Here are the videos, captured with gtk-recordMyDesktop. I wanted to record until the moon separated from the sun, but gtk-recordMyDesktop seems to have cut the recording off near the end.

without atmosphere


with atmosphere


Although I don't teach natural science, I think this would make a good illustration, since you can replay it as often as you like and view it from different locations.

Links

Solar eclipse scripts for Stellarium: http://www.stellarium.org/wiki/index.php/Scripts

Sunday, August 23, 2009

Hack Bandwidth Adaptivity for Moodle using the Multi-Language Content filter

Aptivate's "Web Design for Low Bandwidth" states that a web page should weigh about 25-75 KB; otherwise users are likely to abandon the download. Since it is hard to estimate a user's bandwidth in advance, my hack leaves it to the user to choose which version of the content to load. The instructional designer should also keep in mind the overhead of the logo, headings and blocks of a Moodle page, so this approach could be combined with a suitable low-bandwidth theme that users can select. It is up to the designer to decide the cutoffs between low, mid and high bandwidth content.

I developed this hack with a mixed group of students from around the globe in mind, especially those in developing countries, where bandwidth is still low.

This hack is for Moodle administrators who know where the moodledata folder is and what it is for. You need access to the moodledata/lang folder to perform it; if the site is online, that means cPanel, shell or FTP access to the Moodle site. Try it first on a local test server before applying it to a production server. No Moodle files are replaced; the hack only installs language packs, which should be safe to do.

Download the language files: http://www.mediafire.com/file/fg4tjtdun1n/mdlbandwidth.zip (3.4 KB)

These language packs contain only a moodle.php file and a language.php file, with the parent language set to en_utf8, the default. The only relevant string customized in these files is the language name, for example:

 $string['thislanguage'] = 'High Bandwidth'; 

Unzip the archive mdlbandwidth.zip into the lang folder of your Moodle data folder, usually moodledata/lang/. If you want to be able to edit these language files, change their owner to the web server user. On Ubuntu's Apache server this is www-data, so:

open a terminal
cd to the moodledata/lang folder, then run
sudo chown -R www-data:www-data *


After copying, go to the Moodle site. You should see the languages Low Bandwidth (loband), Mid Bandwidth (midband) and High Bandwidth (hiband) listed in the language drop-down box. If you're using language caching, it may take a while before the list is updated.

Log in as administrator. In the Site Administration block, go to Modules -> Filters -> Manage filters. If the Multi-Language Content filter is disabled (grayed out, with the eye closed), click the eye to enable it (the eye will open).



WARNING: If you are going to include multimedia files in the body of the content, make sure the Multi-Language Content filter is above the Multimedia filter; more generally, keep it above every other filter that you use in the body of the content. Otherwise all the multimedia files will be displayed at the same time.





Go to your course and create a web page: turn on editing, then choose Add a resource... -> Compose a web page. Click the [<>] button in the HTML editor to switch to code editing mode, then type the following:

<span class="multilang" lang="XX">your_content_here</span>
e.g.

<span class="multilang" lang="loband">Low Bandwidth Content</span>

<span class="multilang" lang="midband">Mid Bandwidth Content</span>

<span class="multilang" lang="hiband">High Bandwidth Content</span>
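
Roughly, the filter keeps the span whose lang attribute matches the currently selected language and drops the others. The Python sketch below imitates that effect on the three spans above. It is not Moodle's PHP code, and filter_multilang is a hypothetical name. It assumes, as the lang=loband_utf8 links and lang="loband" spans in this post suggest, that the _utf8 suffix is ignored when comparing, and it skips any fallback rules the real filter may apply when no span matches.

```python
import re

# One multilang span: captures its lang attribute and its content.
# Simplification: attributes in this order and no nested </span> in the content.
MULTILANG_SPAN = re.compile(
    r'<span\s+class="multilang"\s+lang="([^"]+)"\s*>(.*?)</span>',
    re.IGNORECASE | re.DOTALL,
)


def filter_multilang(html, current_language):
    """Hypothetical helper: keep only the spans written for current_language."""
    # Assumption: 'loband_utf8' selects spans marked lang="loband".
    wanted = current_language.replace('_utf8', '')

    def keep_or_drop(match):
        lang, content = match.group(1), match.group(2)
        return content if lang == wanted else ''

    return MULTILANG_SPAN.sub(keep_or_drop, html)


page = (
    '<span class="multilang" lang="loband">Low Bandwidth Content</span>\n'
    '<span class="multilang" lang="midband">Mid Bandwidth Content</span>\n'
    '<span class="multilang" lang="hiband">High Bandwidth Content</span>'
)
print(filter_multilang(page, 'midband_utf8').strip())  # Mid Bandwidth Content
```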

Because you can only change the language setting on the site's front page, it is good practice to put links to the three versions of the content in a list at the top of the page as a navigation aid. Add &lang=loband_utf8, &lang=midband_utf8 or &lang=hiband_utf8 to the web page URL. For example, if the URL is:

http://localhost/moodle/mod/resource/view.php?id=329

then the 3 links should be:

http://localhost/moodle/mod/resource/view.php?id=329&lang=loband_utf8
http://localhost/moodle/mod/resource/view.php?id=329&lang=midband_utf8
http://localhost/moodle/mod/resource/view.php?id=329&lang=hiband_utf8
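
If you have many pages, the three links can be generated rather than typed by hand. A small sketch using Python's standard urllib.parse module; with_lang is a hypothetical helper that replaces any lang parameter already in the URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_lang(url, lang):
    """Hypothetical helper: return url with its lang parameter set to lang."""
    parts = urlsplit(url)
    # Keep every existing query parameter except an old lang value.
    query = [(key, value) for key, value in parse_qsl(parts.query) if key != 'lang']
    query.append(('lang', lang))
    return urlunsplit(parts._replace(query=urlencode(query)))


base = 'http://localhost/moodle/mod/resource/view.php?id=329'
for code in ('loband_utf8', 'midband_utf8', 'hiband_utf8'):
    print(with_lang(base, code))
```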

Download file: Template content for bandwidth adaptivity in Moodle in utf-8 encoding (605 bytes)

I tested the setup with the following content:

Low bandwidth: transcript all text pdf 69.5 KB


Mid bandwidth: mp3 audio 5.9 MB


High bandwidth: avi video 95.5 MB
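
To see why three versions are worth the effort, it helps to estimate how long each test file takes to download. A rough sketch, assuming decimal units (1 KB = 1000 bytes) and two illustrative link speeds chosen for the example, 56 kbit/s and 1 Mbit/s; it ignores latency, protocol overhead and the rest of the Moodle page.

```python
def download_seconds(size_bytes, bits_per_second):
    """Idealised transfer time: bytes times 8 bits, divided by link speed."""
    return size_bytes * 8 / bits_per_second


# Sizes of the three test versions (decimal units).
versions = {
    "Low bandwidth (PDF transcript)": 69.5e3,
    "Mid bandwidth (MP3 audio)": 5.9e6,
    "High bandwidth (AVI video)": 95.5e6,
}

# Illustrative link speeds; these are assumptions, not measurements.
links = {"56 kbit/s dial-up": 56e3, "1 Mbit/s broadband": 1e6}

for label, size in versions.items():
    for link, speed in links.items():
        minutes = download_seconds(size, speed) / 60
        print(f"{label} over {link}: about {minutes:.1f} min")
```

On the 56 kbit/s link the PDF transcript arrives in about ten seconds, while the video would take well over three hours.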


The content for this test is available as a scorm package at: http://matangdilis.moodle4free.com/mod/scorm/view.php?id=10

This hack was tested on Moodle 1.9.4+ (Build: 20090204) installed on a local PC running Ubuntu Intrepid. Sorry I can't show a live demo; I don't have access to the site files of my free online Moodle installation.

Reference:

Aptivate. Web Design for Low Bandwidth, Introduction. Retrieved August 22, 2009, from http://www.aptivate.org/webguidelines/Introduction.html.

Multi-language content. In Moodle Docs. Retrieved August 22, 2009, from http://docs.moodle.org/en/Multi_language_content.

Links:

Download the language files: http://www.mediafire.com/file/fg4tjtdun1n/mdlbandwidth.zip (3.4 KB)

Download file: Template content for bandwidth adaptivity in Moodle in utf-8 encoding (605 bytes)

Moodle forum discussion about this hack: http://moodle.org/mod/forum/discuss.php?d=127619

Sunday, August 9, 2009

Links->Concepts Problem

George Siemens (2009) asked, “What are the implications of people being connected in a certain way?” That question leads to a corollary question: how do connections contribute to conceptual formation?

I am currently haunted by this question, and I am recording here the leads I have found that may help answer this profound problem.

My knowledge of how the brain works is quite superficial. I have always thought that links are important as indexes for the brain, so that the entire semantic network need not be modeled in long-term memory.

Indexing external semantic networks in memory


The indexes may be keywords, URLs or people you know, any of which can lead you to a semantic network. The nodes of that semantic network may already exist in long-term memory; it is the links that may be weak and even in danger of disappearing (being forgotten). The equivalent or similar mental network in memory may be connected by weak links, so a trigger is needed to fire these links. The indexes are that trigger: they may draw the structure of the semantic network into working memory and remind the brain which paths of links to fire.

One question is whether the concepts in the external document's semantic network are mapped into declarative (semantic, episodic) or procedural memory in the cognitive architecture. Another is how many times we need to keep redrawing the external semantic network (e.g., by asking a friend or rereading a blog) in order to strengthen the links and reduce the centrality, or importance, of the index.

Of course, this is just my guess; I have no idea whether it is true. But there may be a way to find out.

Simulating the CCK08 Moodle Forums

I am not a neuroscientist, and the closest I can get to studying a living brain is studying an artificial one. Fortunately, cognitive scientists have developed cognitive architectures, which are agents that try to model the brain. One of these is Soar. I find Soar fascinating because its underlying design of working memory is "organized as graph structures in states" (Laird, 2008). I was totally surprised to find nodes and links in a cognitive architecture, so I don't think it's far-fetched that someone has already found a way to relate this memory network to semantic networks and social networks. It's just a matter of time before I find those papers. Here is how I placed the tools used in my ongoing study of the CCK08 Moodle forums within Nigel Gilbert's diagram of the logic of simulation as a method (2005).


(based on Gilbert as cited in Gilbert & Troitzsch, 2005, p.17)

Ron Sun's Cascading Levels of Analysis

Although I have not fully read Ron Sun's Cognition and Multi-Agent Interaction (2006), I found that his idea of cascading levels of analysis provides some methodological meat for the skeleton of my general model of distance learning. I gather that the book advocates integrating cognitive architectures with agent-based modeling. Sun further states that "we may view different disciplines as different levels of abstraction in the process of exploring essentially the same broad set of questions" (Sun, 2006). His hierarchy of these levels of abstraction is as follows:

Sun's Hierarchy of Four Levels

Level | Object of Analysis | Type of Analysis | Model
1 | inter-agent/collective processes | social/cultural | collections of agent models
2 | agents | psychological | individual agent models
3 | intra-agent processes | componential | modular construction of agent models
4 | substrates | physiological | biological realization of models
(Source: Sun, 2006, p.7)

These levels look similar to what I call modes in my skeletal model. He further argues that we should engage in cross-level and mixed-level analysis. I won't go further into Sun's ideas until I have read the entire work, but I find them really exciting. His approach may shed light on the problem of how linkages affect conceptual formation.

My skeletal model of distance learning


References

Gilbert, N., & Troitzsch, K. G. (2005). Simulation for the social scientist (2nd ed.). England: Open University Press.

Laird, J. E. (2008, August 27). The Soar 9 tutorial, Part 1. Available in the software package at http://sitemaker.umich.edu/soar/soar_software_downloads

Siemens, G. (2009, July 30). Different social networks. Retrieved August 9, 2009, from http://www.elearnspace.org/blog/2009/07/30/different-social-networks/

Sun, R. (Ed.). (2006). Cognition and multi-agent interaction: From cognitive modeling to social simulation. Cambridge: Cambridge University Press.

Wednesday, August 5, 2009

CCK09: CCK08 Moodle Forums' Density of Networks according to Pajek

Forum | Density (loops allowed)
Introduction | 0.0028287
General | 0.0263728
1 | 0.0345478
2 | 0.0561677
3 | 0.1119792
4 | 0.1186224
5 | 0.0890023
6 | 0.0911111
7 | 0.1264000
8 | 0.0242215
9 | 0.1173469
10 | 0.1041667
11 | 0.0832653
12 | 0.0986920
Forums 1-12 | 0.0346763
Union of All | 0.0082464
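
The densities above allow loops, i.e. a participant replying to their own post counts as a possible link. A minimal sketch of the two usual definitions for a directed network, assuming arcs are stored as (sender, receiver) pairs and duplicate arcs are counted once (Pajek may count multiple lines differently): with loops allowed, every ordered pair including a vertex with itself is a possible arc, giving n² possibilities; without loops there are n(n-1). The toy forum is hypothetical.

```python
def density(vertices, arcs, loops_allowed=True):
    """Density of a directed network.

    loops_allowed=True: unique arcs divided by n*n.
    loops_allowed=False: loops are ignored and the divisor is n*(n-1).
    """
    n = len(vertices)
    unique_arcs = set(arcs)
    if loops_allowed:
        return len(unique_arcs) / (n * n)
    non_loops = [(u, v) for u, v in unique_arcs if u != v]
    return len(non_loops) / (n * (n - 1))


# Hypothetical three-person forum; v3 also replied to their own post (a loop).
vertices = {"v1", "v2", "v3"}
arcs = [("v1", "v2"), ("v2", "v1"), ("v1", "v3"), ("v3", "v3")]
print(density(vertices, arcs))         # 4 / 9
print(density(vertices, arcs, False))  # 3 / 6
```

Because n² grows with the square of the number of participants, a large network like the union of all forums can have a low density even when individual forums are fairly dense.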

Tuesday, August 4, 2009

CCK09: CCK08 Forum's Social Networks in Pajek

Download dataset (.net & .vec): http://www.mediafire.com/file/retuy1yn45i/cck08_pajek.zip (51.76 KB)
Open dataset in Pajek.

Figure 1: Introductions


Figure 2: General Discussion


Figure 3: Forum 1, What is connectivism?

Figure 4: Forum 2, Rethinking Knowledge

Figure 5: Forum 3, Networks

Figure 6: Forum 4, History of Learning Networks

Figure 7: Forum 5, Groups and Networks

Figure 8: Forum 6, Complexity and Chaos

Figure 9: Forum 7, Instructional Design

Figure 10: Forum 8, Power, Authority, Control

Figure 11: Forum 9, Changing Roles

Figure 12: Forum 10, Openness

Figure 13: Forum 11, Systemic Change

Figure 14: Forum 12, Next Steps & Research

Figure 15: Union of Forums 1-12

Figure 16: Union of All Forums

CCK09: CCK08 Forum's Social Networks in Netdraw

Download dataset: http://www.mediafire.com/file/myd1zzd1jqy/cck08_netdraw.zip (78.77 KB)
Open dataset with Netdraw.

Figure 1: Introductions


Figure 2: General Discussion

Figure 3: Forum 1, What is connectivism?

Figure 4: Forum 2, Rethinking Knowledge

Figure 5: Forum 3, Networks

Figure 6: Forum 4, History of Learning Networks

Figure 7: Forum 5, Groups and Networks

Figure 8: Forum 6, Complexity and Chaos


Figure 9: Forum 7, Instructional Design


Figure 10: Forum 8, Power, Authority, Control

Figure 11: Forum 9, Changing Roles

Figure 12: Forum 10, Openness


Figure 13: Forum 11, Systemic Change


Figure 14: Forum 12, Next Steps & Research

Figure 15: Union of Forums 1-12

Figure 16: Union of All Forums

Monday, August 3, 2009

Anonymizing CCK08 forum network files

I hit a wall last week when I tried to anonymize the forum-level network VNA (Netdraw) files. First I tried converting them to ODS (OpenOffice.org Calc) files and using VLOOKUP, but that only worked with numbers as lookup keys. Since I don't have Excel I couldn't use VB macros, and I don't know how to write OpenOffice.org macros. I am not a programmer, I don't know SQL, and bash scripting with sed and awk makes my hair stand on end. I've also forgotten my elementary Perl and Python.

My problem is that I have to anonymize the forum-level subnetworks using, as a lookup table, the vertex labels of the union of all networks, which I had already anonymized in Pajek. If I anonymized each forum network file in Pajek, the files would not be comparable, because Pajek renumbers the vertices from 1 to n. I need them to be comparable so that I can track egos across forums. For example:

original file
"Roel Cantada" "Juan dela Cruz"
...

lookup table
"Roel Cantada" "v1"
"Juan dela Cruz" "v2"
...

target anonymized file
"v1" "v2"


I couldn't find a tool for the purpose; rpl seemed to require me to enter the 537 codes one by one. So I ended up writing a Python script. In the script, anonall is the lookup table, origfile is the VNA file to convert, and newfile is the anonymized text file. The script needs to be in the same folder as the VNA files, and the name of each VNA file has to be typed in manually. In addition, the double quotes around the VNA header lines in the output need to be cleaned up. But it's better than anonymizing all the network files by hand. Here it is.

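import csv

origfile = input('csv filename to anonymize: ')
newfile = origfile + 'new.txt'

# anonall is the lookup table: original name in the first column,
# anonymized Pajek code (e.g. "v1") in the last column.
table = {}
with open('anonall_code.csv', newline='') as f:
    anonall = csv.reader(f, delimiter=' ', quotechar='"')
    for row in anonall:
        if row:
            table[row[0]] = row[-1]

with open(origfile, newline='') as src, open(newfile, 'w', newline='') as output:
    forum1 = csv.reader(src, delimiter=' ', quotechar='"')
    writer = csv.writer(output, delimiter=' ', quotechar='"',
                        quoting=csv.QUOTE_ALL, lineterminator='\n')
    for row in forum1:
        # Section headers (e.g. *tie data) and short rows are written unquoted,
        # but any name found in the lookup table is still replaced.
        if len(row) < 4 or row[0].startswith('*'):
            output.write(' '.join(table.get(x, x) for x in row) + '\n')
            continue
        # Swap the two name columns for their codes. An exact match avoids
        # replacing a name that merely contains another name.
        row[-4] = table.get(row[-4], row[-4])
        row[-3] = table.get(row[-3], row[-3])
        writer.writerow(row[-4:])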
Python file: http://www.mediafire.com/file/3ttzggdnmtk/anonymizecck08.py.zip (0.43KB)

I'll be sharing the VNA files for the individual forums within this week.
 
Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.