My problem is that I have to anonymize the forum-level subset networks using a lookup table built from the vertex labels of the union of all networks, which was anonymized in Pajek. If I anonymized each forum network file separately in Pajek, the files would not be comparable, because Pajek renumbers the vertices in each file from 1 to n. I need them to be comparable so that I can track egos in each forum. For example:
original file
"Roel Cantada" "Juan dela Cruz"
...
lookup table
"Roel Cantada" "v1"
"Juan dela Cruz" "v2"
...
target anonymized file
"v1" "v2"
I couldn't find a tool for the purpose. rpl appears to require me to enter the 537 codes one by one, so I ended up writing a Python script. In the script, anonall is the lookup table, origfile is the VNA file to convert, and newfile is the anonymized text file. The Python file needs to be in the same folder as the VNA files, and the names of the VNA files have to be typed in manually. In addition, the double quotes that end up around the VNA headers have to be cleaned out of the output. But it's better than manually anonymizing all the network files. Here it is.
Python file: http://www.mediafire.com/file/3ttzggdnmtk/anonymizecck08.py.zip (0.43KB)
# Python 2 script (it uses raw_input and dict.iteritems)
import csv

origfile = raw_input('csv filename to anonymize: ')
newfile = origfile + 'new.txt'

# Lookup table: original label -> anonymized code
table = {}
anonall = csv.reader(open('anonall_code.csv'), delimiter=' ', quotechar='"')
forum1 = csv.reader(open(origfile), delimiter=' ', quotechar='"')
output = open(newfile, 'a')

for row in anonall:
    table[row[0]] = row[-1]

for row in forum1:
    # Replace the two label fields (fourth and third from the end of the row)
    for i, j in table.iteritems():
        if i in row[-4]:
            row[-4] = j
    for i, j in table.iteritems():
        if i in row[-3]:
            row[-3] = j
    # Write the last four fields back, each wrapped in double quotes
    output.write('"' + row[-4] + '" "' + row[-3] + '" "' + row[-2] + '" "' + row[-1] + '"\n')

output.close()
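For comparison, here is a minimal sketch of the same lookup idea in Python 3. It is not the script from the download above. It makes a few assumptions of its own: the labels sit in the first two columns (the script above uses the fourth and third fields from the end), a label is replaced only when it matches a lookup key exactly rather than as a substring, and labels missing from the table are left unchanged. The function names load_lookup, anonymize and write_rows are made up for this illustration, and the data is the small example from the top of this post, held in memory instead of in files.

```python
import csv
import io


def load_lookup(lines):
    """Build {original label: anonymized code} from space-delimited, quoted rows."""
    reader = csv.reader(lines, delimiter=' ', quotechar='"')
    return {row[0]: row[-1] for row in reader if row}


def anonymize(lines, table, columns=(0, 1)):
    """Return the rows with the given columns replaced by exact lookup in table.

    Assumption: labels that are not in the table are left as they are.
    """
    reader = csv.reader(lines, delimiter=' ', quotechar='"')
    result = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        for c in columns:
            if c < len(row):
                row[c] = table.get(row[c], row[c])
        result.append(row)
    return result


def write_rows(rows, handle):
    """Write rows space-delimited with every field wrapped in double quotes."""
    writer = csv.writer(handle, delimiter=' ', quotechar='"',
                        quoting=csv.QUOTE_ALL, lineterminator='\n')
    writer.writerows(rows)


# The example from the top of the post, as in-memory text instead of files
lookup_text = '"Roel Cantada" "v1"\n"Juan dela Cruz" "v2"\n'
edge_text = '"Roel Cantada" "Juan dela Cruz"\n'

table = load_lookup(io.StringIO(lookup_text))
rows = anonymize(io.StringIO(edge_text), table)

out = io.StringIO()
write_rows(rows, out)
print(out.getvalue(), end='')
```

With real files, the io.StringIO objects would be replaced by open file handles. Exact matching avoids the case where a short name that is part of a longer one triggers the wrong replacement, but it also means every label must appear in the lookup table exactly as it is written in the network file.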
I'll be sharing the VNA files for the individual forums within this week.