Open Educational Tools: Limitations of the data sets for CCK09 forums in this blog

Friday, December 31, 2010

I'm already halfway through processing the data for the LMS forum network of the Connectivism and Connective Knowledge 2009 (CCK09) course, so I thought it wise to remind readers of the limitations of these data sets.

The CCK09 forum data sets presented in this blog show only a small part of the learning environment in CCK09. Other participants opted to centralize their personal learning networks on other platforms such as blogs, Twitter, Facebook and Second Life. I am not aware of any visualization or research on the learning networks in those sites.

In addition, there are limitations in the machine process of gathering the ties (links, lines, or network data) in this visualization. I used the Moodle Forum SNA Tool rather than the newer version called SNAPP because the latter is slower due to graphing code transferred over the net. This application merely picks up tie data from "reply to" posts and not necessarily from the actors being addressed. What I mean is that it fails to take into account replies to more than one person that are indicated only in the body of the post. Clicking the "reply to" button in the Moodle forum tags the post, but there is no facility to explicitly state to whom you are replying. If the Moodle forum allowed a dropdown list of posters (with more than one selectable, of course), then a machine process might be able to pick up whom the poster intends to reply to.
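To make the limitation concrete, here is a minimal sketch in Python of how ties can be derived from "reply to" links alone. This is not the actual code of the Moodle Forum SNA Tool or SNAPP; the `posts` layout, the sample names, and the `reply_ties` function are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical forum posts: (post_id, author, parent_post_id, body).
# A parent_post_id of None marks the first post of a thread.
posts = [
    (1, "alice", None, "What is connectivism?"),
    (2, "bob", 1, "Alice, I think it is about networks."),
    (3, "carol", 2, "Bob and Alice, I agree with you both."),
]

def reply_ties(posts):
    """Count directed ties (replier -> author of the post replied to)."""
    author_of = {post_id: author for post_id, author, _, _ in posts}
    ties = Counter()
    for _, author, parent_id, _ in posts:
        # Only the structural "reply to" link is used; the body is ignored.
        if parent_id is not None and parent_id in author_of:
            ties[(author, author_of[parent_id])] += 1
    return ties

ties = reply_ties(posts)
print(dict(ties))
```

In this sketch, Carol's post addresses both Bob and Alice in its body, but only the carol-to-bob tie is recorded because her post was made with the "reply to" button on Bob's post. The carol-to-alice tie is lost.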

Processing the body of the post would increase the complexity of data gathering. Even with the text analysis software that I've tested, human intervention in the form of scrubbing (editing the text for machine processing) is needed. There is a lot of variability in raw data. Posters usually use only a part of the full name, or even a nickname that is different from the username of the person they are talking to. I've also seen posters get the name of the person they're talking to wrong, but the person being addressed simply ignored the mistake and replied appropriately. There is also the problem of language. People may use a different language, or they may not be as competent in English as native speakers.
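The name-matching problem can be illustrated with a small Python sketch that uses the standard library's `difflib` for approximate matching. The `roster`, the sample names, the `mentioned_users` function and the `cutoff` value are hypothetical; this is only one naive approach, not something I have applied to the CCK09 data.

```python
import difflib
import re

# Hypothetical roster mapping usernames to display names.
roster = {
    "alice_w": "Alice Wong",
    "bob_s": "Robert Santos",
    "carol_d": "Carol Diaz",
}

def mentioned_users(body, roster, cutoff=0.75):
    """Return usernames whose first or last name approximately appears in body."""
    # Index every part of every display name by its lowercase form.
    name_parts = {}
    for username, full_name in roster.items():
        for part in full_name.lower().split():
            name_parts.setdefault(part, set()).add(username)

    found = set()
    for word in re.findall(r"[a-z]+", body.lower()):
        # Keep the single best candidate whose similarity ratio meets the cutoff.
        matches = difflib.get_close_matches(word, list(name_parts), n=1, cutoff=cutoff)
        for match in matches:
            found |= name_parts[match]
    return found

print(mentioned_users("Alise and Bob, I agree with you both.", roster))
```

With this sample data, the misspelled "Alise" is still matched to `alice_w`, but the nickname "Bob" is not close enough to "Robert" to be matched to `bob_s`, so that tie would still need human scrubbing or a hand-built nickname table.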

Of course, I could always fall back on human coding, but I simply don't have the time, and I can't afford to hire people. So when you use these data sets, always take into consideration the probability of lost ties.