Wednesday, February 13, 2008

Status report for 02/07/2008 to 02/13/2008

I have been analyzing tags to find underlying relationships among tags, users, and resources in folksonomies. I see two possible approaches: i) frequency analysis, based on resource vectors over the term (tag) space, and ii) graph-based analysis, based on tag graphs.
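To make the two representations concrete, here is a minimal sketch in Python. The (user, tag, resource) triples are made up for illustration and are not taken from my data. Weighting a tag-graph edge by the number of resources on which both tags occur is only one possible choice, assumed here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical (user, tag, resource) annotations; not real data.
annotations = [
    ("u1", "python", "r1"),
    ("u1", "code", "r1"),
    ("u2", "python", "r1"),
    ("u2", "python", "r2"),
    ("u2", "code", "r2"),
    ("u3", "biology", "r3"),
    ("u3", "genome", "r3"),
]

tags = sorted({t for _, t, _ in annotations})
resources = sorted({r for _, _, r in annotations})

# (i) Resource vectors: one vector per resource, one dimension per tag,
# each entry counting how often that tag was applied to the resource.
counts = Counter((r, t) for _, t, r in annotations)
resource_vectors = {r: [counts[(r, t)] for t in tags] for r in resources}

# (ii) Tag graph: nodes are tags; the weight of an edge is the number of
# resources on which both tags occur (an assumed weighting scheme).
tags_per_resource = {
    r: {t for _, t, res in annotations if res == r} for r in resources
}
tag_graph = Counter()
for tag_set in tags_per_resource.values():
    for a, b in combinations(sorted(tag_set), 2):
        tag_graph[(a, b)] += 1

print(tags)
print(resource_vectors)
print(dict(tag_graph))
```

The resource vectors feed the frequency analysis, while the weighted edge list is the input a graph-based method would start from.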

For the frequency analysis, we can use Principal Component Analysis (PCA) or Independent Component Analysis (ICA). Similar approaches have been used in information retrieval (IR). An example can be found here, where PCA and ICA were used for tag-advertising matching.
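As a rough illustration of the PCA step, the sketch below projects tags onto the first two principal axes of a small resource-by-tag count matrix using NumPy's SVD. The matrix and tag names are hypothetical, and this is not necessarily the exact procedure behind my figures. ICA could be applied to the same matrix with a library implementation such as scikit-learn's `FastICA`, which is not shown here.

```python
import numpy as np

# Hypothetical resource-by-tag count matrix (rows: resources, columns: tags).
tags = ["python", "code", "genome", "biology"]
X = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [4, 3, 0, 1],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
    [0, 0, 4, 3],
], dtype=float)

# Center each tag column, then take the SVD; the rows of Vt are the principal axes.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

explained = S**2 / np.sum(S**2)  # fraction of variance captured by each component
loadings = Vt[:2].T              # tag coordinates on the first two axes, shape (4, 2)
scores = Xc @ Vt[:2].T           # resource coordinates on the same axes, shape (6, 2)

for tag, (pc1, pc2) in zip(tags, loadings):
    print(f"{tag:8s} PC1={pc1:+.2f} PC2={pc2:+.2f}")
```

Tags that tend to appear on the same resources end up with loadings of the same sign on the first component, which is the kind of grouping a scatter plot of the loadings makes visible.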

With PCA and ICA, I produced the following figures:

[Figure: connotea]

(a) PCA with Tag-graph

[Figure: connoteaICA]

(b) ICA with Tag-graph (number of components = 3)

Interestingly, some tag relationships (or tag clusters) are visible in both figures, but a reasonable interpretation is not easy to find. I think this is a good starting point for clustering terms to extract meaningful information. I will keep working on these graphs.

As the next step, I am going to do the analysis with graph-based methods.