Wednesday, November 14, 2007

Status report for 11/08/2007 to 11/14/2007

I've been reading a paper about ranking in folksonomy, called FolkRank, proposed in Trend detection in folksonomies, Information Retrieval in Folksonomies: Search and Ranking.

FolkRank is based on well-known PageRank algorithm used in Google (The Anatomy of a Large-Scale Hypertextual Web Search Engine and The PageRank Citation Ranking: Bringing Order to the Web).

In a nut shell, the basic idea of PageRank is that a page is important if there are many pages linking to it. This can be formulated by the following matrix equation:

R(u) = c \sum R(v)/N or R(u) = c \sum R(v)/N + cE(u)

where

u is a web page and v is a page referring u,

R(u) and R(v) are PageRank of u and v repectively,

N is a number of all links in page u, and

E(u) is called a source of rank.

 

Folksonomy can also be represented as a graph by using the similar concept of PageRank but a few exceptions:

  • Folksonomy graph is undirected, while web graph in PageRank is directed.
  • Nodes are heterogeneous and the graph is triadic, while in PageRank the nodes are homogeneous.

I think it's a good start to begin with applying PageRank algorithm to folksonomy recommendation system. For this purpose, I'm planing to dig into more details on the PageRank algorithm and find some sample code for better understand.

Thursday, November 08, 2007

Status report for 11/01/2007 to 11/07/2007

1. I made a reading list about folksonomy at my CGL blog or del.icio.us. I will update the list frequently by adding related articles.

2. Characteristics of Folksonomy Network

The folksonomy networks show the following characteristics:

a. Small-world Network

Almost similar with a random network but having much larger clustering co-efficient factor. I.e., Small shortest path but a larger clustering coefficient.

b. Scale-free Network

A scale-free network is a network in which any two nodes can be connected no matter what the system size is. This is because there is a node called "hub"which is a highly connected node than any others. In folksonomies, popular keywords can act like this hub and thus the network shows scale-free network properties.

A scale-free network shows the following rules:

(1) Power law : Degree distribution follows the Yule-Simon distribution, which is called a power law: P(k) ~ k^(-r), where k is a degree of connectivity of a node and P(k) is it's probability

(2) Preferential Attachment : A way to build a scale-free network. The idea is to make a connection with a more connected node.

3. Next step

a. Build a simple graph by using CITEAM tag data

b. Find some algorithms about recommendation

Sunday, November 04, 2007

Reading list about folksonomy analysis

(Also available at http://del.icio.us/yyalli/folksonomy)

Formal Model, Graph

Semantic Web

Pattern, Structure

Recommendation, Ranking, Trend

Tag Generation

PageRank

Clustering (referred from Filippo Menczer's class material)

Learning and Classification (referred from Filippo Menczer's class material)