
Hierarchical taxonomies from flat tag spaces

Paul Heymann and Hector Garcia-Molina (Department of Computer Science, Stanford University) have recently published the paper Collaborative Creation of Communal Hierarchical Taxonomies in Social Tagging Systems.

Paul is a PhD student doing research on how to recreate hierarchical taxonomies from flat tag data, starting from the idea that this kind of structure is already implicitly contained (to varying degrees) in tagging systems:

…hierarchies are indisputably useful for a major type of information retrieval task: browsing. When we do not know exactly what we are looking for, it is much easier to be able to broaden and narrow our area of interest than to perform some sort of random walk from idea to idea. The top few categories of a traditional hierarchy give us a much better idea of the contents of a media collection than thousands of individual tags, even if these tags are ranked by their frequency in the collection.

…We are working to make these [tagging] systems better by automating production of hierarchical taxonomies that describe the data from the raw flat tags generated by users.

This is a small portion of a sample hierarchy extracted from a del.icio.us data set through the proposed algorithm:

[Figure: Hierarchical Tags; the complete picture is linked from the original post]

While other studies exist about the inherent hierarchical relationships between tags in folksonomy-based systems (see for example Hierarchical Subject Relationships in Folksonomies) and about how to elicit these structures through hierarchical clustering algorithms (see Improved Annotation of the Blogosphere via Autotagging and Hierarchical Clustering), their results don’t seem that exciting.

This new paper investigates a different approach to eliciting hierarchical relationships, with the goal of making folksonomies easier to navigate. It makes use of cosine similarity graphs and a fast approximation of each tag's generality in the graph. The proposed algorithm works better when the tag set has higher values for:

  • Density (the frequency with which users annotate objects)
  • Overlap (the frequency with which users are annotating the same objects as one another)
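To make these two measures a bit more concrete, here is a minimal Python sketch that computes them from (user, tag, object) triples. The exact formulas are my own assumptions for illustration (average number of objects per user for density, share of objects annotated by at least two users for overlap); the paper may define them differently, and the function name `density_and_overlap` is hypothetical.

```python
from collections import defaultdict


def density_and_overlap(annotations):
    """Compute two illustrative statistics from (user, tag, obj) triples.

    Assumed definitions (not taken from the paper):
      density = average number of distinct objects annotated per user
      overlap = fraction of annotated objects that were annotated
                by at least two different users
    """
    objects_by_user = defaultdict(set)
    users_by_object = defaultdict(set)
    for user, _tag, obj in annotations:
        objects_by_user[user].add(obj)
        users_by_object[obj].add(user)

    if not objects_by_user:
        return 0.0, 0.0

    total_objects = sum(len(objs) for objs in objects_by_user.values())
    density = total_objects / len(objects_by_user)

    shared = sum(1 for users in users_by_object.values() if len(users) >= 2)
    overlap = shared / len(users_by_object)
    return density, overlap
```

With these definitions, a data set where many users bookmark the same popular pages scores high on overlap, while one where every user annotates their own private collection scores close to zero.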

Low density and low overlap mean that a larger sample dataset is needed for the algorithm to work effectively. It’s curious to note that this is the case for CiteULike, while Del.icio.us shows higher density and overlap, probably due to its different user profile.

The algorithm is based on three assumptions:

  • Hierarchy representation (the hierarchy is captured by the similarity graph)
  • Noise assumption (there are always noisy connections between unrelated tags)
  • General-general assumption (noisy connections are more common higher up in the hierarchy)
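To give a feeling for how these assumptions could translate into code, here is a rough Python sketch. It builds a cosine similarity graph between tags, drops weak (presumably noisy) edges with a threshold, and then greedily attaches each tag to the most similar tag already placed. Everything here is an assumption made for illustration: the function names, the threshold, and especially the use of node degree as a stand-in for generality, which is not the approximation used in the paper.

```python
import math
from collections import defaultdict


def tag_vectors(annotations):
    """Map each tag to a sparse vector {object: times the tag was applied}."""
    vectors = defaultdict(lambda: defaultdict(int))
    for _user, tag, obj in annotations:
        vectors[tag][obj] += 1
    return {tag: dict(vec) for tag, vec in vectors.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_graph(vectors, threshold):
    """Connect two tags when their cosine similarity reaches the threshold.

    The threshold is a crude way to discard noisy connections.
    """
    tags = sorted(vectors)
    graph = {tag: {} for tag in tags}
    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            sim = cosine(vectors[a], vectors[b])
            if sim >= threshold:
                graph[a][b] = sim
                graph[b][a] = sim
    return graph


def build_hierarchy(vectors, graph, threshold, root="ROOT"):
    """Greedy tree construction; returns a {tag: parent} mapping.

    Tags are visited from most to least connected (degree is only a
    hypothetical proxy for generality). Each tag becomes a child of the
    most similar tag already in the tree, or of the root when no placed
    tag is similar enough.
    """
    order = sorted(graph, key=lambda tag: (-len(graph[tag]), tag))
    parent = {}
    for tag in order:
        best, best_sim = root, 0.0
        for placed in parent:
            sim = cosine(vectors[tag], vectors[placed])
            if sim >= threshold and sim > best_sim:
                best, best_sim = placed, sim
        parent[tag] = best
    return parent
```

On a toy data set where "programming" co-occurs with both "python" and "java", while "cooking" is used on unrelated objects, this sketch places "python" and "java" under "programming" and leaves "cooking" directly under the root. Visiting the most general tags first matters because of the general-general assumption: if noisy links are more common near the top, those tags need to be placed before the more specific ones can find a sensible parent.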

Finally, researchers seem to be starting to understand the right direction for the evolution of tagging ;) but I’m not so sure that algorithms can effectively replace the users’ work.

