Tuesday, May 6, 2008

Folksonomies - Collaborative Tagging

This post explains user-defined collaborative tagging and how it is applied to organize and share information. It also offers some ideas on classifying information so that search and retrieval become faster, more efficient and more relevant.

Metadata is, by definition, "data about data": it adds intelligence to data and establishes better relationships between pieces of information. Metadata is usually created either by professional taxonomy experts or by content authors. The first method is very expensive, since it involves formal taxonomies and controlled vocabularies, and it becomes practically impossible when there is a large body of legacy data that has not yet been tagged. The second method burdens authors with additional tasks and requires them to understand metadata tags. Both methods help users find relevant information faster and more easily through search and taxonomy navigation, but they remain disconnected from end users because they do not capture the intent and context of the person looking for the information.

The third form of tagging is user-created collaborative tagging, i.e. folksonomies. It is also called community classification of information, another aspect of social networking. Google has used a form of collaborative classification to rank pages with its PageRank algorithm: the links pointing to a web page, weighted by the importance of the pages they come from, allow Google to improve its search relevance and rank the page appropriately. This is an implicit form of user-created collaborative classification of information. Amazon has used customer reviews effectively to add intelligence to its catalog, another implicit form of classification.
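The link-based ranking idea can be illustrated with a minimal PageRank sketch using power iteration. This is a textbook simplification, not Google's production system; the four-page link graph is hypothetical, and the damping factor of 0.85 is the value commonly cited from the original PageRank paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    links maps each page to the list of pages it links to (hypothetical graph).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share of the total rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank evenly to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page with no outgoing links: spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web: pages a, b and d all link to c; c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]})
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

Page c ends up ranked highest because three pages point to it, and a ranks second because its single in-link comes from the highly ranked c. This is the "votes from other pages" effect described above.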

Social bookmarking employs an explicit form of user-created collaborative tagging. Sites like Del.icio.us, Digg and StumbleUpon let you bookmark URLs or sites and attach your own tags to them. These tags are created by end users according to their intent and how useful they find the site, and users may describe and organize content with any vocabulary they choose, completely independently of the owner and author of the information. In addition to the automatically generated chronological ordering of bookmarks saved to the system, tags are used to navigate the bookmarks within a user’s collection. They are also used to collocate bookmarks across the entire system; for example, the page http://del.icio.us/tag/web2.0 shows all bookmarks tagged “web2.0” by any user. The functionality of these social bookmarking sites varies, but the basic idea is the same: the ability to tag URLs and share them across the system.
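The two uses of tags described above (navigating one user's collection and collocating bookmarks across the whole system) can be sketched as two in-memory indexes. The `TagIndex` class, its method names and the example URLs are hypothetical; they do not reflect how Del.icio.us or any other site actually stores bookmarks.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory tag index for a social bookmarking service."""

    def __init__(self):
        # (user, tag) -> URLs: navigation within a single user's collection.
        self._by_user_tag = defaultdict(set)
        # tag -> URLs: collocation of bookmarks across the entire system.
        self._by_tag = defaultdict(set)

    def bookmark(self, user, url, tags):
        """Record that `user` saved `url` with the given free-form tags."""
        for tag in tags:
            self._by_user_tag[(user, tag)].add(url)
            self._by_tag[tag].add(url)

    def user_tag(self, user, tag):
        """Bookmarks one user filed under `tag` (like a personal tag page)."""
        return sorted(self._by_user_tag.get((user, tag), ()))

    def tag(self, tag):
        """Bookmarks any user filed under `tag` (like a system-wide tag page)."""
        return sorted(self._by_tag.get(tag, ()))


index = TagIndex()
index.bookmark("alice", "http://example.com/a", ["web2.0", "ajax"])
index.bookmark("bob", "http://example.com/b", ["web2.0"])
print(index.user_tag("alice", "web2.0"))  # only alice's bookmark
print(index.tag("web2.0"))                # bookmarks from both users
```

Keeping a per-user index and a global index side by side makes both views a single lookup, at the cost of storing each (tag, URL) pair twice.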

Although a folksonomy is not a controlled vocabulary and does have limitations, it has many advantages that bring real value to sharing, collaboration and social networking.

Finding information faster - To find relevant content, one normally has to browse websites or search through web search engines. By exploring the bookmark tags on social bookmarking sites instead, one can find many recent resources from a wide variety of authors and sites that would likely never have been visited otherwise. Browsing tags to discover interesting content is fundamentally different from searching for documents that match a query. Because other users have already found these items or sites relevant and useful, the results tend to be more relevant.

User-centric vocabulary - The most important strength of a folksonomy is that it directly reflects the vocabulary of end users. End users tag information based on their own intent and context, not on the intent of the author. The vocabulary is derived not from taxonomy experts, intellectual property producers or information owners, but from the consumers of the information, so it directly reflects their choices of terminology and language.

The limitations of folksonomies stem mostly from their user-centric nature and are confined to the tagging system itself; they by no means prevent the system from being useful.

Uncontrolled vocabulary - Tags can become ambiguous as users apply the same tag in different ways. There are no explicit, systematic guidelines and no scope notes.

Structure of tags - A tag is usually a single word with no spaces, which sometimes makes it difficult to merge related tags. For example, web2.0, web2 and web20 mean the same thing but show up as separate taxonomy nodes.

Semantic tags - Because tags are user-generated, the system has no notion of synonyms. For example, web and www should be classified under the same tag, but they are kept separate.
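The structure and synonym problems above are often mitigated by normalizing tags before they are indexed. The following is a minimal sketch, assuming a hand-maintained synonym table; the `SYNONYMS` entries, the separator rule and the function names are hypothetical and not taken from any particular bookmarking site.

```python
import re
from collections import Counter

# Hypothetical synonym table mapping variant spellings to one canonical tag.
# In practice such a table would be curated or derived from usage data.
SYNONYMS = {
    "www": "web",
    "web2": "web2.0",
    "web20": "web2.0",
}


def normalize_tag(raw):
    """Map a user-entered tag to a canonical form."""
    tag = raw.strip().lower()
    # Remove spaces, underscores and hyphens that users insert inconsistently;
    # dots are kept so that "web2.0" survives intact.
    tag = re.sub(r"[\s_\-]+", "", tag)
    return SYNONYMS.get(tag, tag)


def merge_tags(raw_tags):
    """Count tags after normalization, merging variants into one node."""
    return Counter(normalize_tag(t) for t in raw_tags)


print(merge_tags(["web2.0", "Web2", "web20", "web 2.0", "www", "web"]))
```

A fixed table like this only catches variants someone has anticipated, which is why it supplements rather than replaces the users' own vocabulary.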

Social bookmarking sites have extended folksonomy with social features such as commenting, rating and community building. On Digg, one can start a feedback loop on tagged content and also see everyone who has tagged it. In addition, communities can be built from the data collected on social bookmarking sites.
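Both features mentioned above, seeing who tagged an item and building communities from collected bookmarks, can be sketched over a plain list of `(user, url, tag)` records. Grouping users by Jaccard similarity of their bookmarked URLs is an assumption chosen for illustration, not how Digg or any other site forms communities; the data and the 0.5 threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def taggers_by_url(bookmarks):
    """Map each URL to the set of users who tagged it."""
    result = defaultdict(set)
    for user, url, _tag in bookmarks:
        result[url].add(user)
    return result


def similar_users(bookmarks, threshold=0.5):
    """Pair users whose bookmarked URLs overlap (Jaccard similarity >= threshold)."""
    urls_by_user = defaultdict(set)
    for user, url, _tag in bookmarks:
        urls_by_user[user].add(url)
    pairs = []
    for a, b in combinations(sorted(urls_by_user), 2):
        shared = urls_by_user[a] & urls_by_user[b]
        union = urls_by_user[a] | urls_by_user[b]  # never empty: each user has a URL
        score = len(shared) / len(union)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Hypothetical bookmark records: (user, url, tag).
bookmarks = [
    ("alice", "u1", "web2.0"), ("alice", "u2", "ajax"),
    ("bob", "u1", "web2"), ("bob", "u2", "javascript"),
    ("carol", "u3", "cooking"),
]
print(sorted(taggers_by_url(bookmarks)["u1"]))
print(similar_users(bookmarks))
```

Note that alice and bob are grouped together even though they chose different tags for the same pages; comparing bookmarked URLs sidesteps the vocabulary mismatch described in the limitations.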

Folksonomy can be used in Enterprise 2.0 to supplement existing taxonomies and to provide additional access to materials by encouraging and leveraging explicit user-generated tags. As enterprises begin to incorporate user-centric information management systems, the folksonomies developed by their users can add great value to information sharing and retrieval.
