When James Shaw blogged about this 2005 O'Reilly Emerging Technology Conference presentation recently, I remembered enjoying it when IT Conversations first released it. So with James' fresh recommendation I thought I'd listen to it again; I listened to it a couple of times, actually. I'm down for all things ontological, baby. Did I ever tell you that I have a Master's in Library Science? (Okay, only about eight times.) But hey, whether you've got a library degree from an ALA-accredited institution or you're just interested in how information is organized, I think you'll really enjoy this podcast.
The presentation is titled "Ontology is Overrated: Links, Tags and Post-hoc Metadata," by Clay Shirky. Ontology has several definitions; the one Shirky uses concerns entities and their relations in the world: the essence of a thing, its "is-ness." Shirky argues that the very attempt to create a perfect ontological system is ill-conceived, and that categorization should instead emerge organically, from the bottom up. More on how that works later.
Shirky describes the Periodic Table of Elements as a categorization scheme that comes about as close to perfect as you can get, because it describes the essence of the things it classifies. With it you get both "descriptive and predictive value."
This is something James mentioned in his post. The essence of a book is "book." The Library of Congress classification scheme looks like a high-order, hierarchical view of the world, but the system really exists to optimize the linear seek time of books on shelves. The real goal of categorization schemes for things like libraries is to optimize physical storage, not to organize the intellectual content. "Ideas can be all over the place; it is the book that has to be in one place." These schemes may look like the organization of ideas, but they are really the organization of the containers of those ideas. There is no shelf, no physical storage container, for ideas.
Yahoo began with categorization, a metaphor for the shelf; then Google came along and said there is no shelf. Google actually came up with a categorization service similar to Yahoo's, but removed it because no one was using it.
Those who create the categorization schemes have the power, and can override the users' needs. The search paradigm is the opposite: the user's query, not the cataloger's scheme, determines what gets found.
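You can't collapse many terms into one without signal loss. Merge "movies" and "cinema" into a single category and you lose the distinction between the people who use each word. "Movie people don't want to hang out with Cinema people."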
"The Cataloger looks at the Delicious tag 'to read' in horror. This is context dependent and temporary. But so was East Germany!"
Merging tags derives something from the content itself, creating overlap, not synchronization: people who tagged something "one thing" often tagged it "another thing" as well. Users and time are core attributes. With user-generated categorization you can start to do things like inclusion, exclusion, grouping and decay, and groups emerge from the users and the world they are tagging. Signal loss in this approach comes from expression, not compression; it is due to the multiplicity of points of view, to people coming to the terms from different contexts. The filtering is done post hoc. One might argue that a bottom-up categorization scheme like this needs an editor, but no: the filtering is being done by the public, through the users' own requests.
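To make "inclusion, exclusion, grouping and decay" a little more concrete, here is a minimal sketch in Python of what a system built on user tags might compute. Everything in it is a hypothetical illustration, not Shirky's method or how del.icio.us actually worked: the `Tagging` record, the function names, the sample data, and especially the exponential half-life used for decay are assumptions chosen for simplicity. Co-occurrence ("people who tagged this X also tagged it Y") is counted per user and item, so grouping falls out of overlap between tags rather than from any predefined ontology.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Tagging:
    """One user applying one tag to one item (hypothetical record)."""
    user: str
    item: str
    tag: str
    day: int  # assumed time unit: days since an arbitrary start date


def cooccurrence(taggings):
    """Grouping: count how often two tags are applied to the same item by the same user."""
    tags_by_user_item = defaultdict(set)
    for t in taggings:
        tags_by_user_item[(t.user, t.item)].add(t.tag)
    counts = Counter()
    for tags in tags_by_user_item.values():
        # Sort so each unordered pair is always stored under the same key.
        for a, b in combinations(sorted(tags), 2):
            counts[(a, b)] += 1
    return counts


def decayed_weight(taggings, tag, today, half_life=30.0):
    """Decay: each use of a tag counts for less as it ages (assumed exponential half-life)."""
    return sum(0.5 ** ((today - t.day) / half_life)
               for t in taggings if t.tag == tag)


def select_items(taggings, include=(), exclude=()):
    """Inclusion/exclusion: items carrying every tag in `include` and none in `exclude`."""
    tags_by_item = defaultdict(set)
    for t in taggings:
        tags_by_item[t.item].add(t.tag)
    return sorted(item for item, tags in tags_by_item.items()
                  if set(include) <= tags and not (set(exclude) & tags))


# Hypothetical sample data: three users tagging two items.
data = [
    Tagging("ann", "shirky-talk", "ontology", 0),
    Tagging("ann", "shirky-talk", "folksonomy", 0),
    Tagging("bob", "shirky-talk", "ontology", 20),
    Tagging("bob", "shirky-talk", "folksonomy", 20),
    Tagging("bob", "shirky-talk", "toread", 20),
    Tagging("cat", "lcc-paper", "ontology", 25),
    Tagging("cat", "lcc-paper", "libraries", 25),
]

print(cooccurrence(data).most_common(1))   # [(('folksonomy', 'ontology'), 2)]
print(round(decayed_weight(data, "ontology", today=30), 2))  # 2.18
print(select_items(data, include=["ontology"], exclude=["toread"]))  # ['lcc-paper']
```

Note that nothing here decides what "ontology" or "toread" *means*; the structure comes entirely from who used which tags, on what, and when, which is the post-hoc filtering the talk describes.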
If we, from many different points of view, can roll up terms that have value in aggregate, it should be done without any ontological goals. It is all context-dependent, and it represents a radical break from traditional categorization schemes: new ordering systems built around the flexibility of the link.