Taxonomy in SharePoint – lessons learned from real-world deployments along with some tips, tricks and gotchas when mixing taxonomy with search…
On 21st May 2008 I presented to an audience of taxonomy professionals within the UK public sector. I had the last session of the day, with 30 minutes to present on “Taxonomy within Microsoft Office SharePoint Server 2007 (MOSS): Lessons learned from real-world deployments”.
My aim was to briefly explain what MOSS can and cannot do with taxonomy and provide a few tips on how to leverage MOSS taxonomy features to improve information findability. The session generated quite a bit of note-taking and debate. Here are the slides:
Key messages from the presentation:
- MOSS uses elements of taxonomy to improve search and navigation. The core feature is ‘columns’, used for metadata. Case study: a tag-driven user interface created for the New Zealand Ministry of Transport. A great end result but a lot of effort required to implement and maintain
- MOSS does not (yet) provide taxonomy management tools. Taxonomy management is about defining and managing schema(s), and classifying content against those schemas
- Taxonomy is not the holy grail. Schemas need to continually evolve to be effective. Often there is a disconnect between the language used by those creating the schema and the language used by those looking for the information it describes. This perhaps explains why folksonomies have achieved more success than official taxonomies, but…
- User tagging is less accurate or consistent than automatic classification. Comment from Google founder Sergey Brin: “Semantics and tagging are great as long as computers are doing it [not people].” Automatic classification is by no means perfect either. Accuracy rarely exceeds 70% – lots of development going on to improve this
- 4 tips to improve the use and value of taxonomy within MOSS today:
  - Where possible, define columns at the site collection level, not per library. Do it per library and each instance will be treated as a separate crawled property in the index. By managing per site collection, you can also control what values can be entered, improving consistency across sites and libraries
  - Avoid using sites and sub-sites to mimic file structures (popular when creating file plans). URL depth is one of the relevance factors: the deeper the URL, the less relevant the item is judged to be. You also don’t want empty sites returned in search results. Alternative approach: create a link-driven UI to mimic the file plan but apply it using columns and store content in as few sites as possible
  - Check out your sources. When indexing content, if one source has a lot more metadata than others, it can dominate search results. A common issue for mergers and acquisitions, or re-orgs within government. Solution: split the index and/or use federated search
  - Maximise the effectiveness of automatic metadata, such as titles and descriptions. Avoid bland document titles (e.g. ‘meeting notes’ x 50) and irrelevant link titles (‘Click here’ versus a title that describes where the link takes you)
- The most likely reasons to go beyond MOSS are concept-driven search and automatic classification. You can use bespoke code and lightweight tools like the Faceted Search tool on CodePlex. But it is usually better to engage a partner.
- Final case study: legal firm – lots of taxonomy but just getting search up and running was a big win. People found it easier to find information using basic search than the formal navigation structures created by file plans…
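
The tips above about URL depth and bland titles can be checked with a quick audit before content is indexed. The following is a minimal sketch in Python, not part of the original presentation and not how MOSS itself calculates relevance. It assumes you can export a list of (URL, title) pairs from your sites. The names `url_depth`, `audit` and `BLAND_TITLES`, and the depth threshold of 4, are all hypothetical choices for illustration.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical set of generic titles; adjust to suit your own content.
BLAND_TITLES = {"meeting notes", "click here", "document", "untitled"}


def url_depth(url):
    """Count the non-empty path segments in a URL."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


def audit(items, max_depth=4):
    """Return (deep_urls, weak_titles) for a list of (url, title) pairs.

    deep_urls: URLs with more than max_depth path segments.
    weak_titles: lower-cased titles that are duplicated or in BLAND_TITLES.
    The max_depth default is an arbitrary assumption, not a MOSS setting.
    """
    title_counts = Counter(title.strip().lower() for _, title in items)
    deep_urls = [url for url, _ in items if url_depth(url) > max_depth]
    weak_titles = sorted(
        title
        for title, count in title_counts.items()
        if count > 1 or title in BLAND_TITLES
    )
    return deep_urls, weak_titles
```

Run against an export of a site collection, the first list points to content buried in deep sub-site structures and the second to titles that will be hard to tell apart in search results.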