Rethinking the fileplan

Perhaps one of the loudest unspoken messages from the SharePoint conference held recently in Seattle was the need for information architects and managers to work more closely with their user interface (UI) and technology-focused counterparts. Thanks to the Internet, we are unlikely to see a downturn in the market for digital information in the foreseeable future. But the methods used to classify, manage and access information are still dominated by techniques taken from the physical world of information – paper and its storage methods: micro (books) and macro (libraries).

Let’s pick on ‘The Fileplan’

A common scenario I see in organisations, especially government ones, is the use of a fileplan to store and access content. Here’s the definition of a fileplan, courtesy of ‘Developing a Fileplan for Local Government’ (UK) (my comments in brackets):

“The fileplan will be a hierarchical structure of classes starting with a number of broad functional categories. These categories will be sub-divided and perhaps divided again until folders are created at the lowest level. These folders, confusingly called files in paper record management systems (hence the term ‘fileplan’), are the containers in which either paper records or electronic documents are stored.”

And why do we need fileplans?

“An important purpose of the fileplan is to link the documents and records to an appropriate retention schedule.”

Really? Just how many different retention schedules does an organisation need to have? One per lowest-level folder? I doubt that. Let’s create a (very) simple fileplan: Geography – Business Unit – Activity.

Take 3 geographies, 3 business units and 3 activities, and these are the folders you end up with:

  • UK/Finance/budget/
  • UK/Finance/managementaccounts/
  • UK/Finance/projects/
  • UK/IT/operations/
  • UK/IT/procedures/
  • UK/IT/projects/
  • UK/Sales/campaigns/
  • UK/Sales/products/
  • UK/Sales/projects/
  • France/Finance/budget/
  • France/Finance/managementaccounts/
  • France/Finance/projects/
  • France/IT/operations/
  • France/IT/procedures/
  • France/IT/projects/
  • France/Sales/campaigns/
  • France/Sales/products/
  • France/Sales/projects/
  • Germany/Finance/budget/
  • Germany/Finance/managementaccounts/
  • Germany/Finance/projects/
  • Germany/IT/operations/
  • Germany/IT/procedures/
  • Germany/IT/projects/
  • Germany/Sales/campaigns/
  • Germany/Sales/products/
  • Germany/Sales/projects/

So we have 27 different locations to cover 3 geographies with 3 departments and 3 activities. Now scale this up for your organisation. How many different folders do you end up with?
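The combinatorial growth is easy to demonstrate. A minimal sketch (the activity names are illustrative – in the example above each business unit has its own three activities, but the count works out the same either way):

```python
from itertools import product

# Three values per level of the example fileplan: Geography / Business Unit / Activity.
# Activity names are illustrative; the folder count is 3 * 3 * 3 regardless.
geographies = ["UK", "France", "Germany"]
business_units = ["Finance", "IT", "Sales"]
activities = ["budget", "projects", "operations"]

folders = ["/".join(parts) + "/" for parts in product(geographies, business_units, activities)]
print(len(folders))  # 27 locations before a single document is stored
```

Add a fourth level with 3 values and you are at 81 folders; every extra dimension multiplies the total.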

The ultimate killer with this scenario? There isn’t any content in the first 2 levels of the hierarchy. You’ve got to navigate through 3 levels before you can even start to find what you are looking for. This is because a librarian approach is used for storing and locating information:

Go upstairs, ‘Technology’ section is on the left, you’ll find ‘Computing’ about halfway along. Third shelf up is ‘Programming Languages’, books organised alphabetically by author…

In the physical world, we can’t do a ‘Beam me up, Scotty!’ and magically arrive at the shelf containing the book containing the page(s) we want. But in the digital world, we can. If fans of the fileplan designed Google’s navigation, it might look something like this: a front page of broad categories, each click leading to a page of sub-categories, with the actual results buried several pages deep.

And they probably wouldn’t include the search box on the first two pages. Fortunately for everyone who uses the Internet to search for information, Google took the ‘Beam me up, Scotty!’ approach.

The fileplan approach causes problems for everyone. Authors have to find ‘the right’ location to put their stuff. If they are working on anything remotely ambiguous, it is unlikely there will be one clear option. That’s why everyone ends up defaulting to the ‘projects’ folder (‘miscellaneous’ is another popular destination).

Search engines that use URL-depth algorithms (such as PageRank) struggle to identify relevant content – is the folder ‘Finance’ more important than a document called ‘Finance’ that is two levels deeper in the hierarchy, buried under Projects/Miscellaneous? If someone is searching for documents about France, are documents located in the France folder hierarchy more important than documents containing references to France that have been stored in the UK hierarchy? Authors don’t know where to put their stuff, and searchers can’t find it.

What about those all-important retention schedules? They might be different for different geographies (governments don’t seem to agree or standardise on much, globally), but then what? Do all Finance docs have a different retention schedule to all IT docs? Within Finance, do different teams have different retention schedules? (Quite possibly – certain financial documents need storing for specific periods of time.)

The current solution? Sub-divide and conquer, i.e. create yet another level of abstraction in the fileplan… I have seen solutions where users have to navigate through 6 levels before reaching a folder that contains any content.

So what’s the alternative?

Perhaps a better question would be ‘what’s an alternative?’ The desire to find one optimal solution is what trips up most information system designs. Here are some of my emerging thoughts. If you’ve got an opinion, please contribute in the comments because I certainly don’t have all the answers.

Step 1: Stop thinking physically and start thinking digitally

There are two fundamental problems with the fileplan. First, it originates from the constraints enforced by physical technologies. A paper document must exist somewhere, and you don’t want to have to create 100 copies to cover all retrieval possibilities – it’s expensive and time-consuming. Instead, all roads lead to one location… and it’s upstairs, third cabinet on the right, second drawer down, filed by case title. This approach creates the second problem: because content is managed in one place, that one place – the fileplan – must cover all purposes, i.e. storage, updates, retention schedule, findability and access. Physical limits required you to think this way. But those limits are removed when you switch to digital methods. What we need are multiple fileplans, each suited to a specific purpose.

Information specialists can help identify the different purposes and the different fileplans required. Technologists need to help create solutions that make it as easy as possible (i.e. minimal effort required) for authors and searchers to work with information and fileplans. And user interface specialists need to remind everyone about what happens when you create mandatory metadata fields and put the search box in the wrong place on the page…
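One way to picture ‘multiple fileplans, each suited to a specific purpose’ is as different views derived from the same item’s metadata. A sketch, with all field and function names invented for illustration:

```python
# One document, described once by metadata; each "fileplan" is just a
# different view computed from that metadata, not a separate physical copy.
doc = {
    "title": "Q3 budget forecast",
    "geography": "France",
    "business_unit": "Finance",
    "activity": "budget",
    "record_class": "financial-7yr",  # hypothetical retention class
}

def storage_view(d):
    """The author's view: where the working copy lives."""
    return f"{d['geography']}/{d['business_unit']}/{d['activity']}/{d['title']}"

def retention_view(d):
    """The records manager's view: only the retention class matters."""
    return d["record_class"]

def search_view(d):
    """The search engine's view: flat facets, no hierarchy to navigate."""
    return {k: d[k] for k in ("geography", "business_unit", "activity")}

print(storage_view(doc))  # France/Finance/budget/Q3 budget forecast
```

The point of the sketch is that no single view has to serve storage, retention and findability at once.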

Digital storage of content should be logical to the creators, because authors ultimately decide where they save their documents. Trying to force them into a rigid navigation hierarchy designed by somebody else just means everything gets saved in ‘miscellaneous’. Don’t aim for a perfect solution. Instead, provide guidance about where ‘stuff’ should go: areas for personal ‘stuff’, team ‘stuff’, community sites, collaborative work spaces, ‘best practices’ sites. Ideally, you still want to stick to one location – not because of any resource constraints, but rather to avoid unnecessary duplication that can cause confusion. If an item of content needs to appear ‘somewhere else’, then it should be a link rather than a copy, unless a copy is required to fit a different scenario (e.g. publishing a copy of a case study onto a public web site, but keeping the original held in a location that can only be edited by authors).

To improve relevance of search results, thesauri and controlled vocabularies can help bridge the language barriers between authors and readers. A new starter might be looking for the ’employee manual’. What they don’t know is what they are actually looking for is the ‘corporate handbook’ or ‘human remains guide’ that may contain the words ’employee’ and ‘manual’ but never together in the same sentence. The majority of search frustrations come from information seekers using a different language to the one used by the authors of the information they seek. Creating relationships between different terms can dramatically improve relevance of search results. Creating tailored results pages (a mix of organic search results and manufactured links) can overcome differences in terminology and improve future search behaviour.
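A minimal sketch of that kind of query expansion – the synonym rings below are invented for illustration:

```python
# Controlled vocabulary mapping searcher terms to author terms (invented data).
THESAURUS = {
    "employee manual": ["corporate handbook", "staff guide"],
}

def expand_query(query):
    """Return the original query plus any synonyms from the controlled vocabulary."""
    q = query.lower()
    return [q] + THESAURUS.get(q, [])

print(expand_query("Employee Manual"))
# ['employee manual', 'corporate handbook', 'staff guide']
```

A search engine would then match documents containing any of the expanded terms, so the new starter finds the ‘corporate handbook’ without ever knowing its official name.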

And the elephant in the file system – retention schedules. First identify what retention schedules you require to comply with industry regulations and to manage legal risk. Do they apply to all content or only certain content? (I doubt many government organisations have kept junk paper mail for 30 years.) And at what point do they need to be applied: from the minute somebody opens a word processor and starts typing, or at the point when a document becomes finalised? This is the area that needs most coordination between information specialists and technologists. As we start to move to XML file formats, life could potentially become much easier for everyone. For example, running scripts that automatically scan documents for certain words indicating a high probability that the document should be treated as a record and moved from a community discussion forum to the archive. Or automatically inserting codes that enable rapid retrieval of content to comply with a legal request, but that have no effect on relevance for regular searches.
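The record-detection idea could start as simply as the following sketch; the trigger words and threshold are invented, and a real system would use better classification than keyword counting:

```python
# Words whose presence suggests a document may need to be treated as a record.
RECORD_TRIGGERS = {"contract", "invoice", "agreement", "signed"}

def looks_like_record(text, threshold=2):
    """Flag a document as a probable record if enough trigger words appear."""
    words = set(text.lower().split())
    return len(words & RECORD_TRIGGERS) >= threshold

print(looks_like_record("this signed agreement covers the attached invoice"))  # True
print(looks_like_record("notes from the team lunch"))                          # False
```

Flagged items could then be queued for a records manager to confirm, rather than moved automatically.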

On the Internet, Google introduced a tag ‘nofollow’ that could be applied to links to prevent the link improving a page’s relevance rank. (PageRank works by determining relevance based on the number of incoming links to a page. If you want to link to a page so that people can look at it but you don’t want the page to benefit from the link in search results, you can insert ‘nofollow’). Maybe Enterprise Search solutions need a similar method. Different indicators for metadata that helps describe content for searches versus metadata that organises content for retention schedules versus metadata that helps authors remember where they left their stuff. And again, XML formats ought to make it possible to automatically insert the appropriate values without requiring the author to figure out what’s needed. The ultimate goal would be to automatically insert sufficient information within individual content items so that requirements are met regardless of where the content is stored or moved to. I email an image to someone and its embedded metadata includes its fileplan(s).
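Purpose-tagged metadata embedded in the content itself might look something like this sketch; the element names and the `purpose` attribute are invented, not from any standard:

```python
import xml.etree.ElementTree as ET

# Build a metadata block where every field declares its purpose, so a search
# indexer, a retention engine and the author's own tools can each pick out
# only the fields meant for them.
meta = ET.Element("metadata")
for name, value, purpose in [
    ("subject", "France budget", "search"),       # should affect search relevance
    ("retention", "financial-7yr", "retention"),  # drives the retention schedule
    ("owner-note", "my drafts", "author"),        # helps the author; ignored by search
]:
    field = ET.SubElement(meta, "field", {"name": name, "purpose": purpose})
    field.text = value

# A search indexer would read only the purpose="search" fields:
searchable = [f.text for f in meta.findall("field[@purpose='search']")]
print(searchable)  # ['France budget']
```

Because the block travels inside the item – like the emailed image above – the rules still apply wherever the content ends up.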

There are lots of ways that technology could be used to improve information management and findability, to meet all the different scenarios demanded by different requirements. But achieving them requires closer interaction between the people making the policies on how information is managed, the people creating the so-called ‘technology-agnostic’ (in reality, ‘technology-vendor-agnostic’) fileplans to satisfy those policies, and the technology vendors creating the solutions used to create, store and access content – solutions that have to cope with both the fileplans and the policies.

The information industry has to move on from the library view of there being only one fileplan. Lessons can be learned from the food industry. There was a time when there was only one type of spaghetti sauce. In the TED talk below, Malcolm Gladwell explains how the food industry discovered the benefits from offering many different types of spaghetti sauce (and why you can’t rely on focus groups to tell you what they want – another dilemma when designing information systems):

Direct link to TED talk (in case video doesn’t load here)

There is a great quote within the above talk:

“When we pursue universal principles in food, we aren’t just making an error, we are actually doing ourselves a massive disservice”

You could replace the word ‘food’ with ‘information’. It’s not just the fileplan that needs rethinking…

SPC 2008 – Analyst Mash-up

Anyone who attended the analysts session at SPC 2008 will immediately understand the title… read on to find out more.

Yesterday I attended a session – Analyst Panel: AMR Research, Forrester and Gartner Analysts Weigh In. I just know it is going to have been the best session of the conference. Apologies to those who have yet even to present.

Introductions: Tom Rizzo moderated the session. Hat tip to Rizz, he was brilliant! The analysts who braved the audience:

  • Gene Phifer: Gartner
  • Kyle McNabb: Forrester
  • Rob Koplowitz: Forrester
  • Erica Driver: Forrester
  • Jim Murphy: AMR

The following are my scribbled notes, typed during the session. They are not a perfect transcription; no guarantees regarding accuracy, yada yada yada. My comments are either in brackets within the answers or called out separately under ‘my comments:’

Q – What is your view on enterprise records management in MOSS?

Kyle: Is RM ready for primetime? Usual answer – it depends. Truthfully, for larger organisations with complex requirements for retention management, then no. Microsoft knows this and has partnered with other companies in this space.

Gene: Ditto. Business records management is one of the missing pieces. It is not feature complete. MOSS with additional third parties is required.

Q – So what are the top 3 things that MOSS needs to make it ready for prime time records management?

Kyle: 1st = experience. Tower and others have been around for longer (this is the stupidest answer. What can MS do about this? Keep plugging away with regular releases for the next few years just to prove they’ve been around long enough?). 2nd = additional certifications, such as the UK’s TNA/PRO. It is a checklist requirement for projects, but good practice. (Agreed, but this also has little to do with improving the actual technology.) 3rd = the notion of a single secured records repository is being blown away by new technologies such as LinkedIn, Facebook and web apps. You need to put policies on the asset itself (i.e. the record) rather than the store. (Agreed, but what current prime time vendor is already doing this today? If SharePoint is playing catch-up, it needs to catch up. Trying to reinvent the records management market, despite it being a much needed effort, is unlikely to be a winning strategy for MS in the near term. In other words, the analysts didn’t come up with one concrete requirement that MS needs to build to make SharePoint a prime time platform for records management. Like the much needed ability to handle compound documents, manage complex taxonomy structures such as multi-faceted and polyhierarchies for classifying content, or create a fileplan/classification schema that both works and doesn’t kill the search engine in the process… there you go – 3 things I think MOSS needs to make it ready for prime time records management!)

Q – What surprised you most about MOSS 2007?

Jim: the breadth of capability. MOSS is covering 6 formerly distinct areas of technology. It is not surprising to see MS do it but it is interesting to see how MOSS is disrupting all 6 spaces at once.

Erica: 1. How quickly it evolved into an application platform that competes with Lotus Domino; 2. Adoption rate is like weeds and wildfire.

Rob: It was an aggressive release, almost entrepreneurial in nature, and aggressive in terms of continued improvement

Gene: In addition to the pace of adoption, I’m more surprised by what wasn’t there. Mash-ups – not mentioned during the keynote. Popfly is doing that stuff. Connecting web parts does not create mash-ups, that’s just interoperability

Tom: Hey, MOSS itself is the biggest composite application of all.

My comments: a great debate sparked up on what is or isn’t a mash-up (a.k.a. composite application), including audience participation. Gene was absolutely adamant that MOSS cannot do mash-ups. His attitude: mash-ups are about RESTful applications. I was on Tom’s side in this debate. In my opinion, a mash-up is simply the outcome from mashing two independent sources of information together to create new information. For example, mapping Flickr images using Virtual Earth.

SharePoint performs two core roles: 1. storage of content in sites, lists and libraries; 2. a web user interface for accessing digital content – stored locally within SharePoint and elsewhere. Web parts can be used to display content in many different ways, including creating connections between multiple sources (integration) and creating visualisations that are the outcome from mash-ups. No, you wouldn’t use SharePoint itself to create the mash-up (ditto for other portal/collaborative platforms), and in this way Gene was correct. But you would use SharePoint as the user interface to display the outcome from a mash-up.

Should we see Popfly-type functionality within SharePoint in the future? I should hope so… but I wonder if it will be a feature native to the future hosted version of SharePoint only… Tom certainly kept the session fun and lively by managing to squeeze in references to what is or isn’t a mash-up throughout the remaining questions 🙂
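By that definition, the essence of a mash-up is just a join across two independent sources. A toy sketch with invented data (a real version would call the Flickr and Virtual Earth APIs):

```python
# Source 1: photos tagged with a place name (stand-in for Flickr).
photos = [
    {"id": 1, "place": "Seattle"},
    {"id": 2, "place": "Paris"},
]
# Source 2: place names to coordinates (stand-in for a mapping service).
coords = {"Seattle": (47.6, -122.3), "Paris": (48.9, 2.4)}

# The mash-up: join the two sources to create new information - photos on a map.
mapped = [{"id": p["id"], "latlon": coords[p["place"]]} for p in photos]
print(mapped[0])  # {'id': 1, 'latlon': (47.6, -122.3)}
```

A portal would then be the display surface for `mapped` – creating the join and displaying the result are separate jobs, which is the distinction at the heart of the debate.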

Q – What’s your take on the consumerisation of IT?

Gene: We’ve seen it for years, consumer technology coming inside the organisation.

Rob: Forrester is working on an agenda called Technology Populism – they believe the current approach is more dramatic than we have seen in the past. Research shows a huge upswing in web 2.0 as organisations create strategic directions based on it. The problems are growing – content going out to public domain sites that shouldn’t. You need a strategy to take control of this. Applaud Microsoft’s partnerships with NewsGator and SocialText to tackle this space.

Kyle: The line between work inside the organisation and out of hours activities is disappearing. I.T. in the past has been about applications and hard-core technology. Now it is becoming more about information and how you manage information, down to the individual item level. Stuff outside the organisational boundary is currently completely unmanaged. No technology company has come up with a solution yet to tackle this space.

Jim: The dissolving lines between work, home, and community is a more fundamental change than consumerism of technology (which has been happening since PCs first appeared). How can businesses take advantage of technologies like Facebook, monetise it? Anticipate extreme pressures in terms of risk over the next few years, with regard to information leaking. But it is an issue that we need to get over as companies globalise their businesses and become more transparent as those lines dissolve (The added challenge here is that governments show no signs of embracing globalisation in terms of policies placed against information assets and how government data is managed.)

Q – What’s the biggest technology shift in general that will affect the market that SharePoint plays in?

Rob: SaaS. You’ve got to rearchitect applications from the bottom up to fit the new data centre environments. (I’d second this – a separate blog post is brewing on this topic.)

Q – What comes after Web 2.0?

Erica: 5–7 years out is going to be Web 3D. 3D environments with avatars, integrating with the regular 2D environment. We will see it used for training, demos, simulations… SharePoint has nothing in this space today, no sign of a bridge; I keep telling MS that this stuff is coming. (I think most analysts overestimate the impact of SecondLife and friends on mainstream business, and underestimate other people’s ability to see what’s coming. Yes, it has fantastic potential in certain scenarios, but it’s simply not even close to being available in a usable format for the majority of businesses. And honestly, does Erica think that nobody in MS has heard of SecondLife or is aware of the growing integration of virtual 3D environments into real-world processes and activities? I’m not sure the current form of SharePoint is the place to be developing such capabilities… perhaps the hosted online version of the future. For 3D environments to become integrated into daily activities will require them to be hosted on massive data centres…)

Kyle: (staring at the audience) You in IT need to learn what your people do! They use this stuff all the time, you just don’t know it. (I’m not sure who in the audience he thought this actually applied to. Sure, there are some IT depts who have a complete disconnect with the people they support… but I doubt many in that category are allowed out to have fun and attend conferences. Mandates to lock down computers and ban access to Facebook and friends rarely start from within the IT dept…)

Q – What’s your take on virtualisation?

Gene: Seeing it at the hardware and OS layer first, it will naturally work up the stack

(Checked the audience for who was using virtualisation in some form – most were, and a lot were using it for SharePoint. An audience member asked: what’s Microsoft’s support for virtualisation with SharePoint?)

Tom: SharePoint is supported on Microsoft’s own virtualisation server. PSS will do a best effort level of support on other virtualisation technology like VMWare. According to VMWare, SharePoint is 4th largest load being virtualised

Q – Put yourself in Microsoft’s shoes. What’s the one thing MOSS should do next?

Erica: It’s the Silverlight stuff – more visual and contextual ways of working.

Gene: Figure out how you are going to charge for SharePoint SaaS-based services. How are enterprises going to adopt this? What’s the monetary model going to be? Google is free… MOSS may offer more capability, but how do you justify the charge?

Kyle: Coping with the hybrid that is going to be a mix of on-premises and in-the-cloud.

Jim: Mash-ups and more analytics. How does SharePoint tell me how I’m doing, what the business value is of what I’m getting, how well this is working… and better taxonomy management.

Q – It’s 2020. What are the hot topics?

Kyle: The interesting question will be what will have happened to Google. We can imagine IBM still being around, ditto for Microsoft. But will Google even still exist? (Spot on with this observation. 2020 is 12 years away; 12 years ago, Google didn’t exist and the web was barely 2 years old from a non-academic/non-military perspective.)

Rob: There will be a fundamentally different approach to business processes.

Jim: lingual support is likely to be a bigger issue (Good point, will China be the super power by then?)

Gene: By 2020, the digital natives will be running companies. So stuff that seems ‘out there’ today will be business as usual… (Very good point – makes you realise just how inevitable the transformation of business is going to be, in what we will look back on as a very short space of time. At the moment, it’s an uphill battle to get many organisations to acknowledge the inefficiencies in current business practices.)

I’m bagging the final comment. I was surprised that ‘green computing’ wasn’t called out. I can see the need to reduce carbon emissions (likely driven via government policy) as being a driver to move I.T. services into ‘the cloud’ and start using hosted data centres, rather than attempting to grow your own to host applications (3D?) and information that require increasing amounts of hardware resources…

Technorati tag: SPC 2008

Documents versus Records

During 2002, I was involved in a project to implement electronic records management (not SOX – that act was still in development…). The subject has cropped up again and brought back some memories…

The biggest challenge the project faced was that electronic records had to be managed in an identical way to paper-based files. That experience resulted in point #4 in ‘Why is KM so difficult’. I remember, in frustration, muttering ‘you can’t shred electronic records’ when trying to explain why paper files and folders had fundamentally different properties to electronic ones. (Enron was headline news at the time, as was Microsoft’s DoJ case.)

The second challenge was that the project also wanted to improve document management (including collaboration) and, unsurprisingly, some thought it would be a great idea to roll out a single system to cover both. Now, in theory, it can be done but it needs to be recognised that managing the archiving of a record and managing the creation of a document are very different activities:

Records Management:

  • Management of the record is more important than the content of the record
  • The record never changes (although its properties might)
  • Records require access controls, lots of them

Document Management/Collaboration:

  • Without content there is no document
  • The document changes a lot, that’s the whole point of collaboration
  • Access controls restrict and impede collaboration, the fewer there are the better

The ability to have one store manage the entire document lifecycle process depends on what’s involved. If you require very granular records management capabilities, the database design will be different to what you need for most document management and collaboration activities. Mashing the two together could lead to performance and scalability problems. It’s no different to any other situation where you can have dedicated solutions or combine them together. A sofa-bed is not as comfortable as a sofa or a bed, but if you can’t have both, a sofa-bed may be better than only having a sofa or a bed… You have to decide if the trade-off is worth it.

If you insist on forcing an electronic system to replicate a paper-based system, you risk stretching the technology beyond reasonable design limits. Either the system will fail, or people will fail to use the system. As the LATCC learned, a change in tool sometimes requires a change to the process. Failure to do so can lead to escalating costs for a system that will never perform optimally – just because something is possible does not make it plausible.

My advice for companies who need to provide records management and who also want to improve document collaboration:

  • Collaboration and records management have different goals and objectives
  • All records are documents, not all documents are records
  • The ideal records management solution should be an extension to the collaborative environment, not the other way round
  • Collaborate to create the document, apply locking controls when the final document is declared a record
  • If the required controls are complicated, the records may need a separate database designed specifically to provide those controls
  • If you are in charge of designing the required controls, please give some thought to the economic implications of your decisions
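The ‘apply locking controls when the final document is declared a record’ advice can be sketched as follows (class and method names are illustrative, not from any product):

```python
class Document:
    """Collaborative document that can be declared an immutable record."""

    def __init__(self, content):
        self.content = content
        self.properties = {}
        self.is_record = False

    def edit(self, new_content):
        # The record never changes - only its properties might.
        if self.is_record:
            raise PermissionError("declared records are immutable")
        self.content = new_content

    def declare_record(self, retention_class):
        """Lock the content; from now on only properties may change."""
        self.is_record = True
        self.properties["retention"] = retention_class

doc = Document("final report, v3")
doc.edit("final report, v4")          # fine while it is still a working document
doc.declare_record("financial-7yr")   # hypothetical retention class
doc.properties["reviewed"] = True     # properties can still change
# doc.edit("v5") would now raise PermissionError
```

This keeps collaboration friction-free until the moment of declaration, which is when the heavier records controls are worth their cost.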

And never forget, people can always circumvent the system. Put too many controls in place and they simply will not create the document – no document, no record…