Who gets to own or access the data?

Data Center

Short version: It’s easier to comprehend why Facebook bought Instagram for crazy money if you ignore the social networking and instead focus on the value in automatic location updates via Internet-connected mobile devices. That’s a place where Facebook can build a serious business model.

“All models are wrong, but some are useful.” – George Box, Statistician, circa 1978

“All models are wrong, and increasingly you can succeed without them.” – Peter Norvig, Google, 2008

One of the reasons the technology sector is in the news so much at the moment is the emergence of five connected trends disrupting so many traditional industries: massive online social networks, social media tools, internet-connected mobile devices, cloud computing and ‘big data’ analytics.  The social networks enable us to connect with anyone globally, social media tools have made it easy to share thoughts and opinions with those connections, and internet-connected devices enable us to post updates instantly and from any location – no more waiting ’til you get home and log in to your computer.  Cloud computing enables all this information to be stored and accessed over the Internet.  And access to massive amounts of data, updated in real-time, enables new forms of analytics not previously possible.

An early new market is the world of social media analytics – providing feedback in real-time about what people are saying about your organisation or product/service. Sentiment analysis adds emotion – are people using words that are positive or negative, happy or sad, loving or hating?  Mining Internet data such as Tweets and other status updates is far more effective than standing on a street corner trying to conduct a market survey.
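At its simplest, sentiment analysis is lexicon-based: count the positive words in a status update against the negative ones. A minimal sketch of that idea – the word lists here are made up for illustration, and real tools use far richer models:

```python
import re

# Tiny illustrative lexicons -- real sentiment tools use thousands of
# weighted terms, negation handling and trained classifiers.
POSITIVE = {"love", "great", "happy", "good", "excellent"}
NEGATIVE = {"hate", "awful", "sad", "bad", "terrible"}

def sentiment(text: str) -> int:
    """Crude score: count of positive words minus count of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Positive score, negative score, neutral score:
sentiment("I love this, it's great")      # 2
sentiment("hate this awful service")      # -2
sentiment("the weather outside")          # 0
```

Run that over a stream of Tweets mentioning your product and you have the skeleton of a real-time feedback dashboard.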

But who gets to access all of this data? We share it freely and lose ownership in the process.

In February, the New York Times published an interview ‘Just the facts? Yes, all of them’ with Gil Elbaz. His first company, Applied Semantics, was acquired by Google and formed the basis of AdSense, the cornerstone of Google’s business model. Gil’s latest venture – Factual – is focused on acquiring massive data sets, and then selling access to them.  Current storage is running at 500 Terabytes:

FACTUAL sells data to corporations and independent software developers on a sliding scale, based on how much the information is used. Small data feeds for things like prototypes are free; contracts with its biggest customers run into the millions. Sometimes, Factual trades data with other companies, building its resources.

Factual’s plan is to build the world’s chief reference point for thousands of interconnected supercomputing clouds.

And now this month (via Techmeme) Forbes has an article asking ‘Will Data Monopolies Paralyze the Internet?’. Is it the end of Web 2.0 as blogs and status updates become locked inside password-protected social networks?  They think not, because more data lies outside those networks than within them – and, given sufficient resources, any entrepreneur can mine it.

But not all data is open to mine.  The Forbes article highlights a new area of focus and I disagree with their position (emphasis mine):

Some very promising data hasn’t been collected on a large scale yet and might be less susceptible to monopolization than things like status updates. Lots of people I spoke with at the Where conference last week were excited about new ways to approach ambient data. …[collecting] the little specks of data that we’re constantly releasing–our movements, via smart phone sensors; our thoughts, via Twitter feeds–and turn them into substantial data sets from which useful conclusions can be inferred. The result can be more valuable than what you might call deliberate data because ambient data can be collected consistently and without relying on humans to supply data on a regular basis by, say, checking in at favorite restaurants. It also offers great context because constant measurements make it easier to understand changes in behavior.

The article is right to emphasise the value of automatic updates over manual ones – a phone automatically registering your location is both easier and more reliable than you manually ‘checking in’ to a location. (Hence Instagram potentially being more valuable than Foursquare.)  But it also highlights just how important mobile devices are in this equation.

Who gets to own or access those updates captured by a mobile device’s sensors?  Simple. The device manufacturer (e.g. Apple), the network operator transmitting the data (e.g. AT&T), and/or the app you granted access to record the data (e.g. Instagram – automatically geo-tagging your photos for you).  Sure, the social network gets a look in if you allow the app to share. But it’s far lower down the chain compared to the app installed on the mobile device.  And top of the queue is the device itself. You can connect those dots for yourself.  Small wonder there are constant rumours that Facebook and Google are building/buying their own mobile devices. Presumably Microsoft too (well they’ll probably buy Nokia)…
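The geo-tagging step itself is mechanically simple: the phone writes GPS coordinates into a photo’s metadata as degrees, minutes and seconds plus a hemisphere reference, and any app granted access can convert those into a usable decimal location. A minimal sketch of that conversion – the sample coordinates are invented for illustration, not taken from any real device:

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert degrees/minutes/seconds plus a hemisphere reference
    (N/S/E/W) to a signed decimal coordinate."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West hemispheres are negative by convention.
    return -value if ref in ("S", "W") else value

# Hypothetical EXIF-style values for somewhere in central London:
lat = dms_to_decimal(51, 30, 26.0, "N")   # ~51.5072
lon = dms_to_decimal(0, 7, 39.0, "W")     # ~-0.1275
```

Multiply that by every photo taken on every handset and the scale of the resulting location dataset becomes obvious.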

In that context, Instagram is valuable to Facebook way beyond its benefits as a social network.  Those location updates originating from Apple and Android devices are a large, accurate and valuable dataset that Facebook now owns.

Related blog posts


Lies, Damned Lies and Statistics

Tyranny of Numbers

The Economist has an article ‘Don’t lie to me, Argentina’ explaining why they are removing a figure from their indicators page:

Since 2007 Argentina’s government has published inflation figures that almost nobody believes. These show prices as having risen by between 5% and 11% a year. Independent economists, provincial statistical offices and surveys of inflation expectations have all put the rate at more than double the official number…

What seems to have started as a desire to avoid bad headlines in a country with a history of hyperinflation has led to the debasement of INDEC, once one of Latin America’s best statistical offices…

We see no prospect of a speedy return to credible numbers. From this week, we have decided to drop INDEC’s figures entirely.

Whilst we often talk about how statistics can always provide the answers people are looking for (hence the popular quote used as the title for this post), there is another angle to consider – are the underlying numbers telling the truth? It is a critical question when decisions are based on increasingly complex calculations that are then converted into summary data visualisations to assist decision-making. Was the original source data automatically scraped from systems, or keyed in by people? Just how balanced is that scorecard…?
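One classic sanity check on whether reported figures have been massaged is Benford’s law: in many naturally occurring datasets, the leading digit d appears with probability log10(1 + 1/d), so about 30% of values start with a 1. A sketch of the technique – this is my own illustration, not a test The Economist applied to INDEC’s figures:

```python
import math
from collections import Counter

def first_digit_freqs(numbers):
    """Observed frequency of each leading digit (1-9) in a dataset."""
    digits = [int(str(abs(n)).lstrip("0.")[0]) for n in numbers if n]
    counts = Counter(digits)
    total = len(digits)
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Benford's expected distribution: P(d) = log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Compare observed vs expected; large, systematic gaps across a big
# dataset are a hint (not proof) that the numbers were made up.
freqs = first_digit_freqs([123, 19, 0.05, 2100])
```

Fabricated numbers tend to be too uniform – humans inventing figures rarely produce the skew Benford predicts.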

One of the current hot trends on the Internet is the emergence of ‘Big Data’ – scraping massive quantities of automatically generated information, such as the search and surfing habits of everyone who ever logged into Facebook, and then analysing it for patterns. One of the potential attractions is eliminating human error – or influence (in terms of truthfulness) – over the underlying data sources. That doesn’t solve the challenge of influence through gaming of the system, but that’s perhaps a post for another day.

If you are interested in the use and abuse of statistics, there’s an excellent short book that walks through historical examples of where statistics simply don’t work – The Tyranny of Numbers: Why counting won’t make us happy, by David Boyle – Click Here for an old book review I wrote.  And naturally, the book is listed on Amazon.

Related blog posts:

When security leaks matter

Laptop secure but not

There’s lots of news about the latest release of classified documents on Wikileaks. If you want to have a peek, The Guardian has a great visualisation to get started.

I had a mooch. Everything I scanned through was thoroughly boring. As is the case with most information, even the classified stuff.

Over the past 10 years, I’ve worked with numerous government organisations. When discussing intranets, collaborative sites and knowledge management systems, one of the most frequent concerns is how to secure access to information and prevent the wrong eyes from seeing it. It is no small irony that the first ever monetary fines applied by the Information Commissioner’s Office (ICO) this month were for breaches that had nothing to do with networks.

The first was a £100,000 fine against Hertfordshire County Council for two incidents of faxing highly sensitive personal information to the wrong people. The second was a £60,000 fine against an employment services company for the loss of a laptop.

Here’s another example. A fair few years ago, I was in Luxembourg to present at an EU event. The night before the meeting I was in my hotel room when an envelope appeared under the door. Assuming it was details about the event, I opened it and pulled out the documents. The first hint that the documents might not be to do with the event was seeing Restricted stamped across the top of the first page. The second indication was, when scanning the content, it became apparent the documents had something to do with nuclear weapons facilities across Europe. At that point, I looked at the front of the envelope to discover that it wasn’t addressed to Miss S Richardson (i.e. me) but was instead addressed to <insert very senior military rank I can’t remember> Richardson. A rather terse conversation took place in the hotel reception as the documents were forwarded to their rightful recipient.

All three examples above were security breaches due to stupid human error. None involved networks or bypassing security systems. But only one required legal intervention – the faxing of content to the wrong people that was both legally confidential and highly sensitive.

And that’s the rub. Most security leaks don’t matter. A few years ago, some idiot in the UK tax office HMRC downloaded oodles of personal details to a CD and then lost it in the post. If there has been a bout of serious identity theft as a result, I haven’t heard about it. Ditto for a more recent breach that managed to send addresses and bank details to the wrong people. More people have been affected by a cock-up in the calculation of tax due to incorrect data than from identity theft due to lost data.

Most of the content on Wikileaks is embarrassing to its targets (usually governments and/or large corporations) rather than dangerous. Yes, there are exceptions, such as the failure to redact personally incriminating information from documents, which could put lives in danger. But they are the exception rather than the norm we tend to assume when documents are classified as confidential.

One of the recommendations I give to clients looking to improve the use and value of their intranets is to devalue information. Make it easier to access. The confidentiality of most content is over-rated. Its importance and usefulness to other people is often under-rated.

For most organisations, content falls into one of three categories:

  • Legal – fines and prison (though that is rare) may result from failing to protect legal documents
  • Sensitive – contains information that could put at risk or be damaging to an individual or organisation
  • Everything else

There’s no arguing over legal documents. No prizes for guessing that most content falls (or should fall) under Everything Else. And Sensitive? Whilst some is easily justified (such as research into a new prototype that you wouldn’t want your competitors knowing about), an awful lot is considered sensitive purely to avoid embarrassment or conflict. Sometimes people should question verbalising their opinions, let alone putting them in writing… And if you work in government, for goodness sake don’t save it on your laptop!

One closing quote whilst on the subject of securing information. Whilst it refers to anonymity, it equally applies to trying to hide information from public view, which too often appears to be the reason for confidential classifications: (apologies, I can’t recall the source)

Providing a level of anonymity is great for play but prevents accountability

References and examples:

Information is still sticky

Just over four years ago, I wrote a post called Sticky Information. It’s a topic I rarely see discussed but it can have a big influence on our decisions and actions. If we ‘like’ a piece of information, we will cling on to it, often in the face of any attempt to alter that state. Equally, information that doesn’t fit our view quickly slips into obscurity. We do not treat all information equally.

The Nielsen Company, publisher of web trends, has come a cropper with a statistic for iPad downloads. Their original article claimed one third of iPad owners have never downloaded an app. A startlingly sticky statistic that was promptly regurgitated on many news sites. The Nielsen Company has since updated the article and corrected the number from 32% to 9%, a pretty big reduction. But news sites don’t tend to hang around a story for long. I wonder how many will correct their articles, or whether someone will come across and reuse the incorrect figure. Unlikely to have serious consequences in this case, but it shows the importance of checking and re-checking original sources when relying on statistics…


PerformancePoint – A brief history

A few years ago, I published an infographic showing the history of SharePoint, to help decipher the different twists, turns and acquisitions that influenced what went into (and out of) SharePoint. (May get round to doing an update on that sometime…)

A related product has also had a few twists and turns of its own – PerformancePoint. The clue is in the name: it’s in the same family of products as SharePoint, and it originally targeted performance management solutions. Here’s its life story so far…

PerformancePoint History

Back in 2001, business intelligence and performance management were quite hot topics but became overshadowed by the rise of the portal. An early market leader was ProClarity and most people thought Microsoft would acquire it. Instead they purchased Data Analyzer, owned by a ProClarity partner. In the same year, Microsoft acquired Great Plains, a provider of business applications to small and medium-sized organisations. Included in the acquisition was FRx Forecaster, which Great Plains had itself acquired the previous year.

Data Analyzer remained available as a desktop product for a while before disappearing. Some of the technology merged into what would become Microsoft’s first performance management server product: Business Scorecard Manager 2005 (BSM – naturally, not to be confused with the British School of Motoring if you’re reading this in the UK 🙂 )

BSM enabled you to define key performance indicators (KPIs) and then create scorecards and dashboards to monitor and analyse performance against targets. The product included web parts that could display those KPIs, scorecards and dashboards on a SharePoint site. It even had a little bit of Visio integration producing strategy maps (a key component of an effective business scorecard).  BSM was a classic v1 product: difficult to install, basic capabilities and limited adoption by organisations.
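The core mechanic of a scorecard tool like BSM is easy to sketch: compare an actual value against a target and band the result into a red/amber/green status for the dashboard. The thresholds and KPI names below are illustrative only, not BSM’s actual defaults:

```python
def kpi_status(actual: float, target: float, amber_band: float = 0.10) -> str:
    """Green at or above target, amber within 10% below target, else red."""
    if actual >= target:
        return "green"
    if actual >= target * (1 - amber_band):
        return "amber"
    return "red"

# A toy scorecard: each KPI maps to (actual, target).
# These figures are made up for the example.
scorecard = {name: kpi_status(actual, target) for name, (actual, target) in {
    "Revenue":        (95, 100),   # within 10% of target -> amber
    "Orders shipped": (100, 100),  # on target -> green
    "Customer NPS":   (60, 80),    # well short -> red
}.items()}
```

A dashboard is then just this table rendered with traffic-light colours, with drill-down into the underlying data source behind each KPI.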

In 2006, Microsoft finally acquired the company it should have bought in the first place – ProClarity, which had both a desktop and a server product. The products remained available standalone, and some of the technology was integrated into the replacement for BSM – PerformancePoint Server 2007 (PPS). Also integrated into PPS was a new forecasting capability based on the FRx Forecaster.

PPS was effectively two products – a Monitoring Server and a Planning Server. The Monitoring Server included a revamped Dashboard Designer with improvements to the core monitoring and analysis capabilities – KPIs, reports, scorecards and dashboards. It also leveraged corresponding web parts available in SharePoint Server 2007 Enterprise Edition. The Planning Server included a new Planning Business Modeler that enabled multiple data sources to be mapped and used to plan, budget and forecast expected performance. The Planning Server proved particularly problematic to configure and use…

In 2009, Microsoft announced that PerformancePoint Server was being discontinued. The Monitoring Server elements were to be merged into future releases of SharePoint (and anyone licensed for SharePoint Server 2007 Enterprise Edition was immediately given access to PerformancePoint Server 2007 as part of that license). The source code for the Planning Server elements was released under restricted license as a Financial Planning Accelerator, ending its life within Microsoft. The FRx technology returned to the Dynamics product range.

In 2010, SharePoint Server 2010 was released; its Enterprise Edition included the new PerformancePoint Service, complete with dashboard and scorecarding capabilities but no planning options. The same year also saw the release of Management Reporter, which offers both monitoring and planning capabilities with direct integration into the various Dynamics products. And a new BI tool was released – PowerPivot for Excel, an add-in that enables you to create pivot tables and visualisations based on very large data sets. A trend worth keeping an eye on…

Going forward, Microsoft has business intelligence and performance management solutions in two camps: the Office and SharePoint platform that can provide a front-end to business applications and data sources of all shapes and sizes; and the Dynamics Product range that provides end-to-end business applications for small- to medium-sized organisations (and divisions within larger organisations). Dynamics can also leverage SharePoint as its front-end, just like any other business application.

Microsoft Business Intelligence and Performance Management tools

SQL Server continues to provide the core foundation for all data-driven solutions – offering its own database capabilities as well as warehousing and integration with other ODBC-compliant data sources, plus the reporting and analysis services on which BI solutions are built. SharePoint provides the web front-end for information and data-driven solutions, amongst other things like search, collaboration etc… Office continues to provide desktop tools as well as web-based versions that integrate with SharePoint. Excel now has its sidekick PowerPivot (wish they’d named that one PivotPoint…), and Visio continues to be, well, Visio – one of the few acquisitions to keep its original name intact. Also worth a mention are Bing Maps and MapPoint, which provide location-specific visualisations. I originally wrote that MapPoint was discontinued, but a search to check when it stopped being available found it alive and well as MapPoint 2010… hey ho!

You’d be right to think this performance management roadmap has looked a little rocky. What’s interesting to note is there is a Corporate Performance Management team within the Dynamics group, whilst Business Intelligence messaging barely mentions it, focusing instead on subsets of performance management – reporting and analysis.

If you are a performance management purist, you will likely be disappointed with the capabilities offered by PerformancePoint, much in the same way a taxonomy purist will gripe at the limitations of Managed Metadata. Both are services within SharePoint 2010 that help manage and visualise information – they are part of a platform, as opposed to specialist niche solutions that will typically offer a more comprehensive feature set. But if you want to start improving how everyone interacts with information and data as part of daily decisions and activities, a platform is a pretty good place to begin, requiring fewer skills and resources to get started.

Final note: All the above comments are based on my own opinions and observations. They do not represent any Microsoft official statements from the past, present or future 🙂 Have to mention on this sort of post as it covers the period of time I worked at Microsoft.


Related blog posts