Information | Joining Dots

Computer Vision Accuracy

January 18, 2016 / Sharon Richardson / 6 Comments

As the accuracy and range of image detection increases, so will its uses. How good are computers at detecting faces, facial attributes and emotions…?

The uncertainty of data

July 21, 2015 / Sharon Richardson / No Comments

The biggest risk with relying on data is not its accuracy but treating something as certain when it may be anything but…

The inaccuracy of data

July 21, 2015 / Sharon Richardson / 1 Comment

The bigger the pot of data, the more likely that inaccuracies will creep into the analysis. The consequences depend on what happens next…
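A minimal sketch of why scale matters, assuming (purely for illustration) that each record is wrong independently with some small probability p. The expected number of bad records then grows as n × p, and the chance that at least one slips through is 1 − (1 − p)^n. The error rate below is hypothetical, not a measured figure.

```python
# Illustrative only: assumes each record is wrong independently with
# probability p (a hypothetical error rate, not a measured one).

def expected_bad_records(n: int, p: float) -> float:
    """Expected number of inaccurate records in a dataset of n records."""
    return n * p


def prob_at_least_one_bad(n: int, p: float) -> float:
    """Probability that at least one of n records is inaccurate."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.001  # hypothetical: 1 record in 1,000 is wrong
    for n in (100, 10_000, 1_000_000):
        print(f"n={n:>9,}: expected bad = {expected_bad_records(n, p):,.1f}, "
              f"P(at least one) = {prob_at_least_one_bad(n, p):.3f}")
```

Even with a tiny error rate, a big enough pot of data makes at least one inaccuracy close to certain, so the question becomes what happens to it downstream.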

Glastonbury Heatmap

July 10, 2015 / Sharon Richardson / No Comments

Plotting spatial updates shared via Twitter during the recent Glastonbury festival.
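A heatmap of this kind typically starts by binning each geotagged point into a grid cell and counting the points per cell. A minimal sketch, assuming the tweets have already been reduced to (latitude, longitude) pairs; the coordinates and the 0.01° cell size below are hypothetical, and the original post does not describe its actual method.

```python
import math
from collections import Counter


def bin_points(points, cell_deg=0.01):
    """Count (lat, lon) points per grid cell of cell_deg x cell_deg degrees.

    Returns a Counter keyed by (row, col) cell indices; the counts are the
    'heat' values a heatmap would colour.
    """
    counts = Counter()
    for lat, lon in points:
        # floor() keeps negative longitudes (west of Greenwich) in the right cell
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Hypothetical geotagged updates (not real data)
sample = [(51.1550, -2.5850), (51.1552, -2.5849), (51.1490, -2.5900)]
for cell, n in bin_points(sample).most_common():
    print(cell, n)
```

A smaller `cell_deg` gives a finer-grained map but sparser counts per cell; choosing it is a trade-off between detail and noise.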

100 years of Bank of England Data

May 22, 2015 / Sharon Richardson / No Comments

One of the benefits of an interactive visualisation is the ability to zoom in and out of the data, altering the scale of analysis: from summaries and averages down to individual data points or specific periods in space and time.
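Zooming in and out amounts to re-aggregating the same data at a different granularity. A minimal sketch, assuming a simple {year: value} series; the figures are hypothetical (not Bank of England data), and the interactive visualisation in the original post may work quite differently.

```python
from collections import defaultdict
from statistics import mean


def aggregate(series, period):
    """Roll a {year: value} series up into averages over `period`-year buckets.

    period=1 returns the individual data points; larger periods give the
    coarser summaries you see when 'zoomed out'.
    """
    buckets = defaultdict(list)
    for year, value in series.items():
        buckets[year - year % period].append(value)
    return {start: mean(vals) for start, vals in sorted(buckets.items())}


# Hypothetical annual values (not Bank of England figures)
rates = {1990: 14.0, 1991: 10.5, 1992: 7.0, 1995: 6.5, 2001: 4.0, 2003: 3.75}
print(aggregate(rates, 1))   # every data point
print(aggregate(rates, 10))  # decade averages
```

The decade view hides the sharp fall within the 1990s that the annual view shows, which is exactly why being able to switch scales is useful.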

Defining Intranets, Defining Systems

May 23, 2013 / Sharon Richardson / No Comments

Before replacing an intranet, first check whether the original definition still holds true. Digital workspaces continue to evolve…

Do you need a portal?

August 24, 2012 / Sharon Richardson / 1 Comment

Back in April, I presented some sessions at the International SharePoint Conference held in London. One of the sessions was titled ‘We need a portal’. It was part of the business track, exploring how to run SharePoint projects.

Who gets to own or access the data?

April 13, 2012 / Sharon Richardson / No Comments

Short version: It’s easier to comprehend why Facebook bought Instagram for crazy money if you ignore the social networking and instead focus on the value in automatic location updates via Internet-connected mobile devices. That’s a place where Facebook can build a serious business model.

“All models are wrong, but some are useful.” – George Box, Statistician, circa 1978

“All models are wrong, and increasingly you can succeed without them.” – Peter Norvig, Google, 2008

One of the reasons the technology sector is in the news so much at the moment is the emergence of five connected trends disrupting so many traditional industries: massive online social networks, social media tools, internet-connected mobile devices, cloud computing and ‘big data’ analytics. The social networks enable us to connect with anyone globally, social media tools have made it easy to share thoughts and opinions with those connections, and internet-connected devices enable us to post updates instantly and from any location – no more waiting ’til you get home and log in to your computer. Cloud computing enables all this information to be stored and accessed over the Internet. And accessing massive amounts of data, updating in real time, enables new forms of analytics not previously possible.

One of the first new markets to emerge is social media analytics – providing real-time feedback about what people are saying about your organisation or product/service. Sentiment analysis adds emotion: are people using words that are positive or negative, happy or sad, loving or hating? Mining Internet data such as Tweets and other status updates is far more effective than standing on a street corner trying to conduct a market survey.
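The simplest form of sentiment analysis is lexicon-based: score each word as positive or negative and sum the result. A minimal sketch, assuming a tiny hypothetical word list; commercial social media analytics tools are considerably more sophisticated, and this is not how any particular product works.

```python
import re

# Hypothetical mini-lexicon; real tools use lexicons with thousands of
# scored words plus handling for negation, sarcasm, emoji and so on.
LEXICON = {
    "love": 2, "happy": 1, "great": 1,
    "hate": -2, "sad": -1, "awful": -1,
}


def sentiment_score(text: str) -> int:
    """Sum the lexicon scores of the words in text (>0 positive, <0 negative)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)


def label(text: str) -> str:
    """Map a score to a coarse positive/negative/neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweets = [
    "I love the new phone, great camera",
    "Awful service today, I hate waiting",
    "Picked up the phone from the shop",
]
for t in tweets:
    print(label(t), sentiment_score(t), t)
```

Summing word scores is crude ("not happy" scores as positive here), but it shows why status updates, which are short and plentiful, suit automated sentiment tracking.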

But who gets to access all of this data? We share it freely and lose ownership in the process.

In February, the New York Times published an interview, ‘Just the facts? Yes, all of them’, with Gil Elbaz. His first company, Applied Semantics, was acquired by Google and formed the basis of AdSense, a core part of Google’s advertising business. Gil’s latest venture, Factual, is focused on acquiring massive data sets and then selling access to them. Current storage is running at 500 terabytes:

FACTUAL sells data to corporations and independent software developers on a sliding scale, based on how much the information is used. Small data feeds for things like prototypes are free; contracts with its biggest customers run into the millions. Sometimes, Factual trades data with other companies, building its resources.

Factual’s plan is to build the world’s chief reference point for thousands of interconnected supercomputing clouds.

And now this month (via Techmeme) Forbes has an article asking ‘Will Data Monopolies Paralyze the Internet?’. Is it the end of Web 2.0 as blogs and status updates become locked inside password-protected social networks? Forbes thinks not: more data lies outside those networks, and if it can be mined, any entrepreneur with sufficient resources can mine it.

But not all data is open to mine. The Forbes article highlights a new area of focus, and I disagree with its position (emphasis mine):

Some very promising data hasn’t been collected on a large scale yet and might be less susceptible to monopolization than things like status updates. Lots of people I spoke with at the Where conference last week were excited about new ways to approach ambient data. …[collecting] the little specks of data that we’re constantly releasing–our movements, via smart phone sensors; our thoughts, via Twitter feeds–and turn them into substantial data sets from which useful conclusions can be inferred. The result can be more valuable than what you might call deliberate data because ambient data can be collected consistently and without relying on humans to supply data on a regular basis by, say, checking in at favorite restaurants. It also offers great context because constant measurements make it easier to understand changes in behavior.

The article is right to emphasise the value of automatic updates over manual ones – a phone automatically registering your location is both easier and more reliable than you manually ‘checking in’ to a location. (Which is why Instagram is potentially more valuable than Foursquare.) But it also highlights just how important mobile devices are in this equation.

Who gets to own or access those updates captured by a mobile device’s sensors? Simple. The device manufacturer (e.g. Apple), the network operator transmitting the data (e.g. AT&T), and/or the app you granted access to record the data (e.g. Instagram, automatically geo-tagging your photos for you). Sure, the social network gets a look in if you allow the app to share. But it’s far lower down the chain compared to the app installed on the mobile device. And top of the queue is the device itself. You can connect those dots for yourself. Small wonder there are constant rumours that Facebook and Google are building/buying their own mobile devices. Presumably Microsoft too (well, they’ll probably buy Nokia)…

In that context, Instagram is valuable to Facebook way beyond its benefits as a social network. Those location updates originating from Apple and Android devices are a large, accurate and valuable dataset that Facebook now owns.

Related blog posts

Lies, damned lies and statistics – Feb 2012
Thinking in reverse – July 2008
Zillionics change perspective – April 2008

References

Will data monopolies paralyze the Internet? – Forbes, Apr 2012
Just the facts? Yes, all of them – New York Times, Mar 2012

Lies, Damned Lies and Statistics

February 26, 2012 / Sharon Richardson / No Comments

The Economist has an article, ‘Don’t lie to me Argentina’, explaining why it is removing a figure from its indicators page:

Since 2007 Argentina’s government has published inflation figures that almost nobody believes. These show prices as having risen by between 5% and 11% a year. Independent economists, provincial statistical offices and surveys of inflation expectations have all put the rate at more than double the official number…

What seems to have started as a desire to avoid bad headlines in a country with a history of hyperinflation has led to the debasement of INDEC, once one of Latin America’s best statistical offices…

We see no prospect of a speedy return to credible numbers. From this week, we have decided to drop INDEC’s figures entirely.
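The gap between an official figure and an independent one compounds year after year. A rough illustration, assuming an official rate of 10% a year against an independent estimate of 22% (just over double); both rates are illustrative, not INDEC's or anyone else's published numbers.

```python
def cumulative_rise(annual_rate: float, years: int) -> float:
    """Total percentage price rise after compounding annual_rate for `years`."""
    return ((1 + annual_rate) ** years - 1) * 100


official, independent = 0.10, 0.22  # illustrative rates only
for years in (1, 3, 5):
    print(f"{years} yr: official {cumulative_rise(official, years):.0f}% "
          f"vs independent {cumulative_rise(independent, years):.0f}%")
```

Over five years that is roughly a 61% rise in prices versus about 170%, so anything calculated from the official series drifts further from reality each year.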

Whilst we often talk about how statistics can always provide the answers people are looking for (hence the popular quote used as the title for this post), there is another angle to consider – are the underlying numbers telling the truth? It is a critical question when decisions are based on increasingly complex calculations that are then converted into summary data visualisations to assist decision-making. Was the original source data automatically scraped from systems or keyed in by people? Just how balanced is that scorecard…?

One of the current hot trends on the Internet is the emergence of ‘Big Data’ – being able to scrape massive quantities of automatically generated information, such as the search and surfing habits of everyone who ever logged into Facebook, and then analyse it for patterns. One of the potential attractions is eliminating human error – or influence (in terms of truthfulness) – over the underlying data sources. It doesn’t solve the challenge of influence through gaming of the system, but that’s perhaps a post for another day.

If you are interested in the use and abuse of statistics, there’s an excellent short book that walks through historical examples of where statistics simply don’t work: The Tyranny of Numbers: Why counting won’t make us happy, by David Boyle. I wrote a review of it some years ago, and naturally, the book is listed on Amazon.
