big data | Joining Dots

Spotify’s strategy for machine learning

June 11, 2021 / Sharon Richardson / No Comments

On June 10, 2021, Google hosted an Applied ML seminar with an opening keynote by Tony Jabara, VP of Engineering and Head of Machine Learning at Spotify. Jabara presented some fascinating insights into Spotify’s current strategy for machine learning and their growing use of reinforcement learning.


Modelling socio-spatial dynamics

July 10, 2020 / Sharon Richardson / No Comments

The near-pervasive adoption of mobile devices and the growing use of sensors embedded in physical environments are enabling a new generation of models for studying human-environment interactions.

Machine and Deep Learning Limits

November 14, 2018 / Sharon Richardson / 2 Comments

If decisions are to be delegated to artificially intelligent machines, we need to appreciate the limits of intelligence without cognition

How sensor data gets smart

October 24, 2018 / Sharon Richardson / No Comments

Talk delivered at The Things Conference. Discussing the good, the bad, the ugly and the beautiful ways in which data traces from digitised physical interactions can be converted into actionable insights…

What the data never tells you

January 29, 2013 / Sharon Richardson / No Comments

Predictive analytics will only get you so far in picking the right people. And sometimes it will get you so wrong. Teams are about more than just skills and abilities. Tired teams outperform fresh teams when making key decisions


Digital trends compressing processes

January 24, 2013 / Sharon Richardson / 2 Comments

In November 2012, Joining Dots delivered a presentation at Ovum Analysts’ Business Process Management event: ‘Compressing processes – How productivity is going social and mobile’

What we share online in 60 seconds

October 10, 2012 / Sharon Richardson / No Comments

For online communications in a world where information can be shared in an instant, does blogging still matter? Choose the shortest format to tell your story and work back from there to cover the different platforms where your audience resides

Neville Hobson, also known as @jangles, has posted a recent presentation to Slideshare: Is there any point in blogging? The slides are a great walk through the different formats now popular for communicating online and how organisations can use the channels effectively. Also included in the slides is the infographic above. Published in June 2011 by GO-Gulf.com, it gives a real feel for the massive volume of opinions flowing across the Internet and why ‘big data’ matters. Imagine being able to mine those insights in real-time to influence decisions.

Here’s Jangles’ slide deck:

And back to the question that forms the title. Is there any point in blogging? Well, here I am, writing a blog post… 🙂 But the answer, in true consulting style, is ‘it depends’. The overall value is definitely lower than 5 years ago due to the sheer number of blogs out there. Thanks to power laws and long tails, discoverability now has little correlation with the quality of the content. Industries that benefit from visuals and location-awareness may find short format alternatives like Instagram and Pinterest of more benefit than the longer format of traditional text-y blog posts.

Two tips to get the most value out of blogging/online communications:

If you have to pick one medium, choose the shortest format that tells the story. Work back from there.
Make the content available on the platforms that your target audience inhabits.

Who gets to own or access the data?

April 13, 2012 / Sharon Richardson / No Comments

Short version: It’s easier to comprehend why Facebook bought Instagram for crazy money if you ignore the social networking and instead focus on the value in automatic location updates via Internet-connected mobile devices. That’s a place where Facebook can build a serious business model.

“All models are wrong, but some are useful.” – George Box, Statistician, circa 1978

“All models are wrong, and increasingly you can succeed without them.” – Peter Norvig, Google, 2008

One of the reasons the technology sector is in the news so much at the moment is the emergence of five connected trends disrupting so many traditional industries: massive online social networks, social media tools, internet-connected mobile devices, cloud computing and ‘big data’ analytics. The social networks enable us to connect with anyone globally, social media tools have made it easy to share thoughts and opinions with those connections, and internet-connected devices enable us to post updates instantly and from any location – no more waiting ’til you get home and log in to your computer. Cloud computing enables all this information to be stored and accessed over the Internet. And accessing massive amounts of data, updating in real-time, enables new forms of analytics not previously possible.

An early new market is the world of social media analytics – providing feedback in real-time about what people are saying about your organisation or product/service. Sentiment analysis adds emotion – are people using words that are positive or negative, happy or sad, loving or hating. Mining Internet data such as Tweets and other status updates is far more effective than standing on a street corner trying to conduct a market survey.

But who gets to access all of this data? We share it freely and lose ownership in the process.

In March, the New York Times published an interview ‘Just the facts. Yes, all of them’ with Gil Elbaz. His first company, Applied Semantics, was acquired by Google and formed the basis of AdSense, Google’s business model. Gil’s latest venture – Factual – is focused on acquiring massive data sets, and then selling access to them. Current storage is running at 500 terabytes:

FACTUAL sells data to corporations and independent software developers on a sliding scale, based on how much the information is used. Small data feeds for things like prototypes are free; contracts with its biggest customers run into the millions. Sometimes, Factual trades data with other companies, building its resources.

Factual’s plan is to build the world’s chief reference point for thousands of interconnected supercomputing clouds.

And now this month (via Techmeme) Forbes has an article asking ‘Will Data Monopolies Paralyze the Internet?’. Is it the end of Web 2.0 as blogs and status updates become locked inside password-protected social networks? Forbes thinks not: more data lies outside those networks than inside them, and any entrepreneur with sufficient resources can mine it.

But not all data is open to mine. The Forbes article highlights a new area of focus and I disagree with their position (emphasis mine):

Some very promising data hasn’t been collected on a large scale yet and might be less susceptible to monopolization than things like status updates. Lots of people I spoke with at the Where conference last week were excited about new ways to approach ambient data. …[collecting] the little specks of data that we’re constantly releasing–our movements, via smart phone sensors; our thoughts, via Twitter feeds–and turn them into substantial data sets from which useful conclusions can be inferred. The result can be more valuable than what you might call deliberate data because ambient data can be collected consistently and without relying on humans to supply data on a regular basis by, say, checking in at favorite restaurants. It also offers great context because constant measurements make it easier to understand changes in behavior.

The article is right to emphasise the value of automatic updates over manual ones – a phone automatically registering your location versus you manually ‘checking in’ to a location is both easier and more reliable. (Hence why Instagram is potentially more valuable than Foursquare.) But it also highlights just how important mobile devices are in this equation.

Who gets to own or access those updates captured by a mobile device’s sensors? Simple. The device manufacturer (e.g. Apple), the network operator transmitting the data (e.g. AT&T), and/or the app you granted access to record the data (e.g. Instagram – automatically geo-tagging your photos for you). Sure, the social network gets a look-in if you allow the app to share. But it’s far lower down the chain compared to the app installed on the mobile device. And top of the queue is the device itself. You can connect those dots for yourself. Small wonder there are constant rumours that Facebook and Google are building/buying their own mobile devices. Presumably Microsoft too (well, they’ll probably buy Nokia)…

In that context, Instagram is valuable to Facebook way beyond its benefits as a social network. Those location updates originating from Apple and Android devices are a large, accurate and valuable dataset that Facebook now owns.

Related blog posts

Lies, damn lies and statistics – Feb 2012
Thinking in reverse – July 2008
Zillionics change perspective – April 2008

References

Will data monopolies paralyze the Internet? – Forbes, Apr 2012
Just the facts. Yes, all of them – New York Times, Mar 2012

Web 3.0 and the Semantic Web

June 3, 2010 / Sharon Richardson / No Comments

I’ve got mixed feelings about the viability of the semantic web but this video is a great compilation of the challenges facing information discovery and possible options. It’s become way easier to create information than to manage it…

Social Media judges the Olympics

March 1, 2010 / Sharon Richardson / No Comments

TechCrunch has an interesting article: How We Hate NBC’s Olympic Coverage: A Statistical Breakdown.

The statistics are coming from a couple of different ‘Sentiment Analysis’ services that track what people are saying about brands online. Twitter Sentiment tracks positive and negative comments on Twitter, updated in real-time (image shown above). Another service, Crimson Hexagon, went further and broke the comments down into specific categories, discovering that only 15% were happily watching NBC’s Winter Olympics coverage whilst 85% were complaining (more details are provided in the TechCrunch article).

What’s interesting is how easy it has been for these services to gather the data. Crimson Hexagon analysed over 20,000 tweets and 5,700 blog posts and forum comments. Twitter Sentiment is continually updating in real-time, as the tweets are posted. When I grabbed the screenshot above, over 2,500 tweets had been automatically categorised as positive or negative.
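To give a feel for what ‘automatically categorised as positive or negative’ can mean in practice, here is a minimal sketch of the simplest possible approach: counting words from a positive list and a negative list. It is an illustration only, not how Twitter Sentiment or Crimson Hexagon actually work. The word lists, the function names `classify` and `tally`, and the sample posts are all made up for this example.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration only; real services use far richer models.
POSITIVE = {"love", "great", "enjoying", "brilliant", "happy"}
NEGATIVE = {"hate", "awful", "terrible", "boring", "annoying"}


def classify(text: str) -> str:
    """Label a post positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tally(posts: list[str]) -> Counter:
    """Count how many posts fall under each label."""
    return Counter(classify(p) for p in posts)


# Made-up sample posts, not real tweets.
sample_posts = [
    "I hate the tape delay, awful coverage",
    "Enjoying the hockey tonight, great game",
    "Terrible commentary and boring adverts",
    "Watching the Olympics",
]
counts = tally(sample_posts)
```

A word-count approach like this misses sarcasm, negation (‘not great’) and context, which is one reason the headline percentages from any such service deserve a pinch of salt.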

The analysis demonstrates just how easy it is to discover what people really think thanks to the Internet. People who take the time to tweet and write blog posts are more likely to be giving raw opinions than a selected audience targeted to respond to a survey. For sure we tend to be more compelled to write when we have something bad to say, so results are almost always going to skew towards the negative. But they are readily available, often for free or little cost, and offer an insight into how products and services could be improved. Sentiment analysis shows how businesses can benefit from getting involved in social media, even if only to listen.

References:

How We Hate NBC’s Olympic Coverage: A Statistical Breakdown (TechCrunch)
Twitter Sentiment (enter a keyword and hit ‘search’ to track a brand)
