Lessons from Facebook’s experiments

[Update] Adding links and references as they bubble up on this topic…

There has been a range of news recently about Facebook’s latest approach to users’ privacy.

Wired has an article – Facebook’s Gone Rogue; It’s Time for an Open Alternative – explaining the concern being raised by many. By default, Facebook is now connecting and publishing every piece of data you choose to share on the platform. You may think you are only sharing your photos with your friends and family, but you are granting permission for Facebook to share your content with everyone and anyone on the Internet.

Robert Scoble has an article – Much ado about privacy on Facebook – with the counter-argument: that we’re kidding ourselves if we ever thought anything we share on a computer, especially one connected to a network, is private. Facebook is just exploiting that which others have exploited less visibly (or easily – and that’s the key difference) in the past, and in the process helping people find what they need in ways Google never can.

Robert has a point. However, the picture is a little more complicated. Not everyone wants to share their entire life online with everyone else and every organisation on the planet. Some people have very good and legitimate reasons not to. You could argue that such people simply shouldn’t be on Facebook. But in the past, it wasn’t a problem – the default behaviour in Facebook’s privacy policy was that information would only be shared amongst your network, which could be as large or small as you chose it to be. And your content stayed within the walls of Facebook unless you chose to opt in to third party applications. That has now all changed and Facebook does not make deleting anything easy. Even if you choose to leave, if your ‘friends’ have already shared your content or tagged their own content with your name then your identity will continue to persist without you. And if you choose to stay, for certain content it is now all or nothing – if you try to opt out of sharing with everyone then it will be removed from your profile and friends will no longer see it either.

Facebook is transitioning from a site for building social networks between friends to being one giant social network. A new mesh of connected personalised data is being created that has never before been possible. And that mesh is being shared with whatever organisations Facebook chooses to do business with, at the same time as we are seeing new tools arise that can mine massive amounts of data for patterns and profiling. We don’t yet know what all the implications – good and bad – will be. And whilst Robert highlights the good, history tells us there will also be bad. This is a live experiment that over 400 million people (and that’s just the active users) unknowingly volunteered to participate in.

Related Blog Posts


Other posts of interest on this topic:

All web sites great and small

Spotted a depressing article on Techmeme on Friday – Hackers turn Google into a vulnerability scanner (Infoworld). I suppose it was inevitable that this would happen.

Hacking group Cult of the Dead Cow (CDC) have kindly released a tool that uses Google to automatically scour web sites for sensitive information. Because it is automatic, it means that new and novice web sites are no longer protected by relative anonymity. If you are storing information anywhere in ‘the cloud’ and are worried about it being kept private and secure, the best approach is to run the tool for yourself and find out if your site needs fixing.

Whether Google likes it or not, they are as good as a monopoly on the Internet. There isn’t the proprietary lock-in achieved by a certain other technology company. But Google is the one location that most* people go to in search of stuff and therefore the one location most web sites aim to be discovered by. The trouble with technology monopolies is the lack of diversity. It’s what makes Microsoft software so vulnerable. Give a cold to one computer and you can pass it on to them all. Now the Internet is the focus and Google is the target to exploit. The CDC tool doesn’t care if your web site is on page 1 or page 1,000,001 of Google’s search results. It can and will find you (cue Terminator music).

The ultimate irony – the tool takes advantage of Google’s index, has been written using Microsoft .NET and is licensed as free open source… it’s not often you see those three areas come together as a single solution. Pity it had to be this one.

*According to comScore World Metrix, Google hosted 62.4% of web searches in December 2007. The next nearest rival was Yahoo with 12.8%, trailed by Microsoft with 2.9%.

I know what you bought last summer

The rumblings over Facebook banning Robert Scoble have opened up all sorts of conversations about who owns or controls your data – see also: Data as currency. One issue that has been highlighted is how easy it is for people to scrape enough information about you to form an identity. Scoble was running an automated script to pull out contact details by the thousand.

Yesterday, another related article cropped up on Techmeme – Sears Exposes Customer Purchase History. It appears that Sears added a feature on their web site where you could look up your purchase history. All you had to do was enter your name, address and telephone number. Trouble is, whilst you had to have an account and log in to the site, you could then enter anybody’s name, address and telephone number to view their purchases. Somebody forgot to restrict access to only purchases associated with the authenticated user. Since the news became public, Sears have disabled the feature to sort it out.
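
To make the missing check concrete, here is a minimal sketch in Python. It is purely illustrative and is not Sears’ actual code: the `PURCHASES` data, the `Session` type and both function names are assumptions invented for the example. The first function trusts whatever customer the request names; the second only ever returns records belonging to the person who logged in.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical purchase records keyed by customer id (made-up data).
PURCHASES: Dict[str, List[str]] = {
    "alice": ["washing machine", "toaster"],
    "bob": ["lawnmower"],
}


@dataclass(frozen=True)
class Session:
    """A logged-in session; user_id is the person who actually authenticated."""
    user_id: str


def purchase_history_flawed(session: Session, requested_customer: str) -> List[str]:
    # Flaw: being logged in is treated as sufficient. The customer named in
    # the request is trusted, so any account holder can read anyone's history.
    return PURCHASES.get(requested_customer, [])


def purchase_history_restricted(session: Session, requested_customer: str) -> List[str]:
    # Fix: refuse any request for records that do not belong to the
    # authenticated user, then look up by the session's own identity.
    if requested_customer != session.user_id:
        raise PermissionError("purchase history is only visible to its owner")
    return PURCHASES.get(session.user_id, [])
```

With a session for `alice`, the flawed version will happily return `bob`’s purchases, whilst the restricted version raises `PermissionError`. The difference is a single comparison, which is exactly the kind of check that is easy to forget.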

But it does raise yet another warning about how easy it is for companies to accidentally make too much information public, be it downloading database records to a CD or making those records available online. Mash-up poor (or missing) security controls with automated scripts to gather contact details and our criminal friends won’t need to go phishing for dinner.

Controlling the Data Cloud

Nicholas Carr wrote another post about the emerging world of giant data centres (yup, English-English spelling on this post as opposed to English-US) – Google’s Cloud – describing how Google wants to host all of our data and our applications. Microsoft is pursuing a similar strategy through its Live plans.

Whilst there are lots of advantages in going down the hosted route and handing over big chunks of your I.T. to a hosted data centre, plenty of large organisations are reluctant to let go of their data completely. Not just because of their own concerns about security and privacy (as long as employees can send unencrypted emails outside of the company – or just use their vocal cords – there is only so much you can do on that road) but also because of government and industry regulations that dictate responsibilities regarding data ownership. Some governments simply do not allow companies to host their data overseas. Whilst business and society may be becoming increasingly globalised, governments still operate very much on a localised level.

But never mind all the government-related challenges, Salesforce has just demonstrated the much simpler concern voiced by many organisations, as reported in the Washington Post – Salesforce.com Acknowledges Data Loss. It appears that an employee fell foul of a phishing scam and accidentally handed over the keys to their customer database. Oops!

Security challenges in Web 2.0

An interesting blog post has highlighted how Gmail accounts can be hacked – Google Email Hijack Technique. Aside from the issue that it appears quite easy for someone/thing who knows what they are doing to start snooping on your email (more than slightly worrying), the blog post highlights a new security challenge for anyone beginning to rely on hosting data in ‘the cloud’ – i.e. stored on remote data centres and accessed using online services. Think Gmail, Flickr, YouTube, Facebook, Office Live, MySpace, LiveJournal, Salesforce.

When viruses first appeared, the primary method of spread was through infected disks. People had a habit of leaving floppy disks in computers. When the computer was next switched on, a virus would copy across from the floppy disk (way back when, the floppy disk drive was the first item read when your computer started up and the most common form of network for file sharing). Your computer would start to behave oddly as files became corrupted and you lost all your data. People, through training, threats and learning the hard way through experience, began to get better at not leaving disks inserted in computers when they switched off. But it didn’t matter because the threat changed…

Along came email and networks. New ways of hacking accounts, crashing computers and corrupting data arose that no longer relied on a floppy disk to spread the havoc. And new challenges appeared – spam overwhelming inboxes, phishing scams persuading people to willingly hand over bank details. Whilst some attacks were purely web-based (fake sites pretending to be your friendly bank), the majority of attacks still focused on taking control of your computer and doing bad stuff with it. But having a computer crash has become less of a worry as more data is being uploaded onto the web. Our need to have our data available regardless of the device we happen to be using means our devices are more resistant to damage. If your computer gets hacked, wipe it and rebuild it, then re-sync with your online services. And so the threat changes again…

The Gmail exploit doesn’t care about your computer, or your mobile phone or whatever device you choose to use. It lives in ‘the cloud’, hacking directly into the online services that are hosting your data. If Gmail gets hacked, what do you do? You can’t just format and rebuild, as has worked in the past with computers. You don’t control the service or the computers where your data is stored. Instead, you have to trust Google (or whichever service provider you happen to be using) to fix the issue. It’s a different dynamic and one that will need to be considered by any organisation planning to switch from local servers to fully hosted services.


Who controls your data

There is a bit of a furore going on over a piece of code being leaked to the web that enables you to crack HD-DVDs. However, one of the blog posts/news articles includes a snippet of information that I am more interested in, because it highlights a big flaw in the strategy for moving your data into the Internet cloud. Snippet from a blog on Wired, documenting a takedown notice from Google to someone using their Google Notebook application (bold highlighting is mine):

… Google has been notified, according to the terms of the Digital Millennium Copyright Act (DMCA), that content in your notebook Google Notebook Entry allegedly infringes upon the copyrights of others. The particular section of your notebook in question is the section covering www.digg.com/users/entangledstate/news/dugg

…. If you do not do this within the next 3 days (by 4/30/07), we will be forced to remove your entire notebook. If we did not do so, we would be subject to a claim of copyright infringement, regardless of its merits. We can reinstate this content into your blog upon receipt of a counter notification pursuant to sections 512(g)(2) and (3) of the DMCA…

Back in March, I wrote a post – Google and Microsoft looking alike – talking about Google’s strategy for getting us to use their online services for storing our data. If they are happy to act as big brother on behalf of people who use the DMCA as an easy form of censorship, will we be comfortable to hand over the keys to our information?

Take a simple scenario. I use Gmail for email. Someone sends me an email containing content that might infringe copyright. Google receives a notification from the copyright owner and issues notices similar to the one above with 3 days to comply. I happen to be on holiday and don’t check my email, so have not even read the allegedly offending email, let alone seen the takedown notice. When I return to work, my entire Gmail account has been deleted. What if I ran my entire business using Google services? Would they all be deleted too? Hmmm…

I last blogged about the DMCA in January 2006 – Post and be damned. New Scientist magazine had published an article examining the use of the DMCA as a form of censorship. One study found that 47% of takedown notices concerned material that would likely have been deemed fair use. However, the DMCA enables content owners to issue takedown notices without having to go to court, placing the onus on the individual to legally challenge them. Targeting the Internet Service Providers (ISPs) has proven effective – they will simply remove the content unless the individual web site owner is prepared to finance a legal challenge to the notice. Picking on Google (and any other player in the web software/services playground) makes it even easier. Google can simply shrug and say ‘we have to do this or else we would be subject to a claim’. But the impact on the individual or organisation targeted is now even bigger. You don’t just lose your web site, you could lose your entire ability to do business if you rely on web-based services…

Security versus Control

Microsoft has a (relatively new and not well known) technology called Rights Management Services (RMS). When used with Office 2003, it provides the ability to apply rights to individual documents and emails, enabling an author to control access and distribution. For example, if you wanted to send out an email containing sensitive data, and did not want any recipient to forward the email on to other people, you click a button and, hey presto, the email is sent with certain features unavailable. Recipients cannot forward the email, print it, cut/copy & paste it and if they reply to the email, the original message is removed (they can’t even open it to read the email if they are not on the approved recipients list). Another example: if you have a document containing time-sensitive content, such as a price list, you might want to set an expiry date. Beyond the expiry date, the document can no longer be opened – this could prevent people from accidentally using an out-of-date price sheet when selling products. If you have a document you want to collaborate on with only a limited group of people, you can restrict who has the right to view, edit and print the document.
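
The kinds of rules described above can be modelled in a few lines of Python. To be clear, this is a toy model and not the RMS API: the `RightsPolicy` class, its field names and the example addresses and dates are all assumptions made up for illustration. It only shows the decision being made (who may open, forward or print, and until when), not how RMS enforces it.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class RightsPolicy:
    """Hypothetical rights attached to one document or email (not real RMS)."""
    allowed_readers: Set[str] = field(default_factory=set)
    can_forward: bool = False
    can_print: bool = False
    expires_on: Optional[date] = None  # None means the content never expires

    def can_open(self, user: str, today: date) -> bool:
        # Readers not on the approved list cannot open the content at all.
        if user not in self.allowed_readers:
            return False
        # Beyond the expiry date, nobody can open it, approved or not.
        if self.expires_on is not None and today > self.expires_on:
            return False
        return True


# Example: a price list limited to one sales mailbox, expiring at the end of June.
price_list = RightsPolicy(
    allowed_readers={"sales@example.com"},
    expires_on=date(2007, 6, 30),
)
```

Under these assumptions, `price_list.can_open("sales@example.com", date(2007, 6, 1))` is true, the same call on 1 July is false, and anyone not on the list is refused regardless of the date. Forwarding and printing default to off, mirroring the restricted email described above.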

This ability is sometimes called document security, but that description is wrong and can be misleading. The accurate definition is controlling distribution of content. It’s a subtle but important difference. When a document has rights applied to it using RMS, the rights (let’s call them ‘a lock’) live with the document. When someone tries to access the document, they will be challenged – the appropriate certificate (let’s call it a ‘key’) is required before the document can be opened. However, because the rights live with the document, and the document is allowed to travel outside the boundaries of a company’s own IT systems, the potential will always exist for someone, with suitable tools and patience, to crack open the document without a key. It’s just like a safety deposit box. You put items into a safety deposit box (locked) to control who has access to those items (the key holders). However, if you decide to leave the safety deposit box in the park, someone is going to pick it up and, eventually, they will get the box open by fair means or foul. That’s why you store the safety deposit box in a vault. The vault is the security layer. Yes, vaults do occasionally get broken into. But it’s a lot harder to do than taking that safety deposit box home and working on it in your own time, and when it happens, you know it has happened. The big hole in the wall and the people wearing balaclavas are a bit of a giveaway.

The Rights Management Service is a useful tool when you have a need to control distribution of content. It is not unbreakable – you can’t stop someone using a camera to take a photo of the document whilst it is displayed on their monitor screen – but it is a lot better than no restrictions at all when handling sensitive content, and is certainly better than traditional methods, such as sealed and recorded delivery of physical copies of the documents. If you want document security, you need to consider the vault – the store where the document will reside – and you need to consider the implications of allowing the document to be removed from that vault, from a security perspective.