February 24, 2006

Is Your Web Site Revealing Your Secrets?

By Evan Schuman, Ziff Davis Internet

Opinion: As the Washington Post discovered when a photo's metatag revealed the identity of a confidential source, Web sites can reveal to the public a lot more than their administrators intend.

One of the more intriguing movies with a technology theme in the last few decades was a 1983 flick called "WarGames," starring Matthew Broderick in his second movie.

Critics of the movie from the IT community at the time said that the film's plot—about a sophisticated war game computer that confused its NORAD masters into thinking that a simulated nuclear attack was real—was unrealistic because every computer has more failsafes than the one in the story.

Maybe, but I saw the film's theme as dramatizing a valid point: Any sophisticated program is going to have capabilities that most of its day-to-day users are unaware of and that ignorance can cause huge problems.

In 2006, current Web capabilities are rapidly growing and few users understand how much data is being collected and potentially shared. That's a lesson learned this week by the Washington Post.

The Post has extensive experience protecting confidential sources. As journalists, we know the clues people look for in our stories to try to identify confidential sources, and we avoid them.

But when the newspaper posted pictures to accompany a recent story, the images included metatag info that identified the source's location as a particular small town, making the source's identity easy to guess given all of the details in the story.

When the photo department placed that information on the image for its catalog purposes, didn't it realize it would be world-viewable? How many retail IT execs would have thought of that?
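For the technically inclined, here is a minimal sketch of how a site might scrub descriptive tags from a JPEG before it goes public. It is not what the Post uses. The function name `strip_jpeg_metadata` is hypothetical, and the choice of segments to drop (APP1 for EXIF/XMP, APP13 for Photoshop/IPTC captions, COM for comments) is an assumption about where catalog notes usually live. It also assumes a well-formed file.

```python
import struct

# Assumed set of metadata-bearing segments: APP1 (EXIF/XMP),
# APP13 (Photoshop/IPTC captions) and COM (free-text comments).
METADATA_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG byte stream without its metadata segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    out = bytearray(data[:2])  # keep the start-of-image marker
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("malformed segment header")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: compressed pixels follow, so copy the rest as-is.
            out += data[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        if marker not in METADATA_MARKERS:
            out += data[pos:end]
        pos = end
    out += data[pos:]  # no scan found; keep whatever is left
    return bytes(out)
```

Run over every image on its way to the public server, a filter like this leaves the picture intact while the internal catalog copy keeps its notes.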

Another good example: A recently published book called SpyChips discussed privacy concerns about RFID. But one tactic that the authors (including someone finalizing a Harvard doctorate program) used was to go to various vendor sites and—so help me—type in the word "confidential." Sure enough, certain sites then displayed confidential documents.

In doing lots of Web searches, I found various internal memos and documents that were archived for a company's internal review and no one realized they'd be discovered by search engines.

Far too many Web managers have faith in what is known as security by obscurity, which essentially means that a piece of content can't be found unless the user knows the exact URL path or finds a link to the document somewhere.

That approach may be fine for a casual site visitor checking out your electronic storefront, but if someone searches for a word included in that document, you're toast.
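One hedge against that outcome is to search your own public directories before anyone else does. The sketch below is only an illustration: the function name `find_exposed_files` and the list of telltale words are assumptions, and it reads files as plain text, so it will miss words locked inside binary formats.

```python
import os

# Assumed telltale phrases; a real site would tune this list.
SENSITIVE_WORDS = ("confidential", "internal use only")


def find_exposed_files(web_root, words=SENSITIVE_WORDS):
    """List files under web_root whose text contains any sensitive word."""
    hits = []
    for folder, _subdirs, files in os.walk(web_root):
        for name in files:
            full = os.path.join(folder, name)
            try:
                with open(full, "rb") as handle:
                    # Decode loosely and lowercase for a case-insensitive match.
                    text = handle.read().decode("utf-8", errors="ignore").lower()
            except OSError:
                continue  # unreadable file: skip it rather than stop the audit
            if any(word in text for word in words):
                hits.append(os.path.relpath(full, web_root))
    return sorted(hits)
```

Anything this turns up is something a search engine could turn up, too.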

The reflexive response to security by obscurity is to stick a password on the front page of that document. The problem is that many sites do not properly set up their password security.

Done quickly, the password can block someone who tries to click on that document's front door, but if a search engine's spider hits every other possible combination, it might get through to some of the internal pages.

I saw this recently on a site that had a large number of audio files (it was a post-production Web cast) and a mandatory registration page associated with the link to go to that collection of audio links.

That mandatory reg page was comparable to a password in that it was a script that required a specific response before it would allow anyone through.

The problem was that the script guarded only the entry page. The site needed to password protect each and every file behind it, and it didn't.
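A rough sketch of the difference, with every name invented for illustration (the `/webcast/` directory, the `/register` page and both functions are hypothetical): the check runs on every request path under the protected directory, not only on the landing page.

```python
from posixpath import normpath

PROTECTED_PREFIX = "/webcast/"  # hypothetical directory holding the audio files
REGISTRATION_PAGE = "/register"  # hypothetical sign-up page


def requires_login(path: str) -> bool:
    """True for the protected directory itself and for anything inside it."""
    # Normalize first so tricks like /public/../webcast/ do not slip past.
    clean = normpath("/" + path.lstrip("/"))
    return clean == PROTECTED_PREFIX.rstrip("/") or clean.startswith(PROTECTED_PREFIX)


def handle_request(path: str, registered: bool):
    """Return (status, target): send unregistered visitors to the sign-up page."""
    if requires_login(path) and not registered:
        return 302, REGISTRATION_PAGE
    return 200, path
```

With a gate like this, a spider that guesses the address of an individual audio file lands on the registration page just as a visitor at the front door would.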

Another scary piece of Web magic: Google Desktop. That is an amazing piece of software. When I started using it, it wonderfully archived tons of files and images and E-mails. But I was startled during one search when it found someone's phone number that had existed only in a brief IM exchange I had had weeks earlier and then deleted.

Of course, there are the well-publicized Microsoft Office revision changes, which are often retained, even when invisible. Those documents can reveal not only changes, but also who the document had been sent to and lots of other great details that few people realize are being transmitted to clients and suppliers and anyone else on their distribution list.

Google's own CEO, Eric Schmidt, learned that lesson the hard way in March when he used Microsoft Office. Specifically, his accidental reveal came when he posted some PowerPoint slides he had used for an analyst presentation. The slides included speaker notes that revealed a theretofore secret plan to replicate a user's entire hard disk on Google servers.

The more sophisticated and automated Web programs get, the more data they'll need, and that means the more data they might inadvertently share.

It's something to consider before you send your boss an accidental nastygram, your wife a love letter that originally had someone else's name on it or your customers information revealing your markup.

You do that a few times in a day and WarGames' version of global thermonuclear war will start to look appealing.