Do You Turk? Use Mechanical Turk to Jumpstart Semantic Sites

One of the big challenges with social websites is getting them off the ground in the first place. It’s not Kevin Costner’s Field of Dreams: just because you build it doesn’t mean they will come.

And, there’s nothing sadder than a sparsely inhabited social space. It’s like when the conference room is way too big for the number of attendees. You feel alone, even when there’s a crowd.

So, when launching a social space, you must have a strategy in place to get people to generate content quickly. You have to get past The Dip, and get to the other side – where your community is creating that content without you pushing them.

Part of the equation is quality – your biggest advocates need to have a vocal presence. But, part is also quantity – and that’s where Amazon’s Mechanical Turk can help.

Mechanical Turk is a marketplace for small, discrete tasks that are done by human workers. The name is a take-off on The Turk, an 18th-century fake chess-playing machine. The inside of The Turk was tricked out with fancy-looking mechanical parts to make people think that it was an automated machine. But, in reality, an expert chess player was hidden inside the box. So, a human-controlled system passed for the magic of automation.

Amazon’s Mechanical Turk works in much the same way. The buyer creates a Human Intelligence Task – a HIT – that asks people to do a discrete task, like tag photos. This distributed task can be done by thousands of workers at one time, allowing you to get thousands of pictures tagged within just a few minutes. This task is not something that machines can actually do because image processing software is not sophisticated enough to interpret images for semantic meaning.

In a short period of time, you can get thousands of individual confirmations or opinions on a large data set.

The uses for a new social site should be obvious. You can get real humans to create seed content very cheaply and quickly. Then, when your target audience arrives, they’ll see a vibrant community that becomes an attractive place for them to spend time.

And, the cost is very attractive. As the buyer, you set the price for what you’re willing to pay for each HIT. If it’s very simple, you can offer as little as $0.01. If it’s quite complicated (say, asking someone to test a website redesign for user acceptance, and make comments on video through their webcam), you may pay $5.00. HITs are designed to be small, discrete items, which allows you concrete control over whether a task was accomplished, and gives the worker flexibility to work for a few minutes, or a few hours.
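To make the pricing concrete, here is a rough back-of-the-envelope sketch in Python. The function name, the `assignments_per_item` parameter (how many different workers see each item), and the `fee_rate` parameter are my own illustrative assumptions. The marketplace commission is not covered in this post, so it defaults to zero here and you would need to plug in the current rate yourself.

```python
from decimal import Decimal


def estimate_batch_cost(num_items, reward_per_hit, assignments_per_item=1,
                        fee_rate=Decimal("0")):
    """Estimate the total cost in dollars of a batch of HITs.

    fee_rate is a placeholder for any marketplace commission (assumption:
    0 by default, because the article does not state one).
    """
    reward = Decimal(str(reward_per_hit))  # use Decimal to avoid float rounding
    base_cost = reward * num_items * assignments_per_item
    return base_cost * (1 + fee_rate)


if __name__ == "__main__":
    # 30,000 simple items at one cent each, one worker per item
    print(estimate_batch_cost(30_000, "0.01"))
    # Same batch, but three workers per item for cross-checking
    print(estimate_batch_cost(30_000, "0.01", assignments_per_item=3))
```

At one cent per HIT, 30,000 items come to $300 before any fees, and asking three workers to look at each item triples that.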

You can control for quality, too. First, it’s important to note that 75% of Turkers (Turk workers) are from the US, and more than 70% have bachelor’s degrees. Plus, you can create a qualification test that Turkers must pass before they can work on your project. So, you can control quality to a high degree.

Any data-driven application could potentially use Mechanical Turk to enhance its database. If you want to expand your existing data into a new category, Mechanical Turk can help you put some meat on the bones of a category quickly.

As we move further into the semantic web, Mechanical Turk has even more applications. Want real human analysis of the Twittersphere? Push an RSS feed of a Twitter search into Mechanical Turk, then crowdsource sentiment and theme analysis. You can keep up with the commentary in real time, and index tens of thousands of posts for just a few hundred dollars.
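If you crowdsource sentiment this way, you will probably want more than one worker to label each post and then reconcile their answers. The sketch below shows one common approach, a simple majority vote. The post doesn't prescribe an aggregation method, so the function names, the label strings, and the 60% agreement threshold are all illustrative assumptions.

```python
from collections import Counter


def majority_label(labels, min_agreement=0.6):
    """Return the most common label if enough workers agree, else None."""
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    if votes / len(labels) >= min_agreement:
        return label
    return None  # no clear consensus; send the post back for more judgments


def aggregate_judgments(judgments, min_agreement=0.6):
    """Group (post_id, label) pairs by post and take a majority vote on each."""
    by_post = {}
    for post_id, label in judgments:
        by_post.setdefault(post_id, []).append(label)
    return {post_id: majority_label(labels, min_agreement)
            for post_id, labels in by_post.items()}


if __name__ == "__main__":
    # Hypothetical worker answers for two posts
    sample = [
        ("post-1", "positive"), ("post-1", "positive"), ("post-1", "negative"),
        ("post-2", "positive"), ("post-2", "negative"),
    ]
    print(aggregate_judgments(sample))
```

Posts that come back as `None` have no consensus, and you can route them into a second round of HITs rather than guessing.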

To take advantage of this resource, you will need to plan a workflow of discrete tasks, and use the SDK to create an automated system that flows information through in a predictable way. But, what you get is instant scalability. Need a workforce of 10,000 people to knock through 400,000 data points? Done!
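As a rough illustration of what such a workflow might look like, here is a minimal Python sketch that splits a data set into batches and prepares one HIT request per batch. It assumes the boto3 library's MTurk client, which this post does not name, and it points at the requester sandbox rather than the live marketplace. The task URL, title, description, reward, and timing values are all hypothetical placeholders.

```python
from xml.sax.saxutils import escape

# Schema namespace for an ExternalQuestion (a task page you host yourself)
EXTERNAL_QUESTION_XSD = (
    "http://mturkapi.amazonaws.com/AWSMechanicalTurkDataSchemas/"
    "2006-07-14/ExternalQuestion.xsd"
)
SANDBOX_ENDPOINT = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_hit_request(task_url, reward="0.01", max_assignments=3):
    """Build keyword arguments for one create_hit call (values are placeholders)."""
    question = (
        f'<ExternalQuestion xmlns="{EXTERNAL_QUESTION_XSD}">'
        f"<ExternalURL>{escape(task_url)}</ExternalURL>"
        "<FrameHeight>600</FrameHeight>"
        "</ExternalQuestion>"
    )
    return {
        "Title": "Tag a few photos",
        "Description": "Add descriptive tags to each photo shown.",
        "Keywords": "image, tagging",
        "Reward": reward,                    # dollars, passed as a string
        "MaxAssignments": max_assignments,   # distinct workers per HIT
        "LifetimeInSeconds": 24 * 60 * 60,   # how long the HIT stays listed
        "AssignmentDurationInSeconds": 300,  # time allowed per worker
        "Question": question,
    }


def plan_hits(item_ids, batch_size, base_url="https://example.com/tag"):
    """Turn a list of item IDs into one HIT request per batch."""
    requests = []
    for batch in chunked(item_ids, batch_size):
        ids = ",".join(str(item_id) for item_id in batch)
        requests.append(build_hit_request(f"{base_url}?ids={ids}"))
    return requests


def submit_hits(requests):
    """Send the prepared requests to the sandbox; needs boto3 and AWS credentials."""
    import boto3  # imported here so the planning code runs without it

    client = boto3.client("mturk", region_name="us-east-1",
                          endpoint_url=SANDBOX_ENDPOINT)
    return [client.create_hit(**request)["HIT"]["HITId"] for request in requests]


if __name__ == "__main__":
    planned = plan_hits(list(range(25)), batch_size=10)
    print(len(planned), "HITs planned")
```

Keeping the planning step separate from the submission step lets you inspect and cost out a batch before any money is spent.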

More and more social advantage will come from having large data sets and making semantic meaning out of them. Mechanical Turk can help on both fronts.