The expanding Zooniverse

Kevin Schawinski pored over his 50,000th galaxy image of the week. It had been a monumental effort to reach this figure, and he still had another 950,000 to go. “At that point he went and told his supervisor that he wasn’t going to do this anymore, and they needed to think of a better way to do it,” Grant Miller of Zooniverse explained to me. “So Chris, who was actually a fresh new post-doc in Oxford working on cosmology, had this idea that they could build a website and get people to classify the galaxies after a short tutorial on how to do it. And that’s where it all started. Chris is still working in Oxford now with a whole Zooniverse grown out of it, and he has not even done the post-doc he was brought in to work on.”

These are the humble origins of the Zooniverse. A PhD student with too much data and a post-doc with a simple idea: outsource the work to the public. “The idea behind crowdsourcing of data analysis is just using human brains as a very big computer, a very big, fast computer.” Galaxy Zoo launched in 2007 to classify the shapes (morphologies) of the remaining 950,000 galaxies imaged by the Sloan Digital Sky Survey. Within 39 hours of launch, users were classifying as many images in one hour as Kevin had been able to do in a week.


We need to talk about Kevin: 39 hours after the launch of Galaxy Zoo, online users were analysing as much data in one hour as Kevin was able to do in a week. (Image credit: Zooniverse REER talk)

From that one project, the Zooniverse has grown to 23 active projects analysing everything from old ship logs to the Sun’s magnetic activity. “But we’ve grown away from the exploitation, as it were, of the human brain, to saying – unlike a computer – these people are fascinated by the data that they’re analysing. They are scientists, they don’t have a degree in the science, or get paid for doing it, but they are still scientists. We call them citizen scientists. It’s giving science back to the masses, if you want to make it sound like a revolution.”

Crowdsourcing scientific information is nothing new. The Christmas Bird Count is in its 114th year and involves tens of thousands of participants annually. And like the Christmas Bird Count, Zooniverse projects require no expert knowledge. “I see it as a huge shift in the paradigm and I see it as revolutionary as far as science is concerned. But, if you only take it back a couple of hundred years, science was done by amateurs. People were absolutely obsessed with measuring things and with their idea of science, and they were amateurs, they weren’t getting paid for it. So it’s going back to this amateur ‘for the love of it’ science, which was the way for hundreds of years before we developed this idea of ‘professional science’.”

Why is Grant so sure that Zooniverse volunteers are doing it for the love of science? In one study of user behaviour on Snapshot Serengeti, researchers spotted an unusual trend. In this project, users look at photographs taken by remote cameras fitted with motion sensors and record the species of animal that can be seen in each photo. Some photos are triggered by animals moving in front of the sensor, but sometimes the sensor is triggered by grass blowing in the wind. “If you were shown grass at the start then you would stay longer, and if you were shown very complex, fantastic images of zebras and lions and elephants and everything then you would do them and then you would leave.”

One would have thought that stunning photos full of animals would be more rewarding for users than photos of grass. So why do people stay on the project longer when they are shown more images of grass? Grant explains that an image of grass can be classified in seconds with the ‘Nothing Here’ button, whereas an image of three lions hunting 20 zebras with an elephant in the background, although more interesting, takes far longer. Yet both images contribute the same amount to science, and that is what people are interested in.

“If you classify an image with grass in it, that’s equal to one image that has 20 zebras, three lions and an elephant, you’ve contributed the same – this one unit of classification’s worth, and you can do a grass one in a second, you’re getting that feedback quicker, of saying ‘right, you’ve done science, here’s the next one. Do it again, do it again.’”

To test this, once the Andromeda Project had processed enough data, the team put up a notification saying: ‘We’ve processed all the data we need. You can still continue to classify, but your classifications won’t contribute to the research anymore.’ “When we put that message up, it went from there being thousands of classifications per hour to tens, or maybe fewer than that. Everyone just disappeared as soon as that message came up. This says to me they’re not enjoying the task intrinsically. The only reason they were taking part in that was because they were contributing to the science. And once they know that their analysis isn’t contributing to our science anymore, then they leave to go and do something else.”

Some citizen science projects try to make the task itself enjoyable by adding a game element, something Zooniverse has so far avoided. “There are some citizen science projects that gamify, like Foldit and EyeWire, and recently Cancer Research UK released a game called Genes in Space, which was relatively successful. But we tend to think you can build a really good game or you can build a really good science project, and there’s this uncanny valley in between where you try to build both and it just comes out kind of rubbish.”

“We saw that with Genes in Space, a lot of the feedback that we read said that ‘this is a fantastic project’ – obviously because this is for analysing cancer data – ‘this is an amazing project, I can’t believe that I’m contributing to cancer research, but the game’s not great’. People were always really positive about the goal, but not always positive about the game. So what’s the point in having the game there?”

If you’re dealing with amateurs, how do you ensure that the analysis is correct? Zooniverse projects use a consensus model, in which each image is classified independently by many volunteers and their answers are combined, but malicious users can sometimes still find a way through. “We do have a person on the Galaxy Zoo project that seems to be malicious from a scientific point of view. The user doesn’t believe in a certain type of galaxy called a merger, where two galaxies are colliding and forming into one galaxy. They don’t believe that this is a physical process that can exist. So they go through and everything that is marked as a merger they mark as something else, not as a merger. And then they go into the discussion area and comment on all galaxies that are marked as mergers, saying that they’re not. So that’s the top level of maliciousness, because they’re actually trying to convince other users that what they’ve marked is wrong. And even that one of the fundamental laws of understanding how galaxies form and evolve is wrong.”

“You would be surprised how involved some people can get in the projects, and how militant and how controlling and really serious. Which is great for us if it’s a user that is really into the project and wants it to succeed and to help other volunteers and users on it then that’s fantastic for someone to put in that kind of energy and time. But a few times it can go the other way, where they’ll scare people off with being too militant and involved.”

“The point behind Zooniverse projects is that we don’t actually know the answer, so to say that they’re wrong isn’t necessarily correct. People who always disagree tend to be wrong, because the crowd tends to be right.”
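To see why a lone dissenter makes little difference, here is a minimal sketch of a consensus vote. It assumes the simplest possible scheme, a plurality vote over independent classifications of a single image; Zooniverse’s real aggregation may well be more sophisticated (for example, weighting volunteers by their track record), and the function name `consensus` and the labels are hypothetical.

```python
from collections import Counter


def consensus(labels: list[str]) -> tuple[str, float]:
    """Return the most common label and the fraction of volunteers who chose it.

    Hypothetical plurality-vote aggregation for illustration only;
    not Zooniverse's actual pipeline.
    """
    if not labels:
        raise ValueError("need at least one classification")
    counts = Counter(labels)
    # most_common(1) returns a list holding the single (label, count) pair
    # with the highest count.
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


# Nine volunteers mark a galaxy as a merger; one persistent dissenter does not.
votes = ["merger"] * 9 + ["elliptical"]
label, agreement = consensus(votes)
print(label, agreement)  # merger 0.9
```

The dissenting vote lowers the agreement score slightly but cannot change the outcome, which is the sense in which the crowd tends to be right.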

Citizen science projects like these are about levelling the playing field, democratising science and returning to science’s amateur roots. But if projects are created by scientists, built by developers and then worked on by amateurs, is this really a democratic process? True democracy would require input from all angles on what projects are made and what areas are studied, regardless of academic background. This is exactly what Zooniverse are looking to do.

“We were recently given $2 million from Google as an award. What we’re going to do is build a platform so that anyone can build their own citizen science project. We get throughout the course of the year about 50 proposals from different science groups, but we can only build about 5. So 90% of the people who propose to us, we don’t build their project. The way we’re envisioning it is like you would set up a WordPress blog, but instead it’s for setting up a citizen science project. So a group could just come on and they could write the title of their project, they could quickly say what the task is and then they could plug in their data and put it live. So we’ll suddenly go from Zooniverse hosting 20 projects to hosting 200 projects.”

“And you don’t even need to be a scientist. We’re not going to say okay you need to come from a scientific institute to build one, it can be anyone. You just need data. You can go out with your iPhone and take a million pictures of trees and put that up and make your own scientific project. And that’s one of the really cool aspects we’re looking forward to; people who are non-scientists creating their own citizen science projects.”

A version of this article first appeared on Refractive Index (31/07/2014)
Image Credit: NASA (via Flickr).
