In the VISION Cloud project we focus on storage concepts. One area that is highly important for this work is understanding how media companies might use IT systems, storage and data in the future. This will be part of the media use case, which is scheduled to start work in the upcoming weeks.
Understanding the needs of media companies
To prepare, we are already doing research, looking for examples and best practices that help us understand what will really be needed.
From a software development point of view, the most helpful insights come when editors or journalists describe how they work with data, what problems they experience and how they solve them. These are real-world blueprints that might eventually lead to innovative software.
Example: How the BBC special news team works with data
This article from UK journalist and educator Paul Bradshaw, who publishes the "Online Journalism Blog", is the kind of material we are looking for: Bradshaw spoke to Bella Hurrell, the Specials Editor at BBC News Online.
The team at the BBC is responsible for producing "specials", including interactive multimedia presentations, graphics and other visualizations that help readers understand complex issues. Data is often the basis of this work. Some datasets are small, others are very big, and usually they arrive in a disorganized state.
"With data stories that involve thousands of documents we face two challenges. Firstly deciding whether we can provide a platform or tool for people to look at the documents or data. This can be valuable but might involve significant technical resources and may not be worth doing if others are already providing this service. Secondly we need to find the stories and then report them but clearly that can be tricky when there are thousands of documents to examine".
Why media needs a different kind of cloud
Based on our research and talks so far, a pattern is emerging. Data journalism is a new way of using data. Combined with the growing Open Data movement around the world, it creates the need for a new type of storage, simply put, an archive of the future: one that has to incorporate highly different data sets, often without any connection to each other. One day the data might consist of scanned documents, another day it might be geospatial information or billions of tweets.
Early patterns for organizing work with data
What is becoming clearer is this: the ideal organization for data journalism teams is a hand-in-hand process that combines different competencies. This is what can be learned from the work done at pioneering media companies in the data journalism field, such as the Guardian, the New York Times and ProPublica.
Editors, developers and graphic artists work together very closely, mainly because every new data-driven journalism project differs from the one before. And in order to make a real impact, it is important that the team finds a way to pull a real story from the pile of data it might get. Effectively, using data for journalism calls for a fluid, agile, highly visual approach to working with software, storage and data.
This, quite clearly, is neither covered nor intended by even the best of today's storage concepts and business analytics software.
Example: Workflows at the New York Times
It seems that teams in media companies are currently arriving at quite similar patterns of organization and workflow. Take a look at a presentation by Alan McLean of the "New York Times", in which he presents some of the principles of data journalism. The presentation was given at a data-driven journalism event in Amsterdam in 2010.
A cautious assumption about where this might lead
What does this mean for possible media clouds? Basically, enabling data-journalists to ingest, filter, sort and visualize information is one of the possible directions we will be working on.
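To make the four steps a little more concrete, here is a minimal sketch in Python of an ingest, filter, sort and visualize chain. It is only an illustration under our own assumptions: the sample data, the field names (`region`, `spending`) and all function names are hypothetical and are not part of VISION Cloud or of any newsroom tool mentioned here.

```python
import csv
import io

# Hypothetical sample data; a real project might ingest scanned documents,
# geospatial files or tweets instead of a small CSV string.
SAMPLE_CSV = """region,year,spending
North,2009,120
South,2009,340
East,2009,90
West,2009,210
"""


def ingest(raw_text):
    """Parse CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def filter_records(records, min_spending):
    """Keep only records whose spending is at least min_spending."""
    return [r for r in records if int(r["spending"]) >= min_spending]


def sort_records(records):
    """Order records from highest to lowest spending."""
    return sorted(records, key=lambda r: int(r["spending"]), reverse=True)


def visualize(records, width=40):
    """Render a plain-text bar chart, one line per record."""
    if not records:
        return ""
    top = max(int(r["spending"]) for r in records)
    lines = []
    for r in records:
        # Scale each bar relative to the largest value in the result set.
        bar = "#" * max(1, round(int(r["spending"]) / top * width))
        lines.append(f"{r['region']:<6} {bar} {r['spending']}")
    return "\n".join(lines)


def run_pipeline(raw_text, min_spending=100):
    """Chain the four steps and return both the records and the chart."""
    records = sort_records(filter_records(ingest(raw_text), min_spending))
    return records, visualize(records)


if __name__ == "__main__":
    _, chart = run_pipeline(SAMPLE_CSV)
    print(chart)
```

Each step is a small, separate function, so a team could swap in a different source or a different chart for the next project without touching the rest of the chain.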
In short: a cloud platform that supports the workflow needed to turn data into stories. There is a small (but growing) number of companies that are getting it right.
Here is our current "short-list". There are many omissions, but it is meant to point interested readers to examples:
Links:
- Paul Bradshaw: Bella Hurrell on data journalism and the BBC News Specials Team, Online Journalism Blog (Feb 18, 2010)
- Guardian Data Blog and Data Store
- New York Times (Multimedia/Data Coverage)
- DocumentCloud (funded by the Knight Foundation, founded by journalists from ProPublica and the New York Times)
- Signiant (commercial company specializing in media distribution)