Yesterday I attended ‘Data Reveals Stories’: Hoult Session # 2 with mySociety – with the blurb describing it as:
Northern Film & Media host mySociety for our second Hoult Session: a one-day seminar in Newcastle on June 16 about how Britain’s new wealth of open data can be harnessed to reveal hidden stories, tell new ones, inspire practical services and provide a new point of research for film and television.
One of the key reasons for attending is that I had blogged about how Government data could / should be made available, and mySociety is a key player in helping make that happen (and also does some pretty creative things with that data).
The following post is about my impressions, but isn’t a record of the day – I’m doing it more because Ewan asked me to type up the flipchart notes I had taken ;-).
An early quote from Tom Steinberg was that the mission of mySociety is:
How to use the internet most efficiently to improve lives
It was an interesting session, about how the stories around data have been driven by the access to data enabled by the Internet.
It was also mainly a session about how to use public sector data – in that public sector bodies are duty-bound and / or able to make their data available. In the private sector, data is often a source of revenue or competitive advantage, so is only available through third parties (like NGOs), as Francis Irving explained.
Ewan McIntosh has helpfully tagged some notable websites mentioned in his delicious links, but also getting a namecheck were:
- If it was my Home
- Tin Eye – Reverse Image Search
- Also Analyst’s Notebook
After lunch Tom talked about the following clusters:
- Using site search syntax like “site:bbc.co.uk”
- Also “filetype:xls salmonella site:scotland.co.uk” – to search for all Excel spreadsheets containing references to salmonella on the Scottish Government’s website
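The search operators above can also be put together programmatically. What follows is only a minimal sketch in Python and not something from the session itself: the helper names `build_query` and `search_url` are made up for illustration, and the Google search URL is an assumption about where you would send the query.

```python
from urllib.parse import urlencode


def build_query(terms, site=None, filetype=None):
    """Join plain search terms with optional site: and filetype: operators."""
    parts = []
    if filetype:
        parts.append(f"filetype:{filetype}")
    parts.extend(terms)
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Turn a query string into a URL (the base address is an assumption)."""
    return f"{base}?{urlencode({'q': query})}"


# The salmonella example from the notes above.
query = build_query(["salmonella"], site="scotland.co.uk", filetype="xls")
print(query)
print(search_url(query))
```

Keeping the query builder separate from the URL step means the same string can be pasted straight into a search box, or sent to a different search engine that understands the same operators.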
Compiling it yourself
- Crack open a new Excel spreadsheet & start typing!
- Also, Google Docs, Freebase.com, and Surveymonkey
- Apparently ScraperWiki is a great resource for this
- Scenic or not – a website to harness the wisdom of crowds to rate various parts of the UK in terms of how ‘scenic’ it is :-)
Something that struck me is how many of mySociety’s tools are based on games, making it fun for people to participate and contribute.
Afternoon session – my flipchart notes
I took part in the ‘Tools’ break-out session, and (as above) I said to Ewan McIntosh that I’d type the notes:
Flipchart 1 – tools
There were questions around Data:
- Where one could find the datasets that underlie all of the above
- Are there search or query tools to drill into the data sources
- Are there websites which signpost these datasets
- There was mention of Up My Street
- There was also a question mark about how to find out about data sets in particular domains
Flipchart 2 – tools #2
- The discussion around RSS asked how a feed could be re-published so that stakeholders could then comment on what they thought about items in this feed.
- There were also questions around Geocoding or tagging content with metadata
- More questions around where to find crime information / data
- We’d been finding out information about retrieval, but there was a question about how to publish data sets and / or where to find a repository for raw data
Flipchart 3 – ‘Data set compendium’
- Following on from the thoughts captured in the previous charts, the group brought together a mini-requirement of what would be helpful (i.e. this repository for raw data) – which I called the ‘Data set compendium’, for want of a better name! ;$
- One of the group was interested in “What problems are we trying to solve”
- The group was looking for a ‘meta-database’ of public data sources
- Ideally it would have ratings & reviews of each data source, with the ability to identify the source of the data, and allow users to annotate / describe the source
- Another useful field would capture who was using the dataset, and what for?
- If the system could allow datasets to be clustered in terms of similar topics, and inter-relationships between datasets established / shown, that was seen as desirable
- Francis said that something along these lines was available at the Comprehensive Knowledge Archive Network:
CKAN is the Comprehensive Knowledge Archive Network, a registry of open knowledge packages and projects (and a few closed ones).
CKAN makes it easy to find, share and reuse open content and data, especially in ways that are machine automatable.
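To make the group's wish list for the 'Data set compendium' a bit more concrete, here is a rough sketch in Python of what one entry might hold. It is purely illustrative: the class `DatasetEntry`, its field names, the `related` helper and the sample entries are all my own assumptions, and it is not how CKAN or any other real registry stores its records.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class DatasetEntry:
    """One hypothetical record in the compendium."""
    name: str
    source: str                                    # who publishes the data
    topics: set = field(default_factory=set)       # used to cluster datasets
    ratings: list = field(default_factory=list)    # e.g. scores from 1 to 5
    notes: list = field(default_factory=list)      # user annotations / reviews
    used_by: dict = field(default_factory=dict)    # who uses it -> what for

    def average_rating(self):
        """Mean of the ratings, or None if nobody has rated it yet."""
        return mean(self.ratings) if self.ratings else None


def related(entry, catalogue):
    """Names of other datasets that share at least one topic with entry."""
    return [other.name for other in catalogue
            if other is not entry and entry.topics & other.topics]


# Made-up entries, only to show the shape of the data.
schools = DatasetEntry("Example schools list", "Example department",
                       topics={"education", "location"}, ratings=[4, 5])
crime = DatasetEntry("Example crime map", "Example police force",
                     topics={"crime", "location"})
catalogue = [schools, crime]

schools.used_by["Local newspaper"] = "catchment area story"
print(schools.average_rating())
print(related(schools, catalogue))
```

Sharing topics is the simplest way I could think of to show inter-relationships between datasets; a real system would probably want something richer than a flat set of tags.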
Flipchart 4 – Detailed tools
- There was a question about where to find actuarial data [perhaps via the ONS, said Francis]
- Tools for ‘manipulation’ and / or visualisation
- List from Francis:
- He recommended the programming language, Python
- Apparently Many Eyes has an embed feature
- Freebase is useful for cleaning up data
- Excel, Google Docs & MS Access all good
- Yahoo Pipes
- RSS feed into Google maps – e.g. the RSS from Fix my Street & this example I’ve just made for Chester-le-Street w00t!
- EduBase – “The up to date database of educational establishments across England and Wales.”
Flipchart 5 – Upload
Tools / sites for helping process data:
- Elance “provides a simple and cost effective way to hire and manage your team online.”
- Mechanical Turk
- Journalisted – “Journalisted is an independent, not-for-profit website built to make it easier for you, the public, to find out more about journalists and what they write about.”
- Dr Foster – “Dr Foster is the UK’s leading provider of comparative information on health and social care services.”
- Crime maps – I googled this and found CrimeMapper
- Digital Dockyard – “A network for North East individuals and companies to share ideas, find events and discuss the challenges of digital media innovation.”
Thanks to NorthernNet and Northern Film & Media for, respectively, funding support and hosting the event.
Um, that’s about your lot, or else this post will never go live…! ;-)