Hi everyone,

I thought this data project might be of interest to this group, both generally and because I've previously mentioned our 'genderisation' methodology here and asked for data sources to inform it.

This week the BFI launched our new BFI Filmography web application and announced the underlying data project. This has been a five-year project within the BFI's five-year Film Forever strategy (Lottery funded). Some links are below, including an area where you can read a bit more about the project itself and the genderisation work (built on forename-based inference using the ONS baby names datasets, which are broken down by sex).

More detailed pieces will follow there over the next few months, including a more technical description of the genderisation methodology (and hopefully some of the Python too) and of the web application build process.
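Until that write-up appears, here is a minimal sketch of what forename-based inference from birth-registration counts can look like. It is not the BFI's implementation: the sample counts, the (forename, sex, count) row shape and the 0.95 agreement threshold are illustrative assumptions, and the real ONS baby names tables would need reshaping into this form first.

```python
from collections import defaultdict

# Hypothetical sample rows as (forename, sex, count). Real ONS baby names
# data would have to be reshaped into this form; these numbers are invented.
SAMPLE_COUNTS = [
    ("OLIVIA", "F", 5000),
    ("JACK", "M", 5800),
    ("JACK", "F", 3),
    ("ALEX", "M", 400),
    ("ALEX", "F", 300),
]


def build_name_table(rows):
    """Aggregate counts per forename into {name: {"F": n, "M": n}}."""
    table = defaultdict(lambda: {"F": 0, "M": 0})
    for name, sex, count in rows:
        table[name.upper()][sex] += count
    return dict(table)


def infer_gender(forename, table, threshold=0.95):
    """Return 'female', 'male' or 'unknown' for a single forename.

    A gender is assigned only when at least `threshold` of the recorded
    births with that forename share it (0.95 is an assumed cut-off);
    rare or ambiguous names are left as 'unknown'.
    """
    counts = table.get(forename.strip().upper())
    if counts is None:
        return "unknown"
    total = counts["F"] + counts["M"]
    if total == 0:
        return "unknown"
    share_female = counts["F"] / total
    if share_female >= threshold:
        return "female"
    if 1 - share_female >= threshold:
        return "male"
    return "unknown"


table = build_name_table(SAMPLE_COUNTS)
print(infer_gender("Olivia", table), infer_gender("Alex", table))
```

Leaving ambiguous or unseen names as 'unknown' rather than forcing a guess trades coverage for accuracy; where to set the threshold is a judgement call.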

The web application itself:
https://filmography.bfi.org.uk/

The Filmography story on the BFI website:
http://www.bfi.org.uk/news-opinion/news-bfi/announcements/bfi-filmography-complete-story-uk-film

The Filmography FAQs:
http://www.bfi.org.uk/archive-collections/bfi-filmography

We also partnered with NESTA, giving them access to the raw data; their quantitative research fellow in the Creative Economies team undertook in-depth data analysis and visualisation, and published the results on the NESTA website:
http://www.nesta.org.uk/blog/women-film-what-does-data-say

That is, we hope, the first of many collaborative research projects in which we provide access to the raw data and collaborate on research questions, formally or informally. There was hope of publishing it as open data under a CC BY-SA licence, but I didn't make that happen. Instead, external access to the data will be via organised projects.

At the risk of blowing our own trumpet, there was quite a bit of media coverage for a data project. Much of it focused on the main agenda of our launch strategy, gender parity in British feature film, but some picked up on the trivia we added to the press release to engage a different audience:
https://news.google.com/news/story?ncl=dnTYGdbbIQipvIMGYmSVDbZN6yKKM&q=bfi+filmography&lr=English&hl=en&sa=X&ved=0ahUKEwjv67C3ubrWAhXBZlAKHXbwA4kQqgIIKjAA&dogfood=no

For those interested in the stack for the web application: as I said, we'll be publishing an article at some stage, but for now, it was built for us by Magnetic North in Manchester. It's a React app using the ubiquitous D3.js for the dataviz rendering and MongoDB for data storage and querying. Each set of filter parameters is shareable via a persistent URL, and the application offers social media sharing and iframe embedding of every dataviz. All the film and person data is created and stored in our Adlib CMS, and imported into the application's MongoDB store overnight, three times a week, via our Adlib REST API. It's the same data that powers BFI Collections Search, BFI Player, the new BFI Mediatheque at BFI Southbank, and the soon-to-launch crowdsourcing platform built for us by SciFabric atop Pybossa. All of those use the Adlib API to retrieve and cache the data on a schedule.
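To illustrate the idea of a persistent, shareable URL per filter set, here is a small sketch in Python. The app itself is React, so this is only a language-agnostic illustration, and the query parameter names (`decade`, `genre`) are hypothetical rather than the Filmography's actual URL scheme. The key choice is canonicalisation: sorting keys and values means the same filters always produce the same URL, however they were selected.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Real site root; the query parameters used with it here are hypothetical.
BASE_URL = "https://filmography.bfi.org.uk/"


def filters_to_url(filters, base=BASE_URL):
    """Serialise {param: [values]} into a canonical, shareable URL.

    Keys and values are sorted and empty filters dropped, so an identical
    filter set always yields an identical URL.
    """
    items = [
        (key, sorted(str(v) for v in values))
        for key, values in sorted(filters.items())
        if values
    ]
    if not items:
        return base
    # doseq=True expands each list into repeated key=value pairs.
    return base + "?" + urlencode(items, doseq=True)


def url_to_filters(url):
    """Parse a shared URL back into the filter dict it encodes."""
    query = urlsplit(url).query
    return {key: sorted(values) for key, values in parse_qs(query).items()}


url = filters_to_url({"genre": ["Horror", "Comedy"], "decade": ["1960s"]})
print(url)
print(url_to_filters(url))
```

Because the URL fully encodes the view, it can be bookmarked, shared on social media or used as an iframe `src` without any server-side session state.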

C.O.P.E. innit*

All the best,
Stephen
Head of Data, British Film Institute

*Create Once Publish Everywhere: http://collectionstrust.org.uk/resource/create-once-publish-everywhere-cope/