I recently read a book on the adoption of open data policies in cities around the world. It’s called Beyond Transparency (publicly available at this link) and it consists of case studies showing the successes and obstacles of open data policies in the early 2010s. As the title hints, providing free and accessible datasets is a move towards a more transparent government, but that is not the whole story. The book talks about how this data spurred innovation, improved government efficiency and encouraged new civic habits, such as more citizen participation. Data professionals, including engineers and data scientists, also built their own solutions around these new datasets, creating better models and apps. For some examples of these new civic tech ecosystems, check out the Building a Smarter Chicago chapter and the analytics work that arose from websites like Data SF or Chicago Data Portal! Another personal favorite chapter is The Data-Driven City, on how collecting 311 calls allows NYC to model emergency services and resource allocation.

Since the 2010s, many countries have passed their own open data laws. The Global Data Barometer (license: Creative Commons Attribution 4.0) measures the state of open data in 109 of them, answering “To what extent are countries managing data for the public good?” The study uses a mix of quantitative indicators and qualitative descriptions to provide a clear picture of how the world is doing in terms of open data. This is an incredibly thorough piece of work, so I think it’s worth spending some time exploring part of it in this story.

Quick note: for the story, I’m using Observable notebooks (JS-based) to do the visualizations and Jupyter for the data wrangling; links to both at the very end.

Let’s get started!

Index Overview: how do countries rank?

The overall countries’ index ranges from 0 (non-existent) to 100 (exhibiting best practices).

Open Data Overall Index (image by author)

The chart above shows quite a bit of spread. The highest scoring country (70) is the United States and the lowest scoring (10) is Turkmenistan, while the mean score is 38.51. This index is generated by examining each country’s open data practices across 4 fronts or “pillars”: governance, capabilities, availability, and use and impact. For each of these pillars, the countries provide information on existence of a particular element (e.g. a data protection framework), elements (quality-related features and open data features), and extent (the limitations and applicability of a specific framework across a country). The study also tracks secondary indicators for each pillar, each of which was scored out of 100. Therefore, an overall index score of 100 would represent a type of “normative ideal” across all these primary and secondary indicators.

Countries with similar scores still have huge variance across indicators

Let’s focus on the countries whose index is around average (35 to 45). There are 20 of them, including Albania and Kosovo, as well as countries from all over the world: Jamaica, Kazakhstan, Paraguay, Philippines, Peru, Thailand and South Africa, among others.
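For readers who want to reproduce this kind of selection, here is a minimal sketch of the filtering step in plain Python. It is not the code from my notebooks: the function name countries_in_band is made up for illustration, and apart from the United States (70) and Turkmenistan (10), the scores below are placeholders rather than Barometer values.

```python
from statistics import mean

# Placeholder scores: only the United States (70) and Turkmenistan (10)
# come from the article; "Country A/B/C" are made-up illustrations.
overall_index = {
    "United States": 70,
    "Turkmenistan": 10,
    "Country A": 44,
    "Country B": 36,
    "Country C": 52,
}


def countries_in_band(scores, low, high):
    """Return the countries whose score lies in [low, high], sorted by name."""
    return sorted(name for name, score in scores.items() if low <= score <= high)


# Countries "around average", using the same 35 to 45 band as the text.
around_average = countries_in_band(overall_index, 35, 45)
print(around_average)

# Mean of the placeholder scores (not the real 38.51 mean of the index).
print(round(mean(overall_index.values()), 2))
```

Both ends of the band are treated as inclusive here, which is an assumption on my part about how “35 to 45” should be read.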

Despite their similar overall index, these countries’ mean scores across modules are not nearly as consistent. The Barometer compiles thematic or module scores in addition to pillars, and the 7 modules look into open data practices in the following areas: Health and COVID-19, Land, Public Finance, Procurement, Climate Action, Political Integrity and Company Information. Here’s a look at the Climate Action, Political Integrity and Procurement modules for these 20 countries:

Climate Action (image by author)
Political Integrity (image by author)
Procurement (image by author)

Many countries have inconsistencies:

  • Jamaica has the highest Climate Action score but sits at the lower end in Procurement. A deeper look into climate indicators reveals that Jamaica provides environmental data through its Statistical Institute, including metrics about “rainfall, sunshine, pollution incidents, greenhouse gas, protected forest area, sea level” as well as having “no evidence of data gaps”. In Procurement, however, there’s no public information at the planning stage and the data is low quality: it doesn’t contain names/identifiers for companies awarded contracts or information on “spending against the contract”.
  • In Albania, detailed procurement information is available online (link here), but contract implementations are not covered and the data is only partially machine-readable. Political financing data is also available, but it doesn’t contain income data for parties and candidates or historical tracking information.
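One simple way to put a number on this kind of inconsistency is the gap between a country’s best and worst module score. The sketch below is only an illustration of that idea, not a metric used by the Barometer: the helper names are hypothetical and the scores are placeholders for made-up countries.

```python
# Placeholder module scores (0-100) for hypothetical countries;
# these are NOT values from the Global Data Barometer.
module_scores = {
    "Country A": {"Climate Action": 60, "Political Integrity": 30, "Procurement": 15},
    "Country B": {"Climate Action": 35, "Political Integrity": 40, "Procurement": 38},
}


def module_spread(scores_by_module):
    """Gap between a country's best and worst module score."""
    values = list(scores_by_module.values())
    return max(values) - min(values)


def rank_by_spread(all_scores):
    """Countries ordered from most to least uneven across modules."""
    spreads = {country: module_spread(mods) for country, mods in all_scores.items()}
    return sorted(spreads.items(), key=lambda item: item[1], reverse=True)


for country, spread in rank_by_spread(module_scores):
    print(f"{country}: spread of {spread} points")
```

A large spread flags a country whose overall index hides a strong module next to a weak one, which is the pattern described for Jamaica above.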

Making progress on all of these modules leads to a more transparent and efficient government. For instance, having better procurement data allows citizens to analyze where and how money gets spent in all stages of a project or push for more equitable allocations. Similarly, having transparent lobbying data and asset declarations allows for more accountability for political integrity. Climate information allows the public to stay informed on aspects such as biodiversity, emissions or vulnerabilities.

Governance frameworks exist but are largely fragmented

Governance is one of the major pillars of the dataset. It assesses the state of the policies and frameworks guiding how data is protected and managed. The research on governance tracks several indicators: “open data policy” is one of them, and “data protection”, “data sharing frameworks” and “data management” are some of the others. Here’s how countries score on these governance indicators:

Governance Indicators across 109 countries (image by author)

A lot of countries are performing moderately well on metrics like Data Protection, Open Data Policy or Data Management. Around 92% of the surveyed policies have a common definition of open data, 72% of countries have some type of data management framework in place, and 90% of them offer data protection regulations (report here). Many of the countries offer partial or complete regulations on issues of data consent, rights of redress and access or correction.

While forms of data governance frameworks exist, the Global Data Barometer shows that countries still have severe limitations. For instance, only 24% of frameworks address issues of location information and only 31% address algorithmic decision making (also in the report). Most of the countries that do are in Europe and North America: these two regions account for 17 of the 23 countries answering “Yes” to “Frameworks explicitly cover the protection of location-related data”, and 20 of the 31 whose “framework addresses algorithmic decision making”.

The last two indicators, Accessibility and Language coverage, evaluate the regulations for ensuring the data is accessible to people with disabilities and available in each country’s official language(s). The latter is particularly important for countries with many such languages, yet coverage here is also patchy: only 13 of the 109 countries achieve a score of 100 in this category (possessing a framework with the force of law).

Handling the COVID-19 Response was a challenge but also an opportunity for data governance

The COVID-19 pandemic tested many of these data systems, especially those at the local level. The study measures data capacities not only on the availability of vaccination data, but also on real-time healthcare data (e.g. ICU beds) and vital statistics. Vital statistics include birth and mortality information, historical spans and how locally available this data is within a country. Here’s a heat map of how countries are doing:

Health and COVID-19 Module Scores (image by author)

Vaccination data was available in most of the countries in the dataset, though not without issues. Only about 50% of available datasets were broken down by age, and around 33% were disaggregated by sex (report statistics). Furthermore, real-time healthcare data was only available in around 50% of countries, and even fewer published information on the number of available beds. For some of these countries, this type of data was made available for the first time during the pandemic, which gives them a foundation to build on for better healthcare reporting in the future.

And how easy was it for users to explore the data? A deeper look reveals that 61 out of the 109 countries didn’t offer official open tools that allowed citizens to access Vital Statistics data. Similarly, 63 out of the 109 didn’t provide official and accessible COVID-19 vaccination data. 57 countries also did not offer machine-readable data (such as CSV), which is important for easy distribution and reproduction.
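To make these counts easier to compare, they can be turned into shares of the 109 surveyed countries. The snippet below is a small sketch of that arithmetic; the helper name share and the dictionary labels are my own shorthand, and the counts are the ones quoted in the paragraph above.

```python
TOTAL_COUNTRIES = 109  # countries covered by the study, per the article

# Counts of countries lacking each item, as quoted in the text.
lacking = {
    "official open tools for vital statistics": 61,
    "official and accessible vaccination data": 63,
    "machine-readable data": 57,
}


def share(count, total=TOTAL_COUNTRIES):
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)


for item, count in lacking.items():
    print(f"{share(count)}% of countries lacked {item}")
```

In other words, roughly 56%, 58% and 52% of the surveyed countries fell short on these three points, so in each case more than half of them did.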

What’s Next For the Future of Open Data? (and resources)

A major takeaway of the report is how underdeveloped open data environments remain at the global level (recall the average overall index: 38/100). As we saw in this story, having fully formed laws around open data is essential. In addition to the research being done by the Global Data Barometer, there are other repositories that track new legislation around open data issues around the world. The State of Open Data by The Gov Lab is one of them, so take a look at it for laws by sector or type of collaboration!

Broadly, though, the Global Data Barometer report illuminated the practical challenges in adopting these laws, including data gaps and inaccessible or unavailable data. One striking example we looked at was the publication and management of health data, especially in emergency situations such as COVID-19 where having timely information is crucial. This example, however, also showed how new challenges can spur the publication of data that promotes transparency and keeps citizens informed, which is very promising! Overall, the report points to very specific areas for each country to focus on, offering context-specific strategies for better data collection while still providing a big-picture view of the current challenges with open data.

Here are the notebooks (Jupyter and Observable).

Thanks for reading!

