When Bad Data Happens to Good Hackers

Open government data (OGD) is the main platform for emerging civic technology applications developed to facilitate civic improvement. Josh Tauberer, one of the founders of civic tech company PopVox, wrote that OGD “is a sort of civic capital, a raw material that can be transformed like a diamond in the rough into something far different and much more powerful.” Civic Commons provides an entire “marketplace” of applications built on OGD and this data is absolutely vital to projects like the Civic Data Challenge, which seeks to turn “the raw data of ‘civic health’ into beautiful, useful applications and visualizations, enabling communities to be better understood and made to thrive.” 

Applications built on OGD are perhaps the most exciting and potentially transformative elements of the Gov 2.0 movement, allowing government to directly utilize untapped citizen resources. But since these applications are built on top of government datasets, the quality of those datasets is vital to current and future generations of civic applications. Good data ensures the continued health of the entire Gov 2.0 movement.

The Current State of US OGD Quality

Image by opensourceway on Flickr.

Josh Tauberer, in his book, defined data quality as “whether the data has an acceptable level of precision and accuracy for a particular purpose within an acceptable cost.” Precision is “the depth of knowledge encoded by the data” and accuracy is “the likelihood that the information extracted from the data is correct.”

Tom Lee, the Director of Sunlight Labs, said in a Skype interview that data quality varies greatly across government agencies. He pointed to Sunlight’s own Clearspending project, which tracks the accuracy of the federal website USASpending.gov and has found its figures to be more than a trillion dollars off from Sunlight’s calculations. Clearspending measures data “usefulness” by “consistency, completeness, and timeliness.” While timeliness has improved over years past, consistency and completeness are still lacking.

While there have been a number of amazing civic apps built from OGD, not everyone recognizes the utility of publishing government data, and there are several large challenges to greater quality and availability. One problem is government’s mindset about sharing “their” data. Information is power, and many government departments are reluctant to relinquish that power by releasing datasets locked up under their control. This fear of letting data out is echoed by a fear of letting citizens in by allowing them to access data on government performance. It is almost as if some feel that government transparency amounts to a sort of audit meant to catch government in the act, when even those who actually audit government departments contend that analyzing government performance is about improving performance, not about punishment.

Transparency Camp wall photo by Laurenellen McCann on Flickr.

In a Transparency Camp session, Javaun Moradi and Alex Howard pointed out that governments may not release data when doing so might have negative political implications for them, such as Japan’s reluctance to release radiation data during the Fukushima disaster. In that example, the government did not want to release the data without knowing whether it would make them vulnerable to criticism, so they simply held onto it, putting at risk the lives of citizens trying to evacuate in the right direction.

Another obstacle is a lack of resources in creating datasets and publishing them in usable electronic formats. In his paper examining transit data openness, Roland Cole points out “the agency itself may not have the in-house capacity in terms of technological expertise and assistance to maintain their data in open formats, and may not have the budget to outsource for such services.” Cole adds that agencies may also be reluctant to provide their data for free when they are “receiving revenue from providing its data on a less than open basis.”

There is a risk that a focus on accuracy can erode timeliness. In many cases, departments have multiple layers of quality controls that are, in theory, designed to make data as accurate as possible but, in practice, can slow down the publication process so much that the data becomes unusable. As the Fukushima disaster cited by Moradi and Howard demonstrates, timeliness, as much as transparency itself, is often absolutely vital to the usefulness of data.

The Outlook for OGD

Photo from Transparency Camp by stereogab on Flickr.

Despite serious challenges, the road ahead for OGD looks positive. Lee said government data is improving as a whole, as an increasing number of people entering government realize that releasing data can save time and money for their department. According to Lee, agencies mandated to gather and publish data, like the Congressional Budget Office, the Census Bureau, and the Bureau of Labor Statistics, are very good about ensuring that they release accurate data and work with developers to educate and update their practices. Agencies for which releasing data is not part of their chief mission sometimes fail to put in the same dedicated effort to ensure that their data is accurate and accessible.

In a Skype interview, Code for America Fellow Jim Craner said that data quality is improving rapidly and has “moved light years” beyond where it was in years past. He points to Data.gov, a federal data portal, as a key reason for OGD improvement. When the federal government took “a strong lead” on OGD, it produced a trickle-down effect to local and state governments, according to Craner. Josh Tauberer echoed these sentiments in an email and added that:

“The cause of these improvements is a cultural change that comes with leadership — especially the federal Open Government Directive — and also to an extent with personnel change. For instance, Chicago has an amazing new-ish CTO, John Tolva, who sees open gov data as a part of the core mission of the city to spur innovation and to foster trust in government. Another factor is simply the maturation of technology and processes for sharing data. And the community in the private and nonprofit sectors are getting a better understanding of how government works so that we can work as more effective partners with governments on promoting open gov data.”

Transparency Camp #VIPHack photo by Laurenellen McCann on Flickr.

Hope for continued change can be seen in NYC’s new open data law, which Civic Commons co-founder Phil Ashlock named the “best open data law in the world” at a Transparency Camp session. Ashlock mentioned that one reason New York was able to pass such an effective law was that it was carried forth by the New York City Transparency Working Group, a strong coalition of well-established civil society organizations and newer civic technology groups. He explained that connecting the civic tech groups with well-established CSOs provided the political connections and knowledge vital to navigating the legislative process. Perhaps this hybrid coalition could be a model for crafting open data legislation in other governments.

I asked Lee if he had a message for those interested in the outlook for OGD, and he responded that, at Sunlight, “we believe that transparency makes government better” across the board and benefits citizens. “It is true that there are limitations out there, but there is nothing better than diving in and showing people that this is useful.”
