I’ve been thinking a lot lately about data being collected about cities through remote sensor networks.
It strikes me that this is a very relevant issue for those in the open data movement, as the data generated by urban sensor networks is likely to be mashed up with publicly available data from cities on crime, land use, service requests and a host of other things to drive better decision making. There’s a natural connection between the kinds of data we find in open data portals and the kind of data that is generated by emerging sensor networks.
It also strikes me that most municipal open data portals are not well suited to provide access to realtime data – the kinds of data that sensor networks are really good at generating.
Pretty much every modern open data portal provides a way to programmatically access data that is housed in it – data is accessed via an API by making an HTTP request (with the required information – e.g., authentication – in the request) and getting a response back (typically in either JSON or XML format). This data access paradigm fits well with the way that most of the data in municipal open data portals is updated – usually not more frequently than daily.
If data updates happen frequently – or if a data consumer wants to check whether data has changed since the last time it was accessed – a consumer application can poll the API for changes at set intervals. This approach works acceptably well for data that doesn’t change all that often, but it is far from acceptable for data that does (or could) change more frequently. In fact, the closer updates get to realtime, the less optimal this approach becomes, because it places a heavier burden both on consumers (who must poll the API for changes more frequently) and on the data portal itself (which must handle and respond to more frequent requests from API consumers).
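To make the cost of polling concrete, here is a rough sketch of what a polling consumer does. Everything in it is hypothetical: `poll_for_changes` and the `fetch` callable stand in for a real HTTP call to a portal API, and the canned responses are made up for illustration.

```python
import hashlib
import json
import time
from typing import Callable, Iterator


def poll_for_changes(fetch: Callable[[], dict], polls: int,
                     interval_seconds: float = 0.0) -> Iterator[dict]:
    """Call fetch() `polls` times; yield a payload only when it differs
    from the previous one. `fetch` is a stand-in for an HTTP GET."""
    last_digest = None
    for _ in range(polls):
        payload = fetch()
        # Fingerprint the response so unchanged data can be detected.
        digest = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if digest != last_digest:
            last_digest = digest
            yield payload
        if interval_seconds > 0:
            time.sleep(interval_seconds)


# Made-up data: five requests, but the value changes only once.
canned = iter([{"open_requests": 4}] * 3 + [{"open_requests": 5}] * 2)
changes = list(poll_for_changes(lambda: next(canned), polls=5))
print(len(changes), "changes seen in 5 requests")  # 2 changes seen in 5 requests
```

Three of the five requests in this toy run return nothing new, and both sides still paid for them. Shortening the interval to get closer to realtime only makes that ratio worse.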
Other – more efficient – approaches to accessing data can be used when data updates occur more frequently. These approaches – like server-sent events and Websockets (both of which grew out of the HTML5 standardization effort), or registering a callback URL (or Webhook) – benefit both the data consumer and the data producer, because data is pushed to the consumer when it changes rather than requested over and over.
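As one illustration of the push model, here is a minimal sketch of how a client might read the server-sent events wire format (the `text/event-stream` lines a server writes over one long-lived HTTP response). The `parse_sse` function is my own simplified helper, not a library API; it handles only the `event` and `data` fields and comment lines, and the sample stream is invented.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Turn text/event-stream lines into {"event", "data"} dicts.
    Simplified: the `id` and `retry` fields are ignored."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event and dispatches it.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)


# Invented sample stream; a real one would arrive over an open HTTP response.
stream = [
    ": keep-alive\n",
    "event: plow_position\n",
    'data: {"plow": 7}\n',
    "\n",
    "data: line one\n",
    "data: line two\n",
    "\n",
]
events = list(parse_sse(stream))
```

The consumer makes a single request and then just reads events as the server emits them, which is the efficiency gain over polling.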
The closest thing I can identify to a realtime open data API is one that we built in the City of Philadelphia for flight information from the Philadelphia International Airport. This API uses data from the airport flight information system and is updated every three minutes (about the same frequency as data is updated on the Airport’s website and on flight information displays in the airport terminals). It provides a simple REST API for making standard HTTP calls for data on specific flights, and was also designed with a Websocket endpoint to allow realtime connections.
Another interesting realtime data project from Chicago is ClearStreets (a project of Open City, which has built a number of powerful civic apps for the City of Chicago) that shows the realtime position of plows as they clear the streets after heavy snow.
Even more exciting is the OpenSensors project, a platform that supports data aggregation from remote sensor networks. The project hosts open data projects at no cost and allows anyone to subscribe to data feeds from these open sensor network projects.
I think these examples show how municipal open data portals can move in the direction of supporting realtime data, and – perhaps more importantly – how governments can begin to understand the coming importance of providing ways for data consumers to use realtime methods for accessing data.
It can be tempting to think of the need for realtime data as being closely coupled with the use of sensors. But even in places where sensor networks are not yet built out (or even planned), there are lots of opportunities for open data to become closer to realtime.
Crime incidents, parking citations, 311 service requests, road closures, permit and license issuance – these are all activities that occur every hour of every day as a part of municipal operations. And yet the data that is generated by these activities is still largely consumed through open data portals in a fashion that best fits data which is updated only periodically.
Wouldn’t it be useful if data consumers could subscribe to a specific topic or channel (like Service Requests or Building Permits) for a specific neighborhood, register a callback URL and then receive a push of JSON representing the specific event when it occurred? No more wasteful polling for changes that consume resources on both the client and data portal side – just send me information on an event I care about when it occurs.
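A subscription model like that could be sketched roughly as follows. This is an assumption-laden toy, not a description of any existing portal: `WebhookHub`, its methods, the payload shape, the neighborhood name and the callback URL are all hypothetical, and the `send` function is injected so the example runs without a network (a real hub would make an HTTP POST to each callback URL).

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class WebhookHub:
    """Hypothetical registry mapping (topic, neighborhood) to callback URLs."""
    send: Callable[[str, str], None]  # (callback_url, json_body); stands in for an HTTP POST
    _subs: Dict[Tuple[str, str], List[str]] = field(default_factory=dict)

    def subscribe(self, topic: str, neighborhood: str, callback_url: str) -> None:
        self._subs.setdefault((topic, neighborhood), []).append(callback_url)

    def publish(self, topic: str, neighborhood: str, event: dict) -> int:
        """Push one event to every matching subscriber; return how many were notified."""
        body = json.dumps(
            {"topic": topic, "neighborhood": neighborhood, "event": event}
        )
        urls = self._subs.get((topic, neighborhood), [])
        for url in urls:
            self.send(url, body)
        return len(urls)


# Record deliveries in a list instead of making real HTTP calls.
deliveries: List[Tuple[str, str]] = []
hub = WebhookHub(send=lambda url, body: deliveries.append((url, body)))
hub.subscribe("Service Requests", "Fishtown", "https://example.org/hooks/311")
notified = hub.publish("Service Requests", "Fishtown", {"id": 1, "type": "pothole"})
ignored = hub.publish("Building Permits", "Fishtown", {"id": 2})
```

Only the subscriber who asked about Service Requests in that neighborhood hears about the pothole; the permit event goes nowhere because nobody registered for it. No request is ever made for data that hasn't changed.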
In some instances, the barriers to making more realtime data available from governments are related to technology – some legacy systems may not make it practical to expose data in this way. But as cities start producing more and more data – particularly as remote sensor networks become more common – the demand for more appropriate ways to consume that data will increase.
Will municipal open data portals be able to keep up with this demand? We’ll see.