Transnational methodology

Introduction

Scope and targets

In the CitiEnGov project, Participant Partners have been asked to describe energy-related data about buildings, transport and public lighting.

The medium-term goal is to make these data available (as whole datasets or subsets) based on a harmonized “energy data model”, together with ICT services for sharing energy-related data. This document describes a transnational methodology based, on the one hand, on the evaluation of tools implemented by CitiEnGov partners and, on the other hand, on standards and technologies already available at European scale for sharing interoperable energy-related data.

Due to its technical nature, the text presented here is mainly addressed to ICT and geo-ICT experts with sufficient skills in:

  • Data and database modelling, data extraction/transformation/load
  • Web services for presenting and sharing data
  • Standards for interoperability, in particular related to geographic information

This text is also available as a PDF.


The need for harmonized energy-related data

A major challenge in climate change mitigation is timely access to robust energy-related data that can underpin public sustainable energy policies as well as private investments for reducing the energy consumption of buildings and transport and improving their efficiency.

Cities are the places where energy is produced and consumed, so it makes sense to focus attention on them, as they offer great potential for reducing energy consumption and increasing efficiency. As a direct consequence, comprehensive knowledge of the demand and supply of energy resources, including their spatial distribution within urban areas, is of utmost importance.

Precise and integrated knowledge about urban space, energy infrastructures, buildings’ functional and semantic characteristics, and their mutual dependencies and interrelations plays a relevant role in advanced simulation and analysis.

As reported by the Joint Research Centre of the European Commission in Location data for buildings related energy efficiency policies, “to implement and monitor energy efficiency policies effectively, local authorities and Member States are required to report on baseline scenarios (e.g. the Baseline Emissions Inventories in the Covenant of Mayors initiative) and on progress made at regular intervals (Annual Reports for the Energy Efficiency Directive and the Energy Performance of Buildings Directive and Monitoring Emissions Inventories every two years for the CoM)”.

Indeed, reporting tools are already available to local authorities and Member States, but they are very basic and only allow users to input aggregated and approximated values (for example, local authorities may rely on national data when local data are not available) for planning and monitoring progress towards targets. Therefore, a common framework for monitoring energy efficiency policies, with data harmonised from building level through district level up to national level, could improve the interoperability of the different directives / initiatives.

Scaling and relation between EU Directives and location (source: EC JRC, 2015, Location data for buildings related energy efficiency policies)

Within such a framework, geo-referencing all the relevant building data accurately and consistently will significantly improve data quality and reliability, enable effective scenario modelling to fill gaps in data, and support the overall policy process. Furthermore, from a potential market perspective, web-based tools providing access to the energy performance of geo-referenced buildings could improve territorial knowledge, and support, for example, the activities of energy service companies and companies involved in construction / renovation of buildings.

The CitiEnGov harmonized data model

In CitiEnGov the three main “sectors” considered are:

  • Buildings
  • Mobility
  • Public lighting

Actually, as described in the Covenant of Mayors’ website, “action plans (SEAPs or SECAPs) should include actions that cover the sectors of activity from both public and private actors, covering the whole geographical area of the local authority committed” (open reference).

Signatories are free to choose their main areas of action. In principle, it is anticipated that most action plans will cover the sectors that are taken into account within the emission inventory and risk and vulnerability assessment (for SECAP only). For the mitigation part (both SEAP and SECAP), it is recommended to include actions targeting the Covenant key sectors:

  • Municipal buildings, equipment/facilities
  • Tertiary (non municipal) buildings, equipment/facilities
  • Residential buildings
  • Transport
  • Industry
  • Local electricity production
  • Local heat/cold production
  • Others (e.g. Agriculture, Forestry, Fisheries)

For the adaptation part (SECAP only), the identification of the sectors to increase the resilience in a city is highly contextual; some of the main sectors that can improve the resilience of cities include:

  • Infrastructure
  • Public Services
  • Land Use Planning
  • Environment & Biodiversity
  • Agriculture & Forestry
  • Economy

Transnationality and the need for using INSPIRE

The idea presented here is to build up the “transnational template” starting from initiatives already defined at European level by the data specifications related to the INSPIRE Directive.

The conceptual model starts from the Data Specifications defined by the INSPIRE Directive as baseline, and considers all requirements and characteristics of energy data that partners provided.

Even though the implementation of INSPIRE data models is neither the focus nor the goal of CitiEnGov, they will be used as a starting point and as a common approach to reach a common view and common semantics about energy data.

Therefore, the objective of this activity will be twofold:

  • a common conceptual data model, to be considered as a possible target schema for exporting and sharing data outside the local context and outside the organization;
  • a reference implementation, as SQL-based relational database (possibly for Oracle and PostGIS platforms)

It is noteworthy that the final goal is not to force CitiEnGov partners to change the way they use energy-related data internally, but to help them to generate a neutral and standardized semantics.

The importance of sharing the same semantics about energy-related data can be clarified with the following example: in March 2017, during a CitiEnGov videoconference (SIPRO, GOLEA, DEDAGROUP PUBLIC SERVICES), a practical requirement coming from Slovenian regions was discussed, where data about energy consumption are usually shared between utilities (data providers/custodians) and Public Authorities. Data about consumption are:

  • temporally aggregated on annual basis
  • divided by fuel (e.g. gas, electricity, district heating, … )
  • divided by “building” categories
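As a minimal sketch of the kind of aggregation described above, the following Python snippet sums consumption records per year, fuel and building category. The record layout, field order and values are hypothetical and do not come from any partner dataset.

```python
from collections import defaultdict

# Hypothetical meter readings: (ISO date, fuel, building category, kWh).
readings = [
    ("2016-01-31", "gas", "residential", 1200.0),
    ("2016-02-29", "gas", "residential", 1100.0),
    ("2016-01-31", "electricity", "offices", 800.0),
    ("2017-01-31", "gas", "residential", 1150.0),
]


def aggregate_annual(rows):
    """Sum kWh per (year, fuel, building category)."""
    totals = defaultdict(float)
    for date, fuel, category, kwh in rows:
        year = int(date[:4])  # the year is the first four characters of an ISO date
        totals[(year, fuel, category)] += kwh
    return dict(totals)


annual = aggregate_annual(readings)
for key in sorted(annual):
    print(key, annual[key])
```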

In the case of building “categories” GOLEA mentioned that they usually get these data divided in terms of “uses of buildings”:

  • residential
  • industrial
  • offices
  • commerce

Indeed, even though these categories are quite similar in different countries, often they do not have the same meaning. That’s why we need to look at INSPIRE in terms of semantics (and not merely in terms of Directive’s principles, data requirements or technical specifications); semantics practically means that we already have some basic concepts like buildings’ typologies, or (better) “uses of buildings” as already defined by INSPIRE: http://inspire.ec.europa.eu/codelist/CurrentUseValue

The codelist above contains what INSPIRE conceives when we think of “uses of buildings”. This codelist is:

  • not closed, but can be extended, as in this example
  • available in different EU languages … therefore users can switch from English to German or Slovenian or Polish and get the clear definition of each value, in national languages, as in this example
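To make the idea of shared semantics more concrete, the sketch below maps locally used category labels to URIs under the CurrentUseValue codelist. The mapping table is an illustrative assumption, and the codelist values on the right-hand side should be checked against the INSPIRE registry before use; only the base URI is taken from the text.

```python
# Base URI of the INSPIRE codelist cited in the text.
CODELIST = "http://inspire.ec.europa.eu/codelist/CurrentUseValue"

# Hypothetical mapping from local labels to codelist values.
# The right-hand values are assumed to exist in the registry; verify before use.
LOCAL_TO_INSPIRE = {
    "residential": "residential",
    "industrial": "industrial",
    "offices": "office",
    "commerce": "trade",
}


def to_inspire_uri(local_label):
    """Return the codelist URI for a local label, or None when it is unmapped."""
    value = LOCAL_TO_INSPIRE.get(local_label.strip().lower())
    return f"{CODELIST}/{value}" if value else None


print(to_inspire_uri("Offices"))
```

Returning None for unmapped labels keeps the gaps visible, so that each partner can decide whether to extend the codelist or to refine the mapping.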

Of course, this is just a simple example of what we mean when talking about “semantics” related to energy data. In the deliverable DT1.2.1 project partners already shared a common definition of other “concepts” like:

  • energy type (primary, estimated, final, …)
  • energy source (biogas, natural gas, electricity, solid fuels, warm water or stream, …)
  • heating systems (central heating, district heating, electric radiators, solar heating, stove, …)
  • … etc

A first conceptual version of the data model was provided to CitiEnGov partners in July 2017. To facilitate the understanding of the conceptual model and the subsequent agreement on it (by September 15th, 2017), CitiEnGov partners were provided with two documents:

  • PowerPoint slides, explaining the rationale of the proposed data model
  • Excel spreadsheet, containing the list of classes/tables and their attributes needed to cover all possible aspects of “energy database” related to buildings, transport and public lighting

The data model consists of 3 main classes (that will be tables in the physical database implementation) corresponding to the 3 sectors the project is focused on:

  • building
  • transport
  • installation (public light)

A physical implementation of the data model has been developed in CitiEnGov with a standard SQL structure provided to all CitiEnGov partners; the CitiEnGov SQL data model is available for the two most widely used spatial relational database platforms: Oracle and PostgreSQL/PostGIS.

Conceptual data model

As mentioned above, the data model relies on the INSPIRE Data Specifications: for instance, for the "Buildings" sector the Technical Guidelines considered are available at: http://inspire.jrc.ec.europa.eu/documents/Data_Specifications/INSPIRE_DataSpecification_BU_v3.0.pdf

The following image summarizes the conceptual data model:

CitiEnGov ConceptualDataModel.png

The model is based on the following basic classes:

Buildings 
stores data about the building stock at different levels of detail: building units, buildings, energy plants and facilities, blocks of buildings or districts.
open details about Buildings
Installation 
stores data about building HVAC systems, energy meters, and public lighting lamps and lines.
open details about Installation
Transport 
stores data about transport (number of vehicles, renewal rate, …) for different transportation groups (municipal fleet, public transport, private/commercial cars, etc.). It can optionally be associated with a spatial element (e.g. the administrative area where the fleet is contained, the spatial extent of the public transport line being described, etc.).
open details about Transport
EnergyAmount 
stores all the information about energy: primary energy production, final energy consumption, renewable production, vehicle fuel consumption.
open details about Energy amount
Geometry 
the classes Building, Installation and Transport all have a geometry attribute (mandatory for the first two classes) that can be populated as a point, line or 2D polygon.
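The basic classes above can be sketched as plain Python data classes. This is only an illustration of the structure: every attribute name is hypothetical, and attaching a list of EnergyAmount entries to each of the other three classes is an assumption about how they relate, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Geometry:
    kind: str                              # "point", "line" or "polygon"
    coordinates: List[Tuple[float, float]]


@dataclass
class EnergyAmount:
    energy_type: str      # e.g. "final" or "primary"
    energy_source: str    # e.g. "naturalGas"
    year: int
    value_kwh: float


@dataclass
class Building:
    building_id: str
    geometry: Geometry                     # mandatory
    energy: List[EnergyAmount] = field(default_factory=list)


@dataclass
class Installation:
    installation_id: str
    geometry: Geometry                     # mandatory
    energy: List[EnergyAmount] = field(default_factory=list)


@dataclass
class Transport:
    transport_id: str
    group: str                             # e.g. "municipal fleet"
    geometry: Optional[Geometry] = None    # optional spatial element
    energy: List[EnergyAmount] = field(default_factory=list)


def total_kwh(entity, year):
    """Sum the energy amounts attached to an entity for one year."""
    return sum(e.value_kwh for e in entity.energy if e.year == year)


school = Building("B1", Geometry("point", [(14.5, 46.05)]))
school.energy.append(EnergyAmount("final", "naturalGas", 2016, 12000.0))
school.energy.append(EnergyAmount("final", "electricity", 2016, 3000.0))
fleet = Transport("T1", "municipal fleet")  # no geometry supplied
print(total_kwh(school, 2016))
```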

Physical implementation of data model

As mentioned above, the physical implementation of the data model will be a reference implementation for the two most widely used platforms, Oracle and PostgreSQL/PostGIS. The SQL scripts for creating the database are available (as PDF files) at the following links:


The physical implementation of the CitiEnGov harmonized data model will be used to populate the database with data already available at partners’ premises or collected during the CitiEnGov project. These data will be transformed by CitiEnGov partners using ETL (Extract, Transform, Load) tools. Several options exist to achieve this data transformation:

  • using SQL or PL/SQL (or PL/pgSQL) scripting language
  • Kettle software
  • FME software
  • HALE software
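Whatever tool is chosen, the three ETL steps have the same shape. The following sketch illustrates them with the Python standard library only, using SQLite as a stand-in for Oracle or PostGIS; the source file layout, the column names, the category mapping and the target table are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical export from a partner's local system (semicolon-separated).
SOURCE_CSV = """id;use;kwh_year
B1;Dwelling;12500
B2;Office block;48000
"""

# Hypothetical mapping from local labels to harmonized values.
USE_MAP = {"dwelling": "residential", "office block": "office"}


def extract(text):
    """Read the source rows as dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def transform(rows):
    """Rename fields, harmonize the category and convert kWh to a number."""
    return [(r["id"], USE_MAP[r["use"].lower()], float(r["kwh_year"])) for r in rows]


def load(conn, records):
    """Create the target table and insert the transformed records."""
    conn.execute(
        "CREATE TABLE building ("
        "building_id TEXT PRIMARY KEY, current_use TEXT, annual_kwh REAL)"
    )
    conn.executemany("INSERT INTO building VALUES (?, ?, ?)", records)
    conn.commit()


etl_conn = sqlite3.connect(":memory:")
load(etl_conn, transform(extract(SOURCE_CSV)))
loaded = etl_conn.execute("SELECT * FROM building ORDER BY building_id").fetchall()
print(loaded)
```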

The physical data model will be provided to partners containing the following SQL statements:

  • CREATE statements for all tables of the “SCC solutions database”. In SQL, CREATE creates an object in a relational database management system (RDBMS). In the SQL 1992 specification, the types of objects that can be created are schemas, tables, views, domains, character sets, collations, translations, and assertions. Many implementations extend the syntax to allow creation of additional objects, such as indexes and user profiles.
  • ALTER statements to add constraints related to Primary Keys. In SQL, ALTER changes the properties of an object inside a relational database management system (RDBMS).
  • INSERT INTO statements, used to insert new records into the tables corresponding to codelists; each INSERT specifies both the column names and the values to be inserted.
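The three kinds of statement can be illustrated with a small, self-contained sketch. It uses SQLite through Python's sqlite3 module purely so that it runs anywhere; the table and column names are hypothetical and are not taken from the CitiEnGov scripts. SQLite does not support adding a primary key with ALTER TABLE, so that statement is shown only as a comment in the form accepted by PostgreSQL and Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE statements: one codelist table and one data table (hypothetical names).
conn.executescript("""
CREATE TABLE cl_current_use (
    code  TEXT PRIMARY KEY,
    label TEXT NOT NULL
);
CREATE TABLE building (
    building_id TEXT PRIMARY KEY,
    current_use TEXT REFERENCES cl_current_use(code)
);
""")

# ALTER statement: on PostgreSQL or Oracle the key could be added afterwards with
#   ALTER TABLE building ADD CONSTRAINT building_pk PRIMARY KEY (building_id);
# SQLite lacks this form of ALTER TABLE, so the key is declared inline above.

# INSERT INTO statements: column names and values are both given explicitly.
conn.executemany(
    "INSERT INTO cl_current_use (code, label) VALUES (?, ?)",
    [("residential", "Residential"), ("industrial", "Industrial")],
)
conn.commit()

codes = [row[0] for row in conn.execute("SELECT code FROM cl_current_use ORDER BY code")]
print(codes)
```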

ICT services to share energy data

The sharing of energy-related data will rely on the deployment of web geo-ICT services based on open standards. These web services will span from catalogue services for browsing and searching data in distributed metadata catalogues, to services for visualizing or accessing data. Client applications implemented by CitiEnGov partners to present energy-related data (e.g. portals) need to use these web services directly, connecting to them through standard interfaces/protocols. Data services are services related to data ingestion, management, view and access; from the data provider/publisher point of view (and also according to the ISO 19119 taxonomy), the data services can be grouped into the following macro-categories:

  • discovery services
  • viewing services
  • access services (download)
  • processing services (subsetting, ordering, filtering)

These web geo-ICT services may be implemented using proprietary solutions like Esri ArcGIS Server (http://server.arcgis.com/en/) or open source ones like GeoServer (http://geoserver.org/). It is crucial that the solution chosen by each partner implements open standard protocols like the ones mentioned hereafter.


Discovery services

The discovery of energy datasets is usually performed through search functionalities in metadata catalogues; metadata describe the general characteristics of each dataset, independently of the distribution formats or of the availability of services that operate on the dataset. A dataset, whether geographical, tabular or other, may have different representations; in the case of geographical data, the “discovery metadata” may provide a general but structured description (responsible parties, dates, licenses, lineage, …) and refer to one or more “resources”. For instance, a metadata record describing a geographical dataset may refer to one or more of the following “resources” in different possible formats and standard protocols:

  • a CSV or XLS formatted file containing the tabular representation of data
  • a ZIP file containing vector representation of data (e.g. SHP with DBF for attributes), to allow Geographic Information Systems’ users to easily work on simple flat datasets
  • a KML encoded file, for being represented in Google Earth or other 3D / globe viewers
  • a GML encoded file, in case of complex spatial data to be provided in an interoperable and open standard format
  • a web service conformant to OGC WMS standard interface, to allow the visualisation of maps in web or desktop map viewers
  • a web service conformant to the OGC WFS standard interface, with dynamic outputs based on the same formats (SHP/ZIP, KML, GML, …), so as to allow the download of subsets of data based on filters, or the download of frequently updated data
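The relationship between one discovery metadata record and its several resources can be sketched as a simple data structure. The field names, protocol labels and URLs below are hypothetical placeholders, not a metadata standard profile.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    protocol: str   # hypothetical labels: "file", "OGC:WMS", "OGC:WFS"
    fmt: str        # distribution format, e.g. "CSV" or "GML"
    url: str


@dataclass
class MetadataRecord:
    title: str
    responsible_party: str
    licence: str
    resources: List[Resource] = field(default_factory=list)

    def find(self, protocol):
        """Return the resources offered through the given protocol."""
        return [r for r in self.resources if r.protocol == protocol]


record = MetadataRecord(
    title="Energy consumption of municipal buildings",
    responsible_party="Example municipality",
    licence="CC BY 4.0",
    resources=[
        Resource("file", "CSV", "https://example.org/data/buildings.csv"),
        Resource("file", "SHP/ZIP", "https://example.org/data/buildings.zip"),
        Resource("OGC:WMS", "image/png", "https://example.org/ows"),
        Resource("OGC:WFS", "GML", "https://example.org/ows"),
    ],
)

download_formats = [r.fmt for r in record.find("file")]
print(download_formats)
```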

View services

Since data visualization is sometimes mistaken for data access, it is appropriate to highlight here the principal differences:

  • accessing data involves the possibility of querying, sub-setting and filtering (a necessary but not sufficient condition, since certain view services are also capable of expressing a filter);
  • accessing data necessarily uses a physical data format, but does not depend on it; the representation of data is instead an integral part of viewing services;
  • very often in viewing services, the representation completely hides the underlying data, making it impossible to recover them (approximation, portrayal, simplification, generalization, aggregation etc. are usually part of the viewing service);
  • data coming from an access service can be subsequently processed without loss and without the need for particular pre-processing.

CitiEnGov partners may expose services for viewing energy-related data via web services based on well-known APIs for representing tabular data, or through the WMS / WMTS protocols defined by the Open Geospatial Consortium (OGC) for maps. For spatial data, viewing means producing an image from the data by applying a set of rendering rules, whereas a classic alphanumeric dataset can be viewed by producing a tabular or graphical representation. The different infrastructural components are optimized to handle the different types of data, and this results in various protocols and standards being used in the data services. In the same way, accessing data can have several implementations: WFS for spatial data, CSV for tabular data, a SPARQL endpoint for linked data (see the following section). The CitiEnGov partners may also offer functionalities to let clients visualise:

  • tabular data, with filtering/searching capabilities to extract or sort subsets of datasets
  • graphics (dashboards), based on open source JavaScript libraries to render statistical data with high quality diagrams and presentation styles
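For the map case, a view request is simply an HTTP URL whose parameters tell the server which image to render. The sketch below builds a WMS 1.3.0 GetMap request using the standard parameter names; the endpoint, the layer name and the bounding box are hypothetical.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=800, height=600, crs="EPSG:3857"):
    """Build a WMS 1.3.0 GetMap URL that asks for a PNG image of one layer."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                 # empty value requests the default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and layer name.
url = wms_getmap_url(
    "https://example.org/ows",
    "citiengov:building",
    (1600000, 5700000, 1610000, 5710000),
)
print(url)
```

The response to such a request is an image only, which is why a view service on its own does not give access to the underlying values.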

Download services

In the context of CitiEnGov project, different representations of energy data are foreseen:

  • tabular data, with rows and columns to present data in CSV, XLS or other formats
  • geographic vector, with spatial features representing buildings, transport networks or public lighting with vectors
  • geographic coverage, with raster images of spatial phenomena (e.g. energy consumption may be provided as spatial data in the form of a raster layer, i.e. a regular grid whose cells hold different values of energy consumption)
  • geographic sensor, with near real-time data coming from sensors (e.g. energy consumption at single municipal buildings level)

As per INSPIRE definitions, a download service for vector geographic data is equivalent to a web service implementing the OGC WFS standard interface; the intention being that the user is given access to the raw data values instead of a cartographic representation as is the case with e.g. WMS requests that only return a map image. Access to the raw data enables two key benefits:

  1. the ability to perform calculation and analysis using the vector geometries or raster cell data
  2. the ability to draw non-pixelated map images at all scales using client-side rendering
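The first benefit can be illustrated with a short sketch: once the raw geometry is available, a client can compute on it. The example parses a GeoJSON feature collection of the kind a WFS could return, computes each footprint area with the shoelace formula, and derives consumption per square metre of footprint. The feature content and property names are hypothetical, and the coordinates are assumed to be in a projected system with metre units.

```python
import json

# Hypothetical WFS output in GeoJSON, with coordinates in metres.
GEOJSON = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature",
   "properties": {"building_id": "B1", "annual_kwh": 12000},
   "geometry": {"type": "Polygon",
                "coordinates": [[[0, 0], [20, 0], [20, 10], [0, 10], [0, 0]]]}}
]}
"""


def ring_area(ring):
    """Area of a closed ring of (x, y) pairs using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def kwh_per_m2(collection):
    """Annual kWh per square metre of footprint, keyed by building id."""
    result = {}
    for feature in collection["features"]:
        exterior = feature["geometry"]["coordinates"][0]  # ignore any holes
        props = feature["properties"]
        result[props["building_id"]] = props["annual_kwh"] / ring_area(exterior)
    return result


intensity = kwh_per_m2(json.loads(GEOJSON))
print(intensity)
```

None of this would be possible from a WMS map image, where the geometry and attribute values are no longer recoverable.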

CitiEnGov partners may implement an extended set of download services that goes beyond the INSPIRE requirements. Each extension provides a specific performance benefit and the total implementation includes the following protocols and formats.


Service protocol | Data format, transport | Benefits
Web Feature Service (WFS) | GML/XML, GeoJSON, CSV | Provides interoperable methods to access and work with remote spatial data sources.
SPARQL | RDF/XML, RDF/JSON | Provides a basis for easy extension of any dataset through RDF triple assertion. Provides Linked Data publishing.
Custom vector data service | TileJSON | The vector equivalent of tile map services for raster data. To remove the overhead of clipping custom extents for vector data, tiles are pre-generated. Client applications can buffer neighboring tiles into memory in order to provide smooth panning experiences.
Custom table data service | WebCSV, JSON, XML | This type of service can provide access to non-spatial tabular data in one of the three formats listed. WebCSV is the lightest format but has limited support in client libraries. JSON has relatively low overhead and is widely supported by browser based end-user clients. Finally, XML is very easy to parse using any software technology despite a significant markup overhead.
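The difference in overhead between the three table formats can be checked with a small sketch that serializes the same records in each of them and compares the sizes. Plain CSV is used here as a stand-in for WebCSV, and the records and element names are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical non-spatial records; all values are kept as strings for simplicity.
rows = [
    {"building_id": "B1", "fuel": "gas", "annual_kwh": "12000"},
    {"building_id": "B2", "fuel": "electricity", "annual_kwh": "48000"},
]


def to_csv(records):
    """Column names appear once, in the header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_json(records):
    """Column names are repeated as keys in every record."""
    return json.dumps(records)


def to_xml(records):
    """Column names are repeated as opening and closing tags in every record."""
    root = ET.Element("records")
    for rec in records:
        item = ET.SubElement(root, "record")
        for key, value in rec.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")


sizes = {name: len(fn(rows)) for name, fn in
         [("csv", to_csv), ("json", to_json), ("xml", to_xml)]}
print(sizes)
```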

Processing services

In the context of CitiEnGov, processing services are “partner-driven” web services linked to the detailed requirements coming from each partner in terms of data processing and user engagement. Several use cases aim to perform e.g. calculations on data about buildings, transport network, public lighting. This relies, of course, on well-known data models that contain the information that is required to run the appropriate equations/algorithms. Data will be read from the partner data store and will be consumed by the processing service where the actual analysis code is implemented. Therefore, the processing services may be a set of independent end-user applications that will consume their business logic via the APIs and (optionally) the client-side JavaScript libraries implemented at partners’ level. The following table summarizes a list of possible operations that can be performed for different categories of processing services:

Categories of processing services

Type of data | Visualization | Querying | Processing
Non-spatial graph | X | X | N/A
Non-spatial table | X | X | N/A
Spatial graph | X | X | Proximity
Spatial raster | X | N/A | N/A
Spatial table | X | X | Proximity, overlay
Spatial table: building | X | X | Energy performance
Spatial table: network | X | X | Route calculation
Spatial table: point | X | X | Interpolation
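As one illustration of a “Proximity” operation from the table above, the sketch below selects the public lighting lamps within a given radius of a point, using the haversine great-circle distance. The lamp identifiers and coordinates are hypothetical, and a production service would more likely delegate this to the spatial functions of the database.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 positions."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(points, centre, radius_m):
    """Return the ids of the points lying within radius_m of centre."""
    clat, clon = centre
    return [pid for pid, lat, lon in points
            if haversine_m(clat, clon, lat, lon) <= radius_m]


# Hypothetical lamps: (id, latitude, longitude).
lamps = [
    ("L1", 46.0005, 14.5),   # roughly 56 m north of the centre
    ("L2", 46.0100, 14.5),   # roughly 1.1 km north of the centre
]
nearby = within_radius(lamps, (46.0, 14.5), 100.0)
print(nearby)
```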

Technical references for services

This section contains the technical references about interfaces, versions, operations, etc. required at server or client levels. The details of these technical references are based on previous EU projects (e.g. eENVplus, GeoSmartCity), whose deliverables are available on their public access pages.

Technical references are divided into three main sections:

  • client: set of requirements related to client software (desktop or web) directly used by human beings to search/discover, view, access energy-related data
  • server: set of requirements related to server components, to be made available at partners’ level
  • interface: set of requirements related to standard interfaces and protocols to be considered at client and/or server side levels to guarantee interoperability

The detailed list of technical specifications is available in the separate page ICT technical guidelines.