What is open data?

Open data consists of datasets produced or compiled by public bodies that public administrations make available to citizens so that they can be freely used in a simple, user-friendly way.


Open data has significant potential value and is essential to public administration transparency, efficiency and equality of opportunities when it comes to wealth creation.


The main objective of releasing data is for the data managed by administrative authorities to be made public and available to society for use by any individual or organisation. With this service, the administrations increase their transparency since citizens are able to access a true picture of the services provided. In addition, the reuse of open data by companies, organisations, associations and citizens in general enables the development of new products and services that create value, innovation, knowledge and business opportunities.


Open data licences and terms of use are subject to the laws on public sector information reuse and, in some cases, the data may be protected by intellectual property licences. Typically, however, the data is released with no conditions other than that it is not altered and that the source and the date of the latest update are cited. For further information, please refer to the terms of use and licences section.

For open data to be fit for purpose it must be:


1. Public: it should not be subject to privacy, security or any other type of restriction, except where restrictions are imposed by law.

2. Detailed: it must be the primary and original, unprocessed data (what is known as raw data). Information must be provided on how the data was obtained and where the source documents are located, so that the user can verify the transparency of the process and confirm that the raw data has not been altered.

3. Updated: it must be released to citizens as often as necessary to prevent loss of value and ensure it is always accurate and up to date. Priority must be given to data for which time is a factor in its usefulness.

4. Accessible: it must be accessible to as many users as possible, so there are no restrictions or barriers to its use, such as the need to formally request the information or undertake any other procedure.

5. Automated: it must be made available in electronic formats for generalised and structured use so that it can be automatically processed on any computer.

6. Registration-free: it must be open to all, without the need for prior registration to access it.

7. Open format: it cannot be owned or associated with a specific company and must be free of legal and economic restrictions of use.

8. Free: the use of the data should not be subject to any type of regulation that restricts its reuse. Therefore, the data must be free of rights, patents, copyright and not be subject to privacy or security rights or regulations.

With the development of the information society, open data has become a very valuable information tool that provides benefits for citizens and companies, as well as for the administration itself, which can improve its efficiency through greater interoperability.


The provision of open data is an exercise in information transparency that allows citizens access to information related to the administration's actions and services and its management of public resources.


Benefits for citizens: information
It establishes an active, participatory and bidirectional dialogue between a government and its citizens, a fundamental principle of open government.
It facilitates the creation of new social services that improve the life of citizens.
It promotes the democratic participation of citizens.


Benefits for businesses: generation of wealth
It enables the creation of economic value as it generates new services and web applications developed from the open data.
It opens up a new market based on digital content.
It creates the potential for the generation of profit from public information.


Benefits for public administrations: transparency
Promotes intelligent and effective use of resources.
Generates a transparent government that promotes a higher degree of public confidence.
Facilitates interoperability between different administrations.

 

What is a dataset?

The term dataset refers to a set of typically structured data that has been used to construct information published in data catalogues or displayed independently.

Raw data is organised into datasets so that it can be more easily indexed and located; in order to achieve this, different fields are used that define the data group such as description, update frequency, format and usage licence, among others.

Currently, the Government of Catalonia Open Data Portal includes datasets organised by a variety of formats, categories and data sources that allow citizens to access a wide range of data. A wide variety of data is available for use from across the various Government of Catalonia bodies, including geographic, meteorological, statistical, economic, administrative, tourism, legal and mobility data, among others.
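
As a rough illustration of the descriptive fields used to index a dataset (description, update frequency, format and usage licence), a catalogue entry could be modelled as in the sketch below. The class name, field names and sample values are hypothetical and are not taken from the portal.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry; the field names are illustrative only."""
    title: str
    description: str
    update_frequency: str
    formats: List[str] = field(default_factory=list)
    licence: str = ""

    def matches(self, term: str) -> bool:
        """Case-insensitive search over the title and description."""
        term = term.lower()
        return term in self.title.lower() or term in self.description.lower()


# Made-up sample entries, used only to show how the fields support searching.
catalogue = [
    DatasetRecord("Air quality", "Hourly readings from monitoring stations",
                  "daily", ["CSV", "JSON"], "open licence"),
    DatasetRecord("Road traffic", "Incidents reported on the road network",
                  "hourly", ["CSV", "XML"], "open licence"),
]

found = [record.title for record in catalogue if record.matches("traffic")]
print(found)
```

Keeping the descriptive fields separate from the data itself is what allows a catalogue to be searched without opening each file.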

What is a reusable format?

A reusable format is a structured, open, non-proprietary format, for example, CSV or XML. These are data formats designed to be used by other programs and applications, for example, for analysis, cross-referencing with other data sources, or creating data views in graphs or maps.

In contrast, some formats, such as PDF, are designed for consulting data and information but do not allow easy reuse.

Law 19/2014 of 29 December on transparency, access to public information and good governance establishes that all public sector information must be provided in a clear, structured manner and in reusable format, in order to facilitate interoperability, improve transparency and documentary simplification.
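
To give an idea of what cross-referencing with other data sources can look like when data is in a reusable format, the following minimal sketch joins two small CSV sources on a shared key using only the Python standard library. The column names and values are invented for illustration and do not correspond to any real dataset.

```python
import csv
import io

# Illustrative sample data only; the column names and values are made up.
stations_csv = "station_id,name\nS1,North\nS2,South\n"
readings_csv = "station_id,value\nS1,12\nS2,7\nS1,9\n"

# Index the first source by its key column.
stations = {
    row["station_id"]: row["name"]
    for row in csv.DictReader(io.StringIO(stations_csv))
}

# Cross-reference the second source against the first and total the values.
totals = {}
for row in csv.DictReader(io.StringIO(readings_csv)):
    name = stations[row["station_id"]]
    totals[name] = totals.get(name, 0) + int(row["value"])

print(totals)
```

The same operation on a PDF would first require extracting the tables by hand or with specialised tools, which is why such formats are not considered reusable.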

The datasets available on the open data platform can be exported in multiple formats, briefly outlined below:

Format

Description

CSV / TSV

An open, simple and widely used format to represent tabulated data. These files can be opened with both text editors (Windows Notepad, MS Word) and spreadsheet editors (MS Excel, OpenOffice Calc, etc.). The data is structured into columns separated by a specific character (usually the separator is a comma or semicolon for CSVs, and tabs for TSVs). All the rows have the same fields and a line break is included at the end of each row. CSV export options for Excel and CSV for Excel (Europe) are also available. These options are formatted so that when they are opened with MS Excel, the program interprets the separator and displays the columns separately, facilitating their reading. Further information http://tools.ietf.org/html/rfc4180

JSON

Lightweight data interchange format for computer applications. Makes it easy for machines to generate and interpret data. Based on a subset of the JavaScript programming language, suitable for client programming. Further information http://json.org/json-es.html

XML

Open format that allows data to be represented in a structured and hierarchical way using tags. It is a language designed to facilitate the reuse of data through other programs and applications. Further information https://www.w3.org/TR/2006/REC-xml11-20060816/

RDF-XML

RDF is a specification to structure data in the form of subject-predicate-object triples, which allows semantic information to be incorporated into the data. RDF is an abstract specification and is not limited to a specific format. RDF files serialised as XML can be downloaded on the open data platform. Further information https://www.w3.org/TR/REC-rdf-syntax/

RSS

An XML-based format for distributing web page content. It delivers updated information to users subscribed to the RSS feed, who can read it without a browser by using software designed for this format.
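
The separator differences between CSV and TSV exports can be handled as in the sketch below, which reads the same illustrative table with comma, semicolon and tab separators and then serialises it as JSON. The function name `read_delimited` and the sample data are assumptions made for this example, not part of the platform.

```python
import csv
import io
import json


def read_delimited(text: str, delimiter: str) -> list:
    """Parse delimited text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# The same made-up table with the three separators described above.
comma_text = "municipality,population\nA,100\nB,250\n"
semicolon_text = "municipality;population\nA;100\nB;250\n"
tab_text = "municipality\tpopulation\nA\t100\nB\t250\n"

rows_comma = read_delimited(comma_text, ",")
rows_semicolon = read_delimited(semicolon_text, ";")
rows_tab = read_delimited(tab_text, "\t")

# Once parsed, the rows can be written out as JSON for use by other programs.
as_json = json.dumps(rows_comma)
print(as_json)
```

Note that the `csv` module returns every value as a string, so numeric columns need to be converted explicitly before any calculation.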

In the case of datasets that contain geographic data, in addition to the formats mentioned above, they can also be exported in the following formats specific to the representation of geographic data:

 

Format

Description

KML/KMZ

The KML format is a specific XML notation for the representation of geographic data. It allows the representation of various geometries (points, polygons, 3D models, etc.) expressed in latitude, longitude and, optionally, altitude. KMLs can also be grouped together in a ZIP (known as KMZ) and may contain other resources such as descriptions or images associated with geographic elements. They can be opened and processed with software that implements KML and KMZ, such as Google Earth. Further information https://developers.google.com/kml/documentation/kmlreference

SHP

Shapefile is a proprietary format for spatial data that is widely used for the exchange of geographic information between Geographic Information Systems (GIS). It is a digital vector storage format for storing geometric location and associated attribute information, although it lacks the capacity to store topological information. A shapefile consists of several files, at least three, with the extensions .shp, .shx and .dbf. Further information http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf

GeoJSON

The GeoJSON format is an open standard format for representing different geographic data structures along with non-spatial attributes, based on the JSON format. Further information http://geojson.org/geojson-spec.html

Original

This option enables users to download the file in the same format as it was uploaded onto the open data platform.
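
When working with the geographic formats, coordinate order is a common source of errors: both GeoJSON positions and KML coordinate tuples put longitude before latitude. The sketch below shows this with two hypothetical helper functions and an illustrative sample point; neither the function names nor the coordinates come from the platform.

```python
import json


def to_geojson_point(lat: float, lon: float, name: str) -> dict:
    """Build a GeoJSON Feature; positions are ordered [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name},
    }


def to_kml_coordinates(lat: float, lon: float, alt: float = 0.0) -> str:
    """Format a KML coordinate tuple as longitude,latitude,altitude."""
    return f"{lon},{lat},{alt}"


# Illustrative coordinates only.
feature = to_geojson_point(41.38, 2.17, "Sample point")
print(json.dumps(feature))
print(to_kml_coordinates(41.38, 2.17))
```

The non-spatial attributes of a GeoJSON feature go in its `properties` member, which is how the format carries descriptive data alongside the geometry.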

 

What can this data be used for?

Open data can be used for any purpose: for example, to make it easier to find information on different topics, or to build applications, particularly software and visualisation tools, that use free information as a source.

Among other uses, it can support socioeconomic statistical studies, which companies can then use for market analysis, commercial risk assessment, marketing and sales purposes.

Data journalists use open data as raw material instead of other sources of information and work on the critical analysis of information in order to offer comprehensible and highly intuitive representations of the information.

The Data Portal contains datasets pertaining to a wide range of thematic areas and formatting types.

A series of categories have been established in order to classify the datasets into thematic areas. The categories are unique to each dataset and have been developed based on the categories determined in the national Technical Interoperability Standard for the Reuse of Information Resources (https://www.boe.es/boe/dias/2013/03/04/pdfs/BOE-A-2013-2380.pdf).

These are categories commonly used in multiple reference portals such as 060, EUGO, INE, EUROSTAT, WORLD BANK and OECD. The Technical Interoperability Standard for the Reuse of Information Resources sets out the following categories:

 

Category: thematic areas
Science and Technology: includes innovation, research, R&D&I, telecommunications, Internet and the information society
Commerce: includes trade and consumption
Culture and leisure: includes leisure time and recreational activities
Demography: includes immigration and emigration, family, women, children, the elderly and electoral register
Economy: includes debt, currency, banking and finance
Education: includes schools, education and training activities
Energy: includes renewable energy
Sport: includes sports facilities, federations and competitions
Housing: includes real estate market and housing
Treasury: includes taxes
Industry: includes mining
Legislation and justice: includes records
Environment: includes meteorology, geography and the conservation of fauna and flora
Rural environment and fisheries: includes agriculture, livestock, fishing, and forestry
Health: includes health and hospital services
Public sector: includes budgets, institutional organisation chart, internal legislation and civil service
Security: includes civil protection and defence
Society and welfare: includes citizen participation, discrimination, active ageing, dependency, retirement, insurance and pensions, benefits, and subsidies
Transport: includes transport, communications and traffic
Work: includes employment and labour market
Tourism: includes accommodation, hospitality services and gastronomy
Urban planning and infrastructure: includes construction, infrastructure, public facilities and public sanitation

 

Additionally, the Memory category was added to include all those datasets that refer to the country's historical memory.

The datasets have also been identified with associated tags or keywords, which are more numerous than the categories. Each dataset typically has more than one assigned tag in order to facilitate searching.

Moreover, the datasets published on the portal are available as various types of data view. When searching the catalogue for datasets, the type of data view allows you to filter according to the types of formats in which datasets are available. It is also important to note that the type of view does not correspond to specific formats but to format types (links, tabular files, files containing geo-spatial information, etc.), which in turn can be found in different formats. The following provides a description of each type of data view:

 

Types of view

Dataset description

Files and Documents

Files that are published on the open data platform but which are not interpreted as structured information. Supplied in the original data source format.

Datasets

Structured tabular files where the columns are the information fields and each row is a piece of data. Can be downloaded as CSV, CSV for Excel, CSV for Excel (Europe), JSON, RDF, RSS, TSV for Excel and XML.

Maps

Files containing geo-spatial information. Can be downloaded with geo-spatial data as KML, KMZ, SHP, Original and GeoJSON; and without geo-spatial data or with a specific layer of data as CSV and JSON.

External datasets

The platform includes links to data sources, which can be found on other Government of Catalonia corporate websites or applications.

Views with filters

Consist of datasets filtered from a set of original data published on the open data platform. Can be filtered and downloaded in the same formats as the original dataset.

Graphics

These are graphic displays of datasets published on the open data platform. Can be filtered and downloaded in the same formats as the original dataset.