Synthetic CityGML data and procedural modelling engine


During my PhD I encountered the following shortcomings in 3D GIS research:

  • CityGML datasets containing multiple levels of detail are scarce.

  • Procedurally generated data in the CityGML format is non-existent.

  • There are no free procedural modelling engines.

  • Publicly available CityGML models usually contain lots of (topological) errors.

To solve these problems I have developed Random3Dcity, a basic and experimental open-source procedural modelling engine that automatically constructs synthetic buildings and realises them in CityGML in multiple LODs. I am using the generated datasets for several purposes within my PhD project, such as benchmarking the performance of a specific LOD in a use case, but possible applications are not limited to these. To bridge the gap, I am releasing the datasets publicly. The code of the engine has been released as well.
Along with this project, I have designed a new LOD specification for 3D city models that extends the one found in CityGML. The specification has been realised through this engine.
With its diverse buildings and the large number of their representations, Random3Dcity aims to be the most complete CityGML (and probably 3D) dataset available. However, be aware of its limitations, such as its experimental nature and synthetic output.

From the random parametric description to CityGML in multiple representations

Random3Dcity consists of two parts. The first (the procedural modeller) constructs buildings with random properties, such as height, roof type, and the number and size of windows. The algorithm adheres to a large number of constraints (e.g. that windows do not overlap) and takes care that the designed buildings look as realistic as possible. The engine stores these data in a human-readable parametric description (its own XML schema), e.g.: <roofType>Gabled</roofType><h>2.89</h>.
Because of this stochastic nature, the datasets are well suited for many analyses as an unbiased data source, and for recreating many different scenarios which might not be available in real-world data. With the current configuration, the number of different buildings is around 10^54.
The second part of the engine reads these data and realises them in 3D by generating CityGML files in multiple levels of detail.
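
As a rough illustration of the first part, the sketch below draws a few random properties, places windows so that they cannot overlap, and writes the result as a parametric XML element. It is not the engine's actual code: the value ranges, the roof types other than Gabled, the reading of the h element as a height in metres, and the tags wallWidth, windows and window are assumptions made for this example.

```python
import random
import uuid
import xml.etree.ElementTree as ET

# Illustrative list; only "Gabled" is confirmed by the text above.
ROOF_TYPES = ["Flat", "Gabled", "Hipped"]


def place_windows(wall_width, count, window_width, margin, rng):
    """Return left-edge offsets of `count` windows that cannot overlap.

    The wall is split into equal slots and each window is placed at a random
    position inside its own slot, so the constraint holds by construction.
    """
    slot = wall_width / count
    slack = slot - window_width - 2 * margin
    if slack < 0:
        return []  # the constraint cannot be met, so no windows are placed
    return [i * slot + margin + rng.uniform(0.0, slack) for i in range(count)]


def random_building(rng):
    """Draw random properties and return them as a parametric XML element."""
    building_id = str(uuid.UUID(int=rng.getrandbits(128), version=4))
    building = ET.Element("building", ID=building_id)
    ET.SubElement(building, "roofType").text = rng.choice(ROOF_TYPES)
    ET.SubElement(building, "h").text = f"{rng.uniform(2.0, 4.0):.2f}"
    wall_width = rng.uniform(6.0, 12.0)
    ET.SubElement(building, "wallWidth").text = f"{wall_width:.2f}"  # hypothetical tag
    windows = ET.SubElement(building, "windows")  # hypothetical tag
    for x in place_windows(wall_width, rng.randint(1, 4), 1.0, 0.2, rng):
        ET.SubElement(windows, "window", x=f"{x:.2f}", width="1.00")
    return building


if __name__ == "__main__":
    print(ET.tostring(random_building(random.Random(42)), encoding="unicode"))
```

Passing a seeded random.Random instance makes a generated building reproducible, which is convenient when the same parametric description has to be realised again later.
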
The methodology is described in the following publication:

  • Generation of multi-LOD 3D city models in CityGML with the procedural modelling engine Random3Dcity
    Filip Biljecki, Hugo Ledoux, and Jantien Stoter
    In: ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci., vol. IV-4/W1: 51-59, 2016.
    [PDF] [DOI]

  <building ID="bee12ca0-dba3-4260-9c0c-e495b5f34783">
    <origin>135.0 105.0 0.0</origin>


Meet the refined LODs

As one of the aims of my PhD research, I have developed a new specification that refines the CityGML LODs into 16 LODs: each of the current LODs 0, 1, 2 and 3 is divided into four less ambiguous, more precisely specified variants. The specification is intended to supplement the CityGML specification.
They are the product of thorough research into 3D production workflows, contact with practitioners, and examination of presently available 3D models.
A visual description of the specification is shown on the right, and a composite of four LODs of a dataset with 100 buildings is shown below.
The detailed specification was published in:

  • An improved LOD specification for 3D building models
    Filip Biljecki, Hugo Ledoux, and Jantien Stoter
    Computers, Environment and Urban Systems, vol. 59: 25-37, 2016.
    [PDF] [DOI]

The specification above was developed independently of CityGML, and it is usable outside its scope.

The four LODs on the image on the left are: LOD1.1, LOD2.0, LOD2.3, and LOD3.3. The first two representations have their walls modelled as projections from the roof edges.

Interior solids for storeys

Random3Dcity is capable of generating the basic interior in three levels of detail: one solid for each storey, a solid for the whole building (offset from the shell), and a 2D polygon for each floor. These solids may, for instance, serve as ground truth models for building volume computations.
The engine generates a few interior parameters, such as the thickness of joists and walls, and it computes the geometry of the solids.
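
To make the three interior representations concrete, here is a minimal sketch for a box-shaped building. It is an illustration and not the engine's code: the Box class, the function name interior, and the assumption of one slab of joist thickness below every storey and one above the top storey are all choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned solid given by its lower corner and its size."""
    x: float
    y: float
    z: float
    dx: float
    dy: float
    dz: float

    @property
    def volume(self):
        return self.dx * self.dy * self.dz


def interior(width, depth, height, storeys, wall, joist):
    """Return (building_solid, storey_solids, floor_polygons) for a box shell.

    Assumes walls of thickness `wall` on all four sides, one slab of thickness
    `joist` below every storey, and one more slab above the top storey.
    """
    dx, dy = width - 2 * wall, depth - 2 * wall
    clear = (height - (storeys + 1) * joist) / storeys  # clear storey height
    if dx <= 0 or dy <= 0 or clear <= 0:
        raise ValueError("walls or joists are too thick for this shell")
    # One solid for the whole building, offset inwards from the shell.
    building = Box(wall, wall, joist, dx, dy, height - 2 * joist)
    # Floor level of each storey, then one solid and one 2D polygon per storey.
    floors_z = [joist + i * (clear + joist) for i in range(storeys)]
    solids = [Box(wall, wall, z, dx, dy, clear) for z in floors_z]
    polygons = [
        [(wall, wall, z), (wall + dx, wall, z),
         (wall + dx, wall + dy, z), (wall, wall + dy, z)]
        for z in floors_z
    ]
    return building, solids, polygons


if __name__ == "__main__":
    building, solids, polygons = interior(10.0, 8.0, 9.0, 3, 0.2, 0.3)
    print(f"whole building: {building.volume:.3f} m3")
    print(f"sum of storeys: {sum(s.volume for s in solids):.3f} m3")
```

Under these assumptions the per-storey solids sum to less than the single building solid, because the intermediate slabs are excluded. This is one reason why the choice of interior representation matters for volume computations.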

Multiple geometric references

Each of the buildings is generated in multiple levels of detail (16 of them). Further, they are also generated in multiple geometric references (e.g. the varying height of the top of the LOD1 block model). On the right side you can see seven variants of the LOD1 block model with respect to the geometric reference used for the top surface. Further, varying references have been used for the footprint: the actual footprint and the projection from the roof edges. This was also implemented for the LOD2 model.
This topic was described in the following publication:

  • The variants of an LOD of a 3D building model and their influence on spatial analyses
    Filip Biljecki, Hugo Ledoux, Jantien Stoter, and George Vosselman
    ISPRS Journal of Photogrammetry and Remote Sensing, vol. 116: 42-54, 2016.
    [PDF] [DOI]
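
The effect of the geometric reference can be sketched with a few lines of arithmetic. The example below is only an illustration under stated assumptions: the reference names, the fractions of the roof height, the rectangular footprint, the ground at z = 0 and the 0.2 m roof overhang are hypothetical, and they are not claimed to match the seven variants produced by the engine.

```python
# Hypothetical references for the top surface, expressed as the fraction of
# the roof height (eaves = 0.0, ridge = 1.0) at which the block is closed.
TOP_REFERENCES = {
    "eaves": 0.0,
    "one_third": 1 / 3,
    "half": 0.5,
    "two_thirds": 2 / 3,
    "ridge": 1.0,
}


def lod1_top_height(eaves, ridge, fraction):
    """Height of the LOD1 block top for a given fraction of the roof height."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return eaves + fraction * (ridge - eaves)


def footprint_area(width, depth, overhang=0.0):
    """Area of a rectangular footprint, grown by the roof overhang on each side.

    An overhang of 0.0 gives the actual footprint; a positive value gives the
    projection from the roof edges.
    """
    return (width + 2 * overhang) * (depth + 2 * overhang)


def lod1_volume(width, depth, eaves, ridge, fraction, overhang=0.0):
    """Volume of the LOD1 block, assuming the ground is at z = 0."""
    return footprint_area(width, depth, overhang) * lod1_top_height(eaves, ridge, fraction)


if __name__ == "__main__":
    for name, fraction in TOP_REFERENCES.items():
        actual = lod1_volume(10.0, 8.0, 6.0, 9.0, fraction)
        projected = lod1_volume(10.0, 8.0, 6.0, 9.0, fraction, overhang=0.2)
        print(f"{name:>10}: {actual:7.1f} m3 (actual footprint), "
              f"{projected:7.1f} m3 (roof-edge projection)")
```

Even for this simple building, the volume of the block varies considerably between references, which is why the reference should be known before using an LOD1 model in a spatial analysis.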

A sample of the generated CityGML datasets

If you do not want to download the source code and generate the datasets yourself, I have prepared a zipped collection of sample CityGML data. The zip also contains OBJ files that were generated with my tool CityGML2OBJs. The description of each representation is given below.

Download the zipped CityGML and OBJ data (52.2 MB)

| Level of detail | Geometric reference | Brep or solid | Filename | Note |
|---|---|---|---|---|
| LOD0.1 | Actual footprint | Brep | LOD0_1_F0_H3.gml | Contains only the footprint |
| LOD0.2 | Actual footprint, height at half height of the roof | Brep | LOD0_2_F0_H3.gml | Contains both footprint and roof-edge polygons |
| LOD0.3 | Actual footprint, individual heights | Brep | LOD0_3_F0_H3.gml | Contains both footprint and roof-edge polygons |
| LOD1.1 | Actual footprint, height at half height of the roof | Brep | LOD1_1_F0_H3.gml | |
| LOD1.2 | Actual footprint, height at half height of the roof | Brep | LOD1_2_F0_H3.gml | |
| LOD1.2 | The footprint is offset from the roof edges by 20 cm | Solid | LOD1_2_Fd_H5_solid.gml | Different footprint (offset) |
| LOD1.3 | Actual footprint, height at half height of the roof | Solid | LOD1_3_F0_H3_solid.gml | |
| LOD2.0 | Actual footprint | Brep | LOD2_0_F0.gml | |
| LOD2.0 | Actual footprint | Brep | LOD2_0_F0_S0.gml | No semantics |
| LOD2.1 | Actual footprint | Brep | LOD2_1_F0.gml | |
| LOD2.2 | Actual footprint | Brep | LOD2_2_F0.gml | |
| LOD2.2 | Actual footprint | Brep | LOD2_2_F0_S0.gml | No semantics |
| LOD2.2 | Projection from roof edges | Brep | LOD2_2_F1.gml | |
| LOD2.3 | Actual footprint | Brep | LOD2_3_F0.gml | |
| LOD3.0 | Walls as projections from roof edges | Brep | LOD3_0.gml | Aerial features |
| LOD3.1 | Not applicable | Brep | LOD3_1.gml | Terrestrial features |
| LOD3.2 | Not applicable | Brep | LOD3_2.gml | |
| LOD3.2 | Not applicable | Brep | LOD3_2_S0.gml | No semantics |
| LOD3.3 | Not applicable | Brep | LOD3_3.gml | Very detailed model (finest in the series) |
| LOD3.3 | Not applicable | Brep | LOD3_3_S0.gml | No semantics |
| Interior-LOD0 | Not applicable | Brep | interior-LOD0.gml | One polygon for each floor |
| Interior-LOD1 | Not applicable | Brep | interior-LOD1.gml | One solid for the building |
| Interior-LOD2 | Not applicable | Brep | interior-LOD2_2.gml | One solid for each storey |

Notes and future work

  • You can convert this data to OBJ with my tool CityGML2OBJs.

  • Every <gml:LinearRing> and <gml:Polygon> has a gml:id attribute, whose value is a randomly generated UUID.

  • Datasets ending with _S0.gml do not contain semantically differentiated surfaces.

  • The coordinate system is local.

  • The data complies with CityGML 2.0.

  • The full product of the engine contains more datasets with shuffled variants (392 to be more precise). For instance, not all LOD1 variants with the heights are put here. Please contact me if you require additional variants, or generate them with the provided code.
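
The notes above can be checked on any of the files with Python's standard library. The following sketch counts polygons and rings, reports how many lack a gml:id, and counts the surfaces per semantic class. The namespaces are those of CityGML 2.0 and GML 3.1.1; the function name summarise and the tiny inline sample, which stands in for a generated file and is not taken from the dataset, are assumptions for this example.

```python
import xml.etree.ElementTree as ET

NS = {
    "gml": "http://www.opengis.net/gml",
    "bldg": "http://www.opengis.net/citygml/building/2.0",
}
GML_ID = "{http://www.opengis.net/gml}id"  # gml:id in ElementTree's expanded form
SEMANTIC_CLASSES = ("GroundSurface", "WallSurface", "RoofSurface")

# Tiny hand-written stand-in for a generated file (not taken from the dataset).
SAMPLE = """\
<CityModel xmlns="http://www.opengis.net/citygml/2.0"
           xmlns:gml="http://www.opengis.net/gml"
           xmlns:bldg="http://www.opengis.net/citygml/building/2.0">
  <cityObjectMember>
    <bldg:Building gml:id="b1">
      <bldg:boundedBy>
        <bldg:RoofSurface>
          <bldg:lod2MultiSurface>
            <gml:MultiSurface>
              <gml:surfaceMember>
                <gml:Polygon gml:id="p1">
                  <gml:exterior>
                    <gml:LinearRing gml:id="r1">
                      <gml:posList>0 0 3 4 0 3 4 4 3 0 4 3 0 0 3</gml:posList>
                    </gml:LinearRing>
                  </gml:exterior>
                </gml:Polygon>
              </gml:surfaceMember>
            </gml:MultiSurface>
          </bldg:lod2MultiSurface>
        </bldg:RoofSurface>
      </bldg:boundedBy>
    </bldg:Building>
  </cityObjectMember>
</CityModel>
"""


def summarise(root):
    """Count polygons and rings, those lacking gml:id, and surfaces per class."""
    geometries = (root.findall(".//gml:Polygon", NS)
                  + root.findall(".//gml:LinearRing", NS))
    missing = sum(1 for g in geometries if GML_ID not in g.attrib)
    semantics = {c: len(root.findall(f".//bldg:{c}", NS)) for c in SEMANTIC_CLASSES}
    return {"geometries": len(geometries), "missing_ids": missing, "semantics": semantics}


if __name__ == "__main__":
    print(summarise(ET.fromstring(SAMPLE)))
```

To inspect a downloaded file instead of the sample, pass ET.parse("LOD2_0_F0.gml").getroot() to summarise. For a file ending with _S0.gml, all three semantic counts would be expected to be zero.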

Thumbnails of some of the LODs in the dataset:

Intentionally erroneous datasets

An ancillary engine has been created to introduce simulated acquisition errors into the original dataset above. These data are suited to error propagation analyses (for instance, see my related paper). Furthermore, because of potentially broken topology in some files, their use may extend to other domains, such as testing validation and repair tools.

Positional error

| LOD | Disturbed with an error of | Erroneous dataset | Note |
|---|---|---|---|
| LOD1 | σ = 0.0 m (GT) | LOD1-F1H1 [5.7 MB] | Solid |
| LOD1 | σ = 0.2 m | LOD1-F1H1-0.2 [5.7 MB] | Solid |
| LOD2 | σ = 0.0 m (GT) | LOD2-F1 [8.7 MB] | Brep |
| LOD2 | σ = 0.2 m | LOD2-F1-0.2 [8.7 MB] | Brep |
| LOD3 | σ = 0.0 m (GT) | LOD3 [81.4 MB] | Brep |
| LOD3 | σ = 0.2 m | LOD3-0.2 [81.4 MB] | Brep |

  • The error is equivalent to the ISO 19157 spatial data quality element positional accuracy.

  • Absence of spatial correlation of uncertainties is assumed.

  • The uncertainties are equal for all coordinates. The vertical (z) coordinates are not treated separately.
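
The error model stated in these notes can be sketched as follows: every coordinate of every vertex receives independent, zero-mean Gaussian noise with the same standard deviation σ. This is a minimal illustration of that model and not the ancillary engine itself; the function names perturb and empirical_sigma and the test points are hypothetical.

```python
import random
import statistics


def perturb(vertices, sigma, rng):
    """Add independent zero-mean Gaussian noise to every x, y and z.

    The same `sigma` is used for all three coordinates, and no spatial
    correlation between vertices is modelled.
    """
    return [tuple(c + rng.gauss(0.0, sigma) for c in vertex) for vertex in vertices]


def empirical_sigma(original, disturbed):
    """Standard deviation of all coordinate differences, as a sanity check."""
    diffs = [d - o
             for v_orig, v_dist in zip(original, disturbed)
             for o, d in zip(v_orig, v_dist)]
    return statistics.pstdev(diffs)


if __name__ == "__main__":
    rng = random.Random(2014)
    ground_truth = [(float(i), float(i % 10), 3.0) for i in range(5000)]
    disturbed = perturb(ground_truth, 0.2, rng)
    print(f"requested sigma: 0.20 m, observed: {empirical_sigma(ground_truth, disturbed):.3f} m")
```

Note that this sketch treats each vertex occurrence independently. If a vertex is shared by several polygons, it would have to be disturbed once and reused to keep the surfaces connected.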

Overlapping objects

Overlapping objects are unwanted. Hence, these datasets may also be helpful for testing validation and repair software. The datasets below contain buildings that overlap, in multiple LODs.

| LOD | Erroneous dataset | Note |
|---|---|---|
| LOD2 | LOD2-overlap [519 kB] | Brep |
| LOD3 | LOD3-overlap [5.2 MB] | Brep |

Arbitrary semantics

These datasets contain shuffled semantic surfaces and/or missing semantic classes. The semantics for surfaces were uniformly randomised (1/3 for each of the following: GroundSurface, RoofSurface, WallSurface).

| LOD | Erroneous dataset | Note |
|---|---|---|
| LOD2 | LOD2-F1-arbitrarysemantics [2.2 MB] | Brep |
| LOD3 | LOD3-arbitrarysemantics [21.2 MB] | Brep. Missing Window and Door (substituted with the 3 classes). |
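
The randomisation of semantics described above can be illustrated with a short sketch in which each surface is assigned one of the three classes with equal probability, regardless of its true class. The surface identifiers and the function name shuffle_semantics are hypothetical, and the snippet is not the code that produced the datasets.

```python
import random
from collections import Counter

CLASSES = ("GroundSurface", "RoofSurface", "WallSurface")


def shuffle_semantics(surfaces, rng):
    """Assign each surface a class drawn uniformly from CLASSES.

    `surfaces` is a list of (surface_id, true_class) pairs; the true class is
    ignored, so a Window or Door also ends up in one of the three classes.
    """
    return [(surface_id, rng.choice(CLASSES)) for surface_id, _ in surfaces]


if __name__ == "__main__":
    rng = random.Random(3)
    true_classes = ("GroundSurface", "RoofSurface", "WallSurface", "Window", "Door")
    surfaces = [(f"s{i}", true_classes[i % 5]) for i in range(3000)]
    shuffled = shuffle_semantics(surfaces, rng)
    print(Counter(cls for _, cls in shuffled))
```

With enough surfaces, each class should receive roughly one third of them, which gives a quick check that the randomisation is uniform.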

Gross topological errors

The following LOD3 datasets contain topological errors such as broken solids and polygons whose interior rings lie outside their exterior boundary.

| LOD | Erroneous dataset | Note |
|---|---|---|
| LOD3 | LOD3-error-topology [21 MB] | Brep. Most of the errors are outlying windows. |
| LOD3 | LOD3_solid-error-topology [1.9 MB] | Solids. Most of the errors are self-overlapping roof tips. |

Conditions for use

This is free software and you do not need permission to use it or its data (in fact, I would be happy if you did). To increase visibility, you are kindly asked to do the following:

  • Wherever possible please acknowledge the source of the data (Random3Dcity) by citing the following research paper:

    • Biljecki, F., Ledoux, H., Stoter, J. (2016): Generation of multi-LOD 3D city models in CityGML with the procedural modelling engine Random3Dcity. ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci., vol. IV-4/W1: 51-59. [PDF] [DOI]

  • If you use the data and/or the software, please contact me, because I would like to list your project on this page as an example of use, and I would love to hear how you found it useful.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Showcase: publications and applications where the data has been used

The dataset has been used as data source for the following projects and/or publications:

Propagation of errors (investigating how uncertainty propagates through a GIS use-case)

  • Biljecki, F. et al. (2014): Error propagation in the computation of volumes in 3D city models with the Monte Carlo method. ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci., II-2, 31–39. [PDF] [DOI]

  • Biljecki, F. et al. (2015): Propagation of positional error in 3D GIS: estimation of the solar irradiation of building roofs. International Journal of Geographical Information Science, 29(12), 2269–2294. [PDF] [DOI]

  • Biljecki, F. et al. (2017): The effect of acquisition error and level of detail on the accuracy of spatial analyses. Cartography and Geographic Information Science, advance online publication. Open access [PDF] [DOI]

Validation and repair

Geometric references (research on the influence of different data specification to the usage of the data)

  • Biljecki, F. et al. (2016). The variants of an LOD of a 3D building model and their influence on spatial analyses. ISPRS Journal of Photogrammetry and Remote Sensing, 116, 42–54. [PDF] [DOI]

  • Biljecki, F. et al. (2014): Height references of CityGML LOD1 buildings and their influence on applications. Proceedings of the ISPRS 9th 3DGeoInfo Conference. [PDF] [DOI]

Other uses

  • Steuer, H. et al. (2015): Voluminator - Approximating the Volume of 3D Buildings to Overcome Topological Errors. Proceedings of the 18th AGILE Conference on Geographic Information Science [Link]

  • Biljecki, F. et al. (2015): Improving the consistency of multi-LOD CityGML datasets by removing redundancy. 3D Geoinformation Science. Selected papers of the ISPRS 9th 3DGeoInfo Conference. Lecture Notes in Geoinformation and Cartography. [PDF]

  • Afghantoloee, A. et al. (2014): Coverage estimation of geosensor in 3D vector environments. The 1st ISPRS International Conference on Geospatial Information Research. [PDF]

  • Doodman, S. et al. (2014): 3D extension of the Vor algorithm to determine and optimize the coverage of geosensor networks. The 1st ISPRS International Conference on Geospatial Information Research. [PDF]

  • Doodman, S. et al. (2014): A Voronoi-based approach for geosensor networks coverage determination and optimisation in 3D environments. Proceedings of the Extended Abstracts of 3D Geoinfo 2014.

  • Afghantoloee, A. et al. (2014): A perspective view-based approach to determine the geosensor coverage in 3D vector environments. Proceedings of the Extended Abstracts of 3D Geoinfo 2014.

Contact and feedback

Due to the completion of the project, I cannot promise updates. If you have a question feel free to contact me. Furthermore, please let me know if you use the data so I can put it in the showcase section.

Frequently asked questions

Yes, the project is open source. The code is released on the Random3Dcity Github project page.
They were developed in 2014 by Filip Biljecki, PhD candidate at the Delft University of Technology, the Netherlands.
To tackle the absence of multi-LOD datasets in CityGML, to test my refined LOD specification, and to address the lack of procedural modelling engines supporting CityGML.
The generated data has been used in multiple application domains (check the showcase above). Let me know if you have a new idea.
CityGML files can be converted to OBJ to increase their usability. You can do this with my tool CityGML2OBJs.
Yes, I will gladly comply provided it won't take me too much time. Further, if your research overlaps with mine we can have a chat about a potential collaboration.
Please cite the research paper that is published about Random3Dcity, and clearly state that the source of the data is the engine Random3Dcity developed at TU Delft.
Yes. In this case, the number of 900 buildings was selected as the optimal balance between the diversity of the data and the size of the dataset. You can download the software and generate any number of buildings.
Yes. I have coded the generation of a street network (the Transportation module in CityGML) and vegetation. However, buildings are the most prominent theme of 3D city modelling, so I have focused on this class. Implementing further thematic classes to obtain more realistic settings would take an amount of time that is not justifiable with respect to the resulting benefit (from the perspective of my PhD needs).
In theory yes, but I did not do much work here and such datasets are not readily available. I have made a modified version of the engine, where I have randomised also the position of the building, however, this resulted in many of the buildings overlapping with each other. This can be solved, but I don't see much benefit in having data which is not gridded. On the other hand, this shortcoming unexpectedly became a feature, because it is used to test geometry repair software.
Random3Dcity generates a basic interior in multiple LODs, based on a research paper from an MSc student at my department. However, "full" LOD4-grade models with rooms, doors and furniture are not available and will not be in the future, for several reasons. First, implementing interiors would be a major effort without much benefit for my project. Second, CityGML requires a higher degree of detail for the interior (rooms, doors, furniture), which is beyond the scope of this work. Third, LOD4 models are still mostly academic and do not have many applications.


This research is supported by the Dutch Technology Foundation STW, which is part of the Netherlands Organisation for Scientific Research (NWO), and which is partly funded by the Ministry of Economic Affairs. (Project code: 11300)