BerlinMOD
What is BerlinMOD?
BerlinMOD is a benchmark for spatio-temporal database management
systems (STDBMS). It is intended as a tool for both
- comparing different implementation details of STDBMS, such as moving
object data types, index structures, and spatio-temporal operators;
- comparing the performance of different STDBMS.
What Data is used by BerlinMOD?
BerlinMOD primarily measures the performance on queries employing
moving point data. Moving point data are sampled from simulated cars
driving on the street network of the German capital Berlin in a
representative way. The simulation models the behavior of workers
commuting between their homes and workplaces, as well as additional
trips in their leisure time.
Normally, the sampled data is mapped to the street network, but it
is also possible to disturb the data.
In generating the benchmark data, BerlinMOD relies on real spatial
data on the roads of the German capital Berlin, imported from the
tool bbbike (http://bbbike.de) written by Slaven Rezic, who kindly
gave us permission to use his data for scientific purposes.
The data is generated not by a dedicated data generator
program, but by a script for the extensible Secondo DBMS. This makes
BerlinMOD flexible, as you can easily modify the benchmark to your
own needs.
Of course it is possible to export the generated data to use it in
your favourite DBMS.
We have also generated the benchmark data for different scale factors.
The different datasets and their characteristics are given in Table 1,
which also provides links to download the data in CSV and ESRI shape
file format.
Table 1: Pregenerated BerlinMOD Data.
| Scale Factor | Days | Vehicles | Trips | Units | bbbike coords: CSV old | bbbike coords: CSV new | bbbike coords: Shape | wgs84 coords: CSV | wgs84 coords: Shape |
|---|---|---|---|---|---|---|---|---|---|
| 0.005 | 2 | 141 | 1,797 | 346,657 | CSV, 10 MB | CSV, 5 MB | SHAPE, 15 MB | CSV, 4 MB | SHAPE, 11 MB |
| 0.05 | 6 | 447 | 15,045 | 2,998,674 | CSV, 92 MB | CSV, 46 MB | SHAPE, 132 MB | CSV, 37 MB | SHAPE, 98 MB |
| 0.2 | 13 | 894 | 62,510 | 12,091,785 | CSV, 372 MB | CSV, 183 MB | SHAPE, 534 MB | CSV, 150 MB | SHAPE, 397 MB |
| 1.0 | 28 | 2,000 | 292,940 | 56,129,943 | CSV, 1.7 GB | CSV, 857 MB | SHAPE, 2.5 GB | CSV, 706 MB | SHAPE, 1.8 GB |
The indicated file sizes are those of the compressed archives. The unpacked CSV
data for scale factor 1.0 is about 11 GB, the uncompressed shape data about 23 GB.
A data format description for the files and the generator settings used are available.
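The Days and Vehicles columns of Table 1 are consistent with both values growing with the square root of the scale factor, starting from 28 days and 2,000 vehicles at scale factor 1.0. The text above does not state this rule, so the following Python sketch treats it as an assumption inferred from the four table rows; the function name `estimate_size` is hypothetical.

```python
import math

# Reference values at scale factor 1.0, taken from Table 1.
BASE_DAYS = 28
BASE_VEHICLES = 2000


def estimate_size(scale_factor: float) -> tuple[int, int]:
    """Estimate (days, vehicles) for a given scale factor.

    Assumption: both quantities scale with sqrt(scale_factor). This
    reproduces the rows of Table 1 but is not stated in the text.
    """
    root = math.sqrt(scale_factor)
    return round(BASE_DAYS * root), round(BASE_VEHICLES * root)


for sf in (0.005, 0.05, 0.2, 1.0):
    days, vehicles = estimate_size(sf)
    print(f"scale factor {sf}: {days} days, {vehicles} vehicles")
```

Such an estimate may help in choosing a scale factor before generating your own data; the numbers of trips and units depend on the simulation and are not covered by this sketch.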
After unzipping one of the files into the secondo/bin/BerlinMOD
directory, the file contents can be imported into an open Secondo
database by executing the suitable script for CSV or SHAPE.
Another pregenerated BerlinMOD dataset can be downloaded here (8.6 GB). It consists
of 750,000 trajectories in geographic coordinates with corresponding street names and
elevation data. After unpacking the file into your local secondo/bin directory, you
can import the data into an existing Secondo database via the command
restore Trips from Trips.
What Queries are used in BerlinMOD?
The benchmark uses two different forms of representation for the
created movements: the object-based approach (one position history per
vehicle, a concatenation of all the vehicle's movements during the
observation period) and the trip-based approach (one position history
per trip and vehicle). For each of these representations, the
benchmark provides 17 range-style queries (called BerlinMOD/R) and
9 nearest-neighbour queries (called BerlinMOD/NN).
The queries deal with predicates on
standard data, but mainly with spatial, temporal, and spatio-temporal
predicates.
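The relationship between the two representations can be sketched in Python. This is only an illustration of the idea described above, not the benchmark's actual schema: the class and field names (`Sample`, `Trip`, `vehicle_id`, and so on) are hypothetical, and positions are simplified to timestamped points.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One timestamped position (illustrative field names)."""
    t: float  # time, e.g. seconds since the start of the observation period
    x: float
    y: float


@dataclass
class Trip:
    """Trip-based representation: one position history per trip and vehicle."""
    vehicle_id: int
    trip_id: int
    samples: list[Sample]


def to_object_based(trips: list[Trip]) -> dict[int, list[Sample]]:
    """Object-based representation: one position history per vehicle.

    Each vehicle's trips are concatenated in order of their start time.
    Trips without samples are skipped.
    """
    histories: dict[int, list[Sample]] = defaultdict(list)
    non_empty = (trip for trip in trips if trip.samples)
    for trip in sorted(non_empty, key=lambda tr: tr.samples[0].t):
        histories[trip.vehicle_id].extend(trip.samples)
    return dict(histories)
```

In the trip-based form, a query can select individual trips directly; in the object-based form, the same query has to work on the complete history of a vehicle.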
The first version of BerlinMOD only provided the BerlinMOD/R queries.
An article on BerlinMOD was published in 2009 in the VLDB Journal:
Christian Düntgen, Thomas Behr, Ralf Hartmut Güting.
BerlinMOD: A Benchmark for Moving Object Databases.
The VLDB Journal 18:6 (2009), 1335-1368.
What do I need for BerlinMOD?
This depends on your intentions:
- You want to benchmark your system...
- Just download the pregenerated BerlinMOD data.
- You want to compare your system against Secondo...
- Download Secondo, the BerlinMOD base data files and the BerlinMOD script files.
- You want to use the generator to create your own data...
- Download Secondo, the BerlinMOD script files, and probably the BerlinMOD data files.
How do I use BerlinMOD?
- You want to benchmark your DBMS or just use the data...
- Unpack the downloaded data and import it into your system (the data format description may be useful). You can look up the benchmark queries in one of our articles and translate them into your query language.
- You want to compare your system against Secondo...
- Install Secondo on your test platform. Set up the BerlinMOD data generator to export the data to your preferred data format. Then start the data generator. Import the created data into your system and translate the benchmark queries. You can execute the benchmark object builder and query scripts on the same Secondo system.
- You want to use the generator to create your own data...
- Install Secondo and read our article to learn about the parameters and input files used. Then you can adapt the data generator to your needs.
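For the first case above (using the pregenerated data in your own system), a quick look at the unpacked CSV files can help before writing an importer. The following Python sketch makes no assumption about column names, which are defined in the data format description; it only reports the first row and the number of remaining rows of each file. The function names and the directory `BerlinMOD_data` are hypothetical.

```python
import csv
from pathlib import Path


def summarize_csv(path: Path) -> tuple[list[str], int]:
    """Return (first row, number of remaining rows) of a CSV file."""
    with path.open(newline="", encoding="utf-8", errors="replace") as handle:
        reader = csv.reader(handle)
        first_row = next(reader, [])
        remaining = sum(1 for _ in reader)
    return first_row, remaining


def summarize_directory(directory: Path) -> dict[str, tuple[list[str], int]]:
    """Summarize every *.csv file located directly in `directory`."""
    return {p.name: summarize_csv(p) for p in sorted(directory.glob("*.csv"))}


if __name__ == "__main__":
    # Hypothetical location of an unpacked archive; adjust to your setup.
    for name, (first_row, rows) in summarize_directory(Path("BerlinMOD_data")).items():
        print(f"{name}: first row = {first_row}, remaining rows = {rows}")
```

The files are read row by row, so the sketch should also work for the larger scale factors without loading a whole file into memory.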
Using the Data Generator
After installing Secondo (project website), copy the base data files
streets.data, homeRegions.data, and workRegions.data to your
Secondo binary directory. Extract the generator and benchmark scripts into your
Secondo binary directory.
You may now set up the data generator according to your wishes. For further details, please refer to our
article.
Change to your Secondo binary directory and type
SecondoTTYNT -i BerlinMOD_DataGenerator.SEC to generate the benchmark data.
Running the Benchmark on Secondo
You can also create the benchmark data, create all database objects (including indexes) for running the benchmark on the Secondo DBMS and run all benchmark queries by calling: SecondoTTYNT -i BerlinMOD_Complete.SEC
By default, a scale factor of 0.05 is used in these scripts.
Further information and instructions are contained in the script files and in the technical report.
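If you want to start the scripts from an automated test setup, the two command lines quoted above can be wrapped in a few lines of Python. This is a minimal sketch under stated assumptions: it assumes that the SecondoTTYNT executable can be found on your PATH and that `secondo_bin` points to your Secondo binary directory; the helper names `secondo_command` and `run_script` are hypothetical.

```python
import subprocess
from pathlib import Path


def secondo_command(script: str) -> list[str]:
    """Build the command line quoted in the text for a given script file."""
    return ["SecondoTTYNT", "-i", script]


def run_script(script: str, secondo_bin: Path) -> int:
    """Run a script from the Secondo binary directory and return the exit code.

    Assumption: SecondoTTYNT is on the PATH. If it is not, subprocess.run
    raises FileNotFoundError.
    """
    completed = subprocess.run(secondo_command(script), cwd=secondo_bin, check=False)
    return completed.returncode


# Example (not executed here):
# run_script("BerlinMOD_DataGenerator.SEC", Path("/path/to/secondo/bin"))
# run_script("BerlinMOD_Complete.SEC", Path("/path/to/secondo/bin"))
```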
Examples and Use Cases
BerlinMOD and the BerlinMOD data can be used in different ways to support researchers. Here, we present some examples.
Assessing Representations for Moving Object Histories
Here, the BerlinMOD data and the BerlinMOD/R query set were used to assess different representation variants for moving object data. The Secondo command scripts used for the experiments are available as a zip archive. The archive contains a file ReadMe.txt with further instructions. The scripts transform the BerlinMOD data into several different representation modes and create the corresponding index structures required by the BerlinMOD/R query scripts. This example may inspire you to use BerlinMOD and Secondo as a test bed for your own experiments in MODB research. There is also a paper discussing 5 example queries from these scripts, and a technical report describing the experiments and their results.
Spatiotemporal Pattern Queries
This paper proposes an elegant way to formulate Spatiotemporal Pattern queries. Some experiments were done in order to establish first efficiency results.
The BerlinMOD data generator was used to create the moving object data used in the tests: 50,000, 100,000, 200,000, and 300,000 cars over one day.
M.A. Sakr and R.H. Güting
Spatiotemporal Pattern Queries.
GeoInformatica 15:3 (2011), 497-540.
Efficient k-Nearest Neighbor Search on Moving Object Trajectories
BerlinMOD data was used for experiments comparing a new R*-tree index access method supporting k-NN queries on moving objects with two other methods using the TB-tree index. The BerlinMOD data generator was set up to create the "Cars" test data set: 2,000 cars moving over one day.
R.H. Güting, T. Behr, and J. Xu
Efficient k-Nearest Neighbor Search on Moving Object Trajectories.
The VLDB Journal 19:5 (2010), 687-714.
Last modification: 2011-09-20 by Thomas Behr.