Data Service

tl;dr

Debug Information

Image: registry.datalab.tuwien.ac.at/dbrepo/data-service:1.8.1

  • Ports: 9093/tcp
  • Info: http://<hostname>:9093/actuator/info
  • Health: http://<hostname>:9093/actuator/health
    • Readiness: http://<hostname>:9093/actuator/health/readiness
    • Liveness: http://<hostname>:9093/actuator/health/liveness
  • Prometheus: http://<hostname>:9093/actuator/prometheus
  • Swagger UI: http://<hostname>:9093/swagger-ui/index.html

To directly access the service in Kubernetes (e.g. for debugging), forward the svc port to your local machine:

kubectl [-n namespace] port-forward svc/data-service 9093:80

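After forwarding, the actuator endpoints listed above are reachable on your local machine, e.g. for a quick health check (a minimal sketch, assuming the default port mapping from the command above):

curl http://localhost:9093/actuator/health
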
Overview

The Data Service is responsible for inserting AMQP tuples from the Broker Service into the Data DB via Spring AMQP. To increase the number of consumers, scale the Data Service up.
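To scale up in a Kubernetes deployment, for example, increase the number of replicas (a minimal sketch, assuming the deployment is named data-service):

kubectl [-n namespace] scale deployment data-service --replicas=3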

Data Processing

The Data Service uses Apache Spark, a data engine that loads data from/into the Data Database through a wide range of open-source connectors. The default deployment runs Spark in local mode, embedded directly in the service, until a Bitnami Chart for Spark 4 becomes available.

Retrieving data from a subset internally generates a view named after the 64-character hash of the query. This view is currently not deleted automatically.

Caching

The Data Service uses Caffeine, a caching solution used to temporarily cache the connection details obtained from the Metadata Service so that they do not have to be queried every time e.g. a sensor measurement is inserted. By default, this information is cached for 60 minutes. System administrators can disable this behavior by setting CREDENTIAL_CACHE_TIMEOUT=0 (cache entries expire after 0 seconds).
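
For example, to cache the connection details for two hours instead of the default 60 minutes, set the timeout in seconds (a minimal sketch of the environment variable in shell syntax):

CREDENTIAL_CACHE_TIMEOUT=7200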

Storage

The Data Service is also capable of uploading files to the S3 backend. The default limit of Tomcat in Spring Boot is configured to be 2GB. You can provide your own limit by setting MAX_UPLOAD_SIZE.
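
For example, to set the limit explicitly (a minimal sketch in shell syntax; treating the value as bytes is an assumption, so verify the expected unit for your deployment):

# assumption: value interpreted as bytes (2 GB)
MAX_UPLOAD_SIZE=2000000000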

By default, the Data Service removes datasets older than 24 hours on a regular schedule, every 60 minutes. You can set MAX_AGE (in seconds) and S3_STALE_CRON to fit your use case. You can disable this feature by setting S3_STALE_CRON to -; however, this may eventually lead to storage issues, as stale datasets will no longer be removed and no space will be freed. Note that Spring Boot uses its own flavor of cron syntax.
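
For example, to treat datasets older than 12 hours as stale and run the cleanup every 6 hours (a minimal sketch in shell syntax; Spring's cron expressions use six fields, with seconds first):

# MAX_AGE is given in seconds (12 hours)
MAX_AGE=43200
# fields: second minute hour day-of-month month day-of-week
S3_STALE_CRON="0 0 */6 * * *"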

Limitations

  • View names in DBRepo can be at most 63 characters long (it is assumed that only internal views use the maximum length of 64 characters).
  • Apache Spark only runs in embedded local mode directly in the service, using a local[2] configuration.

Do you miss functionality? Do these limitations affect you?

We strongly encourage you to help us implement it, as we welcome contributors to our open-source software. Get in contact with us and we will happily answer requests for collaboration; please attach your CV and describe your programming experience!

Security

(none)