A list of puns related to "Time series database"
I have a Davis Pro2 weather station and I am having trouble connecting weewx on a Raspberry Pi 4 to my InfluxDB database with this GitHub code: https://github.com/matthewwall/weewx-influx
Do I need to install the setup.py version of weewx rather than the DEB Debian package for Raspbian? https://weewx.com/docs/usersguide.htm
I have 10 days of recordings across 150 sensors saved in 5-minute HDF5 files. Each file is about 1 million rows (at the lowest sample rate; there are higher ones) by 150 columns, one of which is the timestamp. There are about 3k of these files (so 3 billion rows x 150 columns at a minimum).

I need to be able to query by time (often disjoint, e.g., 10am-2pm each day) and apply various transformations (z-score, PCA/ICA, clustering, wavelets, etc.) to the resulting data before returning it. Currently I am getting all files and their write times (bash and Python), using [write_time - 5 minutes, write_time] as the window associated with each file, pulling in every file that overlaps the larger window I'm looking for (often hours), then concatenating the data and applying the transformations.

I would like to just query a database for some time period and get the associated data, or even better, run the transformations somewhere in the database layer before returning. What type of database or data storage system would be best suited for my data?
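If staying file-based is acceptable, one pattern that would get most of the way there is to repack the raw files once into a single indexed HDF5 table and push the time predicate into the read, so a query touches only the requested rows rather than whole files. A rough sketch with pandas, assuming each raw file loads cleanly as a DataFrame; the store path "sensors.h5" and the key "sensors" are placeholder names:

```python
# Sketch only: repack the 5-minute files once into a single indexed HDF5 table.
# "sensors.h5" and the key "sensors" are placeholder names.
import pandas as pd

def build_store(raw_files, store_path="sensors.h5"):
    """One-off conversion: append every raw file into one queryable table."""
    with pd.HDFStore(store_path, complevel=5, complib="blosc") as store:
        for path in raw_files:
            df = pd.read_hdf(path)                       # ~1M rows x 150 cols per file
            df.index = pd.to_datetime(df["timestamp"])
            store.append("sensors", df, format="table")  # table format is queryable

def query_window(start, end, store_path="sensors.h5"):
    """Read only the rows in [start, end); start/end are date strings."""
    with pd.HDFStore(store_path, mode="r") as store:
        return store.select(
            "sensors",
            where=f"index >= pd.Timestamp('{start}') & index < pd.Timestamp('{end}')",
        )
```

A time-series or columnar store partitioned on time (TimescaleDB, InfluxDB, Parquet plus a query engine) would give the same time-range selection and can push simple aggregations server-side, but the heavier transformations (PCA/ICA, wavelets, clustering) would most likely still run in Python either way.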
Has anyone seen any performance data that compares something like InfluxDB with Splunk metrics (https://docs.splunk.com/Documentation/Splunk/8.0.0/Metrics/Overview)?
I haven't seen any formal performance comparison specifically for Splunk metrics.
My company is interested in building a large-scale metrics collection database and is comparing contenders. I'm leaning toward something like Timescale, but we already have Splunk and we don't have a formal DB admin, so I'm interested in what your recommendation would be if you were building this.
Hi All,
Time series databases are becoming more and more common these days, but I couldn't find an easy, deployable solution for building one using DynamoDB, so I created one here: https://coderecipe.ai/architectures/24198611
Just thought I would share, let me know if it's helpful or if you have any suggestions!
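For anyone wondering what the storage layout for this usually looks like, here is a rough boto3 sketch of the common partition-key/sort-key arrangement for time series on DynamoDB; the table name, attribute names and key schema below are my own illustration and not necessarily what the linked recipe uses:

```python
# Sketch only: a common DynamoDB time-series layout, not necessarily the one
# the linked recipe uses. The table "metrics" is assumed to already exist
# with metric (partition key) + ts (sort key).
from datetime import datetime, timezone
from decimal import Decimal

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("metrics")

def put_point(metric, value, ts=None):
    """Write one data point; timestamps are sortable ISO-8601 strings."""
    ts = ts or datetime.now(timezone.utc).isoformat()
    table.put_item(Item={
        "metric": metric,              # partition key
        "ts": ts,                      # sort key
        "value": Decimal(str(value)),  # DynamoDB numbers must be Decimal, not float
    })

def query_range(metric, start_iso, end_iso):
    """Fetch all points for one metric inside a time window."""
    resp = table.query(
        KeyConditionExpression=Key("metric").eq(metric) & Key("ts").between(start_iso, end_iso)
    )
    return resp["Items"]
```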
First time poster. I know only a little about databases, but would appreciate any suggestions that anyone can give to point me in the right direction.
I've been working on an application to calibrate weather data. I want to store the results of my calibration in a database which can be accessed by my program and eventually to be displayed in a web application. All I need to store is one parameter in 5-minute increments for a 0.5x0.5km grid across the entire United States. This is something like 1.6 trillion rows generated per year. 90% of them would be zeros. The rest of the values would be real numbers, 2 bytes of precision would be plenty. It's possible I would add a couple other parameters in the future.
The most recent couple days of data will need to be updated relatively frequently (every hour or so) as new data comes in. After this it will be unlikely to change but could still be queried.
Data will generally be updated and queried for a contiguous area over a certain time domain.
I'm trying to figure out what kind of database I should think about using for this sort of problem. I have a little experience with SQLite and could imagine doing this with a column for x, a column for y, a column for datetime, and a column for the value. But I'm not sure that this is the optimal solution for such a large database, and I'd prefer not to have to start from scratch down the road.
Right now this is my personal project, but if/when it gets to this point hopefully my company will be on board, so an enterprise solution could be appropriate. I'm leaning toward running the application on an AWS instance.
Thanks in advance for any helpful suggestions! :)
Edit: The other thing I realized is significant is I would be willing to sacrifice being able to efficiently look up values in the database (such as, give me all the times the parameter was greater than such and such a value) as long as I can look it up by (x,y,t) location. Looking into array databases right now.
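Given that roughly 90% of the values are zero, one option that fits the (x, y, t) access pattern is to store only the non-zero values and treat a missing row as zero. A minimal sketch of that layout, using sqlite3 purely to illustrate the idea; the file, table, and column names are made up, and the same scheme carries over to PostgreSQL/TimescaleDB or an array database:

```python
# Sketch only: store non-zero values keyed by (x, y, t) and treat missing rows
# as zero. File/table/column names are made up; sqlite3 is used just to
# illustrate the layout.
import sqlite3

conn = sqlite3.connect("calibration.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS calib (
        x INTEGER NOT NULL,   -- grid column index
        y INTEGER NOT NULL,   -- grid row index
        t INTEGER NOT NULL,   -- epoch seconds, aligned to 5-minute steps
        v REAL    NOT NULL,   -- only non-zero values are stored
        PRIMARY KEY (x, y, t)
    ) WITHOUT ROWID
""")

def lookup(x, y, t):
    """Return the value at (x, y, t); a missing row means zero."""
    row = conn.execute(
        "SELECT v FROM calib WHERE x = ? AND y = ? AND t = ?", (x, y, t)
    ).fetchone()
    return row[0] if row else 0.0
```

Chunking or partitioning by time (and possibly by spatial tile) is what would keep the "contiguous area over a time domain" updates and queries cheap as the table grows.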
Suppose that I want to pull information from various sources once every hour, store it in my database, and write error logs if any of them fail. The app itself can read and write to this DB at will in response to user requests.
The idea I have right now is just to run the scraper script as a cron job every hour separate from the flask backend itself.
It seems ok but I feel like there's a more formal way of doing this.
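The cron approach is perfectly workable; the main thing a more formal setup adds is making sure each source fails independently and that failures end up in a log. A minimal sketch of the hourly script, where SOURCES, the log file name, and the body of fetch_and_store() are placeholders for the real endpoints and DB writes:

```python
# Sketch only: meant to be run by cron once an hour. SOURCES, the log file
# name, and the body of fetch_and_store() are placeholders.
import logging
import requests

logging.basicConfig(
    filename="scraper_errors.log",
    level=logging.ERROR,
    format="%(asctime)s %(levelname)s %(message)s",
)

SOURCES = ["https://example.com/feed1", "https://example.com/feed2"]

def fetch_and_store(url):
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    # ... parse resp and write the result to the same DB the Flask app reads ...

if __name__ == "__main__":
    for url in SOURCES:
        try:
            fetch_and_store(url)
        except Exception:
            # one failing source should not stop the others
            logging.exception("failed to pull %s", url)
```

If the scheduling should live inside the app instead of cron, a library such as APScheduler (or Celery beat, if a task queue is already around) can run the same function on an hourly interval.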
Hello, I need some help designing a database for time series. Here is my problem:
So it's about 500k values/year for a 1-minute time series and about 350/year for a 1-day time series. The sensors have been running for 5 years, so we're looking at 5 * 20 = 100 time series of about 2.5 million values each, and so on for the other time series. Of course, the stations and sensors are not synchronized (that's a shame).
Here is the pseudo-design I have in mind (I'm not a programmer; I tried my best):
First, station_table:
Then, sensor_type_table:
Then, timestamp_table:
Finally, measurement_table:
Then I create one station_table and one sensor_type_table and populate them with my stations and sensor types. Then, for each time series (20 stations * 20 sensors * 4 rates = 400), I create 400 timestamp_tables and 400 corresponding measurement_tables.
It feels very odd to me to create so many tables; am I right to be worried?
Then I would like to perform some analysis (FT, MA... around 20 or more indicators). Would it be crazy to create a new column per indicator inside the measurement_table, to be able to retrieve them quickly? Or do I need to create dedicated tables?
Oh, and the data will keep coming: the system is still running. I'll perform the analysis with Python (numpy) and will interact with the DB from Python. I was planning on using PostgreSQL; I've used it a few times for simple things (nothing involving time series).
In case you're wondering, it's a study of the 2D deformation of a solid (temperature is also needed to isolate some effects), and there are 20 experiments running in parallel.
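Rather than 400 timestamp/measurement table pairs, the usual relational layout is a single "long" measurement table keyed by (station, sensor, timestamp), with the sampling rate implied by the sensor type. A rough sketch of that design from Python with psycopg2; the table and column names are illustrative and the connection string is made up. Derived indicators (FT, MA, ...) can either be computed on the fly in numpy or stored in a separate table keyed the same way, rather than as extra columns on the raw measurements:

```python
# Sketch only: one "long" measurement table instead of 400 per-series tables.
# Table/column names are illustrative; the connection string is made up.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS station (
    station_id SERIAL PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS sensor_type (
    sensor_id  SERIAL PRIMARY KEY,
    name       TEXT NOT NULL,
    unit       TEXT
);
CREATE TABLE IF NOT EXISTS measurement (
    station_id INTEGER NOT NULL REFERENCES station,
    sensor_id  INTEGER NOT NULL REFERENCES sensor_type,
    ts         TIMESTAMPTZ NOT NULL,
    value      DOUBLE PRECISION NOT NULL,
    PRIMARY KEY (station_id, sensor_id, ts)
);
"""

with psycopg2.connect("dbname=deformation") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
```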
Ever wondered how far your monitoring database could scale on a single node? Me too. So I ran performance tests for InfluxDB, TimescaleDB and VictoriaMetrics on standard Google Cloud machine types, with CPU counts varying from 1 to 64 and RAM sizes varying from 3.75GB to 260GB. Read the resulting article.
After obtaining a list of 50 clusters with K-Shape, is there a method that can compare individual time-series samples from a different database with each of the 50 original clusters and get the same result as if they had been clustered with K-Shape? With reasonable accuracy, of course.
Sorry if it's something that can't be done. I'm trying to learn this from scratch for a school project.
(Additional info: The 2 databases consist of Forex price samples from random points in time.)
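If the original clustering was done with tslearn's KShape (an assumption on my part), the fitted model keeps the 50 shape centroids, so series from the second database can be assigned to them with predict() after the same per-series z-normalisation. A minimal sketch with random placeholder arrays standing in for the real Forex samples:

```python
# Sketch only: X_train_raw / X_new_raw are random placeholders for the real
# Forex samples; in practice both sets must have the same length and be
# preprocessed identically.
import numpy as np
from tslearn.clustering import KShape
from tslearn.preprocessing import TimeSeriesScalerMeanVariance

rng = np.random.default_rng(0)
X_train_raw = rng.normal(size=(500, 128))   # original samples used for clustering
X_new_raw = rng.normal(size=(40, 128))      # samples from the other database

# K-Shape works on z-normalised series; the scaler normalises each series
# independently, so applying it to the new data separately is fine.
scaler = TimeSeriesScalerMeanVariance()
X_train = scaler.fit_transform(X_train_raw)
X_new = scaler.fit_transform(X_new_raw)

ks = KShape(n_clusters=50, random_state=0)
ks.fit(X_train)                             # learns the 50 shape centroids

labels_new = ks.predict(X_new)              # nearest centroid for each new series
print(labels_new)
```

This assigns each new series to its nearest shape centroid; it will not be identical to re-running the clustering on the combined data, but it is the usual way to get consistent cluster labels for new samples.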
So I'm building a network monitor at ISP scale. Let's say 10 metrics * 26 interfaces * 50,000 switches.
I'm aiming for a 10-minute polling interval; I was going for 1 minute with Influx.
= 13M metrics per 10 minutes = 1.3M metrics per minute ≈ 22k metrics per second
If 10 metrics per interface is too much, I can move 5 of them to hourly polling.
Since we don't have SSDs in our servers we can't use InfluxDB, so I'm looking for a viable TSDB that works on spinning-disk storage.
I can order pretty much any number of CPU cores, any HDD size, and any amount of RAM (within reason).
Another question that comes to mind: do I have to cluster for this?
I've got a Node.js app (hate me for it, but it's blazing fast) for SNMP polling, with stats aggregation depending on the DB (statsD for Influx).
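For what it's worth, a quick back-of-envelope check of the numbers above (a sketch only, using the figures from the post):

```python
# Back-of-envelope ingest rate for 10 metrics x 26 interfaces x 50,000 switches
# polled every 10 minutes (numbers taken straight from the post).
metrics_per_interface = 10
interfaces_per_switch = 26
switches = 50_000
poll_interval_s = 10 * 60

points_per_poll = metrics_per_interface * interfaces_per_switch * switches
print(points_per_poll)                    # 13,000,000 points every 10 minutes
print(points_per_poll / poll_interval_s)  # ~21,700 points per second sustained
```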