Database Management System for Macroeconomic (and not only) Time-series

As far as I hear from various professors, they are adopting database management systems to organize their research data in macro. As they say, it requires some initial fixed cost to learn database management, but it saves you a lot of time in the future. I have measured it, and I spend about 8-10 hours per week organizing and updating the data I need for my research. I collect all the variables in an Excel file. When I need to update (I have reminders set for each update of each variable), I download a new Excel file and then update my database by linking several Excel files. Besides being time-consuming, this is also error-prone: if I change the location of any file, or the provider changes anything in the updated Excel file, my setup fails badly, and then I either have to copy the data manually or adjust everything accordingly.

So, I am ready to incur this initial cost with the expectation that the discounted lifetime profit will be positive. I have searched the internet and found plenty of database management systems (InfluxDB, SQLite, MySQL, PostgreSQL), but if I read all of the manuals and then try out several to find the best one, I am almost sure my discounted lifetime profit will be negative. :slight_smile: In order to optimize, I would like to ask experienced macro (and not only) researchers (I know many of you are here, and this is probably the best place to reach you) the following question:

  1. How do you organize your research data? (I follow about 132 series of different frequencies.) Or do you know of anyone who has documented this process?

My goal is not overly ambitious. With the database management system I want to:

  1. Create a database where all these and many more variables are stored (I want each variable to have a name, description, source description (ideally with the URL included), date of last update, etc.); see the schema sketch after this list.
  2. Be able to update the database automatically from downloaded Excel or CSV files. (These files come in different formats from various webpages; some are structured vertically, some horizontally.) See the loader sketch below.
  3. Have some error checks to make debugging the database easier (the loader sketch below includes a couple).
  4. Handle series of different frequencies (some are daily, some weekly, some monthly, some quarterly, and some annual). I want to be able to easily convert from daily to monthly, or from monthly to quarterly, or vice versa, using my own predetermined formula (be it the average, the median, or whatever the current task requires); see the resampling sketch below.
  5. Be able to easily query any variable, or several variables, and build a dashboard in programs like MATLAB, Python, or Julia to visualize their dynamics and use them for regressions or model estimation; the query sketch below shows the Python side.
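
For point 1, here is a minimal sketch of what the storage could look like, assuming SQLite (any of the systems named above would do; all table and column names here are made up for illustration):

```python
import sqlite3

# One table for per-series metadata, one for the observations themselves.
con = sqlite3.connect("macro.db")
con.executescript("""
CREATE TABLE IF NOT EXISTS series (
    series_id    TEXT PRIMARY KEY,  -- short mnemonic, e.g. 'GDP_Q' (illustrative)
    description  TEXT,
    source       TEXT,
    source_url   TEXT,
    frequency    TEXT,              -- 'D', 'W', 'M', 'Q', or 'A'
    last_update  TEXT               -- ISO date of the last successful refresh
);
CREATE TABLE IF NOT EXISTS observations (
    series_id  TEXT REFERENCES series(series_id),
    obs_date   TEXT,                -- ISO date
    value      REAL,
    PRIMARY KEY (series_id, obs_date)
);
""")
con.commit()
```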
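Points 2 and 3 could then be a small loader per provider: pandas reads both Excel and CSV, a transpose handles the horizontally structured files, and a couple of assertions catch the most common breakages before they reach the database. Again only a sketch; the exact column settings will differ per provider:

```python
import pandas as pd

def load_provider_file(path, series_id, transpose=False):
    """Read one provider's Excel/CSV export into a two-column (date, value) frame."""
    reader = pd.read_excel if path.endswith((".xls", ".xlsx")) else pd.read_csv
    df = reader(path)
    if transpose:                       # horizontally structured files
        df = df.T.reset_index()
    df.columns = ["obs_date", "value"]  # assumes date column first; adjust per provider
    df["obs_date"] = pd.to_datetime(df["obs_date"])

    # Error checks (point 3): fail loudly instead of silently corrupting the database.
    assert df["obs_date"].is_unique, f"{series_id}: duplicate dates in source file"
    assert df["value"].notna().any(), f"{series_id}: no values found in source file"
    return df

def upsert(con, series_id, df):
    """Insert new observations, replacing revised values for already-stored dates."""
    rows = [(series_id, d.date().isoformat(), float(v))
            for d, v in zip(df["obs_date"], df["value"]) if pd.notna(v)]
    con.executemany(
        "INSERT OR REPLACE INTO observations (series_id, obs_date, value) VALUES (?, ?, ?)",
        rows,
    )
    con.execute("UPDATE series SET last_update = date('now') WHERE series_id = ?",
                (series_id,))
    con.commit()
```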
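And points 4 and 5 fall out almost for free once the data sit in one place: pandas can resample between frequencies with whatever aggregation rule you choose, and the resulting series plugs straight into plotting or estimation code (MATLAB and Julia have their own database drivers). A sketch:

```python
import pandas as pd
import sqlite3

con = sqlite3.connect("macro.db")

def get_series(series_id, freq=None, how="mean"):
    """Pull one series, optionally resampled to `freq` with aggregation rule `how`.

    `freq` is a pandas offset alias such as 'ME' (month-end; 'M' on older
    pandas) or 'QE'; `how` can be 'mean', 'median', 'last', etc. (point 4).
    """
    s = pd.read_sql(
        "SELECT obs_date, value FROM observations WHERE series_id = ?",
        con, params=(series_id,), parse_dates=["obs_date"],
    ).set_index("obs_date")["value"].sort_index()
    if freq is not None:
        s = s.resample(freq).agg(how)
    return s

# e.g. a daily policy rate averaged to monthly, ready for a plot or a regression
# (the series name is hypothetical):
# rate_m = get_series("POLICY_RATE_D", freq="ME", how="mean")
```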

**2. Will your suggested system perform all of the above, or is it too much and should I stick to my old ways?**

I doubt there is a general solution. Given the wide range of your data, you need something that can handle the various APIs provided by the data providers.
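
For the providers that do expose an API, the fetch can be a one-liner; for instance, with the pandas-datareader package against FRED (the series code GDPC1, US real GDP, is just an example):

```python
import pandas_datareader.data as web

# Returns a date-indexed DataFrame fetched directly from FRED's API.
gdp = web.DataReader("GDPC1", "fred", start="2000-01-01")
```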

Thanks, Professor.

Plus, that something needs to be able to download Excel or CSV files automatically, then read through them and update the relevant variables in the database, because many data providers have no API through which to fetch data and simply put the data out in an Excel or CSV file. If I dive into the issue and come up with a solution, I will write a note and post it here.
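
The download step itself is easy to script, though; a minimal sketch with requests and pandas (the URL is a placeholder for a provider's fixed download link):

```python
import io
import requests
import pandas as pd

URL = "https://example.org/data/latest.xlsx"   # placeholder

resp = requests.get(URL, timeout=30)
resp.raise_for_status()                        # fail loudly if the provider changed something
df = pd.read_excel(io.BytesIO(resp.content))   # pd.read_csv for CSV downloads
```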

Yes, indeed. I am not completely sure, but I suspect you will need a pipeline of two parts: one script or program to fetch the data in the arbitrary formats offered without an API, and another to store the vintages of the data. From what I understand, the storage side, at least, can be done with SQL.
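
For the vintage part, the usual trick is to add the release (vintage) date to the primary key so that revisions never overwrite older values; a sketch in the same illustrative SQLite schema as above:

```python
con.execute("""
CREATE TABLE IF NOT EXISTS observation_vintages (
    series_id    TEXT,
    obs_date     TEXT,
    vintage_date TEXT,   -- the date this value was downloaded or released
    value        REAL,
    PRIMARY KEY (series_id, obs_date, vintage_date)
)
""")
# Querying "as of" some date then means taking, for each obs_date, the row
# with the latest vintage_date on or before that date.
```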