The objective of this project is to convert observations of temperatures and thermostat ON/OFF cycles in cooling or heating modes from text files into indexed time series in pandas and NumPy.
The package's functionality can easily be applied to other types of data as well. For example, batteries also involve duty cycles, and the package can work with time-stamped observations of state of charge or other measurements.
By convention, the primary time-stamped data will be from these sources:

- thermostat cycles (ON/OFF switching in cooling or heating mode)
- indoor temperature observations
- outdoor temperature observations
The package will format raw data and match the results across the sources based on metadata such as device IDs and/or location IDs and time, in order to form multi-dimensional time series.
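As an illustrative sketch only (not the package's internal implementation), matching two time-stamped sources on a shared device ID and time stamp in pandas could look like this; all data and column names here are hypothetical:

```python
import pandas as pd

# Hypothetical observations from two sources, each keyed by a
# (device ID, time stamp) pair. Names are illustrative only.
cycles = pd.DataFrame(
    {"id": [1, 1, 2],
     "time": pd.to_datetime(["2023-06-01 12:00", "2023-06-01 14:00",
                             "2023-06-01 12:00"]),
     "mode": ["COOL", "COOL", "COOL"]}
).set_index(["id", "time"])

temps = pd.DataFrame(
    {"id": [1, 1, 2],
     "time": pd.to_datetime(["2023-06-01 12:00", "2023-06-01 14:00",
                             "2023-06-01 12:00"]),
     "degrees": [74.0, 73.5, 71.0]}
).set_index(["id", "time"])

# Align the two sources on the shared (ID, time stamp) index,
# producing a combined, multi-column time series.
combined = cycles.join(temps, how="inner")
```

The inner join keeps only the (ID, time stamp) pairs present in both sources, so each row of the result pairs a cycle observation with its matching temperature reading.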
It can automatically detect the type of data in each column of a text file, based on the data itself and based on column labels. The detection allows for any ordering of columns in the input data.
Metadata files are needed in order to select data relating to thermostats or temperatures in ZIP codes in specific states. Example metadata files are also in the data folder.
This project is intended to accelerate analysis of time-stamped data including thermostat operations at the device level and temperature data (indoor or outdoor). It does this by putting observations into an indexed form that can be summarized in aggregated form and at the device level. It supports visualization in time series form. The ultimate intent is to support analysis related to HVAC control, power systems or energy efficiency research.
The package may be installed using pip or conda.
Python versions supported:
If no version is specified, the latest version of caar will be installed.
Pip installation from PyPI
pip install caar
Conda installation from Anaconda.org
conda install -c nickpowersys caar
Sample input files are in the data directory at https://github.com/nickpowersys/CaaR.
CaaR can be used to read delimited text files and (optionally) save the data in Python pickle files for fast access.
Common delimiters (commas, tabs, pipes and spaces) are detected in that order within the first row, and the first delimiter found is used. In all cases, a row is used only if its number of values matches the number of column labels in the first row.
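The detection order described above can be sketched roughly as follows. This is a simplified stand-in for illustration, not CaaR's actual code:

```python
def detect_delimiter(first_row: str) -> str:
    """Return the first delimiter found, checking comma, tab,
    pipe and space in that fixed order."""
    for delim in [",", "\t", "|", " "]:
        if delim in first_row:
            return delim
    raise ValueError("No supported delimiter found in the first row.")

def usable_rows(lines, delim, n_labels):
    """Keep only rows whose value count matches the label count."""
    return [line.split(delim) for line in lines
            if len(line.split(delim)) == n_labels]

header = "ThermostatId|StartTime|EndTime"   # hypothetical header row
delim = detect_delimiter(header)
```

Because commas are checked first, a file whose header happens to contain both commas and tabs would be treated as comma-delimited under this sketch.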
Each input file is expected to have (at least) columns representing IDs, time stamps (or starting and ending time stamps for cycles), and (if not cycles) corresponding observations.
To use the automatic column detection functionality, pass the keyword argument auto to the picklefromfile() or dictfromfile() function (see the notebook example or API documentation) and assign it one of the values 'cycles', 'inside', or 'outside' (for example, auto='inside').
The IDs should contain digits, and may also contain letters (leading zeroes are also allowed in place of letters). If the string 'id', 'Id' or 'ID' appears in a column heading, that column becomes the ID index within the combined ID-time stamp index for the given input file. If there is no such label, the leftmost column containing alphanumeric strings (for example, 'T12' or '0123') is taken as the ID.
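The ID-column heuristic above can be sketched as follows. This is an illustrative approximation (for instance, it matches 'id' case-insensitively), not CaaR's actual detection logic:

```python
def find_id_column(labels, sample_row):
    """Pick the ID column: prefer a heading containing 'id';
    otherwise take the leftmost column holding an alphanumeric
    string that includes a digit (e.g. 'T12' or '0123')."""
    for i, label in enumerate(labels):
        if "id" in label.lower():
            return i
    for i, value in enumerate(sample_row):
        if value.isalnum() and any(ch.isdigit() for ch in value):
            return i
    raise ValueError("No ID column detected.")
```

Values such as '2023-06-01' or '74.0' contain punctuation and so fail the alphanumeric check, which is why a column of strings like 'T12' stands out as the ID.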
Next, CaaR can create pandas DataFrames. CaaR and the pandas library offer many functions for summarizing and analyzing the data.
CaaR can convert DataFrames into NumPy time series arrays, for plotting/visualization and deeper data analysis.
The project is a work in progress, and I would welcome any feedback on features that would be useful.
This project is licensed under the terms of the BSD 3-Clause License.