Exploratory Analysis in Python
Use Python to explore and analyze data in Synnax
This guide will walk you through basic analysis on an example data set using Synnax. We’ll import the data set, explore it using the Synnax Console, run analysis, and attach post-processed results.
Prerequisites
Start a local cluster
This guide assumes you’ve started a local cluster running on localhost:9090
using the
instructions here. You’re more than welcome to use a remote
cluster, but you’ll need to keep in mind that those connection parameters may be different
than the ones used in this guide.
Install the Synnax Console
We’re going to use the Synnax Console to plot our data. To install it, follow the instructions here.
Install and authenticate the Python client
Finally, you’ll need to make sure you have the synnax
Python client installed and
authenticated with your cluster. A guide for doing so is available here.
Importing a data set
Our first step is to import a data set into Synnax. We’ll be using a sample CSV file we’ve generated just for this guide. To download it, run the following command:
curl -O https://raw.githubusercontent.com/synnaxlabs/synnax/main/docs/examples/april_9_wetdress.csv
Next, we’ll use the CLI to import our data set into Synnax. To do so, run
sy ingest april_9_wetdress.csv
This command will begin an interactive import process that will prompt us with a few questions:
Welcome to the Synnax Ingestion CLI! Let's get started.
Using saved credentials.
Connection successful!
Would you like to ingest all cahnnels? [y/n] (y):
This question is asking if we want to import data for all columns in the file. We do, so we’ll hit enter. Next, Synnax will check if all of the columns in the CSV correspond with existing channels in our cluster:
Validating that channels exist...
The following channels were not found in the database:
┏━━━━━━━━━━━━━━━━━┓
┃ name ┃
┡━━━━━━━━━━━━━━━━━┩
│ ec_pressure_5 │
│ ec_pressure_7 │
│ ec_pressure_9 │
│ ec_pressure_11 │
│ ec_pressure_12 │
│ ec_pressure_14 │
│ ec_pressure_19 │
│ ec_tc_0 │
│ ec_tc_1 │
│ Time │
└─────────────────┘
Would you like to create them? [y/n] (y):
We do want to create them, so we’ll just hit enter.
Any any channels indexes (e.g. timestamps)? [y/n] (y):
Synnax is asking if any of the columns in the contain timestamps that tell us when samples were taken. These types of columns are called “indexes” and are used to execute queries by time range. If you’d like to read more about the different channel types in Synnax, see this page.
In our case, we have a ‘Time’ column that contains timestamps, so we’ll enter y
and hit
enter. Synnax will then ask us to select the name of our index column:
You can enter 'all' for all channels or a comma-separated list of:
1) Names (e.g. 'channel1, channel2, channel3')
2) Channel indices (e.g. '1, 2, 3')
3) A pattern to match (e.g. 'channel*, sensor*')
channels:
We’ll enter Time
, and move on to the next step:
Do all non-indexed channels have the same data rate or index? [y/n] (y):
This question asks us if our Time
column represents the timestamps for all of the other
columns in the data set. In our case, it does, so we’ll enter y
and continue. Synnax
will then ask us to enter the name of our time column:
Enter the name of an index or a data rate:
Again, we’ll enter Time
and continue. Then, Synnax will ask us about how we’d like
to assign data types to our channels:
Please select an option for assigning data types:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ value ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Guess data types from file │
│ Assign the same data type to all channels (excluding indexes) │
│ Group channels by data type │
└───────────────────────────────────────────────────────────────┘
Select an option # [0/1/2] (0):
We’ll select the first option, which will automatically guess the channel’s data types from the data in the file. We’ve now successfully created all of the channels in our data set.
The next time we import a data set with the same channels, Synnax will automatically detect the channels and skip this process.
Now, Synnax will ask us to confirm the starting timestamp for our data set.
Identified start timestamp for file as 2023-04-10T12:07:23.662716-04:00.
Is this correct? [y/n] (y):
This looks right, so we’ll hit enter. Finally, Synnax will ask us to name our data set, which we’ll use to reference it in future steps.
Please enter a name for the data set
Name (4_9_wetdress_data_cleaned.csv):
We’ll enter April 9 Wetdress
and hit enter. Synnax will then begin importing the data
set. When it’s done, we’ll see the following message:
━━━━━━━━━━━━━━━━━━ 100% 85740 out of 85570 samples - 7477108.2235981515 samples/s
Plotting our data set
Now that we’ve imported our data set, we can use Python to analyze it. We’ll create a new Python file and instantiate the Synnax client:
import synnax as sy
client = sy.Synnax()
Next, we’ll retrieve the range referencing the data we just imported:
data = client.ranges.retrieve("April 9 Wetdress")
ec_pressure_12
on our data set contains our flowmeter’s pressure differential readings
in psi. To make things easier to work with, we’ll alias ec_pressure_12
to flow_dp
:
data.ec_pressure_12.set_alias("flow_dp")
Next, we’ll filter our data set to only include samples where flow_dp
is greater than
5 psi
pressure_mask = data.flow_dp > 5
filtered_dp = data.flow_dp[pressure_mask]
filtered_time = data.Time[pressure_mask]
We’ll also clean up our Time
channel to use elapsed seconds instead of unix epoch
nanoseconds. We can use the sy.elapsed_seconds
utility to accomplish this:
elapsed_time = sy.elapsed_seconds(filtered_time)
Finally, we’ll plot our data with matplotlib:
import matplotlib.pyplot as plt
plt.plot(elapsed_time, filtered_dp)
plt.xlabel("Elapsed Time (s)")
plt.ylabel("Flow DP (psi)")
plt.show()
We’ll save this file as plot.py
and run it with the following command:
python plot.py
This will open a window with the following plot: