About Anaconda Help Download Anaconda

octo / packages / ddui 3.0.4

  • 843 total downloads
  • Last upload: 5 years and 3 months ago

Installers

Info: This package contains files in non-standard labels.

pip install

To install this package run one of the following:
pip install -i https://pypi.anaconda.org/octo/simple ddui
pip install -i https://pypi.anaconda.org/octo/label/DD-191/simple ddui
pip install -i https://pypi.anaconda.org/octo/label/dev/simple ddui
pip install -i https://pypi.anaconda.org/octo/label/test_anaconda_version/simple ddui

Description

Airflow's DataDriver plugin

from Pandas' dataframes to Airflow pipelines

WHY :

In a machine learning project, there is a recurring problem with the difference between local interactive modeling source code and production pipelines source code. It is very error prone and, as a consequence, time consuming because we switch constantly between experimentation and production.

The Datadriver project aims to solve this issue by making the glue code based on Pandas and sklearn for modelization, and on Airflow for automation, scheduling, and monitoring of training and predicting pipelines.

Plugin description

Datadriver UI (ddui) is the Airflow's plugin we developed to track our models. Combined with the Datadriver's API (pyddapi), it offers a DAG view to track machine learning workflow (or dataflow).

More specifically, it shows the Output of any Airflow's Task with a lot of metrics and charts :

  • choose a DAG to track img/ddui_titan1.png
  • select a task to see charts and describe metrics on the outputtable img/ddui</em>titan3.png
  • look at histograms to verify if columns are correct (distributions, number of NAs, unique values, etc...) img/ddui_titan2.png

Getting started

git clone git_url_of_this_project && cd this_project

local install :

pip install -e .
ddui install

docker install :

./run_docker.sh

Package modules

ddui/
    dash_app -> the application defined like a Dash application, with callbacks and event handeling. It is imported in plugin.py later
    dash_components -> html custom components like a Panel or an Alert Div
    orm -> function to access the Airflow metastore and retrieve DAGs list and infos
    plot -> functions using plotly, they return a Graph object
    plugin -> defines the DataDriverUI plugin that implements Airflow's Plugin interface https://airflow.apache.org/plugins.html#interface
    views -> a FlaskAdminView that implements Dash too, to have the ability to include plotly charts in Airflow
dependencies graph

pydeps ddui

Developer setup

There is an existing DAG in tests/dags that mocks the behavior of Datadriver's API, but without any dependency to pyddapi.

You can use it to develop the User Interface, using the script located in tests/dev_tools.

cd tests/dev_tools
python run_webserver.py

It runs the Airflow's webserver, and it overrides the AIRFLOWCOREDAGS_FOLDER to look into tests/dags.

Setup your virtual env

virtualenv venv
source venv/bin/activate
pip install -e .
pip install -r ci/tests_requirements.txt
ddui install

© 2024 Anaconda, Inc. All Rights Reserved. (v4.0.4) Legal | Privacy Policy