Welcome to Cooka
Cooka is a lightweight, visual system for managing datasets and designing machine learning experiments through a web UI.
It uses DeepTables and HyperGBM as experiment engines to perform feature engineering, neural architecture search, and hyperparameter tuning automatically.

Features overview
Through the web UI provided by Cooka you can:
Add and analyze datasets
Design experiments
View experiment progress and results
Use models for prediction
Export experiments to Jupyter Notebook
Screenshots


The supported machine learning algorithms are:
XGBoost
LightGBM
CatBoost
The neural networks supported are:
WideDeep
DeepFM
xDeepFM
AutoInt
DCN
FGCNN
FiBiNet
PNN
AFM
The search algorithms supported are:
Evolution
MCTS (Monte Carlo Tree Search)
The supported feature engineering methods, provided by scikit-learn and featuretools, are:
- Scaler
StandardScaler
MinMaxScaler
RobustScaler
MaxAbsScaler
Normalizer
- Encoder
LabelEncoder
OneHotEncoder
OrdinalEncoder
- Discretizer
KBinsDiscretizer
Binarizer
- Dimension Reduction
PCA
- Feature derivation
featuretools
- Missing value filling
SimpleImputer
The search space can also be extended to support more feature engineering methods and modeling algorithms.
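These transformers follow the standard scikit-learn API. As a rough, illustrative sketch (not Cooka's internal code), a few of the listed steps can be combined into a preprocessing pipeline; the column names and toy data here are made up:

# Illustrative only: combining some of the listed scikit-learn transformers.
# Cooka/HyperGBM build and search such pipelines automatically.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data with a numeric and a categorical column, both containing missing values.
df = pd.DataFrame({
    "age": [23, 41, np.nan, 35],
    "city": ["NY", "SF", "NY", np.nan],
    "label": [0, 1, 0, 1],
})

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),            # missing value filling
    ("scale", StandardScaler()),                           # scaler
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),   # missing value filling
    ("encode", OneHotEncoder(handle_unknown="ignore")),    # encoder
])

preprocess = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", categorical, ["city"]),
])
X = preprocess.fit_transform(df.drop(columns=["label"]))
print(X.shape)   # 4 rows, 1 scaled numeric column + 2 one-hot columns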
Installation
You can install Cooka with Docker, with pip, or from source code.
Using pip
Cooka requires Python 3.6 or above. Use pip to install Cooka:
pip install --upgrade pip setuptools # (optional)
pip install cooka
Then start the Cooka web server:
cooka server
Open a browser and visit http://<your-ip>:8000
to use Cooka. If you want to integrate with Jupyter Notebook, please refer to the “Integrate with Jupyter Notebook” section below.
Using Docker
You can also run Cooka through Docker:
docker run -ti -p 8000:8000 -p 9001:9001 datacanvas/cooka:latest
# port 9001 is the supervisor web UI (used to manage processes); the account/password is user/123
# port 8000 is cooka web ui
Open a browser and visit http://<your-ip>:8000
to use Cooka.
If you want to integrate with Jupyter Notebook, specify the URL of the Jupyter server running in the container:
docker run -ti -p 8000:8000 -p 9001:9001 -p 8888:8888 -e COOKA_NOTEBOOK_PORTAL=http://<your_ip>:8888 datacanvas/cooka:latest
# port 8888 is jupyter notebook
You can persist data on the host:
docker run -v /path/to/cooka-config-dir:/root/.config/cooka -v /path/to/cooka-data:/root/cooka -ti -p 8000:8000 -p 9001:9001 datacanvas/cooka:latest
# Config file is at: /root/.config/cooka/cooka.py
# User data is at: /root/cooka
Using source code
The frontend is developed with React, so you need to install Node.js >= 8.0.0 (get it at https://nodejs.org) and yarn:
npm install yarn -g
Finally, build the frontend and install everything:
pip install --upgrade pip setuptools
git clone git@github.com:DataCanvasIO/Cooka.git
cd Cooka
python setup.py buildjs # build frontend
python setup.py install
User Guide
The purpose of this guide is to help you use Cooka in a web browser, covering:
Manage dataset
Preview dataset
Insight dataset
Design experiment
Experiment list
We recommend Chrome v59 or above to visit Cooka.
Manage dataset
You can upload data or import it from the server for training. The data file should meet the following conditions:
At least 2 columns
In CSV format
With or without headers
You can also choose a sampling strategy for analyzing the dataset. Supported sampling strategies:
by rows
by percentage
whole data
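As an illustration only (this is not part of Cooka), a file that meets these conditions can be checked and sampled locally with pandas before uploading; the file names, the 10,000-row cut, and the 10% fraction are arbitrary examples:

# Hypothetical pre-upload check: at least 2 columns, CSV format, headers optional.
# The sampling below only mirrors the idea of Cooka's server-side sampling strategies.
import pandas as pd

df = pd.read_csv("raw_data.csv")                          # assumed local file
assert df.shape[1] >= 2, "Cooka expects at least 2 columns"

sample_by_rows = df.head(10000)                           # "by rows" style sample
sample_by_percent = df.sample(frac=0.1, random_state=42)  # "by percentage" style sample

df.to_csv("dataset_with_headers.csv", index=False)                   # with headers
df.to_csv("dataset_without_headers.csv", index=False, header=False)  # without headers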
Datasets can be added on the dataset list page:
image
Upload
Users can upload files to Cooka through the browser to create datasets:
image
Import
Users can import files from the server into Cooka. This approach is friendly to large files. The import process is:
Enter the file path on the server and wait for the check to pass
Click the “Analyze” button; Cooka will analyze the file and display progress information in the bar on the right
Check the dataset name and click the “Create” button to confirm
image
Preview dataset
You can view the CSV file data as a table on the “Preview” page:
image
Insight dataset
Cooka analyzes the dataset and reports:
the distribution of feature types
the data type, feature type, number of unique values, missing percentage, and linear correlation of each feature
image
For categorical features, it shows the mode and the distribution of values:
image
For continuous features, you can view the min/max/mean/median/standard deviation and the value or interval distribution:
image
The value distribution:
image
For datetime features, you can see statistics by year, month, day, hour, and week:
image
Cooka flags poor-quality features and notes the reason, for example:
image
The reason may be:
Correlation is too low
Missing percentage is too high
Constant feature
Id-ness feature (every value is unique)
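For reference, similar per-feature statistics can be computed outside Cooka with plain pandas. The sketch below is an approximation, not Cooka's implementation; the DataFrame df and the target column name are assumed inputs:

# Approximate version of the per-feature statistics Cooka reports.
import pandas as pd

def feature_report(df: pd.DataFrame, target: str) -> pd.DataFrame:
    rows = []
    for col in df.columns:
        if col == target:
            continue
        s = df[col]
        uniques = s.nunique(dropna=True)
        missing_pct = s.isna().mean()
        if pd.api.types.is_numeric_dtype(s) and pd.api.types.is_numeric_dtype(df[target]):
            corr = s.corr(df[target])      # linear (Pearson) correlation with the target
        else:
            corr = None
        rows.append({
            "feature": col,
            "dtype": str(s.dtype),
            "uniques": uniques,
            "missing_pct": round(missing_pct, 4),
            "linear_corr": corr,
            "constant": uniques <= 1,                 # constant feature
            "id_ness": uniques == s.notna().sum(),    # every value is unique
        })
    return pd.DataFrame(rows)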
Design experiment
Users can design modeling experiments to formulate real-world problems as modeling tasks. On the data exploration page, users can select a column as the target:
image
On the experiment design page, you can choose either quick training mode or performance training mode. Quick mode uses a general search space and fewer search trials, while performance mode uses a more comprehensive search space and more trials; quick mode balances training time against model quality, while performance mode trades extra time for better model quality:
image
Cooka infers the task type from the target column. There are two experiment engines, HyperDT and HyperGBM, which use neural network and GBM algorithms respectively. If your data is in datetime order, you can select a datetime series column; Cooka will then use the older data for training and the newer data for testing the model:
image
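A datetime-ordered split of this kind can be sketched with pandas; the file name, the event_time column, and the 80/20 ratio are assumptions for illustration, not values used by Cooka:

# Sketch of a time-ordered split: older rows train the model, newer rows test it.
import pandas as pd

df = pd.read_csv("dataset.csv", parse_dates=["event_time"])  # assumed file and column
df = df.sort_values("event_time")

cut = int(len(df) * 0.8)                      # example 80/20 split
train, test = df.iloc[:cut], df.iloc[cut:]
print(train["event_time"].max(), "<=", test["event_time"].min())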
Experiment list
You can see the running status of training tasks on the experiment list page, such as the training progress and model score:
image
Cooka supports early stopping: when the model's performance stops improving, the training process is terminated early to save computing resources.
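The idea can be illustrated in a few lines of plain Python; this is a conceptual sketch, not Cooka's actual stopping rule, and run_trial, max_trials, and patience are made-up names:

# Conceptual early stopping: stop when the best score has not improved for
# `patience` consecutive trials.
def search_with_early_stopping(run_trial, max_trials=50, patience=5):
    best_score, stale = float("-inf"), 0
    for i in range(max_trials):
        score = run_trial(i)          # caller trains and scores one candidate model
        if score > best_score:
            best_score, stale = score, 0
        else:
            stale += 1
        if stale >= patience:
            break                     # no recent improvement: terminate early
    return best_score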
Model evaluation
For completed experiments, you can see the model evaluation, including:
Confusion matrix and ROC curve for binary classification
Evaluation metrics
image
The ROC curve:
image
During the search process, the Y-axis values represent the hyperparameters used in training, and the color of each line represents the model score (the darker the color, the higher the score):
image
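For reference, the same kinds of evaluation artifacts can be computed with scikit-learn once predictions are available; the toy labels and probabilities below are made up:

# Confusion matrix, ROC curve points, and a few of the listed metrics for a
# binary classifier, computed with scikit-learn on toy data.
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             roc_auc_score, roc_curve)

y_true = np.array([0, 1, 1, 0, 1, 0])
y_prob = np.array([0.2, 0.8, 0.6, 0.3, 0.9, 0.4])
y_pred = (y_prob >= 0.5).astype(int)

print(confusion_matrix(y_true, y_pred))
print("accuracy:", accuracy_score(y_true, y_pred))
print("f1:", f1_score(y_true, y_pred))
print("auc:", roc_auc_score(y_true, y_prob))
fpr, tpr, thresholds = roc_curve(y_true, y_prob)  # points of the ROC curve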
Model predict
The model is saved when training finishes. Users can upload test data to make predictions with the model, and the prediction progress is displayed on the page:
image
The results can be downloaded when the prediction is finished.
Export to notebook
You can export an experiment to a notebook for custom modeling:
image
An example notebook:
img
The notebook also contains an explanation of the prediction:
img
And feature importance:
img
If it is a binary classification task, the ROC curve and confusion matrix of the model will also be included.
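The prediction explanation and feature importance in the notebook are produced with the shap package (see the dependency list in the Jupyter integration section). Below is a minimal, self-contained sketch of that kind of code; the dataset and model are toy stand-ins, and the notebook Cooka generates may structure this differently:

# Minimal shap sketch on a toy model; not the code Cooka generates.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target
model = GradientBoostingClassifier().fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)       # per-feature contribution for each row

shap.summary_plot(shap_values, X)            # global feature importance
shap.force_plot(explainer.expected_value, shap_values[0], X.iloc[0], matplotlib=True)  # explain one prediction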
Configuration
Configuration file
Cooka provides a command to generate a config template:
❯ cooka generate-config
# Configuration file for cooka
# HTTP Server port
# c.CookaApp.server_port = 8000
# Language, zh_CN or en_US, auto; if is auto will read localization from use browser
# c.CookaApp.language = "auto"
# Data to storage
# c.CookaApp.data_directory = "~/cooka"
# Integrate with jupyter, Jupyter notebook work dir should at `c.CookaApp.data_directory`
# c.CookaApp.notebook_portal = "http://localhost:8888"
# Default optimize metric
# c.CookaApp.optimize_metric = {
# "multi_classification_optimize": "accuracy",
# "binary_classification": "auc",
# "regression": "rmse"
# }
# Default trial nums
# c.CookaApp.max_trials = {
# "performance": 50,
# "quick": 10,
# "minimal": 1
# }
Write it to the configuration file at ~/.config/cooka/cooka.py:
mkdir -p ~/.config/cooka/
cooka generate-config > ~/.config/cooka/cooka.py
Integrate with Jupyter Notebook
1. Install dependencies
In the experiment list, you can export an experiment to a notebook. This requires the following Python modules:
shap: model explanation
jupyterlab: notebook server
matplotlib: plotting in the notebook
You may refer to shap's installation documentation to install shap.
Install jupyterlab using pip:
pip install jupyterlab
matplotlib depends on the system package graphviz. Taking CentOS 7 as an example, install it with:
yum install graphviz
Then install matplotlib using pip:
pip install matplotlib
2. Start jupyter
Start a JupyterLab server in the Cooka working directory, which defaults to ~/cooka:
cd ~/cooka
jupyter-lab --ip=0.0.0.0 --no-browser --allow-root --NotebookApp.token=
3. Configure cooka
Then configure the notebook portal in the Cooka config file ~/.config/cooka/cooka.py:
c.CookaApp.notebook_portal = "http://<change_to_you_jupyter_ip>:8888"
Finally, start the web server and try exporting an experiment to a notebook:
cooka server
Release Note
Version 0.1.2
Experiment design: support specifying a random state for splitting the dataset
Experiment list: use the reward metric in the visual map of the hyperparameter line chart (bug fix)
Other: update HyperGBM to 0.2.2
Version 0.1.1
- Dataset management
  - Search
  - Delete
  - Upload or import CSV
  - Sampling analysis
  - Support files without column headers
  - Infer feature types
- Dataset preview
  - View the original file online
  - Scrolling
- Dataset insight
  - Distribution of feature types
  - Data type, feature type, missing percentage, uniques, linear correlation
  - Recognize id-ness, constant, and high-missing-percentage features
  - Feature search
  - Datetime features
    - Display by year, month, day, hour, week
  - Categorical features
    - Distribution of values
    - Mode
  - Continuous features
    - Distribution of intervals
    - Distribution of values
    - Max, min, median, mean, standard deviation
- Experiment design
  - Recommended experiment options
  - HyperGBM and HyperDT as experiment engines
  - Quick and performance training modes
  - Train-Validation-Holdout data partition
  - Split data in datetime order
  - Support binary classification, multi-classification, and regression
- Experiment list
  - Training progress
  - Remaining time estimation
  - Confusion matrix and ROC curve for binary classification
  - Evaluation metrics
    - Binary classification: Accuracy, F1, Fbeta, Precision, Recall, AUC, Log Loss
    - Multi-classification: Accuracy, F1, Fbeta, Precision, Recall, Log Loss
    - Regression: EVS, MAE, MSE, RMSE, MSLE, R2, MedianAE
  - View training log and source code
  - Export to notebook
  - Hyperparameters
  - Batch predict
DataCanvas

Cooka is an open source project created by DataCanvas.