[Notes] pytorch_tabular
-
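Repository: https://github.com/manujosephv/pytorch_tabular
Introduction: A standard framework for building deep learning models on tabular data, built on PyTorch and PyTorch Lightning.
PyTorch Tabular aims to make deep learning on tabular data easy and usable for both real-world applications and research.
Installation: pip install pytorch_tabular[all]
Documentation: https://pytorch-tabular.readthedocs.io/en/latest/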
Available models:
- FeedForward Network with Category Embedding is a simple feed-forward network with embedding layers for the categorical columns.
- Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data (NODE) is a model presented at ICLR 2020; according to its authors, it beats well-tuned gradient boosting models on many datasets.
- TabNet: Attentive Interpretable Tabular Learning is another model from Google Research; it applies sparse attention over multiple decision steps to model the output.
- Mixture Density Networks is a regression model that uses Gaussian components to approximate the target distribution and provides probabilistic predictions out of the box.
- AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks is a model that learns interactions between features automatically, builds a better representation from them, and then uses that representation in the downstream task.
- TabTransformer is an adaptation of the Transformer model for tabular data that creates contextual representations for categorical features.
- FT Transformer, from Revisiting Deep Learning Models for Tabular Data, adapts the Transformer to tabular data by turning both categorical and numerical features into tokens (Feature Tokenizer + Transformer).
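The Mixture Density Networks entry can be made concrete with a small sketch. Assuming the network head has already produced, for one input row, mixture logits, component means and positive standard deviations (hypothetical numbers here, not PyTorch Tabular's API), the predictive density is a weighted sum of Gaussians, p(y|x) = sum_k pi_k * N(y; mu_k, sigma_k), and its mean and variance follow from the law of total variance. Plain Python keeps the arithmetic visible; this is not the library's implementation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw head outputs into mixture weights pi_k that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_pdf(y: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at y; sigma must be positive
    (an MDN head typically enforces this with exp or softplus)."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def mixture_pdf(y: float, weights: Sequence[float], mus: Sequence[float],
                sigmas: Sequence[float]) -> float:
    """p(y | x) = sum_k pi_k * N(y; mu_k, sigma_k)."""
    return sum(w * gaussian_pdf(y, m, s) for w, m, s in zip(weights, mus, sigmas))


def mixture_mean_var(weights: Sequence[float], mus: Sequence[float],
                     sigmas: Sequence[float]) -> Tuple[float, float]:
    """Mean and variance of the mixture (law of total variance)."""
    mean = sum(w * m for w, m in zip(weights, mus))
    second_moment = sum(w * (s ** 2 + m ** 2) for w, m, s in zip(weights, mus, sigmas))
    return mean, second_moment - mean ** 2


def nll(y: float, weights: Sequence[float], mus: Sequence[float],
        sigmas: Sequence[float]) -> float:
    """Negative log-likelihood of the observed target: the usual MDN training loss."""
    return -math.log(mixture_pdf(y, weights, mus, sigmas))
```

For example, two equally weighted components N(0, 1) and N(2, 1) (logits [0.0, 0.0]) give a predictive mean of 1.0 and a variance of 2.0: the spread comes both from each component's own variance and from the distance between the component means, which a single point prediction cannot express.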
Usage:
from pytorch_tabular import TabularModel
from pytorch_tabular.models import CategoryEmbeddingModelConfig
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig, ExperimentConfig

# Assumes train, val and test are pandas DataFrames that contain a "target" column,
# and num_col_names / cat_col_names are lists of the continuous / categorical column names.
# The arguments follow an early release; newer releases have changed some of them,
# so check the documentation for the version you install.

data_config = DataConfig(
    # target must always be a list. Multiple targets are only supported for regression;
    # multi-task classification is not implemented.
    target=["target"],
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)
trainer_config = TrainerConfig(
    auto_lr_find=True,  # run the LR finder to derive a learning rate automatically
    batch_size=1024,
    max_epochs=100,
    gpus=1,  # index of the GPU to use; 0 means CPU
)
optimizer_config = OptimizerConfig()

model_config = CategoryEmbeddingModelConfig(
    task="classification",
    layers="1024-512-512",  # number of nodes in each hidden layer
    activation="LeakyReLU",  # activation between layers
    learning_rate=1e-3,
)

tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)
tabular_model.fit(train=train, validation=val)
result = tabular_model.evaluate(test)
pred_df = tabular_model.predict(test)
tabular_model.save_model("examples/basic")
loaded_model = TabularModel.load_from_checkpoint("examples/basic")
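To illustrate what a category-embedding feed-forward model like the one configured above computes, here is a toy forward pass in plain Python, assuming randomly initialized weights and a made-up input row. Each categorical column indexes its own embedding table, the looked-up vectors are concatenated with the continuous features, and a LeakyReLU feed-forward stack follows whose widths come from a hyphen-separated string in the style of layers="1024-512-512". The names CategoryEmbeddingMLP and parse_layers are hypothetical; the library's actual model is written in PyTorch and adds pieces such as a task-specific output head, which this sketch omits.

```python
import random
from typing import List


def parse_layers(spec: str) -> List[int]:
    """'1024-512-512' -> [1024, 512, 512]: one hidden layer per hyphen-separated width."""
    return [int(part) for part in spec.split("-")]


def leaky_relu(x: float, slope: float = 0.01) -> float:
    return x if x > 0 else slope * x


class CategoryEmbeddingMLP:
    """Hypothetical toy model: embed each categorical column, concatenate the
    embeddings with the continuous features, then apply a feed-forward stack."""

    def __init__(self, cardinalities: List[int], emb_dim: int, n_continuous: int,
                 layers: str, seed: int = 0):
        rng = random.Random(seed)
        # One embedding table per categorical column: cardinality rows x emb_dim columns.
        self.embeddings = [
            [[rng.gauss(0, 0.1) for _ in range(emb_dim)] for _ in range(card)]
            for card in cardinalities
        ]
        sizes = [len(cardinalities) * emb_dim + n_continuous] + parse_layers(layers)
        # Dense layers as (weights, biases); weights[j][i] connects input i to unit j.
        self.dense = [
            ([[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])
        ]

    def forward(self, cat_codes: List[int], continuous: List[float]) -> List[float]:
        x: List[float] = []
        for table, code in zip(self.embeddings, cat_codes):
            x.extend(table[code])  # embedding lookup for this categorical column
        x.extend(continuous)       # continuous features are appended as-is
        for weights, biases in self.dense:
            x = [leaky_relu(sum(w * v for w, v in zip(row, x)) + b)
                 for row, b in zip(weights, biases)]
        return x
```

For example, CategoryEmbeddingMLP([3, 5], emb_dim=2, n_continuous=4, layers="8-4").forward([1, 4], [0.1, 0.2, 0.3, 0.4]) returns 4 values, one per unit of the last hidden layer. The embedding tables are what distinguish this from a plain feed-forward network: each category gets a learned dense vector instead of a one-hot column.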