Commit 17c13e5

add attention layer and AITM model

yangxudong committed Jun 13, 2024
1 parent 871a40e commit 17c13e5
Showing 17 changed files with 526 additions and 99 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -63,7 +63,7 @@ Running Platform:
- [DSSM](docs/source/models/dssm.md) / [MIND](docs/source/models/mind.md) / [DropoutNet](docs/source/models/dropoutnet.md) / [CoMetricLearningI2I](docs/source/models/co_metric_learning_i2i.md) / [PDN](docs/source/models/pdn.md)
- [W&D](docs/source/models/wide_and_deep.md) / [DeepFM](docs/source/models/deepfm.md) / [MultiTower](docs/source/models/multi_tower.md) / [DCN](docs/source/models/dcn.md) / [FiBiNet](docs/source/models/fibinet.md) / [MaskNet](docs/source/models/masknet.md) / [PPNet](docs/source/models/ppnet.md) / [CDN](docs/source/models/cdn.md)
- [DIN](docs/source/models/din.md) / [BST](docs/source/models/bst.md) / [CL4SRec](docs/source/models/cl4srec.md)
- [MMoE](docs/source/models/mmoe.md) / [ESMM](docs/source/models/esmm.md) / [DBMTL](docs/source/models/dbmtl.md) / [PLE](docs/source/models/ple.md)
- [MMoE](docs/source/models/mmoe.md) / [ESMM](docs/source/models/esmm.md) / [DBMTL](docs/source/models/dbmtl.md) / [AITM](docs/source/models/aitm.md) / [PLE](docs/source/models/ple.md)
- [HighwayNetwork](docs/source/models/highway.md) / [CMBF](docs/source/models/cmbf.md) / [UNITER](docs/source/models/uniter.md)
- More models in development

Binary file added docs/images/models/aitm.jpg
16 changes: 8 additions & 8 deletions docs/source/component/backbone.md
@@ -1111,14 +1111,14 @@ Results on the MovieLens-1M dataset:

## 2. Feature Interaction Components

| Class          | Function                               | Description                        | Example                                                                                                                      |
| -------------- | -------------------------------------- | ---------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| FM             | Second-order interaction               | Component of the DeepFM model      | [Example 2](#deepfm)                                                                                                         |
| DotInteraction | Second-order inner-product interaction | Component of the DLRM model        | [Example 4](#dlrm)                                                                                                           |
| Cross          | Bit-wise interaction                   | Component of the DCN v2 model      | [Example 3](#dcn)                                                                                                            |
| BiLinear       | Bilinear interaction                   | Component of the FiBiNet model     | [fibinet_on_movielens.config](https://github.com/alibaba/EasyRec/tree/master/examples/configs/fibinet_on_movielens.config)   |
| FiBiNet        | SENet & BiLinear                       | FiBiNet model                      | [fibinet_on_movielens.config](https://github.com/alibaba/EasyRec/tree/master/examples/configs/fibinet_on_movielens.config)   |
| Attention      | Dot-product attention                  | Component of the Transformer model |                                                                                                                              |

## 3. Feature Importance Learning Components

38 changes: 19 additions & 19 deletions docs/source/component/component.md
@@ -86,25 +86,25 @@ Dot-product attention layer, a.k.a. Luong-style attention.
The calculation follows these steps:

1. Calculate attention scores using query and key with shape (batch_size, Tq, Tv).
2. Use the scores to calculate a softmax distribution with shape (batch_size, Tq, Tv).
3. Use the softmax distribution to create a linear combination of value with shape (batch_size, Tq, dim).

| Parameter               | Type   | Default | Description                                                                                                                                                                                                                              |
| ----------------------- | ------ | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| use_scale               | bool   | False   | If True, will create a scalar variable to scale the attention scores.                                                                                                                                                                    |
| score_mode              | string | dot     | Function to use to compute attention scores, one of {"dot", "concat"}. "dot" refers to the dot product between the query and key vectors. "concat" refers to the hyperbolic tangent of the concatenation of the query and key vectors.   |
| dropout                 | float  | 0.0     | Float between 0 and 1. Fraction of the units to drop for the attention scores.                                                                                                                                                            |
| seed                    | int    | None    | A Python integer to use as a random seed when dropout is enabled.                                                                                                                                                                         |
| return_attention_scores | bool   | False   | If True, returns the attention scores (after masking and softmax) as an additional output argument.                                                                                                                                      |
| use_causal_mask         | bool   | False   | Set to True for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents the flow of information from the future towards the past.                                                       |

- inputs: List of the following tensors:
  - query: Query tensor of shape (batch_size, Tq, dim).
  - value: Value tensor of shape (batch_size, Tv, dim).
  - key: Optional key tensor of shape (batch_size, Tv, dim). If not given, will use value for both key and value, which is the most common case.
- output:
  - Attention outputs of shape (batch_size, Tq, dim).
  - (Optional) Attention scores after masking and softmax with shape (batch_size, Tq, Tv).
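
The three steps are easy to verify outside the framework. Below is a minimal NumPy sketch of the default `score_mode="dot"` path, with no scaling, dropout, or masking; shapes follow the table above, and the function name is illustrative rather than EasyRec's API.

```python
import numpy as np

def dot_product_attention(query, value, key=None):
    """Luong-style dot-product attention (no scaling, dropout, or masking).

    query: (batch_size, Tq, dim)
    value: (batch_size, Tv, dim)
    key:   optional (batch_size, Tv, dim); defaults to value.
    Returns outputs (batch_size, Tq, dim) and scores (batch_size, Tq, Tv).
    """
    if key is None:
        key = value  # the most common case
    # 1. Attention scores from query and key: (batch_size, Tq, Tv)
    scores = np.einsum('bqd,bvd->bqv', query, key)
    # 2. Softmax over the Tv axis (max-subtraction for numerical stability)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # 3. Linear combination of value: (batch_size, Tq, dim)
    outputs = np.einsum('bqv,bvd->bqd', weights, value)
    return outputs, weights

q = np.random.randn(2, 3, 8)  # batch_size=2, Tq=3, dim=8
v = np.random.randn(2, 5, 8)  # batch_size=2, Tv=5, dim=8
out, attn = dot_product_attention(q, v)
assert out.shape == (2, 3, 8) and attn.shape == (2, 3, 5)
```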

## 3. Feature Importance Learning Components

118 changes: 118 additions & 0 deletions docs/source/models/aitm.md
@@ -0,0 +1,118 @@
# AITM

### Overview

In recommendation scenarios, a user's conversion funnel usually involves multiple intermediate steps (impression -> click -> conversion). AITM is a multi-task modeling framework that makes full use of the samples at every step of the funnel to improve the model's conversion-rate estimates for the later steps.

![AITM](../../images/models/aitm.jpg)

1. (a) Expert-Bottom pattern, e.g. [MMoE](mmoe.md)
2. (b) Probability-Transfer pattern, e.g. [ESMM](esmm.md)
3. (c) Adaptive Information Transfer Multi-task (AITM) framework

Two distinguishing features:

1. An attention mechanism fuses the feature representations of the different targets (see the sketch below);
2. An auxiliary loss calibrates predictions to the behavioral order of the tasks (ORDER_CALIBRATE_LOSS, illustrated in the configuration section).
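
For the first point, the paper's AIT module can be read as single-head attention over two candidates per sample: the information transferred from the previous task's tower and the current tower's own representation. The sketch below is a simplified TensorFlow rendition of that reading; `AITModule` and its argument names are illustrative, not EasyRec's actual implementation.

```python
import tensorflow as tf

class AITModule(tf.keras.layers.Layer):
    """Simplified AIT module: attention over {transferred info, current tower output}."""

    def __init__(self, project_dim, **kwargs):
        super().__init__(**kwargs)
        self.dim = project_dim
        self.q = tf.keras.layers.Dense(project_dim)  # query projection
        self.k = tf.keras.layers.Dense(project_dim)  # key projection
        self.v = tf.keras.layers.Dense(project_dim)  # value projection

    def call(self, transferred, current):
        # Candidate set per sample: (batch_size, 2, project_dim)
        u = tf.stack([transferred, current], axis=1)
        q, k, v = self.q(u), self.k(u), self.v(u)
        # One score per candidate: <q_j, k_j> / sqrt(dim) -> (batch_size, 2)
        scores = tf.reduce_sum(q * k, axis=-1) / self.dim ** 0.5
        weights = tf.nn.softmax(scores, axis=1)
        # Fused representation: weighted sum of candidates -> (batch_size, project_dim)
        return tf.reduce_sum(weights[..., None] * v, axis=1)

ait = AITModule(project_dim=128)
prev_info = tf.random.normal([4, 128])  # info transferred from the ctr tower
cur_repr = tf.random.normal([4, 128])   # the cvr tower's own representation
fused = ait(prev_info, cur_repr)        # shape (4, 128)
```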

### Configuration

```protobuf
model_config {
model_name: "AITM"
model_class: "MultiTaskModel"
feature_groups {
group_name: "all"
feature_names: "user_id"
feature_names: "cms_segid"
...
feature_names: "tag_brand_list"
wide_deep: DEEP
}
backbone {
blocks {
name: "mlp"
inputs {
feature_group_name: "all"
}
keras_layer {
class_name: 'MLP'
mlp {
hidden_units: [512, 256]
}
}
}
}
model_params {
task_towers {
tower_name: "ctr"
label_name: "clk"
loss_type: CLASSIFICATION
metrics_set: {
auc {}
}
dnn {
hidden_units: [256, 128]
}
use_ait_module: true
weight: 1.0
}
task_towers {
tower_name: "cvr"
label_name: "buy"
losses {
loss_type: CLASSIFICATION
}
losses {
loss_type: ORDER_CALIBRATE_LOSS
}
metrics_set: {
auc {}
}
dnn {
hidden_units: [256, 128]
}
relation_tower_names: ["ctr"]
use_ait_module: true
ait_project_dim: 128
weight: 1.0
}
l2_regularization: 1e-6
}
embedding_regularization: 5e-6
}
```

- model_name: any custom string; it serves only as a comment

- model_class: 'MultiTaskModel'; do not change it. Every multi-objective ranking model assembled from components uses this name

- feature_groups: configures one group of features.

- backbone: the backbone network assembled from components; see the [reference docs](../component/backbone.md)

  - blocks: a directed acyclic graph (DAG) of `component blocks`; the framework executes the code associated with each `component block` in the DAG's topological order, building a subgraph of the TF Graph
  - name/inputs: each `block` has a unique name (name) and one or more inputs (inputs) and outputs
  - keras_layer: loads the custom or built-in Keras layer specified by `class_name` and runs its logic; see the [reference docs](../component/backbone.md#keraslayer)
  - mlp: parameters of the MLP component; see the [reference docs](../component/component.md#id1)

- model_params: AITM-specific parameters

  - task_towers: configure one task tower per task
    - tower_name
    - dnn: parameters of the deep part
      - hidden_units: the number of channels, i.e. neurons, in each DNN layer
    - use_ait_module: if true, the `AITM` model is used; otherwise the [DBMTL](dbmtl.md) model is used
    - ait_project_dim: the dimension of each tower's representation vector; setting it to the size of the last hidden layer usually suffices
    - The defaults assume binary classification: num_class defaults to 1, weight to 1.0, loss_type to CLASSIFICATION, and metrics_set to auc
    - loss_type: ORDER_CALIBRATE_LOSS is an auxiliary loss that calibrates predictions using the dependency order between targets (see the sketch after this list); details are in the original paper
  - Note: label_fields must align one-to-one with task_towers.

- embedding_regularization: regularization applied to the embedding part to prevent overfitting
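
ORDER_CALIBRATE_LOSS encodes the funnel constraint that a later step cannot be more likely than the earlier one (a purchase presupposes a click). Under the paper's formulation it is a hinge on the probability gap; the sketch below is a simplified reading, and `order_calibrate_loss` is an illustrative name rather than EasyRec's internal function.

```python
import tensorflow as tf

def order_calibrate_loss(p_prev, p_next):
    """Auxiliary calibration loss: penalize p_next > p_prev (sketch).

    p_prev: probabilities of the earlier task, e.g. pCTR, shape (batch_size,)
    p_next: probabilities of the later task, e.g. pCVR, shape (batch_size,)
    Returns mean(max(p_next - p_prev, 0)).
    """
    return tf.reduce_mean(tf.maximum(p_next - p_prev, 0.0))

p_ctr = tf.constant([0.9, 0.2, 0.5])
p_cvr = tf.constant([0.3, 0.4, 0.5])
# Only the second sample violates the order (0.4 > 0.2) and is penalized.
print(order_calibrate_loss(p_ctr, p_cvr).numpy())  # ~0.0667
```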

### Example Config

- [AITM_demo.config](https://github.com/alibaba/EasyRec/blob/master/samples/model_config/aitm_on_taobao.config)

### Reference

[AITM: Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning in Targeted Display Advertising](https://arxiv.org/pdf/2105.08489.pdf)
6 changes: 4 additions & 2 deletions docs/source/models/loss.md
@@ -19,6 +19,7 @@ EasyRec supports two ways of configuring losses: 1) use a single loss function; 2
| PAIRWISE_LOGISTIC_LOSS | pair-level logistic loss; supports custom pair grouping                                                                      |
| JRC_LOSS               | binary classification + listwise ranking loss                                                                                |
| F1_REWEIGHTED_LOSS     | a loss that tunes the relative weight of recall vs. precision in binary classification; effective against class imbalance    |
| ORDER_CALIBRATE_LOSS   | auxiliary loss that calibrates predictions using the dependency order between targets; see the [AITM](aitm.md) model         |

- Note on SOFTMAX_CROSS_ENTROPY_WITH_NEGATIVE_MINING
  - Supports parameter configuration; upgraded to [support vector guided softmax loss](https://128.84.21.199/abs/1812.11317)
@@ -71,9 +72,9 @@ EasyRec supports two ways of configuring losses: 1) use a single loss function; 2

- f1_beta_square: values greater than 1 make the model focus more on recall; values less than 1 make it focus more on precision
- The F1 score, also called the balanced F score, is defined as the harmonic mean of precision and recall.
- ![f1 score](../images/other/f1_score.svg)
- ![f1 score](../../images/other/f1_score.svg)
- More generally, the F_beta score is defined as:
- ![f_beta score](../images/other/f_beta_score.svg)
- ![f_beta score](../../images/other/f_beta_score.svg)
- f1_beta_square is the square of the beta coefficient in the formula above.
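
For reference, the two SVG images above render the standard definitions, with f1_beta_square corresponding to the beta-squared factor:

```latex
F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}
\qquad
F_\beta = \frac{(1 + \beta^2) \cdot \mathrm{precision} \cdot \mathrm{recall}}{\beta^2 \cdot \mathrm{precision} + \mathrm{recall}}
```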

- Parameters of PAIRWISE_FOCAL_LOSS
@@ -159,3 +160,4 @@ EasyRec supports two ways of configuring losses: 1) use a single loss function; 2

- Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
- [Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning](https://arxiv.org/abs/2111.10603)
- [AITM: Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning in Targeted Display Advertising](https://arxiv.org/pdf/2105.08489.pdf)
1 change: 1 addition & 0 deletions docs/source/models/multi_target.rst
@@ -7,5 +7,6 @@
esmm
mmoe
dbmtl
aitm
ple
simple_multi_task
2 changes: 1 addition & 1 deletion easy_rec/python/layers/keras/__init__.py
@@ -1,5 +1,5 @@
from .auxiliary_loss import AuxiliaryLoss
from .attention import Attention
from .auxiliary_loss import AuxiliaryLoss
from .blocks import MLP
from .blocks import Gate
from .blocks import Highway