Commit 17c13e5

add attention layer and AITM model

yangxudong committed Jun 13, 2024
1 parent 871a40e commit 17c13e5
Showing 17 changed files with 526 additions and 99 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -63,7 +63,7 @@ Running Platform:
- [DSSM](docs/source/models/dssm.md) / [MIND](docs/source/models/mind.md) / [DropoutNet](docs/source/models/dropoutnet.md) / [CoMetricLearningI2I](docs/source/models/co_metric_learning_i2i.md) / [PDN](docs/source/models/pdn.md)
- [W&D](docs/source/models/wide_and_deep.md) / [DeepFM](docs/source/models/deepfm.md) / [MultiTower](docs/source/models/multi_tower.md) / [DCN](docs/source/models/dcn.md) / [FiBiNet](docs/source/models/fibinet.md) / [MaskNet](docs/source/models/masknet.md) / [PPNet](docs/source/models/ppnet.md) / [CDN](docs/source/models/cdn.md)
- [DIN](docs/source/models/din.md) / [BST](docs/source/models/bst.md) / [CL4SRec](docs/source/models/cl4srec.md)
- [MMoE](docs/source/models/mmoe.md) / [ESMM](docs/source/models/esmm.md) / [DBMTL](docs/source/models/dbmtl.md) / [PLE](docs/source/models/ple.md)
- [MMoE](docs/source/models/mmoe.md) / [ESMM](docs/source/models/esmm.md) / [DBMTL](docs/source/models/dbmtl.md) / [AITM](docs/source/models/aitm.md) / [PLE](docs/source/models/ple.md)
- [HighwayNetwork](docs/source/models/highway.md) / [CMBF](docs/source/models/cmbf.md) / [UNITER](docs/source/models/uniter.md)
- More models in development

Binary file added docs/images/models/aitm.jpg
16 changes: 8 additions & 8 deletions docs/source/component/backbone.md
@@ -1111,14 +1111,14 @@ Results on the MovieLens-1M dataset:

## 2. Feature Interaction Components

| Class          | Function                               | Description                        | Example                                                                                                                      |
| -------------- | -------------------------------------- | ---------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| FM             | Second-order interaction               | Component of the DeepFM model      | [Example 2](#deepfm)                                                                                                         |
| DotInteraction | Second-order inner-product interaction | Component of the DLRM model        | [Example 4](#dlrm)                                                                                                           |
| Cross          | Bit-wise interaction                   | Component of the DCN v2 model      | [Example 3](#dcn)                                                                                                            |
| BiLinear       | Bilinear interaction                   | Component of the FiBiNet model     | [fibinet_on_movielens.config](https://github.com/alibaba/EasyRec/tree/master/examples/configs/fibinet_on_movielens.config)   |
| FiBiNet        | SENet & BiLinear                       | FiBiNet model                      | [fibinet_on_movielens.config](https://github.com/alibaba/EasyRec/tree/master/examples/configs/fibinet_on_movielens.config)   |
| Attention      | Dot-product attention                  | Component of the Transformer model |                                                                                                                              |

## 3. Feature Importance Learning Components

38 changes: 19 additions & 19 deletions docs/source/component/component.md
@@ -86,25 +86,25 @@ Dot-product attention layer, a.k.a. Luong-style attention.
The calculation follows these steps:

1. Calculate attention scores using query and key with shape (batch_size, Tq, Tv).
2. Use the scores to calculate a softmax distribution with shape (batch_size, Tq, Tv).
3. Use the softmax distribution to create a linear combination of value with shape (batch_size, Tq, dim).

| Parameter               | Type   | Default | Description                                                                                                                                                                                                                              |
| ----------------------- | ------ | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| use_scale               | bool   | False   | If True, will create a scalar variable to scale the attention scores.                                                                                                                                                                    |
| score_mode              | string | dot     | Function to use to compute attention scores, one of {"dot", "concat"}. "dot" refers to the dot product between the query and key vectors. "concat" refers to the hyperbolic tangent of the concatenation of the query and key vectors.   |
| dropout                 | float  | 0.0     | Float between 0 and 1. Fraction of the units to drop for the attention scores.                                                                                                                                                            |
| seed                    | int    | None    | A Python integer to use as a random seed when dropout is enabled.                                                                                                                                                                         |
| return_attention_scores | bool   | False   | If True, returns the attention scores (after masking and softmax) as an additional output argument.                                                                                                                                      |
| use_causal_mask         | bool   | False   | Set to True for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents the flow of information from the future towards the past.                                                       |

- inputs: List of the following tensors:
  - query: Query tensor of shape (batch_size, Tq, dim).
  - value: Value tensor of shape (batch_size, Tv, dim).
  - key: Optional key tensor of shape (batch_size, Tv, dim). If not given, will use value for both key and value, which is the most common case.
- output:
  - Attention outputs of shape (batch_size, Tq, dim).
  - (Optional) Attention scores after masking and softmax with shape (batch_size, Tq, Tv).
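
The three steps are easy to verify outside the framework. Below is a minimal NumPy sketch of the default `score_mode="dot"` path, with no scaling, dropout, or masking; shapes follow the table above, and the function name is illustrative rather than EasyRec's API.

```python
import numpy as np

def dot_product_attention(query, value, key=None):
    """Luong-style dot-product attention (no scaling, dropout, or masking).

    query: (batch_size, Tq, dim)
    value: (batch_size, Tv, dim)
    key:   optional (batch_size, Tv, dim); defaults to value.
    Returns outputs (batch_size, Tq, dim) and scores (batch_size, Tq, Tv).
    """
    if key is None:
        key = value  # the most common case
    # 1. Attention scores from query and key: (batch_size, Tq, Tv)
    scores = np.einsum('bqd,bvd->bqv', query, key)
    # 2. Softmax over the Tv axis (max-subtraction for numerical stability)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # 3. Linear combination of value: (batch_size, Tq, dim)
    outputs = np.einsum('bqv,bvd->bqd', weights, value)
    return outputs, weights

q = np.random.randn(2, 3, 8)  # batch_size=2, Tq=3, dim=8
v = np.random.randn(2, 5, 8)  # batch_size=2, Tv=5, dim=8
out, attn = dot_product_attention(q, v)
assert out.shape == (2, 3, 8) and attn.shape == (2, 3, 5)
```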

## 3. Feature Importance Learning Components

118 changes: 118 additions & 0 deletions docs/source/models/aitm.md
@@ -0,0 +1,118 @@
# AITM

### Overview

In recommendation scenarios, a user's conversion funnel usually involves multiple intermediate steps (impression -> click -> conversion). AITM is a multi-task modeling framework that makes full use of the samples at every step of the funnel to improve the model's conversion-rate estimates for the later steps.

![AITM](../../images/models/aitm.jpg)

1. (a) Expert-Bottom pattern, e.g. [MMoE](mmoe.md)
2. (b) Probability-Transfer pattern, e.g. [ESMM](esmm.md)
3. (c) Adaptive Information Transfer Multi-task (AITM) framework

Two distinguishing features:

1. An attention mechanism fuses the feature representations of the different targets (see the sketch below);
2. An auxiliary loss calibrates predictions to the behavioral order of the tasks (ORDER_CALIBRATE_LOSS, illustrated in the configuration section).
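
For the first point, the paper's AIT module can be read as single-head attention over two candidates per sample: the information transferred from the previous task's tower and the current tower's own representation. The sketch below is a simplified TensorFlow rendition of that reading; `AITModule` and its argument names are illustrative, not EasyRec's actual implementation.

```python
import tensorflow as tf

class AITModule(tf.keras.layers.Layer):
    """Simplified AIT module: attention over {transferred info, current tower output}."""

    def __init__(self, project_dim, **kwargs):
        super().__init__(**kwargs)
        self.dim = project_dim
        self.q = tf.keras.layers.Dense(project_dim)  # query projection
        self.k = tf.keras.layers.Dense(project_dim)  # key projection
        self.v = tf.keras.layers.Dense(project_dim)  # value projection

    def call(self, transferred, current):
        # Candidate set per sample: (batch_size, 2, project_dim)
        u = tf.stack([transferred, current], axis=1)
        q, k, v = self.q(u), self.k(u), self.v(u)
        # One score per candidate: <q_j, k_j> / sqrt(dim) -> (batch_size, 2)
        scores = tf.reduce_sum(q * k, axis=-1) / self.dim ** 0.5
        weights = tf.nn.softmax(scores, axis=1)
        # Fused representation: weighted sum of candidates -> (batch_size, project_dim)
        return tf.reduce_sum(weights[..., None] * v, axis=1)

ait = AITModule(project_dim=128)
prev_info = tf.random.normal([4, 128])  # info transferred from the ctr tower
cur_repr = tf.random.normal([4, 128])   # the cvr tower's own representation
fused = ait(prev_info, cur_repr)        # shape (4, 128)
```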

### Configuration

```protobuf
model_config {
model_name: "AITM"
model_class: "MultiTaskModel"
feature_groups {
group_name: "all"
feature_names: "user_id"
feature_names: "cms_segid"
...
feature_names: "tag_brand_list"
wide_deep: DEEP
}
backbone {
blocks {
name: "mlp"
inputs {
feature_group_name: "all"
}
keras_layer {
class_name: 'MLP'
mlp {
hidden_units: [512, 256]
}
}
}
}
model_params {
task_towers {
tower_name: "ctr"
label_name: "clk"
loss_type: CLASSIFICATION
metrics_set: {
auc {}
}
dnn {
hidden_units: [256, 128]
}
use_ait_module: true
weight: 1.0
}
task_towers {
tower_name: "cvr"
label_name: "buy"
losses {
loss_type: CLASSIFICATION
}
losses {
loss_type: ORDER_CALIBRATE_LOSS
}
metrics_set: {
auc {}
}
dnn {
hidden_units: [256, 128]
}
relation_tower_names: ["ctr"]
use_ait_module: true
ait_project_dim: 128
weight: 1.0
}
l2_regularization: 1e-6
}
embedding_regularization: 5e-6
}
```

- model_name: any custom string; it serves only as a comment

- model_class: 'MultiTaskModel'; do not change it. Every multi-objective ranking model assembled from components uses this name

- feature_groups: configures one group of features.

- backbone: the backbone network assembled from components; see the [reference docs](../component/backbone.md)

  - blocks: a directed acyclic graph (DAG) of `component blocks`; the framework executes the code associated with each `component block` in the DAG's topological order, building a subgraph of the TF Graph
  - name/inputs: each `block` has a unique name (name) and one or more inputs (inputs) and outputs
  - keras_layer: loads the custom or built-in Keras layer specified by `class_name` and runs its logic; see the [reference docs](../component/backbone.md#keraslayer)
  - mlp: parameters of the MLP component; see the [reference docs](../component/component.md#id1)

- model_params: AITM-specific parameters

  - task_towers: configure one task tower per task
    - tower_name
    - dnn: parameters of the deep part
      - hidden_units: the number of channels, i.e. neurons, in each DNN layer
    - use_ait_module: if true, the `AITM` model is used; otherwise the [DBMTL](dbmtl.md) model is used
    - ait_project_dim: the dimension of each tower's representation vector; setting it to the size of the last hidden layer usually suffices
    - The defaults assume binary classification: num_class defaults to 1, weight to 1.0, loss_type to CLASSIFICATION, and metrics_set to auc
    - loss_type: ORDER_CALIBRATE_LOSS is an auxiliary loss that calibrates predictions using the dependency order between targets (see the sketch after this list); details are in the original paper
  - Note: label_fields must align one-to-one with task_towers.

- embedding_regularization: regularization applied to the embedding part to prevent overfitting
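
ORDER_CALIBRATE_LOSS encodes the funnel constraint that a later step cannot be more likely than the earlier one (a purchase presupposes a click). Under the paper's formulation it is a hinge on the probability gap; the sketch below is a simplified reading, and `order_calibrate_loss` is an illustrative name rather than EasyRec's internal function.

```python
import tensorflow as tf

def order_calibrate_loss(p_prev, p_next):
    """Auxiliary calibration loss: penalize p_next > p_prev (sketch).

    p_prev: probabilities of the earlier task, e.g. pCTR, shape (batch_size,)
    p_next: probabilities of the later task, e.g. pCVR, shape (batch_size,)
    Returns mean(max(p_next - p_prev, 0)).
    """
    return tf.reduce_mean(tf.maximum(p_next - p_prev, 0.0))

p_ctr = tf.constant([0.9, 0.2, 0.5])
p_cvr = tf.constant([0.3, 0.4, 0.5])
# Only the second sample violates the order (0.4 > 0.2) and is penalized.
print(order_calibrate_loss(p_ctr, p_cvr).numpy())  # ~0.0667
```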

### Example Config

- [AITM_demo.config](https://github.com/alibaba/EasyRec/blob/master/samples/model_config/aitm_on_taobao.config)

### Reference

[AITM: Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning in Targeted Display Advertising](https://arxiv.org/pdf/2105.08489.pdf)
6 changes: 4 additions & 2 deletions docs/source/models/loss.md
@@ -19,6 +19,7 @@ EasyRec supports two ways of configuring losses: 1) use a single loss function; 2
| PAIRWISE_LOGISTIC_LOSS | pair-level logistic loss; supports custom pair grouping                                                                      |
| JRC_LOSS               | binary classification + listwise ranking loss                                                                                |
| F1_REWEIGHTED_LOSS     | a loss that tunes the relative weight of recall vs. precision in binary classification; effective against class imbalance    |
| ORDER_CALIBRATE_LOSS   | auxiliary loss that calibrates predictions using the dependency order between targets; see the [AITM](aitm.md) model         |

- Note on SOFTMAX_CROSS_ENTROPY_WITH_NEGATIVE_MINING
  - Supports parameter configuration; upgraded to [support vector guided softmax loss](https://128.84.21.199/abs/1812.11317)
@@ -71,9 +72,9 @@ EasyRec supports two ways of configuring losses: 1) use a single loss function; 2

- f1_beta_square: values greater than 1 make the model focus more on recall; values less than 1 make it focus more on precision
- The F1 score, also called the balanced F score, is defined as the harmonic mean of precision and recall.
- ![f1 score](../images/other/f1_score.svg)
- ![f1 score](../../images/other/f1_score.svg)
- More generally, the F_beta score is defined as:
- ![f_beta score](../images/other/f_beta_score.svg)
- ![f_beta score](../../images/other/f_beta_score.svg)
- f1_beta_square is the square of the beta coefficient in the formula above.
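
For reference, the two SVG images above render the standard definitions, with f1_beta_square corresponding to the beta-squared factor:

```latex
F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}
\qquad
F_\beta = \frac{(1 + \beta^2) \cdot \mathrm{precision} \cdot \mathrm{recall}}{\beta^2 \cdot \mathrm{precision} + \mathrm{recall}}
```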

- Parameters of PAIRWISE_FOCAL_LOSS
@@ -159,3 +160,4 @@ EasyRec supports two ways of configuring losses: 1) use a single loss function; 2

- Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
- [Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning](https://arxiv.org/abs/2111.10603)
- [AITM: Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning in Targeted Display Advertising](https://arxiv.org/pdf/2105.08489.pdf)
1 change: 1 addition & 0 deletions docs/source/models/multi_target.rst
@@ -7,5 +7,6 @@
esmm
mmoe
dbmtl
aitm
ple
simple_multi_task
2 changes: 1 addition & 1 deletion easy_rec/python/layers/keras/__init__.py
@@ -1,5 +1,5 @@
from .auxiliary_loss import AuxiliaryLoss
from .attention import Attention
from .auxiliary_loss import AuxiliaryLoss
from .blocks import MLP
from .blocks import Gate
from .blocks import Highway