SJTUTables is a collection of benchmark datasets for Relational Table Learning (RTL). Released as part of the rLLM project, it includes three enhanced relational table datasets: TML1M, TLF2K, and TACM12K. Each is derived from a well-known classical dataset and is paired with a standard classification task. Their simple, well-organized structure makes them convenient for quickly developing and evaluating RTL methods.
- TML1M is derived from the classical MovieLens1M dataset and contains three relational tables related to movie recommendation: users, movies, and ratings.
- TLF2K is derived from the classical LastFM2K dataset and includes three relational tables related to music preferences: artists, user-artist interactions, and user-friend relationships.
- TACM12K is derived from the ACM heterogeneous graph dataset and contains four relational tables for academic publications: papers, authors, writing relationships, and citation relationships.
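
To illustrate what "relational tables" means here, the following is a minimal sketch of how a TML1M-style ratings table could be joined with its users and movies tables through key columns. It uses pandas with tiny made-up tables: the column names (`UserID`, `MovieID`, `Rating`, `Age`, `Title`), the values, and the helper `join_tables` are assumptions for illustration only, not the datasets' actual schema or an rLLM API.

```python
import pandas as pd

# Toy stand-ins for the three TML1M tables. Column names and values are
# illustrative assumptions, not the dataset's actual schema or contents.
users = pd.DataFrame({"UserID": [1, 2], "Age": [25, 35]})
movies = pd.DataFrame({"MovieID": [10, 20], "Title": ["A", "B"]})
ratings = pd.DataFrame({
    "UserID": [1, 1, 2],
    "MovieID": [10, 20, 10],
    "Rating": [5, 3, 4],
})


def join_tables(users, movies, ratings):
    """Attach user and movie attributes to each rating row via its two keys.

    Hypothetical helper: `validate="many_to_one"` makes pandas raise an error
    if a key is duplicated in the users or movies table.
    """
    return (
        ratings
        .merge(users, on="UserID", how="left", validate="many_to_one")
        .merge(movies, on="MovieID", how="left", validate="many_to_one")
    )


joined = join_tables(users, movies, ratings)
print(joined)
```

The left joins keep one row per rating, so the ratings table acts as the link between users and movies; the same pattern would apply to the interaction and relationship tables in TLF2K and TACM12K, under whatever key columns those files actually use.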
@article{rllm2024,
title={rLLM: Relational Table Learning with LLMs},
author={Weichen Li and Xiaotong Huang and Jianwu Zheng and Zheng Wang and Chaokun Wang and Li Pan and Jianhua Li},
year={2024},
eprint={2407.20157},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2407.20157},
}