Analyze your instant messaging behaviour. Exported WhatsApp chat .txt files are parsed into a pandas DataFrame, which is then enriched with feature columns.
- Install poetry, if not already installed.
git clone https://github.com/fabitosh/chat-analyzer.git
cd chat-analyzer
- Install project dependencies
poetry install
- Export your WhatsApp Chats.
- Configure
chat_analyzer/__init__.py
to your needs. I work with a gitignored data/
folder within the project.
- Run
main.py
- View basic chat analysis in
data/visualized/
for each chat as HTML files. A Dash application is a work in progress.
main.py
chat_analyzer
├── __init__.py # Configuration
├── analysis
│ ├── __init__.py
│ └── analysis.py # Module for conducting analysis on chat data
│
├── data_processing
│ ├── __init__.py
│ ├── load.py
│ ├── extract.py # Module for parsing raw chats and merging consecutive messages
│ └── feature_engineering.py # Define feature columns
│
├── utils
│ ├── __init__.py
│ └── data_definitions.py
│
├── visualization
│ ├── __init__.py
│ └── visualize.py
│
├── data
│ ├── raw # Raw chat exports
│ ├── processed # Parsed and enriched entire dataframes as pickle.
│ └── visualized # Basic chat visualization html files
│
├── notebooks # Exploratory notebooks can go here.
└── tests
Every chat should be parsed into a RawChat DataFrame. If you chat with the same person through multiple messengers, it should be possible to concatenate the corresponding RawChat DataFrames if desired.
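Combining chats from several messengers could look roughly like the sketch below. It assumes a RawChat is a plain pandas DataFrame with the columns datetime, sender and message; these column names, the messenger column and the function combine_raw_chats are hypothetical and not taken from the project.

```python
import pandas as pd


def combine_raw_chats(chats: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Concatenate several chats with the same person into one frame.

    `chats` maps a messenger name to its chat DataFrame. The column names
    "datetime" and "messenger" are assumptions made for this sketch.
    """
    # Remember which messenger each message came from.
    tagged = [df.assign(messenger=name) for name, df in chats.items()]
    combined = pd.concat(tagged, ignore_index=True)
    # A stable sort keeps the original order of messages sharing a timestamp.
    return combined.sort_values("datetime", kind="stable").reset_index(drop=True)


whatsapp = pd.DataFrame({
    "datetime": pd.to_datetime(["2023-01-01 10:00", "2023-01-01 10:05"]),
    "sender": ["Alice", "Bob"],
    "message": ["Hi", "Hello"],
})
signal = pd.DataFrame({
    "datetime": pd.to_datetime(["2023-01-01 10:02"]),
    "sender": ["Alice"],
    "message": ["Are you there?"],
})
combined = combine_raw_chats({"whatsapp": whatsapp, "signal": signal})
```

Keeping the messenger as a column means the combined frame can still be split or grouped by source later on.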
Manually export every chat into a .txt file. As of this writing, this is only possible from the phone. Put all chats of interest into one folder and set its path as PATH_WHATSAPP_MSG.
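As an illustration of what parsing such an export involves, here is a minimal sketch. It assumes each message starts with a line like "31.12.22, 23:59 - Alice: text"; the actual layout depends on the phone's locale and operating system, so the pattern would need adjusting. The function name parse_whatsapp_export and the output columns are hypothetical and not the project's own parser.

```python
import re

import pandas as pd

# Assumed line layout: "31.12.22, 23:59 - Alice: Happy new year".
# The real layout varies with the phone's locale and OS.
PREFIX = r"^(?P<date>\d{2}\.\d{2}\.\d{2}), (?P<time>\d{2}:\d{2}) - "
MESSAGE_PATTERN = re.compile(PREFIX + r"(?P<sender>[^:]+): (?P<message>.*)$")
TIMESTAMP_PATTERN = re.compile(PREFIX)


def parse_whatsapp_export(text: str) -> pd.DataFrame:
    """Parse exported chat text into one row per message."""
    rows = []
    for line in text.splitlines():
        match = MESSAGE_PATTERN.match(line)
        if match:
            rows.append(match.groupdict())
        elif TIMESTAMP_PATTERN.match(line):
            # Timestamped line without a sender: a system notice, skipped here.
            continue
        elif rows:
            # Lines without a timestamp continue the previous message.
            rows[-1]["message"] += "\n" + line
    df = pd.DataFrame(rows, columns=["date", "time", "sender", "message"])
    df["datetime"] = pd.to_datetime(
        df["date"] + " " + df["time"], format="%d.%m.%y %H:%M"
    )
    return df[["datetime", "sender", "message"]]


sample = (
    "31.12.22, 23:57 - Messages are end-to-end encrypted.\n"
    "31.12.22, 23:58 - Alice: Almost midnight\n"
    "and still awake\n"
    "31.12.22, 23:59 - Bob: Happy new year"
)
chat = parse_whatsapp_export(sample)
```

To process a whole folder, the same function could be applied to the text of every .txt file found under PATH_WHATSAPP_MSG.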
Parsing of Signal chats is still to be implemented. A possible export tool is sigtop: https://github.com/tbvdm/sigtop
Analysis can happen on multiple layers:
- Analysis of every message/row:
  - Statistics derived in the context of the chat should be included in analysis.extract_single_chat_features(). Example: time it took to reply.
  - Metrics that only need the row for context are put into analysis.add_features(). Example: number of symbols in a message.
- Analysis of the behaviour within one chat
- Analysis of the sender's instant messengers as a whole
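
The difference between the two kinds of per-message features can be sketched as follows. This is not the project's implementation: the function names add_row_features and add_chat_context_features, the column names datetime, sender, message, n_symbols and reply_time, and the definition of a reply (a message that follows a different sender) are all assumptions made for illustration.

```python
import pandas as pd


def add_row_features(chat: pd.DataFrame) -> pd.DataFrame:
    """Row-level metric: the row itself is enough context."""
    return chat.assign(n_symbols=chat["message"].str.len())


def add_chat_context_features(chat: pd.DataFrame) -> pd.DataFrame:
    """Chat-level metric: needs the preceding row for context."""
    chat = chat.sort_values("datetime", kind="stable").reset_index(drop=True)
    # Assumption: a reply is a message that follows a different sender.
    is_reply = chat["sender"] != chat["sender"].shift()
    time_since_previous = chat["datetime"].diff()
    # Non-replies and the first message get a missing value (NaT).
    return chat.assign(reply_time=time_since_previous.where(is_reply))


chat = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2023-01-01 10:00", "2023-01-01 10:05", "2023-01-01 10:06"]
    ),
    "sender": ["Alice", "Bob", "Bob"],
    "message": ["Hi", "Hello", "How are you?"],
})
features = add_chat_context_features(add_row_features(chat))
```

In this example only Bob's first message gets a reply time (five minutes); the opening message and Bob's follow-up are left empty.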