Step 1: Parse pcaps/bro logs into a JSON file
- When parsing pcaps, run the pcap_processing/parse_bro_logs_v2.py script.
- When parsing bro logs, run the pcap_processing/parse_logs.py script.
- The output consists of a JSON file and a folder containing the certificates.
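
Before moving on, the Step 1 output can be sanity-checked. The following is a minimal sketch, not part of the repository: the function name `check_parse_output` and both path arguments are hypothetical, and it assumes the JSON file holds a single JSON document (the actual layout is not specified here).

```python
import json
from pathlib import Path


def check_parse_output(json_path, certs_dir):
    """Return (parsed_json, cert_file_count) for a Step 1 output.

    Both arguments are hypothetical paths supplied by the caller. The sketch
    assumes a single JSON document and does not inspect its structure.
    """
    json_path = Path(json_path)
    certs_dir = Path(certs_dir)

    if not json_path.is_file():
        raise FileNotFoundError(f"JSON output not found: {json_path}")
    if not certs_dir.is_dir():
        raise NotADirectoryError(f"Certificate folder not found: {certs_dir}")

    with json_path.open(encoding="utf-8") as fh:
        data = json.load(fh)  # raises json.JSONDecodeError if malformed

    # Count regular files only; subfolders are ignored.
    cert_count = sum(1 for p in certs_dir.iterdir() if p.is_file())
    return data, cert_count
```

A certificate count of zero is not necessarily an error, but it is worth a second look before extracting features.
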
Step 2: Extract features
- Run the clustering/prep/extract_features.py script, specifying the certificate folder from Step 1 and, if available, the corresponding avclass and virustotal files. The TLS fingerprints can be found in clustering/data/tls_fprints/
- The output is a TSV file with the feature vectors.
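
To inspect the resulting TSV file, a small reader like the one below may help. It is only a sketch: `read_feature_vectors` is a hypothetical helper, and because the column layout (including whether the first row is a header) is not described here, it returns every non-empty row as a list of raw strings.

```python
import csv


def read_feature_vectors(tsv_path):
    """Return the non-empty rows of a tab-separated file as lists of strings.

    No assumption is made about headers or column meaning; values are left
    as strings for the caller to interpret.
    """
    with open(tsv_path, newline="", encoding="utf-8") as fh:
        return [row for row in csv.reader(fh, delimiter="\t") if row]
```

Checking that all rows have the same number of fields, for example with `{len(r) for r in rows}`, is a cheap way to spot a malformed file before clustering.
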
Step 3: Run the clustering
- When running the clustering/run_clustering.py script, a dataset_name must be specified. By default, the script looks in the data/groundtruth folder for two files: the feature vectors (dataset_name.tsv) and the related avclass file (dataset_name.labels).
- The output is written to a folder with the same name as dataset_name.
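
The file-naming convention above can be checked before launching the clustering. This is a minimal sketch under stated assumptions: `expected_inputs` and `missing_inputs` are hypothetical helpers, and the default folder is taken relative to the current working directory, which may differ from how run_clustering.py resolves it.

```python
from pathlib import Path


def expected_inputs(dataset_name, groundtruth_dir="data/groundtruth"):
    """Return the (feature vectors, labels) paths implied by dataset_name."""
    base = Path(groundtruth_dir)
    return base / f"{dataset_name}.tsv", base / f"{dataset_name}.labels"


def missing_inputs(dataset_name, groundtruth_dir="data/groundtruth"):
    """Return the expected input files that do not exist on disk."""
    return [
        path
        for path in expected_inputs(dataset_name, groundtruth_dir)
        if not path.is_file()
    ]
```

An empty list from `missing_inputs("my_dataset")` means both files are in place, where `my_dataset` is a placeholder name.
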