searchHistory-README.txt
A log of searches performed in Dryad is now available at:
http://datadryad.org/downloads/dryadSearchLogs.txt
The search logs are not presented here directly, due to GitHub's limit on file size.
The search logs were extracted from our main Apache access logs using the following command:
zgrep "] \"GET /discover?query=" /opt/dryad-data/log_archives/var/log/httpd/dryad/datadryad.org-access_log*.gz | grep -v 192.107.175.11| awk '{ print $4 " " $7 } ' | grep -v "/discover?query=&submit=Go" | sed -e 's/&submit=Go//' | sed -e 's/\/discover?//' | sed -e 's/\[//' >dryadSearchLogs.txt
The extraction command keeps only log lines that record a search, and drops searches submitted with an empty query. It also excludes searches from one IP address (192.107.175.11), which is used by the meta-search engine WorldWideScience.org. Since queries sent through WorldWideScience are not targeted specifically at Dryad, they are not considered relevant to this set of logs.
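The exclusions described above can be checked on a few made-up log lines. The sketch below is not part of the original extraction: it assumes the standard Apache combined log format, reads plain text from stdin with grep instead of reading the gzipped archives with zgrep, and uses hypothetical names (extract_searches, sample_log) and invented sample entries with documentation-range IP addresses.

```shell
#!/bin/sh
# Same filter steps as the extraction command, but reading uncompressed
# text from stdin. The function name is hypothetical.
extract_searches() {
    grep "] \"GET /discover?query=" \
        | grep -v 192.107.175.11 \
        | awk '{ print $4 " " $7 }' \
        | grep -v "/discover?query=&submit=Go" \
        | sed -e 's/&submit=Go//' \
        | sed -e 's/\/discover?//' \
        | sed -e 's/\[//'
}

# Invented sample lines (assumption: Apache combined log format).
# 1: an ordinary search            -> kept
# 2: a search from the excluded IP -> dropped
# 3: a search with an empty query  -> dropped
# 4: a request that is not a search -> dropped
sample_log() {
cat <<'EOF'
203.0.113.5 - - [10/Oct/2012:13:55:36 -0400] "GET /discover?query=finch+beak&submit=Go HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
192.107.175.11 - - [10/Oct/2012:13:56:01 -0400] "GET /discover?query=genome HTTP/1.1" 200 4096 "-" "ExampleBot"
203.0.113.9 - - [10/Oct/2012:13:57:12 -0400] "GET /discover?query=&submit=Go HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
203.0.113.9 - - [10/Oct/2012:13:58:40 -0400] "GET /handle/10255/dryad.20 HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
EOF
}

sample_log | extract_searches
# prints: 10/Oct/2012:13:55:36 query=finch+beak
```

Only the first sample line survives, and its IP address and user agent are gone from the output.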
The extraction command also removes identifiable information such as IP addresses and user agents, by keeping only two fields from each matching line. The resulting file contains one search per line: a timestamp (without the time zone offset) and the query string, separated by a space.
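Because each line is just a timestamp and a query string, the file is easy to summarize with standard tools. The following is a minimal sketch for counting the most frequent query strings. It is not something the Dryad project provides, the function name top_queries is hypothetical, and it assumes the query string contains no spaces (it is still URL-encoded as it appeared in the request).

```shell
#!/bin/sh
# Hypothetical helper: list the most frequent query strings in a search log.
#   $1: path to the log file (for example, a downloaded dryadSearchLogs.txt)
#   $2: number of entries to show (defaults to 10)
top_queries() {
    # Field 1 is the timestamp, field 2 is the query string.
    awk '{ count[$2]++ } END { for (q in count) print count[q], q }' "$1" \
        | sort -rn \
        | head -n "${2:-10}"
}

# Example use, assuming the file has been downloaded to the current directory:
#   top_queries dryadSearchLogs.txt 20
```

Each output line is a count followed by the query string exactly as logged, so the same search typed with different capitalization or encoding is counted separately.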