Deployment on other Wikimedia wikis (#37)
* Prepare for deployment on other Wikimedia wikis

* Better documentation after deployment for Commons

* Improve property extraction from diffs

* Remove references to Wikidata in code
wetneb authored Sep 22, 2021
1 parent 000a476 commit a0c3c06
Showing 24 changed files with 170 additions and 64 deletions.
2 changes: 1 addition & 1 deletion deployment/celery.yaml
@@ -21,7 +21,7 @@ spec:
     spec:
       containers:
         - name: celery
-          image: docker-registry.tools.wmflabs.org/toollabs-python37-base:latest
+          image: docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest
           command: [ "/data/project/editgroups/www/python/src/tasks.sh" ]
           workingDir: /data/project/editgroups/www/python/src
           env:
2 changes: 1 addition & 1 deletion deployment/listener.yaml
@@ -21,7 +21,7 @@ spec:
     spec:
       containers:
         - name: listener
-          image: docker-registry.tools.wmflabs.org/toollabs-python37-base:latest
+          image: docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest
           command: [ "/data/project/editgroups/www/python/src/listener.sh" ]
           workingDir: /data/project/editgroups/www/python/src
           env:
2 changes: 1 addition & 1 deletion deployment/migrator.yaml
@@ -21,7 +21,7 @@ spec:
     spec:
       containers:
         - name: migrator
-          image: docker-registry.tools.wmflabs.org/toollabs-python37-base:latest
+          image: docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest
           command: [ "/data/project/editgroups/www/python/src/migrator.sh" ]
           workingDir: /data/project/editgroups/www/python/src
           env:
61 changes: 43 additions & 18 deletions docs/install.rst
@@ -69,55 +69,69 @@ to fill in the following fields:
 Once you validate this form, the edits listener will pick up the new tool (after a few minutes) and start ingesting its edits. If you
 need to ingest past edits again, you can use the listener script with a date in the past to retrieve the previous edits.

-Deploying on WMF Toollabs
--------------------------
+Deploying on WMF Toolforge
+--------------------------

In what follows we assume that the tool is deployed as the ``editgroups`` project.

- ``become editgroups``
- ``mkdir -p www/python/src``

+Put the following contents in ``manifest.template`` in the home directory of the tool::
+
+    backend: kubernetes
+    type: python3.7

 Install the dependencies in the virtualenv::

     webservice shell
     cd www/python
-    virtualenv venv --python /usr/bin/python3
+    python3 -m venv venv
     source venv/bin/activate
     pip install --upgrade pip
     git clone https://github.com/Wikidata/editgroups.git src
     pip install -r src/requirements.txt

 Configure static files::

     mkdir -p src/static
-    ln -s src/static
+    ln -s src/static .

-Put the following content in ``~/www/uwsgi.ini``::
-
-    [uwsgi]
-    check-static = /data/project/editgroups/www/python
-
-and run ``./manage.py collectstatic``.

-Create the SQL database:
+Create the SQL database (outside of the ``webservice shell``):

 - ``sql tools``
-- ``CREATE DATABASE s1234__editgroups;`` where ``s1234`` is the SQL username of the tool
+- ``CREATE DATABASE s1234__editgroups;`` where ``s1234`` is the SQL username of the tool (can be found in ``~/replica.my.cnf``)
 - ``\q``

 Configure database access and other settings::

     cd ~/www/python/src/editgroups/settings/
     echo "from .prod import *" > __init__.py
-    cp secret_wmflabs.py secret.py
+    cp secret_toolforge.py secret.py

Edit ``secret.py`` with the user
and password of the table (they can be found in ``~/replica.my.cnf``).
The name of the table is the one you used at creation above
(``s1234__editgroups`` where ``s1234`` is replaced by the username of
-the tool).
+the tool). Also, pick a secret key to store in ``SECRET_KEY``.

+In ``editgroups/settings/__init__.py`` you can also copy over
+settings lines from ``editgroups/settings/common.py`` and adapt them to
+the wiki that you are running EditGroups for (for instance ``MEDIAWIKI_API_ENDPOINT`` and the following lines).
+You should also adapt the allowed hostname (taken from ``editgroups/settings/prod.py``). It's easier
+to add those to the ``__init__.py`` file to avoid editing files tracked by Git.
+
+Put the following content in ``~/www/python/uwsgi.ini``::
+
+    [uwsgi]
+    static-map = /static=/data/project/editgroups/www/python/src/static
+
+and run ``./manage.py collectstatic`` in the ``~/www/python/src`` directory.


Configure OAuth login:

-- Request an OAuth client id at https://meta.wikimedia.org/wiki/Special:OAuthConsumerRegistration/propose. Beyond the normal editing scopes, you will also need to perform administrative actions (delete, restore) on behalf of users, so make sure you request these scopes too.
+- Request an OAuth client id at https://meta.wikimedia.org/wiki/Special:OAuthConsumerRegistration/propose. As OAuth protocol version, use "OAuth 1.0a". As callback URL, use the domain of the tool and tick the box to treat it as a prefix. Beyond the normal editing scopes, you will also need to perform administrative actions (delete, restore) on behalf of users, so make sure you request these scopes too.
- Put the tokens in ``~/www/python/src/editgroups/settings/secret.py``

Migrate the database:
@@ -126,9 +140,20 @@ Migrate the database:

Run the webserver:

-- ``webservice --backend kubernetes python start``
+- ``webservice start``

+Go to the webservice and log in to the application with OAuth. This will create a ``User`` object that you can then mark as staff in the Django shell, as follows::
+
+    $ webservice shell
+    source ~/www/python/venv/bin/activate
+    cd www/python/src
+    ./manage.py shell
+    from django.contrib.auth.models import User
+    user = User.objects.get()
+    user.is_staff = True
+    user.save()
+
-Launch the listener and Celery in Kubernetes:
+Launch the listener and Celery in Kubernetes. These deployment files may need to be adapted if you are deploying the tool under a Toolforge tool id other than ``editgroups``:

- ``kubectl create -f deployment/listener.yaml``
- ``kubectl create -f deployment/celery.yaml``
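
As a rough illustration of the adaptation mentioned above, the sketch below rewrites the ``/data/project/editgroups/`` path prefix that appears in the deployment files for a different tool id. The helper name ``retarget_manifest`` and the tool id ``mytool`` are hypothetical, and other fields of the manifests may also need changing by hand.

```python
def retarget_manifest(text, tool_id, original='editgroups'):
    """Replace the project path prefix of the original tool with the new one.

    Hypothetical helper: only the /data/project/<tool>/ prefix is rewritten.
    """
    old_prefix = '/data/project/{}/'.format(original)
    new_prefix = '/data/project/{}/'.format(tool_id)
    return text.replace(old_prefix, new_prefix)


# One line taken from deployment/celery.yaml, retargeted to an assumed tool id.
sample = 'workingDir: /data/project/editgroups/www/python/src'
retargeted = retarget_manifest(sample, 'mytool')
```

The function works on the file contents as a string, so it could be applied to each YAML file before running ``kubectl create``.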
4 changes: 2 additions & 2 deletions dump_events.py
@@ -2,10 +2,10 @@
 import sys
 from sseclient import SSEClient as EventSource
 from dateutil import parser
-from store.stream import WikidataEditStream
+from store.stream import WikiEditStream

 if __name__ == '__main__':
-    s = WikidataEditStream()
+    s = WikiEditStream()
     offset = None
     if len(sys.argv) > 1:
         offset = parser.parse(sys.argv[1])
20 changes: 20 additions & 0 deletions editgroups/context_processors.py
@@ -0,0 +1,20 @@
+from django.conf import settings
+
+def mediawiki_site_settings(request):
+    return {
+        'MEDIAWIKI_API_ENDPOINT': settings.MEDIAWIKI_API_ENDPOINT,
+        'MEDIAWIKI_BASE_URL': settings.MEDIAWIKI_BASE_URL,
+        'MEDIAWIKI_INDEX_ENDPOINT': settings.MEDIAWIKI_INDEX_ENDPOINT,
+        'PROPERTY_BASE_URL': settings.PROPERTY_BASE_URL,
+        'USER_BASE_URL': settings.USER_BASE_URL,
+        'USER_TALK_BASE_URL': settings.USER_TALK_BASE_URL,
+        'CONTRIBUTIONS_BASE_URL': settings.CONTRIBUTIONS_BASE_URL,
+        'WIKI_CODENAME': settings.WIKI_CODENAME,
+        'USER_DOCS_HOMEPAGE': settings.USER_DOCS_HOMEPAGE,
+        'MEDIAWIKI_NAME': settings.MEDIAWIKI_NAME,
+        'DISCUSS_PAGE_PREFIX': settings.DISCUSS_PAGE_PREFIX,
+        'DISCUSS_PAGE_PRELOAD': settings.DISCUSS_PAGE_PRELOAD,
+        'REVERT_PAGE': settings.REVERT_PAGE,
+        'REVERT_PRELOAD': settings.REVERT_PRELOAD,
+        'WIKILINK_BATCH_PREFIX': settings.WIKILINK_BATCH_PREFIX
+    }
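
To illustrate what this context processor does, here is a minimal, self-contained sketch that builds the same kind of mapping from a list of setting names. It uses a `SimpleNamespace` as a stand-in for `django.conf.settings` so that it runs without Django; the helper name `build_site_context`, the tuple `EXPOSED_SETTINGS`, and the choice of three names are assumptions for illustration, not part of the commit.

```python
from types import SimpleNamespace

# A subset of the names exposed by mediawiki_site_settings (chosen for brevity).
EXPOSED_SETTINGS = ('MEDIAWIKI_NAME', 'USER_BASE_URL', 'USER_DOCS_HOMEPAGE')


def build_site_context(settings_obj, names=EXPOSED_SETTINGS):
    """Map each setting name to its value, as a template context dict.

    Hypothetical helper; raises AttributeError if a setting is missing.
    """
    return {name: getattr(settings_obj, name) for name in names}


# Stand-in for django.conf.settings, filled with the Wikidata defaults.
fake_settings = SimpleNamespace(
    MEDIAWIKI_NAME='Wikidata',
    USER_BASE_URL='https://www.wikidata.org/wiki/User:',
    USER_DOCS_HOMEPAGE='https://www.wikidata.org/wiki/Wikidata:Edit_groups',
)
context = build_site_context(fake_settings)
```

Every key of the returned dict becomes a template variable, which is how `{{ USER_DOCS_HOMEPAGE }}` and `{{ USER_BASE_URL }}` can be used in the templates changed by this commit.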
18 changes: 18 additions & 0 deletions editgroups/settings/common.py
@@ -95,6 +95,7 @@
                 "social_django.context_processors.backends",
                 "social_django.context_processors.login_redirect",
                 "tagging.filters.context_processor",
+                "editgroups.context_processors.mediawiki_site_settings",
             ),
             'debug': True
         }
@@ -183,6 +184,23 @@
 SOCIAL_AUTH_EMAIL_LENGTH = 190

 MEDIAWIKI_API_ENDPOINT = 'https://www.wikidata.org/w/api.php'
+MEDIAWIKI_BASE_URL = 'https://www.wikidata.org/wiki/'
+MEDIAWIKI_INDEX_ENDPOINT = 'https://www.wikidata.org/w/index.php'
+PROPERTY_BASE_URL = MEDIAWIKI_BASE_URL + 'Property:'
+USER_BASE_URL = MEDIAWIKI_BASE_URL + 'User:'
+USER_TALK_BASE_URL = MEDIAWIKI_BASE_URL + 'User_talk:'
+CONTRIBUTIONS_BASE_URL = MEDIAWIKI_BASE_URL + 'Special:Contributions/'
+WIKI_CODENAME = 'wikidatawiki'
+USER_DOCS_HOMEPAGE = 'https://www.wikidata.org/wiki/Wikidata:Edit_groups'
+MEDIAWIKI_NAME = 'Wikidata'
+DISCUSS_PAGE_PREFIX = 'Wikidata:Edit_groups/'
+DISCUSS_PAGE_PRELOAD = 'Wikidata:Edit_groups/Preload'
+REVERT_PAGE = 'Wikidata:Requests_for_deletions'
+REVERT_PRELOAD = 'Wikidata:Edit_groups/Revert'
+WATCHED_NAMESPACES = [0, 120]
+
+WIKILINK_BATCH_PREFIX = ':toollabs:editgroups/b/'
+REVERT_COMMENT_STAMP = ' ([[:toollabs:editgroups/b/EG/{}|details]])'

 ### Celery config ###
 # Celery runs asynchronous tasks such as metadata harvesting or
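
One thing to watch when overriding these settings for another wiki: `PROPERTY_BASE_URL`, `USER_BASE_URL` and the other derived values are computed when `common.py` is imported, so changing only `MEDIAWIKI_BASE_URL` afterwards leaves them pointing at Wikidata. The sketch below recomputes them together. The helper name `derive_urls` and the Wikimedia Commons base URL are assumptions for illustration and do not come from the commit.

```python
def derive_urls(base_url):
    """Recompute the settings that common.py derives from MEDIAWIKI_BASE_URL."""
    return {
        'MEDIAWIKI_BASE_URL': base_url,
        'PROPERTY_BASE_URL': base_url + 'Property:',
        'USER_BASE_URL': base_url + 'User:',
        'USER_TALK_BASE_URL': base_url + 'User_talk:',
        'CONTRIBUTIONS_BASE_URL': base_url + 'Special:Contributions/',
    }


# Assumed base URL for a Wikimedia Commons deployment (not taken from the commit).
commons = derive_urls('https://commons.wikimedia.org/wiki/')
```

In a local settings module the result could be applied with `globals().update(commons)`, or the five assignments could simply be written out by hand.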
2 changes: 1 addition & 1 deletion editgroups/settings/prod.py
@@ -6,7 +6,7 @@
 # Static files (CSS, JavaScript, Images)
 # https://docs.djangoproject.com/en/1.7/howto/static-files/

-STATIC_URL = '/editgroups/static/'
+STATIC_URL = '/static/'
 STATICFILES_DIRS = [
     os.path.join(BASE_DIR, "static"),
 ]
33 changes: 33 additions & 0 deletions editgroups/settings/secret_toolforge.py
@@ -0,0 +1,33 @@
+import os
+from .common import BASE_DIR
+
+# SECURITY WARNING: keep the secret key used in production secret!
+SECRET_KEY = 'insert_a_random_hash_here'
+
+# Database
+# https://docs.djangoproject.com/en/1.7/ref/settings/#databases
+DATABASES = {
+    'default': {
+        'ENGINE': 'django.db.backends.mysql',
+        'NAME': 's1234__editgroups', # adapt to the database you created
+        'HOST': 'tools.db.svc.eqiad.wmflabs',
+        'OPTIONS': {
+            'init_command': "SET sql_mode='STRICT_TRANS_TABLES'",
+            'charset': 'utf8mb4',
+            'read_default_file': os.path.expanduser("~/replica.my.cnf")
+        },
+    }
+}
+
+# Adapt those to the credentials you got
+SOCIAL_AUTH_MEDIAWIKI_KEY = ''
+SOCIAL_AUTH_MEDIAWIKI_SECRET = ''
+SOCIAL_AUTH_MEDIAWIKI_URL = 'https://www.wikidata.org/w/index.php'
+SOCIAL_AUTH_MEDIAWIKI_CALLBACK = 'https://editgroups.toolforge.org/oauth/complete/mediawiki/'
+
+# Redis (if you use it)
+REDIS_HOST = 'tools-redis'
+REDIS_PORT = 6379
+REDIS_DB = 3
+REDIS_PASSWORD = ''
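
The placeholder `'insert_a_random_hash_here'` has to be replaced with a value of your own. One possible way to produce such a value, shown here as a sketch using only the standard library, is `secrets.token_urlsafe`. The helper name `make_secret_key` and the default of 50 random bytes are assumptions, not something the commit prescribes.

```python
import secrets


def make_secret_key(nbytes=50):
    """Return a random URL-safe string that could be used as SECRET_KEY.

    Hypothetical helper; nbytes is the amount of randomness, not the
    length of the resulting string (which is somewhat longer).
    """
    return secrets.token_urlsafe(nbytes)


key = make_secret_key()
print(key)  # paste the printed value into secret.py
```

The value should be generated once and stored in `secret.py`, not regenerated at each start, since changing it invalidates existing sessions.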

2 changes: 1 addition & 1 deletion editgroups/templates/editgroups/common.html
@@ -37,7 +37,7 @@
 <button type="submit" class="btn btn-default">Submit</button>
 </form> -->
 <ul class="nav navbar-nav navbar-right">
-<li><a href="https://www.wikidata.org/wiki/Wikidata:Edit_groups">About</a></li>
+<li><a href="{{ USER_DOCS_HOMEPAGE }}">About</a></li>
 {% if not user.is_authenticated %}
 <li><a href="{% url "social:begin" 'mediawiki' %}">Login</a></li>
 {% else %}
6 changes: 3 additions & 3 deletions listener.py
@@ -16,12 +16,12 @@
 import django
 django.setup()

-from store.stream import WikidataEditStream
+from store.stream import WikiEditStream
 from store.utils import grouper
 from store.models import Edit

-print('Listening to Wikidata edits...')
-s = WikidataEditStream()
+print('Listening to edits...')
+s = WikiEditStream()
 utcnow = datetime.utcnow()
 try:
     latest_edit_seen = Edit.objects.order_by('-timestamp')[0].timestamp
10 changes: 3 additions & 7 deletions revert/models.py
@@ -32,7 +32,7 @@ def __str__(self):

     def comment_with_stamp(self):
         return (self.comment +
-            ' ([[:toollabs:editgroups/b/EG/{}|details]])'.format(self.uid))
+            settings.REVERT_COMMENT_STAMP.format(self.uid))

     def undo_summary(self, edit):
         prefix = '/* undo:0||{}|{} */ '.format(edit.newrevid, edit.user)
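
As a small standalone illustration of how the configurable stamp is applied, the sketch below reproduces the formatting outside of the model. The free function `comment_with_stamp` and the sample comment and uid are hypothetical; only the template string is taken from `common.py`.

```python
# Default template from editgroups/settings/common.py.
REVERT_COMMENT_STAMP = ' ([[:toollabs:editgroups/b/EG/{}|details]])'


def comment_with_stamp(comment, uid, stamp=REVERT_COMMENT_STAMP):
    """Append the stamp to the comment, filling in the batch uid."""
    return comment + stamp.format(uid)


# Sample values, chosen for illustration only.
result = comment_with_stamp('reverting test batch', 'a1b2c3')
```

Because the template goes through `str.format`, a custom stamp that needs literal curly braces (for example a wikitext template call) would have to double them as `{{` and `}}`.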
@@ -63,13 +63,11 @@ def revert_edit(self, edit):
                 self.oauth_tokens['oauth_token_secret'])

         # Get token
-        r = requests.get('https://www.wikidata.org/w/api.php', params={
+        r = requests.get(settings.MEDIAWIKI_API_ENDPOINT, params={
             'action':'query',
             'meta':'tokens',
             'format': 'json',
         }, auth=auth)
-        print('#### GET TOKEN')
-        print(r.text)
         r.raise_for_status()
         token = r.json()['query']['tokens']['csrftoken']

@@ -105,11 +103,9 @@ def revert_edit(self, edit):
             'watchlist': 'nochange',
         }

-        r = requests.post('https://www.wikidata.org/w/api.php',
+        r = requests.post(settings.MEDIAWIKI_API_ENDPOINT,
                 data=data, auth=auth)

-        print('#### UNDO EDIT')
-        print(r.text)
         #r.raise_for_status()


2 changes: 1 addition & 1 deletion revert/templates/revert/initiate.html
@@ -8,7 +8,7 @@

 {% block mainBody %}
 <div class="page-header">
-<h3>Undoing edit group by <a href="https://www.wikidata.org/wiki/User:{{ batch.user }}">{{ batch.user }}</a>: {{ batch.summary }} ({{ batch.uid }})</h3>
+<h3>Undoing edit group by <a href="{{ USER_BASE_URL }}{{ batch.user }}">{{ batch.user }}</a>: {{ batch.summary }} ({{ batch.uid }})</h3>
 </div>
 <div class="revert-dialog">
 <p>You are about to undo {{ batch.nb_revertable_edits }} edits{% if batch.nb_undeleted_new_pages %}, which will delete or restore {{ batch.nb_undeleted_new_pages }} items{% endif %}.</p>
12 changes: 6 additions & 6 deletions store/models.py
@@ -274,7 +274,7 @@ def archive_old_batches(cls, batch_inspector):

 class Edit(models.Model):
     """
-    A wikidata edit as returned by the Event Stream API
+    A MediaWiki edit as returned by the Event Stream API
     """
     id = models.IntegerField(unique=True, primary_key=True)
     oldrevid = models.IntegerField(null=True)
@@ -306,16 +306,16 @@ class Meta:

     @property
     def url(self):
-        return 'https://www.wikidata.org/wiki/index.php?diff={}&oldid={}'.format(self.newrevid,self.oldrevid)
+        return '{}?diff={}&oldid={}'.format(settings.MEDIAWIKI_INDEX_ENDPOINT, self.newrevid, self.oldrevid)

     @property
     def revert_url(self):
         if self.oldrevid:
-            return 'https://www.wikidata.org/w/index.php?title={}&action=edit&undoafter={}&undo={}'.format(self.title, self.oldrevid, self.newrevid)
+            return '{}?title={}&action=edit&undoafter={}&undo={}'.format(settings.MEDIAWIKI_INDEX_ENDPOINT, self.title, self.oldrevid, self.newrevid)
         elif self.changetype == 'delete':
-            return 'https://www.wikidata.org/wiki/Special:Undelete/{}'.format(self.title)
+            return '{}Special:Undelete/{}'.format(settings.MEDIAWIKI_BASE_URL, self.title)
         else:
-            return 'https://www.wikidata.org/w/index.php?title={}&action=delete'.format(self.title)
+            return '{}?title={}&action=delete'.format(settings.MEDIAWIKI_INDEX_ENDPOINT, self.title)

     def __str__(self):
         return '<Edit {} >'.format(self.url)
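
The URLs above insert the page title with plain string formatting. On wikis where titles commonly contain characters such as `&` or spaces (file names on Commons, for instance), it may be worth escaping the query string. The following is a sketch of that alternative using `urllib.parse.urlencode`; it is not what the commit does, and the function names and the sample title are hypothetical.

```python
from urllib.parse import urlencode

# Default value from editgroups/settings/common.py.
MEDIAWIKI_INDEX_ENDPOINT = 'https://www.wikidata.org/w/index.php'


def diff_url(newrevid, oldrevid, endpoint=MEDIAWIKI_INDEX_ENDPOINT):
    """Build the diff URL for a pair of revision ids."""
    return '{}?{}'.format(endpoint, urlencode({'diff': newrevid, 'oldid': oldrevid}))


def delete_url(title, endpoint=MEDIAWIKI_INDEX_ENDPOINT):
    """Build the deletion URL, percent-encoding the title."""
    return '{}?{}'.format(endpoint, urlencode({'title': title, 'action': 'delete'}))


print(diff_url(20, 10))
print(delete_url('File:A&B.jpg'))
```

With the unescaped approach, the `&` in the sample title would end the `title` parameter early; `urlencode` turns it into `%26` so the whole title reaches MediaWiki.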
@@ -366,7 +366,7 @@ def ingest_edits(cls, json_batch):
         tools = Tool.objects.all()

         for edit_json in json_batch:
-            if not edit_json or edit_json.get('namespace') not in [0,120]:
+            if not edit_json or edit_json.get('namespace') not in settings.WATCHED_NAMESPACES:
                 continue
             timestamp = datetime.fromtimestamp(edit_json['timestamp'], tz=UTC)
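
The filtering that these settings control can be sketched in isolation as follows. The `namespace` test mirrors the condition in `ingest_edits`; the additional `wiki` test is an assumption about how `WIKI_CODENAME` is used by the stream class, and the function name `is_watched` and the sample events are hypothetical.

```python
# Defaults from editgroups/settings/common.py.
WIKI_CODENAME = 'wikidatawiki'
WATCHED_NAMESPACES = [0, 120]


def is_watched(event, wiki=WIKI_CODENAME, namespaces=WATCHED_NAMESPACES):
    """Return True if the event belongs to the configured wiki and namespaces."""
    if not event:
        return False
    return event.get('wiki') == wiki and event.get('namespace') in namespaces


# Sample events with only the fields used here (illustrative values).
events = [
    {'wiki': 'wikidatawiki', 'namespace': 0},
    {'wiki': 'wikidatawiki', 'namespace': 4},
    {'wiki': 'commonswiki', 'namespace': 0},
    None,
]
kept = [e for e in events if is_watched(e)]
```

For another wiki, both values would be overridden in the local settings, for example with the namespaces in which that wiki's batch edits take place.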

5 changes: 3 additions & 2 deletions store/stream.py
@@ -1,10 +1,11 @@
 import json
 from sseclient import SSEClient as EventSource
+from django.conf import settings

-class WikidataEditStream(object):
+class WikiEditStream(object):
     def __init__(self):
         self.url = 'https://stream.wikimedia.org/v2/stream/recentchange'
-        self.wiki = 'wikidatawiki'
+        self.wiki = settings.WIKI_CODENAME

     def stream(self, from_time=None):
         url = self.url