Large wiki pages #1046
Comments
This also caused problems when we added mod_evasive-based rate limiting, because pages like Map Features generate hundreds of requests and HTTP/2 allows a browser to make all those requests (up to about 600 for Map Features) within a few seconds. |
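To illustrate why a burst like that trips a per-client limit, here is a rough sketch of a fixed-window request counter. It is not mod_evasive's actual implementation, and the limit of 100 requests per second is a hypothetical value chosen for the example, not our configuration.

```python
from collections import Counter


def windows_over_limit(timestamps, window_seconds, limit):
    """Return the start times of fixed windows holding more than `limit` requests."""
    # Bucket each request timestamp into the fixed window it falls in.
    counts = Counter(int(t // window_seconds) for t in timestamps)
    return sorted(w * window_seconds for w, n in counts.items() if n > limit)


# 600 requests spread evenly over 3 seconds, roughly what an HTTP/2
# browser could send for one very image-heavy page.
burst = [i * 3 / 600 for i in range(600)]

# HYPOTHETICAL limit: 100 requests per 1-second window.
flagged = windows_over_limit(burst, window_seconds=1, limit=100)
print(flagged)
```

Every one-second window of the burst is over the hypothetical limit, so a single page view looks the same as abusive traffic to a counter of this kind.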
See also #466. Do we have a good way to identify the pages that need reducing in complexity? Some image-heavy pages in the previous issue were
Checking the reports for these was a pain because you have to wait for the page to load, and all of them are slow loading since they're part of the problem. Pages like https://wiki.openstreetmap.org/wiki/Featured_images/Jan-Mar_2019 do not appear to be a problem. Map Features is perhaps a worst-case scenario because it is both excessively complex and triggers too many requests. |
A tracking category such as https://wiki.openstreetmap.org/wiki/Category:Pages_where_node_count_is_exceeded is mentioned in the MediaWiki documentation, but it is empty on the OSM Wiki for some reason (maybe it needs to be configured, or maybe it is hit by the bug that caused categories not to be filled?). It works at https://en.wikipedia.org/wiki/Category:Pages_where_node_count_is_exceeded |
I would expect that is for the hard limits on the server, which are too high right now. Two other special categories that do list some items are |
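If that category does get populated, its members could be listed through the MediaWiki Action API rather than by loading the slow category page. A minimal sketch that only builds the query URL; the `/w/api.php` endpoint path is an assumption based on the `/w/index.php` URLs elsewhere in this thread, and `category_members_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# ASSUMPTION: the API lives next to /w/index.php on the OSM Wiki.
API_ENDPOINT = "https://wiki.openstreetmap.org/w/api.php"


def category_members_url(category, limit=50):
    """Build an Action API URL that lists the pages in a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


url = category_members_url("Pages_where_node_count_is_exceeded")
print(url)
```

Fetching that URL should return JSON with the member pages under `query.categorymembers`, which would be empty for as long as the category itself is empty.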
Wikimedia servers will throttle requests from OSM – it even does so while viewing some large categories on Wikimedia Commons itself. But I was unaware of Wikimedia blocking OSM’s IP address. Is that documented somewhere? |
There are scripts that are supposed to run on the pages but we've had to turn off |
A server-side script like UpdateSpecialPages.php, or a client-side script like a gadget or site script? |
I assume he's referring to us recently dropping a job that ran One problem was that it leaks memory like a sieve, which is why the mediawiki manual advises running it in chunks and gives a script to do so but I found that even that was hanging, seemingly while trying to talk to commons, presumably when it was refreshing commons links. |
Some of the pages listed in #1046 (comment) can probably be split up or replaced with off-wiki resources. I’ve been trying to interest the community in replacing the “Map features” page with a dynamic off-wiki page based on id-tagging-schema: openstreetmap/id-tagging-schema#646. We could migrate large key pages, such as the one for
The sign catalogs are tricky because their sole purpose is to annotate Commons images with community-developed tagging suggestions. MediaWiki’s links table is important because it lets readers know which tags are documented or undocumented and notifies proposal authors (via Echo) that their tags are being adopted by the community. Long-term, these catalogs should also live outside the wiki via either osmlab/name-suggestion-index#8225 or openstreetmap/id-tagging-schema#11. Note that both issues have been declined in favor of the other.
In the meantime, if there is a pressing need, we could try uploading the full complement of Commons sign diagrams to the wiki locally. I’ve just given @matkoniecz some bad memories by floating this idea – it’ll probably just trade one set of problems for another.
If refreshLinks.php is choking on Commons requests, then I wonder to what degree splitting up pages would even help. The script rebuilds the entire
Ever since #466 was fixed by installing QuickInstantCommons, I’ve found that the image links are cached very aggressively. If I prospectively embed an image that I only later upload to Commons, it may take days for the page to reflect that upload. I was content with this delay as the tradeoff for being able to load the pages again.
Spitballing a bit: One of the differences between InstantCommons and QuickInstantCommons is that the latter straight-up asks for thumbnails that don’t exist, expecting Commons to generate them on the fly. Do we have pages attempting to create giant thumbnails of JPEGs or complex SVGs?
Back when #466 was a noticeable problem, it also affected some short pages that contained only one or two large images, nothing like the pathological sign catalogs I cited. |
Would it maybe help to mirror all Wikimedia Commons images used on the OSM Wiki to the OSM Wiki itself? That could be annoying, as it would also require mirroring the documentation pages (or at least the legally required credit part for CC-BY and other attribution-requiring images) and monitoring whether an image has been deleted on Commons as a copyright violation, but in principle it is doable. |
All of them? That would be around 40,000 distinct images and a lot of disk space. It would be more feasible for the sign catalogs, which generally traffic in relatively simple SVGs in the public domain, but that’s still thousands of images and counting. This discussion seems like a bit of a catch-all for wiki performance issues attributed to page content. Commons would be one contributing factor, but it isn’t clear that cutting Commons out of the picture would yield a better tradeoff than migrating the Map Features page or one of the other massive pages off the wiki. We could also restart the effort to optimize commonly transcluded templates. I started on more performant rewrites of both |
EDIT: This is now posted at https://wiki.openstreetmap.org/wiki/Talk:Wiki#Too_complex_wiki_pages
I propose to post at https://wiki.openstreetmap.org/w/index.php?title=Talk:Wiki&action=edit&section=new following
Feel free to propose changes, take it and post it on your own, or tell me that it is fine (and yes, I know that one of these pages is mine - I am just testing a code change that should limit the relevant page size of https://wiki.openstreetmap.org/w/index.php?title=User:Mateusz_Konieczny/unusual_shop_values/United_Kingdom&action=history ) |
“OpenHistoricalMap/Projects/Rennes” isn’t a particularly large or complex page, but it took several tries to load the page, each time taking upwards of 3 minutes, due to seven thumbnails of Commons images ranging from 4 to 8 megabytes and one broken image link, which has been fixed. (Subsequent loads came back quickly due to caching.) |
A Lua-based rewrite of |
A Lua-based rewrite of |
The |
https://wiki.openstreetmap.org/wiki/MUTCD/R has gone from 6.3s to 8.8s, with almost all the time spent in Template:Tag. It doesn't seem to have helped. Ultimately I think these pages need restructuring, but they're good as tests right now. I'm going to reach out to the wiki admins in case some of them aren't yet aware. |
It’s expected that the transclusion expansion time report would show that almost all the time is spent in just about the only template transcluded onto the page. 😉 How are you getting the 6.3 seconds? Is that from March, or from a version of the page that uses the old version of
Long-term, these traffic sign catalogs will be largely unnecessary once either id-tagging-schema or name-suggestion-index can be convinced to add traffic sign presets and osmberlin/osm-traffic-sign-tool#11 takes care of the remaining details that can’t be described on the sign itself. But that will just shift the spotlight onto other pages that are currently less obvious, like the one in #1046 (comment). |
Here are 15 more pages that are taking a relatively long time to load according to the NewPP limit report, representing a broad cross-section of the wiki’s English-language content. Now that I’ve accessed each of these pages, they load promptly, but the report shows that they took a long time at first:
|
I'm getting 504 Gateway Timeout on https://wiki.openstreetmap.org/wiki/Australian_Tagging_Guidelines/Road_Signage which also has a lot of commons images. The page used to work though. |
Ya, there's too many images on the page. |
I think it was an intermittent issue. When @andrewharvey posted #1046 (comment), I spot-checked some other pages with lots of images and they all were timing out similarly. But now they’re all snappy again. 🤞 Nevertheless, “Australian Tagging Guidelines/Road Signage” feels pretty unwieldy to me. I think it would be pretty intuitive to break it down into subpages by sign category, similar to what we did with the MUTCD pages. A reader will generally know which sign category they’re interested in based on the color and shape of the sign. For readers who need to look up the sign by tag, we can include a search box scoped to the Australian sign pages. Also consider splitting off per-state pages for any state-specific signs. |
Exactly, it's fine now. At the time I checked the Prometheus charts of the wiki server and it didn't look overloaded, so I'm not sure why it had issues. Ideally the issue causing the page to be difficult to load at times could be addressed (I know it's not straightforward though). Breaking up the page is not ideal but might be okay as a workaround, though it makes it harder to search using Ctrl+F, and I disagree that someone will know which category to look in. |
The issue is likely Wikimedia Commons (InstantCommons) rate limiting. |
It could've been an intermittent issue on Wikimedia’s side, since these images all come from Wikimedia Commons and not enough of them are widespread enough on the wiki that they’d already have been cached by the time you view this page.
It is a tradeoff. The main MUTCD page labels each subpage with a model of what the typical sign for that category looks like, for visual identification. The Australian sign system is largely based on the American one, so that might work to some extent. That said, even some of the MUTCD per-state per-category pages have enough images to be affected by the timeouts whenever this happens, yet there isn’t a natural option for splitting them up further. |
A few pages on the wiki are too large and complex for MediaWiki to handle properly. In the past this has led to downtime when they stopped working and brought the entire wiki down with them.
The problems seem to come from pages with hundreds of images from Wikimedia Commons and from high-complexity pages with thousands of transclusions. The most prominent example is the Map features page, which requires about 30 seconds to generate, but there are other examples that are worse for complexity.
In the case of map features, clicking on the edit link takes you to
https://wiki.openstreetmap.org/w/index.php?title=Map_features&veaction=edit
which, if it does not crash the browser, will time out.
On the complexity side, the number of images causes Wikimedia Commons to consider the requests a DoS attack and block the user. This happens when the user views the page, but also when various scripts are run.
We've tried papering over the issues as they've come up, but we need to address the root cause: MediaWiki does not work with pages this complex or with this many images.
By viewing source you can see a report at the bottom of the page in an HTML comment - search for `NewPP limit report`. For Map features, it shows that the page has 5000+ transclusions and took 28.7s to generate when I loaded the page. For comparison, the front page has about 50 transclusions and takes 0.69 seconds to generate.
Wikipedia's page on their template limits is a reasonable overview of some of the problems.
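Reading that report by hand gets tedious across many pages, so here is a minimal sketch of pulling it out of a page's HTML source. The sample HTML and its numbers are invented for illustration (only the 28.7 second figure echoes the measurement above), `parse_newpp_report` and `seconds` are hypothetical helper names, and the exact set of fields in the report may differ between MediaWiki versions.

```python
import re


def parse_newpp_report(html):
    """Return the 'key: value' lines of the first NewPP limit report comment."""
    match = re.search(r"<!--\s*NewPP limit report\s*(.*?)-->", html, re.DOTALL)
    if match is None:
        return {}
    report = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines and lines without a colon
            report[key.strip()] = value.strip()
    return report


def seconds(value):
    """Convert a value such as '28.700 seconds' to a float."""
    return float(value.split()[0])


# INVENTED sample; a real page source carries more fields than this.
SAMPLE_HTML = """<html><body>...</body>
<!--
NewPP limit report
CPU time usage: 27.912 seconds
Real time usage: 28.700 seconds
Preprocessor visited node count: 250000/1000000
Highest expansion depth: 12/100
-->
</html>"""

report = parse_newpp_report(SAMPLE_HTML)
real_time = seconds(report["Real time usage"])
print(real_time)
```

Run against saved page sources, something like this could rank pages by generation time without anyone having to wait for each slow page to render in a browser.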
Another class of page with problems is lists of thousands of relations.
I think there's a few steps we need to do.