Are our scores misleading? #339
Random idea: move the total scores to the bottom. That keeps readers from immediately thinking "oh, X is the best; whatever my use case, I'll just go for X", but still keeps the challenge to the bundlers to improve their total scores :)
That's certainly a quick fix. @argyleink @una: what do you think?
If we move the Summary section down, I think we'll want to rework the first section to have some more visual information and break up the text blocks. This doesn't seem like a quick fix, as we'll need to rethink the IA. Alternatively, as a quick fix, we could add some text in the sidebar: [EDIT] Or underneath:
the data is the data, and people will be people; data above or below, people still the same 🤷 the scores aren't misleading imo. people will forever read the headline and bail, or search for the headline / tldr so they can bail. what we've done is present the information from the top down, rolled up to unrolled. it's all there, and not in a raw format, but in a multi-level digestible format. it'd only be misleading if we prevented folks from seeing more information and only gave them the card totals, which we aren't: our data is open, transparent, and accessible. i don't feel we need to change the design. there's nothing we can do to prevent incorrect extrapolation; i feel we've done our best here to prevent that already.
One idea for the future could be to let people change the score weights. If an app doesn't use, say, web workers, its developers might want to opt out of anything that grades those aspects (i.e., set a weight of 0). If something matters less to them but they still want to consider it somewhat, say, lack of customization, they might set the related weights to 0.1. It could be interactive, with sliders…
We already have weights for the tests, so letting those be customized sounds like an interesting idea.
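To make the idea concrete, here's a minimal sketch of how user-adjustable weights could combine with the existing per-test weights. All names and shapes here (`TestResult`, `weightedTotal`) are hypothetical illustrations, not the site's actual data model:

```typescript
// Hypothetical shape: each test carries a site-assigned weight and a
// score (0 = fail, 0.5 = partial, 1 = pass).
interface TestResult {
  name: string;
  weight: number;
  score: number;
}

// Compute a normalized total. A user override (e.g. a slider value)
// replaces the default weight; a weight of 0 drops the test entirely.
function weightedTotal(
  results: TestResult[],
  userWeights: Record<string, number> = {}
): number {
  let earned = 0;
  let possible = 0;
  for (const r of results) {
    const w = userWeights[r.name] ?? r.weight;
    earned += w * r.score;
    possible += w;
  }
  return possible === 0 ? 0 : earned / possible;
}
```

With sliders bound to `userWeights`, someone who doesn't care about web workers could zero out those tests and see a total tailored to their project.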
I think that's an interesting idea. It would be a nice future feature to allow for customized scores based on user needs, but it feels like this might be better placed on a "frameworks" overview page or elsewhere, as this would essentially become a recommendation engine or wizard. Currently, the site's intention is to present the research data without specific recommendations.
@jeremy-coleman How so? If you can provide us with a way for browserify to consume ES modules, please let us know (but this should be in a separate issue).
I respectfully (partly) disagree. To a degree, yes, the same info is there, and you can pull your own weird conclusions from it no matter what. That said, humans seeing a total score above the fold have drawn conclusions before looking at the rest of the data. Presenting context about the test, then the individual test results, and then a summary gives users more incentive to see more of the page, catching visuals of certain bundlers scoring better in certain categories, before arriving at a conclusion. I'd (again) propose moving the conclusions down, with a clear disclaimer on them, and using the first viewport to set expectations for the page.
They are not, if set in the appropriate context of the set of tests chosen. That information is extremely important, and my worry, as well as that of the person on Twitter who sparked this issue, is that just seeing the rollup doesn't imply the explicit need for that context enough.
Again, I hope you agree the mere totals on their own shouldn't be the headline of the page, as in isolation they say nothing of value about any of the bundlers. The idea mentioned here of modifying the weights for different purposes is of course the best way to add value to these totals, but a step further from the current purpose of the site. In conclusion, my suggestion of moving the totals isn't to prevent any potential biased extrapolation of results, but to help set them in the correct context. The difference is subtle, but in my personal opinion very beneficial to your content strategy.
Why not do something similar to what Lighthouse does? The tests are already grouped into categories, but perhaps distilling those into 4 or 5 key areas of concern would be enough to create a "scorecard" of sorts to present at the top.
@emilio-martinez also a fine suggestion in my book! Anything that can help better set the total number in the correct context is a win here.
What about changing "x out of y tests passed" to something like "x out of y solutions developed"? That way, it reads more like possible/difficult, while also being an implicit call to action for community members to submit or develop missing solutions. That'd also add a degree of separation to the summary stats.
"Are our scores misleading?" is a good question to ask. So perhaps a better solution to the question is not about the numbers, but providing a concise text summary for each tool? E.g.:
I'm in agreement with @argyleink - let's not let feelings get in the way of good, hard data. Anyone else old enough to remember the Acid Tests for browsers, and how important those were in getting browsers on the same page in terms of support? They didn't tiptoe around what failed and what didn't. It was right there in your face, and it was useful. Removing or obscuring easy-to-digest data will reduce the usability of the tool. Adding a hard-to-ignore "read this first" ahead of the scorecards is reasonable. From there, if people want to skip the hard-to-ignore information, it's on them.
I think we can have an adult conversation here that doesn't involve "flat earth".
Perhaps then, if deemed proper for adults, a bit of googling on the fallacy of suppressed evidence and inductive vs deductive reasoning may be in order.
@jeremy-coleman constructive comments are welcome here. Comparisons to "flat earth" and "urr just Google it" are not constructive. Keep it civil please.
This could be used to handle my concern about appearing to encourage features that aren't compatible with native platform capabilities. One category could be something like "non-standard module types" or "non-standard extensions", and I could ignore it :)
The Acid Tests were testing specified behavior. Many bundler features, outside of the desire to actually preserve module semantics, are not specified, and some are really opinionated. So reducing the score to a single number loses a lot more information compared to Acid.
@jakearchibald let me make this as simple as i possibly can (fallacy) (scientific method) I realize the tests here are TDD style, which is fine for development and a great fit for this project, but you cannot publicly report the absence of a pass as a failure. Instead, present the findings as a solution-only recipe book, not a pass/fail test suite.
Jeremy,
For that test, yes you can. From https://bundlers.tooling.report/about/:
The "your" is important there. Parcel fails tests where you'd need to write your own plugin, because it doesn't have usable plugin documentation right now. If you're likely to need to write your own plugins, then failure to pass those tests is a strong indicator for your situation. In other situations, maybe it doesn't matter. As you can see from the OP and other issues, I am concerned that the overall score doesn't effectively communicate that, so your point is already covered by others, and others managed to cover it without invoking flat earth or ostriches. If I've missed your point, can you make it without obfuscation? If you're struggling to do that effectively, please reach out to me directly ([email protected], or Twitter DM), and I'll help you figure it out.
Jake, I should have said "but you cannot publicly report the absence of a pass as a failure (without being misleading)". You claim yes you can, but I think it's fair to assume a reader will interpret a failure as "this bundler can't do this task", when in reality the data is saying "we haven't figured out how to do this task with this bundler yet". (@argyleink thoughts?) You also said, "...Parcel fails tests where you would need to write your own plugin...". This is the core issue at hand. The "failed" tests could be due to the bundler itself or any number of non-bundler issues, including misconfiguration, lack of user knowledge, lack of existing plugins, lack of knowledge of the existence of plugins, etc. It is impossible to be certain whether the failure is due to the bundler or user error. Therefore, it seems the answer to "are our scores misleading?" is unequivocally yes, and the discussion should be if/how to mitigate it. I'm not saying the project is shit and you should tear it down and set it on fire, just that this specific issue should maybe be addressed (mainly because i am a browserify fan boy). But why even ask the question if you don't want to entertain a logical answer?
had a brief chat on twitter with @argyleink and @surma about this today, and was linked here, so I wanna add my thoughts and feedback if they are of any use.

what problem does the "summary" score solve? whether it's at the top or the bottom, or displayed in a different way ... what is it really solving? if I'm missing any positive value it introduces, I'm happy to learn more ... but in my view, it's creating a "gaming dynamic" which signals "winners and losers", "better and worse", etc ... humans are gonna lean towards the lazy path, and just pick the "top" item.

feedback: simply remove the summary section, and let developers read through the itemized list to realize what matters to them / their project's needs.

on "failing" tests: this also creates a "competitive" signal, where if some tool doesn't support certain functionality, it's deemed a failure? I don't think all tools need to have parity of features, and certainly developers using those tools in their projects don't necessarily NEED all these features for their projects ... if a new tool is built tomorrow that only does 10 things really well, and its only purpose is to do those 10 things, and it targets a specific type of project ... is it not worthy to be included? what if I don't need my bundler to handle "image compression"? or my project has no concern for "Custom Type imports", or any "Non-JavaScript Resources" for that matter?

feedback: use terms like "supported" / "unsupported" / "partially supported" to clearly signal and help the reader pick tools with features that fit their project's needs.

I guess this also brings up in my mind the target audience: many developers who are not building and shipping "open source" projects don't need full coverage of features and functionality; rather, they are better off with tools that best fit their project's needs ... I'm thinking of Enterprise Developers who are often overworked / too busy to deep dive, and just pick the highest-signal tooling and get stuck with their choices for years ...
We're currently seeing build tools competing on this number. Fixing long-standing bugs and improving documentation.
@jeremy-coleman sorry I hadn't seen you'd edited your post:
I'm sorry, but that isn't the reality. Failure means it can't 'reasonably' be achieved with the bundler. There is some wiggle room in 'reasonably', but it's roughly:
We made the site available to the tool authors weeks before we went public, and they reviewed the tests. We're also continuing to work with them to update the site as bugs are fixed and documentation is written #357. Sure, maybe a test is marked as a failure because we, and the authors of the tool itself, couldn't make it work when in reality it can be made to work according to the criteria above, but surely we've done due diligence. Also, we accept bug reports and PRs on the data we've provided.
I don't understand this point. From a news story to a scientific paper, you could say "it's impossible to be certain if the information is accurate or an error", but I don't know what that proves. In a scientific paper, this is generally mitigated by providing detailed test conditions and raw results, so a third party can assess the conclusion and reproduce the test. I feel like we've done exactly this with tooling.report, no?
To bring this back to the original point: I have some sympathy for the concerns around the scores. I do think they are valuable and create a bit of healthy competition, but I also agree that they don't need to be at the top of the page. I liked the idea of moving the overall scores to after the grid, maybe with an extra paragraph explaining the nuance of the scores shown. Optional: I also liked the idea of adding a total score per section, which might actually be valuable information for users.
@surma I like both moving the scores to after the grid and the intermediate totals!
For a customized view, we can do something like this based on the sub-categories, enabling scores to be adjusted for different user needs. It's a start. I'm thinking something like the Material UI Chip would be perfect. This would also work on mobile, as they would just stack. The little "+" can rotate into the "x" as the colors change as well. The results would also respond to the filtering.
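To sketch what "results respond to the filtering" could mean in code: chip toggles map to sub-categories, and the displayed pass/total counts are recomputed over only the categories left enabled. The names here (`CategoryScore`, `filteredScore`) are hypothetical, not part of the site's codebase:

```typescript
// Hypothetical per-category rollup, as shown on a section scorecard.
interface CategoryScore {
  category: string; // e.g. 'non-js-resources'
  passed: number;
  total: number;
}

// Sum only the categories the user hasn't toggled off via chips.
function filteredScore(
  scores: CategoryScore[],
  disabled: Set<string>
): { passed: number; total: number } {
  return scores
    .filter((s) => !disabled.has(s.category))
    .reduce(
      (acc, s) => ({ passed: acc.passed + s.passed, total: acc.total + s.total }),
      { passed: 0, total: 0 }
    );
}
```

Dismissing a chip would add its category to `disabled`, and the headline "x out of y" would shrink to reflect only what that user cares about.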
I, for one, was very confused that the tools are not sorted by decreasing score from left to right. At first I thought browserify had the highest score, and was confused after I had already looked at half the comparisons. However, after reading this thread, I realized that scores are not everything, and sorting by name might be more convenient for the future. Still, I think there are probably people who fall for this trap. For this reason I would welcome any changes which would somehow emphasize that scores are not everything when it comes to making your selection.
https://twitter.com/boriscoder/status/1277937351164035078
Do we want people to switch to Rollup because it has a higher score? Well, no, and we explain that in our FAQ, but I'm not sure folks will see that.
I like the scores as 'a bit of fun', and something for tool maintainers to aim for, but should we add a disclaimer or something?