Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Category descriptions #89

Merged
merged 8 commits into from
Jan 15, 2025
Merged

Category descriptions #89

merged 8 commits into from
Jan 15, 2025

Conversation

max-ostapenko
Copy link

  • Generated category descriptions and updated a sync to BQ table

@max-ostapenko max-ostapenko marked this pull request as ready for review December 20, 2024 19:03
@max-ostapenko max-ostapenko changed the title generated descriptions Category descriptions Dec 20, 2024
@rviscomi
Copy link
Member

So these are all AI-generated?

For some reason most of them try to connect the category back to web performance, which seems strange. For example:

Domain parking solutions redirect domains to a different location or page. These should be lightweight and avoid performance issues.

@max-ostapenko
Copy link
Author

max-ostapenko commented Dec 20, 2024

So these are all AI-generated?

Yes.

For some reason most of them try to connect the category back to web performance, which seems strange.

I have mentioned:

The HTTP Archive Tracks How the Web is Built.
We periodically crawl ...

to keep these aligned with the use-cases, but yeah, it leans too much on performance topic.
Let's just drop the second part.

@max-ostapenko
Copy link
Author

@pmeenan I see category descriptions are being pulled into detected_technologies, which is nice of our test run.
But can it break anything on WPT side?

@pmeenan
Copy link
Member

pmeenan commented Dec 20, 2024

Shouldn't. Not sure what the technologies page in WPT uses but new fields are usually not a problem.

src/categories.json Outdated Show resolved Hide resolved
src/categories.json Outdated Show resolved Hide resolved
src/categories.json Outdated Show resolved Hide resolved
src/categories.json Outdated Show resolved Hide resolved
src/categories.json Outdated Show resolved Hide resolved
Copy link
Member

@rviscomi rviscomi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth spot checking some of the most popular/used categories, otherwise LGTM. I did comment on one that looked off.

src/categories.json Outdated Show resolved Hide resolved
Copy link

WPT test run for https://almanac.httparchive.org/en/2022/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=250115_Z6_2
Detected technologies:

{
    "detected": {
        "IaaS": "Google Cloud",
        "JavaScript libraries": "web-vitals",
        "RUM": "web-vitals",
        "Performance": "Priority Hints,Google Cloud Trace",
        "Security": "HSTS",
        "Webmail": "Google Workspace",
        "Email": "Google Workspace",
        "Analytics": "Google Analytics",
        "CDN": "Cloudflare",
        "Miscellaneous": "RSS,Open Graph"
    },
    "detected_apps": {
        "Google Cloud": "",
        "web-vitals": "",
        "Priority Hints": "",
        "HSTS": "",
        "Google Workspace": "",
        "Google Cloud Trace": "",
        "Google Analytics": "",
        "Cloudflare": "",
        "RSS": "",
        "Open Graph": ""
    },
    "detected_technologies": {
        "Google Cloud": {
            "name": "Google Cloud",
            "description": "Google Cloud is a suite of cloud computing services.",
            "slug": "google-cloud",
            "categories": [
                {
                    "id": 63,
                    "slug": "iaas",
                    "description": "Provides computing resources",
                    "groups": [
                        7
                    ],
                    "name": "IaaS",
                    "priority": 8
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "Google Cloud.svg",
            "website": "https://cloud.google.com",
            "pricing": [],
            "cpe": "cpe:2.3:a:google:cloud_platform:*:*:*:*:*:*:*:*"
        },
        "web-vitals": {
            "name": "web-vitals",
            "description": "The web-vitals JavaScript is a tiny, modular library for measuring all the web vitals metrics on real users.",
            "slug": "web-vitals",
            "categories": [
                {
                    "id": 59,
                    "slug": "javascript-libraries",
                    "description": "Collections of pre-written JavaScript code",
                    "groups": [
                        9
                    ],
                    "name": "JavaScript libraries",
                    "priority": 9
                },
                {
                    "id": 78,
                    "slug": "rum",
                    "description": "Tools that track performance as experienced by users",
                    "groups": [
                        2
                    ],
                    "name": "RUM",
                    "priority": 9
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "web-vitals.svg",
            "website": "https://github.com/GoogleChrome/web-vitals",
            "pricing": [],
            "cpe": null
        },
        "Priority Hints": {
            "name": "Priority Hints",
            "description": "Priority Hints exposes a mechanism for developers to signal a relative priority for browsers to consider when fetching resources.",
            "slug": "priority-hints",
            "categories": [
                {
                    "id": 92,
                    "slug": "performance",
                    "description": "Tools that measure and optimize site speed",
                    "groups": [
                        7
                    ],
                    "name": "Performance",
                    "priority": 9
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "Priority Hints.svg",
            "website": "https://wicg.github.io/priority-hints/",
            "pricing": [],
            "cpe": null
        },
        "HSTS": {
            "name": "HSTS",
            "description": "HTTP Strict Transport Security (HSTS) informs browsers that the site should only be accessed using HTTPS.",
            "slug": "hsts",
            "categories": [
                {
                    "id": 16,
                    "slug": "security",
                    "description": "Technologies that protect websites from vulnerabilities and attacks",
                    "groups": [
                        11
                    ],
                    "name": "Security",
                    "priority": 9
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "default.svg",
            "website": "https://www.rfc-editor.org/rfc/rfc6797#section-6.1",
            "pricing": [],
            "cpe": null
        },
        "Google Workspace": {
            "name": "Google Workspace",
            "description": "Google Workspace, formerly G Suite, is a collection of cloud computing, productivity and collaboration tools.",
            "slug": "google-workspace",
            "categories": [
                {
                    "id": 30,
                    "slug": "webmail",
                    "description": "Systems that allow users to send and receive emails through a browser",
                    "groups": [
                        4
                    ],
                    "name": "Webmail",
                    "priority": 2
                },
                {
                    "id": 75,
                    "slug": "email",
                    "description": "Systems that manage email communication",
                    "groups": [
                        4,
                        2
                    ],
                    "name": "Email",
                    "priority": 9
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "Google.svg",
            "website": "https://workspace.google.com/",
            "pricing": [],
            "cpe": null
        },
        "Google Cloud Trace": {
            "name": "Google Cloud Trace",
            "description": "Google Cloud Trace is a distributed tracing system that collects latency data from applications and displays it in the Google Cloud Console.",
            "slug": "google-cloud-trace",
            "categories": [
                {
                    "id": 92,
                    "slug": "performance",
                    "description": "Tools that measure and optimize site speed",
                    "groups": [
                        7
                    ],
                    "name": "Performance",
                    "priority": 9
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "google-cloud-trace.svg",
            "website": "https://cloud.google.com/trace",
            "pricing": [],
            "cpe": null
        },
        "Google Analytics": {
            "name": "Google Analytics",
            "description": "Google Analytics is a free web analytics service that tracks and reports website traffic.",
            "slug": "google-analytics",
            "categories": [
                {
                    "id": 10,
                    "slug": "analytics",
                    "description": "Tools that track user behavior and provide insights into website performance",
                    "groups": [
                        8
                    ],
                    "name": "Analytics",
                    "priority": 9
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "Google Analytics.svg",
            "website": "https://google.com/analytics",
            "pricing": [],
            "cpe": null
        },
        "Cloudflare": {
            "name": "Cloudflare",
            "description": "Cloudflare is a web-infrastructure and website-security company, providing content-delivery-network services, DDoS mitigation, Internet security, and distributed domain-name-server services.",
            "slug": "cloudflare",
            "categories": [
                {
                    "id": 31,
                    "slug": "cdn",
                    "description": "(Content Delivery Network) Distribute website content globally to improve load times for users",
                    "groups": [
                        7
                    ],
                    "name": "CDN",
                    "priority": 9
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "CloudFlare.svg",
            "website": "https://www.cloudflare.com",
            "pricing": [],
            "cpe": null
        },
        "RSS": {
            "name": "RSS",
            "description": "RSS is a family of web feed formats used to publish frequently updated works—such as blog entries, news headlines, audio, and video—in a standardized format.",
            "slug": "rss",
            "categories": [
                {
                    "id": 19,
                    "slug": "miscellaneous",
                    "description": "Tools and technologies that don't fit into other categories",
                    "groups": [
                        6
                    ],
                    "name": "Miscellaneous",
                    "priority": 10
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "RSS.svg",
            "website": "https://www.rssboard.org/rss-specification",
            "pricing": [],
            "cpe": null
        },
        "Open Graph": {
            "name": "Open Graph",
            "description": "Open Graph is a protocol that is used to integrate any web page into the social graph.",
            "slug": "open-graph",
            "categories": [
                {
                    "id": 19,
                    "slug": "miscellaneous",
                    "description": "Tools and technologies that don't fit into other categories",
                    "groups": [
                        6
                    ],
                    "name": "Miscellaneous",
                    "priority": 10
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "Open Graph.png",
            "website": "https://ogp.me",
            "pricing": [],
            "cpe": null
        }
    }
}

@max-ostapenko max-ostapenko merged commit 070f0bb into main Jan 15, 2025
2 checks passed
@max-ostapenko max-ostapenko deleted the category_descriptions branch January 15, 2025 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants