forked from malloydata/learn
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmalloy_cardio.malloynb
80 lines (71 loc) · 3.59 KB
/
malloy_cardio.malloynb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
>>>markdown
# Malloy Cardio
This set of exercises is sure to get your heartrate up and your brain firing on all cylinders.
Fire up your favorite workout playlist and dive in 👇
>>>markdown
## Step 0: Get Ready
The first step in any Malloy data modeling exercise is to declare your sources. In the following code cell, declare a source for each of the tables in the ecommerce dataset. Refer to the [documentation for the precise syntax](https://malloydata.github.io/documentation/language/source).
>>>malloy
// users table at data/users.parquet
source: users is duckdb.table('data/ecommerce/users.parquet')
// Create a source for the rest of the tables in this dataset:
// products
source: products is duckdb.table('data/ecommerce/products.parquet')
// orders
source: orders is duckdb.table('data/ecommerce/orders.parquet') {
join_one: users on user_id = users.user_id
}
// items
source: items is duckdb.table('data/ecommerce/items.parquet') {
join_one: orders on order_id = orders.order_id
join_one: products on product_id = products.product_id
}
>>>markdown
## Step 1: Warmup
Simple queries are easy to write in Malloy. The following query looks at the count of users by gender:
>>>malloy
run: users -> {
group_by: gender
aggregate: user_count is count()
}
>>>markdown
Now your turn to try. Write a query that calculates the count of orders by status.
>>>malloy
// Calculate the count of orders by status:
// run: orders -> {
// ...
// }
>>>markdown
## Step 2: Joins
In this data model, `items` and `orders` are related fact tables, and the `users` and `products` tables are dimensions that join in. Update the sources above to include these joins:
- `items` joins to `orders` via `order_id`
- `items` joins to `products` via `product_id`
- `orders` joins to `users` via `user_id`
>>>markdown
## Step 3: Using Joins
Now that we have the joins modeled, let's run a query that takes advantage of them. For each product category, find the average age of users who ordered from that category.
>>>malloy
// Compute the average age of users who ordered from each category
>>>markdown
## Step 4: Calculating Percent of Total
Malloy makes "percent of total" calculations very easy. For each product category, what is the total revenue, and calculate the percent of overall revenue.
>>>malloy
// Compute revenue for each product category, as well as percent of revenue relative to overall
>>>markdown
## Step 6: Nested Query
The above query shows us revenue numbers per product category. Let's drill in deeper with a nested query. Can you add a `nest` to this query to show the top 5 brands by revenue for each category?
>>>malloy
// For each product category, show top 5 brands by revenue
>>>markdown
## Step 8: Saving this query
Update the data model to save the "top brands" query directly onto the source. Now modify what you previously wrote to use this saved query.
>>>malloy
// Update what you previously wrote to use a saved query from the Source
>>>markdown
## Step 9: Refining a Saved Query
Try using a query refinement to take the analysis further. Perhaps you can filter to only look at top brands where the `department` is Men or Women? Or perhaps you want to only look at brands with products above a certain `retail_price` threshold?
>>>malloy
// Update this query again to include a refinement
>>>markdown
## Step 10: Keep Exploring!
There are a handful of other datasets in this repository that are ready for modeling, exploration, and analysis. Take what you've learned here and try to apply it to another dataset in the `data` subfolder. Or better yet, find a dataset relevant to your day-to-day work and use Malloy to model and analyze it!