Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Bookstore example #143

Merged
merged 110 commits into from
Feb 10, 2023
Merged

Add Bookstore example #143

merged 110 commits into from
Feb 10, 2023

Conversation

izmalk
Copy link
Contributor

@izmalk izmalk commented Dec 1, 2022

What is the goal of this PR?

As a part of my onboarding, I have prepared yet another example with TypeDB. This one is in Python. Its purpose is to demonstrate how to use Python client to TypeDB.

What are the changes implemented in this PR?

Added a directory commerce/bookstore - with all files of the example. The README.md file contains all necessary documentation for this example.

Added requests.py with search book and search user functions.
Minor testing of requests, data generation.
Also polished schema and the load_data script.
Moved ordered items generation into load_data.py.
Minor fixes
Added search Order functionality.
Added tags hierarchy, including reasoning rule. Tags requests - work in progress.
Minor fixes.
Added assigning tags of sup-tags rule.
Fixed searching tags with reasoning.
@izmalk izmalk self-assigned this Dec 1, 2022
@typedb-bot
Copy link
Member

typedb-bot commented Dec 1, 2022

PR Review Checklist

Do not edit the content of this comment. The PR reviewer should simply update this comment by ticking each review item below, as they get completed.


Trivial Change

  • This change is trivial and does not require a code or architecture review.

Code

  • Packages, classes, and methods have a single domain of responsibility.
  • Packages, classes, and methods are grouped into cohesive and consistent domain model.
  • The code is canonical and the minimum required to achieve the goal.
  • Modules, libraries, and APIs are easy to use, robust (foolproof and not errorprone), and tested.
  • Logic and naming has clear narrative that communicates the accurate intent and responsibility of each module (e.g. method, class, etc.).
  • The code is algorithmically efficient and scalable for the whole application.

Architecture

  • Any required refactoring is completed, and the architecture does not introduce technical debt incidentally.
  • Any required build and release automations are updated and/or implemented.
  • Any new components follows a consistent style with respect to the pre-existing codebase.
  • The architecture intuitively reflects the application domain, and is easy to understand.
  • The architecture has a well-defined hierarchy of encapsulated components.
  • The architecture is extensible and scalable.

Added README
Added comments
Modified load_data.py to create/re-create DB
Deleted one of the request examples
Changed default DB name
@izmalk izmalk marked this pull request as ready for review December 4, 2022 19:29
@izmalk izmalk requested a review from alexjpwalker as a code owner December 4, 2022 19:29
Added info about the new bookstore example into the top-level readme file of the TypeDB Examples repository.
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
commerce/bookstore/README.md Outdated Show resolved Hide resolved
commerce/bookstore/README.md Outdated Show resolved Hide resolved
commerce/bookstore/README.md Outdated Show resolved Hide resolved
commerce/bookstore/python/load_data.py Outdated Show resolved Hide resolved
commerce/bookstore/python/load_data.py Outdated Show resolved Hide resolved
commerce/bookstore/python/load_data.py Outdated Show resolved Hide resolved


# This is a list of imported files with datasets for the DB
Inputs = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feel like something we should type a bit more strongly. (We're quite big on strong typing.) How about introducing an Input class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow - It is not used anywhere outside of loading the data. It does not need to be in the schema. It's basically just a list of .csv file names and the correlated functions to load them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can define an abstract base class Input, with subclasses BookInput, UserInput, RatingInput etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried to do that in a separate PR here https://github.com/izmalk/typedb-examples/pull/1/files
But I feel like the result is by far worse than the initial variant. I suggest discussing in this separate PR how to proceed. Maybe it's just a problem with my implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have rebased the PR with classes on the latest version of this repository. So it's ready to be merged

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setup the database - by loading the schema.
If you want I can rename it to something like load_schema?

Copy link
Contributor Author

@izmalk izmalk Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous comment was about the setup() function. Fixed it in typedb/typedb-examples@0655495

]

# This is the main body of this script
with TypeDB.core_client("localhost:1729") as client:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a large number of code comments here. In general, we prefer to minimise code comments, because if the code gets refactored / changed (which we do frequently), then the comments go out of date very rapidly, and people forget about them, because they're just comments - a minor thing on the side when compared against the actual source code itself.

Some of the comments are fairly clear. For example, check the db existence on L264 is obvious from simply reading the code, so we can just delete that comment.

On the other hand, some of the comments are necessary in order to actually make it clear what the line does. We always strive to have our code self-documenting: the code itself should describe what it does, concisely, at a high level.

For example, if check_data() == 0, is not clear about what it means. What are we checking about the data? What does the number "0" mean?

By reading deep into the implementation of check_data, we can figure out all these questions: but the average developer reading the code would be better placed to understand it if the method itself was clear in its naming and in its return value. For example, if has_existing_data() == False (or more tersely, if not has_existing_data()).

We should try to make this code more readable in the sense of "gaining a conceptual understanding, just by reading this method, of how the whole program works." Moreover, we should try to apply this throughout all of our Python code. This may be a fairly major refactor task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In regards to how many comments:
I feel like we can't give too many comments here. This is not the usual code, but an example. Instead of writing some additional documentation with links to lines of code, we are using comments to annotate almost any important line. Even if it's obvious. That's important because of many reasons, including:

  1. We will have a lot of different users onboarding. Including some of the juniors that might be unfamiliar with even obvious code patterns and functions. Some of them might struggle with English and additional comments in simplified English might help them.
  2. This tutorial's goal is not only to lead newcomers through the process to create a working bookstore. But also to enable them to explore the solution. So the less time and effort they spend to understand the existing solution, the more time and motivation they will have to continue exploring their own ideas and variants. They can spend these additional resources to modify the tutorial a little bit for their intended use case. Or build something of their own. Even the slightest bump on the road can reduce their opinion of our product, motivation to explore it, and/or their time/resources budget for their own exploration. So with that comments even in the most obvious places, I'm trying to do some shortcuts here and there. it's not that these comments will obstruct anything. Please tell me if you think this is the case.
  3. The example itself will. be mostly static. I'm thinking of adding a few addons (like debug logging), maybe even adding an automated test (to test this example with new releases of TypeDB to get a warning even if something will get broken). But in terms of refactoring, I don't think the example will change a lot over time. Even if it will change - comments are very important here and should be double-checked just like the rest of the code.
    That's why, as long as these comments are not obstructing anything, I think we should stick to this comment-rich approach in examples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In regards to function names - yeah, I'll try to re-read and improve what I can. I feel like there is always room for improvement. So it might be an iterative process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rereading:

  1. Create a Client, check if DB exists, if not, create it and load data; if it exists, connect to it.
  2. If DB exists, check if it has existing data. If it does, prompt user to reload it; it not, then "setup".

The narrative is better than before - I think everything is now clear except "setup". What does setup do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loads schema. I renamed the function in typedb/typedb-examples@0655495

izmalk and others added 9 commits December 5, 2022 23:28
Fixed a bug with rating calculations - prevented division by zero.
Minor refactoring
Added sorting to some requests. Just to show this ability.
Fixed a bug with genre tags loading too late to correctly load genre-taging relations.
Minor refactoring
Renamed functions to generate queries from templates to generate query
izmalk added 14 commits January 16, 2023 21:01
Renamed setup() to load_schema()
Implemented show_book function.
Refactored some comments.
Created show_user function.
Minor comments improvements
super-tag-ownership
Added os.path to fix a problem of starting the load_data.py with any other working dir except python/ directory.
Changed Delivery address dataset to auto-generated addresses for more realistic look.


def load_data(): # Main data load function
with TypeDB.core_client("localhost:1729") as client: # Establishing connection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's store "localhost:1729" in config.py. This is also good practice for readers particularly as their project matures and connects to an external host.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.verbose = verbose


class BookInput(Loader):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using subclassing we should generally adopt a consistent naming pattern: BookInput is not inherently related to Loader; BookLoader would be.

We should rename these loaders to BookLoader, UserLoader, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


# This is a list of classes to import data. The order of values is important for loading data order.
# Classes have filenames and corresponding methods to load the parsed data into the TypeDB
Input_types_list = [GenreInput, GenreHierarchyInput, BookInput, UserInput, RatingInput, OrderInput, BookGenreInput]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per comment above we should rename this list to loaders .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the file is called loaders.py — our reference will be loaders.loaders that way.
That's why I suggest using Loaders_list as a variable name. Just for visual aesthetics.
loaders.Loaders_list
More unique, accurate, and descriptive than loaders.loaders or just loaders.list. But I can see loaders.list as an alternative if you want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After we discussed it I implemented the loaders.loaders_list variant in typedb/typedb-examples@f815079

@alexjpwalker
Copy link
Member

alexjpwalker commented Jan 20, 2023

Before we proceed to merge this PR, we should ensure that the Bookstore example is tested in Python tests.

Added tests for all main functions.
All tests are green.
@izmalk
Copy link
Contributor Author

izmalk commented Jan 30, 2023

Before we proceed to merge this PR, we should ensure that the Bookstore example is tested in Python tests.

I have implemented the tests. I hope that's enough for now. typedb/typedb-examples@167562a

print("Detected DB " + config.db + ". Connecting.")
if not has_existing_data(): # Most likely the DB is empty and has no schema
print("Attempting to load the schema and data.")
if load_schema(): # Schema has been loaded
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, the calls to core_client look a lot cleaner now that we've introduced the typedb_server_addr config - now let's also simplify the implementation!

In the future, our Docs will contain detailed guidelines on when to create Clients, Sessions and Transactions and what their lifetimes should be. Until then, it's all our internal knowledge. And basically, a Client should, ideally, be created once in the whole lifetime of an application. That's what it's conceptually meant to represent (and it holds onto a single persistent connection with the server under-the-hood; this consumes resource, and we rarely need more than one.)

Currently, we open a client at the start of the main method, but then we open another client in load_schema (and another in has_existing_data, etc.)

Let's refactor all other methods in this file to take in a client parameter. Then, main will create one TypeDB client, and reuse it throughout the entire lifetime of the application.

I'd then do the same thing in requests.py. In this case, create the TypeDB client in the main method. Of course, there is the possibility that the user just exits the application and never uses the client, but this would be an edge case scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Fixed in typedb/typedb-examples@090229b

import config
from typedb.client import TypeDB, SessionType, TransactionType

class LoadDataTests(TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See discussion in Discord.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed comments in typedb/typedb-examples@1d54bc4
Also, I added the numbers. Otherwise, it's hard to decipher this comment. Too many non-numbered elements in the list

@izmalk izmalk requested a review from alexjpwalker February 10, 2023 14:09
@izmalk izmalk merged commit 85463bc into typedb:master Feb 10, 2023
@@ -0,0 +1,8 @@
# ToDo list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this file meant to be checked in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, for the future me or future generations.
I believe It's even mentioned in the readme.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants