Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model 7: Add unincorporate_reference method. #204

Merged
merged 4 commits into from
Sep 23, 2024

Conversation

emilyfertig
Copy link

No description provided.

@@ -132,7 +132,6 @@ class CleanRelation : public Relation<T> {
// It's safe to unincorporate this element since no other data point
// refers to it.
data_r.at(name).erase(items[i]);
domains[i]->unincorporate(items[i]);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this is a bug, since domains can be shared across relations (so just because the domain doesn't take a given value in this relation doesn't mean it can't in other relations). (We had resurrected the unincorporate code from an obsolete commented-out version.)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Do we want to add a TODO here or create an issue to come up with some sort of scheme for garbage collecting domains when they become completely unused?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The garbage collection is now done in irm.incorporate. I added a comment to Relation::unincorporate to explain.

@emilyfertig
Copy link
Author

I disabled pclean_lib_test because there's a bug where make_pclean_samples isn't sampling unique entity vectors (and incorporating the same set of entity vectors twice should raise an assertion -- it wasn't doing that earlier, and this PR fixes that). I confirmed that it isn't sampling unique entity vectors at head.

@@ -132,7 +132,6 @@ class CleanRelation : public Relation<T> {
// It's safe to unincorporate this element since no other data point
// refers to it.
data_r.at(name).erase(items[i]);
domains[i]->unincorporate(items[i]);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Do we want to add a TODO here or create an issue to come up with some sort of scheme for garbage collecting domains when they become completely unused?

cxx/gendb.cc Outdated
@@ -165,4 +165,128 @@ T_items GenDB::sample_class_ancestors(std::mt19937* prng,
return items;
}

// This function walks the tree of reference indices to populate the "items"
// vector. It should be called with ind = items.size() - 1 and class_item

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have ind as a parameter then? Or do you mean ind <= items.size() - 1 ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the top-level call (updated the comment).


// If this relation doesn't contain the relevant class/reference field, skip
// it.
if (domain_inds.size() == 0) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there non-error reasons why this can happen? If not, should we print a warning?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes -- I expanded the comment.

@@ -25,14 +26,36 @@ void PCleanSchemaHelper::compute_domains_for(const std::string& name) {
std::vector<std::string> annotated_ds;
PCleanClass c = schema.classes[name];

for (const auto& v: c.vars) {
// Recursively maps the indices of class "name" (and ancestors) in relation

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can all of this new code go into a new method, compute_ref_indices() ? It would be great if compute_domains_cache() could stick to its original mission.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (not sure which I prefer here, combining them saved some duplicate code and ensured that the domains cache was populated first, which the caller now has to ensure).

@emilyfertig emilyfertig merged commit 1adddc6 into master Sep 23, 2024
2 checks passed
@emilyfertig emilyfertig deleted the emilyaf-091824-model7-unincorporate-reference branch September 23, 2024 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants