Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add utstring_release #95

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

tbeu
Copy link
Contributor

@tbeu tbeu commented Nov 25, 2016

Not sure if you want to have it as function or macro.

@@ -121,6 +121,15 @@ do { \

#define utstring_body(s) ((s)->d)

static char* utstring_release(UT_string *s) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this isn't going to be a macro, I think it needs the word _UNUSED_ in front of it.
No comment on the functionality; I'll let @troydhanson take it or leave it.

Copy link
Owner

@troydhanson troydhanson Nov 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done this kind of thing before manually. So, I think it's a useful operation. Agreed on the need for _UNUSED_. I think the name _release is so-so. I want to imply that it's releasing it "to" the caller. Maybe utstring_disown. Thoughts? Incidentally I once or twice wrote the opposite function that adopts the string you give it and makes a utstring around it. From that point of view, utstring_own and utstring_disown would make a pair.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely the opposite of utstring_disown would be utstring_adopt!

The longer I look at the existing utstring interface, the more I think it'll never quite line up with my C++ish expectations. Seems like I get a utstring by calling utstring_new(p), but then to put a string into it I have to use utstring_bincpy(p, "mydata", strlen("mydata")) (there's no utstring_strcpy), and then when I'm done I do utstring_free(p); p = NULL; (if I forget that reset-to-null operation, I'll get a double-free if I utstring_renew(p) — I think utstring_renew is an unusual wart in the API).

With this patch, if I want to "release/disown" the string, I do either char *s = utstring_body(p); utstring_init(p); or else char *s = utstring_release(p); p = NULL; (again forgetting the reset-to-null might lead to double-frees) — and if I want to "release/disown" a non-alloc'ed utstring, I have to use the first method, because utstring_release always wants to free the utstring I give it.

I'd want to separate the act of "disowning the buffer" and the act of "destroying the (possibly non-alloc'ed) utstring object itself":

_UNUSED_ static char *utstring_disown(UT_string *s) {
  char *ret = s->d;
  s->n = 0; s->i = 0; s->d = NULL;
  return ret;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@troydhanson wrote:

my intuition was that the original way (release does both) made sense

I can compellingly argue that in the C++ worldview they'd definitely be separate. But I can't argue that utstring currently follows the C++ worldview, because of the "renew" verb and the lack of (s)=NULLs in some places and the incomplete(?) support for UT_strings allocated on the stack. So if utstring isn't C++ish, there's no syllogistic reason for it to have to make disown and free separate operations.

The only thing I can say for sure is that disown/free should be separated if-and-only-if adopt/new are separated; i.e. if it's possible for an existing and/or stack-allocated utstring to adopt a new buffer, then symmetrically we must make it possible for a utstring to disown its buffer without vanishing and/or having been heap-allocated. Whereas, if utstring_adopt involves malloc(sizeof (UT_string)), then utstring_disown must involve free(s). Does that makes sense to everyone involved?

@tbeu
Copy link
Contributor Author

tbeu commented Nov 29, 2016

So what about utstring_attach and utstring_detach as names?

@troydhanson
Copy link
Owner

I think disown/adopt is more intuitive (well, that's an overstatement, somewhat more intuitive). About the renew/new/free - it never occurred to me to have something freed and then renewed. The way I've used renew is to delay initialization of a string until it's needed, then I leave it for the lifetime of a program and then only free it at the end. If you think free-and-later-renew is a use case that needs to work better we could add s=NULL to the utstring_free macro.

I'm still pondering the idea of separating the acts above, my intuition was that the original way (release does both) made sense. I need to turn my attention to other things this week so I trust your judgment to make a decision.

@Quuxplusone
Copy link
Collaborator

@troydhanson: My intuition is "I don't really care about utstring, and don't have a good mental model for what the primitive operations are supposed to be (see my comment above); so, I'm just going to let this patch languish until @troydhanson has time to figure it out." :)

I agree that disown/adopt is the best pair of verbs. IMO @tbeu should adopt those verbs and provide macros (or _UNUSED_ static functions) for both verbs in the same patch.

If you think free-and-later-renew is a use case that needs to work better

Well, that's the thing: I currently have no mental model for what use-cases are supposed to work and which aren't. (I'll pick this back up in the comment thread above.)

@troydhanson
Copy link
Owner

I agree with the recommendation for those verbs and providing both in one patch. Let us know if you're up for that @tbeu. :)

@tbeu
Copy link
Contributor Author

tbeu commented Nov 30, 2016

Sure, I'll revise this PR accordingly.

@tbeu
Copy link
Contributor Author

tbeu commented Nov 30, 2016

a4f1104 adds utstring_adopt/utstring_disown as macros.

| utstring_init(s) | init a utstring (non-alloc)
| utstring_done(s) | dispose of a utstring (non-allocd)
| utstring_init(s) | init a utstring (non-alloc)
| utstring_done(s) | dispose of a utstring (non-alloc)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'll take these diffs to the existing docs upstream right now (hopefully in a way that merges cleanly with this PR).

do { \
if (s) { \
(b) = (s)->d; \
utstring_clear(s); \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect; utstring_clear will set s->d[0] = '\0', blowing away the contents of b. What you want here is more like utstring_init(s), but actually calling out to utstring_init seems confusing to me; I think you should just inline the couple of statements you want here. Which IMO would be

(b) = (s)->d;
(s)->d = NULL;
(s)->i = 0;
(s)->n = 0;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry, I changed it to utstring_clear after I tested which was wrong.

if (s) { \
(s)->d = (b); \
(s)->n = (l); \
(s)->i = (l); \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@troydhanson, in the codebases where you've needed this function in the past, would this have sufficed? Or should it be utstring_adopt(s,d,i,n) for cases where the initial capacity is intentionally greater than the initial length?
I guess since C's realloc and free don't need to know the original capacity, this version is safe; but if someone later filed a feature request to make utstring work with ptr = sizedmalloc(sz) and sizedfree(ptr,sz) the way uthash currently does, it would be hard to retrofit that into this "length-only" model.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs a tweak. The other "put" functions (utstring_printf, utstring_bincpy) internally set the internal buffer length n one larger than the input size, to accommodate a trailing nul, and then forcibly put it there. (Look at the utstring_bincpy definition to see what I mean).
It's a convenience so there is a guaranteed nul terminator in the spare capacity part of the string. It is a quirk of utstring that I like because, it makes it compatible with printf (etc) that expect nul-terminated strings, while keeping the nul past the length i means that the content/length are unharmed.

One way to preserve that notion, is to switch adopt to a two parameter version utstring_adopt(s,d) that demands a C string as input, and then its implementation sets

s->i = strlen(d);
s->n = s->i +1 ;

The problem with the current three-parameter adopt is that the caller can pass a non-terminated string, breaking the expectation that utstring always has a nul after the buffer.

We can switch to the four-parameter version utstring_adopt(s,d,i,n) but, to preserve the nul expectation, I would explicitly force a nul at d[i] (which requires a preceding realloc if i==n).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tbeu: Sorry! I think what happened here was that the feedback I was awaiting on 2016-12-15 did arrive, in the form of Troy's comment on 2016-12-16. And then I never explicitly said "@tbeu please address this," but I also never really bothered (AFAIR) to figure out whether I even agreed with what Troy said, either.

I suggest that at this point, with many outdated review comments and some pieces of this patch merged already, the best thing to do would be to close this PR and (if you still want the feature merged) open a fresh PR.

Do you have any thoughts on the desirability of utstring_adopt(s, buf, len) versus utstring_adopt(s, ntbs) versus utstring_adopt(s, buf, len, cap)? I think that (at the moment) I agree with Troy that utstring_adopt(s, buf, len) is the worst of both worlds, because it would let you create a non-null-terminated utstring.

Quuxplusone added a commit to Quuxplusone/uthash that referenced this pull request Nov 30, 2016
Thanks to @tbeu for the patch!
This is adapted from part of troydhanson#95.
@tbeu
Copy link
Contributor Author

tbeu commented Dec 15, 2016

Ready to merge?

@Quuxplusone
Copy link
Collaborator

I'd still like to get @troydhanson's feedback on my comment about the possibility of capacity != length https://github.com/troydhanson/uthash/pull/95/files#r90346232 — but if there's nothing forthcoming in the next 48 hours, then sure, I'll squash and merge this patch as is.

@tbeu
Copy link
Contributor Author

tbeu commented Mar 7, 2017

Anything more I can do here?

val314159 added a commit to val314159/uthash that referenced this pull request Apr 14, 2017
* Minor utstring docs cleanup.

Thanks to @tbeu for the patch!
This is adapted from part of troydhanson#95.

* Remove cast to "unsigned" from utstring_len macro.

It causes truncation on 64-bit platforms, and neither @troydhanson
nor myself know why it should have been there in the first place.

Thanks to @mse11 for the patch (troydhanson#97)!

* Silence an instance of -Wsign-compare in the tests.

* Fix a segfault in HASH_ADD_KEYPTR_INORDER.

Thanks to @IronBug for the bug report (troydhanson#99) and test90.c!

test87.c would have caught this bug at the time the macros were added,
except that by chance `offsetof(hstruct_t, hh)` was `0`. I've adjusted
the test so that it fails before this patch and succeeds after.

* Document test90.c in tests/README.

I should have done this in 13723b1, but forgot.
No functional change.

* Make `make pedantic` and `make cplusplus` build cleanly again.

Basically just adding malloc-casts and explicit casts from
string literals to `(char *)` (since that gives a deprecation
warning in clang++ these days).

No intended functional change.

* Add "Android" to the list of supported platforms.

Requested by @silvioprog! He writes:

> It worked like a charm for my ARMv7 and finally I can use an efficient
> C hash table in my Android (tested on version 5 and 6) application.

Sure, technically "Android" is covered by "Linux" anyway. :)

Fixes troydhanson#103.

* Bump version to 2.0.2.

* utlist: Add an assert to silence a scan-build diagnostic.

Thanks to @aaronmdjones for the report!

Unfortunately scan-build is unhappy with uthash and utarray
as well, and the fixes seem harder in those cases. Especially
since there's no tradition of adding redundant asserts in those
files, and I wouldn't really want to start. This case is easy
because we've already committed to liberal use of asserts in
utlist (for whatever reason).

* utlist: Add INORDER macros for inserting into sorted lists.

Thanks to @jeffreylovitz for the initial patch!

The `{LL,DL,CDL}_LOWER_BOUND` macros have semantics similar to C++'s
`std::lower_bound` algorithm: in essence they return an iterator to
the first existing list element which is *not less than* the given
one. However, in this context an "iterator to element X" is actually
a "pointer to element X-1", because our lists support only
insertion-after, not insertion-before. Therefore, in situations
where `std::lower_bound` would return `lst.begin()` we return `NULL`,
and in cases where `std::lower_bound` would return `lst.end()` we
return a pointer to the last element of the list.

* uthash, utlist: Make the NO_DECLTYPE codepaths match more closely.

utlist had made `NO_DECLTYPE` the default on `__ICCARM__` since 368010b,
but uthash had never done so. Do so.

utlist gave double-definition errors if `-DNO_DECLTYPE` was passed
on the command line (for a compiler that otherwise supported decltype
and/or __typeof). Fix that.

Passing `-DLDECLTYPE(x)=decltype(x)` on the command line ought to do
the right thing. Make it at least not double-define `DECLTYPE`. But
we still have the problem that we do things like `LDECLTYPE(*p)`,
which in C++ yields an lvalue reference type, not an object type.
I don't know how to fix that.

uthash's test87.c and test90.c gave errors in C++ with `-DNO_DECLTYPE`
because the code now known as `HASH_AKBI_INNER_LOOP` would attempt to
call the provided `cmpfcn` with an argument of type `(void *)` instead
of the expected `decltype(head)`. This works in C with an implicit
conversion, but doesn't work at all in C++ because it would try to
call a different overload of the comparison function. So, if we have
`NO_DECLTYPE`, let's use `head` as a scratch variable.

* C89 compatibility: add a missing "return 0" to test89.c

No functional change.

* Convince Clang that `pid` is never used uninitialized in hashscan.c.

No functional change.

* Use non-reserved identifiers for some helper macros.

Identifiers beginning with an underscore and an uppercase letter
are reserved to the implementation, and Clang apparently warns on
such things these days (via -Wreserved-id-macro). Solution: rename
those macros to avoid leading underscores.

Fixes troydhanson#112.

* remove libut
@tbeu
Copy link
Contributor Author

tbeu commented Aug 12, 2020

but if there's nothing forthcoming in the next 48 hours, then sure, I'll squash and merge this patch as is.

It's almost 48 months now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants