-
Notifications
You must be signed in to change notification settings - Fork 935
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add utstring_release #95
base: master
Are you sure you want to change the base?
Conversation
@@ -121,6 +121,15 @@ do { \ | |||
|
|||
#define utstring_body(s) ((s)->d) | |||
|
|||
static char* utstring_release(UT_string *s) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this isn't going to be a macro, I think it needs the word _UNUSED_
in front of it.
No comment on the functionality; I'll let @troydhanson take it or leave it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done this kind of thing before manually. So, I think it's a useful operation. Agreed on the need for _UNUSED_
. I think the name _release
is so-so. I want to imply that it's releasing it "to" the caller. Maybe utstring_disown
. Thoughts? Incidentally I once or twice wrote the opposite function that adopts the string you give it and makes a utstring around it. From that point of view, utstring_own
and utstring_disown
would make a pair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely the opposite of utstring_disown
would be utstring_adopt
!
The longer I look at the existing utstring interface, the more I think it'll never quite line up with my C++ish expectations. Seems like I get a utstring by calling utstring_new(p)
, but then to put a string into it I have to use utstring_bincpy(p, "mydata", strlen("mydata"))
(there's no utstring_strcpy
), and then when I'm done I do utstring_free(p); p = NULL;
(if I forget that reset-to-null operation, I'll get a double-free if I utstring_renew(p)
— I think utstring_renew
is an unusual wart in the API).
With this patch, if I want to "release/disown" the string, I do either char *s = utstring_body(p); utstring_init(p);
or else char *s = utstring_release(p); p = NULL;
(again forgetting the reset-to-null might lead to double-frees) — and if I want to "release/disown" a non-alloc'ed utstring, I have to use the first method, because utstring_release
always wants to free
the utstring I give it.
I'd want to separate the act of "disowning the buffer" and the act of "destroying the (possibly non-alloc'ed) utstring object itself":
_UNUSED_ static char *utstring_disown(UT_string *s) {
char *ret = s->d;
s->n = 0; s->i = 0; s->d = NULL;
return ret;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@troydhanson wrote:
my intuition was that the original way (release does both) made sense
I can compellingly argue that in the C++ worldview they'd definitely be separate. But I can't argue that utstring currently follows the C++ worldview, because of the "renew" verb and the lack of (s)=NULL
s in some places and the incomplete(?) support for UT_strings allocated on the stack. So if utstring isn't C++ish, there's no syllogistic reason for it to have to make disown
and free
separate operations.
The only thing I can say for sure is that disown/free
should be separated if-and-only-if adopt/new
are separated; i.e. if it's possible for an existing and/or stack-allocated utstring to adopt
a new buffer, then symmetrically we must make it possible for a utstring to disown
its buffer without vanishing and/or having been heap-allocated. Whereas, if utstring_adopt
involves malloc(sizeof (UT_string))
, then utstring_disown
must involve free(s)
. Does that makes sense to everyone involved?
So what about |
I think disown/adopt is more intuitive (well, that's an overstatement, somewhat more intuitive). About the renew/new/free - it never occurred to me to have something freed and then renewed. The way I've used renew is to delay initialization of a string until it's needed, then I leave it for the lifetime of a program and then only free it at the end. If you think free-and-later-renew is a use case that needs to work better we could add I'm still pondering the idea of separating the acts above, my intuition was that the original way (release does both) made sense. I need to turn my attention to other things this week so I trust your judgment to make a decision. |
@troydhanson: My intuition is "I don't really care about utstring, and don't have a good mental model for what the primitive operations are supposed to be (see my comment above); so, I'm just going to let this patch languish until @troydhanson has time to figure it out." :) I agree that disown/adopt is the best pair of verbs. IMO @tbeu should adopt those verbs and provide macros (or
Well, that's the thing: I currently have no mental model for what use-cases are supposed to work and which aren't. (I'll pick this back up in the comment thread above.) |
I agree with the recommendation for those verbs and providing both in one patch. Let us know if you're up for that @tbeu. :) |
Sure, I'll revise this PR accordingly. |
a4f1104 adds |
| utstring_init(s) | init a utstring (non-alloc) | ||
| utstring_done(s) | dispose of a utstring (non-allocd) | ||
| utstring_init(s) | init a utstring (non-alloc) | ||
| utstring_done(s) | dispose of a utstring (non-alloc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I'll take these diffs to the existing docs upstream right now (hopefully in a way that merges cleanly with this PR).
do { \ | ||
if (s) { \ | ||
(b) = (s)->d; \ | ||
utstring_clear(s); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect; utstring_clear
will set s->d[0] = '\0'
, blowing away the contents of b
. What you want here is more like utstring_init(s)
, but actually calling out to utstring_init
seems confusing to me; I think you should just inline the couple of statements you want here. Which IMO would be
(b) = (s)->d;
(s)->d = NULL;
(s)->i = 0;
(s)->n = 0;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry, I changed it to utstring_clear
after I tested which was wrong.
if (s) { \ | ||
(s)->d = (b); \ | ||
(s)->n = (l); \ | ||
(s)->i = (l); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@troydhanson, in the codebases where you've needed this function in the past, would this have sufficed? Or should it be utstring_adopt(s,d,i,n)
for cases where the initial capacity is intentionally greater than the initial length?
I guess since C's realloc
and free
don't need to know the original capacity, this version is safe; but if someone later filed a feature request to make utstring work with ptr = sizedmalloc(sz)
and sizedfree(ptr,sz)
the way uthash currently does, it would be hard to retrofit that into this "length-only" model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs a tweak. The other "put" functions (utstring_printf
, utstring_bincpy
) internally set the internal buffer length n
one larger than the input size, to accommodate a trailing nul, and then forcibly put it there. (Look at the utstring_bincpy
definition to see what I mean).
It's a convenience so there is a guaranteed nul terminator in the spare capacity part of the string. It is a quirk of utstring that I like because, it makes it compatible with printf (etc) that expect nul-terminated strings, while keeping the nul past the length i
means that the content/length are unharmed.
One way to preserve that notion, is to switch adopt
to a two parameter version utstring_adopt(s,d)
that demands a C string as input, and then its implementation sets
s->i = strlen(d); s->n = s->i +1 ;
The problem with the current three-parameter adopt is that the caller can pass a non-terminated string, breaking the expectation that utstring always has a nul
after the buffer.
We can switch to the four-parameter version utstring_adopt(s,d,i,n)
but, to preserve the nul
expectation, I would explicitly force a nul at d[i] (which requires a preceding realloc if i==n).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbeu: Sorry! I think what happened here was that the feedback I was awaiting on 2016-12-15 did arrive, in the form of Troy's comment on 2016-12-16. And then I never explicitly said "@tbeu please address this," but I also never really bothered (AFAIR) to figure out whether I even agreed with what Troy said, either.
I suggest that at this point, with many outdated review comments and some pieces of this patch merged already, the best thing to do would be to close this PR and (if you still want the feature merged) open a fresh PR.
Do you have any thoughts on the desirability of utstring_adopt(s, buf, len)
versus utstring_adopt(s, ntbs)
versus utstring_adopt(s, buf, len, cap)
? I think that (at the moment) I agree with Troy that utstring_adopt(s, buf, len)
is the worst of both worlds, because it would let you create a non-null-terminated utstring.
Thanks to @tbeu for the patch! This is adapted from part of troydhanson#95.
Ready to merge? |
I'd still like to get @troydhanson's feedback on my comment about the possibility of |
Anything more I can do here? |
* Minor utstring docs cleanup. Thanks to @tbeu for the patch! This is adapted from part of troydhanson#95. * Remove cast to "unsigned" from utstring_len macro. It causes truncation on 64-bit platforms, and neither @troydhanson nor myself know why it should have been there in the first place. Thanks to @mse11 for the patch (troydhanson#97)! * Silence an instance of -Wsign-compare in the tests. * Fix a segfault in HASH_ADD_KEYPTR_INORDER. Thanks to @IronBug for the bug report (troydhanson#99) and test90.c! test87.c would have caught this bug at the time the macros were added, except that by chance `offsetof(hstruct_t, hh)` was `0`. I've adjusted the test so that it fails before this patch and succeeds after. * Document test90.c in tests/README. I should have done this in 13723b1, but forgot. No functional change. * Make `make pedantic` and `make cplusplus` build cleanly again. Basically just adding malloc-casts and explicit casts from string literals to `(char *)` (since that gives a deprecation warning in clang++ these days). No intended functional change. * Add "Android" to the list of supported platforms. Requested by @silvioprog! He writes: > It worked like a charm for my ARMv7 and finally I can use an efficient > C hash table in my Android (tested on version 5 and 6) application. Sure, technically "Android" is covered by "Linux" anyway. :) Fixes troydhanson#103. * Bump version to 2.0.2. * utlist: Add an assert to silence a scan-build diagnostic. Thanks to @aaronmdjones for the report! Unfortunately scan-build is unhappy with uthash and utarray as well, and the fixes seem harder in those cases. Especially since there's no tradition of adding redundant asserts in those files, and I wouldn't really want to start. This case is easy because we've already committed to liberal use of asserts in utlist (for whatever reason). * utlist: Add INORDER macros for inserting into sorted lists. Thanks to @jeffreylovitz for the initial patch! The `{LL,DL,CDL}_LOWER_BOUND` macros have semantics similar to C++'s `std::lower_bound` algorithm: in essence they return an iterator to the first existing list element which is *not less than* the given one. However, in this context an "iterator to element X" is actually a "pointer to element X-1", because our lists support only insertion-after, not insertion-before. Therefore, in situations where `std::lower_bound` would return `lst.begin()` we return `NULL`, and in cases where `std::lower_bound` would return `lst.end()` we return a pointer to the last element of the list. * uthash, utlist: Make the NO_DECLTYPE codepaths match more closely. utlist had made `NO_DECLTYPE` the default on `__ICCARM__` since 368010b, but uthash had never done so. Do so. utlist gave double-definition errors if `-DNO_DECLTYPE` was passed on the command line (for a compiler that otherwise supported decltype and/or __typeof). Fix that. Passing `-DLDECLTYPE(x)=decltype(x)` on the command line ought to do the right thing. Make it at least not double-define `DECLTYPE`. But we still have the problem that we do things like `LDECLTYPE(*p)`, which in C++ yields an lvalue reference type, not an object type. I don't know how to fix that. uthash's test87.c and test90.c gave errors in C++ with `-DNO_DECLTYPE` because the code now known as `HASH_AKBI_INNER_LOOP` would attempt to call the provided `cmpfcn` with an argument of type `(void *)` instead of the expected `decltype(head)`. This works in C with an implicit conversion, but doesn't work at all in C++ because it would try to call a different overload of the comparison function. So, if we have `NO_DECLTYPE`, let's use `head` as a scratch variable. * C89 compatibility: add a missing "return 0" to test89.c No functional change. * Convince Clang that `pid` is never used uninitialized in hashscan.c. No functional change. * Use non-reserved identifiers for some helper macros. Identifiers beginning with an underscore and an uppercase letter are reserved to the implementation, and Clang apparently warns on such things these days (via -Wreserved-id-macro). Solution: rename those macros to avoid leading underscores. Fixes troydhanson#112. * remove libut
It's almost 48 months now. |
Not sure if you want to have it as function or macro.