Understandable names #14340

Closed
bramtayl opened this issue Dec 9, 2015 · 17 comments

Comments

@bramtayl
Contributor

bramtayl commented Dec 9, 2015

Base Julia is full of abbreviations. Functions with names like iscntrl are inscrutable. I think, before it's too late, that Julia should make every effort to be written in plain English. To this end, I think that rules like this should be put into effect:

No abbreviations should be used in base julia for any words shorter than 10 (or so) characters
All words should be separated by _

@johnmyleswhite
Member

I'm sorry, but this kind of sweeping issue isn't credibly actionable. Please stop opening things that are so broad.

@ScottPJones
Contributor

@bramtayl Might be better to 1) make a list of at least the worst cases 2) write a PR to add new, more understandable names (including deprecating the inscrutable abbreviations) 3) wait for the fireworks!

@JeffBezanson
Member

The story of picking function names is much more complex than you'd think at first. True, iscntrl is pretty obscure, but it's inherited from C. It's possible all of those functions (about 15 of them) could be replaced by a single character-class-querying function.

We also have an avowed policy of avoiding underscores, as it tends to produce overly long names and less-well-factored functions. For example print_shortest(x) (which we currently have, unfortunately) seems worse to me than, say, print(x, shortest=true).

So I think these have to be taken on a case-by-case basis. To that end, it's certainly possible that there are a few especially obscure names we can fix.
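
As a rough illustration of the keyword-argument style described above, here is a minimal sketch. The name `render` is made up for this example and is not a Base function; it only shows how one function name plus a keyword can stand in for a compound name like print_shortest.

```julia
using Printf

# Hypothetical helper (not part of Base): one name, with the variant
# selected by a keyword instead of a second underscore-separated function.
function render(x::Float64; shortest::Bool=false)
    # string(x) yields the shortest decimal form that round-trips;
    # "%.17g" prints 17 significant digits, which always round-trips a Float64.
    return shortest ? string(x) : @sprintf("%.17g", x)
end

render(0.1)                 # "0.10000000000000001"
render(0.1, shortest=true)  # "0.1"
```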

@ScottPJones
Contributor

We also have an avowed policy of avoiding underscores

@JeffBezanson Where is that policy? That doesn't seem to be what Julia Docs says:

Word separation can be indicated by underscores ('_'), but use of underscores is discouraged unless the name would be hard to read otherwise.

The statement in the Julia docs seems reasonable. However, avoiding underscores by simply removing them flies in the face of research that has been done into the readability of names. I believe that language design should be data driven, and think it would be good to take that research into account in Julia.

For things like your print example, where there is a better way, with keywords or the type system, I definitely agree that it would be good to get rid of the composite names.

Another problem is adopting hard-to-understand names because they came from C (from back in the day when linkers could not handle more than 8 characters in a name, and file names were limited to 8.3 (CP/M and MS-DOS) or 14 characters (Unix)), Matlab, or R. I've been told a number of times that arguing that something should be done like some other language was not a good reason to do something in Julia; however, it has been used as an excuse for those inscrutable names.

That's sad, because Julia can do so much better, with its great (and getting better) type system and multiple dispatch.

How about something like: ischar(Control, x) or ischartype(Control, x)?
(with abstract CharClass ; immutable Control <: CharClass ; end)
A package could define iscntrl(x) = ischartype(Control, x) if they really wanted to.
Looking at Base, only 4 of those functions are even used: isspace is used 11 times in total across 4 files, isprint is used twice, in 2 files, and isupper and islower are used once each in 1 file.

Would people accept a PR that added ischartype, with, say, just the short forms for isspace, islower, isupper, isprint and isdigit (some of which I saw used in packages), deprecating the rest, like iscntrl?
isblank was deprecated already.
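
The ischartype idea above could look roughly like the following sketch in current Julia syntax (`abstract type` and `struct` instead of `abstract` and `immutable`). The types CharClass, Control, Space and Upper and the function ischartype are all hypothetical; none of them exists in Base. The sketch forwards to the existing Base predicates, and it assumes a Julia version where isupper has been renamed isuppercase.

```julia
# Hypothetical character-class types; these names are illustrative only.
abstract type CharClass end
struct Control <: CharClass end
struct Space   <: CharClass end
struct Upper   <: CharClass end

# One generic query function that dispatches on the class type
# and forwards to the existing Base predicates.
ischartype(::Type{Control}, c::AbstractChar) = iscntrl(c)
ischartype(::Type{Space},   c::AbstractChar) = isspace(c)
ischartype(::Type{Upper},   c::AbstractChar) = isuppercase(c)

# A string belongs to a class when every one of its characters does.
ischartype(T::Type{<:CharClass}, s::AbstractString) = all(c -> ischartype(T, c), s)

ischartype(Control, '\n')  # true
ischartype(Upper, "ABC")   # true
ischartype(Space, "a b")   # false
```

A compatibility package could then define the short C-style names as one-line wrappers around this single function.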

@bramtayl
Contributor Author

I definitely think that it is good to have "well-factored functions". And I understand how inheriting C function names would make things easier for users familiar with C.

One thing to do would be to go through Julia and make the three changes below:
-No words longer than 10 or so characters (for example, "short" instead of "abbrev.")
-No abbreviations at all
-All words should be separated by _

And in the process, building "well-factored functions" groups where possible, and maintaining C compatibility if deemed necessary, all on a case by case basis.

From this thread, isupper, isprint, islower have the words supper, sprint, and slower within them. Unwary users could be confused by all the apple products referenced in Julia! This ambiguity would be avoided by is_upper, is_print, and is_lower.

@eschnett
Contributor

Here's an example: The function to remove a file is rm, taken from the Unix shell syntax. However, rm is strictly more powerful as it also includes the functionality of Unix rmdir. The same function is called remove in other languages (including C!), so this could be an entry on your list.

Note, however, that there is a group of similar functions (including e.g. mkdir, cp, mv, symlink, and a few more), and they should be named in a consistent manner.

@bramtayl
Contributor Author

Unix is a great example of an interface that is completely unreadable because of excessive abbreviation. Regex is another example (and there are great alternatives; see the R package rex).

@StefanKarpinski
Member

This issue is also mistitled since it's not about syntax at all.

@bramtayl changed the title from "Understandable syntax" to "Understandable names" on Dec 10, 2015

@JeffBezanson
Member

-No words longer than 10 or so characters (for example, "short" instead of "abbrev.")
-No abbreviations at all
-All words should be separated by _

We are definitely not doing this. This isn't three changes; more like thousands. But the real issue is that as soon as you start trying to apply sweeping policies like this you run into hard decisions over and over. Is rand really so objectionable that it needs to be renamed random? Should abs be absolute_value, and should svd be singular_value_decomp? "Decomposition" is longer than 10 characters; the English language and history of mathematics aren't always convenient.

Is DateTime unreadable? Should it be Date_Time? Does "TCP" qualify as an abbreviation, or is that allowed by custom? Are people better off learning that "eof" in an I/O context means "end of file", or are they better off typing end_of_file every single time? Is NaN an evil abbreviation that needs to be replaced with NotANumber even though it's incredibly standard? Should kron be kronecker_product? Should peakflops be peak_floating_point_operations_per_second?

Another non-obvious point is that when we copy names from C, it's not usually because we're trying to make things easy for C programmers. Rather it's for global consistency --- that is, consistency not just within julia but among software systems. When picking a word for something, one of the first things you ask is whether there is already a word for it, and if so, why not reuse it.

see the R package rex

Isn't "rex" an abbreviation for "regular expression"? Who allowed that?

@mason-bially
Contributor

I think before a 1.0 release, the list of all the names could be given a once over. That doesn't seem unreasonable. It might allow a few particularly poor names to be noticed when viewed in a larger context.

But in the long run Julia provides capabilities to create packages which export custom sets of names. Heck, the package could even be auto-generated for the most part. If someone wants a custom set of names then they can provide it themselves.
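
A minimal sketch of such a package, assuming a made-up module name ReadableNames and made-up long-form aliases (none of these exist in Base or in any registered package as far as this thread says):

```julia
module ReadableNames

# Hypothetical long-form aliases for existing Base functions.
# `const` binds the new name to the very same function object,
# so every method of the original is available under the alias.
const absolute_value = abs
const remove         = rm
const make_directory = mkdir

export absolute_value, remove, make_directory

end # module

using .ReadableNames

absolute_value(-3)  # 3
remove === rm       # true
```

Because the aliases are plain bindings, they add no call overhead, and the short names in Base stay untouched for anyone who prefers them.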

@ScottPJones
Contributor

But the real issue is that as soon as you start trying to apply sweeping policies like this you run into hard decisions over and over

@JeffBezanson You are kind of arguing against yourself here - while I also don't agree with @bramtayl's 3 rules, that's why I also don't agree with the "avowed policy of avoiding underscores".

DateTime isn't a good example, as that already has camelcase to help distinguish the words.
If you follow the convention that all caps implies a constant, that is where underscores are most needed to help separate "words" and make things readable. They aren't needed for camelcase names, by convention module and type names in Julia, as the capital letters are breaking them up.
All-lowercase names, although more readable than all-uppercase ones (and there are numerous studies on that, from long before computers), do benefit greatly from underscores if they are long.

I agree with most of your other examples, except the last two - peakflops could be peak_flops, as the common abbreviation is flops, not peakflops, and maybe kron would be more readable as kron_prod (given prod is a common abbreviation for product, assuming that kron is a known abbreviation at least among mathematicians for kronecker).

My recommendation, on a case-by-case basis, would be to look at longer names with _s, and short abbreviated names such as iscntrl, and first see if there is a more julian way of factoring them.
If not, then leave the underscores alone; they do help readability. Don't remove them just because of some policy that somehow underscores are always "bad". The same goes for abbreviations such as TCP or NaN.

@mason-bially Yes, totally agree. I think short hard to understand names (i.e. to be like Matlab, Unix sh, or C) really belong in compatibility packages. One of the (many) great things about Julia is that it makes doing that sort of thing trivial.

@mason-bially
Contributor

I think short hard to understand names (i.e. to be like Matlab, Unix sh, or C) really belong in compatibility packages. One of the (many) great things about Julia is that it makes doing that sort of thing trivial.

I would love to see that, especially as part of #5155. For example, the printf library really has no business in Base except for compatibility with a C-like environment (for which it is quite useful). But we could do with a more Python-like format that actually enables more powerful format strings. In the long run those sorts of choices should be made by the people using the language. The base library, though, should strive to have a well-factored set of names, which is why I completely agree with your points.

@tkelman
Contributor

tkelman commented Dec 10, 2015

As Jeff has stated elsewhere, if a name is long enough to need an underscore to break it up, then it's probably actually doing too many things. Generic functions should do one thing; a long name is a sign of a missing abstraction, and something should be refactored into multiple concepts.

Continued comments on this issue are not helpful. Make concrete proposals. Adding underscores will likely be rejected.

@ScottPJones
Contributor

@mason-bially As far as printf goes, I don't think it fits well with Julia, and I think the type-based stuff that @tbreloff added in JuliaIO/Formatting.jl#10 is much nicer (and more julian).

@tkelman "probably actually doing too many things", I'd agree, but that is only probably, and there are a few cases where _'s are appropriate. I have tried to make a concrete proposal here, about the is* functions from utf8proc.jl, but haven't heard back yet. I'll make a PR if it looks like people might be interested.

@JeffBezanson
Member

Ok, I admit the language "avowed policy" was too strong. I should have said "strong preference" for avoiding underscores.

I agree with the goal of identifying and fixing bad names. I think various different approaches will be needed.

I would also prefer an ordinary function call API for formatting to any kind of format strings.

@ScottPJones
Contributor

That sounds great! 💯% behind that! (I'll make a PR for the character category stuff, @nalimilan has already made some great suggestions on that)

@bramtayl
Contributor Author

I've recently put in a CRAN package for building time formats. It looks like the underlying logic used is similar to the print formats of @tbreloff . I might write a similar R package for sprintf. I don't think anyone's actually used the package so it's probably full of bugs, but I thought it might be useful.

https://cran.r-project.org/web/packages/strptimer/vignettes/strptimer.html


8 participants