-
-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get decoded/unfolded semantically "full" header #122
Comments
Yes, I think your examples are right -- there's also a convenience method in AddressHeader to get the email/name of the first header, so you could Again, not entirely against it -- so would it display kind of like an email client shows email addresses? "Blah user@email, Second user@secondemail"? Perhaps AddressHeader::getDisplayValue or something? |
No, sorry, I didn't point that out clear enough: This is a general header issue, not just about So if the email contains:
There should be three functions to get:
For the naming, I'm proposing a new system: All functions should follow the pattern
|
Just noticed that RFC 5322 uses the term "body" for what I called "content":
So maybe |
I still don't see the value in changing __toString or it's meaning (the "string" representation of the header).
But isn't there? I get that there's some inconsistency in getHeaderValue/getValue with addresses, but often for AddressHeader's you might only be interested in the first address, so I made it inconsistent so you didn't have to worry about AddressHeader potentially representing more than one address for a 'from' header for instance. I'm not sure what's worse though, both have their drawbacks imo :( But yeah, that could've been a mistake. I'll give it some though. Maybe a convenient very explicit "getFirstAddress" "getFirstName" or something would've made more sense. And yeah, if we're to change it we'd have to deprecate getHeaderValue and getValue and change them to getHeaderContent and getContent (or Body). |
No, since for a In which situations would somebody only be interested in the first address? |
I get what you're saying about AddressHeader being inconsistent with its getValue and acknowledge that. In your example though, Subject, getValue would give result 3. I can think of quite a few examples where you might only care about a single address... the primary being if you're checking 'From' for an address, is it preferable to go |
Yeah, sure the second is shorter - but how can you know in advance that the address you're looking for must be the first one? |
I think you'd rarely care if 'from' contained more than one address, I think that's very uncommon. |
Yeah, sorry, I missed that cause I was misled by the "errors" caused by #130
I was about to say that, but was afraid you'd come up with some mailing list (or whatever) example ;-) Anyway: What about So what I'm mainly suggesting is to change the function's behavior. And since this is a BC break, you consequently need to change its name too. And this is a nice opportunity to rethink the entire naming concept. |
I get it and am not disagreeing on this one as well. I feel like 'getValue' is so standard that it might be preferred even if there's a problem with it though... and the deprecation and introduction of 'getContent' also seems at least as equally annoying to me as the problem it's trying to fix. Everyone will have to change their calls from getValue to getContent, and they're all already used to the existing functionality. |
Well, I for myself don't care how you handle the transition :-) Semantic versioning probably demands the deprecation since it's a BC break. BTW: How could you "send" a deprecation warning message at all? And where/how could users see it? About the very name: The more I think of it, the more I'm torn towards getBody(). Why? Because it's the term the RFC uses! And nobody was ever fired for sticking to the RFC... ;-) |
The only deprecation I've done so far I've put up on the main readme (and main page of mail-mime-parser.org). I'd also put it up in the release notes for the specific version it changed in. |
Here's the answer I was looking for: https://stackoverflow.com/a/194135/1668200 |
Aah, I see what you're getting at -- yeah, I didn't do that. I only added an |
Two more points about the naming:
|
Yeah, actually I think in that case it's different though too... if you consider the type of header Content-Type is, it's a header with optional additional 'parameters', so the value of the header is "text/plain", but it may have additional parameters like 'charset', which has a default value of 'us-ascii'... I think this is also different from my processing of emails which I agree is confusing. You can see what I'm getting at here: RFC 1341 (I'm sure this must be mentioned in RFC822 and/or derivatives somewhere but I couldn't find it with a quick glance lol... anyway, specifically:
Note that the 'parameter' is optional, and can be different depending on type/subtype -- for multipart, it could be 'boundary', for content-type it could be charset, etc... the 'value' is one thing, and the 'parameters' are another. This is true for other parameterized headers too, like 'content-disposition'... it's 'value' could be 'attachment', and it could optionally have a 'filename' parameter for example.
I think most people expect the comments to be removed in the value, they don't need to worry about that being explicitly stated imo. |
Well, another argument for "getBody()" :-)
What I definitely can say is that most people don't know the difference between "value", "parameter", "comment" etc. for an email header! If I see something like Comments: BTW: Is your decoding engine usable stand-alone? I.e. can I throw |
I'd say you're wrong about most people. Consider these examples: https://javaee.github.io/javamail/docs/api/javax/mail/Header.html#getValue-- https://swiftmailer.symfony.com/docs/headers.html :
The only instance of 'getBody' I've seen is here: http://james.apache.org/mime4j/apidocs/org/apache/james/mime4j/stream/Field.html#getBody() And in this case it's like my getRawValue.
You should be able to do something like this if you want: $di = new \ZBateson\Container();
$mlpf = $di->getMimeLiteralPartFactory();
$mlpf->newInstance('=?ISO-8859-1?Q?f=F6=F6b=E4r?='); You could also create an instance of \ZBateson\MailMimeParser\Header\Part\MimeLiteralPart, pass it a \ZBateson\MbWrapper\MbWrapper instance as first parameter, and encoded word as second parameter. My example here though gives you options with dependency injection (in case future readers are doing that). |
Yeah, sure "getBody" is uncommon. But what's your suggestion?? IMO Besides: Since you follow the RFCs where ever possible, why not take the names from them too? |
I never "just give you whatever's in there"... it's never the equivalent of "getRawValue". The only instance it's a little different is the way it gives the first email rather than 'all emails and names' in a string. |
(unless the header doesn't require any processing of course) |
How about I deprecate getValue only in AddressHeader, introduce a getCompleteValue (again just for AddressHeader) in a 1.3... then in 2.0 I deprecate getCompleteValue and restore getValue with the functionality of getCompleteValue. It might affect a few people upgrading from an older version to the latest, but still less than deprecating all of 'getValue', introducing a new method like 'getBody' or 'getContent' which to me definitely implies the whole header instead of the value of it for something like a parameterized header, and also means most people can just be 'aware' of the change without changing every call to 'getValue' to something else. I'd say I'm between that or living with the inconsistency, and I'll give it some thought. |
This is exactly the problem! To me, a header is a header. I don't know which headers need processing...
And what about Seriously: If you want to get a clear, clean nomenclature, please set up a wiki page (is this still possible on GitHub?) with a big table: All headers and all your functions; and in each cell, what the function returns. This will give us an overview which cases we need to cover. This table will ultimately move to And maybe we should also get Vincent on board, cause you said that was the original idea behind https://gitlab.com/mailcare/parser ? Might take a while till we reach a consensus, but on the other hand: If you two agree on a common interface, this is going to be the de-facto standard for any PHP mail parsing project! |
No, you misunderstood me...
I'm always processing headers, some look the same regardless though (first example)... in that case there's no difference between the two.
Yup, I can set it up for you if you want to undertake that 👍
That's his project... he's not currently doing much with headers, I think he was more interested in using my header parser (when that was being discussed). |
You can have a look at what type of Header object is created for what type of header here: https://github.com/zbateson/mail-mime-parser/blob/master/src/Header/HeaderFactory.php The default is a "GenericHeader" which filters out comments and processes quoted parts. |
OK, please! |
Some data I've just stumbled upon: For the
|
The information in that case is still available via getRawValue if you want it that way of course, but parts are accessible separately too via That returns all the parsed HeaderPart objects, including any CommentPart objects... although having a 'getComments' defined in AbstractHeader might make that easier. |
Hi @ThomasLandauer -- I've had a chance to look at this some more and think about it and the behaviour isn't quite the way you described it. 'getValue' actually consistently returns the value of the first part of a header. Consider:
You can see that this is true in that AddressHeader doesn't override 'getValue', and neither does 'ParameterHeader'. The default implementation of getValue does this in AbstractHeader: if (!empty($this->parts)) {
return $this->parts[0]->getValue();
}
return null; This actually does make sense to me and I think it's still my preferred way... it allows a user to not have to handle knowing if an address contains more than one address in some cases like with a From. If the user expects more than one address, they can specifically handle it. Having said that, it may be useful to have a convenience method to get all the names/addresses in an email field after decoding, that displays them like Thunderbird would or something. I'm a bit hesitant though because it has to make some decisions for the user -- how should I display this email address for example: |
I know this is an old topic but I am having a similar issue. There seems to be no way to get the "full" decoded and unfolded contents of a header. For example: HeaderThis is the header as it appears in the rfc2822 message:
Desired outputWhat I would like to retrieve is the full unfolded header without the header name, like so:
Code$result = $message->getHeader('Authentication-Results');
echo $result->getValue(),"\n\n";
echo $result->getRawValue(),"\n\n";
echo $result->__toString(),"\n\n"; Output
As you can see, none of these methods returns the complete header on one line. It seems like my best option is to use getRawValue and do the unfolding myself, however, getRawValue doesn't do any MIME decoding. So if I'm working with a header that is MIME encoded then getRawValue doesn't work and I have to use a different workaround. Can we have a method such as I'm using version 2.1.0. Thanks |
Hi @fkoyer, Looks like you just need a quick regex for it: Edit: should really read things properly before responding, sorry. |
Hi @zbateson, Thanks for the quick response. Yes, I can definitely do the unfolding myself. It would just be handy to have a function that returns the full decoded and unfolded header text so that we don't have to do it every time we access a header. TBH, that's what I expected Thanks |
I'm done with
Date:
,From:
is next :-)I have this:
And I want to get this:
Currently, there's no easy way to get it, right?
echo $message->getHeader('from');
gives:See
AbstractHeader::__toString()
$message->getHeader('from')->getRawValue()
gives:$message->getHeaderValue('from')
gives:So all that's left is to use something like
->getAddresses()[0]->getName()
and->getAddresses()[0]->getEmail()
and reassemble them manually, right?php-mime-mail-parser/php-mime-mail-parser
has agetHeader()
which does exactly what I mean.So I'm suggesting that besides giving the raw header, or the deconstructed parts, you should offer an "intermediate" function that gives the semantically full header (i.e. full header decoded and unfolded).
The text was updated successfully, but these errors were encountered: