Ignore Regex: Neither finding a suiting regex nor an "invert" option #341
Comments
Hello @georg-d, Good suggestions, thank you. I would appreciate a PR with regex examples and help on the Ignore Regex feature in Help.md 👍 BR
First of all, let's imagine a theoretical comparison with only the first line in the two files, then only the second line in the two files and so on, without any regex restriction. Then, normally, depending on the Indeed :
@georg-d, you can verify my assertion: if you add a blank line between each line in the two files, every line is displayed with its right highlighting! In addition, let's imagine that you change the However, when using your test files, without any blank line in between, it happens that lines So it seems, @pnedev, that, in this specific case, it breaks the min line resemblance rule. I won't consider it a bug, but I prefer that you are aware of it ;-)) Now, @georg-d, your regex explanations are not exact in some cases. For instance you said, in point 3, concerning the regex
No ! Of course, the part So, the end of your sentence is correct if we consider, either, the See the solution, below !
Now, let's consider @georg-d's test files with blank lines in between ( my version )
File
123 same
234 differs in the texts!
345 differs in number
456 differs in text + number, 1st file
56 differs in number
foo
And file
123 same
234 changes in characters
648 differs in number
789 differs in text + number, 2nd file
9 differs in number
bar
Starting with these two texts :
Now, @pnedev, I suppose, that the exclude regex region feature could easily be changed into an include regex region part :
And, of course, if the
@pnedev, I also suppose that this Note that this would lead to strange situations in some cases : for instance, if you tick the Best Regards, guy038 Remainders :
However, in most cases, it's easy enough to find an alternative to these limitations !
Hello @guy038 , Thank you very much for your thorough analysis and regex help and for the suggestions too! I'll fully try your examples later but let me add a few comments now:
Yes, the min line resemblance rule is sometimes not that strict, I am aware of it and as you mentioned it is not a bug actually. The min resemblance rule in some cases is exact and in others is more an approximation or a guide (at least that is what it looks like to the user). Thank you for pointing that out.
I would prefer to avoid such long entries in the plugins menu as the suggested About adding the invert option (consider only the regex matching ranges in the line comparisons) - I would prefer to stick to the ignore option only for consistency on one hand and also because that will complicate the ignore logic on the other.
Yes, that's right. And on some occasions it might lead to strange/unexpected behavior (as in your example perhaps). Thanks once again for the help 👍 Happy holidays!
I want to provide documentation on the RegEx feature, so I want to create a patch file for Help.md. While one example is ready, the second example raised questions. I managed to make compare ignore some parts, but I did not find an inverse regex that ignores exactly the other part. I would have quickly reached the goal if ComparePlus had an option to "negate the provided regex" (like -v for grep) or an option "use following pattern for compare" which accepts the common substitution patterns like $1, with the result then used by compare. That would also allow quite sophisticated stuff, e.g. re-ordering characters to handle regional differences in date format. But maybe such options are not required and someone comes up with a regex pattern to ignore everything except the leading 3 digits?
This example is inspired by #313 but with modified input to show all possible cases. File 1 is
and file 2 is
For the help file, I'd like to show a) how to compare only the string behind the leading numbers and b) how to only compare the leading numbers (so what was requested in above mentioned issue). My tries and the results:
Details (you may skip them if result is clear to you): It will highlight lines 4 and 5 as different because the default value 30% Min line resemblance to mark as changed is reached (first 6 of total 11 characters are identical) and last line as new (added/deleted) because 0% of characters are identical.
^\d{3}
and clicking Enable is clear to me: it causes compare to ignore the first 3 digits and thus reaches goal a.
Details on how it works (you may skip this if regex is clear to you): This regex causes compare to ignore the first 3 characters if they are decimal digits. So if a line does not start with 3 digits, the whole line is compared, and if a line starts with 3 digits, compare ignores the first 3 characters and only looks at the remainder of the line, so the text behind the first 3 characters, i.e. column 4 and beyond, is the relevant part. In line 2, this relevant part has 21 characters of which only 3 (space and "in") don't differ, and because 3/21=14% is below the default of 30% for Min line resemblance to mark as changed, the line is not considered to have changed but to be new (added/deleted). In line 3, the relevant part is completely identical, so the line is considered to have no relevant difference and is not marked at all. Line 4 has only a slight difference of 3/34=9% of characters in the relevant part; hence, it is not considered new (added/deleted) but changed. As lines 5 and 6 do not start with 3 digits, nothing is ignored and they are compared like in the 1st case.
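The behaviour described above can be approximated outside Notepad++. The following is a minimal sketch in Python, assuming that "ignoring" a match is equivalent to deleting it before the lines are compared. `strip_ignored` is a hypothetical helper, not part of ComparePlus, and Python's `re` module stands in for the regex engine used by Notepad++. The sample lines are the test lines quoted in this thread.

```python
import re

# The ignore regex from the text: three digits at the start of a line.
IGNORE = re.compile(r"^\d{3}")

def strip_ignored(line: str) -> str:
    """Hypothetical helper: delete every match of the ignore regex."""
    return IGNORE.sub("", line)

file1 = ["123 same", "234 differs in the texts!", "345 differs in number",
         "456 differs in text + number, 1st file", "56 differs in number", "foo"]
file2 = ["123 same", "234 changes in characters", "648 differs in number",
         "789 differs in text + number, 2nd file", "9 differs in number", "bar"]

# Compare what is left of each line pair after the ignored part is removed.
for number, (a, b) in enumerate(zip(file1, file2), start=1):
    same = strip_ignored(a) == strip_ignored(b)
    print(number, "equal" if same else "different")

# The percentages quoted in the text, to be read against the 30% default.
print(round(3 / 21 * 100))   # line 2: share of identical characters
print(round(31 / 34 * 100))  # line 4: share left when 3 of 34 characters differ
```

Under this assumption, lines 1 and 3 come out equal, while lines 5 and 6 are compared in full because neither starts with three digits. How ComparePlus actually computes line resemblance is not modelled here.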
(?:^\d{3})(.*)
and clicking Enable is clear to me and only nearly a solution for goal b.
Details on how it works (you may skip this if regex is clear to you): 1st parentheses form a non-capturing group thus not "consuming" any characters but just defining the rest of the regex shall only be considered if first 3 characters are decimals (this causes lines 5 and 6 to be highlighted), and the 2nd parentheses fetch any amount of any characters, so whole line is matched and thus shall be ignored by compare, causing lines 2, 3 and 4 to be considered unchanged.
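One detail is easy to check in isolation: a non-capturing group only skips storing a capture, but it still consumes the characters it matches, so the three digits are part of the overall match. The sketch below uses Python's `re` module as a stand-in for the engine in Notepad++, and `match_spans` is a hypothetical helper. It assumes ComparePlus ignores the span of the whole match rather than the span of a capture group, which is what the observed results suggest.

```python
import re

# Pattern from the text: non-capturing prefix followed by a capturing group.
PATTERN = re.compile(r"(?:^\d{3})(.*)")

def match_spans(line):
    """Return (whole-match span, group-1 span), or None if nothing matches."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.span(), m.span(1)

for line in ["345 differs in number", "56 differs in number", "foo"]:
    print(repr(line), "->", match_spans(line))
```

For a line starting with three digits the whole match starts at column 0 while group 1 starts at column 3. If the whole match is what gets ignored, the digits are ignored as well, which would explain why lines 2, 3 and 4 appear unchanged.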
I did not yet find a way to restrict 2nd parentheses to the part behind the three digits, or to leave the 3 digits out completely:
(?:^\d{3})([^\d].*)
does not create a different result, which I do not understand, as the capturing group must not include the starting digit, so at least lines 3 and 4 shall be considered not equal. You can try out the regex in regex101 or regexr and substitute by $1 to see it does return exactly the requested content (in row 1 to 4 the part behind column 3 and in rows 5 and 6 complete row because they do not start with 3 digits).
(?:^\d{3})([^\d].*+)
but that is rejected because of last + (unsupported syntax).
(?<=\d{3})([^\d].+)
which would need some fine tuning to work for line 5 and 6.
(*COMMIT)
and some other advanced regex syntax; hence, I did not try many more of them.
I welcome a regex reaching goal b as much as announcement of adding one or both options mentioned in the beginning 🙂
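As a rough illustration of the difference between these attempts, the sketch below compares the whole-match span of the pattern `(?:^\d{3})([^\d].*)` with an anchored lookbehind variant. The variant `(?<=^\d{3}).*` is an assumption made for this example; it is not the solution referred to elsewhere in this thread and has not been tested in ComparePlus. Python's `re` module is used as a stand-in engine, and `ignored_span` and `kept_text` are hypothetical helpers that treat "ignored" as "deleted before comparing".

```python
import re

# Pattern tried in the text: the digits are inside the overall match.
WHOLE = re.compile(r"(?:^\d{3})([^\d].*)")
# Assumed variant: the digits are only asserted by a lookbehind, not matched.
LOOKBEHIND = re.compile(r"(?<=^\d{3}).*")

def ignored_span(pattern, line):
    """Hypothetical helper: span of the first match, or None."""
    m = pattern.search(line)
    return m.span() if m else None

def kept_text(pattern, line):
    """Hypothetical helper: what is left once every match is removed."""
    return pattern.sub("", line)

for line in ["345 differs in number", "56 differs in number", "foo"]:
    print(repr(line),
          "| whole:", ignored_span(WHOLE, line), repr(kept_text(WHOLE, line)),
          "| lookbehind:", ignored_span(LOOKBEHIND, line),
          repr(kept_text(LOOKBEHIND, line)))
```

In this sketch the first pattern leaves nothing of a line that starts with three digits, while the lookbehind variant leaves exactly the three digits. Lines that do not start with three digits are left untouched by both, so they would still be compared in full.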