Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix path_to_tree_path #635

Merged
merged 1 commit into from
Jul 18, 2018
Merged

Fix path_to_tree_path #635

merged 1 commit into from
Jul 18, 2018

Conversation

r0mainK
Copy link
Contributor

@r0mainK r0mainK commented Jun 14, 2018

A couple entangled issues here:

  1. Incheck_ignore:
  • path_to_tree_path was not being used appropriately, basically path = os.path.relpath(path, r.path) was done twice
  • paths in index are always bytes, but paths passed to ignore_manager should always be strings: had to add some encoding/decoding
  1. In path_to_tree_path:
  • not checking if the tree_path did indeed start by repo_path
  • types again, after all calls to path_to_tree_path the returned path is used to check if it's in the index: might as well always return bytes. also small bug possible when replacing the separator, if the input path was in fact bytes.

@r0mainK r0mainK mentioned this pull request Jun 14, 2018
Copy link
Owner

@jelmer jelmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests for behaviour you're modifying.

@@ -170,17 +170,21 @@ def open_repo_closing(path_or_repo):
return closing(Repo(path_or_repo))


def path_to_tree_path(repopath, path):
def path_to_tree_path(repo_path, tree_path):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments in other PR, please don't rename arguments.

os.path.relpath(path, repopath)
if tree_path.startswith(repo_path):
tree_path = os.path.relpath(tree_path, repo_path)
if isinstance(tree_path, str):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is invalid on Python2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which part ? startswith is defined, as is isinstance, no ?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the isinstance bit; if something is a bytestring on python2 and it has non-ascii characters, then encoding it will lead to mayhem since it tries to convert from ascii:

"ij".encode('utf-8')
Traceback (most recent call last):
File "", line 1, in
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right, then we can just do the opposite: not isinstance(f(path), bytes). just checked it seems to work in python2

@@ -1189,9 +1193,12 @@ def check_ignore(repo, paths, no_index=False):
ignore_manager = IgnoreFilterManager.from_repo(r)
for path in paths:
if os.path.isabs(path):
path = os.path.relpath(path, r.path)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional. paths specified here are relative to the current path, not to the repository root (similar to the behaviour of "git check-ignore").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm let me get this straight.

  • if at the start both path and r.path are already relative to the current path then you do not need this
  • if at the start you know for sure that r.path is relative to the current path, and path might be absolute, then the right code should be: path = os.path.relpath(path, os.getcwd()) no ?

Copy link
Owner

@jelmer jelmer Jun 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either path or r.path can be absolute paths, so we do need the call to os.path.relpath.

os.path.relpath will determine the absolute paths for path and r.path relative to os.getcwd() and then determine the relative path between them. E.g.:

% cd /tmp
/tmp% python
Python 2.7.15 (default, May  1 2018, 05:55:50) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.path.relpath("foo", "/home/blah/bar")
'../../../tmp/foo'
>>> os.path.relpath("/tmp/foo", "/home/blah/bar")
'../../../tmp/foo'

"foo" gets interpreted as "/tmp/foo" and then the relative path from /home/blah/bar is determined.

Just using path = os.path.relpath(path, os.getcwd()) would not get us a path relative to the repository root.

@r0mainK
Copy link
Contributor Author

r0mainK commented Jun 14, 2018

So I guess I have two follow up question after seeing the comments, before I add rewrite and add tests:

  • should path_to_tree_path assume that path and repopath are both relative paths (to the same dir) or not ?
  • regarding types, should there be assumptions on path and repopath's types, and is there a specific type one should expect as output ?

My personal opinion is that path and repopath should be assumed to be of the same type, either bytes or str, and that the output should always be bytes, given the usage of this function is exclusively to check if paths are in the Index. Furthermore, we should not have to check both paths are relative to the same path, the job should have been done previously.

@jelmer
Copy link
Owner

jelmer commented Jun 14, 2018

tree/index paths are supposed to be bytestrings (since they're git paths)

repopath and paths passed into the various porcelain functions are filesystem paths; they can be either bytes or strings, depending on the OS/Python version you're on. The porcelain functions should be able to deal with all this, hence the path_to_tree_path helper.

@codecov
Copy link

codecov bot commented Jun 14, 2018

Codecov Report

Merging #635 into master will increase coverage by 0.05%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #635      +/-   ##
==========================================
+ Coverage   90.66%   90.72%   +0.05%     
==========================================
  Files          75       75              
  Lines       18976    18992      +16     
  Branches     2022     2018       -4     
==========================================
+ Hits        17205    17230      +25     
+ Misses       1353     1348       -5     
+ Partials      418      414       -4
Impacted Files Coverage Δ
dulwich/tests/test_porcelain.py 100% <100%> (ø) ⬆️
dulwich/porcelain.py 76.24% <100%> (+0.46%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4529549...7125a45. Read the comment docs.

@r0mainK
Copy link
Contributor Author

r0mainK commented Jun 14, 2018

@jelmer ok thks I understand better now. So in the end no need to change the logic of check_ignore, also since path_to_tree_path is only a helper function that makes sure the input path is bytes with the correct separator I just removed the line os.path.relpath(...) that did nothing and took out the repopath from the input, since not used. Also added a test.

Copy link
Owner

@jelmer jelmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add a tests that fail without your changes to get_untracked_paths/check_ignore.

"""
os.path.relpath(path, repopath)
if not isinstance(path, bytes):
path = path.encode(DEFAULT_ENCODING)
Copy link
Owner

@jelmer jelmer Jun 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll want to decode from sys.getfilesystemencoding() if original path was a bytestring filesystem path.

E.g. the filesystem path can be iso8859, while the git encoding is 'utf-8'.

@@ -387,7 +388,7 @@ def remove(repo=".", paths=None, cached=False):
index = r.open_index()
for p in paths:
full_path = os.path.abspath(p).encode(sys.getfilesystemencoding())
tree_path = path_to_tree_path(r.path, p)
tree_path = path_to_tree_path(p)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tree_path needs to be relative to the repository path; where is this enforced now? os.getcwd() is not necessarily r.path.

Copy link
Contributor Author

@r0mainK r0mainK Jun 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it was never enforced here apparently, as you can see from the diff path_to_tree_path did not enforce it before since the call to os.path.relpath did not change the value of the returned path

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. IMO that was the bug, not that r.path was passed into path_to_tree_path in the first place.

@r0mainK r0mainK changed the title Fix path_to_tree_path and check_ignore Fix path_to_tree_path Jun 14, 2018
@jelmer
Copy link
Owner

jelmer commented Jun 15, 2018

A related bug is #598

@r0mainK
Copy link
Contributor Author

r0mainK commented Jun 25, 2018

@jelmer hey, sorry for being afk last week ^^" anyway I looked into #598 and it was indeed related. With the changes I brought it worked for me.
So now here is the flow of path_to_tree_path:

  1. Convert both path and repopath to bytes, replace separator if needed.
  2. Test if path is relative to repopath. Can't use os.path.join because it detects common absolute paths, so I do (repopath + b'/' + path).replace(b'//', b'/'). If this exists, then we can return path directly.
  3. If not, at this point we know both path and repopath can be absolute or relative to CWD. If one is absolute and the other relative, we transform the relative to absolute, then return relpath(path, repopath)

I also added tests for path_to_tree_path. Regarding the builds, two test suites are failing (Python:2.7 and Python: pypy) with a lot of errors, but the errors do not seem related to my code, e.g.:

ERROR: test_simple (dulwich.tests.test_porcelain.LsFilesTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "dulwich/tests/test_porcelain.py", line 1261, in test_simple
    porcelain.add(repo=self.repo.path, paths=[fullpath])
  File "dulwich/porcelain.py", line 386, in add
    if ignore_manager.is_ignored(relpath):
  File "dulwich/ignore.py", line 336, in is_ignored
    matches = list(self.find_matching(path))
  File "dulwich/ignore.py", line 308, in find_matching
    raise ValueError('%s is an absolute path' % path)
ValueError: /tmp/tmpbM2QIK/repo/foo is an absolute path

In this error path_to_tree_path is not called, and I did not touch anything else. By the way, I saw you just did a release but I would rly appreciate if you could do a new one once this gets merged, be it as is or after changes. I've finished a project that relies on dulwich last week, and it fails because of the #598 right now ^^".

:param repo: Repository
:param path: A path
:param repopath: Repository path, absolute or relative to the cwd
:param path: A path, absolute or relative to either the cwd or repopath
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing relative paths to either cwd or repopath makes the behaviour of this function unpredictable. This should just be cwd (to be compatible with c git).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this was to stay compatible with check ignore command which seemed, according to tests, to necessitate this. Should i change tests or logic of that function then ?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dulwich.porcelain.check_ignore should also behave similarly to "git check-ignore". It looks like this will indeed require changing the tests for check_ignore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

os.path.sep.encode(sys.getfilesystemencoding()), b'/')
repopath = repopath.replace(
os.path.sep.encode(sys.getfilesystemencoding()), b'/')
if os.path.exists((repopath + b'/' + path).replace(b'//', b'/')):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above - this makes the behaviour of the function unpredictable, relative paths should only ever be interpreted in one way.

".", path))
self.assertEqual(b"bar/baz", porcelain.path_to_tree_path(
cwd, "bar/baz"))
os.mkdir("foo")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use TestCaseInTempDir if you're going to create files/directories, otherwise we're leaving these all over the source directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, Will change first thing tomoreiw

@r0mainK
Copy link
Contributor Author

r0mainK commented Jun 26, 2018

@jelmer did the changes you asked, but again tests for Python2.7 and PyPy are failing, do you know what this is about ?

@jelmer
Copy link
Owner

jelmer commented Jun 26, 2018 via email

@r0mainK
Copy link
Contributor Author

r0mainK commented Jul 2, 2018

@jelmer ping - I know you made comments on the other PR that I will address, but since it is a feature for add which uses path_to_tree_path via get_untracked_pathsI would like to get this merged before in all cases. It fixes, as far as I know, #598 and again tests suites for Python2.7 and Pypy are currently failing for reasons I do not understand, if you could have a look Id really appreciate.

Copy link
Owner

@jelmer jelmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly looks good, but I've left two more comments. Can you add a test case for #598 as well?


class HelperTests(PorcelainTestCase):

def test_path_to_tree_path_base(self):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you perhaps also add a test for the situation where the path is outside of repopath? That should raise an exception of some sort.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, then I'll add a ValueError to be raised when the returned path starts with ..

self.assertEqual(
['foo'],
list(porcelain.check_ignore(self.repo, ['foo'], no_index=True)))
list(porcelain.check_ignore(self.repo, [path], no_index=True)))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add a check with a relative path too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if not no_index and path_to_tree_path(r.path, path) in index:
continue
if os.path.isabs(path):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this change is what made the python2.7 tests fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thks, gonna check it out

@r0mainK
Copy link
Contributor Author

r0mainK commented Jul 3, 2018

@jelmer added the 3 tests you wanted, and checked if it was the continue block the problem but it doesnt seem so, I wonder if Travis isnt broken for some reason - like when it seemingly cant find refs Oo

@r0mainK
Copy link
Contributor Author

r0mainK commented Jul 3, 2018

OK so the error was in the tests I added for path_to_tree_path, more precisely when testing the case where the separator is different from /. This allowed me to see that os.path library of python 2 and 3 is not compatible in this regard:

  • in python 2, I had to change the separator after I had done relpath:
    if os.path.sep != '/':
        treepath = treepath.replace(
            os.path.sep.encode(sys.getfilesystemencoding()), b'/')

-> or the returned tree_path was an absolute path: b'/Users/romain/dulwich/bar/baz'

  • in python 3, I had to do it before:
    if os.path.sep != '/':
        path = path.replace(
            os.path.sep.encode(sys.getfilesystemencoding()), b'/')
        repopath = repopath.replace(
            os.path.sep.encode(sys.getfilesystemencoding()), b'/')

-> or a ValueError is raised because tree_path=b'../@sepUsers@sepromain@sepdulwich@sepbar@sepbaz'

So I have taken this test and feature out.

Signed-off-by: Romain Keramitas <[email protected]>
@r0mainK
Copy link
Contributor Author

r0mainK commented Jul 17, 2018

@jelmer pings, the appveyor build is failing because of change detailed previous comment, tell me what you would rather do.

@jelmer jelmer merged commit 7125a45 into jelmer:master Jul 18, 2018
@jelmer
Copy link
Owner

jelmer commented Jul 18, 2018

Thanks, Merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants