Charles Stanhope

I just learned a painful lesson about the behavior of os.path.join() in Python when it encounters an argument that begins with a "/"... Good thing I was working in a VM while I learned this lesson.

"If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component."

Just before that the docs say, "Join one or more path components intelligently." That's probably not the adverb I would select. Oh well... :)
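A minimal sketch of the behavior quoted above. It uses `posixpath` (the POSIX flavor of `os.path`) so the results are the same on any OS; the directory names are made up for illustration.

```python
import posixpath  # POSIX implementation behind os.path on Unix-like systems

# Relative components are concatenated with separators, as expected.
relative = posixpath.join("build", "rootfs", "etc")
print(relative)   # build/rootfs/etc

# A component that starts with "/" throws away everything before it.
clobbered = posixpath.join("build/rootfs", "/etc")
print(clobbered)  # /etc

# The reset can happen mid-chain; joining then continues from there.
mid_chain = posixpath.join("a", "/b", "c")
print(mid_chain)  # /b/c
```

The second call is the dangerous one: a destructive operation on the result would act on the host's real `/etc`, not on a directory under `build/rootfs`.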

>> Charles ☕ Stanhope:

“I'm still puzzling over why that is a desirable default behavior for a function called join(). I'm sure the answer is in a mailing list archive or PEP.”

I found this email from 2012, discussing PEP 428:

“>> What's the use case for this behavior?
>>
>> I'd much rather if joining an absolute path to a relative one fail and
>> reveal the potential bug....
>>
>> >>> os.unlink(Path('myproj') / Path('/lib'))
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: absolute path can't be appended to a relative path
>
> In all honesty I followed os.path.join's behaviour here. I agree a
> ValueError (not TypeError) would be sensible too.

Please no -- this is a very important use case (for os.path.join, at least):
resolving a path from config/user/command line that can be given either absolute
or relative to a certain directory.

Right now it's as simple as join(default, path), and i'd prefer to keep this.
There is no bug here, it's working as designed. ”

https://mail.python.org/pipermail/python-ideas/2012-October/016474.html
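The use case defended in that email can be sketched roughly as follows. The helper name `resolve` and the example paths are hypothetical, chosen only to illustrate the `join(default, path)` idiom; the last lines show that `pathlib` follows the same rule.

```python
import posixpath
from pathlib import PurePosixPath

def resolve(default_dir, path):
    """Hypothetical helper: an absolute path is used as given,
    a relative one is interpreted under default_dir."""
    return posixpath.join(default_dir, path)

from_relative = resolve("/etc/myapp", "settings.ini")
from_absolute = resolve("/etc/myapp", "/tmp/override.ini")
print(from_relative)  # /etc/myapp/settings.ini
print(from_absolute)  # /tmp/override.ini

# pathlib's "/" operator behaves the same way: the absolute operand wins.
combined = PurePosixPath("myproj") / PurePosixPath("/lib")
print(combined)       # /lib
```

Seen this way, the same property is a convenience when the second argument comes from a config file and a hazard when the caller expects the result to stay under the first argument.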

James Dearing 🐲 at 2017-01-04T02:14:06Z

James Dearing 🐲 Wow! Nice find! Thanks for digging that up. It's basically what I figured it would be: somebody preserving some sort of existing behavior. I still find the behavior surprising given the name. It doesn't just join, it also elides.

Charles Stanhope at 2017-01-04T14:00:37Z

Look at it like this: for each component you add to the chain, it resolves the next path as if it were standing at the path so far and resolving the new component from there (a sequence of `cd`s, if one could `cd` to a file).

It makes no sense for any component to start with a slash, even though I see how that might happen (e.g. git encourages you to make relative-absolute ignore paths, or it will apply the filter in every subdirectory).

Still, it's a good quirk to know about when validating user input.

clacke@libranet.de ❌ at 2017-01-13T05:23:53Z

@Clacke moved to quitter.se and microca.st I understand the behavior. I just think the function is misnamed. You just described its operation as a series of 'cd' invocations. That's not a join operation. That's a 'virtual_cd_chain' or something. :)

But on top of that:

>>> os.path.join("path", ".",".",".")
'path/././.'

Which is not what you would get with a series of 'cd' commands. So the function sort of works like you said, except when it doesn't. ;)
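For comparison, a small sketch of how the "." components behave: `join()` leaves them alone, and collapsing them is a separate step done by `normpath()`. Again this uses `posixpath` so the output does not depend on the host OS.

```python
import posixpath

# join() only concatenates; "." components are kept verbatim.
joined = posixpath.join("path", ".", ".", ".")
print(joined)      # path/././.

# normpath() collapses "." and ".." purely lexically.
normalized = posixpath.normpath(joined)
print(normalized)  # path

parent_collapsed = posixpath.normpath("a/b/../c")
print(parent_collapsed)  # a/c
```

Because `normpath()` works on the string alone, its result can differ from what a real series of `cd` commands would reach when symbolic links are involved.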

Anyway, all languages have their quirks, and it turns out that for most of my use cases, os.path.join() does not have desirable behavior. But that's okay because I was able to create a one-line replacement that did what I needed it to do. I just have to remember to use it. :)

By the way, my use case is creating directories that hold the rootfs of a container or disk image. It's very natural to refer to "/etc" or the "/bin" of the destination rootfs and have the tool place things in their correct place relative to some starting directory in the host. In this situation, I wanted to "join" two paths to each other, so I reached for the os.path.join() function and learned a very valuable lesson. :)
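The one-line replacement itself is not shown in the thread. One possible shape for it, purely as an assumption, is to strip leading slashes before joining; the name `join_under` and the paths below are hypothetical.

```python
import posixpath

def join_under(root, path):
    """Hypothetical helper, not the author's actual code: strip leading
    slashes so that join() cannot discard root."""
    return posixpath.join(root, path.lstrip("/"))

etc_dir = join_under("/srv/images/rootfs", "/etc")
bin_dir = join_under("/srv/images/rootfs", "bin")
print(etc_dir)  # /srv/images/rootfs/etc
print(bin_dir)  # /srv/images/rootfs/bin
```

This sketch only handles the leading slash. A path containing ".." components could still point outside `root`, so untrusted input would need an extra check.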

Charles Stanhope at 2017-01-13T16:01:31Z