aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/Lib/pathlib
Commit message (Collapse)AuthorAge
* GH-128520: pathlib ABCs: add `JoinablePath.__vfspath__()` (#133437)Barney Gale2025-05-12
| | | | | | | | In the abstract interface of `JoinablePath`, replace `__str__()` with `__vfspath__()`. This frees user implementations of `JoinablePath` to implement `__str__()` however they like (or not at all.) Also add `pathlib._os.vfspath()`, which calls `__fspath__()` or `__vfspath__()`.
* GH-128520: pathlib ABCs: raise text encoding warnings at correct stack level ↵Barney Gale2025-04-28
| | | | | | | (#133051) Ensure that warnings about unspecified text encodings are emitted from `ReadablePath.read_text()`, `WritablePath.write_text()` and `magic_open()` with the correct stack level set.
* GH-125866: Support complete "file:" URLs in urllib (#132378)Barney Gale2025-04-14
| | | | | | | | Add optional *add_scheme* argument to `urllib.request.pathname2url()`; when set to true, a complete URL is returned. Likewise add optional *require_scheme* argument to `url2pathname()`; when set to true, a complete URL is accepted. Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
* GH-123599: `url2pathname()`: handle authority section in file URL (#126844)Barney Gale2025-04-10
| | | | | | | | | | | In `urllib.request.url2pathname()`, if the authority resolves to the current host, discard it. If an authority is present but resolves somewhere else, then on Windows we return a UNC path (as before), and on other platforms we raise `URLError`. Affects `pathlib.Path.from_uri()` in the same way. Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
* GH-128520: pathlib ABCs: tighten up argument types (#131621)Barney Gale2025-03-24
| | | | | | | In `JoinablePath.full_match()` and `ReadablePath.glob()`, accept a `str` pattern argument rather than `JoinablePath | str`. In `ReadablePath.copy()` and `copy_into()`, accept a `WritablePath` target argument rather than `WritablePath | str`.
* GH-128520: pathlib ABCs: validate `magic_open()` arguments (#131617)Barney Gale2025-03-24
| | | | | When `pathlib._os.magic_open()` is called to open a path in binary mode, raise `ValueError` if any of the *encoding*, *errors* or *newline* arguments are given. This matches the `open()` built-in.
* GH-128520: pathlib ABCs: reject empty pattern in `ReadablePath.glob()` (#127343)Barney Gale2025-03-24
| | | | | For compatibility with `Path.glob()`, raise `ValueError` if an empty pattern is given to `ReadablePath.glob()`.
* GH-123599: Deprecate duplicate `pathname2url()` implementation (#127380)Barney Gale2025-03-20
| | | | | | Call `urllib.request.pathname2url()` from `pathlib.Path.as_uri()`, and deprecate the duplicate implementation in `PurePath`. Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
* GH-123599: Remove duplicate `url2pathname()` implementation (#127237)Barney Gale2025-03-19
| | | | | | Call `urllib.request.url2pathname()` from `pathlib.Path.from_uri()` rather than re-implementing it. This paves the way for solving the main issue (ignoring local authorities and rejecting non-local ones) in urllib, not pathlib.
* GH-130614: pathlib ABCs: improve support for receiving path metadata (#131259)Barney Gale2025-03-16
| | | | | | | | | | | In the private pathlib ABCs, replace `_WritablePath._write_info()` with `_WritablePath._copy_from()`. This provides the target path object with more control over the copying process, including support for querying and setting metadata *before* the path is created. Adjust `_ReadablePath.copy()` so that it forwards its keyword arguments to `_WritablePath._copy_from()` of the target path object. This allows us to remove the unimplemented *preserve_metadata* argument in the ABC method, making it a `Path` exclusive.
* GH-130614: pathlib ABCs: parametrize test suite for path copying (#131168)Barney Gale2025-03-13
| | | | Test copying from `Path` and `ReadableZipPath` (types of `_ReadablePath`) to `Path` and `WritableZipPath` (types of `_WritablePath`).
* GH-127381: pathlib ABCs: remove `case_sensitive` argument (#131024)Barney Gale2025-03-10
| | | | | | | | | | | | | | | | Remove the *case_sensitive* argument from `_JoinablePath.full_match()` and `_ReadablePath.glob()`. Using a non-native case sensitivity forces the use of "case-pedantic" globbing, where we `iterdir()` even for non-wildcard pattern segments. But it's hard to know when to enable this mode, as case-sensitivity can vary by directory, so `_PathParser.normcase()` doesn't always give the full picture. The `Path.glob()` implementation is forced to make an educated guess, but we can avoid the issue in the ABCs by dropping the *case_sensitive* argument. (I probably shouldn't have added these arguments in `PurePath` and `Path` in the first place!) Also drop support for `_ReadablePath.glob(recurse_symlinks=False)`, which makes recursive globbing much slower.
* GH-130614: pathlib ABCs: support alternate separator in `full_match()` (#130991)Barney Gale2025-03-09
| | | | | | In `pathlib.types._JoinablePath.full_match()`, treat alternate path separators in the path and pattern as if they were primary separators. e.g. if the parser is `ntpath`, then `P(r'foo/bar\baz').full_match(r'*\*/*')` is true.
* GH-130614: pathlib ABCs: retain original separator in `with_name()` (#130990)Barney Gale2025-03-09
| | | | | In `pathlib.types._JoinablePath.with_name()`, retain any alternative path separator preceding the old name, rather stripping and replacing it with a primary separator. As a result, this method changes _only_ the name.
* GH-128520: Merge `pathlib._local` into `pathlib` (#130748)Barney Gale2025-03-07
| | | | | The `pathlib` module used to import stuff from both `_abc` and `_local`, but nowadays the `_local` module provides the entire public pathlib implementation, so there's no reason for the indirection.
* GH-128520: Merge `pathlib._abc` into `pathlib.types` (#130747)Barney Gale2025-03-03
| | | | | | | | | There used to be a meaningful distinction between these modules: `pathlib` imported `pathlib._abc` but not `pathlib.types`. This is no longer the case (neither module is imported), so we move the ABCs as follows: - `pathlib._abc.JoinablePath` --> `pathlib.types._JoinablePath` - `pathlib._abc.ReadablePath` --> `pathlib.types._ReadablePath` - `pathlib._abc.WritablePath` --> `pathlib.types._WritablePath`
* GH-127381: pathlib ABCs: remove `WritablePath.mkdir()` arguments (#130611)Barney Gale2025-03-01
| | | | | | | | Remove the *mode*, *parents* and *exist_ok* arguments from `WritablePath.mkdir()`. These arguments imply support for POSIX permissions and checking for preexistence of the path or its parents, but subclasses of `WritablePath` may not have these capabilities. The public `Path.mkdir()` method retains these arguments.
* GH-127381: pathlib ABCs: remove `ReadablePath.exists()` and `is_*()` (#130520)Barney Gale2025-03-01
| | | | | | Remove `ReadablePath` methods duplicated by `ReadablePath.info`. To be specific, we remove `exists()`, `is_dir()`, `is_file()` and `is_symlink()`. The public `Path` class retains these methods.
* Revert "GH-116380: Speed up `glob.[i]glob()` by making fewer system calls. ↵Barney Gale2025-03-01
| | | | | | | | (#116392)" (#130743) This broke tests on the 'aarch64 Fedora Stable Clang Installed 3.x' and 'AMD64 Fedora Stable Clang Installed 3.x' build bots. This reverts commit da4899b94a9a9083fed4972b2473546e0d997727.
* GH-116380: Speed up `glob.[i]glob()` by making fewer system calls. (#116392)Barney Gale2025-02-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ## Filtered recursive walk Expanding a recursive `**` segment entails walking the entire directory tree, and so any subsequent pattern segments (except special segments) can be evaluated by filtering the expanded paths through a regex. For example, `glob.glob("foo/**/*.py", recursive=True)` recursively walks `foo/` with `os.scandir()`, and then filters paths through a regex based on "`**/*.py`, with no further filesystem access needed. This fixes an issue where `glob()` could return duplicate results. ## Tracking path existence We store a flag alongside each path indicating whether the path is guaranteed to exist. As we process the pattern: - Certain special pattern segments (`""`, `"."` and `".."`) leave the flag unchanged - Literal pattern segments (e.g. `foo/bar`) set the flag to false - Wildcard pattern segments (e.g. `*/*.py`) set the flag to true (because children are found via `os.scandir()`) - Recursive pattern segments (e.g. `**`) leave the flag unchanged for the root path, and set it to true for descendants discovered via `os.scandir()`. If the flag is false at the end, we call `lstat()` on each path to filter out missing paths. ## Minor speed-ups - Exclude paths that don't match a non-terminal non-recursive wildcard pattern _prior_ to calling `is_dir()`. - Use a stack rather than recursion to implement recursive wildcards. - This fixes a recursion error when globbing deep trees. - Pre-compile regular expressions and pre-join literal pattern segments. - Convert to/from `bytes` (a minor use-case) in `iglob()` rather than supporting `bytes` throughout. This particularly simplifies the code needed to handle relative bytes paths with `dir_fd`. - Avoid calling `os.path.join()`; instead we keep paths in a normalized form and append trailing slashes when needed. - Avoid calling `os.path.normcase()`; instead we use case-insensitive regex matching. ## Implementation notes Much of this functionality is already present in pathlib's implementation of globbing. The specific additions we make are: 1. Support for `dir_fd` 2. Support for `include_hidden` 3. Support for generating paths relative to `root_dir` This unifies the implementations of globbing in the `glob` and `pathlib` modules. Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
* GH-130608: Remove `dirs_exist_ok` argument from `pathlib.Path.copy()` (#130610)Barney Gale2025-02-28
| | | This feature isn't sufficiently motivated.
* GH-125413: Add private `pathlib.Path` method to write metadata (#130238)Barney Gale2025-02-26
| | | | | | | | Replace `WritablePath._copy_writer` with a new `_write_info()` method. This method allows the target of a `copy()` to preserve metadata. Replace `pathlib._os.CopyWriter` and `LocalCopyWriter` classes with new `copy_file()` and `copy_info()` functions. The `copy_file()` function uses `source_path.info` wherever possible to save on `stat()`s.
* GH-125413: Fix stale metadata from `pathlib.Path.copy()` and `move()` (#130424)Barney Gale2025-02-24
| | | | | In `pathlib.Path.copy()` and `move()`, return a fresh `Path` object with an unpopulated `info` attribute, rather than a `Path` object with information recorded *prior* to the path's creation.
* GH-125413: pathlib ABCs: use caching `path.info.exists()` when globbing ↵Barney Gale2025-02-24
| | | | | | | (#130422) Call `ReadablePath.info.exists()` rather than `ReadablePath.exists()` when globbing so that we use (or populate) the `info` cache.
* GH-128520: More consistent type-checking behaviour in pathlib (#130199)Barney Gale2025-02-21
| | | | | | | | | | | | | | | | In the following methods, skip casting of the argument to a path object if the argument has a `with_segments` attribute. In `PurePath`: `relative_to()`, `is_relative_to()`, `match()`, and `full_match()`. In `Path`: `rename()`, `replace()`, `copy()`, `copy_into()`, `move()`, and `move_into()`. Previously the check varied a bit from method to method. The `PurePath` methods used `isinstance(arg, PurePath)`; the `rename()` and `replace()` methods always cast, and the remaining `Path` methods checked for a private `_copy_writer` attribute. We apply identical changes to relevant methods of the private ABCs. This improves performance a bit, because `isinstance()` checks on ABCs are expensive.
* GH-127381: pathlib ABCs: remove `ReadablePath.rglob()` (#130207)Barney Gale2025-02-17
| | | | Remove `ReadablePath.rglob()` from the private pathlib ABCs. This method is a trivial wrapper around `glob()` and easily replaced.
* GH-125413: Add private metadata methods to `pathlib.Path.info` (#129897)Barney Gale2025-02-17
| | | | | | | | | | | | | | | | | | | | | | | | Add the following private methods to `pathlib.Path.info`: - `_posix_permissions()`: the POSIX file permissions (`S_IMODE(st_mode)`) - `_file_id()`: the file ID (`(st_dev, st_ino)`) - `_access_time_ns()`: the access time in nanoseconds (`st_atime_ns`) - `_mod_time_ns()`: the modify time in nanoseconds (`st_mtime_ns`) - `_bsd_flags()`: the BSD file flags (`st_flags`) - `_xattrs()`: the file extended attributes as a list of key, value pairs, or an empty list if `listxattr()` or `getxattr()` fail in an ignorable way. These methods replace `LocalCopyReader.read_metadata()`, and so we can delete the `CopyReader` and `LocalCopyReader` classes. Rather than reading metadata via `source._copy_reader.read_metadata()`, we instead call `source.info._posix_permissions()`, `_access_time_ns()`, etc. Preserving metadata is only supported for local-to-local copies at the moment. To support copying metadata between arbitrary `ReadablePath` and `WritablePath` objects, we'd need to make the new methods public and documented. Co-authored-by: Petr Viktorin <encukou@gmail.com>
* pathlib ABCs: remove caching of path parser case sensitivity (#130194)Barney Gale2025-02-16
| | | | | | | | | | Remove the caching `_is_case_sensitive()` function. The cache used to speed up `PurePath.[full_]match()` and `Path.[r]glob()`, but that's no longer the case - these methods use `self.parser is posixpath` to determine case sensitivity. This makes the `pathlib._abc` module a little easier to backport to Python 3.8, where `functools.cache()` is unavailable.
* GH-128520: Subclass `abc.ABC` in `pathlib._abc` (#128745)Barney Gale2025-02-16
| | | | | | | | | | | Convert `JoinablePath`, `ReadablePath` and `WritablePath` to real ABCs derived from `abc.ABC`. Make `JoinablePath.parser` abstract, rather than defaulting to `posixpath`. Register `PurePath` and `Path` as virtual subclasses of the ABCs rather than deriving. This avoids a hit to path object instantiation performance. No change of behaviour in the public (non-abstract) classes.
* GH-125413: Move `pathlib.Path.copy()` implementation alongside `Path.info` ↵Barney Gale2025-02-09
| | | | | | | | | | | | (#129856) Move pathlib's private `CopyReader`, `LocalCopyReader`, `CopyWriter` and `LocalCopyWriter` classes into `pathlib._os`, where they can live alongside the low-level copying functions (`copyfileobj()` etc) and high-level path querying interface (`PathInfo`). This sets the stage for merging `LocalCopyReader` into `PathInfo`. No change of behaviour; just moving some code around.
* GH-129835: Yield path with trailing slash from `ReadablePath.glob('')` (#129836)Barney Gale2025-02-08
| | | | | | | | | | In the private pathlib ABCs, make `ReadablePath.glob('')` yield a path with a trailing slash (if it yields anything at all). As a result, `glob()` works similarly to `joinpath()` when given a non-magic pattern. In the globbing implementation, we preemptively add trailing slashes to intermediate paths if there are pattern parts remaining; this removes the need to check for existing trailing slashes (in the removed `add_slash()` method) at subsequent steps.
* GH-125413: Add `pathlib.Path.info` attribute (#127730)Barney Gale2025-02-08
| | | | | Add `pathlib.Path.info` attribute, which stores an object implementing the `pathlib.types.PathInfo` protocol (also new). The object supports querying the file type and internally caching `os.stat()` results. Path objects generated by `Path.iterdir()` are initialised with status information from `os.DirEntry` objects, which is gleaned from scanning the parent directory. The `PathInfo` protocol has four methods: `exists()`, `is_dir()`, `is_file()` and `is_symlink()`.
* GH-127381: pathlib ABCs: remove `JoinablePath.match()` (#129147)Barney Gale2025-01-28
| | | | | | | Unlike `ReadablePath.[r]glob()` and `JoinablePath.full_match()`, the `JoinablePath.match()` method doesn't support the recursive wildcard `**`, and matches from the right when a fully relative pattern is given. These quirks means its probably unsuitable for inclusion in the pathlib ABCs, especially given `full_match()` handles the same use case.
* GH-128520: Make `pathlib._abc.WritablePath` a sibling of `ReadablePath` ↵Barney Gale2025-01-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (#129014) In the private pathlib ABCs, support write-only virtual filesystems by making `WritablePath` inherit directly from `JoinablePath`, rather than subclassing `ReadablePath`. There are two complications: - `ReadablePath.open()` applies to both reading and writing - `ReadablePath.copy` is secretly an object that supports the *read* side of copying, whereas `WritablePath.copy` is a different kind of object supporting the *write* side We untangle these as follow: - A new `pathlib._abc.magic_open()` function replaces the `open()` method, which is dropped from the ABCs but remains in `pathlib.Path`. The function works like `io.open()`, but additionally accepts objects with `__open_rb__()` or `__open_wb__()` methods as appropriate for the mode. These new dunders are made abstract methods of `ReadablePath` and `WritablePath` respectively. If the pathlib ABCs are made public, we could consider blessing an "openable" protocol and supporting it in `io.open()`, removing the need for `pathlib._abc.magic_open()`. - `ReadablePath.copy` becomes a true method, whereas `WritablePath.copy` is deleted. A new `ReadablePath._copy_reader` property provides a `CopyReader` object, and similarly `WritablePath._copy_writer` is a `CopyWriter` object. Once GH-125413 is resolved, we'll be able to move the `CopyReader` functionality into `ReadablePath.info` and eliminate `ReadablePath._copy_reader`.
* GH-128520: Divide pathlib ABCs into three classes (#128523)Barney Gale2025-01-11
| | | | | | | | | | | | In the private pathlib ABCs, rename `PurePathBase` to `JoinablePath`, and split `PathBase` into `ReadablePath` and `WritablePath`. This improves the API fit for read-only virtual filesystems. The split of `PathBase` entails a similar split of `CopyWorker` (implements copying) and the test cases in `test_pathlib_abc`. In a later patch, we'll make `WritablePath` inherit directly from `JoinablePath` rather than `ReadablePath`. For a couple of reasons, this isn't quite possible yet.
* GH-127381: pathlib ABCs: remove `PathBase.move()` and `move_into()` (#128337)Barney Gale2025-01-04
| | | | | These methods combine `_delete()` and `copy()`, but `_delete()` isn't part of the public interface, and it's unlikely to be added until the pathlib ABCs are made official, or perhaps even later.
* GH-127381: pathlib ABCs: remove uncommon `PurePathBase` methods (#127853)Barney Gale2024-12-29
| | | | | | | | Remove `PurePathBase.relative_to()` and `is_relative_to()` because they don't account for *other* being an entirely different kind of path, and they can't use `__eq__()` because it's not on the `PurePathBase` interface. Remove `PurePathBase.drive`, `root`, `is_absolute()` and `as_posix()`. These are all too specific to local filesystems.
* GH-127381: pathlib ABCs: remove `PathBase.stat()` (#128334)Barney Gale2024-12-29
| | | | | | | Remove the `PathBase.stat()` method. Its use of the `os.stat_result` API, with its 10 mandatory fields and low-level types, makes it an awkward fit for virtual filesystems. We'll look to add a `PathBase.info` attribute later - see GH-125413.
* GH-127807: pathlib ABCs: move private copying methods to dedicated class ↵Barney Gale2024-12-22
| | | | | | | | | | | (#127810) Move 9 private `PathBase` attributes and methods into a new `CopyWorker` class. Change `PathBase.copy` from a method to a `CopyWorker` instance. The methods remain private in the `CopyWorker` class. In future we might make some/all of them public so that user subclasses of `PathBase` can customize the copying process (in particular reading/writing of metadata,) but we'd need to make `PathBase` public first.
* GH-127807: pathlib ABCs: remove a few private attributes (#127851)Barney Gale2024-12-22
| | | | | From `PurePathBase` delete `_globber`, `_stack` and `_pattern_str`, and from `PathBase` delete `_glob_selector`. This helps avoid an unpleasant surprise for a users who try to use these names.
* GH-127807: pathlib ABCs: remove `PurePathBase._raw_paths` (#127883)Barney Gale2024-12-22
| | | | | Remove the `PurePathBase` initializer, and make `with_segments()` and `__str__()` abstract. This allows us to drop the `_raw_paths` attribute, and also the `Parser.join()` protocol method.
* GH-127807: pathlib ABCs: remove `PathBase._unsupported_msg()` (#127855)Barney Gale2024-12-12
| | | | | This method helped us customise the `UnsupportedOperation` message depending on the type. But we're aiming to make `PathBase` a proper ABC soon, so `NotImplementedError` is the right exception to raise there.
* GH-127381: pathlib ABCs: remove remaining uncommon `PathBase` methods (#127714)Barney Gale2024-12-12
| | | | | | | | | | | | | | | | | | Remove the following methods from `pathlib._abc.PathBase`: - `expanduser()` - `hardlink_to()` - `touch()` - `chmod()` - `lchmod()` - `owner()` - `group()` - `from_uri()` - `as_uri()` These operations aren't regularly supported in virtual filesystems, so they don't win a place in the `PathBase` interface. (Some of them probably don't deserve a place in `Path` :P.) They're quasi-abstract (except `lchmod()`), and they're not called by other `PathBase` methods.
* GH-127381: pathlib ABCs: remove `PathBase.samefile()` and rarer `is_*()` ↵Barney Gale2024-12-11
| | | | | | | | | | (#127709) Remove `PathBase.samefile()`, which is fairly specific to the local FS, and relies on `stat()`, which we're aiming to remove from `PathBase`. Also remove `PathBase.is_mount()`, `is_junction()`, `is_block_device()`, `is_char_device()`, `is_fifo()` and `is_socket()`. These rely on POSIX file type numbers that we're aiming to remove from the `PathBase` API.
* GH-127456: pathlib ABCs: add protocol for path parser (#127494)Barney Gale2024-12-09
| | | | | | | | | | Change the default value of `PurePathBase.parser` from `ParserBase()` to `posixpath`. As a result, user subclasses of `PurePathBase` and `PathBase` use POSIX path syntax by default, which is very often desirable. Move `pathlib._abc.ParserBase` to `pathlib._types.Parser`, and convert it to a runtime-checkable protocol. Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
* GH-127381: pathlib ABCs: remove `PathBase.unlink()` and `rmdir()` (#127736)Barney Gale2024-12-08
| | | | | | | Virtual filesystems don't always make a distinction between deleting files and empty directories, and sometimes support deleting non-empty directories in a single operation. Here we remove `PathBase.unlink()` and `rmdir()`, leaving `_delete()` as the sole deletion method, now made abstract. I hope to drop the underscore prefix later on.
* GH-127381: pathlib ABCs: remove `PathBase.resolve()` and `absolute()` (#127707)Barney Gale2024-12-06
| | | | | | | | | Remove our implementation of POSIX path resolution in `PathBase.resolve()`. This functionality is rather fragile and isn't necessary in most cases. It depends on `PathBase.stat()`, which we're looking to remove. Also remove `PathBase.absolute()`. Many legitimate virtual filesystems lack the notion of a 'current directory', so it's wrong to include in the basic interface.
* GH-127381: pathlib ABCs: remove `PathBase.rename()` and `replace()` (#127658)Barney Gale2024-12-06
| | | | These methods are obviated by `PathBase.move()`, which can move directories and supports any `PathBase` object as a target.
* GH-125413: Revert addition of `pathlib.Path.scandir()` method (#127377)Barney Gale2024-12-05
| | | | | | | | | | Remove documentation for `pathlib.Path.scandir()`, and rename the method to `_scandir()`. In the private pathlib ABCs, make `iterdir()` abstract and call it from `_scandir()`. It's not worthwhile to add this method at the moment - see discussion: https://discuss.python.org/t/ergonomics-of-new-pathlib-path-scandir/71721 Co-authored-by: Steve Dower <steve.dower@microsoft.com>
* GH-127381: pathlib ABCs: remove `PathBase.cwd()` and `home()` (#127427)Barney Gale2024-11-30
| | | | | These classmethods presume that the user has retained the original `__init__()` signature, which may not be the case. Also, many virtual filesystems don't provide current or home directories.