| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
| |
There used to be a meaningful distinction between these modules: `pathlib`
imported `pathlib._abc` but not `pathlib.types`. This is no longer the
case (neither module is imported), so we move the ABCs as follows:
- `pathlib._abc.JoinablePath` --> `pathlib.types._JoinablePath`
- `pathlib._abc.ReadablePath` --> `pathlib.types._ReadablePath`
- `pathlib._abc.WritablePath` --> `pathlib.types._WritablePath`
|
|
|
|
|
|
|
|
| |
Remove the *mode*, *parents* and *exist_ok* arguments from
`WritablePath.mkdir()`. These arguments imply support for POSIX permissions
and checking for preexistence of the path or its parents, but subclasses of
`WritablePath` may not have these capabilities.
The public `Path.mkdir()` method retains these arguments.
|
|
|
|
|
|
| |
Remove `ReadablePath` methods duplicated by `ReadablePath.info`. To be
specific, we remove `exists()`, `is_dir()`, `is_file()` and `is_symlink()`.
The public `Path` class retains these methods.
|
|
|
|
|
|
|
|
| |
(#116392)" (#130743)
This broke tests on the 'aarch64 Fedora Stable Clang Installed 3.x' and
'AMD64 Fedora Stable Clang Installed 3.x' build bots.
This reverts commit da4899b94a9a9083fed4972b2473546e0d997727.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
## Filtered recursive walk
Expanding a recursive `**` segment entails walking the entire directory
tree, and so any subsequent pattern segments (except special segments) can
be evaluated by filtering the expanded paths through a regex. For example,
`glob.glob("foo/**/*.py", recursive=True)` recursively walks `foo/` with
`os.scandir()`, and then filters paths through a regex based on `**/*.py`,
with no further filesystem access needed.
This fixes an issue where `glob()` could return duplicate results.
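A minimal sketch of the walk-then-filter idea, not the actual `glob` implementation. The helper name `filtered_recursive_walk` and the hand-written regex are assumptions made for illustration, and hidden-file handling is ignored:

```python
import os
import re


def filtered_recursive_walk(root, regex):
    """Yield paths under *root* whose path relative to *root* matches *regex*.

    The tree is walked once with os.scandir(); deciding whether a path
    matches needs no further filesystem access.
    """
    stack = [root]  # explicit stack rather than recursion
    while stack:
        directory = stack.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                relative = os.path.relpath(entry.path, root).replace(os.sep, "/")
                if regex.fullmatch(relative):
                    yield entry.path
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)


# Hand-written stand-in for the pattern "**/*.py" (an assumption; a real
# implementation would compile the regex from the pattern).
PY_FILES = re.compile(r"(?:[^/]+/)*[^/]+\.py")
```

Because every directory entry is visited exactly once, each matching path is yielded at most once.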
## Tracking path existence
We store a flag alongside each path indicating whether the path is
guaranteed to exist. As we process the pattern:
- Certain special pattern segments (`""`, `"."` and `".."`) leave the flag
unchanged
- Literal pattern segments (e.g. `foo/bar`) set the flag to false
- Wildcard pattern segments (e.g. `*/*.py`) set the flag to true (because
children are found via `os.scandir()`)
- Recursive pattern segments (e.g. `**`) leave the flag unchanged for the
root path, and set it to true for descendants discovered via
`os.scandir()`.
If the flag is false at the end, we call `lstat()` on each path to filter
out missing paths.
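The flag rules above can be sketched as a small pure function. The names `update_flag` and `needs_lstat`, the wildcard test, and the initial flag value of false are assumptions for illustration only:

```python
SPECIAL_SEGMENTS = {"", ".", ".."}


def is_wildcard(segment):
    """Rough check for glob magic characters (a simplification)."""
    return any(ch in segment for ch in "*?[")


def update_flag(flag, segment, via_scandir=False):
    """Return the 'guaranteed to exist' flag after consuming *segment*.

    *via_scandir* marks a descendant discovered by scanning a directory
    while expanding a recursive segment.
    """
    if segment in SPECIAL_SEGMENTS:
        return flag                          # unchanged
    if segment == "**":
        return True if via_scandir else flag  # root keeps its flag
    if is_wildcard(segment):
        return True                          # children came from a scan
    return False                             # literal: existence unknown


def needs_lstat(segments):
    """Whether a final lstat() filter is needed (tracking the root path)."""
    flag = False  # assumed starting value
    for segment in segments:
        flag = update_flag(flag, segment)
    return not flag
```

For example, `needs_lstat(["foo", "*.py"])` is false because the last segment was expanded by scanning, while `needs_lstat(["*", "bar"])` is true because the trailing literal was only joined on.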
## Minor speed-ups
- Exclude paths that don't match a non-terminal non-recursive wildcard
pattern _prior_ to calling `is_dir()`.
- Use a stack rather than recursion to implement recursive wildcards.
- This fixes a recursion error when globbing deep trees.
- Pre-compile regular expressions and pre-join literal pattern segments.
- Convert to/from `bytes` (a minor use-case) in `iglob()` rather than
supporting `bytes` throughout. This particularly simplifies the code
needed to handle relative bytes paths with `dir_fd`.
- Avoid calling `os.path.join()`; instead we keep paths in a normalized
form and append trailing slashes when needed.
- Avoid calling `os.path.normcase()`; instead we use case-insensitive regex
matching.
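As an illustration of the last point, a pattern can be compiled with `re.IGNORECASE` instead of normalizing the case of every path. This sketch uses `fnmatch.translate()` as a stand-in for the real pattern compiler, and the helper name `compile_segment` is hypothetical:

```python
import fnmatch
import re


def compile_segment(pattern, case_sensitive):
    """Pre-compile one pattern segment; case handling lives in the regex."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(fnmatch.translate(pattern), flags)


insensitive = compile_segment("*.PY", case_sensitive=False)
sensitive = compile_segment("*.PY", case_sensitive=True)
```

Here `insensitive.match("Setup.py")` succeeds without calling `os.path.normcase()` on the name, whereas `sensitive.match("Setup.py")` returns `None`.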
## Implementation notes
Much of this functionality is already present in pathlib's implementation
of globbing. The specific additions we make are:
1. Support for `dir_fd`
2. Support for `include_hidden`
3. Support for generating paths relative to `root_dir`
This unifies the implementations of globbing in the `glob` and `pathlib`
modules.
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
|
|
|
| |
This feature isn't sufficiently motivated.
|
|
|
|
|
|
|
|
| |
Replace `WritablePath._copy_writer` with a new `_write_info()` method. This
method allows the target of a `copy()` to preserve metadata.
Replace `pathlib._os.CopyWriter` and `LocalCopyWriter` classes with new
`copy_file()` and `copy_info()` functions. The `copy_file()` function uses
`source_path.info` wherever possible to save on `stat()`s.
|
|
|
|
|
| |
In `pathlib.Path.copy()` and `move()`, return a fresh `Path` object with an
unpopulated `info` attribute, rather than a `Path` object with information
recorded *prior* to the path's creation.
|
|
|
|
|
|
|
| |
(#130422)
Call `ReadablePath.info.exists()` rather than `ReadablePath.exists()` when
globbing so that we use (or populate) the `info` cache.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the following methods, skip casting of the argument to a path object if
the argument has a `with_segments` attribute. In `PurePath`:
`relative_to()`, `is_relative_to()`, `match()`, and `full_match()`. In
`Path`: `rename()`, `replace()`, `copy()`, `copy_into()`, `move()`, and
`move_into()`.
Previously the check varied a bit from method to method. The `PurePath`
methods used `isinstance(arg, PurePath)`; the `rename()` and `replace()`
methods always cast, and the remaining `Path` methods checked for a private
`_copy_writer` attribute.
We apply identical changes to relevant methods of the private ABCs. This
improves performance a bit, because `isinstance()` checks on ABCs are
expensive.
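The unified check might look roughly like the following. `ToyPath` and `coerce` are hypothetical names used only for this sketch; they are not part of pathlib:

```python
class ToyPath:
    """Hypothetical minimal path type with a with_segments() method."""

    def __init__(self, *segments):
        self.segments = segments

    def with_segments(self, *segments):
        return type(self)(*segments)


def coerce(self_path, arg):
    """Return *arg* as a path object, casting only when necessary."""
    if hasattr(arg, "with_segments"):
        return arg                      # already path-like: skip the cast
    return self_path.with_segments(arg)  # e.g. a plain string
```

A plain attribute lookup is cheap, whereas `isinstance()` against an ABC goes through `__instancecheck__()`.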
|
|
|
|
| |
Remove `ReadablePath.rglob()` from the private pathlib ABCs. This method is
a trivial wrapper around `glob()` and easily replaced.
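For illustration, the replacement is a one-liner. The helper name `rglob_via_glob` is hypothetical, and the sketch uses the public `Path` class rather than the private ABCs:

```python
from pathlib import Path


def rglob_via_glob(path, pattern):
    """Stand-in for rglob(): prefix the pattern with a recursive segment."""
    return path.glob(f"**/{pattern}")
```

Calling `rglob_via_glob(Path("src"), "*.py")` should yield the same paths as `Path("src").rglob("*.py")`.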
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the following private methods to `pathlib.Path.info`:
- `_posix_permissions()`: the POSIX file permissions (`S_IMODE(st_mode)`)
- `_file_id()`: the file ID (`(st_dev, st_ino)`)
- `_access_time_ns()`: the access time in nanoseconds (`st_atime_ns`)
- `_mod_time_ns()`: the modify time in nanoseconds (`st_mtime_ns`)
- `_bsd_flags()`: the BSD file flags (`st_flags`)
- `_xattrs()`: the file extended attributes as a list of key, value pairs,
or an empty list if `listxattr()` or `getxattr()` fail in an ignorable
way.
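The `os.stat()` fields named in the list above can be gathered like this. It is only a sketch of where the values come from, not the private `Path.info` implementation; the function name and dictionary keys are assumptions, and extended attributes are left out:

```python
import os
import stat


def metadata_from_stat(path):
    """Collect copy-relevant metadata for *path* from a single os.stat()."""
    st = os.stat(path)
    return {
        "posix_permissions": stat.S_IMODE(st.st_mode),
        "file_id": (st.st_dev, st.st_ino),
        "access_time_ns": st.st_atime_ns,
        "mod_time_ns": st.st_mtime_ns,
        # st_flags only exists on platforms with BSD file flags.
        "bsd_flags": getattr(st, "st_flags", None),
    }
```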
These methods replace `LocalCopyReader.read_metadata()`, and so we can
delete the `CopyReader` and `LocalCopyReader` classes. Rather than reading
metadata via `source._copy_reader.read_metadata()`, we instead call
`source.info._posix_permissions()`, `_access_time_ns()`, etc.
Preserving metadata is only supported for local-to-local copies at the
moment. To support copying metadata between arbitrary `ReadablePath` and
`WritablePath` objects, we'd need to make the new methods public and
documented.
Co-authored-by: Petr Viktorin <encukou@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
Remove the caching `_is_case_sensitive()` function.
The cache used to speed up `PurePath.[full_]match()` and `Path.[r]glob()`,
but that's no longer the case - these methods use
`self.parser is posixpath` to determine case sensitivity.
This makes the `pathlib._abc` module a little easier to backport to Python
3.8, where `functools.cache()` is unavailable.
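The identity check is cheap enough that no cache is needed. A tiny sketch, with a hypothetical function name:

```python
import ntpath
import posixpath


def is_case_sensitive(parser):
    """Treat POSIX path syntax as case-sensitive and anything else as not."""
    return parser is posixpath
```

With this, `is_case_sensitive(posixpath)` is true and `is_case_sensitive(ntpath)` is false.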
|
|
|
|
|
|
|
|
|
|
|
| |
Convert `JoinablePath`, `ReadablePath` and `WritablePath` to real ABCs
derived from `abc.ABC`.
Make `JoinablePath.parser` abstract, rather than defaulting to `posixpath`.
Register `PurePath` and `Path` as virtual subclasses of the ABCs rather
than deriving. This avoids a hit to path object instantiation performance.
No change of behaviour in the public (non-abstract) classes.
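A minimal sketch of the registration technique, using hypothetical classes `JoinableSketch` and `FastPath` rather than the real pathlib classes:

```python
import posixpath
from abc import ABC, abstractmethod


class JoinableSketch(ABC):
    """Hypothetical ABC with an abstract parser, as described above."""

    @property
    @abstractmethod
    def parser(self):
        ...


class FastPath:
    """Concrete class that does not inherit from the ABC."""

    __slots__ = ("_raw",)
    parser = posixpath

    def __init__(self, raw):
        self._raw = raw


# Virtual subclass: isinstance() checks pass, but the ABC is not in the
# MRO, so instantiating FastPath involves no ABC machinery.
JoinableSketch.register(FastPath)
```

After registration, `isinstance(FastPath("a"), JoinableSketch)` is true even though `JoinableSketch` does not appear in `FastPath.__mro__`.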
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#129856)
Move pathlib's private `CopyReader`, `LocalCopyReader`, `CopyWriter` and
`LocalCopyWriter` classes into `pathlib._os`, where they can live alongside
the low-level copying functions (`copyfileobj()` etc) and high-level path
querying interface (`PathInfo`).
This sets the stage for merging `LocalCopyReader` into `PathInfo`.
No change of behaviour; just moving some code around.
|
|
|
|
|
|
|
|
|
|
| |
In the private pathlib ABCs, make `ReadablePath.glob('')` yield a path with
a trailing slash (if it yields anything at all). As a result, `glob()`
works similarly to `joinpath()` when given a non-magic pattern.
In the globbing implementation, we preemptively add trailing slashes to
intermediate paths if there are pattern parts remaining; this removes the
need to check for existing trailing slashes (in the removed `add_slash()`
method) at subsequent steps.
|
|
|
|
|
| |
Add `pathlib.Path.info` attribute, which stores an object implementing the `pathlib.types.PathInfo` protocol (also new). The object supports querying the file type and internally caching `os.stat()` results. Path objects generated by `Path.iterdir()` are initialised with status information from `os.DirEntry` objects, which is gleaned from scanning the parent directory.
The `PathInfo` protocol has four methods: `exists()`, `is_dir()`, `is_file()` and `is_symlink()`.
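A simplified sketch of such a protocol and a caching implementation. The names `PathInfoSketch` and `CachedStatInfo` are hypothetical, the method signatures are reduced to no arguments, and this version uses `os.lstat()` and never follows symlinks, which the real object may handle differently:

```python
import os
import stat
from typing import Protocol, runtime_checkable

_MISSING = object()  # sentinel: the path was checked and does not exist


@runtime_checkable
class PathInfoSketch(Protocol):
    def exists(self) -> bool: ...
    def is_dir(self) -> bool: ...
    def is_file(self) -> bool: ...
    def is_symlink(self) -> bool: ...


class CachedStatInfo:
    """Answer file-type queries from one cached os.lstat() result."""

    def __init__(self, path):
        self._path = path
        self._stat = None
        self.stat_calls = 0  # counts real filesystem queries

    def _lstat(self):
        if self._stat is None:
            self.stat_calls += 1
            try:
                self._stat = os.lstat(self._path)
            except OSError:
                self._stat = _MISSING
        return self._stat

    def _mode_is(self, predicate):
        st = self._lstat()
        return st is not _MISSING and predicate(st.st_mode)

    def exists(self):
        return self._lstat() is not _MISSING

    def is_dir(self):
        return self._mode_is(stat.S_ISDIR)

    def is_file(self):
        return self._mode_is(stat.S_ISREG)

    def is_symlink(self):
        return self._mode_is(stat.S_ISLNK)
```

However many of the four methods are called, `stat_calls` stays at one per object.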
|
|
|
|
|
|
|
| |
Unlike `ReadablePath.[r]glob()` and `JoinablePath.full_match()`, the
`JoinablePath.match()` method doesn't support the recursive wildcard `**`,
and matches from the right when a fully relative pattern is given. These
quirks mean it's probably unsuitable for inclusion in the pathlib ABCs,
especially given `full_match()` handles the same use case.
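The right-anchored matching can be seen with the public `PurePosixPath` class. The `full_match()` call is guarded because the method is missing from older Python versions:

```python
from pathlib import PurePosixPath

path = PurePosixPath("a/b/c.py")

# match() compares a relative pattern against the *end* of the path.
matches_tail = path.match("b/*.py")
matches_head = path.match("a/*.py")

# full_match() must account for the whole path and understands "**".
if hasattr(path, "full_match"):
    full_recursive = path.full_match("a/**/*.py")
    full_tail_only = path.full_match("b/*.py")
```

Here `matches_tail` is true and `matches_head` is false, since only the last two segments are compared.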
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#129014)
In the private pathlib ABCs, support write-only virtual filesystems by
making `WritablePath` inherit directly from `JoinablePath`, rather than
subclassing `ReadablePath`.
There are two complications:
- `ReadablePath.open()` applies to both reading and writing
- `ReadablePath.copy` is secretly an object that supports the *read* side
of copying, whereas `WritablePath.copy` is a different kind of object
supporting the *write* side
We untangle these as follows:
- A new `pathlib._abc.magic_open()` function replaces the `open()` method,
which is dropped from the ABCs but remains in `pathlib.Path`. The
function works like `io.open()`, but additionally accepts objects with
`__open_rb__()` or `__open_wb__()` methods as appropriate for the mode.
These new dunders are made abstract methods of `ReadablePath` and
`WritablePath` respectively. If the pathlib ABCs are made public, we
could consider blessing an "openable" protocol and supporting it in
`io.open()`, removing the need for `pathlib._abc.magic_open()`.
- `ReadablePath.copy` becomes a true method, whereas `WritablePath.copy` is
deleted. A new `ReadablePath._copy_reader` property provides a
`CopyReader` object, and similarly `WritablePath._copy_writer` is a
`CopyWriter` object. Once GH-125413 is resolved, we'll be able to move
the `CopyReader` functionality into `ReadablePath.info` and eliminate
`ReadablePath._copy_reader`.
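A rough sketch of the dispatch performed by a `magic_open()`-style function. This is not the code in `pathlib._abc`: the name `magic_open_sketch`, the argument-free dunder signatures, the UTF-8 default, and the `MemoryFile` class are all assumptions for illustration:

```python
import io


def magic_open_sketch(path, mode="r", encoding=None, errors=None, newline=None):
    """Open *path* through __open_rb__()/__open_wb__() when available."""
    text = "b" not in mode
    kind = mode.replace("b", "").replace("t", "")
    dunder = {"r": "__open_rb__", "w": "__open_wb__"}.get(kind)
    opener = getattr(type(path), dunder, None) if dunder else None
    if opener is None:
        # No suitable dunder: behave like io.open().
        return io.open(path, mode, encoding=encoding, errors=errors,
                       newline=newline)
    stream = opener(path)
    if text:
        stream = io.TextIOWrapper(stream, encoding=encoding or "utf-8",
                                  errors=errors, newline=newline)
    return stream


class MemoryFile:
    """Hypothetical in-memory 'path' used only for this demonstration."""

    def __init__(self):
        self.data = b""

    def __open_rb__(self):
        return io.BytesIO(self.data)

    def __open_wb__(self):
        owner = self

        class _Writer(io.BytesIO):
            def close(self):
                if not self.closed:
                    owner.data = self.getvalue()  # persist on close
                super().close()

        return _Writer()
```

The path class only has to provide binary streams; text-mode handling is layered on top by the opening function.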
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the private pathlib ABCs, rename `PurePathBase` to `JoinablePath`, and
split `PathBase` into `ReadablePath` and `WritablePath`. This improves the
API fit for read-only virtual filesystems.
The split of `PathBase` entails a similar split of `CopyWorker` (implements
copying) and the test cases in `test_pathlib_abc`.
In a later patch, we'll make `WritablePath` inherit directly from
`JoinablePath` rather than `ReadablePath`. For a couple of reasons,
this isn't quite possible yet.
|
|
|
|
|
| |
These methods combine `_delete()` and `copy()`, but `_delete()` isn't part
of the public interface, and it's unlikely to be added until the pathlib
ABCs are made official, or perhaps even later.
|
|
|
|
|
|
|
|
| |
Remove `PurePathBase.relative_to()` and `is_relative_to()` because they
don't account for *other* being an entirely different kind of path, and
they can't use `__eq__()` because it's not on the `PurePathBase` interface.
Remove `PurePathBase.drive`, `root`, `is_absolute()` and `as_posix()`.
These are all too specific to local filesystems.
|
|
|
|
|
|
|
| |
Remove the `PathBase.stat()` method. Its use of the `os.stat_result` API,
with its 10 mandatory fields and low-level types, makes it an awkward fit
for virtual filesystems.
We'll look to add a `PathBase.info` attribute later - see GH-125413.
|
|
|
|
|
|
|
|
|
|
|
| |
(#127810)
Move 9 private `PathBase` attributes and methods into a new `CopyWorker`
class. Change `PathBase.copy` from a method to a `CopyWorker` instance.
The methods remain private in the `CopyWorker` class. In future we might
make some/all of them public so that user subclasses of `PathBase` can
customize the copying process (in particular reading/writing of metadata),
but we'd need to make `PathBase` public first.
|
|
|
|
|
| |
From `PurePathBase` delete `_globber`, `_stack` and `_pattern_str`, and
from `PathBase` delete `_glob_selector`. This helps avoid an unpleasant
surprise for users who try to use these names.
|
|
|
|
|
| |
Remove the `PurePathBase` initializer, and make `with_segments()` and
`__str__()` abstract. This allows us to drop the `_raw_paths` attribute,
and also the `Parser.join()` protocol method.
|
|
|
|
|
| |
This method helped us customise the `UnsupportedOperation` message
depending on the type. But we're aiming to make `PathBase` a proper ABC
soon, so `NotImplementedError` is the right exception to raise there.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the following methods from `pathlib._abc.PathBase`:
- `expanduser()`
- `hardlink_to()`
- `touch()`
- `chmod()`
- `lchmod()`
- `owner()`
- `group()`
- `from_uri()`
- `as_uri()`
These operations aren't regularly supported in virtual filesystems, so they
don't win a place in the `PathBase` interface. (Some of them probably don't
deserve a place in `Path` :P.) They're quasi-abstract (except `lchmod()`),
and they're not called by other `PathBase` methods.
|
|
|
|
|
|
|
|
|
|
| |
(#127709)
Remove `PathBase.samefile()`, which is fairly specific to the local FS, and
relies on `stat()`, which we're aiming to remove from `PathBase`.
Also remove `PathBase.is_mount()`, `is_junction()`, `is_block_device()`,
`is_char_device()`, `is_fifo()` and `is_socket()`. These rely on POSIX
file type numbers that we're aiming to remove from the `PathBase` API.
|
|
|
|
|
|
|
|
|
|
| |
Change the default value of `PurePathBase.parser` from `ParserBase()` to
`posixpath`. As a result, user subclasses of `PurePathBase` and `PathBase`
use POSIX path syntax by default, which is very often desirable.
Move `pathlib._abc.ParserBase` to `pathlib._types.Parser`, and convert it
to a runtime-checkable protocol.
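A sketch of what a runtime-checkable parser protocol could look like. The name `ParserSketch` and the chosen members are assumptions; the real protocol may require a different set of attributes:

```python
import ntpath
import posixpath
from typing import Protocol, runtime_checkable


@runtime_checkable
class ParserSketch(Protocol):
    """Hypothetical subset of the os.path-like interface a parser offers."""

    sep: str

    def split(self, path): ...
    def splitext(self, path): ...
    def normcase(self, path): ...


# Modules qualify structurally: isinstance() only checks that the
# attributes are present, not where they are defined.
posix_ok = isinstance(posixpath, ParserSketch)
nt_ok = isinstance(ntpath, ParserSketch)
plain_object_ok = isinstance(object(), ParserSketch)
```

This is why a plain module such as `posixpath` can serve as the default parser without subclassing anything.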
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
|
|
|
|
|
|
|
| |
Virtual filesystems don't always make a distinction between deleting files
and empty directories, and sometimes support deleting non-empty directories
in a single operation. Here we remove `PathBase.unlink()` and `rmdir()`,
leaving `_delete()` as the sole deletion method, now made abstract. I hope
to drop the underscore prefix later on.
|
|
|
|
|
|
|
|
|
| |
Remove our implementation of POSIX path resolution in `PathBase.resolve()`.
This functionality is rather fragile and isn't necessary in most cases. It
depends on `PathBase.stat()`, which we're looking to remove.
Also remove `PathBase.absolute()`. Many legitimate virtual filesystems lack
the notion of a 'current directory', so it's wrong to include in the basic
interface.
|
|
|
|
| |
These methods are obviated by `PathBase.move()`, which can move directories
and supports any `PathBase` object as a target.
|
|
|
|
|
|
|
|
|
|
| |
Remove documentation for `pathlib.Path.scandir()`, and rename the method to
`_scandir()`. In the private pathlib ABCs, make `iterdir()` abstract and
call it from `_scandir()`.
It's not worthwhile to add this method at the moment - see discussion:
https://discuss.python.org/t/ergonomics-of-new-pathlib-path-scandir/71721
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
|
|
|
|
|
| |
These classmethods presume that the user has retained the original
`__init__()` signature, which may not be the case. Also, many virtual
filesystems don't provide current or home directories.
|
|
|
|
|
|
| |
Remove the `PathBase.lstat()` method, which is a trivial variation of
`stat()`.
No user-facing changes because the pathlib ABCs are still private.
|
|
|
|
|
|
|
|
|
|
|
|
| |
In `PathBase.resolve()`, raise `UnsupportedOperation` if a non-POSIX path
parser is used (our implementation uses `posixpath._realpath()`, which
produces incorrect results for non-POSIX path flavours.) Also tweak code to
call `self.absolute()` upfront rather than supplying an emulated `getcwd()`
function.
Adjust `PathBase.absolute()` to work somewhat like `resolve()`. If a POSIX
path parser is used, we treat the root directory as the current directory.
This is the simplest useful behaviour for concrete path types without a
current directory cursor.
|
|
|
|
|
|
|
|
| |
In the past I've equivocated about whether to require at least one argument
in the `PurePathBase` (and `PathBase`) initializer, and what the default
should be if we make it optional. I now have a local use case that has
persuaded me to make it optional and default to the empty string (a
`zipp.Path`-like class that treats relative and absolute paths similarly.)
Happily this brings the base class more in line with `PurePath` and `Path`.
|
|
|
|
|
|
|
|
| |
Defer joining of path segments in the private `PurePathBase` ABC. The new
behaviour matches how the public `PurePath` class handles path segments.
This removes a hard-to-grok difference between the ABCs and the main
classes. It also slightly reduces the size of `PurePath` objects by
eliminating a `_raw_path` slot.
|
|
|
|
|
|
|
| |
Use the new `PathBase.scandir()` method in `PathBase.walk()`, which greatly
reduces the number of `PathBase.stat()` calls needed when walking.
There are no user-facing changes, because the pathlib ABCs are still
private and `Path.walk()` doesn't use the implementation in its superclass.
|
|
|
|
|
|
|
| |
Use the new `PathBase.scandir()` method in `PathBase.glob()`, which greatly
reduces the number of `PathBase.stat()` calls needed when globbing.
There are no user-facing changes, because the pathlib ABCs are still
private and `Path.glob()` doesn't use the implementation in its superclass.
|
|
|
|
|
| |
Add `pathlib.Path.scandir()` as a trivial wrapper of `os.scandir()`. This
will be used to implement several `PathBase` methods more efficiently,
including methods that provide `Path.copy()`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove *ignore* and *on_error* arguments from `pathlib.Path.copy[_into]()`,
because these arguments are under-designed. Specifically:
- *ignore* is appropriated from `shutil.copytree()`, but it's not clear
how it should apply when the user copies a non-directory. We've changed
the callback signature from the `shutil` version, but I'm not confident
the new signature is as good as it can be.
- *on_error* is a generalisation of `shutil.copytree()`'s error handling,
which is to accumulate exceptions and raise a single `shutil.Error` at
the end. It's not obvious which solution is better.
Additionally, these arguments may be challenging to implement in future user
subclasses of `PathBase`, which might utilise a native recursive copying
method.
|
|
|
|
|
|
|
|
| |
Per feedback from Paul Moore on GH-123158, it's better to defer making
`Path.delete()` public than ship it with under-designed error handling
capabilities.
We leave a remnant `_delete()` method, which is used by `move()`. Any
functionality not needed by `move()` is deleted.
|
|
|
|
|
|
|
|
|
|
|
|
| |
These two methods accept an *existing* directory path, onto which we join
the source path's base name to form the final target path.
A possible alternative implementation is to check for directories in
`copy()` and `move()` and adjust the target path, which is done in several
`shutil` functions. This behaviour is helpful in a shell context, but
less so in a stored program that explicitly specifies destinations. For
example, a user that calls `Path('foo.py').copy('bar.py')` might not
imagine that `bar.py/foo.py` would be created, but under the alternative
implementation this will happen if `bar.py` is an existing directory.
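The target computation can be sketched as follows. `copy_into_sketch` is a hypothetical helper that handles regular files only and uses `shutil.copy2()`; it is not the pathlib implementation:

```python
import shutil
from pathlib import Path


def copy_into_sketch(source, target_dir):
    """Copy file *source* into the existing directory *target_dir*."""
    source = Path(source)
    # Join the source's base name onto the directory to form the target.
    target = Path(target_dir) / source.name
    shutil.copy2(source, target)
    return target
```

With this shape the caller states explicitly that the second argument is a directory, so `copy()` never has to guess whether `bar.py` is meant as a file or a directory.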
|
|
|
|
|
| |
Add a `Path.move()` method that moves a file or directory tree, and returns a new `Path` instance pointing to the target.
This method is similar to `shutil.move()`, except that it doesn't accept a *copy_function* argument, and it doesn't check whether the destination is an existing directory.
|
|
|
|
| |
(#122924)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`Path.read_bytes()` reads a whole file. Buffering (BufferedIO) is
designed to make small, possibly interleaved, reads and writes
efficient, which doesn't add value in this case.
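A minimal sketch of an unbuffered whole-file read, assuming the change amounts to opening the file with `buffering=0`. The helper name is hypothetical:

```python
def read_bytes_unbuffered(path):
    """Read the whole file without an intermediate buffered reader."""
    # buffering=0 is only allowed in binary mode and returns a raw FileIO,
    # whose read() with no size argument reads until end of file.
    with open(path, "rb", buffering=0) as f:
        return f.read()
```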
On my Mac, running the benchmark:
```python
import pyperf
from pathlib import Path


def read_all(all_paths):
    for p in all_paths:
        p.read_bytes()


def read_file(path_obj):
    path_obj.read_bytes()


all_rst = list(Path("Doc").glob("**/*.rst"))
all_py = list(Path(".").glob("**/*.py"))
assert all_rst, "Should have found rst files"
assert all_py, "Should have found python source files"

runner = pyperf.Runner()
runner.bench_func("read_file_small", read_file, Path("Doc/howto/clinic.rst"))
runner.bench_func("read_file_large", read_file, Path("Doc/c-api/typeobj.rst"))
```
before:
```python
.....................
read_file_small: Mean +- std dev: 6.80 us +- 0.07 us
.....................
read_file_large: Mean +- std dev: 10.8 us +- 0.2 us
```
after:
```python
.....................
read_file_small: Mean +- std dev: 5.67 us +- 0.05 us
.....................
read_file_large: Mean +- std dev: 9.77 us +- 0.52 us
```
|
|
|
|
|
|
|
|
|
|
| |
Rename `pathlib.Path.copy()` to `_copy_file()` (i.e. make it private.)
Rename `pathlib.Path.copytree()` to `copy()`, and add support for copying
non-directories. This simplifies the interface for users, and nicely
complements the upcoming `move()` and `delete()` methods (which will also
accept any type of file.)
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
|
|
|
|
|
|
| |
Rename `pathlib.Path.rmtree()` to `delete()`, and add support for deleting
non-directories. This simplifies the interface for users, and nicely
complements the upcoming `move()` and `copy()` methods (which will also
accept any type of file.)
|