This brief post will describe the benefits of pathlib
over os.Path
.
When using Python, we often need to interact with the filesystem, for instance to create, delete, read or write files,
or to check if files exist.
Since Python 3.4 the pathlib
library introduced the Path
class for this, which has some advantages over
os.Path
, which tends to return strings.
Getting the cwd
Let’s first familiarise ourselves with the syntax, with a simple call to get the current working directory.
With os
:
In [1]: import os
In [2]: os.getcwd()
Out[2]: '/Users/Daniel/Desktop
Now, with pathlib
:
In [1]: from pathlib import Path
In [2]: Path.cwd()
Out[2]: PosixPath('/Users/Daniel/Desktop')
Whilst these two approaches are both simple, we are introduced to a minor difference, the pathlib
approach
returns a PosixPath
class due this being run in a Unix-like environment. This is a subtle but important distinction,
which the next example will bring out.
Check if file exists
With os
:
In [1]: import os.path
In [2]: os.path.exists(os.path.join(os.getcwd(), 'test.txt'))
Out[2]: True
Now, with pathlib
:
In [1]: from pathlib import Path
In [2]: Path.cwd().joinpath('test.txt').exists()
Out[2]: True
Using pathlib
means we can method-chain, so the code is a lot more readable than the os.path
equivalent
which needs nested calls. Method-chaining also makes it easy to use your IDEs auto-complete to understand all the
available functionality.
Navigating the file tree
With os
:
In [1]: import os.path
In [2]: os.path.dirname(os.path.dirname(os.getcwd()))
Out[2]: '/Users'
Now, with pathlib
:
In [1]: from pathlib import Path
In [2]: Path.cwd().parents[1]
Out[2]: PosixPath('/Users')
In this case pathlib
avoids nested function calls and method-chaining entirely. Instead, we access all the
possible parents of the path with the .parents
attribute and obtain the appropriate parent.
In [1]: list(Path.cwd().parents)
Out[1]: [PosixPath('/Users/Daniel'), PosixPath('/Users'), PosixPath('/')]
Example Use
pathlib
classes work with pandas for reading and writing:
In [1]: iris_df = pd.read_csv(Path.cwd().joinpath('iris.csv'))
In [2]: iris_df.to_csv(Path.cwd().joinpath('iris2.csv'))
We can also use them when reading and writing files with the standard library:
In [2]: with open(Path.cwd().joinpath('iris.csv')) as f:
...: print(f.readline())
"sepal.length","sepal.width","petal.length","petal.width","variety"
Although for this simple case we can just use .read_text()
method:
In [1]: Path.cwd().joinpath('iris.csv').read_text()[0:500]
Out[1]: '"sepal.length","sepal.width","petal.length","petal.width","variety"\n5.1,3.5,1.4,.2,"Setosa"\n4.9,3,1.4,.2,"Setosa"\n4.7,3.2,1.3,.2,"Setosa"\n4.6,3.1,1.5,.2,"Setosa"\n5,3.6,1.4,.2,"Setosa"\n5.4,3.9,1.7,.4,"Setosa"\n4.6,3.4,1.4,.3,"Setosa"\n5,3.4,1.5,.2,"Setosa"\n4.4,2.9,1.4,.2,"Setosa"\n4.9,3.1,1.5,.1,"Setosa"\n5.4,3.7,1.5,.2,"Setosa"\n4.8,3.4,1.6,.2,"Setosa"\n4.8,3,1.4,.1,"Setosa"\n4.3,3,1.1,.1,"Setosa"\n5.8,4,1.2,.2,"Setosa"\n5.7,4.4,1.5,.4,"Setosa"\n5.4,3.9,1.3,.4,"Setosa"\n5.1,3.5,1.4,.3,"Setosa"\n5.7,3.8,1.7,'
If we need to ever convert paths to strings we can use the .as_posix()
attribute.
In [1]: Path.cwd().joinpath('iris.csv').as_posix()
Out[1]: '/Users/Daniel/Desktop/iris.csv'
Overall, pathlib
use of the Path
class feels a more intuitive way of working with the filesystem.