BalancedDiscStorage
This module provides storage for your files and archives. The storage makes sure that there are never more files in one directory than BalancedDiscStorage.dir_limit.
BalancedDiscStorage accepts single files; BalancedDiscStorageZ also accepts whole directories packed using ZIP.
This class is necessary because many filesystems have problems with tens of thousands or millions of files stored in one directory. This module stores the files in trees, which are similar to binary trees, but our trees should never change once created. You can thus reference the returned paths in other software.
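To make the idea concrete, here is a minimal sketch of how a directory could be chosen for a new file. It is not the module's actual implementation. The helper name pick_directory is hypothetical, the use of successive hash characters as directory names is inferred from the paths shown in the usage example, and counting only regular files towards the limit is an assumption. The sketch is written for Python 3.

```python
import os
import tempfile


def pick_directory(root, file_hash, dir_limit):
    """Hypothetical helper: choose the directory for a file with `file_hash`.

    Walks root/<h0>, root/<h0>/<h1>, ... and returns the first directory
    that does not exist yet or still has room for another file.
    """
    path = root
    for char in file_hash:
        path = os.path.join(path, char)
        if not os.path.isdir(path):
            return path  # not created yet, so it is certainly not full
        # Assumption: only regular files count towards the limit.
        files = [
            name for name in os.listdir(path)
            if os.path.isfile(os.path.join(path, name))
        ]
        if len(files) < dir_limit:
            return path
    raise IOError("No free directory for hash %s" % file_hash)


# Demo: directory a/ already holds two files, so the third goes one level deeper.
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, "a"))
for name in ("aae0", "aea9"):
    open(os.path.join(demo_root, "a", name), "w").close()

chosen = pick_directory(demo_root, "a934", dir_limit=2)
print(os.path.relpath(chosen, demo_root))  # a/9 on POSIX systems
```

Because a full directory is never reorganized in this scheme, paths that were handed out earlier stay valid.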
Usage example
Let's say that we have a directory dedicated to file storage, for example /tmp/xex. Let's also say that we want at most two files in one directory.
>>> from BalancedDiscStorage import BalancedDiscStorage
>>> bds = BalancedDiscStorage("/tmp/xex", dir_limit=2)
>>> bds
BalancedDiscStorage(path='/tmp/xex', dir_limit=2)
We can now add the files. I have found two strings whose hashes start with the letter a. They are the strings 38 (hash aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470) and 318 (hash aae02129362d611717b6c00ad8d73bf820a0f6d88fca8e515cafe78d3a335965):
>>> from StringIO import StringIO  # we will use "fake" files
>>> bds.add_file(StringIO("38"))
/tmp/xex/a/aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470_2
>>> bds.add_file(StringIO("318"))
/tmp/xex/a/aae02129362d611717b6c00ad8d73bf820a0f6d88fca8e515cafe78d3a335965_3
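The returned file names appear to consist of the content hash followed by an underscore and the length of the data (38 gives _2, 318 gives _3). The sketch below shows one way such a name could be computed while reading the file in blocks. It makes several assumptions: SHA-256 as the hash (the hashes shown are 64 hex digits long, which fits but is not confirmed here), a hypothetical helper named storage_name, and an arbitrary default block size. The real module exposes the hash_builder and read_bs attributes for these choices. The sketch targets Python 3 and therefore uses BytesIO instead of StringIO.

```python
import hashlib
from io import BytesIO


def storage_name(file_obj, read_bs=65536):
    """Hypothetical helper: build '<hexdigest>_<size>' for a binary file object."""
    hasher = hashlib.sha256()  # assumption: SHA-256
    size = 0
    while True:
        block = file_obj.read(read_bs)  # read in blocks to keep memory use flat
        if not block:
            break
        hasher.update(block)
        size += len(block)
    return "%s_%d" % (hasher.hexdigest(), size)


name = storage_name(BytesIO(b"38"))
print(name)
```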
Now, let's look at the state of the filesystem:
$ tree /tmp/xex
/tmp/xex
└── a
    ├── aae02129362d611717b6c00ad8d73bf820a0f6d88fca8e515cafe78d3a335965_3
    └── aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470_2

1 directory, 2 files
As we can see, there are now two files in directory a/. That's correct, because we set the directory limit to 2. What happens when we add another file? Let's add the "file" 391 (hash a934c244755c66aebb0d6f9f5687038ffae8f00b00b28b4e17521016393f38b9):
>>> p = bds.add_file(StringIO("391"))
>>> p
/tmp/xex/a/9/a934c244755c66aebb0d6f9f5687038ffae8f00b00b28b4e17521016393f38b9_3
As you can see from the example, the file is now stored in subdirectory 9:
$ tree /tmp/xex
/tmp/xex
└── a
    ├── 9
    │   └── a934c244755c66aebb0d6f9f5687038ffae8f00b00b28b4e17521016393f38b9_3
    ├── aae02129362d611717b6c00ad8d73bf820a0f6d88fca8e515cafe78d3a335965_3
    └── aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470_2

2 directories, 3 files
This is how BalancedDiscStorage balances the filesystem. Let's now look at the object returned from the add_file() call:
>>> type(p)
<class 'BalancedDiscStorage.path_and_hash.PathAndHash'>
>>> p.path
'/tmp/xex/a/9/a934c244755c66aebb0d6f9f5687038ffae8f00b00b28b4e17521016393f38b9_3'
>>> p.hash
'a934c244755c66aebb0d6f9f5687038ffae8f00b00b28b4e17521016393f38b9_3'
Notice the hash property, which may be used to delete the file (delete_by_hash()):
>>> bds.delete_by_hash(p.hash)
As you can see, both the file and its now empty directory were removed:
$ tree /tmp/xex
/tmp/xex
└── a
    ├── aae02129362d611717b6c00ad8d73bf820a0f6d88fca8e515cafe78d3a335965_3
    └── aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470_2

1 directory, 2 files
You can of course also delete the file by its full path (delete_by_path()):
>>> bds.delete_by_path("/tmp/xex/a/aae02129362d611717b6c00ad8d73bf820a0f6d88fca8e515cafe78d3a335965_3")
$ tree /tmp/xex
/tmp/xex
└── a
    └── aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470_2

1 directory, 1 file
Or by the original file object:
>>> bds.delete_by_file(StringIO("38"))
$ tree /tmp/xex
/tmp/xex
0 directories, 0 files
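The tree listings above show that deleting a file also removes directories that become empty. A minimal sketch of such a cleanup step follows. The helper name remove_and_prune is hypothetical, and the rule of walking upwards until the storage root or a non-empty directory is reached is an assumption based on the listings, not the module's confirmed behavior.

```python
import os
import tempfile


def remove_and_prune(root, file_path):
    """Hypothetical helper: delete a file, then remove emptied parent directories."""
    root = os.path.abspath(root)
    file_path = os.path.abspath(file_path)
    if not file_path.startswith(root + os.sep):
        raise ValueError("%s is not inside %s" % (file_path, root))

    os.remove(file_path)

    # Walk upwards, but never remove the storage root itself.
    parent = os.path.dirname(file_path)
    while parent != root and not os.listdir(parent):
        os.rmdir(parent)
        parent = os.path.dirname(parent)


# Demo layout: <root>/a/9/deep and <root>/a/shallow
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "a", "9"))
deep = os.path.join(demo_root, "a", "9", "deep")
shallow = os.path.join(demo_root, "a", "shallow")
for path in (deep, shallow):
    open(path, "w").close()

remove_and_prune(demo_root, deep)
print(sorted(os.listdir(os.path.join(demo_root, "a"))))  # ['shallow']
```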
API
BalancedDiscStorage class
class BalancedDiscStorage.balanced_disc_storage.BalancedDiscStorage(path, dir_limit=32000)
    Bases: object

    Store files and make sure that there are never more files in one directory than _dir_limit.

    path = None
        Path on which the storage operates.

    dir_limit = None
        Maximal number of files in directory.

    read_bs = None
        File read blocksize.

    hash_builder = None
        Hashing function used for FN.

    file_path_from_hash(file_hash, path=None, hash_list=None)
        For given file_hash, return path on filesystem.

        Parameters:
            - file_hash (str) – Hash of the file, for which you wish to know the path.
            - path (str, default None) – Recursion argument, don’t set this.
            - hash_list (list, default None) – Recursion argument, don’t set this.
        Returns: Path for given file_hash contained in PathAndHash object.
        Raises: IOError – If the file with corresponding file_hash is not in storage.

    add_file(file_obj)
        Add new file into the storage.

        Parameters: file_obj (file) – Opened file-like object.
        Returns: Path where the file-like object is stored contained with hash in PathAndHash object.
        Return type: obj
        Raises:
            - AssertionError – If the file_obj is not file-like object.
            - IOError – If the file couldn’t be added to storage.

    delete_by_file(file_obj)
        Remove file from the storage. File is identified by opened file_obj, from which the hashes / path are computed.

        Parameters: file_obj (file) – Opened file-like object, which is used to compute hashes.
        Raises: IOError – If the file_obj is not in storage.
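As an illustration of what a lookup such as file_path_from_hash() has to do, the following sketch searches for a stored name by descending through directories named after successive hash characters. It is a sketch under assumptions, not the module's code: the helper find_by_hash is hypothetical, it is iterative rather than recursive, and it returns a plain string instead of a PathAndHash object.

```python
import os
import tempfile


def find_by_hash(root, file_hash):
    """Hypothetical helper: return the path of the file stored as `file_hash`."""
    path = root
    for char in file_hash:
        path = os.path.join(path, char)
        if not os.path.isdir(path):
            break  # the tree ends here, so the file cannot be deeper
        candidate = os.path.join(path, file_hash)
        if os.path.isfile(candidate):
            return candidate
    raise IOError("Hash %s is not in the storage" % file_hash)


# Demo: a file stored two levels deep is still found from its name alone.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "a", "9"))
stored = os.path.join(demo_root, "a", "9", "a934_3")
open(stored, "w").close()

found = find_by_hash(demo_root, "a934_3")
print(os.path.relpath(found, demo_root))  # a/9/a934_3 on POSIX systems
```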
BalancedDiscStorageZ class
class BalancedDiscStorage.balanced_disc_storage_z.BalancedDiscStorageZ(path)
    Bases: BalancedDiscStorage.balanced_disc_storage.BalancedDiscStorage

    This class is the same as BalancedDiscStorage, but it also allows adding the .zip files, which are unpacked to proper path in storage.

    max_zipfiles = None
        How many files may be in .zip

    add_archive_as_dir(zip_file_obj)
        Add archive to the storage and unpack it.

        Parameters: zip_file_obj (file) – Opened file-like object.
        Returns: Path where the zip_file_obj was unpacked wrapped in PathAndHash structure.
        Return type: obj
        Raises:
            - ValueError – If there are too many files in the .zip archive. See _max_zipfiles for details.
            - AssertionError – If the zip_file_obj is not file-like object.
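The file-count limit described for add_archive_as_dir() can be checked before anything is unpacked. The sketch below shows one way to do that with the standard zipfile module. The helper name count_checked and the limit values are hypothetical, and the real class may perform the check differently.

```python
import io
import zipfile


def count_checked(zip_file_obj, max_zipfiles):
    """Hypothetical helper: return the number of entries, or raise if over the limit."""
    with zipfile.ZipFile(zip_file_obj) as archive:
        # namelist() covers every archive member, including directory entries.
        count = len(archive.namelist())
    if count > max_zipfiles:
        raise ValueError(
            "Too many files in the archive: %d > %d" % (count, max_zipfiles)
        )
    return count


# Demo: build a small in-memory archive with three members.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    for i in range(3):
        archive.writestr("file_%d.txt" % i, "data")

count = count_checked(buf, max_zipfiles=5)
print(count)  # 3
```

Checking the member list first means that an oversized archive is rejected without writing anything to the storage.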
PathAndHash class
class BalancedDiscStorage.path_and_hash.PathAndHash(path, hash=None)
    Bases: str

    Path representation, which also holds hash.

    Note
        This class is based on str, with which it is fully interchangeable.
        str(PathAndHash(path="xe", hash="asd")) == "xe"

    path
        str – Path to the file.

    hash
        str – Hash of the file.
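A str subclass that carries extra attributes is usually built by overriding __new__, because strings are immutable. The following sketch shows the general technique. The class name PathWithHash is hypothetical and chosen to avoid confusion with the real PathAndHash, whose implementation may differ.

```python
import os


class PathWithHash(str):
    """Hypothetical str subclass that behaves like the path and also holds a hash."""

    def __new__(cls, path, hash=None):
        # str is immutable, so the string value must be set in __new__.
        obj = super(PathWithHash, cls).__new__(cls, path)
        obj.path = path
        obj.hash = hash
        return obj


p = PathWithHash(path="xe", hash="asd")
print(str(p) == "xe")  # True
print(p.hash)  # asd
```

Since the object is a real string, it can be passed directly to functions such as os.path.join() or open().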
Installation
The module is hosted on PyPI and can be easily installed using pip:
sudo pip install BalancedDiscStorage
Source code
The project is released under the MIT license. The source code can be found on GitHub:
Unittests
Almost every feature of the project is covered by unit tests. You can run those tests using the provided run_tests.sh script, which can be found in the root of the project.
If you have any trouble, just add the --pdb switch at the end of your run_tests.sh command, like this: ./run_tests.sh --pdb. This will drop you into the PDB shell.
Example
./run_tests.sh
============================= test session starts ==============================
platform linux2 -- Python 2.7.6 -- py-1.4.30 -- pytest-2.7.2
rootdir: /home/bystrousak/Plocha/Dropbox/c0d3z/prace/BalancedDiscStorage/tests, inifile:
plugins: cov
collected 22 items
tests/test_a_path_and_hash.py ..
tests/test_balanced_disc_storage.py ..............
tests/test_balanced_disc_storagez.py ......
========================== 22 passed in 0.04 seconds ===========================