BalancedDiscStorage

This module provides storage for your files and archives. The storage makes sure that no directory ever holds more files than BalancedDiscStorage.dir_limit.

BalancedDiscStorage accepts single files; BalancedDiscStorageZ accepts whole directories packed as ZIP archives.

This class is necessary because many filesystems have problems with tens of thousands or millions of files stored in one directory. The module stores files in trees that are similar to binary trees, but a tree should never change once it has been created. You can therefore reference the returned paths in other software.
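The placement rule can be sketched roughly as follows. This is only an illustration inferred from the usage example in this document, not the module's actual implementation: the function name sketch_store, the short made-up digests, and the descent rule (one hash character per directory level, stopping at the first directory that is not full) are all assumptions.

```python
import os
import tempfile


def sketch_store(root, digest, data, dir_limit):
    """Write `data` under `root`, descending one hash character per level
    until a directory holding fewer than `dir_limit` files is found.

    Hypothetical helper; it only mimics the layout shown in this document.
    """
    current = root
    for char in digest:
        current = os.path.join(current, char)
        if not os.path.isdir(current):
            os.makedirs(current)

        target = os.path.join(current, digest)
        if os.path.isfile(target):
            return target  # already stored at this level

        # Only regular files count towards the limit, subdirectories do not.
        files = [
            name for name in os.listdir(current)
            if os.path.isfile(os.path.join(current, name))
        ]
        if len(files) < dir_limit:
            with open(target, "wb") as handle:
                handle.write(data)
            return target

    raise RuntimeError("ran out of hash characters")


demo_root = tempfile.mkdtemp()
paths = [
    sketch_store(demo_root, digest, b"x", dir_limit=2)
    for digest in ("aea921", "aae021", "a934c2")  # shortened, made-up digests
]
relative = [os.path.relpath(path, demo_root) for path in paths]
print(relative)
```

With a limit of two, the first two entries stay directly in a/ and the third one is pushed one level deeper, into a/9/. Because a file never moves after it has been written, a path handed out earlier stays valid.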

Usage example

Let's say that we have a directory dedicated to file storage, for example /tmp/xex. Let's also say that we want at most two files in one directory.

>>> from BalancedDiscStorage import BalancedDiscStorage
>>> bds = BalancedDiscStorage("/tmp/xex", dir_limit=2)
>>> bds
BalancedDiscStorage(path='/tmp/xex', dir_limit=2)

We can now add files. I have found two strings whose hashes start with the letter a: 38 (hash aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470) and 318 (hash aae02129362d611717b6c00ad8d73bf820a0f6d88fca8e515cafe78d3a335965):

>>> from StringIO import StringIO  # we will use "fake" files

>>> bds.add_file(StringIO("38"))
/tmp/xex/a/aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470_2
>>> bds.add_file(StringIO("318"))
/tmp/xex/a/aae02129362d611717b6c00ad8d73bf820a0f6d88fca8e515cafe78d3a335965_3
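The returned file names consist of a 64-character hexadecimal digest, an underscore, and a number that matches the length of the content. The following is a minimal sketch of how such a name could be computed from a file object. It assumes SHA-256 and assumes that the suffix is the size in bytes; neither is confirmed by this document, and sketch_name is a hypothetical helper.

```python
import hashlib
import io


def sketch_name(file_obj, chunk_size=65536):
    """Return '<hex digest>_<size>' for the content of `file_obj`.

    Assumption: SHA-256 digest, size counted in bytes.
    """
    hasher = hashlib.sha256()
    size = 0
    while True:
        chunk = file_obj.read(chunk_size)  # read in chunks to bound memory use
        if not chunk:
            break
        hasher.update(chunk)
        size += len(chunk)
    return "%s_%d" % (hasher.hexdigest(), size)


name = sketch_name(io.BytesIO(b"38"))
print(name)
```

Reading in chunks keeps memory use constant even for large archives, and identical content always maps to the same name.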

Now, let's look at the state of the filesystem:

$ tree /tmp/xex
/tmp/xex
└── a
    ├── aae02129362d611717b6c00ad8d73bf820a0f6d88fca8e515cafe78d3a335965_3
    └── aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470_2

1 directory, 2 files

As we can see, there are now two files in the directory a/. That's correct, because we set the directory limit to 2. What happens when we add another file? Let's add the "file" 391 (hash a934c244755c66aebb0d6f9f5687038ffae8f00b00b28b4e17521016393f38b9):

>>> p = bds.add_file(StringIO("391"))
>>> p
/tmp/xex/a/9/a934c244755c66aebb0d6f9f5687038ffae8f00b00b28b4e17521016393f38b9_3

As you can see from the example, the file is now stored in the subdirectory 9:

$ tree /tmp/xex
/tmp/xex
└── a
    ├── 9
    │   └── a934c244755c66aebb0d6f9f5687038ffae8f00b00b28b4e17521016393f38b9_3
    ├── aae02129362d611717b6c00ad8d73bf820a0f6d88fca8e515cafe78d3a335965_3
    └── aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470_2

2 directories, 3 files

This is how BalancedDiscStorage balances the filesystem. Let's now look at the object returned from the add_file() call:

>>> type(p)
<class 'BalancedDiscStorage.path_and_hash.PathAndHash'>
>>> p.path
'/tmp/xex/a/9/a934c244755c66aebb0d6f9f5687038ffae8f00b00b28b4e17521016393f38b9_3'
>>> p.hash
'a934c244755c66aebb0d6f9f5687038ffae8f00b00b28b4e17521016393f38b9_3'
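In the output above, the hash value is the last component of the path. The short sketch below takes the path string from the example apart using only the standard library. The variable names are illustrative, and nothing here calls the module itself.

```python
import posixpath

stored_path = (
    "/tmp/xex/a/9/"
    "a934c244755c66aebb0d6f9f5687038ffae8f00b00b28b4e17521016393f38b9_3"
)

# The last path component equals the value shown as p.hash above.
stored_hash = posixpath.basename(stored_path)

# Everything between the storage root and the file name is the balancing tree.
tree_part = posixpath.relpath(posixpath.dirname(stored_path), "/tmp/xex")

print(stored_hash)
print(tree_part)
```

Storing only the hash in other software is therefore enough to identify the file later, as long as the storage root is known.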

Notice the hash property, which may be used to delete the file with delete_by_hash():

>>> bds.delete_by_hash(p.hash)

As you can see, both the file and its directory were removed:

$ tree /tmp/xex
/tmp/xex
└── a
    ├── aae02129362d611717b6c00ad8d73bf820a0f6d88fca8e515cafe78d3a335965_3
    └── aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470_2

1 directory, 2 files
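The cleanup of the emptied directory can be sketched as below. This is an assumption about the behaviour based on the tree listings in this document, not the module's code: sketch_delete is a hypothetical helper that removes a file and then prunes empty parent directories without ever removing the storage root.

```python
import os
import tempfile


def sketch_delete(root, path):
    """Remove `path` and any parent directories left empty, up to `root`."""
    root = os.path.abspath(root)
    path = os.path.abspath(path)
    os.remove(path)

    parent = os.path.dirname(path)
    # Walk upwards, but never outside of (or including) the storage root.
    while (parent != root
           and parent.startswith(root + os.sep)
           and not os.listdir(parent)):
        os.rmdir(parent)
        parent = os.path.dirname(parent)


demo_root = tempfile.mkdtemp()
subdir = os.path.join(demo_root, "a", "9")
os.makedirs(subdir)

kept = os.path.join(demo_root, "a", "kept_file")
victim = os.path.join(subdir, "victim_file")
for file_path in (kept, victim):
    with open(file_path, "wb") as handle:
        handle.write(b"x")

sketch_delete(demo_root, victim)
print(os.path.isdir(subdir), os.path.isfile(kept))
```

The directory a/9 disappears because it became empty, while a/ stays because it still holds a file.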

You can of course also delete a file by its full path (delete_by_path()):

>>> bds.delete_by_path("/tmp/xex/a/aae02129362d611717b6c00ad8d73bf820a0f6d88fca8e515cafe78d3a335965_3")

$ tree /tmp/xex
/tmp/xex
└── a
    └── aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470_2

1 directory, 1 file

Or by the original file object (delete_by_file()):

>>> bds.delete_by_file(StringIO("38"))

$ tree /tmp/xex
/tmp/xex

0 directories, 0 files
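Deleting by hash or by file object implies that a stored file can be located from its name alone. A minimal sketch of such a lookup follows, assuming the tree is walked one hash character per level as in the example above. sketch_find and the shortened names are hypothetical and are not part of the module.

```python
import os
import tempfile


def sketch_find(root, name):
    """Return the path of `name` under `root`, or None if it is not stored."""
    current = root
    for char in name:
        current = os.path.join(current, char)
        if not os.path.isdir(current):
            return None  # the tree does not go any deeper
        candidate = os.path.join(current, name)
        if os.path.isfile(candidate):
            return candidate
    return None


demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "a", "9"))
stored = os.path.join(demo_root, "a", "9", "a934c2_3")  # made-up short name
with open(stored, "wb") as handle:
    handle.write(b"391")

found = sketch_find(demo_root, "a934c2_3")
missing = sketch_find(demo_root, "bbbbbb_1")
print(found, missing)
```

For a file object, the name would first have to be recomputed from the content before a lookup like this could run.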

Installation

The module is hosted on PyPI and can be installed using pip:

sudo pip install BalancedDiscStorage

Source code

The project is released under the MIT license. The source code can be found on GitHub:

Unittests

Almost every feature of the project is covered by unit tests. You can run them using the provided run_tests.sh script, which can be found in the root of the project.

If you run into trouble, add the --pdb switch at the end of your run_tests.sh command, like this: ./run_tests.sh --pdb. This will drop you into the PDB shell.

Example

$ ./run_tests.sh
============================= test session starts ==============================
platform linux2 -- Python 2.7.6 -- py-1.4.30 -- pytest-2.7.2
rootdir: /home/bystrousak/Plocha/Dropbox/c0d3z/prace/BalancedDiscStorage/tests, inifile:
plugins: cov
collected 22 items

tests/test_a_path_and_hash.py ..
tests/test_balanced_disc_storage.py ..............
tests/test_balanced_disc_storagez.py ......

========================== 22 passed in 0.04 seconds ===========================
