Class ByteArrayFrontCodedBigList
- java.lang.Object
-
- java.util.AbstractCollection<K>
-
- it.unimi.dsi.fastutil.objects.AbstractObjectCollection<K>
-
- it.unimi.dsi.fastutil.objects.AbstractObjectBigList<byte[]>
-
- it.unimi.dsi.fastutil.bytes.ByteArrayFrontCodedBigList
-
- All Implemented Interfaces:
BigList<byte[]>
,ObjectBigList<byte[]>
,ObjectCollection<byte[]>
,ObjectIterable<byte[]>
,Size64
,Stack<byte[]>
,Serializable
,Cloneable
,Comparable<BigList<? extends byte[]>>
,Iterable<byte[]>
,Collection<byte[]>
,RandomAccess
public class ByteArrayFrontCodedBigList extends AbstractObjectBigList<byte[]> implements Serializable, Cloneable, RandomAccess
Compact storage of big lists of arrays using front coding.This class stores immutably a big list of arrays in a single big array using front coding (of course, the compression will be reasonable only if the list is sorted lexicographically—see below). It implements an immutable type-specific list that returns the i-th array when calling
get(i)
. The returned array may be freely modified.Front coding is based on the idea that if the i-th and the (i+1)-th array have a common prefix, we might store the length of the common prefix, and then the rest of the second array.
This approach, of course, requires that once in a while an array is stored entirely. The ratio of a front-coded list defines how often this happens (once every
ratio()
arrays). A higher ratio means more compression, but means also a longer access time, as more arrays have to be probed to build the result. Note that we must build an array every timeget(long)
is called, but this class provides also methods that extract one of the stored arrays in a given array, reducing garbage collection. See the documentation of the family ofget()
methods.By setting the ratio to 1 we actually disable front coding: however, we still have a data structure storing large list of arrays with a reduced overhead (just one integer per array, plus the space required for lengths).
Note that the typical usage of front-coded lists is under the form of serialized objects; usually, the data that has to be compacted is processed offline, and the resulting structure is stored permanently. Since the pointer array is not stored, the serialized format is very small.
Implementation Details
All arrays are stored in a big array. A separate array of pointers indexes arrays whose position is a multiple of the ratio: thus, a higher ratio means also less pointers.
More in detail, an array whose position is a multiple of the ratio is stored as the array length, followed by the elements of the array. The array length is coded by a simple variable-length list of k-1 bit blocks, where k is the number of bits of the underlying primitive type. All other arrays are stored as follows: let
common
the length of the maximum common prefix between the array and its predecessor. Then we store the array length decremented bycommon
, followed bycommon
, followed by the array elements whose index is greater than or equal tocommon
. For instance, if we storefoo
,foobar
,football
andfool
in a front-coded character-array list with ratio 3, the character array will contain3 f o o 3 3 b a r 5 3 t b a l l 4 f o o l
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectBigList
AbstractObjectBigList.ObjectSubList<K>
-
-
Constructor Summary
Constructors Constructor Description ByteArrayFrontCodedBigList(Collection<byte[]> c, int ratio)
Creates a new front-coded list containing the arrays in the given collection.ByteArrayFrontCodedBigList(Iterator<byte[]> arrays, int ratio)
Creates a new front-coded list containing the arrays returned by the given iterator.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description int
arrayLength(long index)
Computes the length of the array at the given index.ByteArrayFrontCodedBigList
clone()
Returns a copy of this list.byte[]
get(long index)
Returns the element at the specified position.int
get(long index, byte[] a)
Stores in the given array an array stored in this front-coded list.int
get(long index, byte[] a, int offset, int length)
Stores in the given array elements from an array stored in this front-coded list.byte[]
getArray(long index)
Returns an array stored in this front-coded list.ObjectBigListIterator<byte[]>
listIterator(long start)
Returns a type-specific list iterator on this type-specific big list starting at a given index.int
ratio()
long
size64()
Returns the size of this data structure as a long.String
toString()
-
Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectBigList
add, add, addAll, addAll, addElements, addElements, clear, compareTo, contains, equals, getElements, hashCode, indexOf, iterator, lastIndexOf, listIterator, peek, pop, push, remove, removeElements, set, size, size, subList, top
-
Methods inherited from class java.util.AbstractCollection
containsAll, isEmpty, remove, removeAll, retainAll, toArray, toArray
-
Methods inherited from interface java.util.Collection
containsAll, isEmpty, parallelStream, remove, removeAll, removeIf, retainAll, spliterator, stream, toArray, toArray, toArray
-
-
-
-
Constructor Detail
-
ByteArrayFrontCodedBigList
public ByteArrayFrontCodedBigList(Iterator<byte[]> arrays, int ratio)
Creates a new front-coded list containing the arrays returned by the given iterator.- Parameters:
arrays
- an iterator returning arrays.ratio
- the desired ratio.
-
ByteArrayFrontCodedBigList
public ByteArrayFrontCodedBigList(Collection<byte[]> c, int ratio)
Creates a new front-coded list containing the arrays in the given collection.- Parameters:
c
- a collection containing arrays.ratio
- the desired ratio.
-
-
Method Detail
-
ratio
public int ratio()
-
arrayLength
public int arrayLength(long index)
Computes the length of the array at the given index.- Parameters:
index
- an index.- Returns:
- the length of the
index
-th array.
-
get
public byte[] get(long index)
Returns the element at the specified position.This implementation delegates to
getArray(long)
.- Specified by:
get
in interfaceBigList<byte[]>
- Parameters:
index
- a position in the big list.- Returns:
- the element at the specified position.
- See Also:
List.get(int)
-
getArray
public byte[] getArray(long index)
Returns an array stored in this front-coded list.- Parameters:
index
- an index.- Returns:
- the corresponding array stored in this front-coded list.
-
get
public int get(long index, byte[] a, int offset, int length)
Stores in the given array elements from an array stored in this front-coded list.- Parameters:
index
- an index.a
- the array that will store the result.offset
- an offset intoa
where elements will be store.length
- a maximum number of elements to store ina
.- Returns:
- if
a
can hold the extracted elements, the number of extracted elements; otherwise, the number of remaining elements with the sign changed.
-
get
public int get(long index, byte[] a)
Stores in the given array an array stored in this front-coded list.- Parameters:
index
- an index.a
- the array that will store the content of the result (we assume that it can hold the result).- Returns:
- if
a
can hold the extracted elements, the number of extracted elements; otherwise, the number of remaining elements with the sign changed.
-
size64
public long size64()
Description copied from interface:Size64
Returns the size of this data structure as a long.
-
listIterator
public ObjectBigListIterator<byte[]> listIterator(long start)
Description copied from class:AbstractObjectBigList
Returns a type-specific list iterator on this type-specific big list starting at a given index.Note that this specification strengthens the one given in
BigList.listIterator(long)
.This implementation is based on the random-access methods.
- Specified by:
listIterator
in interfaceBigList<byte[]>
- Specified by:
listIterator
in interfaceObjectBigList<byte[]>
- Overrides:
listIterator
in classAbstractObjectBigList<byte[]>
- Parameters:
start
- index of first element to be returned from the big-list iterator.- Returns:
- a big-list iterator of the elements in this big list, starting at the specified position in this big list.
- See Also:
BigList.listIterator(long)
-
clone
public ByteArrayFrontCodedBigList clone()
Returns a copy of this list.- Returns:
- a copy of this list.
-
toString
public String toString()
- Overrides:
toString
in classAbstractObjectBigList<byte[]>
-
-