InformationDistances
- InformationDistances.AbstractCompressor
- InformationDistances.ByteData
- InformationDistances.CodecCompressor
- InformationDistances.LibDeflateCompressor
- InformationDistances.NormalizedCompressionDistance
- InformationDistances.compressed_length
- InformationDistances.compressed_lengths
InformationDistances.ByteData — Type
const ByteData = Union{Vector{UInt8}, Base.CodeUnits{UInt8, <: AbstractString}}
Either a Vector of UInt8 or a Base.CodeUnits{UInt8} object. Compressors should be able to compress both of these types.
InformationDistances.AbstractCompressor — Type
AbstractCompressor
A compressor interface type that represents string compressors.
Mandatory methods
- compressed_length( <: AbstractCompressor, ::InformationDistances.ByteData)
Optional methods
- compressed_lengths( <: AbstractCompressor, iter)
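As a minimal sketch of the interface above, a custom compressor needs only a subtype and the one mandatory method. The type `RawLengthCompressor` is hypothetical and is not part of the package; it performs no real compression and simply reports the input's byte count, so that the required method signature is visible.

```julia
using InformationDistances

# Hypothetical toy type, not part of the package: it performs no real
# compression and just reports the number of input bytes.
struct RawLengthCompressor <: InformationDistances.AbstractCompressor end

# The mandatory method: return the "compressed" size of `s` in bytes.
function InformationDistances.compressed_length(
    ::RawLengthCompressor,
    s::InformationDistances.ByteData,
)
    return length(s)
end

# `codeunits` turns a String into a Base.CodeUnits{UInt8, String},
# which is one of the two types covered by ByteData.
n = InformationDistances.compressed_length(RawLengthCompressor(), codeunits("hello"))
```

A real compressor would replace the body of `compressed_length` with a call into its compression library and return the size of the compressed output.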
InformationDistances.CodecCompressor — Type
CodecCompressor{ <: TranscodingStreams.Codec} <: AbstractCompressor
A compressor that uses a TranscodingStreams.Codec for compressing.
CodecCompressor{C <: TranscodingStreams.Codec}(;kwargs...)
Create a CodecCompressor for the codec C, with additional keyword arguments passed to the constructor of that codec.
Examples
julia> using CodecXz: XzCompressor
julia> CodecCompressor{XzCompressor}(; level=6)
CodecCompressor{XzCompressor}(Base.Iterators.Pairs(:level => 6))
InformationDistances.LibDeflateCompressor — Type
LibDeflateCompressor <: AbstractCompressor
A compressor that uses LibDeflate.jl for compressing.
LibDeflateCompressor(;compresslevel=12)
Create a LibDeflateCompressor with compression level compresslevel.
Examples
julia> LibDeflateCompressor()
LibDeflateCompressor(12)
julia> LibDeflateCompressor(;compresslevel=8)
LibDeflateCompressor(8)
InformationDistances.NormalizedCompressionDistance — Type
NormalizedCompressionDistance{<: AbstractCompressor} <: Distances.PreMetric
A normalized compression distance metric between two strings.
The metric is defined by $d(x, y) := \frac{Z(xy) - \min(Z(x), Z(y))} {\max(Z(x), Z(y))}$
where Z(x) is the length when compressing the string x with a certain compression codec.
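The formula can be evaluated by hand with `compressed_length`, which may help when checking a result. The following is a sketch under the assumption that Z(xy) is the compressed length of the plain string concatenation `x * y`; the package's own implementation may build the concatenation differently, so the value is not guaranteed to match the distance object exactly.

```julia
using InformationDistances

compressor = LibDeflateCompressor()
x, y = "hello", "world"

# Z(x), Z(y) and Z(xy) from the definition above.
zx  = compressed_length(compressor, x)
zy  = compressed_length(compressor, y)
zxy = compressed_length(compressor, x * y)  # assumption: xy means string concatenation

# d(x, y) = (Z(xy) - min(Z(x), Z(y))) / max(Z(x), Z(y))
ncd = (zxy - min(zx, zy)) / max(zx, zy)
```

Because the numerator subtracts the smaller of the two individual lengths, the result stays near 0 when y adds little new information to x and grows toward 1 when the two strings share almost nothing the compressor can exploit.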
NormalizedCompressionDistance([compressor::AbstractCompressor])
Create a NormalizedCompressionDistance.
Arguments
- compressor: The compressor to use. If not specified, CodecCompressor{CodecXz.XzCompressor}(; level=9, check=CodecXz.LZMA_CHECK_NONE) is used.
Examples
julia> d1 = NormalizedCompressionDistance()
NormalizedCompressionDistance{CodecCompressor{CodecXz.XzCompressor}}(CodecCompressor{CodecXz.XzCompressor}(Base.Iterators.Pairs{Symbol,Signed,Tuple{Symbol,Symbol},NamedTuple{(:level, :check),Tuple{Int64,Int32}}}(:level => 9,:check => 0)))
julia> d1("hello", "world")
0.07142857142857142
julia> d2 = NormalizedCompressionDistance(LibDeflateCompressor())
NormalizedCompressionDistance{LibDeflateCompressor}(LibDeflateCompressor(12))
julia> d2("hello", "world")
0.5
InformationDistances.compressed_length — Method
compressed_length(compressor, s)
The number of resulting bytes when s is compressed with compressor.
When implementing a subtype Compressor <: AbstractCompressor, one should implement compressed_length(compressor::Compressor, s::InformationDistances.ByteData).
Examples
julia> compressed_length(LibDeflateCompressor(), "hello")
10
InformationDistances.compressed_lengths — Method
compressed_lengths(compressor, iter)
Calculate for each s in iter the number of resulting bytes when s is compressed with compressor.
Implementing this method for a specific subtype of AbstractCompressor can improve performance: some compressors must allocate resources before compressing, and batch processing lets them allocate those resources only once.
It is recommended but not necessary to implement this method for a custom subtype Compressor <: AbstractCompressor. The method signature in that case should be compressed_lengths(compressor::Compressor, iter).
As Julia does not allow one to specify the eltype of an iterator, one should at least make sure that the elements of iter can be of type InformationDistances.ByteData, and optionally also of type AbstractString.
Examples
julia> compressed_lengths(LibDeflateCompressor(), ["hello", "world", "!"])
3-element Array{Int64,1}:
10
10
6
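The batch method for a custom subtype could look roughly like the sketch below. `BatchRawLengthCompressor` and the helper `as_bytes` are hypothetical names introduced only for illustration; the toy type reports raw byte counts instead of compressing, and the comment marks where a real compressor would set up its shared resources once.

```julia
using InformationDistances

# Hypothetical toy type, not part of the package.
struct BatchRawLengthCompressor <: InformationDistances.AbstractCompressor end

# Hypothetical helper: accept both ByteData and AbstractString elements.
as_bytes(s::AbstractString) = codeunits(s)
as_bytes(s::InformationDistances.ByteData) = s

# Mandatory single-item method (here: just the raw byte count).
function InformationDistances.compressed_length(
    ::BatchRawLengthCompressor,
    s::InformationDistances.ByteData,
)
    return length(s)
end

# Optional batch method with the recommended signature.
function InformationDistances.compressed_lengths(c::BatchRawLengthCompressor, iter)
    # A real compressor would allocate its buffers or codec state here, once,
    # and reuse them for every element of `iter`.
    lengths = Int[]
    for s in iter
        push!(lengths, InformationDistances.compressed_length(c, as_bytes(s)))
    end
    return lengths
end

lens = InformationDistances.compressed_lengths(
    BatchRawLengthCompressor(),
    ["hello", "world", "!"],
)
```

Normalizing each element through `as_bytes` is one way to handle the untyped iterator: the loop accepts strings and byte vectors alike, while the single-item method only ever sees `ByteData`.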