KBloom is a simple Bloom Filter library for Android developed with Kotlin. It helps you to efficiently check if an element might be in a set without storing all elements. Perfect for memory-constrained mobile devices!
- v1.0.0: Initial release with core Bloom Filter functionalities.
- v1.1.0: Upcoming version with support for generic data types and Counting Bloom Filter.
- ...
- Core Bloom Filter:
- Basic insert(
put
,putAll
) and membership check(mightCotain
,mightContainAll
) - Uses
hashFuction: HashFuction
and a lambdatoBytes: (T) -> ByteArray
to manage any data typeT
- Provides a private constructor plus methods
create(...)
andrestore(...)
- Basic insert(
- Modular Architecture:
hahsing
module: Different hash functions(MurmurHash3,..)loggin
module: Pluggable loggers (ConsoleLogger
,NoOpLogger
)serialization
module: Serialize/deserialize to formats (ByteArraySerializer
,JsonSerializer
,MessagePackSerializer
)configuation
module: Includes a builder pattern for flexiable Bloom Filter setup
- Builder Pattern:
BloomFilterBuilder
allows settingexpectedInsertions
,fpp
,seed
, customhashFunctio
, ortoBytes
- Support stategies: OPTIMAL (automatic
k
calculation) or CUSTOM (manualexpectedInsertions
)
- Error Handling:
- Custom exceptions (
InvalidConfigurationException
,InvalidConfigurationException
) for invalid settings or bad serialized data
- Custom exceptions (
- Performance Enhancements:
- Option to use a
LongArrayBitArray
instead ofIntArray
for higher efficiency - Potential concurrency measures if needed
- Option to use a
- Additional Features:
- Estimate Current Number of Elements : Uses bit-count to approximate how many items the Bloom Filter already contains
- Probability Estimation : Calculates a current false positive rate based on how many bit are set
- **Integration and Utils **:
integration
module: Placeholders or adapters for frameworks(future feature)utils
for helper extension
- Scalable Bloom Filter implementation
- Counting Bloom Filter:
- Allow removal of elements.
- Kotlin Multiplatform:
- Use KBloom on iOS and JVM platforms.
- Advanced Hash Functions:
- Integrate more hash functions for better performance.
-
Add JitPack repository to your project-level
build.gradle
:allprojects { repositories { google() mavenCentral() maven { url 'https://jitpack.io' } } }
-
Add KBloom dependency to your app-level
build.gradle
:dependencies { implementation 'com.github.moclam1905:KBloom:1.2.1' }
val mockLogger = MockLogger() // for testing
val bf = BloomFilterBuilder<String>()
.expectedInsertions(1000)
.falsePositiveProbability(0.01)
.seed(42)
.hashFunction(MurmurHash3)
.toBytes { it.toByteArray(Charsets.UTF_8) }
.logger(mockLogger)
.build()
Note: See more about exaple toBytes
in BloomFilterGenericTypeTest.kt
// Add single elements
bloomFilter.put("apple")
bloomFilter.put("banana")
// Add multiple elements at once
val fruits = listOf("cherry", "date", "elderberry")
bloomFilter.putAll(fruits)
// Check single elements
val containsApple = bloomFilter.mightContain("apple") // true
val containsGrape = bloomFilter.mightContain("grape") // false (or true with low probability)
// Check multiple elements at once
val checkFruits = listOf("apple", "banana", "cherry")
val allContain = bloomFilter.mightContainAll(checkFruits) // true if all are present
val serializedJson = bf.serialize(SerializationFormat.JSON)
val bfDeserialized = BloomFilter.deserialize<String>(
byteArray = serializedJson,
format = SerializationFormat.JSON,
hashFunction = MurmurHash3,
toBytes = { it.toByteArray(Charsets.UTF_8) }
)
enum class SerializationFormat {
BYTE_ARRAY,
JSON,
MESSAGEPACK
// maybe support more serialization format in future
}
-testEstimateCurrentNumberOfElements
val elementsToAdd = listOf("apple", "banana", "cherry", "date", "elderberry")
elementsToAdd.forEach { bloomFilter.put(it) }
val estimatedElements = bloomFilter.estimateCurrentNumberOfElements()
// n ≈ -(m/k) * ln(1 - x/m)
val actualElements = elementsToAdd.size
// allowed tolerance is 2 elements
val tolerance = 2.0
// Check the difference between estimated and actual elements
assertTrue(
"Estimated elements ($estimatedElements) should be within $tolerance of actual elements ($actualElements)",
kotlin.math.abs(estimatedElements - actualElements) <= tolerance
)
testEstimateFalsePositiveRate
val elementsToAdd = (1..500).map { "element_$it" }
elementsToAdd.forEach { bloomFilter.put(it) }
val estimatedFpp = bloomFilter.estimateFalsePositiveRate()
// fpp current should be close to the expected fpp
// use n = 500
val expectedFpp =
(1 - exp(-bloomFilter.getNumHashFunctions() * 500.0 / bloomFilter.getBitSetSize())).pow(
bloomFilter.getNumHashFunctions()
)
val tolerance = 0.001 // 0.1%
assertEquals(
"Estimated false positive rate ($estimatedFpp) should be approximately $expectedFpp within tolerance $tolerance",
expectedFpp,
estimatedFpp,
tolerance
)