
Commit

Merge remote-tracking branch 'spc/master'
altavir committed Jan 19, 2025
2 parents d80846d + 6551df2 commit a39634f
Showing 101 changed files with 2,103 additions and 1,228 deletions.
28 changes: 28 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,34 @@

### Security

## 0.10.0 - 2025-01-19

### Added

- Coroutine exception logging in context
- `ObservableMutableMetaSerializer`
- `MutableMetaView` - a Meta wrapper that creates nodes only when it or its children are changed.

### Changed

- Simplify inheritance logic in `MutableTypedMeta`
- Full rework of `DataTree` and associated interfaces (`DataSource`, `DataSink`, etc.).
- Filtering data by type is moved from `dataforge-data` to `dataforge-workspace` to avoid a reflection dependency.

### Deprecated

- MetaProvider `spec` is replaced by `readable`. `listOfSpec` is replaced by `listOfReadable`.

### Removed

- Remove implicit IO format resolver in `IOPlugin` and `FileWorkspaceCache`. There are no guarantees that only one format is present in the context for each type.
- Dependencies on `atomicfu` and `kotlin.reflect` from dataforge-data to improve performance.

### Fixed

- Fixed NameToken parsing.
- Top level string list meta conversion.

## 0.9.0 - 2024-06-04

### Added
89 changes: 86 additions & 3 deletions README.md
@@ -1,7 +1,70 @@
[![JetBrains Research](https://jb.gg/badges/research.svg)](https://confluence.jetbrains.com/display/ALL/JetBrains+on+GitHub)
[![DOI](https://zenodo.org/badge/148831678.svg)](https://zenodo.org/badge/latestdoi/148831678)

![Gradle build](https://github.com/mipt-npm/dataforge-core/workflows/Gradle%20build/badge.svg)
## Publications

* [A general overview](https://doi.org/10.1051/epjconf/201817705003)
* [An application in "Troitsk nu-mass" experiment](https://doi.org/10.1088/1742-6596/1525/1/012024)

## Video

* [A presentation on application of DataForge (legacy version) to Troitsk nu-mass analysis.](https://youtu.be/OpWzLXUZnLI?si=3qn7EMruOHMJX3Bc)

## Questions and Answers

In this section, we try to cover the main ideas of DataForge in the form of questions and answers.

### General

**Q**: I have a lot of data to analyze. The analysis process is complicated, requires a lot of stages, and the data flow is not always obvious. Also, the data size is huge, so I don't want to perform operations I don't need (calculate something I won't need or calculate something twice). I need it to be performed in parallel and probably on a remote computer. By the way, I am sick and tired of scripts that modify other scripts that control scripts. Could you help me?

**A**: Yes, that is precisely the problem DataForge was made to solve. It performs automated data manipulations with optimization and parallelization. The important thing is that data processing recipes are written declaratively, so it is quite easy to perform computations on a remote station. DataForge also guarantees reproducibility of analysis results.

**Q**: How does it work?

**A**: At the core of DataForge lies the idea of a metadata processor. It relies on the fact that to analyze something you need the data itself and some additional information about what that data represents and what the user wants as a result. This additional information is called metadata and can be organized in a regular structure (a tree of values similar to XML or JSON). The important thing is that this distinction leaves no place for user instructions (or scripts). Indeed, the idea of DataForge logic is that one does not need imperative commands. The framework configures itself according to the input metadata and decides which operations should be performed in the most efficient way.
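
To make the "tree of values" idea concrete, here is a minimal sketch of such a structure with lookup by a dot-separated name. The types `Node`, `Leaf`, `Branch` and the functions `resolve` and `sampleMeta` are hypothetical stand-ins invented for this illustration; they are not the DataForge `Meta` API.

```kotlin
// Hypothetical stand-in types for illustration only, NOT the DataForge Meta API.
sealed interface Node
data class Leaf(val value: Any) : Node
data class Branch(val children: Map<String, Node>) : Node

// Resolve a dot-separated name such as "fit.range.max" against the tree.
// Returns null as soon as a token is missing or a leaf is reached too early.
fun Node.resolve(name: String): Node? =
    name.split(".").fold<String, Node?>(this) { node, token ->
        (node as? Branch)?.children?.get(token)
    }

// A small declarative description of what the user wants (assumed example data).
fun sampleMeta(): Node = Branch(
    mapOf(
        "fit" to Branch(
            mapOf(
                "model" to Leaf("gaussian"),
                "range" to Branch(mapOf("min" to Leaf(0.0), "max" to Leaf(10.0)))
            )
        )
    )
)

fun main() {
    val meta = sampleMeta()
    println(meta.resolve("fit.model"))     // Leaf(value=gaussian)
    println(meta.resolve("fit.range.max")) // Leaf(value=10.0)
    println(meta.resolve("fit.missing"))   // null
}
```

The point of the sketch is that the tree only describes data and intent; it contains no commands, which is what lets a framework decide how to execute the work.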

**Q**: But where does it take algorithms to use?

**A**: Of course algorithms must be written somewhere. No magic here. The logic is written in specialized modules. Some modules are provided out of the box at the system core, some need to be developed for a specific problem.

**Q**: So I still need to write the code? What is the difference then?

**A**: Yes, someone still needs to write the code, but not necessarily you. Simple operations can be performed using the provided core logic. Also, your group can have one programmer writing the logic and everyone else using it without any real programming expertise. The framework is organized in such a way that whoever writes additional logic does not need to think about complicated things like parallel computing, resource handling, logging, caching, etc. Most of these things are handled by DataForge.

### Platform

**Q**: Which platform does DataForge use? Which operating system is it working on?

**A**: DataForge is mostly written in Kotlin Multiplatform and can be used on JVM, JS and native targets. Some modules and functions are supported only on JVM.

**Q**: Can I use my C++/Fortran/Python code in DataForge?

**A**: Yes, as long as the code can be called from Java. Most common languages have a bridge for Java access. Compiled C/Fortran libraries pose no problems at all. Python code can be called via one of the existing Python-Java interfaces. It is also planned to implement remote method invocation for common languages, so your Python, or, say, Julia, code could run in its native environment. The metadata processor paradigm makes it much easier to do so.

### Features

**Q**: What other features does DataForge provide?

**A**: Alongside metadata processing (and a lot of tools for metadata manipulation and layering), DataForge has two additional important concepts:

* **Modularisation**. Unlike many other frameworks, DataForge is intrinsically modular. The mandatory part is a rather tiny core module. Everything else can be customized.

* **Context encapsulation**. Every DataForge task is executed in some context. The context isolates the environment of the task, serves as a base for dependency injection, and specifies how the task interacts with the external world.
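
The context idea can be sketched roughly as a scoped registry that falls back to its parent. The class `MiniContext` and its methods below are hypothetical and exist only for this illustration; the actual DataForge `Context` is plugin-based and considerably richer.

```kotlin
// Hypothetical miniature context for illustration only, NOT the DataForge Context class.
class MiniContext(val name: String, private val parent: MiniContext? = null) {
    private val services = mutableMapOf<String, Any>()

    // Register a service that is visible in this context and in its children.
    fun register(key: String, service: Any) {
        services[key] = service
    }

    // Look up locally first, then fall back to the parent chain.
    fun find(key: String): Any? = services[key] ?: parent?.find(key)
}

fun main() {
    val global = MiniContext("global").apply { register("logger", "console-logger") }
    val task = MiniContext("task", parent = global).apply { register("cache", "in-memory-cache") }

    println(task.find("logger"))  // console-logger (inherited from the parent)
    println(task.find("cache"))   // in-memory-cache (local to the task)
    println(global.find("cache")) // null (children do not leak into the parent)
}
```

A task that only sees its own context can be given different services simply by running it in a different context, which is the dependency injection role mentioned above.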

### Misc

**Q**: So everything looks great, can I replace my ROOT / other data analysis framework with DataForge?

**A**: One must note that DataForge is made for analysis, not for visualization. The visualization and user interaction capabilities of DataForge are rather limited compared to frameworks like ROOT, JAS3 or DataMelt. The idea is to provide a reliable API and core functionality. The [VisionForge](https://git.sciprog.center/kscience/visionforge) project aims to provide tools for 2D and 3D visualization, both locally and remotely.

**Q**: How does DataForge compare to cluster computation frameworks like Apache Spark?

**A**: It is not the purpose of DataForge to replace cluster computing software. DataForge has some internal parallelism mechanics and implementations, but they are most certainly worse than specially developed programs. Still, DataForge is not fixed on one single implementation. Your favourite parallel processing tool can still be used as a back-end for DataForge, with the full benefit of its configuration tools and integrations and no performance overhead.

**Q**: Is it possible to use DataForge in notebook mode?

**A**: [Kotlin jupyter](https://github.com/Kotlin/kotlin-jupyter) allows using any JVM program in notebook mode. A dedicated module for DataForge is a work in progress.


### [dataforge-context](dataforge-context)
@@ -14,23 +77,43 @@
> **Maturity**: EXPERIMENTAL
### [dataforge-io](dataforge-io)
> IO module
> Serialization foundation for Meta objects and Envelope processing.
>
> **Maturity**: EXPERIMENTAL
>
> **Features:**
> - [IO format](dataforge-io/src/commonMain/kotlin/space/kscience/dataforge/io/IOFormat.kt) : A generic API for reading something from binary representation and writing it to Binary.
> - [Binary](dataforge-io/src/commonMain/kotlin/space/kscience/dataforge/io/Binary.kt) : Multi-read random access binary.
> - [Envelope](dataforge-io/src/commonMain/kotlin/space/kscience/dataforge/io/Envelope.kt) : API and implementations for combined data and metadata format.
> - [Tagged envelope](dataforge-io/src/commonMain/kotlin/space/kscience/dataforge/io/TaggedEnvelope.kt) : Implementation for binary-friendly envelope format with machine readable tag and forward size declaration.
> - [Tagless envelope](dataforge-io/src/commonMain/kotlin/space/kscience/dataforge/io/TaglessEnvelope.kt) : Implementation for text-friendly envelope format with text separators for sections.

### [dataforge-meta](dataforge-meta)
> Meta definition and basic operations on meta
> Core Meta and Name manipulation module
>
> **Maturity**: DEVELOPMENT
>
> **Features:**
> - [Meta](dataforge-meta/src/commonMain/kotlin/space/kscience/dataforge/meta/Meta.kt) : **Meta** is the representation of the basic DataForge concept, metadata; it could also be called a meta-value tree.
> - [Value](dataforge-meta/src/commonMain/kotlin/space/kscience/dataforge/meta/Value.kt) : **Value** is a sum type for different meta values.
> - [Name](dataforge-meta/src/commonMain/kotlin/space/kscience/dataforge/names/Name.kt) : **Name** is an identifier used to access a tree-like structure.

### [dataforge-scripting](dataforge-scripting)
> Scripting definition for workspace generation
>
> **Maturity**: PROTOTYPE
### [dataforge-workspace](dataforge-workspace)
>
> **Maturity**: EXPERIMENTAL
### [dataforge-io/dataforge-io-proto](dataforge-io/dataforge-io-proto)
> ProtoBuf Meta representation
>
> **Maturity**: PROTOTYPE
### [dataforge-io/dataforge-io-yaml](dataforge-io/dataforge-io-yaml)
> YAML meta converters and Front Matter envelope format
>
10 changes: 8 additions & 2 deletions build.gradle.kts
@@ -9,7 +9,7 @@ plugins {

allprojects {
group = "space.kscience"
version = "0.9.0"
version = "0.10.0"
}

subprojects {
@@ -22,6 +22,12 @@ subprojects {
}
}

dependencies{
subprojects.forEach {
dokka(it)
}
}

readme {
readmeTemplate = file("docs/templates/README-TEMPLATE.md")
}
@@ -32,7 +38,7 @@ ksciencePublish {
useSPCTeam()
}
repository("spc", "https://maven.sciprog.center/kscience")
sonatype("https://oss.sonatype.org")
central()
}

apiValidation {
4 changes: 2 additions & 2 deletions dataforge-context/README.md
@@ -6,7 +6,7 @@ Context and provider definitions

## Artifact:

The Maven coordinates of this project are `space.kscience:dataforge-context:0.9.0-dev-1`.
The Maven coordinates of this project are `space.kscience:dataforge-context:0.10.0`.

**Gradle Kotlin DSL:**
```kotlin
@@ -16,6 +16,6 @@ repositories {
}

dependencies {
implementation("space.kscience:dataforge-context:0.9.0-dev-1")
implementation("space.kscience:dataforge-context:0.10.0")
}
```
1 change: 1 addition & 0 deletions dataforge-context/api/dataforge-context.api
@@ -282,6 +282,7 @@ public final class space/kscience/dataforge/provider/Path : java/lang/Iterable,

public final class space/kscience/dataforge/provider/Path$Companion {
public final fun parse-X5wN5Vs (Ljava/lang/String;)Ljava/util/List;
public final fun serializer ()Lkotlinx/serialization/KSerializer;
}

public final class space/kscience/dataforge/provider/PathKt {
5 changes: 2 additions & 3 deletions dataforge-context/build.gradle.kts
@@ -13,11 +13,10 @@ kscience {
useSerialization()
commonMain {
api(projects.dataforgeMeta)
api(spclibs.atomicfu)
}
jvmMain{
api(kotlin("reflect"))
api("org.slf4j:slf4j-api:1.7.30")
api(spclibs.kotlin.reflect)
api(spclibs.slf4j)
}
}

@@ -1,5 +1,6 @@
package space.kscience.dataforge.context

import kotlinx.coroutines.CoroutineExceptionHandler
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.SupervisorJob
@@ -67,7 +68,9 @@ public open class Context internal constructor(

override val coroutineContext: CoroutineContext by lazy {
(parent ?: Global).coroutineContext.let { parenContext ->
parenContext + coroutineContext + SupervisorJob(parenContext[Job])
parenContext + coroutineContext + SupervisorJob(parenContext[Job]) + CoroutineExceptionHandler { _, throwable ->
logger.error(throwable) { "Exception in context $name" }
}
}
}
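
The hunk above attaches a `CoroutineExceptionHandler` next to a `SupervisorJob`. As a rough, standalone illustration of how that combination behaves, the sketch below uses only `kotlinx.coroutines`; the function `runDemo` and the message prefix are assumptions made for this example and do not reproduce the DataForge logger call. Note that such a handler only receives uncaught exceptions from coroutines started with `launch`; failures inside `async` surface on `await` instead.

```kotlin
import java.util.concurrent.CopyOnWriteArrayList
import kotlinx.coroutines.CoroutineExceptionHandler
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.delay
import kotlinx.coroutines.joinAll
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical demo: returns the "logged" messages and whether the healthy sibling was cancelled.
fun runDemo(): Pair<List<String>, Boolean> = runBlocking {
    // A thread-safe list stands in for a real logger.
    val logged = CopyOnWriteArrayList<String>()
    val handler = CoroutineExceptionHandler { _, throwable ->
        logged.add("Exception in context demo: ${throwable.message}")
    }
    // SupervisorJob keeps one failing child from cancelling its siblings.
    val scope = CoroutineScope(SupervisorJob() + handler)

    val failing = scope.launch { error("boom") }
    val healthy = scope.launch { delay(50) }
    joinAll(failing, healthy)

    logged.toList() to healthy.isCancelled
}

fun main() {
    val (messages, siblingCancelled) = runDemo()
    println(messages)         // [Exception in context demo: boom]
    println(siblingCancelled) // false
}
```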

@@ -15,15 +15,37 @@
*/
package space.kscience.dataforge.provider

import kotlinx.serialization.KSerializer
import kotlinx.serialization.Serializable
import kotlinx.serialization.builtins.serializer
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import space.kscience.dataforge.names.Name
import space.kscience.dataforge.names.parseAsName
import kotlin.jvm.JvmInline

private object PathSerializer : KSerializer<Path> {

override val descriptor: SerialDescriptor
get() = String.serializer().descriptor

override fun serialize(encoder: Encoder, value: Path) {
encoder.encodeString(value.toString())
}

override fun deserialize(decoder: Decoder): Path {
return Path.parse(decoder.decodeString())
}
}


/**
* Path interface.
*
*/
@JvmInline
@Serializable(PathSerializer::class)
public value class Path(public val tokens: List<PathToken>) : Iterable<PathToken> {

override fun iterator(): Iterator<PathToken> = tokens.iterator()
@@ -33,6 +55,7 @@ public value class Path(public val tokens: List<PathToken>) : Iterable<PathToken
public companion object {
public const val PATH_SEGMENT_SEPARATOR: String = "/"


public fun parse(path: String): Path = Path(path.split(PATH_SEGMENT_SEPARATOR).map { PathToken.parse(it) })
}
}
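
The `PathSerializer` added above encodes a structured value as a single string. A minimal standalone sketch of the same pattern follows; `Segments` and `SegmentsSerializer` are hypothetical names invented for this example, and it assumes the `kotlinx-serialization-json` artifact is on the classpath. Unlike the hunk, the sketch declares its own `PrimitiveSerialDescriptor` with a distinct name rather than reusing the `String` descriptor; that is a choice of this sketch, not a statement about the commit.

```kotlin
import kotlinx.serialization.KSerializer
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json

// Hypothetical path-like type for illustration only.
data class Segments(val parts: List<String>) {
    override fun toString(): String = parts.joinToString("/")
}

// Serializes Segments as one string, mirroring the string-delegation pattern in the diff.
object SegmentsSerializer : KSerializer<Segments> {
    override val descriptor: SerialDescriptor =
        PrimitiveSerialDescriptor("example.Segments", PrimitiveKind.STRING)

    override fun serialize(encoder: Encoder, value: Segments) {
        encoder.encodeString(value.toString())
    }

    override fun deserialize(decoder: Decoder): Segments =
        Segments(decoder.decodeString().split("/"))
}

fun main() {
    val original = Segments(listOf("plugins", "io", "json"))
    // Passing the serializer explicitly avoids any need for the compiler plugin here.
    val encoded = Json.encodeToString(SegmentsSerializer, original)
    val decoded = Json.decodeFromString(SegmentsSerializer, encoded)
    println(encoded)
    println(decoded == original)
}
```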
@@ -17,7 +17,7 @@ package space.kscience.dataforge.context

import java.util.*
import kotlin.reflect.KClass
import kotlin.reflect.full.cast
import kotlin.reflect.cast

public class ClassLoaderPlugin(private val classLoader: ClassLoader) : AbstractPlugin() {
override val tag: PluginTag = PluginTag("classLoader", PluginTag.DATAFORGE_GROUP)
@@ -8,25 +8,30 @@ import space.kscience.dataforge.misc.DfType
import space.kscience.dataforge.misc.Named
import space.kscience.dataforge.names.Name
import kotlin.reflect.KClass
import kotlin.reflect.KType
import kotlin.reflect.full.findAnnotation

import kotlin.reflect.typeOf

@DFExperimental
public val KClass<*>.dfType: String
get() = findAnnotation<DfType>()?.id ?: simpleName ?: ""

@DFExperimental
public val KType.dfType: String
get() = findAnnotation<DfType>()?.id ?: (classifier as? KClass<*>)?.simpleName ?: ""

/**
* Provide an object with given name inferring target from its type using [DfType] annotation
*/
@DFExperimental
public inline fun <reified T : Any> Provider.provideByType(name: String): T? {
val target = T::class.dfType
val target = typeOf<T>().dfType
return provide(target, name)
}

@DFExperimental
public inline fun <reified T : Any> Provider.top(): Map<Name, T> {
val target = T::class.dfType
val target = typeOf<T>().dfType
return top(target)
}

@@ -35,15 +40,15 @@ public inline fun <reified T : Any> Provider.top(): Map<Name, T> {
*/
@DFExperimental
public inline fun <reified T : Any> Context.gather(inherit: Boolean = true): Map<Name, T> =
gather<T>(T::class.dfType, inherit)
gather<T>(typeOf<T>().dfType, inherit)


@DFExperimental
public inline fun <reified T : Any> PluginBuilder.provides(items: Map<Name, T>) {
provides(T::class.dfType, items)
provides(typeOf<T>().dfType, items)
}

@DFExperimental
public inline fun <reified T : Any> PluginBuilder.provides(vararg items: Named) {
provides(T::class.dfType, *items)
provides(typeOf<T>().dfType, *items)
}
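
The changes above switch target inference from `T::class` to `typeOf<T>()`. A rough standalone sketch of inferring a lookup target from a reified type is shown below. `targetOf`, `Registry` and its methods are hypothetical names for this illustration, and the sketch deliberately omits the `DfType` annotation lookup, using only the classifier's simple name as the fallback path does.

```kotlin
import kotlin.reflect.KClass
import kotlin.reflect.typeOf

// Hypothetical helper: derive a target string from the reified type's classifier.
inline fun <reified T> targetOf(): String =
    (typeOf<T>().classifier as? KClass<*>)?.simpleName ?: ""

// Hypothetical registry keyed by target and name, NOT the DataForge Provider interface.
class Registry {
    private val items = mutableMapOf<String, MutableMap<String, Any>>()

    fun register(target: String, name: String, item: Any) {
        items.getOrPut(target) { mutableMapOf() }[name] = item
    }

    fun provide(target: String, name: String): Any? = items[target]?.get(name)
}

// Infer the target from T so callers do not have to spell it out.
inline fun <reified T : Any> Registry.provideByType(name: String): T? =
    provide(targetOf<T>(), name) as? T

fun main() {
    val registry = Registry()
    registry.register(targetOf<String>(), "greeting", "hello")

    println(registry.provideByType<String>("greeting")) // hello
    println(registry.provideByType<Int>("greeting"))    // null
}
```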
4 changes: 2 additions & 2 deletions dataforge-data/README.md
@@ -6,7 +6,7 @@

## Artifact:

The Maven coordinates of this project are `space.kscience:dataforge-data:0.9.0-dev-1`.
The Maven coordinates of this project are `space.kscience:dataforge-data:0.10.0`.

**Gradle Kotlin DSL:**
```kotlin
@@ -16,6 +16,6 @@ repositories {
}

dependencies {
implementation("space.kscience:dataforge-data:0.9.0-dev-1")
implementation("space.kscience:dataforge-data:0.10.0")
}
```
3 changes: 0 additions & 3 deletions dataforge-data/build.gradle.kts
@@ -9,10 +9,7 @@ kscience{
wasm()
useCoroutines()
dependencies {
api(spclibs.atomicfu)
api(projects.dataforgeMeta)
//Remove after subtype moved to stdlib
api(kotlin("reflect"))
}
}
