Fix#178 #181

Open
wants to merge 3 commits into
base: master
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,8 @@
# Change Log

## [unreleased]
* added fn `map-column->columns` ([#178](https://github.com/scicloj/tablecloth/issues/178))

## [7.029]

### Added
4,885 changes: 2,419 additions & 2,466 deletions docs/index.html

Large diffs are not rendered by default.

18 changes: 18 additions & 0 deletions notebooks/index.clj
@@ -3084,6 +3084,24 @@ and the other way around:
(tc/columns->array-column [0 1] :y))


(md "

#### Map column conversion

A dataset can also have columns whose values are maps.


We can expand such a column into separate columns (one new column per key);
keys missing from a row are filled with nil.


The new column names are formed as 'oldName-key'.
")

(->
(tc/dataset {:m [{:a 1 :b 2} {:a 3 :b 4} {:a 5}]})
(tc/map-column->columns :m))


(md "

5 changes: 5 additions & 0 deletions src/tablecloth/api.clj
@@ -1504,6 +1504,11 @@ column-names function returns names according to columns-selector
(tablecloth.api.operators/magnitude-squared ds columns-selector)))


(defn map-column->columns
([ds src-col]
(tablecloth.api.join-separate/map-column->columns ds src-col)))


(defn map-columns
"Map over rows using a map function. The arity should match the columns selected."
([ds column-name map-fn]
4 changes: 3 additions & 1 deletion src/tablecloth/api/api_template.clj
@@ -101,7 +101,9 @@
join-columns
separate-column
array-column->columns
columns->array-column)
columns->array-column
map-column->columns
)

(exporter/export-symbols tablecloth.api.fold-unroll
fold-by
65 changes: 63 additions & 2 deletions src/tablecloth/api/join_separate.clj
@@ -2,13 +2,15 @@
(:refer-clojure :exclude [pmap])
(:require [tech.v3.dataset :as ds]
[tech.v3.dataset.column :as col]

[tech.v3.tensor :as tens]
[tech.v3.datatype :as dtt]
[clojure.string :as str]
[tech.v3.parallel.for :refer [pmap]]
[tech.v3.dataset.tensor]
[tablecloth.api.utils :refer [iterable-sequence? column-names grouped? process-group-data ->str]]
[tablecloth.api.columns :refer [select-columns drop-columns add-column]]))
[tablecloth.api.dataset :as tc-dataset]
[tablecloth.api.utils :refer [iterable-sequence? column-names grouped? process-group-data ->str ]]
[tablecloth.api.columns :refer [select-columns drop-columns add-column rename-columns]]))

(defn- process-join-columns
[ds target-column join-function col-names drop-columns?]
@@ -136,6 +138,65 @@
(keyword with-prefix)
with-prefix)))


(defn- combine-with-dash [arg1 arg2]
  (let [to-string (fn [x]
                    (cond
                      (string? x) x
                      (keyword? x) (name x)
                      (symbol? x) (name x)
                      :else (str x)))
        combined-str (str (to-string arg1) "-" (to-string arg2))]
    (cond
      (keyword? arg1) (keyword combined-str)
      (symbol? arg1) (symbol combined-str)
      (string? arg1) combined-str
      :else combined-str)))

(defn map-column->columns
  "Expands a column of maps into multiple new columns, one per map key.

  Takes a dataset 'ds' and the name 'src-col' of a column in it whose values are maps, then:

  - Extracts the maps from 'src-col'.
  - Builds a new dataset from them, where each map key becomes a column.
  - Names each new column by joining the name of 'src-col' and the map key with a dash (-).
    The type (keyword, symbol, or string) of the new names matches the type of 'src-col'.
  - Appends the new columns to 'ds'.
  - Removes 'src-col' from 'ds'.

  Parameters:

  - 'ds': the input dataset.
  - 'src-col': the name (keyword, symbol, or string) of the column in 'ds' that contains map values.

  Returns a new dataset containing all columns of 'ds' except 'src-col',
  plus the new columns derived from the map keys."
  [ds src-col]
  (let [columns-ds
        (tc-dataset/dataset (get ds src-col))

        new-col-names
        (map #(combine-with-dash src-col %)
             (column-names columns-ds))

        renamed-columns-ds
        (rename-columns columns-ds
                        (zipmap
                         (column-names columns-ds)
                         new-col-names))]
    (->
     (ds/append-columns ds (tc-dataset/columns renamed-columns-ds))
     (ds/remove-column src-col))))


(defn array-column->columns
"Converts a column of type java array into several columns,
one for each element of the array of all rows. The source column is dropped afterwards.
24 changes: 24 additions & 0 deletions test/tablecloth/api/join_separate_test.clj
@@ -71,3 +71,27 @@
(api/rows)
(flatten))
=> ["foo-true" "bar-false"])

(fact "map-column->columns works"
      (->
       (api/dataset {:m [{:a 1 :b 2} {:a 3 :b 4}]
                     "n" [{:a 10 :b 20} {:a 30 :b 40}]})
       (api/map-column->columns :m)
       (api/rows :as-maps))
      => [{"n" {:a 10, :b 20}, :m-a 1, :m-b 2} {"n" {:a 30, :b 40}, :m-a 3, :m-b 4}]

      (->
       (api/dataset {:m [{:a 1 :b 2} {:a 3 :b 4}]
                     "n" [{:a 10 :b 20} {:a 30 :b 40}]})
       (api/map-column->columns "n")
       (api/rows :as-maps))
      => [{:m {:a 1, :b 2}, "n-a" 10, "n-b" 20} {:m {:a 3, :b 4}, "n-a" 30, "n-b" 40}]

      (->
       (api/dataset {:m [{:a 1 :b 2 :d 4} {:a 3 :c "hello"}]})
       (api/map-column->columns :m)
       (api/rows :as-maps))
      => [{:m-a 1, :m-b 2, :m-d 4, :m-c nil}
          {:m-a 3, :m-b nil, :m-d nil, :m-c "hello"}])