Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to Marko's API #2

Merged
merged 35 commits into from
Sep 7, 2022
Merged
Show file tree
Hide file tree
Changes from 27 commits
Commits
Show all changes
35 commits
Select commit Hold shift + click to select a range
41cdab5
Add bed serialization
GabrielSimonetto Jul 3, 2022
17a7479
Add bed serialization test
GabrielSimonetto Jul 3, 2022
6b7c622
Add bed from_reader deserialization
GabrielSimonetto Jul 3, 2022
8500ffb
Add basic deserializer (directly from serde docs)
GabrielSimonetto Jul 6, 2022
a83fcbd
Add Deserialization for bed::Record<3>
GabrielSimonetto Jul 7, 2022
d9a5ddc
Add de tests and vec tests
GabrielSimonetto Jul 8, 2022
14960da
Fail from_reader if any record is rejected
GabrielSimonetto Jul 12, 2022
d84c9e5
Introduce test_bed_deserialization_with_whitespaces (currently fails)
GabrielSimonetto Jul 12, 2022
fb52d40
Change expect's to unwrap's
GabrielSimonetto Jul 12, 2022
2ea2769
Fix comments
GabrielSimonetto Jul 13, 2022
114abb1
Remove unwraps and map to custom error
GabrielSimonetto Jul 18, 2022
c900009
Unnest impl blocks
GabrielSimonetto Jul 18, 2022
afb9130
ERASEME: showcasing derive for Deserialize
GabrielSimonetto Jul 19, 2022
07bd3d0
Revert "ERASEME: showcasing derive for Deserialize"
GabrielSimonetto Jul 19, 2022
b5e959b
Example of bed serialization and deserialization.
mmalenic Jul 25, 2022
f5a0dbd
Update example to use noodles Record.
mmalenic Jul 25, 2022
9b465d3
Update comments
mmalenic Jul 25, 2022
e0a892e
Update comments
mmalenic Jul 25, 2022
0484112
Implement verbose ser.rs
GabrielSimonetto Aug 14, 2022
bf0ddad
Add serde_with and wrapper to use Display as serialization
GabrielSimonetto Aug 16, 2022
afcaf12
wip: architecture suggestion
GabrielSimonetto Aug 16, 2022
3877beb
Using a custom Enum to allow all bed record types on AuxiliarBedRecor…
GabrielSimonetto Aug 17, 2022
de846c3
Revert "Using a custom Enum to allow all bed record types on Auxiliar…
GabrielSimonetto Aug 18, 2022
17fe14d
wip: failed attempt to force Display for Record<N>
GabrielSimonetto Aug 18, 2022
46b4400
Revert "wip: failed attempt to force Display for Record<N>"
GabrielSimonetto Aug 18, 2022
658ebed
Generalize for Record<N> using trait stacking
GabrielSimonetto Aug 18, 2022
5afcb5f
wip: loop on deserializer test
GabrielSimonetto Aug 25, 2022
d8edca5
Change wrapper to be newtype_struct, deserializaton works
GabrielSimonetto Aug 29, 2022
eeb9f52
Adapt deserializer for sequences
GabrielSimonetto Aug 30, 2022
d3cc40b
Add test Record<4> deserialization
GabrielSimonetto Aug 30, 2022
530aec3
Cleanup
GabrielSimonetto Aug 30, 2022
80f6b38
Add all serde Record<N> tests
GabrielSimonetto Aug 30, 2022
167c8ab
Clean code:
GabrielSimonetto Sep 1, 2022
9975017
Adapt ser.rs to better showcase the inner workings of the code
GabrielSimonetto Sep 1, 2022
ce6fb3c
Apply change requests
GabrielSimonetto Sep 5, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion noodles-bed/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -12,4 +12,6 @@ documentation = "https://docs.rs/noodles-bed"

[dependencies]
noodles-core = { path = "../noodles-core", version = "0.7.0" }
serde = { version = "1" }
serde = { version = "1" }
serde_json = "1.0"
serde_with = "2.0.0"
375 changes: 375 additions & 0 deletions noodles-bed/src/de.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,375 @@
// use crate::de::RecordState::ExpectingChrom;
use crate::record::{AuxiliarBedRecordWrapper, BedN};
use crate::{error, Reader, Record};
use error::{Error, Result};
use serde::de::{DeserializeSeed, MapAccess, SeqAccess, Visitor};
use serde::{de, forward_to_deserialize_any, Deserialize, Deserializer};
use std::io;
use std::iter::Peekable;

pub fn from_bytes<'a, T>(records: &'a [u8]) -> Result<T>
where
T: Deserialize<'a>,
{
todo!()
}

// // I will use this auxiliar later to emulate whats going on in the boilerplate on test
// pub fn record_from_string<T>(record: T) -> Result<String>
// where
// T: BedN<3> + std::str::FromStr + std::fmt::Display,
// <T as std::str::FromStr>::Err: std::fmt::Display,
// {
// let abrw = AuxiliarBedRecordWrapper { record };
// from_str(&abrw)
// }

pub fn from_str<'a, T>(s: &'a str) -> Result<T>
where
T: Deserialize<'a>,
{
let mut deserializer = RecordDeserializer::from_str(s);
GabrielSimonetto marked this conversation as resolved.
Show resolved Hide resolved
let t = T::deserialize(&mut deserializer)?;
if deserializer.input.is_empty() {
Ok(t)
} else {
panic!()
GabrielSimonetto marked this conversation as resolved.
Show resolved Hide resolved
}
}

pub struct RecordDeserializer<'de> {
input: &'de str,
}

impl<'de> RecordDeserializer<'de> {
pub fn from_str(input: &'de str) -> Self {
GabrielSimonetto marked this conversation as resolved.
Show resolved Hide resolved
RecordDeserializer { input }
}

// It seems I am implementing things i wasn't expecting to implement.
fn next_field(&mut self) -> Result<&'de str> {
// todo!()
Ok(self.input)
// match self.it.next() {
// Some(field) => {
// self.field += 1;
// Ok(field)
// }
// None => Err(DeserializeError {
// field: None,
// kind: DEK::UnexpectedEndOfRow,
// }),
// }
}
}

impl<'de, 'a> de::Deserializer<'de> for &'a mut RecordDeserializer<'de> {
type Error = Error;

fn deserialize_any<V>(self, visitor: V) -> Result<V::Value>
where
V: Visitor<'de>,
{
unimplemented!()
}

/// We will likely need to handle deserializing a sequence.
fn deserialize_seq<V>(self, visitor: V) -> Result<V::Value>
where
V: Visitor<'de>,
{
visitor.visit_seq(self)
}

/// We also need to deserialize a struct in a similar way, as the BED
/// format doesn't have a concept of a struct. It just processes records
/// line by line similar to CSV, where fields are separated by some separator.
fn deserialize_struct<V>(
self,
name: &'static str,
fields: &'static [&'static str],
visitor: V,
) -> Result<V::Value>
where
V: Visitor<'de>,
{
visitor.visit_map(self)
}

fn deserialize_map<V>(self, visitor: V) -> Result<V::Value>
where
V: Visitor<'de>,
{
dbg!("map");
visitor.visit_map(self)
}

// fn deserialize_string<V>(self, visitor: V) -> Result<V::Value>
// where
// V: Visitor<'de>,
// {
// dbg!("string");
// visitor.visit_str(self.input)
// }

fn deserialize_string<V: Visitor<'de>>(self, visitor: V) -> Result<V::Value> {
dbg!("string");
self.next_field().and_then(|f| visitor.visit_str(f.into()))
}

fn deserialize_identifier<V>(self, visitor: V) -> Result<V::Value>
where
V: Visitor<'de>,
{
dbg!("identifier");
self.deserialize_string(visitor)
}

fn deserialize_ignored_any<V>(self, visitor: V) -> Result<V::Value>
where
V: Visitor<'de>,
{
dbg!(self.input);
dbg!("deserialize_ignored_any");
drop(self.input);
// panic!(); // uncomment to stop loop
visitor.visit_unit()
}

// /// We can probably use forward_to_deserialize_any for many types, as bed Records are self-describing.
// forward_to_deserialize_any! {
// bool i8 i16 i32 i64 i128 u8 u16 u32 u64 u128 f32 f64 char str string
// bytes byte_buf option unit unit_struct newtype_struct tuple
// tuple_struct map enum identifier ignored_any
// }

forward_to_deserialize_any! {
bool i8 i16 i32 i64 i128 u8 u16 u32 u64 u128 f32 f64 char str
bytes byte_buf option unit unit_struct newtype_struct tuple
tuple_struct enum
}
}

/// SeqAccess will be needed to deserialize sequences and structs.
impl<'de> SeqAccess<'de> for RecordDeserializer<'de> {
type Error = Error;

fn next_element_seed<T>(&mut self, seed: T) -> Result<Option<T::Value>>
where
T: DeserializeSeed<'de>,
{
seed.deserialize(&mut *self).map(Some)
}

// fn next_element<T>(&mut self) -> Result<Option<T>>
// where
// T: Deserialize<'de>,
// {
// self.next_element_seed(std::marker::PhantomData)
// }

fn size_hint(&self) -> Option<usize> {
None
}
}

impl<'de, 'a> MapAccess<'de> for RecordDeserializer<'de> {
type Error = Error;

fn next_key_seed<K>(&mut self, seed: K) -> Result<Option<K::Value>>
where
K: DeserializeSeed<'de>,
{
// // Check if there are no more entries.
// if self.de.peek_char()? == '}' {
// return Ok(None);
// }
// // Comma is required before every entry except the first.
// if !self.first && self.de.next_char()? != ',' {
// return Err(Error::ExpectedMapComma);
// }
// self.first = false;
// // Deserialize a map key.
// seed.deserialize(&mut *self.de).map(Some)
seed.deserialize(&mut *self).map(Some)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be getting stuck here, as this function is called over and over again, without returning Ok(None). next_key_seed should return Ok(None) when there are no more entries left.

}

fn next_value_seed<V>(&mut self, seed: V) -> Result<V::Value>
where
V: DeserializeSeed<'de>,
{
// It doesn't make a difference whether the colon is parsed at the end
// of `next_key_seed` or at the beginning of `next_value_seed`. In this
// case the code is a bit simpler having it here.
// if self.de.next_char()? != ':' {
// return Err(Error::ExpectedMapColon);
// }
// Deserialize a map value.
seed.deserialize(&mut *self)
}
}

// /// A user would likely use a function such as the following to deserialize a set of records.
// pub fn from_bytes<'a, T>(records: &'a [u8]) -> Result<T>
// where
// T: Deserialize<'a>,
// {
// let mut reader = Reader::new(records);
// let records = reader.records();
// let mut deserializer = Record3Deserializer::new(records);
// T::deserialize(&mut deserializer)
// }

// // I believe this function shouldn't be user facing
// fn from_string<'a, T>(value: String) -> Result<T>
// where
// T: Deserialize<'a>,
// {
// // TODO: How to generalize Bed<N>
// let mut serializer = Record3Deserializer {
// input: String::new(),
// };
// value.deserialize(&mut serializer)?;
// Ok(serializer.output)
// }

// pub fn record_from_string<T>(record: String) -> Result<T>
// where
// T: BedN<3> + std::str::FromStr + std::fmt::Display,
// <T as std::str::FromStr>::Err: std::fmt::Display,
// {
// let abrw = AuxiliarBedRecordWrapper { record };
// from_string(&abrw)
// }

// /// This leeps track of the state associated with the next record that is parsed.
// /// We need to keep track of the state as records are returned whole from the iterator, whereas
// /// the deserializer progresses one field at a time.
// enum RecordState {
// ExpectingChrom,
// ExpectingChromStart(Record<3>),
// ExpectingChromEnd(Record<3>),
// }

// /// The Record Deserializer. This struct implements Deserializer and associated Traits. It stores
// /// an iterator over the records that are given to it.
// pub struct Record3Deserializer<I>
// where
// I: Iterator<Item = io::Result<Record<3>>>,
// {
// records: Peekable<I>,
// state: RecordState,
// }

// impl<I> Record3Deserializer<I>
// where
// I: Iterator<Item = io::Result<Record<3>>>,
// {
// fn new(records: I) -> Self {
// Self {
// records: records.peekable(),
// state: ExpectingChrom,
// }
// }
// }

// impl<'de, 'a, I> Deserializer<'de> for &'a mut Record3Deserializer<I>
// where
// I: Iterator<Item = io::Result<Record<3>>>,
// {
// type Error = Error;

// /// This function could handle deserializing an individual record, matching the deserializer's
// /// RecordState, and progressing it to the next state.
// fn deserialize_any<V>(self, visitor: V) -> Result<V::Value>
// where
// V: Visitor<'de>,
// {
// todo!()
// // match self.state {
// // ExpectingChrom => deserialize_string,
// // ExpectingChromStart(Record<3>) => ,
// // ExpectingChromEnd(Record<3>) => ,
// // }
// }

// // fn deserialize_string<V: Visitor<'de>>(self, visitor: V) -> Result<V::Value> {
// // self.next_field().and_then(|f| visitor.visit_str(f.into()))
// // }

// /// We will likely need to handle deserializing a sequence.
// fn deserialize_seq<V>(self, visitor: V) -> Result<V::Value>
// where
// V: Visitor<'de>,
// {
// visitor.visit_seq(self)
// }

// /// We also need to deserialize a struct in a similar way, as the BED
// /// format doesn't have a concept of a struct. It just processes records
// /// line by line similar to CSV, where fields are separated by some separator.
// fn deserialize_struct<V>(
// self,
// name: &'static str,
// fields: &'static [&'static str],
// visitor: V,
// ) -> Result<V::Value>
// where
// V: Visitor<'de>,
// {
// visitor.visit_seq(self)
// }

// /// We can probably use forward_to_deserialize_any for many types, as bed Records are self-describing.
// // forward_to_deserialize_any! {
// // bool i8 i16 i32 i64 i128 u8 u16 u32 u64 u128 f32 f64 char str
// // bytes byte_buf option unit unit_struct newtype_struct tuple
// // tuple_struct map enum identifier ignored_any
// // }

// forward_to_deserialize_any! {
// bool i8 i16 i32 i64 i128 u8 u16 u32 u64 u128 f32 f64 char str string
// bytes byte_buf option unit unit_struct newtype_struct tuple
// tuple_struct map enum identifier ignored_any
// }
// }

// /// SeqAccess will be needed to deserialize sequences and structs.
// impl<'de, I> SeqAccess<'de> for Record3Deserializer<I>
// where
// I: Iterator<Item = io::Result<Record<3>>>,
// {
// type Error = Error;

// fn next_element_seed<T>(&mut self, seed: T) -> Result<Option<T::Value>>
// where
// T: DeserializeSeed<'de>,
// {
// todo!()
// }
// }

#[cfg(test)]
mod serde_tests {
use crate::{record::Name, Record};

use super::*;

#[test]
fn test_from_string_single_auxiliar_bed_record_wrapper() {
let input = "sq0\t7\t13\n";

let record = Record::<3>::builder()
.set_reference_sequence_name("sq0")
.set_start_position(noodles_core::Position::try_from(8).unwrap())
.set_end_position(noodles_core::Position::try_from(13).unwrap())
.build()
.unwrap();

// let expected = AuxiliarBedRecordWrapper { record };
let result: AuxiliarBedRecordWrapper<Record<3>> = from_str(input).unwrap();
let result_2 = result.record;

assert_eq!(result_2, record);
}
}
Loading