Proof of Concept DNA storage
======================
SNA is a DNA digital storage encoder/decoder that converts base 2 (binary) data to base 4 (DNA)
The 4ROLL method encodes digital information in a biological format
Each pair of bits maps to one nucleotide, and the encoder 'rolls' through the table below to change the binary-to-DNA order
00 = A -> 00 = C ...
01 = C -> 01 = G ...
10 = G -> 10 = T ...
11 = T -> 11 = A ...
1 byte = 4 bases
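
The table lookup above can be sketched in C as follows. This is a minimal illustration, not the code from 4ROLL.c: the names pair_to_base and base_to_pair and the roll parameter are hypothetical, and the rule that decides when the table rolls is left out because it is not described here.

```c
#include <assert.h>

/* Unrolled table order from the README: 00=A, 01=C, 10=G, 11=T. */
static const char BASES[4] = { 'A', 'C', 'G', 'T' };

/* Look up the base for a 2-bit value after the table has rolled `roll` steps.
 * ASSUMPTION: one roll shifts every entry by one position, as in the table. */
char pair_to_base(unsigned pair, unsigned roll)
{
    assert(pair < 4u);
    return BASES[(pair + roll) & 3u];
}

/* Inverse lookup: the 2-bit value for `base` under the same roll,
 * or -1 if `base` is not one of A, C, G, T. */
int base_to_pair(char base, unsigned roll)
{
    for (unsigned i = 0; i < 4u; i++) {
        if (BASES[i] == base)
            return (int)((i + 4u - (roll & 3u)) & 3u);
    }
    return -1;
}
```

With roll 0 the lookup gives the left-hand column (00 = A), and with roll 1 the right-hand column (00 = C, 11 = A).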
Uses hardcoded values. Focus of code is a PoC, not usability or arguments.
(Willing to increase functionality if proven useful)
4ROLL.c reads the hardcoded file /tmp/test and encodes it
encoded data is printed to stdout
4ROLLdec.c reads the hardcoded file /tmp/test.dec and decodes it
decoded data is automatically written to /tmp/output
/tmp/output will be overwritten
Gentoo live ISO pulled from https://gentoo.org/downloads
MLK mp3 pulled from EBI https://www.ebi.ac.uk/
https://www.ebi.ac.uk/goldman-srv/DNA-storage/orig_files/
===========
COMPILE
===========
gcc -o 4ROLL 4ROLL.c
gcc -o 4ROLLdec 4ROLLdec.c
===========
RUNNING
===========
./4ROLL > /path/to/output.extension
./4ROLLdec (always writes to /tmp/output; change the path in the code on Windows, or use a symlink on *nix-based systems)
======================
GENTOO ISO RUN & STATS
======================
The Gentoo live ISO is about 1.4GB
Encoded into DNA it is about 5.9GB
$ date; ./4ROLL > gentoo.dna;date
Sat Mar 17 09:00:45 MST 2018
Sat Mar 17 09:02:15 MST 2018
$ date; ./4ROLLdec ; date
Sat Mar 17 09:03:12 MST 2018
outputting file to /tmp/output with size 1490615928
had 270563811 junk nucleotides
Sat Mar 17 09:04:40 MST 2018
bytes
1422974976 - gen2.new (gentoo_live.iso decoded)
5962463715 - gentoo.dna
1422974976 - gentoo_live.iso
encoded / decoded
5962463715 / 1422974976
4.190139542552293 x original size
junk / total size
270563811 / 5962463715
0.04537785451328319 (4.5% junk)
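
The figures above can be cross-checked with a little arithmetic: since 1 byte = 4 bases, everything in the encoded file beyond four nucleotides per input byte should be junk. The sketch below assumes the .dna file holds one nucleotide per byte and nothing else; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Nucleotides that carry data: four per input byte (1 byte = 4 bases). */
uint64_t data_nucleotides(uint64_t input_bytes)
{
    return input_bytes * 4u;
}

/* Junk nucleotides implied by the encoded size and the original size. */
uint64_t implied_junk(uint64_t encoded, uint64_t input_bytes)
{
    assert(encoded >= data_nucleotides(input_bytes));
    return encoded - data_nucleotides(input_bytes);
}

/* Encoded size as a multiple of the original size. */
double expansion_ratio(uint64_t encoded, uint64_t input_bytes)
{
    return (double)encoded / (double)input_bytes;
}

/* Share of the encoded file that is junk. */
double junk_fraction(uint64_t junk, uint64_t encoded)
{
    return (double)junk / (double)encoded;
}
```

For the Gentoo run, implied_junk(5962463715, 1422974976) gives 270563811, which matches the junk count reported by the decoder.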
===========
MLK STATS
===========
bytes
168539 - MLK_excerpt_VBR_45-85.mp3
705890 - new (MLK mp3 encoded)
outputting file to /tmp/output with size 176472
had 31734 junk nucleotides
encoded / decoded
705890 / 168539
4.188288764024944 x original size
junk / total size
31734 / 705890
0.044956012976526086 (4.5% junk)
===========
PROS
===========
-fast
-no libraries or fancy gadgets
-can avoid homopolymer repeats longer than a defined length
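
One way to check the homopolymer claim on an encoded file is to measure the longest run of identical nucleotides. The helper below is a hypothetical checker, not part of 4ROLL.c, and the run length the encoder actually enforces is not stated here.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the longest run of identical characters in seq[0..len). */
size_t longest_homopolymer(const char *seq, size_t len)
{
    size_t best = 0;
    size_t run = 0;

    assert(seq != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        /* Extend the current run, or start a new one on a change of base. */
        run = (i > 0 && seq[i] == seq[i - 1]) ? run + 1 : 1;
        if (run > best)
            best = run;
    }
    return best;
}
```

For example, longest_homopolymer("AACCCGT", 7) returns 3 because of the CCC run.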
===========
CONS
===========
-not fully tested
-uses 'garbage' values to compensate for homopolymer repeats (could these double as parity bits for data integrity?)
-a single nucleotide error means the decoded data may not be reliable (possibly 'statically roll' the table instead of 'rolling' based on nucleotides? using a data value outside the DNA could also allow for multi-threading)
-proof of concept quality (e.g. the code has little error checking and will crash if there is not enough memory to encode the file you're using)
-may have a size error in decoding (see code: bs_offset is not properly checked against the size)
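
The 'statically roll' idea mentioned in the cons can be sketched as below. This is only an illustration of that proposal under stated assumptions, not the current behaviour of 4ROLL.c: the roll is assumed to depend only on a base's position in the stream, bit pairs are taken most significant first, and junk nucleotides are not handled. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const char ORDER[4] = { 'A', 'C', 'G', 'T' };

/* ASSUMPTION: the roll is a function of the base position only. */
unsigned static_roll(size_t base_index)
{
    return (unsigned)(base_index & 3u);
}

/* Encode bytes [first, first + count) of `data` into dna[4 * first ...]. */
void encode_range(const uint8_t *data, size_t first, size_t count, char *dna)
{
    assert(data != NULL && dna != NULL);
    for (size_t b = first; b < first + count; b++) {
        for (size_t k = 0; k < 4; k++) {
            size_t pos = 4 * b + k;
            unsigned pair = (unsigned)(data[b] >> (6 - 2 * k)) & 3u;
            dna[pos] = ORDER[(pair + static_roll(pos)) & 3u];
        }
    }
}

/* Decode the same byte range without looking at any earlier base.
 * Returns 0 on success, -1 if a character is not A, C, G or T. */
int decode_range(const char *dna, size_t first, size_t count, uint8_t *data)
{
    assert(dna != NULL && data != NULL);
    for (size_t b = first; b < first + count; b++) {
        uint8_t byte = 0;
        for (size_t k = 0; k < 4; k++) {
            size_t pos = 4 * b + k;
            unsigned idx = 0;
            while (idx < 4u && ORDER[idx] != dna[pos])
                idx++;
            if (idx == 4u)
                return -1;
            /* Undo the position-dependent roll to recover the bit pair. */
            byte = (uint8_t)((byte << 2) | ((idx + 4u - static_roll(pos)) & 3u));
        }
        data[b] = byte;
    }
    return 0;
}
```

Because each range decodes on its own, separate threads could work on separate ranges, and a damaged nucleotide would only affect the byte that contains it.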