Transpiler rewrite #14

Draft
wants to merge 26 commits into
base: main
26 commits
30f78a3
Transpiler: Add initial.
ceyhunsen Mar 29, 2024
8293e65
Riscv_code_generator: Add bunch of instructions.
ceyhunsen Mar 29, 2024
6b97104
Riscv_code_generator: Add rest of the instructions.
ceyhunsen Apr 1, 2024
a4defd3
riscv_code_generator: Add initial tests.
ceyhunsen Apr 1, 2024
06cc6b6
riscv_code_generator: Add tests for generate_code().
ceyhunsen Apr 1, 2024
ec809d6
riscv_code_generator: Fix infinite loop for SW instruction.
ceyhunsen Apr 1, 2024
43f38b7
output: Add JSON file creator.
ceyhunsen Apr 1, 2024
db7dc75
riscv_code_generator: Reorganize tests and add new integer tests.
ceyhunsen Apr 1, 2024
6ede142
riscv_code_generator: Pass binary information to code generator.
ceyhunsen Apr 1, 2024
42fc8cc
riscv_code_generator: Update PC relative instructions.
ceyhunsen Apr 2, 2024
2e89028
riscv_code_generator: Add data page to output.
ceyhunsen Apr 2, 2024
c534b2e
elf_parser: Save every loadable section to memory.
ceyhunsen Apr 3, 2024
c112dae
memory: Add memory generator for data sections.
ceyhunsen Apr 3, 2024
18500bf
json_out: Don't use pretty, it wastes space. Instead, use jq after if…
ceyhunsen Apr 3, 2024
ba6cdcb
riscv_code_generator: Add comments and fix wrong instructions.
ceyhunsen Apr 4, 2024
e9a4752
code_generators: Add unresolved BitVM instruction type.
ceyhunsen Apr 4, 2024
4e04be9
resolver: Add initial resolver.
ceyhunsen Apr 5, 2024
2c8f4ba
comments: Update program flow.
ceyhunsen Apr 5, 2024
a3a10b6
riscv_code_generator: Add labels to relative jumps.
ceyhunsen Apr 5, 2024
65c993b
resolver: Add program end label resolver.
ceyhunsen Apr 5, 2024
633d8b1
riscv_decoder: Add add* tests.
ceyhunsen Apr 5, 2024
7872076
riscv_decoder: Add beq and jal tests.
ceyhunsen Apr 8, 2024
ff49786
riscv_code_generator: Add tests again.
ceyhunsen Apr 9, 2024
a547190
riscv_code_generator: Uncomment rest of the tests.
ceyhunsen Apr 9, 2024
495d9a2
riscv_code_generator: Add jal tests.
ceyhunsen Apr 9, 2024
5678490
riscv_code_generator: Add more tests.
ceyhunsen Apr 9, 2024
3 changes: 2 additions & 1 deletion .vscode/settings.json
@@ -2,6 +2,7 @@
"rust-analyzer.linkedProjects": [
"./Cargo.toml",
"./examples/bitcoin-pow/methods/guest/Cargo.toml",
-"./examples/bitcoin-pow/Cargo.toml"
+"./examples/bitcoin-pow/Cargo.toml",
+"./bitvm-transpiler/Cargo.toml"
]
}
3 changes: 3 additions & 0 deletions bitvm-transpiler/.gitignore
@@ -0,0 +1,3 @@
/target
**target/
.vscode
12 changes: 12 additions & 0 deletions bitvm-transpiler/Cargo.toml
@@ -0,0 +1,12 @@
[package]
name = "bitvm-transpiler"
description = "Binary to BitVM assembly transpiler"
version = "0.1.0"
edition = "2021"

[dependencies]
clap = { version = "4.5.3", features = ["derive"] }
elf = "0.7.4"
file-format = "0.24.0"
serde = { version = "1.0.197", features = ["derive"] }
serde_json = "1.0.115"
617 changes: 617 additions & 0 deletions bitvm-transpiler/LICENSE

Large diffs are not rendered by default.

33 changes: 33 additions & 0 deletions bitvm-transpiler/README.md
@@ -0,0 +1,33 @@
# BitVM Transpiler

This is a **WIP** transpiler for converting other assembly languages to BitVM
assembly. It is designed to accept different input file formats and ISAs.

## Running the Transpiler

```bash
cargo run --release $INPUTFILE
```

## Accepted Input File Formats

Currently, this transpiler can only read and parse binaries in the Executable
and Linkable Format (ELF).
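
As a rough illustration of what an ELF check involves, the sketch below tests
for the four-byte ELF magic number using only the standard library. This is not
how the transpiler performs the check (it relies on the `file-format` crate);
`ELF_MAGIC` and `looks_like_elf` are hypothetical names used only for this
example.

```rust
/// First four bytes of every ELF file: 0x7F followed by "ELF".
/// (Hypothetical constant, not part of the transpiler.)
const ELF_MAGIC: [u8; 4] = [0x7F, b'E', b'L', b'F'];

/// Returns true when `bytes` starts with the ELF magic number.
/// (Hypothetical helper, not part of the transpiler.)
fn looks_like_elf(bytes: &[u8]) -> bool {
    bytes.starts_with(&ELF_MAGIC)
}

fn main() {
    // A truncated ELF header and a shell script, as sample inputs.
    let elf_header = [0x7F, b'E', b'L', b'F', 1, 1, 1, 0];
    let shell_script = b"#!/bin/sh\n";

    println!("ELF header accepted: {}", looks_like_elf(&elf_header));
    println!("shell script accepted: {}", looks_like_elf(shell_script));
}
```

A magic-number check only shows that a file claims to be ELF. The section
headers still have to be parsed before any code can be extracted.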

## Supported Instruction Set Architectures and Extensions

Currently, BitVM tries to emulate the rv32i instruction set, but this
transpiler can support more instruction sets or extensions as input. Supported
ISAs reside in `src/*_parser`.

* rv32i + m extension
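
The sketch below shows one way the input ISA could be identified from a raw ELF
header. It assumes only the standard ELF layout: `EI_CLASS` at byte 4, `EI_DATA`
at byte 5, `e_machine` at bytes 18 and 19, and `EM_RISCV` equal to `0xF3`. The
transpiler itself reads these fields through the `elf` crate; `detect_isa` and
the `Isa` enum here are hypothetical and exist only for this example.

```rust
/// ISAs this example can recognize (hypothetical, mirrors the idea only).
#[derive(Debug, PartialEq)]
enum Isa {
    Riscv32,
    Riscv64,
}

/// ELF machine number assigned to RISC-V.
const EM_RISCV: u16 = 0xF3;

/// Inspects the first 20 bytes of an ELF header and returns the ISA, or
/// `None` if the header is too short, not ELF, or not RISC-V.
fn detect_isa(header: &[u8]) -> Option<Isa> {
    if header.len() < 20 || !header.starts_with(&[0x7F, b'E', b'L', b'F']) {
        return None;
    }

    // EI_DATA (byte 5) gives the byte order of multi-byte header fields.
    let raw_machine = [header[18], header[19]];
    let machine = match header[5] {
        1 => u16::from_le_bytes(raw_machine),
        2 => u16::from_be_bytes(raw_machine),
        _ => return None,
    };
    if machine != EM_RISCV {
        return None;
    }

    // EI_CLASS (byte 4) distinguishes 32-bit from 64-bit objects.
    match header[4] {
        1 => Some(Isa::Riscv32),
        2 => Some(Isa::Riscv64),
        _ => None,
    }
}

fn main() {
    // Build a minimal little-endian, 32-bit RISC-V header prefix.
    let mut header = [0u8; 20];
    header[..4].copy_from_slice(&[0x7F, b'E', b'L', b'F']);
    header[4] = 1; // ELFCLASS32
    header[5] = 1; // ELFDATA2LSB
    header[18..20].copy_from_slice(&EM_RISCV.to_le_bytes());

    println!("{:?}", detect_isa(&header)); // prints "Some(Riscv32)"
}
```

The header alone does not reveal which extensions (such as m) the code uses, so
extension support still has to be handled when decoding instructions.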

## References

* Carsten Munk's rv32i to BitVM transpiler - https://github.com/zippiehq/rv32i-to-bitvm/.

## License

This project is licensed under the GNU General Public License v3.0 - see the
[LICENSE](LICENSE) file for details. By using, distributing, or contributing to
this software, you agree to the terms and conditions of the GPLv3.
107 changes: 107 additions & 0 deletions bitvm-transpiler/src/bitvm.rs
@@ -0,0 +1,107 @@
// bitvm-transpiler - Convert other assemblies to BitVM assembly.
// Copyright (C) 2024 Chainway Labs
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program. If not, see <https://www.gnu.org/licenses/>.

//! BitVM machine description. Parsers must target these descriptions. Most of
//! the definitions here are copied from the official BitVM implementation.

use serde::{Deserialize, Serialize};

pub const ASM_ADD: u8 = 1;
pub const ASM_SUB: u8 = 2;
pub const _ASM_MUL: u8 = 3;
pub const ASM_AND: u8 = 4;
pub const ASM_OR: u8 = 5;
pub const ASM_XOR: u8 = 6;
pub const ASM_ADDI: u8 = 7;
pub const ASM_SUBI: u8 = 8;
pub const ASM_ANDI: u8 = 9;
pub const ASM_ORI: u8 = 10;
pub const ASM_XORI: u8 = 11;
pub const ASM_JMP: u8 = 12;
pub const ASM_BEQ: u8 = 13;
pub const ASM_BNE: u8 = 14;
pub const ASM_RSHIFT1: u8 = 15;
pub const ASM_SLTU: u8 = 16;
pub const ASM_SLT: u8 = 17;
pub const _ASM_SYSCALL: u8 = 18;
pub const ASM_LOAD: u8 = 19;
pub const ASM_STORE: u8 = 20;

#[derive(Copy, Clone, Debug, PartialEq, Serialize, Deserialize)]
pub struct Instruction {
pub asm_type: u8,
pub address_a: u32,
pub address_b: u32,
pub address_c: u32,
}

/// Possible labels for a RISC-V instruction.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Labels {
None,
/// Represents program counter of the real RISC-V instruction.
Pc(u64),
/// Jump to an address relative to the current pc. This helps when creating
/// loops.
Relative(i64),
/// Can be used to exit program early.
ProgramEnd,
}

/// This struct will be used by the resolver to generate final BitVM assembly
/// instructions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct UnresolvedInstruction {
pub instr: Instruction,
/// Metadata for the resolver. If set to anything other than `Labels::None`,
/// the resolver will resolve it.
pub label: Labels,
/// Real program counter in the actual binary.
pub real_pc: u64,
/// After code generation, this will be updated.
pub bitvm_pc: u64,
}

impl UnresolvedInstruction {
/// Creates a new unresolved instruction.
pub fn _new() -> Self {
Self {
instr: Instruction {
asm_type: 0,
address_a: 0,
address_b: 0,
address_c: 0,
},
real_pc: 0,
bitvm_pc: 0,
label: Labels::None,
}
}

/// Creates an unresolved instruction, using a BitVM instruction.
pub fn from(instr: Instruction) -> Self {
Self {
instr,
real_pc: 0,
bitvm_pc: 0,
label: Labels::None,
}
}

/// Returns the generated BitVM instruction, without actually resolving it.
pub fn _lazy_resolve(&self) -> Instruction {
self.instr
}
}
62 changes: 62 additions & 0 deletions bitvm-transpiler/src/cli.rs
@@ -0,0 +1,62 @@
// bitvm-transpiler - Convert other assemblies to BitVM assembly.
// Copyright (C) 2024 Chainway Labs
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program. If not, see <https://www.gnu.org/licenses/>.

//! This module parses CLI arguments and options.

use clap::Parser;
use file_format::FileFormat;
use std::path::PathBuf;

/// Command line arguments and flags. Clap will handle these.
#[derive(Parser, Debug)]
#[command(version, about)]
struct Args {
/// Input file to be converted to BitVM assembly.
input: Option<PathBuf>,
}

/// Main function to parse all command line inputs.
pub fn parse_cli() -> PathBuf {
let cli = Args::parse();
let mut input_file = PathBuf::new();

match cli.input.as_deref() {
Some(file) => input_file = file.to_owned(),
None => fatal_error("Input file not given."),
}

check_file_format(&input_file);

input_file
}

/// Checks if file format is supported. Exits program in case of any
/// incompatibility.
fn check_file_format(input_file: &std::path::Path) {
let fmt = FileFormat::from_file(input_file);

match fmt {
Ok(FileFormat::ExecutableAndLinkableFormat) => (),
Err(e) => fatal_error(e.to_string().as_str()),
_ => fatal_error("Input file format not supported."),
}
}

/// Panics with a simple error message in case of any error while parsing CLI
/// arguments. The panic terminates the program with a non-zero exit code.
fn fatal_error(error: &str) {
panic!("Fatal error while parsing CLI arguments: {}", error);
}
136 changes: 136 additions & 0 deletions bitvm-transpiler/src/elf_parser.rs
@@ -0,0 +1,136 @@
// bitvm-transpiler - Convert other assemblies to BitVM assembly.
// Copyright (C) 2024 Chainway Labs
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program. If not, see <https://www.gnu.org/licenses/>.

//! # Executable and Linkable Format Parser
//!
//! This module reads the contents of an ELF input binary and keeps the
//! relevant parts in memory.
//!
//! We only need the program code and its data from an ELF file. Needed
//! sections include:
//! * .text
//! * .data
//!
//! Note: This parser depends heavily on the third-party `elf` crate.

use elf::endian::AnyEndian;
use elf::file::Class;
use elf::section::SectionHeader;
use elf::ElfStream;
use std::path::PathBuf;

/// Parsed binary file content, which is ready for transpiling.
#[derive(Clone, Debug)]
pub struct InputBinary {
pub isa: Isa,
pub code_page: Vec<u8>,
pub code_addr: u64,
pub data_page: Vec<Vec<u8>>,
pub data_addr: Vec<u64>,
}

impl InputBinary {
pub fn new() -> Self {
Self {
isa: Isa::None,
code_page: vec![],
code_addr: 0,
data_page: vec![],
data_addr: vec![],
}
}
}

/// Supported ISAs.
#[derive(Clone, Copy, Debug)]
pub enum Isa {
None,
Riscv32,
Riscv64,
}

/// This struct holds a summary of the input binary. It is a subset of
/// `InputBinary`, so it is initialized from an `InputBinary`.
#[derive(Clone, Copy)]
pub struct BinaryInfo {
pub isa: Isa,
pub pc: u64,
}

impl BinaryInfo {
pub fn from(input_binary: InputBinary) -> Self {
Self {
isa: input_binary.isa,
pc: input_binary.code_addr,
}
}
}

/// Reads file and section headers. Returns ELF target machine information and
/// actual code.
pub fn read_elf_file(input_file: PathBuf) -> InputBinary {
let mut binary = InputBinary::new();
// `input_file` is already a `PathBuf`, so it can be opened directly.
let io = std::fs::File::open(&input_file).expect("Could not open ELF file.");
let mut file =
ElfStream::<AnyEndian, _>::open_stream(io).expect("Could not open ELF file as a stream.");

binary.isa = match file.ehdr.e_machine {
// 0xF3 is EM_RISCV, the ELF machine number assigned to RISC-V.
0xF3 => match file.ehdr.class {
Class::ELF32 => Isa::Riscv32,
Class::ELF64 => Isa::Riscv64,
},
_ => panic!("Unsupported ISA!"),
};

let text_section = get_a_section(&mut file, ".text");
binary.code_page = text_section.0.to_vec();
binary.code_addr = text_section.1;

get_data_sections(&mut file, &mut binary);

binary
}

/// Gets a section's raw data and its address (`sh_addr`).
fn get_a_section<'a, T>(file: &'a mut ElfStream<AnyEndian, T>, section: &str) -> (&'a [u8], u64)
where
T: std::io::Read + std::io::Seek,
{
let shdr: SectionHeader = *file
.section_header_by_name(section)
.expect("section table should be parsable")
// Build the panic message only when the section is actually missing.
.unwrap_or_else(|| panic!("file should have a {} section", section));

(file.section_data(&shdr).unwrap().0, shdr.sh_addr)
}

/// Gets multiple loadable data sections.
fn get_data_sections<'a, T>(file: &'a mut ElfStream<AnyEndian, T>, binary: &mut InputBinary)
where
T: std::io::Read + std::io::Seek,
{
// Clone the section headers so that `file` can be borrowed mutably below.
let shdr = file.section_headers().clone();

for hdr in shdr {
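// Keep every SHT_PROGBITS section (sh_type == 1) that has a non-zero address
// and is not the code section itself.
if hdr.sh_addr != binary.code_addr && hdr.sh_type == 1 && hdr.sh_addr != 0 {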
binary.data_addr.push(hdr.sh_addr);
binary
.data_page
.push(file.section_data(&hdr).unwrap().0.to_vec());
}
}
}