GH-38589: [C++][Gandiva] Support registering external C functions (#38632)

### Rationale for this change
This PR enhances Gandiva by supporting the registration of external C functions in its function registry, so that developers can author third-party functions with complex dependencies and expose them as C functions for use in Gandiva expressions. See GH-38589 for more details.

### What changes are included in this PR?
This PR primarily adds a new API to the `FunctionRegistry` so that developers can use it to register external C functions:
```C++
arrow::Status Register(
      NativeFunction func, void* c_function_ptr,
      std::optional<FunctionHolderMaker> function_holder_maker = std::nullopt);
```

### Are these changes tested?
* The changes are covered by unit tests in this PR. The tests include several C functions written in C++ and confirm that such functions can be used by Gandiva after being registered through the new API described above.
* Additionally, I wrote some Rust-based functions locally, integrated them into a C++ program using the new registration API, and verified that this approach works. That work is not included in this PR.

### Are there any user-facing changes?
Several new APIs are added to the `FunctionRegistry` class:
```C++
  /// \brief register a C function into the function registry
  /// @param func the registered function's metadata
  /// @param c_function_ptr the function pointer to the
  /// registered function's implementation
  /// @param function_holder_maker this will be used as the function holder if the
  /// function requires a function holder
  arrow::Status Register(
      NativeFunction func, void* c_function_ptr,
      std::optional<FunctionHolderMaker> function_holder_maker = std::nullopt);

  /// \brief get a list of C functions saved in the registry
  const std::vector<std::pair<NativeFunction, void*>>& GetCFunctions() const;

  const FunctionHolderMakerRegistry& GetFunctionHolderMakerRegistry() const;
```
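To make the shape of these APIs concrete, here is a minimal, self-contained sketch of the idea behind `Register` and `GetCFunctions`: function metadata is stored next to a type-erased `void*`, and the pointer is cast back to its real signature before it is called. `MiniFunction`, `MiniRegistry`, `multiply_by_two`, and `CallInt64Unary` are hypothetical stand-ins invented for illustration, not Gandiva types. In Gandiva itself the call is issued from JIT-compiled code rather than through a manual cast.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for gandiva::NativeFunction: only the symbol name.
struct MiniFunction {
  std::string pc_name;
};

// Hypothetical stand-in for the registry. It keeps (metadata, pointer) pairs,
// mirroring the std::vector<std::pair<NativeFunction, void*>> shown above.
class MiniRegistry {
 public:
  void Register(MiniFunction func, void* c_function_ptr) {
    c_functions_.emplace_back(std::move(func), c_function_ptr);
  }

  const std::vector<std::pair<MiniFunction, void*>>& GetCFunctions() const {
    return c_functions_;
  }

 private:
  std::vector<std::pair<MiniFunction, void*>> c_functions_;
};

// An example external function exposed with C linkage.
extern "C" int64_t multiply_by_two(int64_t x) { return x * 2; }

// Look up a function by name and call it through its real signature.
// Returns -1 when the name is not registered (illustrative convention only).
int64_t CallInt64Unary(const MiniRegistry& registry, const std::string& name,
                       int64_t arg) {
  for (const auto& [func, ptr] : registry.GetCFunctions()) {
    if (func.pc_name == name) {
      auto fn = reinterpret_cast<int64_t (*)(int64_t)>(ptr);
      return fn(arg);
    }
  }
  return -1;
}
```

Registering is then a matter of passing the metadata together with `reinterpret_cast<void*>(multiply_by_two)`. The caller is responsible for making sure the declared signature matches the function behind the pointer, since the `void*` carries no type information.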

* Closes: #38589

### Notes
* This PR is related to #38116, which adds the initial support for registering LLVM IR based external functions in Gandiva.

Authored-by: Yue Ni <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
niyue authored Nov 17, 2023
1 parent e543ee6 commit c353c81
Showing 25 changed files with 550 additions and 121 deletions.
2 changes: 2 additions & 0 deletions cpp/src/gandiva/CMakeLists.txt
@@ -62,7 +62,9 @@ set(SRC_FILES
expression_registry.cc
exported_funcs_registry.cc
exported_funcs.cc
external_c_functions.cc
filter.cc
function_holder_maker_registry.cc
function_ir_builder.cc
function_registry.cc
function_registry_arithmetic.cc
3 changes: 2 additions & 1 deletion cpp/src/gandiva/cast_time.cc
@@ -29,7 +29,7 @@

namespace gandiva {

void ExportedTimeFunctions::AddMappings(Engine* engine) const {
arrow::Status ExportedTimeFunctions::AddMappings(Engine* engine) const {
std::vector<llvm::Type*> args;
auto types = engine->types();

@@ -42,6 +42,7 @@ void ExportedTimeFunctions::AddMappings(Engine* engine) const {
engine->AddGlobalMappingForFunc("gdv_fn_time_with_zone",
types->i32_type() /*return_type*/, args,
reinterpret_cast<void*>(gdv_fn_time_with_zone));
return arrow::Status::OK();
}

} // namespace gandiva
3 changes: 2 additions & 1 deletion cpp/src/gandiva/context_helper.cc
@@ -25,7 +25,7 @@

namespace gandiva {

void ExportedContextFunctions::AddMappings(Engine* engine) const {
arrow::Status ExportedContextFunctions::AddMappings(Engine* engine) const {
std::vector<llvm::Type*> args;
auto types = engine->types();

@@ -50,6 +50,7 @@ void ExportedContextFunctions::AddMappings(Engine* engine) const {

engine->AddGlobalMappingForFunc("gdv_fn_context_arena_reset", types->void_type(), args,
reinterpret_cast<void*>(gdv_fn_context_arena_reset));
return arrow::Status::OK();
}

} // namespace gandiva
3 changes: 2 additions & 1 deletion cpp/src/gandiva/decimal_xlarge.cc
@@ -38,7 +38,7 @@

namespace gandiva {

void ExportedDecimalFunctions::AddMappings(Engine* engine) const {
arrow::Status ExportedDecimalFunctions::AddMappings(Engine* engine) const {
std::vector<llvm::Type*> args;
auto types = engine->types();

@@ -93,6 +93,7 @@ void ExportedDecimalFunctions::AddMappings(Engine* engine) const {

engine->AddGlobalMappingForFunc("gdv_xlarge_compare", types->i32_type() /*return_type*/,
args, reinterpret_cast<void*>(gdv_xlarge_compare));
return arrow::Status::OK();
}

} // namespace gandiva
8 changes: 6 additions & 2 deletions cpp/src/gandiva/engine.cc
@@ -147,7 +147,7 @@ Engine::Engine(const std::shared_ptr<Configuration>& conf,
Status Engine::Init() {
std::call_once(register_exported_funcs_flag, gandiva::RegisterExportedFuncs);
// Add mappings for global functions that can be accessed from LLVM/IR module.
AddGlobalMappings();
ARROW_RETURN_NOT_OK(AddGlobalMappings());

return Status::OK();
}
@@ -447,7 +447,11 @@ void Engine::AddGlobalMappingForFunc(const std::string& name, llvm::Type* ret_ty
execution_engine_->addGlobalMapping(fn, function_ptr);
}

void Engine::AddGlobalMappings() { ExportedFuncsRegistry::AddMappings(this); }
arrow::Status Engine::AddGlobalMappings() {
ARROW_RETURN_NOT_OK(ExportedFuncsRegistry::AddMappings(this));
ExternalCFunctions c_funcs(function_registry_);
return c_funcs.AddMappings(this);
}

std::string Engine::DumpIR() {
std::string ir;
2 changes: 1 addition & 1 deletion cpp/src/gandiva/engine.h
@@ -97,7 +97,7 @@ class GANDIVA_EXPORT Engine {
Status LoadExternalPreCompiledIR();

// Create and add mappings for cpp functions that can be accessed from LLVM.
void AddGlobalMappings();
arrow::Status AddGlobalMappings();

// Remove unused functions to reduce compile time.
Status RemoveUnusedFunctions();
26 changes: 19 additions & 7 deletions cpp/src/gandiva/exported_funcs.h
@@ -18,6 +18,7 @@
#pragma once

#include <vector>
#include "gandiva/function_registry.h"
#include "gandiva/visibility.h"

namespace gandiva {
@@ -29,37 +30,48 @@ class ExportedFuncsBase {
public:
virtual ~ExportedFuncsBase() = default;

virtual void AddMappings(Engine* engine) const = 0;
virtual arrow::Status AddMappings(Engine* engine) const = 0;
};

// Class for exporting Stub functions
class ExportedStubFunctions : public ExportedFuncsBase {
void AddMappings(Engine* engine) const override;
arrow::Status AddMappings(Engine* engine) const override;
};

// Class for exporting Context functions
class ExportedContextFunctions : public ExportedFuncsBase {
void AddMappings(Engine* engine) const override;
arrow::Status AddMappings(Engine* engine) const override;
};

// Class for exporting Time functions
class ExportedTimeFunctions : public ExportedFuncsBase {
void AddMappings(Engine* engine) const override;
arrow::Status AddMappings(Engine* engine) const override;
};

// Class for exporting Decimal functions
class ExportedDecimalFunctions : public ExportedFuncsBase {
void AddMappings(Engine* engine) const override;
arrow::Status AddMappings(Engine* engine) const override;
};

// Class for exporting String functions
class ExportedStringFunctions : public ExportedFuncsBase {
void AddMappings(Engine* engine) const override;
arrow::Status AddMappings(Engine* engine) const override;
};

// Class for exporting Hash functions
class ExportedHashFunctions : public ExportedFuncsBase {
void AddMappings(Engine* engine) const override;
arrow::Status AddMappings(Engine* engine) const override;
};

class ExternalCFunctions : public ExportedFuncsBase {
public:
explicit ExternalCFunctions(std::shared_ptr<FunctionRegistry> function_registry)
: function_registry_(std::move(function_registry)) {}

arrow::Status AddMappings(Engine* engine) const override;

private:
std::shared_ptr<FunctionRegistry> function_registry_;
};

GANDIVA_EXPORT void RegisterExportedFuncs();
5 changes: 3 additions & 2 deletions cpp/src/gandiva/exported_funcs_registry.cc
@@ -21,10 +21,11 @@

namespace gandiva {

void ExportedFuncsRegistry::AddMappings(Engine* engine) {
arrow::Status ExportedFuncsRegistry::AddMappings(Engine* engine) {
for (const auto& entry : *registered()) {
entry->AddMappings(engine);
ARROW_RETURN_NOT_OK(entry->AddMappings(engine));
}
return arrow::Status::OK();
}

const ExportedFuncsRegistry::list_type& ExportedFuncsRegistry::Registered() {
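
The change above turns `AddMappings` from a `void` function into one that stops at the first failing entry and returns its status. A minimal sketch of that early-return pattern follows, assuming a simplified status type. `MiniStatus`, `MINI_RETURN_NOT_OK`, and `AddAllMappings` are hypothetical stand-ins for `arrow::Status`, `ARROW_RETURN_NOT_OK`, and the registry loop, written only to illustrate the control flow.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for arrow::Status.
class MiniStatus {
 public:
  static MiniStatus OK() { return MiniStatus(true, ""); }
  static MiniStatus Invalid(std::string msg) {
    return MiniStatus(false, std::move(msg));
  }
  bool ok() const { return ok_; }
  const std::string& message() const { return message_; }

 private:
  MiniStatus(bool ok, std::string message)
      : ok_(ok), message_(std::move(message)) {}
  bool ok_;
  std::string message_;
};

// Hypothetical stand-in for ARROW_RETURN_NOT_OK: return early on failure.
#define MINI_RETURN_NOT_OK(expr)                     \
  do {                                               \
    MiniStatus mini_status_ = (expr);                \
    if (!mini_status_.ok()) return mini_status_;     \
  } while (false)

// Run every entry in order and stop at the first failure, so later entries
// are not executed and the caller sees the original error.
MiniStatus AddAllMappings(
    const std::vector<std::function<MiniStatus()>>& entries) {
  for (const auto& entry : entries) {
    MINI_RETURN_NOT_OK(entry());
  }
  return MiniStatus::OK();
}
```

With this shape, a failure in any one mapping surfaces to the caller instead of being dropped, which is what lets `Engine::Init` report it.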
2 changes: 1 addition & 1 deletion cpp/src/gandiva/exported_funcs_registry.h
@@ -34,7 +34,7 @@ class GANDIVA_EXPORT ExportedFuncsRegistry {
using list_type = std::vector<std::shared_ptr<ExportedFuncsBase>>;

// Add functions from all the registered classes to the engine.
static void AddMappings(Engine* engine);
static arrow::Status AddMappings(Engine* engine);

static bool Register(std::shared_ptr<ExportedFuncsBase> entry) {
registered()->emplace_back(std::move(entry));
8 changes: 5 additions & 3 deletions cpp/src/gandiva/expr_decomposer.cc
@@ -25,11 +25,12 @@

#include "gandiva/annotator.h"
#include "gandiva/dex.h"
#include "gandiva/function_holder_registry.h"
#include "gandiva/function_holder_maker_registry.h"
#include "gandiva/function_registry.h"
#include "gandiva/function_signature.h"
#include "gandiva/in_holder.h"
#include "gandiva/node.h"
#include "gandiva/regex_functions_holder.h"

namespace gandiva {

@@ -81,9 +82,10 @@ Status ExprDecomposer::Visit(const FunctionNode& in_node) {
std::shared_ptr<FunctionHolder> holder;
int holder_idx = -1;
if (native_function->NeedsFunctionHolder()) {
auto status = FunctionHolderRegistry::Make(desc->name(), node, &holder);
auto function_holder_maker_registry = registry_.GetFunctionHolderMakerRegistry();
ARROW_ASSIGN_OR_RAISE(holder,
function_holder_maker_registry.Make(desc->name(), node));
holder_idx = annotator_.AddHolderPointer(holder.get());
ARROW_RETURN_NOT_OK(status);
}

if (native_function->result_nullable_type() == kResultNullIfNull) {
79 changes: 79 additions & 0 deletions cpp/src/gandiva/external_c_functions.cc
@@ -0,0 +1,79 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License

#include <llvm/IR/Type.h>

#include "gandiva/engine.h"
#include "gandiva/exported_funcs.h"

namespace {
// calculate the number of arguments for a function signature
size_t GetNumArgs(const gandiva::FunctionSignature& sig,
const gandiva::NativeFunction& func) {
auto num_args = 0;
num_args += func.NeedsContext() ? 1 : 0;
num_args += func.NeedsFunctionHolder() ? 1 : 0;
for (auto const& arg : sig.param_types()) {
num_args += arg->id() == arrow::Type::STRING ? 2 : 1;
}
num_args += sig.ret_type()->id() == arrow::Type::STRING ? 1 : 0;
return num_args;
}

// map from a NativeFunction's signature to the corresponding LLVM signature
arrow::Result<std::pair<std::vector<llvm::Type*>, llvm::Type*>> MapToLLVMSignature(
const gandiva::FunctionSignature& sig, const gandiva::NativeFunction& func,
gandiva::LLVMTypes* types) {
std::vector<llvm::Type*> arg_llvm_types;
arg_llvm_types.reserve(GetNumArgs(sig, func));

if (func.NeedsContext()) {
arg_llvm_types.push_back(types->i64_type());
}
if (func.NeedsFunctionHolder()) {
arg_llvm_types.push_back(types->i64_type());
}
for (auto const& arg : sig.param_types()) {
arg_llvm_types.push_back(types->IRType(arg->id()));
if (arg->id() == arrow::Type::STRING) {
// string type needs an additional length argument
arg_llvm_types.push_back(types->i32_type());
}
}
if (sig.ret_type()->id() == arrow::Type::STRING) {
// for string output, the last arg is the output length
arg_llvm_types.push_back(types->i32_ptr_type());
}
auto ret_llvm_type = types->IRType(sig.ret_type()->id());
return std::make_pair(std::move(arg_llvm_types), ret_llvm_type);
}
} // namespace

namespace gandiva {
Status ExternalCFunctions::AddMappings(Engine* engine) const {
auto const& c_funcs = function_registry_->GetCFunctions();
auto const types = engine->types();
for (auto& [func, func_ptr] : c_funcs) {
for (auto const& sig : func.signatures()) {
ARROW_ASSIGN_OR_RAISE(auto llvm_signature, MapToLLVMSignature(sig, func, types));
auto& [args, ret_llvm_type] = llvm_signature;
engine->AddGlobalMappingForFunc(func.pc_name(), ret_llvm_type, args, func_ptr);
}
}
return Status::OK();
}
} // namespace gandiva
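
The mapping in `GetNumArgs` and `MapToLLVMSignature` determines the C signature an external function has to expose: an optional `i64` context, an optional `i64` holder, each string parameter followed by an `i32` length, and a trailing `i32*` output length when the return type is a string. The sketch below restates the counting rule with stand-in types and shows one function that follows the layout. `MiniType`, `CountCArgs`, and `identity_utf8` are hypothetical names, and passing string data as `const char*` is an assumption about how the string IR type maps to C.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for arrow::Type::type, reduced to two cases.
enum class MiniType { INT64, STRING };

// Mirrors the counting rule of GetNumArgs in the diff: optional context and
// holder slots, two slots per string parameter (data + length), and one
// trailing out-length slot when the return type is a string.
std::size_t CountCArgs(const std::vector<MiniType>& params, MiniType ret,
                       bool needs_context, bool needs_holder) {
  std::size_t n = 0;
  n += needs_context ? 1 : 0;
  n += needs_holder ? 1 : 0;
  for (MiniType p : params) {
    n += (p == MiniType::STRING) ? 2 : 1;
  }
  n += (ret == MiniType::STRING) ? 1 : 0;
  return n;
}

// A hypothetical external function for the signature (utf8) -> utf8 that
// needs a context. Layout: context, string data, string length, out length.
// It returns its input unchanged and reports the same length.
extern "C" const char* identity_utf8(int64_t context, const char* data,
                                     int32_t data_len, int32_t* out_len) {
  (void)context;  // unused in this sketch
  *out_len = data_len;
  return data;
}
```

For this `(utf8) -> utf8` function with a context, the rule gives 1 + 2 + 1 = 4 arguments, which matches the four parameters of `identity_utf8`.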
72 changes: 72 additions & 0 deletions cpp/src/gandiva/function_holder_maker_registry.cc
@@ -0,0 +1,72 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

#include "gandiva/function_holder_maker_registry.h"

#include <functional>

#include "arrow/util/string.h"
#include "gandiva/function_holder.h"
#include "gandiva/interval_holder.h"
#include "gandiva/random_generator_holder.h"
#include "gandiva/regex_functions_holder.h"
#include "gandiva/to_date_holder.h"

namespace gandiva {

using arrow::internal::AsciiToLower;

FunctionHolderMakerRegistry::FunctionHolderMakerRegistry()
: function_holder_makers_(DefaultHolderMakers()) {}

arrow::Status FunctionHolderMakerRegistry::Register(const std::string& name,
FunctionHolderMaker holder_maker) {
function_holder_makers_.emplace(AsciiToLower(name), std::move(holder_maker));
return arrow::Status::OK();
}

template <typename HolderType>
static arrow::Result<FunctionHolderPtr> HolderMaker(const FunctionNode& node) {
std::shared_ptr<HolderType> derived_instance;
ARROW_RETURN_NOT_OK(HolderType::Make(node, &derived_instance));
return derived_instance;
}

arrow::Result<FunctionHolderPtr> FunctionHolderMakerRegistry::Make(
const std::string& name, const FunctionNode& node) {
auto lowered_name = AsciiToLower(name);
auto found = function_holder_makers_.find(lowered_name);
if (found == function_holder_makers_.end()) {
return Status::Invalid("function holder not registered for function " + name);
}

return found->second(node);
}

FunctionHolderMakerRegistry::MakerMap FunctionHolderMakerRegistry::DefaultHolderMakers() {
static const MakerMap maker_map = {
{"like", HolderMaker<LikeHolder>},
{"to_date", HolderMaker<ToDateHolder>},
{"random", HolderMaker<RandomGeneratorHolder>},
{"rand", HolderMaker<RandomGeneratorHolder>},
{"regexp_replace", HolderMaker<ReplaceHolder>},
{"regexp_extract", HolderMaker<ExtractHolder>},
{"castintervalday", HolderMaker<IntervalDaysHolder>},
{"castintervalyear", HolderMaker<IntervalYearsHolder>}};
return maker_map;
}
} // namespace gandiva
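
The registry above keys holder makers by lowercased function name and reports an error when no maker is found. A minimal, self-contained sketch of that lookup scheme follows. `MiniHolder`, `MiniHolderMakerRegistry`, and `ToLowerAscii` are hypothetical stand-ins, the makers take no `FunctionNode`, and a null pointer replaces the `Status::Invalid` result of the real code.

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a function holder.
struct MiniHolder {
  std::string kind;
};
using MiniHolderPtr = std::shared_ptr<MiniHolder>;
using MiniHolderMaker = std::function<MiniHolderPtr()>;

// Lowercase ASCII letters so that lookups are case-insensitive.
std::string ToLowerAscii(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

class MiniHolderMakerRegistry {
 public:
  // Store the maker under the lowercased name. emplace keeps an existing
  // entry, so a second registration of the same name has no effect here.
  void Register(const std::string& name, MiniHolderMaker maker) {
    makers_.emplace(ToLowerAscii(name), std::move(maker));
  }

  // Build a holder for the named function, or return nullptr when no maker
  // is registered (the code in the diff returns Status::Invalid instead).
  MiniHolderPtr Make(const std::string& name) const {
    auto found = makers_.find(ToLowerAscii(name));
    if (found == makers_.end()) {
      return nullptr;
    }
    return found->second();
  }

 private:
  std::unordered_map<std::string, MiniHolderMaker> makers_;
};
```

Normalizing the name on both `Register` and `Make` is what makes `"LIKE"` and `"like"` resolve to the same maker.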