Commit

Define and Implement C++ API for negative sampling (#4523)
Defines and implements the PLC/C/C++ APIs for negative sampling.

Closes #4497

Authors:
  - Chuck Hastings (https://github.com/ChuckHastings)

Approvers:
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Joseph Nke (https://github.com/jnke2016)
  - Seunghwa Kang (https://github.com/seunghwak)

URL: #4523
ChuckHastings authored Aug 21, 2024
1 parent 3db77b3 commit 97d1641
Showing 33 changed files with 2,888 additions and 173 deletions.
7 changes: 7 additions & 0 deletions cpp/CMakeLists.txt
@@ -332,6 +332,12 @@ set(CUGRAPH_SOURCES
src/sampling/neighbor_sampling_sg_v32_e64.cpp
src/sampling/neighbor_sampling_sg_v32_e32.cpp
src/sampling/neighbor_sampling_sg_v64_e64.cpp
src/sampling/negative_sampling_sg_v32_e64.cu
src/sampling/negative_sampling_sg_v32_e32.cu
src/sampling/negative_sampling_sg_v64_e64.cu
src/sampling/negative_sampling_mg_v32_e64.cu
src/sampling/negative_sampling_mg_v32_e32.cu
src/sampling/negative_sampling_mg_v64_e64.cu
src/sampling/renumber_sampled_edgelist_sg_v64_e64.cu
src/sampling/renumber_sampled_edgelist_sg_v32_e32.cu
src/sampling/sampling_post_processing_sg_v64_e64.cu
@@ -657,6 +663,7 @@ add_library(cugraph_c
src/c_api/louvain.cpp
src/c_api/triangle_count.cpp
src/c_api/neighbor_sampling.cpp
src/c_api/negative_sampling.cpp
src/c_api/labeling_result.cpp
src/c_api/weakly_connected_components.cpp
src/c_api/strongly_connected_components.cpp
4 changes: 2 additions & 2 deletions cpp/include/cugraph/graph_view.hpp
@@ -636,7 +636,7 @@ class graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu, std::enable_if
/* (edge_srcs, edge_dsts) should be pre-shuffled */
raft::device_span<vertex_t const> edge_srcs,
raft::device_span<vertex_t const> edge_dsts,
bool do_expensive_check = false);
bool do_expensive_check = false) const;

rmm::device_uvector<edge_t> compute_multiplicity(
raft::handle_t const& handle,
@@ -945,7 +945,7 @@ class graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu, std::enable_if
rmm::device_uvector<bool> has_edge(raft::handle_t const& handle,
raft::device_span<vertex_t const> edge_srcs,
raft::device_span<vertex_t const> edge_dsts,
bool do_expensive_check = false);
bool do_expensive_check = false) const;

rmm::device_uvector<edge_t> compute_multiplicity(raft::handle_t const& handle,
raft::device_span<vertex_t const> edge_srcs,
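The `const` qualifier added to `has_edge` matters for callers that only hold a `graph_view_t const&`, which is how the new `negative_sampling` declaration receives its graph view. The following is a minimal host-only sketch of that point. `toy_graph_view` and `count_existing` are hypothetical names invented for illustration and are not part of cuGraph.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a graph view; NOT the cuGraph graph_view_t.
class toy_graph_view {
 public:
  explicit toy_graph_view(std::set<std::pair<int, int>> edges) : edges_(std::move(edges)) {}

  // Declared const: the lookup does not modify the view, so it can be called
  // through a const reference. Without `const` here, the call inside
  // count_existing() below would fail to compile.
  std::vector<bool> has_edge(std::vector<int> const& srcs, std::vector<int> const& dsts) const
  {
    assert(srcs.size() == dsts.size());
    std::vector<bool> result(srcs.size());
    for (std::size_t i = 0; i < srcs.size(); ++i) {
      result[i] = edges_.count(std::make_pair(srcs[i], dsts[i])) > 0;
    }
    return result;
  }

 private:
  std::set<std::pair<int, int>> edges_;
};

// Receives the view by const reference, mirroring how negative_sampling
// receives graph_view in the declaration from this commit.
inline std::size_t count_existing(toy_graph_view const& view,
                                  std::vector<int> const& srcs,
                                  std::vector<int> const& dsts)
{
  std::size_t n = 0;
  for (bool present : view.has_edge(srcs, dsts)) { n += present ? 1 : 0; }
  return n;
}
```

With edges (0,1) and (1,2) in the toy view, querying the pairs (0,1), (1,2), (2,0) through `count_existing` reports two existing edges.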
57 changes: 57 additions & 0 deletions cpp/include/cugraph/sampling_functions.hpp
@@ -743,4 +743,61 @@ lookup_endpoints_from_edge_ids_and_types(
raft::device_span<edge_t const> edge_ids_to_lookup,
raft::device_span<edge_type_t const> edge_types_to_lookup);

/**
* @brief Negative Sampling
*
* This function generates negative samples for a graph.
*
* Negative sampling is done by generating a random graph according to the specified
* parameters and optionally removing samples that represent actual edges in the graph.
*
* Sampling occurs by creating a list of source vertex ids from biased sampling
* of the source vertex space, and destination vertex ids from biased sampling of the
* destination vertex space, and using this as the putative list of edges. We
* can then optionally remove duplicates and remove actual edges in the graph to generate
* the final list. If necessary we will repeat the process to end with a resulting
* edge list of the appropriate size.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of the bias values.
* @tparam store_transposed Flag indicating whether sources (if false) or destinations (if
* true) are major indices
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true)
*
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view Graph view object to generate negative samples for
* @param rng_state RNG state
* @param src_biases Optional bias for randomly selecting source vertices. If std::nullopt vertices
* will be selected uniformly. In multi-GPU environment the biases should be partitioned based
* on the vertex partitions.
* @param dst_biases Optional bias for randomly selecting destination vertices. If std::nullopt
* vertices will be selected uniformly. In multi-GPU environment the biases should be partitioned
* based on the vertex partitions.
* @param num_samples Number of negative samples to generate
* @param remove_duplicates If true, remove duplicate samples
* @param remove_existing_edges If true, remove samples that are actually edges in the graph
* @param exact_number_of_samples If true, repeat generation until we get the exact number of
* negative samples
* @param do_expensive_check A flag to run expensive checks for input arguments (if set to `true`).
*
* @return tuple containing source vertex ids and destination vertex ids for the negative samples
*/
template <typename vertex_t,
typename edge_t,
typename weight_t,
bool store_transposed,
bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<vertex_t>> negative_sampling(
raft::handle_t const& handle,
raft::random::RngState& rng_state,
graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu> const& graph_view,
std::optional<raft::device_span<weight_t const>> src_biases,
std::optional<raft::device_span<weight_t const>> dst_biases,
size_t num_samples,
bool remove_duplicates,
bool remove_existing_edges,
bool exact_number_of_samples,
bool do_expensive_check);

} // namespace cugraph
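The procedure described in the `negative_sampling` doc comment (biased draws of sources and destinations, optional removal of duplicates and of existing edges, repetition until enough samples remain) can be sketched on the host as follows. This is a CPU-only illustration under stated assumptions, not the cuGraph implementation: `negative_sampling_sketch`, the `edge` alias, and the `max_rounds` safety cap are hypothetical, and the graph is represented as a plain `std::set` of edges.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <random>
#include <set>
#include <utility>
#include <vector>

using edge = std::pair<int, int>;  // (source, destination); illustrative alias

// Host-only sketch of the documented procedure. All names are hypothetical.
inline std::vector<edge> negative_sampling_sketch(
  std::mt19937& rng,
  int num_vertices,
  std::set<edge> const& graph_edges,
  std::optional<std::vector<double>> const& src_biases,
  std::optional<std::vector<double>> const& dst_biases,
  std::size_t num_samples,
  bool remove_duplicates,
  bool remove_existing_edges,
  bool exact_number_of_samples,
  int max_rounds = 100)  // assumption: cap retries so a dense graph cannot loop forever
{
  assert(num_vertices > 0);
  auto const n = static_cast<std::size_t>(num_vertices);

  // No biases supplied means every vertex is equally likely.
  std::vector<double> const uniform(n, 1.0);
  auto const& sb = src_biases ? *src_biases : uniform;
  auto const& db = dst_biases ? *dst_biases : uniform;
  assert(sb.size() == n && db.size() == n);

  std::discrete_distribution<int> pick_src(sb.begin(), sb.end());
  std::discrete_distribution<int> pick_dst(db.begin(), db.end());

  std::vector<edge> result;
  std::set<edge> seen;  // only consulted when remove_duplicates is true
  for (int round = 0; round < max_rounds && result.size() < num_samples; ++round) {
    std::size_t const needed = num_samples - result.size();
    for (std::size_t i = 0; i < needed; ++i) {
      edge const e{pick_src(rng), pick_dst(rng)};
      if (remove_existing_edges && graph_edges.count(e) > 0) { continue; }
      if (remove_duplicates && !seen.insert(e).second) { continue; }
      result.push_back(e);
    }
    // Without exact_number_of_samples, generate once and keep what survives filtering.
    if (!exact_number_of_samples) { break; }
  }
  return result;
}
```

With `exact_number_of_samples` set to false, the sketch may return fewer than `num_samples` edges, which matches the description of filtering after a single round of generation.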
115 changes: 115 additions & 0 deletions cpp/include/cugraph_c/coo.h
@@ -0,0 +1,115 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include <cugraph_c/array.h>
#include <cugraph_c/graph.h>
#include <cugraph_c/random.h>
#include <cugraph_c/resource_handle.h>

#ifdef __cplusplus
extern "C" {
#endif

/**
* @brief Opaque COO definition
*/
typedef struct {
int32_t align_;
} cugraph_coo_t;

/**
* @brief Opaque COO list definition
*/
typedef struct {
int32_t align_;
} cugraph_coo_list_t;

/**
* @brief Get the source vertex ids
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of source vertex ids
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_sources(cugraph_coo_t* coo);

/**
* @brief Get the destination vertex ids
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of destination vertex ids
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_destinations(cugraph_coo_t* coo);

/**
* @brief Get the edge weights
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge weights, NULL if no edge weights in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_weights(cugraph_coo_t* coo);

/**
* @brief Get the edge id
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge id, NULL if no edge ids in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_id(cugraph_coo_t* coo);

/**
* @brief Get the edge type
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge type, NULL if no edge types in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_type(cugraph_coo_t* coo);

/**
* @brief Get the number of COO objects in the list
*
* @param [in] coo_list Opaque pointer to COO list
* @return number of elements
*/
size_t cugraph_coo_list_size(const cugraph_coo_list_t* coo_list);

/**
* @brief Get a COO from the list
*
* @param [in] coo_list Opaque pointer to COO list
* @param [in] index Index of desired COO from list
* @return a cugraph_coo_t* object from the list
*/
cugraph_coo_t* cugraph_coo_list_element(cugraph_coo_list_t* coo_list, size_t index);

/**
* @brief Free coo object
*
* @param [in] coo Opaque pointer to COO
*/
void cugraph_coo_free(cugraph_coo_t* coo);

/**
* @brief Free coo list
*
* @param [in] coo_list Opaque pointer to list of COO objects
*/
void cugraph_coo_list_free(cugraph_coo_list_t* coo_list);

#ifdef __cplusplus
}
#endif
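The `align_` structs in this header are placeholders: callers only ever hold pointers to them and go through the accessor and free functions. One common way to back such an opaque handle is to cast between the public placeholder type and a private implementation type. The sketch below illustrates that pattern only. It is an assumption about the general technique, not a description of cuGraph's internals, and every `toy_` name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Public side: an opaque placeholder, analogous in shape to cugraph_coo_t.
struct toy_coo_t {
  std::int32_t align_;
};

namespace detail {
// Private side: the real data, hidden from users of the public header.
struct toy_coo_impl {
  std::vector<std::int32_t> src;
  std::vector<std::int32_t> dst;
};
}  // namespace detail

// Creates the implementation object and hands back an opaque pointer to it.
inline toy_coo_t* toy_coo_create(std::vector<std::int32_t> src, std::vector<std::int32_t> dst)
{
  assert(src.size() == dst.size());
  auto* impl = new detail::toy_coo_impl{std::move(src), std::move(dst)};
  return reinterpret_cast<toy_coo_t*>(impl);
}

// Accessors cast back to the implementation type before touching any data;
// the placeholder member align_ is never read or written.
inline std::size_t toy_coo_num_edges(toy_coo_t const* coo)
{
  return reinterpret_cast<detail::toy_coo_impl const*>(coo)->src.size();
}

inline std::int32_t const* toy_coo_sources(toy_coo_t const* coo)
{
  return reinterpret_cast<detail::toy_coo_impl const*>(coo)->src.data();
}

// The matching free function owns destruction, like cugraph_coo_free in the header above.
inline void toy_coo_free(toy_coo_t* coo)
{
  delete reinterpret_cast<detail::toy_coo_impl*>(coo);
}
```

Moving these declarations into their own header, as this commit does with `coo.h`, lets both the generator API and the sampling API return the same handle type.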
86 changes: 1 addition & 85 deletions cpp/include/cugraph_c/graph_generators.h
@@ -17,6 +17,7 @@
#pragma once

#include <cugraph_c/array.h>
#include <cugraph_c/coo.h>
#include <cugraph_c/graph.h>
#include <cugraph_c/random.h>
#include <cugraph_c/resource_handle.h>
@@ -27,91 +28,6 @@ extern "C" {

typedef enum { POWER_LAW = 0, UNIFORM } cugraph_generator_distribution_t;

/**
* @brief Opaque COO definition
*/
typedef struct {
int32_t align_;
} cugraph_coo_t;

/**
* @brief Opaque COO list definition
*/
typedef struct {
int32_t align_;
} cugraph_coo_list_t;

/**
* @brief Get the source vertex ids
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of source vertex ids
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_sources(cugraph_coo_t* coo);

/**
* @brief Get the destination vertex ids
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of destination vertex ids
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_destinations(cugraph_coo_t* coo);

/**
* @brief Get the edge weights
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge weights, NULL if no edge weights in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_weights(cugraph_coo_t* coo);

/**
* @brief Get the edge id
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge id, NULL if no edge ids in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_id(cugraph_coo_t* coo);

/**
* @brief Get the edge type
*
* @param [in] coo Opaque pointer to COO
* @return type erased array view of edge type, NULL if no edge types in COO
*/
cugraph_type_erased_device_array_view_t* cugraph_coo_get_edge_type(cugraph_coo_t* coo);

/**
* @brief Get the number of coo object in the list
*
* @param [in] coo_list Opaque pointer to COO list
* @return number of elements
*/
size_t cugraph_coo_list_size(const cugraph_coo_list_t* coo_list);

/**
* @brief Get a COO from the list
*
* @param [in] coo_list Opaque pointer to COO list
* @param [in] index Index of desired COO from list
* @return a cugraph_coo_t* object from the list
*/
cugraph_coo_t* cugraph_coo_list_element(cugraph_coo_list_t* coo_list, size_t index);

/**
* @brief Free coo object
*
* @param [in] coo Opaque pointer to COO
*/
void cugraph_coo_free(cugraph_coo_t* coo);

/**
* @brief Free coo list
*
* @param [in] coo_list Opaque pointer to list of COO objects
*/
void cugraph_coo_list_free(cugraph_coo_list_t* coo_list);

/**
* @brief Generate RMAT edge list
*
52 changes: 52 additions & 0 deletions cpp/include/cugraph_c/sampling_algorithms.h
@@ -16,6 +16,7 @@

#pragma once

#include <cugraph_c/coo.h>
#include <cugraph_c/error.h>
#include <cugraph_c/graph.h>
#include <cugraph_c/properties.h>
@@ -674,6 +675,57 @@ cugraph_error_code_t cugraph_select_random_vertices(const cugraph_resource_handl
cugraph_type_erased_device_array_t** vertices,
cugraph_error_t** error);

/**
* @ingroup samplingC
* @brief Perform negative sampling
*
* Negative sampling generates a COO structure defining edges according to the specified parameters
*
* @param [in] handle Handle for accessing resources
* @param [in,out] rng_state State of the random number generator, updated with each
* call
* @param [in] graph Pointer to graph
* @param [in] vertices Vertex ids for the source and destination biases. If
* @p src_biases and @p dst_biases are not specified this is
* ignored. If @p vertices is specified then vertices[i] is
* the vertex id of src_biases[i] and dst_biases[i]. If
* @p vertices is not specified then i is the vertex id of
* src_biases[i] and dst_biases[i]
* @param [in] src_biases Bias for selecting source vertices. If NULL, sample
* uniformly. If provided, the probability of vertex i is
* src_biases[i] / (sum of all source biases)
* @param [in] dst_biases Bias for selecting destination vertices. If NULL, sample
* uniformly. If provided, the probability of vertex i is
* dst_biases[i] / (sum of all destination biases)
* @param [in] num_samples Number of negative samples to generate
* @param [in] remove_duplicates If true, remove duplicates from sampled edges
* @param [in] remove_existing_edges If true, remove sampled edges that actually exist in
* the graph
* @param [in] exact_number_of_samples If true, result should contain exactly @p num_samples. If
* false the code will generate @p num_samples and then do
* any filtering as specified
* @param [in] do_expensive_check A flag to run expensive checks for input arguments (if
* set to true)
* @param [out] result Opaque pointer to generated COO
* @param [out] error Pointer to an error object storing details of any error.
* Will be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_negative_sampling(
const cugraph_resource_handle_t* handle,
cugraph_rng_state_t* rng_state,
cugraph_graph_t* graph,
const cugraph_type_erased_device_array_view_t* vertices,
const cugraph_type_erased_device_array_view_t* src_biases,
const cugraph_type_erased_device_array_view_t* dst_biases,
size_t num_samples,
bool_t remove_duplicates,
bool_t remove_existing_edges,
bool_t exact_number_of_samples,
bool_t do_expensive_check,
cugraph_coo_t** result,
cugraph_error_t** error);

#ifdef __cplusplus
}
#endif
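The `vertices`, `src_biases` and `dst_biases` parameters documented for `cugraph_negative_sampling` combine into a per-vertex selection probability of bias[i] / (sum of all biases). The host-only sketch below works through that arithmetic for one bias array. `selection_probabilities` is a hypothetical helper, not part of the C API, and it assumes that a vertex missing from `vertices` has bias 0, which the doc comment does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <optional>
#include <vector>

// Returns the selection probability of each vertex, indexed by vertex id.
// Hypothetical helper for illustration; not a cuGraph function.
inline std::vector<double> selection_probabilities(
  std::size_t num_vertices,
  std::optional<std::vector<int>> const& vertices,
  std::optional<std::vector<double>> const& biases)
{
  assert(num_vertices > 0);

  // No biases: uniform sampling over all vertices.
  if (!biases) {
    return std::vector<double>(num_vertices, 1.0 / static_cast<double>(num_vertices));
  }
  if (vertices) { assert(vertices->size() == biases->size()); }

  // Assumption: vertices that are not listed get bias 0.
  std::vector<double> dense(num_vertices, 0.0);
  for (std::size_t i = 0; i < biases->size(); ++i) {
    // With a vertex list, vertices[i] names the vertex that biases[i] belongs to;
    // without one, the position i is itself the vertex id.
    std::size_t const v = vertices ? static_cast<std::size_t>((*vertices)[i]) : i;
    assert(v < num_vertices);
    dense[v] = (*biases)[i];
  }

  double const total = std::accumulate(dense.begin(), dense.end(), 0.0);
  assert(total > 0.0);
  for (auto& p : dense) { p /= total; }
  return dense;
}
```

For example, with four vertices, `vertices = {2, 0}` and biases `{3.0, 1.0}`, vertex 2 is selected with probability 0.75 and vertex 0 with probability 0.25.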