Vessel IDs present in get_event(), but absent in get_vessel_info() #181

Open
Shyentist opened this issue Dec 8, 2024 · 3 comments

@Shyentist

Good morning. I may be misreading the documentation, but I found some unexpected behavior.

When I call get_event() without a list of ids, I get many more observations than when I pass the ids returned by get_vessel_info(). Filtering get_event() by flag returns more USA vessels for a one-month period than get_vessel_info() returns with no start or end date at all.

Again, I may simply be wrong, but here is a reproducible example.


library(gfwr)

key <- Sys.getenv("GFW_TOKEN")

# this should return all the vessels with flags = 'USA'
usa_vessels <- get_vessel_info(
  where = "flag='USA'",
  search_type = "search",
  key = key
)

# and according to the README, this should return the related ids
usa_vessels_ids <- usa_vessels$selfReportedInfo$vesselId

# I can then get the events in a period of time involving those ids
usa_vessels_fishing_events_with_ids <- get_event(
  event_type = 'FISHING',
  vessels = usa_vessels_ids,
  start_date = "2023-01-01",
  end_date = "2023-02-01",
  key = key
)

# I can also get the events in the same period of time involving USA flags
usa_vessels_fishing_events_without_ids <- get_event(
  event_type = 'FISHING',
  flags = "USA",
  start_date = "2023-01-01",
  end_date = "2023-02-01",
  key = key
)

# and then extract the ids of those vessels
usa_vessels_ids_from_get_event <- usa_vessels_fishing_events_without_ids$vessel

# the two datasets retrieved are very different in number of observations
nrow(usa_vessels_fishing_events_with_ids)
nrow(usa_vessels_fishing_events_without_ids)

# and even in the number of ids
length(unique(usa_vessels_ids))
length(unique(usa_vessels_ids_from_get_event))

# unexpectedly, even just the first ship of the dataset retrieved without ids is
# not found in the list of ALL vessels with flag = 'USA', despite it having
# Name: ST. MICHAEL, Type: 'fishing', Flag: 'USA', and id: '4904902b3-3ace-eba4-cc63-6505ff890cba'
usa_vessels_ids_from_get_event[[1]][["name"]]
usa_vessels_ids_from_get_event[[1]][["type"]]
usa_vessels_ids_from_get_event[[1]][["flag"]]
usa_vessels_ids_from_get_event[[1]][["id"]]

# in fact, checking whether its id is contained in the first dataset returns false
usa_vessels_ids_from_get_event[[1]][["id"]] %in% usa_vessels_ids

Why is ST. MICHAEL's id, like many others, not present in the list of ids of all USA vessels returned by get_vessel_info()?
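To quantify the mismatch rather than check a single vessel, the ids can be compared as sets. The following is only a rough sketch: the two objects at the top are small hypothetical stand-ins for `usa_vessels_ids` and `usa_vessels_ids_from_get_event`, and it assumes each element of the `vessel` column is a list with an `id` field, as in the example above.

```r
# Hypothetical stand-ins for the objects in the example above.
ids_from_info <- c("a1", "b2", "c3")
vessels_from_events <- list(
  list(name = "VESSEL ONE", flag = "USA", id = "a1"),
  list(name = "VESSEL TWO", flag = "USA", id = "z9"),
  list(name = "VESSEL TWO", flag = "USA", id = "z9")
)

# Pull the id out of every vessel record and drop duplicates.
ids_from_events <- unique(
  vapply(vessels_from_events, function(v) v[["id"]], character(1))
)

# Ids that appear in the events but not in the vessel search.
missing_ids <- setdiff(ids_from_events, ids_from_info)
missing_ids
```

With the real objects, `length(missing_ids)` would give the number of vessels that get_event() reports but get_vessel_info() does not.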

@AndreaSanchezTapia
Member

Hi @Shyentist, thanks for reaching out. I can't see the output you are getting, so I will ask a couple of things. First, you may be using an older version of the package. If so, please reinstall it, because the get_vessel_info() search was corrected and should now return the whole vessel list (for your first query, that should be ~157000 vessels).

Second, due to a rate limit in get_event(), you can't send the whole vector of vesselId values to the function. It accepts at most 20 vesselId values per call, so you would have to loop over the vesselId vector.

Please let me know if this is the case. Once you have updated the script, we can continue from there with any questions you may have.
It would also help if you pasted the output of what you are running, so we can check whether we get the same results.
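The batching described above could be sketched roughly as follows. This is an illustration only: `chunk_ids()` and `get_events_in_batches()` are hypothetical helpers, not part of gfwr; the limit of 20 ids per call is taken from this comment; and the sketch assumes every call returns a data frame with the same columns (or NULL when there is nothing to return).

```r
# Split a vector of vessel ids into chunks of at most `size` elements.
# Hypothetical helper, not part of gfwr.
chunk_ids <- function(ids, size = 20) {
  ids <- unique(ids[!is.na(ids)])
  if (length(ids) == 0) return(list())
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# Call `fetch` once per chunk and stack the results row-wise.
# `fetch` is any function that takes a vector of ids and returns
# a data frame (or NULL); results are assumed to share columns.
get_events_in_batches <- function(ids, fetch, size = 20) {
  results <- lapply(chunk_ids(ids, size), fetch)
  results <- Filter(Negate(is.null), results)
  if (length(results) == 0) return(NULL)
  do.call(rbind, results)
}

# Possible usage with the objects from the original post (not run here):
# events <- get_events_in_batches(
#   usa_vessels_ids,
#   fetch = function(batch) get_event(
#     event_type = "FISHING",
#     vessels = batch,
#     start_date = "2023-01-01",
#     end_date = "2023-02-01",
#     key = key
#   )
# )
```

Passing the request as a function keeps the batching logic separate from the API call, so it can be tried on dummy data before sending thousands of requests.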

@zainahmadmian

Hi @AndreaSanchezTapia and @Shyentist, thank you for sharing your code. I reproduced it with my own variables and it worked fine on the first day (I got a CSV file with the required information), but now when I run it again it fails with the error HTTP 422 Unprocessable Entity. The error appears when I try to get the events for a particular time period.

Pasting my code below:

# Check/install remotes
if (!require("remotes"))
  install.packages("remotes")
remotes::install_github("GlobalFishingWatch/gfwr")

library(gfwr)
key <- gfw_auth()

# this should return all the vessels with flag = 'ESP'
esp_vessels <- get_vessel_info(
  where = "flag='ESP'",
  search_type = "search",
  key = key
)
esp_vessels_ids <- esp_vessels$selfReportedInfo$vesselId

# getting the events in a period of time involving the ids above
esp_vessels_fishing_events_with_ids <- get_event(
  event_type = 'FISHING',
  vessels = esp_vessels_ids,
  start_date = "2021-01-01",
  end_date = "2022-01-01",
  key = key
)

nrow(esp_vessels_fishing_events_with_ids)
length(unique(esp_vessels_ids))

df <- data.frame(esp_vessels_fishing_events_with_ids)
path <- "C:/Users/ZainAhmad/Downloads/espship2.csv"

df <- apply(df, 2, as.character)

@Shyentist
Author

After updating the package, I am also encountering the same error as @zainahmadmian.
