Hika van den Hoven edited this page May 5, 2017 · 21 revisions

Creating the Grabber datafile

Some names

sourceid

A unique integer identifying a source. Sources are defined under the "sources" keyword.

channelid

A string, unique within the source, identifying a channel on that source. Normally channelids are extracted from the source.

chanid

A string, unique within the grabber, identifying a channel for the grabber. Channelids are linked to a chanid under the "source_channels" keyword in your grabber datafile. If no chanid is defined for a channelid, it defaults to the sourceid and the channelid joined with a hyphen. It is the basis of the xmltvid.

groupid

In the configuration file, channels can be grouped. The grouping can be whatever you want, such as national/regional/foreign, tv/radio or news/sports/movie. The groups are defined under the "channel_groups" keyword. Often a source groups its own channels, and that can serve as the basis. On the basis of a group you can:

logoid

The common part of a logo URL is stored under the "logo_provider" keyword, which assigns it an id. tv_grabAPI.json already contains a list, but you can also add logo providers in your grabber datafile. For your own providers, use ids starting at "100".

The basics

Before you go on to create a sourcefile you need a basic grabber_datafile containing the keywords listed under Required. Some of them ("sources", "source_channels", "active_sources") you fill for now with an empty list or dict; they get filled after creating the first source. Others are simple; here you see them with the Dutch version:

	"data_version":0,
	"program_version":[1,0,0],
	"fetch-timezone":"Europe/Amsterdam",
	"language":"nl",
	"language_texts":{
		"and others":"e\\.a\\.",
		"and":"en",
		"gap text":"Programmainfo en Reclame"},
	"channel_groups":{
		"0":"Active Channels",
		"99":"Other"},
	"cattrans_unknown":"Overige",
	"unknown_program_title":"Geen Programmagegevens Beschikbaar",
	"source-url":"https://raw.githubusercontent.com/tvgrabbers/sourcematching/master/sources",
    "sources":{},
    "source_channels":{},
    "active_sources":[],

You set a timezone and language, or leave them at the defaults "UTC" and "en", and you give your language's version of the other texts. This leaves two more keywords: "roletrans" and "cattrans". These are default translation tables that the user can adapt in tv_grab_xx_py.set.
You save the file as tv_grab_xx.json, where xx is replaced by your country code, for example nl, gb or de.

Testing

You can run tv_grab_test_json.py tv_grab_xx.json to test the syntax. It first tests for JSON errors such as a missing comma. Next it tries to recognize all the keywords and checks that values and keywords are of the right type. It also checks, for instance, whether a used sourceid is defined under the "sources" keyword. Keywords are separated into required, suggested and optional, and by whether they have a default value. By default, optional keys that are not present are not listed in the report. In many situations several syntaxes are possible; when a valid syntax is found, this is by default not listed either. To get a full report, add -1 as a second option after the filename and, as the full report can be quite long, redirect it to a file: tv_grab_test_json.py tv_grab_xx.json -1 > testreport.txt.
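
The first stage described above, catching plain JSON errors, can be approximated with Python's standard json module. This is only a sketch of that idea and not the code of tv_grab_test_json.py; the function name check_json_syntax is hypothetical.

```python
import json


def check_json_syntax(text):
    """Return None if text parses as JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno, colno and msg are attributes of json.JSONDecodeError
        return "line %d, column %d: %s" % (err.lineno, err.colno, err.msg)
    return None


good = '{"language": "nl", "sources": {}}'
bad = '{"language": "nl" "sources": {}}'  # comma missing between the two keys

print(check_json_syntax(good))
print(check_json_syntax(bad))
```

A check like this only covers syntax; the keyword and type checks of the real test script go well beyond it.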

roletrans

"roletrans" you can leave basic to start with, adding values when you encounter them or you can already add some common translations in your language to the lists:

	"roletrans":{
		"director": ["director"],
		"actor": ["actor"],
		"guest": ["guest"],
		"writer": ["writer"],
		"composer": ["composer"],
		"presenter": ["presenter"],
		"reporter": ["reporter"],
		"commentator": ["commentator"],
		"adapter":[ "adapter"],
		"producer": ["producer"],
		"editor": ["editor"]},

cattrans

For "cattrans" you have to do some thinking. The process uses a two-step translation, with the second step being optional for the user. Both this table (for the second step) and the per-source table in every sourcefile (for the first step) provide defaults for setting up the user-editable tables in tv_grab_xx_py.set, so they do not need to be complete. The tables in tv_grab_xx_py.set get updated when new genre/subgenre sets are encountered, using "cattrans_unknown" for new, unknown genres found in the first step.
In general the cattrans tables in the source should translate source-specific genre/subgenre sets to a common set of genres, either taking the subgenre from the source or also translating to common subgenres. The second step is then meant to translate to single genres that are understood by the program the user runs. My set translates to English genres understood by MythTV. The lists contain genre/subgenre sets or single genres that are translated to the keyword they are listed under. An empty subgenre means any subgenre will do. The last list, ["", ""] under "Unknown", ensures that any undefined value gets translated to "Unknown".

	"cattrans": {
		"News":[
			["nieuws/actualiteiten", ""],
			["news", ""],
			["info", ""]],
		"Sports":[
			["sport", ""]],
		"Documentary":[
			["documentaire", ""],
			["info", "documentary"],
			["informatief", "documentaire"]],
		"Art/Music":[
			["muziek", ""],
			["magazine", "muziek"],
			["amusement", "muziekshow"],
			["amusement", "muziekprogramma"],
			["amusement", "dansprogramma"],
			["amusement", "variete"]],
		"Arts/Culture":[
			["amusement", "cabaret"],
			["amusement", "sketches"],
			["amusement", "stand-up comedy"],
			["amusement", "stand-up comedy, sketches"],
			["informatief, kunst en cultuur", ""],
			["kunst/cultuur", ""],
			["kunst en cultuur", ""]],
		"Talk":[
			["amusement", ""],
			["informatief, amusement", ""],
			["informatief", "praatprogramma"],
			["magazine", ""],
			["talks", ""],
			["talkshow", ""]],
		"Game":[
			["amusement", "quiz"],
			["amusement", "spelshow"]],
		"Children":[
			["jeugd", ""],
			["informatief", "jeugdprogramma"],
			["serie/soap", "jeugdserie"],
			["serie/soap", "animatieserie"],
			["serie/soap", "tekenfilmserie"]],
		"Educational":[
			["educatief", ""],
			["informatief", ""]],
		"Home/How-to":[
			["amusement", "klusprogramma"],
			["amusement", "hobbyprogramma"],
			["amusement", "lifestyleprogramma"],
			["amusement", "modeprogramma"]],
		"Cooking":[
			["magazine", "culinair"],
			["amusement", "kookprogramma"],
			["informatief, amusement", "kookprogramma"]],
		"Health":[
			["magazine", "medisch"],
			["informatief", "gezondheid"],
			["informatief", "fitnessprogramma"],
			["informatief", "gymnastiekprogramma"],
			["informatief", "medisch programma"],
			["informatief", "medisch praatprogramma"]],
		"Science/Nature":[
			["wetenschap", ""],
			["natuur", ""],
			["magazine", "natuur"],
			["magazine", "weten"],
			["info", "science"],
			["informatief, wetenschap", ""],
			["informatief", "wetenschappelijk programma"],
			["informatief", "techniek"]],
		"Bus./financial":[
			["magazine", "zaken"],
			["info", "business"]],
		"Religion":[
			["religieus", ""]],
		"Adult":[
			["amusement", "erotisch programma"]],
		"Film":[
			["film", ""],
			["korte film", ""]],
		"Reality":[
			["amusement", "realityserie"],
			["informatief", "docusoap"],
			["informatief", "realityprogramma"],
			["informatief", "realityserie"]],
		"Drama":[
			["serie/soap", ""],
			["serie/soap", "dramaserie"]],
		"Soap":[
			["serie/soap", "soap"]],
		"Comedy":[
			["amusement", "komedie"],
			["serie/soap", "comedyserie"],
			["serie/soap", "komedieserie"]],
		"Crime/Mystery":[
			["serie/soap", "detectiveserie"],
			["serie/soap", "misdaadserie"]],
		"Sci-fi/Fantasy":[
			["serie/soap", "sciencefictionserie"],
			["serie/soap", "fantasyserie"]],
		"Action":[
			["serie/soap", "actieserie"]],
		"Unknown":[
			["overige", ""],
			["", ""]]},
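
To make the matching rules concrete, here is a minimal Python sketch of a lookup against a table of this shape. It is not the grabber's own code. The function name translate_genre is hypothetical, and the precedence (exact genre/subgenre pair first, then the genre with an empty subgenre, then the ["", ""] catch-all) is an assumption chosen so that ["info", "documentary"] is not shadowed by ["info", ""].

```python
def translate_genre(cattrans, genre, subgenre=""):
    """Return the keyword whose list matches genre/subgenre, or None."""
    # Assumed precedence: exact pair, then "any subgenre", then the catch-all.
    for wanted in ([genre, subgenre], [genre, ""], ["", ""]):
        for keyword, pairs in cattrans.items():
            if wanted in pairs:
                return keyword
    return None


# A shortened table using entries from the example above.
cattrans = {
    "News": [["news", ""], ["info", ""]],
    "Documentary": [["info", "documentary"]],
    "Unknown": [["overige", ""], ["", ""]],
}

print(translate_genre(cattrans, "info", "documentary"))
print(translate_genre(cattrans, "info", "weather"))
print(translate_genre(cattrans, "quiz"))
```

With this table the three calls give "Documentary", "News" and "Unknown" respectively; "weather" and "quiz" are made-up inputs used only to show the fallbacks.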

With the basic settings in place you can continue with creating your first sourcefile.

Integrating a source into the grabber datafile

You start by adding a record to the "sources" dict in the grabber_datafile, giving the source a sourceid. The sourceid must be a unique integer larger than 0. If you are not adding to an existing grabber, or have not yet created a grabber_datafile, you should do so now:

	"sources":{
    ...
		"3":{
			"json file":"source-example.com",
			"version":0,
    ...

If the URL to download the file from is different than the one set in "source-url", you can add a "json_url" to the record.
Next you add the channels to "source_channels", making it possible to link them to the channels from other sources:

	"source_channels":{
    ...
		"3":{"3-1": "1",
			"3-2": "2",
			"1-nickelodeon": "89",
    ...

The main key is the sourceid you defined under "sources". In the contained dict the values are the channelids from this source and the keys are the chanids, which normally are also the xmltvids. If a channelid is not defined here, its chanid defaults to the sourceid and the channelid joined with a hyphen. By using a chanid from another source you link the channels from the two sources.
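
The mapping and its default can be sketched in Python as follows. This is an illustration under the reading given above (keys are chanids, values are channelids), not the grabber's implementation; the function name chanid_for is hypothetical.

```python
def chanid_for(source_channels, sourceid, channelid):
    """Return the chanid linked to a channelid of a source, or the default."""
    linked = source_channels.get(str(sourceid), {})
    for chanid, linked_channelid in linked.items():
        if linked_channelid == str(channelid):
            return chanid
    # Not listed: sourceid and channelid joined with a hyphen.
    return "%s-%s" % (sourceid, channelid)


source_channels = {"3": {"3-1": "1", "3-2": "2", "1-nickelodeon": "89"}}

print(chanid_for(source_channels, 3, "89"))
print(chanid_for(source_channels, 3, "7"))
```

Here channelid "89" on source 3 resolves to "1-nickelodeon", linking it to the channel of source 1, while the unlisted channelid "7" falls back to "3-7".
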
By adding the sourceid to "active_sources" you activate the source. You should also add the sourceid to "prime_source_order" and "sourceid_order"; if it is missing from those two, it is automatically added at the end. If it is missing from "sources" or "active_sources", it is removed from all the other lists, deactivating the source.
If your source is a detail source, you also have to add it to the "detail_sources" list:

	"active_sources":[7,3,5,9,6,8,2,4,10,11,12],
	"detail_sources":[3, 9, 1],
	"sourceid_order":[3, 9, 1, 2, 4, 7, 5, 6, 8, 10, 12, 11],
	"prime_source_order":[2,4,10,12,7,3,5,1,9,6,8,11],
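
The clean-up rules for these lists can be sketched like this. It is a rough illustration of the behaviour described above, not the grabber's code; the function name normalize_source_lists is hypothetical, and treating "detail_sources" as filter-only (never appended to) is an assumption.

```python
def normalize_source_lists(data):
    """Apply the activation rules to the source lists of a datafile dict."""
    defined = {int(sid) for sid in data.get("sources", {})}
    # A source is only active if it is also defined under "sources".
    active = [sid for sid in data.get("active_sources", []) if sid in defined]
    result = {"active_sources": active}
    # Order lists: drop inactive ids, then append active ids not yet listed.
    for key in ("prime_source_order", "sourceid_order"):
        kept = [sid for sid in data.get(key, []) if sid in active]
        kept += [sid for sid in active if sid not in kept]
        result[key] = kept
    # Assumption: detail sources are only filtered, never added implicitly.
    result["detail_sources"] = [
        sid for sid in data.get("detail_sources", []) if sid in active
    ]
    return result


data = {
    "sources": {"2": {}, "3": {}, "9": {}},
    "active_sources": [3, 9, 4],
    "detail_sources": [3, 1],
    "sourceid_order": [9],
    "prime_source_order": [2, 3],
}
print(normalize_source_lists(data))
```

In this made-up input, source 4 is dropped because it is not defined, source 2 is dropped from "prime_source_order" because it is not active, and the missing active ids are appended to the end of both order lists.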

Channel Names

The name from the first source in "sourceid_order" that has data about a channel is used as the channel name. Sometimes, for whatever reason, that source gives an incorrect name. By adding a record to the "channel_rename" dict with the chanid as key, you can change this name. tv_grab_test_json.py will detect any undefined chanids.

	"channel_rename":{
		"0-7":"BBC One England",
		"0-8":"BBC Two",
		"0-9":"Das Erste",
		...
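
As a sketch of how the name is picked, assuming the names found per source are available as a dict of the hypothetical shape {sourceid: {chanid: name}}; the function channel_name is likewise hypothetical and not part of the grabber:

```python
def channel_name(chanid, names_by_source, sourceid_order, channel_rename):
    """Return the display name for a chanid, or None if no source has one."""
    # An explicit rename always wins.
    if chanid in channel_rename:
        return channel_rename[chanid]
    # Otherwise take the name from the first source in the order that has one.
    for sourceid in sourceid_order:
        name = names_by_source.get(sourceid, {}).get(chanid)
        if name:
            return name
    return None


names_by_source = {
    3: {"0-7": "BBC 1", "0-8": "BBC Two"},
    9: {"0-7": "BBC One", "0-9": "ARD"},
}
channel_rename = {"0-7": "BBC One England", "0-9": "Das Erste"}

print(channel_name("0-7", names_by_source, [3, 9], channel_rename))
print(channel_name("0-8", names_by_source, [3, 9], channel_rename))
```

The source names in names_by_source are invented for the example; only the "channel_rename" entries come from the fragment above.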

Prime Source

Every channel has a prime source. This is the source whose data is processed first and which determines the start and stop times. There are four ways to define a prime source:

  • "prime_source_order" A list of all the sources with the most dependable at the start. If no prime source is defined for the group or the chanid itself, the first source in this list containing data for this channel will be the prime source.
  • "prime_source_groups" A dict with any of the channel groups as key and the preferred prime source as value. If a chanid is a member of the group and has no prime source defined in either of the following two ways, the prime source for the group will be used.
  • "prime_source" Similar to the previous one, but here you use a chanid as key.
  • By the user setting the prime source in his or her configuration file.

tv_grab_test_json.py will detect any undefined sourceids, groupids or chanids.
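
The four ways above amount to a fallback chain, which can be sketched as follows. This is not the grabber's code: the function name and its arguments are hypothetical, and letting the user's setting win over "prime_source" is an assumption, since the text only states that both take precedence over the group and the order list.

```python
def prime_source(chanid, groupid, data, sources_with_data, user_choice=None):
    """Pick the prime source for a channel, or None if nothing applies."""
    # Assumption: the user's own configuration has the highest priority.
    if user_choice is not None:
        return user_choice
    if chanid in data.get("prime_source", {}):
        return data["prime_source"][chanid]
    groups = data.get("prime_source_groups", {})
    if str(groupid) in groups:  # JSON object keys are strings
        return groups[str(groupid)]
    # Fall back to the first source in the order list that has data.
    for sourceid in data.get("prime_source_order", []):
        if sourceid in sources_with_data:
            return sourceid
    return None


data = {
    "prime_source_order": [2, 4, 3],
    "prime_source_groups": {"1": 3},
    "prime_source": {"0-7": 4},
}

print(prime_source("0-7", 1, data, {2, 3, 4}))
print(prime_source("0-8", 1, data, {2, 3}))
print(prime_source("0-9", 99, data, {3, 4}))
```

Whether a source chosen through "prime_source" or "prime_source_groups" must also have data for the channel is not covered by this sketch.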

Channel Grouping

If a groupid cannot be extracted from one of a channel's sources, you can add the chanid to a group under the "channel_grouping" keyword by adding it to the list contained under the groupid key. Any chanid without a groupid is added to groupid 99.

"channel_grouping":{
	"1":["0-34","1-zappelin"],
	"2":["1-ketnet-canvas-2", "8-eenplus","8-zes"],
	"3":["1-cbeebies", "5-24443943080", "5-24443943013", "5-24443943051"],
	"4":[],
	...
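
A small sketch of the resulting lookup, assuming a groupid extracted from a source takes precedence over "channel_grouping" (the text suggests this but does not state it outright). The function name groupid_for is hypothetical.

```python
def groupid_for(chanid, channel_grouping, extracted=None):
    """Return the groupid of a chanid, defaulting to 99."""
    # Assumption: a groupid extracted from a source is used as-is.
    if extracted is not None:
        return extracted
    for groupid, chanids in channel_grouping.items():
        if chanid in chanids:
            return int(groupid)
    return 99


channel_grouping = {
    "1": ["0-34", "1-zappelin"],
    "2": ["1-ketnet-canvas-2", "8-eenplus", "8-zes"],
    "4": [],
}

print(groupid_for("1-zappelin", channel_grouping))
print(groupid_for("0-999", channel_grouping))
```

The second call uses a chanid that is in no list, so it ends up in group 99.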

Channel Logos

If a logo cannot be extracted from any of the sources, or you have a better logo source, you can define it under the "logo_names" keyword. Add the chanid as key, with a two-part list containing the logoid and the logo name as value:

"logo_provider":{
	"101": "http://www.oorboekje.nl/img/logo/"},
"logo_names":{
	"--description--":["", "If no extension is provided .png will be assumed"],
	"0-1" : ["4", "npo1"],
	"0-2" : ["4", "npo2"],
	...
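
How a logo URL could be assembled from these two tables is sketched below. It is an illustration only: the function name logo_url is hypothetical, the "logo_names" entries in the example are made up to point at provider "101", and detecting a missing extension with os.path.splitext is an assumption about what "no extension" means.

```python
import os.path


def logo_url(chanid, logo_names, logo_provider):
    """Return the full logo URL for a chanid, or None if it is not defined."""
    entry = logo_names.get(chanid)
    if entry is None:
        return None
    logoid, name = entry
    base = logo_provider.get(logoid)
    if base is None:
        return None
    # If no extension is provided, .png is assumed.
    if not os.path.splitext(name)[1]:
        name += ".png"
    return base + name


logo_provider = {"101": "http://www.oorboekje.nl/img/logo/"}
logo_names = {"0-1": ["101", "npo1"], "0-2": ["101", "npo2.gif"]}

print(logo_url("0-1", logo_names, logo_provider))
print(logo_url("0-2", logo_names, logo_provider))
```

The first call appends ".png" to "npo1"; the second keeps the given ".gif" extension.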

Return to Testing and integrating the Sourcefile to test the new "channels" data_def and continue with adding a "base" data_def.

Other Settings

<under construction>