GitHub - imranece59/overbond-sde

Problem Statement:

Given a JSON file which will define an array of bond objects (of arbitrary size), write a command line tool to calculate the spread between each corporate bond and the nearest government bond benchmark, save these results in a JSON file, and express the spread in basis points, or bps. If any properties are missing from a bond object, do not include it in your calculations or output.

Spread is defined as the difference between the yield of a corporate bond and its government bond benchmark.

A government bond is a good benchmark if it is as close as possible to the corporate bond in terms of years to maturity, also known as term or tenor.

If there is a tie for closest government bond by tenor, break the tie by choosing the government bond with the largest amount outstanding.
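The benchmark rule (closest tenor, with ties broken by the largest amount outstanding) can be expressed as one minimum over a composite key. The following is a minimal plain-Scala sketch, not the project's Spark implementation. The names `BenchmarkSelection`, `GovBond` and `bestBenchmark` are hypothetical. It also assumes tenors have already been parsed into years as `BigDecimal`, so that two equal distances compare as exactly equal.

```scala
// Hypothetical sketch: choose the benchmark government bond for one corporate bond.
// Tenors are BigDecimal so equal distances compare exactly (no floating-point drift).
object BenchmarkSelection {
  final case class GovBond(id: String, tenorYears: BigDecimal, amountOutstanding: Long)

  /** Closest government bond by |tenor difference|; ties go to the larger amount outstanding. */
  def bestBenchmark(corporateTenor: BigDecimal, govs: Seq[GovBond]): GovBond =
    // Negating the amount makes the minimum prefer the larger amount on a tie.
    govs.minBy(g => ((corporateTenor - g.tenorYears).abs, -g.amountOutstanding))

  def main(args: Array[String]): Unit = {
    val govs = Seq(
      GovBond("g1", BigDecimal("9.4"), 2500000L),
      GovBond("g2", BigDecimal("12.0"), 1750000L)
    )
    println(bestBenchmark(BigDecimal("10.3"), govs).id)
  }
}
```

For the sample data this selects `g1`. Note that `minBy` throws on an empty sequence; the problem statement rules that out by guaranteeing at least one valid government bond.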

To convert your difference to basis points, just scale your spread by 100 and display as an integer (truncate trailing decimals), e.g. if your spread comes out to 2.127, this will be expressed in your output file as "212 bps".
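The basis-point conversion is easy to get subtly wrong with floating point: with `Double`, `5.30 - 3.70` lands slightly below `1.60`, so scaling by 100 and truncating can give 159 instead of 160. A minimal sketch that avoids this parses the yields as `BigDecimal` and truncates toward zero. It assumes yields arrive as strings such as `"5.30%"`; the names `SpreadInBps`, `parseYield`, `spreadBps` and `formatBps` are hypothetical.

```scala
// Hypothetical sketch: convert two yield strings such as "5.30%" into a spread in bps.
object SpreadInBps {
  /** Parses a percentage string like "5.30%" into 5.30 (assumes exactly this format). */
  def parseYield(s: String): BigDecimal = BigDecimal(s.trim.stripSuffix("%"))

  /** (corporate yield - government yield) * 100, with trailing decimals truncated. */
  def spreadBps(corporateYield: String, governmentYield: String): Int = {
    val spread = parseYield(corporateYield) - parseYield(governmentYield)
    // RoundingMode.DOWN truncates toward zero, e.g. 212.7 -> 212.
    (spread * BigDecimal(100)).setScale(0, BigDecimal.RoundingMode.DOWN).toInt
  }

  /** Renders the integer spread in the output format, e.g. "212 bps". */
  def formatBps(bps: Int): String = s"$bps bps"

  def main(args: Array[String]): Unit = {
    println(formatBps(spreadBps("5.30%", "3.70%")))
    println(formatBps(spreadBps("6.50%", "3.70%")))
  }
}
```

Because `DOWN` truncates toward zero, a negative spread such as -1.609 would become -160 rather than -161.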

Sample input:

```json
{
  "data": [
    { "id": "c1", "type": "corporate", "tenor": "10.3 years", "yield": "5.30%", "amount_outstanding": 1200000 },
    { "id": "c1", "type": "corporate", "tenor": "10.3 years", "yield": "6.50%", "amount_outstanding": 1300000 },
    { "id": "g1", "type": "government", "tenor": "9.4 years", "yield": "3.70%", "amount_outstanding": 2500000 },
    { "id": "c2", "type": "corporate", "tenor": "13.5 years", "yield": null, "amount_outstanding": 1100000 },
    { "id": "g2", "type": "government", "tenor": "12.0 years", "yield": "4.80%", "amount_outstanding": 1750000 }
  ]
}
```

Sample output:

```json
{
  "data": [
    { "corporate_bond_id": "c1", "government_bond_id": "g1", "spread_to_benchmark": "280 bps" }
  ]
}
```

Explanation

Each output object in the list represents a pairing of one corporate bond to its closest government bond benchmark, and the spread between their yields.

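The best benchmark for bond c1 is g1, since the absolute difference in their terms (|10.3 - 9.4|) is only 0.9, whereas comparing c1 and g2 gives |12.0 - 10.3| = 1.7. The spread is simply the corporate yield minus the government yield: for the first c1 record you would obtain 5.30 - 3.70 = 1.60, which you must represent in basis points as "160 bps". The second c1 record in the sample input, with a yield of 6.50%, gives 6.50 - 3.70 = 2.80, i.e. "280 bps", which is the value shown in the sample output.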

The bond c2 is not included in the output because it is missing a property, yield. If any properties are missing from a bond object, do not include it in the calculation and output. You may assume you will always have at least one valid government bond and at least one valid corporate bond, for all inputs.
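Dropping incomplete bond objects amounts to keeping a record only when every property is present. In Spark, one way to express this is `df.na.drop()`, which removes rows containing any null. The plain-Scala sketch below shows the same rule by modelling each property as an `Option`. The case classes and `toValid` are hypothetical names; the yield field is called `yieldPct` because `yield` is a Scala keyword, and `amount_outstanding` is assumed to fit in a `Long`.

```scala
// Hypothetical sketch: keep a bond only if none of its properties is missing.
object ValidBonds {
  // A raw record as a JSON parser might produce it: any field may be absent or null.
  final case class RawBond(
      id: Option[String],
      bondType: Option[String],
      tenor: Option[String],
      yieldPct: Option[String],
      amountOutstanding: Option[Long]
  )

  final case class Bond(id: String, bondType: String, tenor: String, yieldPct: String, amountOutstanding: Long)

  /** Some(bond) only when all five properties are present; None otherwise. */
  def toValid(r: RawBond): Option[Bond] =
    for {
      id     <- r.id
      kind   <- r.bondType
      tenor  <- r.tenor
      yld    <- r.yieldPct
      amount <- r.amountOutstanding
    } yield Bond(id, kind, tenor, yld, amount)

  def main(args: Array[String]): Unit = {
    val raw = Seq(
      RawBond(Some("c2"), Some("corporate"), Some("13.5 years"), None, Some(1100000L)),
      RawBond(Some("g2"), Some("government"), Some("12.0 years"), Some("4.80%"), Some(1750000L))
    )
    // c2 has no yield, so only g2 survives.
    println(raw.flatMap(r => toValid(r)).map(_.id).mkString(","))
  }
}
```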

Software requirements:

- Spark 2.2
- Scala 2.11
- Hadoop 2.7+
- sbt

Design Approach:

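Load the JSON file with `spark.read.json`, making sure to set the `multiLine` option to `true` while reading, since the input is a single multi-line JSON document rather than one JSON object per line.
Flatten the `data` array so that each bond becomes one row, in the following format: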

```
+---+----------+----------+-----+------------------+
|id |type      |tenor     |yield|amount_outstanding|
+---+----------+----------+-----+------------------+
|c1 |corporate |10.3 years|5.30%|1200000           |
|c1 |corporate |10.3 years|6.50%|1300000           |
|g1 |government|9.4 years |3.70%|2500000           |
|c2 |corporate |13.5 years|null |1100000           |
|g2 |government|12.0 years|4.80%|1750000           |
+---+----------+----------+-----+------------------+
```

Drop every bond record in which any property is null, and split the corporate and government bonds into separate DataFrames. Then, for each corporate bond, calculate the absolute difference in tenor against each government bond and keep the closest one; if there is a tie, keep the government bond with the largest amount outstanding.

Refer to the `calculateBestbenchMarkDf` method in the `ProcessDataHelper` class.
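Before tenors can be compared, strings such as `"10.3 years"` have to become numbers, and the corporate and government bonds have to be separated. The following plain-Scala sketch shows those two steps plus the pairwise tenor distances. It assumes every tenor is a number followed by a unit word; `SplitAndDiff`, `tenorYears` and `tenorDiffs` are hypothetical names and are not taken from `ProcessDataHelper`. In Spark, the pairwise step could be written as a `crossJoin` between the two DataFrames.

```scala
// Hypothetical sketch: parse tenor strings, split bonds by type, and compute
// the absolute tenor difference for every corporate/government pair.
object SplitAndDiff {
  final case class Bond(id: String, bondType: String, tenor: String, amountOutstanding: Long)

  /** "10.3 years" -> 10.3 (assumes a "<number> <unit>" format). */
  def tenorYears(tenor: String): BigDecimal =
    BigDecimal(tenor.trim.split("\\s+")(0))

  /** (corporate id, government id, |tenor difference|) for every pair. */
  def tenorDiffs(bonds: Seq[Bond]): Seq[(String, String, BigDecimal)] = {
    val corporates  = bonds.filter(_.bondType == "corporate")
    val governments = bonds.filter(_.bondType == "government")
    for {
      c <- corporates
      g <- governments
    } yield (c.id, g.id, (tenorYears(c.tenor) - tenorYears(g.tenor)).abs)
  }

  def main(args: Array[String]): Unit = {
    val bonds = Seq(
      Bond("c1", "corporate", "10.3 years", 1200000L),
      Bond("g1", "government", "9.4 years", 2500000L),
      Bond("g2", "government", "12.0 years", 1750000L)
    )
    tenorDiffs(bonds).foreach { case (c, g, d) => println(s"$c vs $g: $d") }
  }
}
```

On the sample bonds this yields 0.9 for c1 against g1 and 1.7 for c1 against g2, matching the explanation in the problem statement; the closest pair, with the amount-outstanding tie-break, is then selected per corporate bond.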

Calculate the spread, convert the DataFrame to a JSON dataset, and write the final dataset to the output path.
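The target output shape is a single object with a `data` array of pairings. The sketch below builds that shape by hand in plain Scala so it runs without Spark or a JSON library; it is an illustration, not how the project serialises its results. `OutputJson`, `Pairing` and `toJson` are hypothetical names, and the escaping only handles backslashes and double quotes. It formats the spread as an integer (`"280 bps"`), as the problem statement requires.

```scala
// Hypothetical sketch: serialise benchmark pairings into the output JSON shape
// from the problem statement, without depending on a JSON library.
object OutputJson {
  final case class Pairing(corporateId: String, governmentId: String, spreadBps: Int)

  // Minimal escaping of backslashes and double quotes; assumes ids contain
  // no control characters.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  private def pairingJson(p: Pairing): String =
    "{" +
      "\"corporate_bond_id\":" + quote(p.corporateId) + "," +
      "\"government_bond_id\":" + quote(p.governmentId) + "," +
      "\"spread_to_benchmark\":" + quote(s"${p.spreadBps} bps") +
      "}"

  /** Wraps all pairings in the top-level {"data":[...]} object. */
  def toJson(pairings: Seq[Pairing]): String =
    pairings.map(pairingJson).mkString("{\"data\":[", ",", "]}")

  def main(args: Array[String]): Unit =
    println(toJson(Seq(Pairing("c1", "g1", 280))))
}
```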

Running

```
sbt "runMain demo.common.BondSpreadAnalysis --input-file-path=C:\exam\sde-test\sample_input.json --output-file-path=file:///C:/exam/sde-test/output.json --master=local[*]"
```

Sample Output

```
+-------------------------------------------------------------------------------------------------+
|value                                                                                            |
+-------------------------------------------------------------------------------------------------+
|{"data":[{"corporate_bond_id":"c1","government_bond_id":"g1","spread_to_benchmark":"280.0 bps"}]}|
+-------------------------------------------------------------------------------------------------+
```

![alt tag](https://github.com/imranece59/overbond-sde/blob/master/Capture.PNG)

Eclipse Build

1. Clone the project with Git.
2. Run `sbt eclipse` from the project folder.
3. Import the project into Scala IDE or IntelliJ.

Notes

If you get an error while writing the final dataset to the output path, you are most likely missing the correct Hadoop version or the `winutils` setup that Hadoop requires for writes on Windows.
You don't need the `winutils` setup if you run on EMR or on a cluster, i.e. without `--master=local[*]`.
