MaksKhan/TGT-hack

TGT-hack

⌛ Task

Most hydrocarbons are extracted from the ground. During production, the appearance of sand is very dangerous because it can lead to expensive repairs. To save the company money and time, the task was to classify audio recordings and find those in which sand was falling on the pipe.

💰 The first-place prize was 150,000 rubles. Second and third places received 100,000 and 50,000 rubles.

💾 Data

The audio data were recorded in a laboratory and provided as a CSV file. The grains of sand had different sizes, and the wind speed also varied. Some grains were mixed with gas or water. Recording durations varied. Values were normalized to the range from -1 to 1, and the audio sampling rate was 117.2 kHz.

🔍 EDA

💡 First of all, I found that sand mostly appeared at the start of a recording. Most records without sand had a uniform amplitude and period. Records with sand, however, were loud at the beginning (the fall of sand) and quiet for the rest of the recording.

🔉 Here are the spectrograms

(the first example is without sand, the second is with falling sand)

image
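For readers who want to reproduce this kind of plot, here is a minimal sketch of computing a spectrogram with a Hann-windowed short-time FFT. It is not the author's code: the function name `spectrogram`, the frame and hop sizes, and the synthetic signals are assumptions made for illustration. Only the 117.2 kHz sampling rate and the [-1, 1] value range come from the data description.

```python
import numpy as np

SAMPLE_RATE_HZ = 117_200  # sampling rate stated in the Data section


def spectrogram(signal, frame_len=1024, hop=512):
    """Return (freqs_hz, times_s, power) using a Hann-windowed short-time FFT."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs_hz = np.fft.rfftfreq(frame_len, d=1.0 / SAMPLE_RATE_HZ)
    times_s = (np.arange(n_frames) * hop + frame_len / 2) / SAMPLE_RATE_HZ
    return freqs_hz, times_s, power.T  # power has shape (n_freqs, n_frames)


# Synthetic stand-ins for real records (assumed, not taken from the dataset).
rng = np.random.default_rng(0)
t = np.arange(SAMPLE_RATE_HZ) / SAMPLE_RATE_HZ  # one second of samples
steady = 0.2 * np.sin(2 * np.pi * 5_000 * t)    # "no sand": uniform tone
burst = steady.copy()
burst[:10_000] += 0.6 * rng.standard_normal(10_000)  # loud start, like falling sand
burst = np.clip(burst, -1.0, 1.0)               # keep values inside [-1, 1]

freqs, times, power_steady = spectrogram(steady)
_, _, power_burst = spectrogram(burst)

# Energy per time frame: the burst record is much louder in its first frames.
frame_energy_burst = power_burst.sum(axis=0)
```

Plotting `10 * np.log10(power + 1e-12)` against `times` and `freqs` with any plotting library gives the usual spectrogram image.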

🍭 New features

That's why I created features that show whether the amplitude changes within a record and whether that change happens at the beginning. I also added standard features such as the mean, the median, the sum of absolute values, and others.

Here is their correlation

image

🏆 As I saw later, 8 of the 10 most important features for the models were ones I created myself
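The features described in this section could be computed roughly as follows. This is a minimal sketch, not the author's implementation: the function name `extract_features`, the `head_fraction` of 0.2, the feature names, and the synthetic signals are all assumptions chosen for illustration.

```python
import numpy as np


def extract_features(signal, head_fraction=0.2):
    """Summary statistics plus 'is the record louder at the start?' features."""
    signal = np.asarray(signal, dtype=float)
    split = max(1, int(len(signal) * head_fraction))
    head = np.abs(signal[:split])   # beginning of the record
    tail = np.abs(signal[split:])   # everything after it
    head_amp = head.mean()
    tail_amp = tail.mean() if tail.size else 0.0
    return {
        "mean": signal.mean(),
        "median": np.median(signal),
        "abs_sum": np.abs(signal).sum(),
        # > 1 means the beginning is louder than the rest of the record.
        "head_tail_amp_ratio": head_amp / (tail_amp + 1e-12),
        # Relative position (0..1) of the loudest sample.
        "peak_position": np.argmax(np.abs(signal)) / len(signal),
    }


# Synthetic examples (assumed): a uniform tone and the same tone with a loud start.
t = np.arange(10_000) / 10_000
steady = 0.2 * np.sin(2 * np.pi * 50 * t)
loud_start = steady.copy()
loud_start[:1_000] *= 4.0

features_steady = extract_features(steady)
features_loud_start = extract_features(loud_start)
```

For the uniform tone the ratio stays close to 1, while the record with a loud start gets a clearly larger ratio and an early peak position, which is the kind of signal a tree-based model can split on.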

Class distribution

image

As you can see, the target class was 6 times smaller than the other class. I solved this problem with NearMiss undersampling, which deletes similar samples of the majority class. However, I trained some models on all of the data.
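To illustrate the idea behind this kind of undersampling, here is a minimal NumPy sketch of the NearMiss-1 rule: keep only the majority-class samples whose mean distance to their nearest minority-class samples is smallest. The source does not say which NearMiss version or implementation was used (imbalanced-learn provides a `NearMiss` class, for example), so the function name `near_miss_1`, the `n_neighbors` value, and the random data are assumptions.

```python
import numpy as np


def near_miss_1(X, y, majority_label=0, n_neighbors=3):
    """Balance classes by keeping the majority samples closest to the minority class."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    k = min(n_neighbors, len(min_idx))
    # Pairwise Euclidean distances, shape (n_majority, n_minority).
    dists = np.linalg.norm(X[maj_idx, None, :] - X[None, min_idx, :], axis=2)
    # Mean distance from each majority sample to its k nearest minority samples.
    mean_knn = np.sort(dists, axis=1)[:, :k].mean(axis=1)
    keep_maj = maj_idx[np.argsort(mean_knn)[: len(min_idx)]]
    keep = np.sort(np.concatenate([keep_maj, min_idx]))
    return X[keep], y[keep]


# Random stand-in data with the 6:1 imbalance mentioned above (values are made up).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (60, 4)), rng.normal(1.5, 1.0, (10, 4))])
y = np.array([0] * 60 + [1] * 10)

X_res, y_res = near_miss_1(X, y)
```

After resampling, both classes have the same number of rows, at the cost of discarding most majority-class records.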

Models

I used 😼CatBoost, 🦄LightGBM and 👽XGBoost

🎯 After tuning all of the models with GridSearch, the best cross-validation scores (~0.95 F1) were obtained by CatBoost and XGBoost.

However, CatBoost was better than XGBoost at finding records with sand (see the confusion matrices). That's why I decided to stack them, with CatBoost getting the larger weight (0.6).

image image
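One simple way to combine two models with a 0.6 weight is a weighted average of their predicted probabilities. The sketch below assumes that is what was done; the source does not spell out the exact procedure. The function names, the 0.5 decision threshold, and all probability values are invented for illustration and are not results from the competition.

```python
def blend(proba_a, proba_b, weight_a=0.6, threshold=0.5):
    """Weighted average of two models' positive-class probabilities."""
    blended = [weight_a * a + (1 - weight_a) * b for a, b in zip(proba_a, proba_b)]
    labels = [int(p >= threshold) for p in blended]
    return blended, labels


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up probabilities of "sand" for five records (1 = sand, 0 = no sand).
y_true = [1, 1, 0, 0, 0]
catboost_proba = [0.9, 0.7, 0.2, 0.3, 0.1]  # hypothetical CatBoost outputs
xgboost_proba = [0.8, 0.3, 0.3, 0.6, 0.2]   # hypothetical XGBoost outputs

blended_proba, blended_labels = blend(catboost_proba, xgboost_proba, weight_a=0.6)
blended_f1 = f1_score(y_true, blended_labels)
xgboost_f1 = f1_score(y_true, [int(p >= 0.5) for p in xgboost_proba])
```

In this toy example the second model misses one sand record and raises one false alarm on its own, while the blend, which leans toward the first model, recovers both.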

📣 Result

Public score was 0.96433.

This solution placed 16th out of more than 100.

About

Final solution to TGT hack
