Scope applications for Teamlyzer data #2

TSFelg · 2021-04-08T19:18:07Z

Teamlyzer has an open database with several open datasets of salaries in Portugal (not only tech). It would be interesting to scope how this data can be used in Fairly. It could be used simply to train the model with more data, or to expand the coverage to job types beyond tech.

ghost · 2021-04-09T03:11:43Z

We can add a link to, or embed, the current URL of Fairly to increase the visibility of this tool. Or maybe some type of integration, since we are working on the same problem.

There are also thousands of Portuguese salaries shared in the Stack Overflow surveys or knowyourworth; the problem, as always, is the normalization.

TSFelg · 2021-04-09T10:18:58Z

That's quite interesting. I have to take a look at the data, but eventually some type of integration would make sense.

Also, the extra visibility would be appreciated :)

TSFelg · 2021-04-09T10:30:39Z

Also, would you mind elaborating a bit more when you say the problem is the normalization? I imagine it's the fact that different datasets collect different variables, but would appreciate your feedback on the typical issues you face.

ghost · 2021-04-09T15:04:13Z

> Also, would you mind elaborating a bit more when you say the problem is the normalization? I imagine it's the fact that different datasets collect different variables, but would appreciate your feedback on the typical issues you face.

Yeah, each survey has a different structure. Take seniority: some surveys use years of experience, others use labels such as senior, junior, middle, and so on.

The same goes for role: a back-end Golang developer earns much more than a back-end PHP developer, so converting both to "back-end developer" loses this kind of detail.

And from my experience, all datasets always need some manual validation, especially surveys with open fields, to check for potentially fake data like junior | 150k | lisbon
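To make the seniority point concrete, here is a minimal sketch of mapping both styles (years of experience and free-text labels) onto one scale. The cut-offs, the alias table, and the function name `normalize_seniority` are all assumptions for illustration and do not come from any of the surveys mentioned.

```python
from typing import Optional, Union

# Hypothetical cut-offs (exclusive upper bound in years -> band); tune on real data.
YEARS_TO_BAND = [(2, "junior"), (5, "middle"), (float("inf"), "senior")]

# Hypothetical spellings seen in open-text survey fields.
LABEL_ALIASES = {
    "jr": "junior", "junior": "junior",
    "mid": "middle", "middle": "middle",
    "sr": "senior", "senior": "senior",
}


def normalize_seniority(value: Union[int, float, str, None]) -> Optional[str]:
    """Map years of experience or a free-text label to a single seniority band."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        for upper, band in YEARS_TO_BAND:
            if value < upper:
                return band
    # Unknown labels return None so they can be reviewed by hand.
    return LABEL_ALIASES.get(str(value).strip().lower())


print(normalize_seniority(7))       # years-based survey
print(normalize_seniority(" Sr "))  # label-based survey
```

Anything that comes back as `None` is left for manual review rather than guessed.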

TSFelg · 2021-04-09T15:23:50Z

Thanks, that's great info! I think a lot of those are very interesting machine learning challenges, so I'm quite excited to try and tackle them :)

For example, it's possible to formulate a modelling strategy that can leverage both data which only specifies back end and data that specifies the languages/frameworks. Some outlier detection can also help to detect those types of fake cases, not necessarily automatically, but at least by making them stand out so that a human can confirm whether it's bad data or not.
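As a rough sketch of the "make them stand out" idea, one option is a robust z-score (median and median absolute deviation) computed within each seniority group. This is just one common technique, not something already in Fairly; the function name `flag_outliers`, the row layout, the 3.5 threshold, and the sample salaries are all made up for illustration.

```python
from collections import defaultdict
from statistics import median


def flag_outliers(rows, threshold=3.5):
    """Return rows whose salary is far from the median of their seniority group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["seniority"]].append(row["salary"])

    flagged = []
    for row in rows:
        salaries = groups[row["seniority"]]
        med = median(salaries)
        mad = median(abs(s - med) for s in salaries)
        if mad == 0:
            continue  # no spread in this group, nothing to compare against
        # 0.6745 rescales the MAD so the score is comparable to a z-score.
        score = 0.6745 * abs(row["salary"] - med) / mad
        if score > threshold:
            flagged.append(row)
    return flagged


# Invented sample rows, only to show the call.
sample = [
    {"seniority": "junior", "salary": 18_000, "city": "lisbon"},
    {"seniority": "junior", "salary": 20_000, "city": "lisbon"},
    {"seniority": "junior", "salary": 22_000, "city": "porto"},
    {"seniority": "junior", "salary": 19_000, "city": "lisbon"},
    {"seniority": "junior", "salary": 150_000, "city": "lisbon"},
]
for row in flag_outliers(sample):
    print("needs review:", row)
```

The function only flags rows; deciding whether a flagged row is actually bad data stays with a human.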

ghost · 2021-04-09T15:43:38Z

> modelling strategy that can both leverage data which only specifies back end as well as data that specifies the languages/frameworks

In that case I think you need some Named Entity Recognition framework. Maybe this paper can be helpful.

These guys are doing awesome work with NER: https://www.glasssquid.io/try-analyze
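Before reaching for a full NER model, a plain keyword lookup can serve as a baseline for splitting a job title into a coarse role plus an optional language. The sketch below is that simple stand-in, not real NER; the vocabularies and the function name `parse_title` are hypothetical.

```python
import re

# Hypothetical vocabularies; a real project would curate or learn these.
ROLES = {"back-end", "front-end"}
LANGUAGES = {"golang": "go", "go": "go", "php": "php", "python": "python"}


def parse_title(title):
    """Split a free-text job title into a coarse role and an optional language."""
    text = title.lower()
    # Treat "backend", "back end" and "back-end" as the same token.
    text = re.sub(r"\b(back|front)[\s-]?end\b", r"\1-end", text)
    tokens = re.findall(r"[a-z+#-]+", text)
    role = next((t for t in tokens if t in ROLES), None)
    language = next((LANGUAGES[t] for t in tokens if t in LANGUAGES), None)
    return {"role": role, "language": language}


print(parse_title("Back-end Golang developer"))
print(parse_title("back end developer"))
```

Rows where `language` is `None` still carry the coarse role, so both kinds of data can feed the same model, with the language treated as a missing feature.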
