Scope applications for Teamlyzer data #2
Comments
We could link to or embed Fairly's current URL to increase the visibility of the tool. Or maybe some type of integration, since we are working on the same problem. There are also thousands of Portuguese salaries shared in Stack Overflow surveys or knowyourworth; the problem, as always, is the normalization.
That's quite interesting. I have to take a look at the data, but eventually some type of integration would make sense. Also, the extra visibility would be appreciated :)
Also, would you mind elaborating a bit more when you say the problem is the normalization? I imagine it's the fact that different datasets collect different variables, but I would appreciate your feedback on the typical issues you face.
Yeah, each survey has a different structure. Take seniority: some surveys use years of experience, others use senior, junior, middle, and so on. The same goes for role: a back-end Golang developer earns much more than a back-end PHP developer, so converting both to "back-end developer" loses this kind of detail. And in my experience, every dataset needs some manual validation, especially surveys with open fields, to check for potentially fake data like junior | 150k | lisbon
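To make the normalization problem above concrete, here is a minimal sketch in Python. It maps both seniority styles onto one scale and keeps the technology next to the role family instead of collapsing it away. The year thresholds, the alias table, the `KNOWN_TECH` list, and the names `normalize_seniority` and `normalize_role` are all assumptions made for illustration; nothing here comes from Fairly or Teamlyzer.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Hypothetical aliases for label-style seniority answers.
LABEL_ALIASES = {
    "jr": "junior", "junior": "junior",
    "mid": "middle", "middle": "middle",
    "sr": "senior", "senior": "senior",
}

# Hypothetical technology keywords worth preserving.
KNOWN_TECH = ("golang", "php", "python", "java")


def normalize_seniority(raw) -> Optional[str]:
    """Map a label ("Senior") or years of experience (3) to one scale."""
    text = str(raw).strip().lower()
    if text in LABEL_ALIASES:
        return LABEL_ALIASES[text]
    try:
        years = float(text)
    except ValueError:
        return None  # unknown format: leave for manual validation
    if not math.isfinite(years) or years < 0:
        return None
    # Assumed cut-offs; real ones would have to be agreed per dataset.
    if years < 2:
        return "junior"
    if years < 5:
        return "middle"
    return "senior"


@dataclass
class Role:
    family: str                       # e.g. "back-end"
    technology: Optional[str] = None  # e.g. "golang", kept rather than dropped


def normalize_role(raw: str) -> Role:
    """Split a free-text role into a family plus an optional technology."""
    tokens = raw.lower().replace("-", " ").split()
    is_backend = "backend" in tokens or ("back" in tokens and "end" in tokens)
    family = "back-end" if is_backend else "other"
    technology = next((t for t in tokens if t in KNOWN_TECH), None)
    return Role(family=family, technology=technology)
```

With this shape, a survey that only says "back-end developer" yields `technology=None`, while one that names the language keeps it, so neither record has to be discarded or flattened.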
Thanks, that's great info! I think a lot of those are very interesting machine learning challenges, so I'm quite excited to try and tackle them :) For example, it's possible to formulate a modelling strategy that can leverage both data which only specifies back end and data that specifies the languages/frameworks. Some outlier detection can also help to detect those types of fake cases, not necessarily automatically, but at least make them stand out so that a human can confirm whether it's bad data or not.
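As a rough illustration of the outlier idea above, the sketch below flags salaries that sit far from the median of their seniority group, using the modified z-score (median and median absolute deviation) with the commonly used 3.5 cut-off. The row layout, the `flag_outliers` name, and the sample figures are made up for the example; this is one possible approach, not a description of what Fairly does.

```python
from collections import defaultdict
from statistics import median


def flag_outliers(rows, threshold=3.5):
    """Return rows whose salary is unusual for their seniority group.

    Rows are only flagged for human review, never dropped automatically.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["seniority"]].append(row["salary"])

    flagged = []
    for row in rows:
        salaries = groups[row["seniority"]]
        med = median(salaries)
        mad = median([abs(s - med) for s in salaries])
        if mad == 0:
            continue  # no spread in this group, nothing to compare against
        score = 0.6745 * (row["salary"] - med) / mad  # modified z-score
        if abs(score) > threshold:
            flagged.append(row)
    return flagged


# Invented sample data, echoing the "junior | 150k | lisbon" case.
sample = [
    {"seniority": "junior", "salary": 18000, "city": "lisbon"},
    {"seniority": "junior", "salary": 20000, "city": "porto"},
    {"seniority": "junior", "salary": 21000, "city": "lisbon"},
    {"seniority": "junior", "salary": 22000, "city": "braga"},
    {"seniority": "junior", "salary": 150000, "city": "lisbon"},
    {"seniority": "senior", "salary": 45000, "city": "lisbon"},
    {"seniority": "senior", "salary": 50000, "city": "porto"},
    {"seniority": "senior", "salary": 55000, "city": "lisbon"},
]
suspicious = flag_outliers(sample)
```

Median-based statistics are used here because a single fake entry would inflate a mean and standard deviation enough to hide itself.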
In that case, I think you need some Named Entity Recognition framework. Maybe this paper can be helpful. These guys are doing awesome work with NER: https://www.glasssquid.io/try-analyze
Teamlyzer has an open database with several open datasets of salaries in Portugal (not only tech). It would be interesting to scope how this data can be used in Fairly. It can be used simply to train the model with more data, or it can be used to expand the coverage of job types beyond tech.