According to Gartner, the search for skilled IT staff is global, with some organizations defining nearly a quarter of their workforce as "cross-border talent," meaning talent sourced from overseas.
But casting such a wide net makes it difficult to find the talent you want and, importantly, to hire them quickly, reliably, and safely.
This is a job built for machine intelligence: finding and analyzing people with differing skills and expectations, and matching them to a near-infinite number of job openings, each governed by a host of requirements.
Neither people nor job titles follow anything close to a common syntax. Add to this the challenges of language and time zones.
This is a huge data-analysis job, yet we built our AI-driven recruitment platform, Andela Talent Cloud (ATC), without using large language models (LLMs). In fact, we took ChatGPT out of our pipeline and built a system based on structured data instead. Our engineers developed a taxonomy specific to the nuances of the hiring process.
Breaking down the engine
Our Talent Decision Engine (TDE) uses AI and data-driven matching algorithms to pair talent with positions, analyzing thousands of data points, from skills and experience to location and language proficiency. The engine is supported by a number of related services: the Talent Response Service prioritizes the talent most likely to respond immediately; a recommendation engine matches and ranks people based on their overall suitability for a role; and AutoMatch optimizes for as many successful matches as possible while avoiding bidding wars.
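At its core, maximizing successful matches while avoiding bidding wars is an assignment problem: each candidate should end up with at most one role, and vice versa. The sketch below is a simplified illustration of that idea using SciPy's Hungarian-algorithm solver, with invented scores; it is not our production AutoMatch code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Invented match-quality scores (rows: candidates, columns: jobs),
# e.g., as produced by the recommendation engine. Higher is better.
scores = np.array([
    [0.91, 0.40, 0.55],  # candidate A
    [0.88, 0.35, 0.20],  # candidate B
    [0.10, 0.75, 0.60],  # candidate C
])

# linear_sum_assignment minimizes cost, so negate the scores to maximize
# total fit. Each candidate gets at most one job and each job at most one
# candidate, so no two roles end up bidding on the same person.
rows, cols = linear_sum_assignment(-scores)
for r, c in zip(rows, cols):
    print(f"candidate {r} -> job {c} (score {scores[r, c]:.2f})")
# Note: candidate A is assigned job 2 rather than job 0 (A's personal best),
# because giving job 0 to candidate B maximizes successful matches overall.
```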
TDE provides high-quality matching that ChatGPT could not. First, LLMs do not handle tabular data well and can struggle to extract meaningful insights from such data representations. ChatGPT also lacks robust numerical reasoning: LLMs deal primarily with textual data and may be unable to infer relationships within structured data that includes numbers, such as how well a candidate's time zone and working hours fit multiple job requirements, each with a different time zone and its own minimum hours of overlap.
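To make the numeric nature of that example concrete, here is a minimal, simplified sketch of computing the daily overlap between a candidate's working window and a job's required window, with both converted to UTC. The numbers are invented for illustration.

```python
def to_utc_window(start_hour: int, end_hour: int, utc_offset: int) -> tuple:
    """Convert a local working window (24-hour clock) to UTC hours."""
    return (start_hour - utc_offset) % 24, (end_hour - utc_offset) % 24

def overlap_hours(a: tuple, b: tuple) -> int:
    """Hours per day when both UTC windows are active (handles midnight wrap)."""
    def hours(window):
        start, end = window
        length = (end - start) % 24 or 24  # start == end means a full day
        return {(start + h) % 24 for h in range(length)}
    return len(hours(a) & hours(b))

# Illustrative only: a candidate in UTC-5 working 9:00-17:00 local, and a
# job based in UTC+1 requiring 8:00-16:00 local with a 4-hour minimum overlap.
candidate = to_utc_window(9, 17, -5)  # 14:00-22:00 UTC
job = to_utc_window(8, 16, +1)        # 07:00-15:00 UTC
print(overlap_hours(candidate, job))  # 1 -> fails the 4-hour minimum
```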
Additionally, LLMs face explainability challenges that matter for decision-making. Although an LLM can produce text output, it is difficult to understand the reasoning behind its predictions on structured data. That is a major drawback compared with tabular-data-focused techniques such as XGBoost, whose predictions can easily be explained using SHAP (SHapley Additive exPlanations) values.
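As a simplified illustration of that contrast (synthetic data and invented features, not our production model): train a gradient-boosted tree on tabular features, then attribute each prediction to its inputs with SHAP.

```python
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(0)

# Synthetic tabular features: years of experience, skill-match ratio,
# and hours of time-zone overlap. Label: whether the placement succeeded.
X = rng.random((500, 3)) * np.array([15.0, 1.0, 8.0])
y = ((X[:, 1] > 0.6) & (X[:, 2] > 3.0)).astype(int)

model = xgb.XGBClassifier(n_estimators=50, max_depth=3)
model.fit(X, y)

# TreeExplainer attributes each prediction to individual features,
# a level of transparency an LLM's opaque text generation cannot offer.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])
print(shap_values)  # per-feature contributions for each of the five rows
```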
A fourth drawback is that LLMs typically have limited context windows, meaning only a fixed number of tokens can be considered when generating text. This limitation makes it difficult to capture the long-range dependencies and complex relationships that exist in structured data.
These are just four reasons why we chose not to use LLMs for the tabular problems we faced. Essentially, in such scenarios they cannot perform as effectively or efficiently as specialized models explicitly designed for structured data processing, such as graph neural networks, or as traditional machine learning algorithms such as decision trees and support vector machines.
As a result, we avoided these problems by creating a model based on tabular data that respects a structured taxonomy. Our AI-driven approach models the unique elements of our business domain. For example, we categorize skills by need, such as required, preferred or optional, to fine-tune the automated matching process, then analyze the match between a candidate's skills and a job's requirements.
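A simplified sketch of that tiered-requirement idea follows; the tier names, weights, and skills are invented for illustration and are not our actual taxonomy.

```python
from dataclasses import dataclass
from enum import Enum

class Need(Enum):
    REQUIRED = 1.0   # missing one of these disqualifies the match
    PREFERRED = 0.5  # boosts the score but is not mandatory
    OPTIONAL = 0.2

@dataclass(frozen=True)
class SkillRequirement:
    skill: str  # canonical taxonomy entry, e.g. "python"
    need: Need

def skill_fit(requirements: list, talent_skills: set) -> float:
    """Score in [0, 1]; 0 if any required skill is missing."""
    if any(r.need is Need.REQUIRED and r.skill not in talent_skills
           for r in requirements):
        return 0.0
    total = sum(r.need.value for r in requirements)
    covered = sum(r.need.value for r in requirements if r.skill in talent_skills)
    return covered / total if total else 1.0

job = [SkillRequirement("python", Need.REQUIRED),
       SkillRequirement("django", Need.PREFERRED),
       SkillRequirement("graphql", Need.OPTIONAL)]
print(skill_fit(job, {"python", "graphql"}))  # ~0.71
```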
We then carefully align other role prerequisites with the characteristics of our talent pool, evaluating considerations such as time constraints and time zone compatibility, prior experience, and the candidate's desired role. Our methodology also incorporates robust protocols for talent curation and selection, and for annotating the database that underpins our training pipeline. Such meticulous steps are of paramount importance, especially for roles where data is scarce.
Another challenge was quality: how do we ensure that ATC's algorithms find the best candidates? We worked with recruiters and talent matchers to determine which qualities they should look for. That collaboration helped us develop the so-called Match Fitness Score, using information on more than 10,000 individuals and around 800 skills. Scores are based on 50 attributes, including time zone, years of experience, skills, and salary.
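One common way to combine per-attribute sub-scores into a single figure is a weighted aggregation, sketched below with a handful of invented attributes and weights; the real score draws on all 50 attributes and recruiter-informed weighting.

```python
# Hypothetical per-attribute sub-scores in [0, 1] for one candidate/job pair.
attribute_scores = {"timezone_overlap": 0.8, "years_experience": 0.9,
                    "skill_fit": 0.71, "rate_within_budget": 1.0}

# In practice, weights come from recruiter input and model tuning,
# not hardcoded values like these.
weights = {"timezone_overlap": 0.3, "years_experience": 0.2,
           "skill_fit": 0.4, "rate_within_budget": 0.1}

match_fitness = sum(weights[a] * s for a, s in attribute_scores.items())
print(f"Match Fitness Score: {match_fitness:.2f}")  # 0.80
```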
Handling incomplete data
Building a reliable Match Fitness Score also means overcoming holes in people's profiles, where important data is missing. For example, some people aren't clear about what they want to earn, which matters both for matching talent and for setting rates that meet the customer's budget expectations. For this particular case, we developed a talent rate recommendation service that identifies people with similar skills and then estimates the rate a person is likely to want based on those peers.
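One straightforward way to realize "find similar people, infer a rate" is nearest-neighbor search over skill vectors, as in this simplified sketch with invented profiles and rates; it is not the service's actual implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Rows: talent profiles as binary skill vectors (1 = has the skill).
skills = np.array([[1, 1, 0, 0],
                   [1, 1, 1, 0],
                   [0, 0, 1, 1],
                   [1, 0, 1, 0]])
hourly_rates = np.array([45.0, 60.0, 55.0, 50.0])  # known, stated rates

nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(skills)

# A new profile with no stated rate: estimate it from the nearest peers.
new_profile = np.array([[1, 1, 1, 1]])
_, idx = nn.kneighbors(new_profile)
estimated_rate = hourly_rates[idx[0]].mean()
print(f"estimated rate: ${estimated_rate:.2f}/hr")
```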
Building and refining the various components of ATC required a range of techniques, including dimensionality reduction, word embeddings, one-hot encoding, and data standardization. Often we applied several techniques to the same problem and compared the results to choose the most effective approach.
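Several of those techniques compose naturally in a scikit-learn pipeline. This sketch, with invented column names and toy data, one-hot encodes categorical fields, standardizes numeric ones, and then reduces dimensionality:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "primary_skill": ["python", "java", "python", "go"],
    "timezone": ["UTC-5", "UTC+1", "UTC+1", "UTC+8"],
    "years_experience": [4, 9, 6, 3],
    "hourly_rate": [45.0, 80.0, 60.0, 40.0],
})

preprocess = ColumnTransformer([
    # Dense output so PCA can consume the encoded matrix directly.
    ("cat", OneHotEncoder(handle_unknown="ignore", sparse_output=False),
     ["primary_skill", "timezone"]),
    ("num", StandardScaler(), ["years_experience", "hourly_rate"]),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("reduce", PCA(n_components=3)),  # dimensionality reduction
])

features = pipeline.fit_transform(df)
print(features.shape)  # (4, 3)
```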
We discovered that there is no shortage of useful machine learning techniques and methodologies to draw on when solving technology problems. The real challenge was ensuring that project participants had complete visibility into, and a clear understanding of, the business and processes involved in recruitment. There are so many nuances that getting the details wrong can lead to flawed search results.
To bring Match Fitness to customer-facing applications, we developed the Extensible Recommendation Service (ERS), a Python-based framework designed to provide endpoints that assess the suitability of talent for various job roles. Through ERS, our customer-facing applications gain insights including skill-based fit, talent response likelihood, estimated talent rates, and more. This enables our platform to efficiently identify, engage, and start conversations with the most suitable candidates for each role.
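A minimal sketch of what such an endpoint can look like; FastAPI is used here purely for illustration, and score_match is a placeholder for the real scoring logic behind ERS.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class MatchRequest(BaseModel):
    talent_id: str
    job_id: str

class MatchResponse(BaseModel):
    skill_fit: float
    response_likelihood: float
    estimated_rate: float

def score_match(talent_id: str, job_id: str) -> MatchResponse:
    # Placeholder: a real service would call the recommendation engine,
    # response service, and rate service described above.
    return MatchResponse(skill_fit=0.82, response_likelihood=0.67,
                         estimated_rate=58.0)

@app.post("/match-fitness", response_model=MatchResponse)
def match_fitness(req: MatchRequest) -> MatchResponse:
    return score_match(req.talent_id, req.job_id)
```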
What's next? LLM will be included in the product roadmap, but will likely support ATC rather than be the foundation. For example, we are considering combining LLMs with talent profiles in areas such as skills to improve candidate presentation to potential recruiters. We also use LLM to parse skill job descriptions and map them to taxonomies to simplify the job creation process.
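In that scheme the LLM handles extraction from free text, while mapping onto the taxonomy can be as simple as fuzzy string matching. A simplified sketch, with the LLM call mocked out since this part of the roadmap is still exploratory:

```python
import difflib

TAXONOMY = ["python", "typescript", "react", "postgresql", "kubernetes"]

def extract_skills(job_description: str) -> list:
    # Placeholder for an LLM call that pulls skill phrases from free text.
    return ["Python 3", "Postgres", "K8s"]

def map_to_taxonomy(raw_skills: list) -> dict:
    """Fuzzy-map raw skill strings onto canonical taxonomy entries."""
    mapping = {}
    for raw in raw_skills:
        hits = difflib.get_close_matches(raw.lower(), TAXONOMY, n=1, cutoff=0.6)
        mapping[raw] = hits[0] if hits else None  # None -> needs human review
    return mapping

print(map_to_taxonomy(extract_skills("Senior backend engineer wanted...")))
# {'Python 3': 'python', 'Postgres': 'postgresql', 'K8s': None}
```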
Generating insights from structured data
LLMs are getting a lot of attention, and many people are rushing to upskill. But it's worth considering that success with AI comes less from the technology and more from understanding the business problem you're trying to solve. Based on our experience, organizations in other industries that want to generate insights from their data should:
- Recognize the limits of large language models like ChatGPT when processing structured data. You can develop highly accurate, explainable models by investing instead in specialized models explicitly designed for structured data processing, or in traditional machine learning algorithms such as decision trees.
- Understand business nuances: enable project participants to fully understand and articulate the business and processes involved in domain-specific tasks. Pay attention to the nuances, as even the smallest details can have a big impact on the effectiveness of your data-driven solution.
- Devise a strategy to address data quality issues, including the possibility of developing structured taxonomies for your business domain. This lets you derive new, insightful data types, such as categorical information, from data that is noisy, missing, or incomplete in its original text format. Good examples in our field include job duties, skills, and spoken languages; properly extracting and combining them lets you build more powerful machine learning models.
- Adopt smaller models to estimate missing critical information and feed it into other related models or services. In our domain we did this, for example, to estimate talent characteristics such as responsiveness and rate.
- Work closely with domain experts to identify the relevant qualities needed to develop a scoring mechanism such as the Match Fitness Score. Incorporate human expertise to overcome data gaps and ensure algorithmic decisions remain functionally relevant.
- Explore different machine learning techniques and methodologies for the same problem, and compare the results to choose the most effective approach.
By incorporating these strategies, industries looking to leverage structured data for insights can develop robust data-driven solutions tailored to their specific needs, improving decision-making processes and outcomes.