Thursday, March 9, 2023

What the ETL....

We have a problem...

As part of our HR dashboard development we are putting together a time-to-fill (TTF) dashboard.  Time-to-fill is a measure of how long it takes to fill a position.  For our purposes, we are breaking it down to help identify where there may be bottlenecks in our process.  Specifically, we are looking at the following intervals:

  • Requisition Receipt in HR to Posting
    • This starts on the date the requisition is received in HR.  The requisition may have come through a formal approval process or may be exempt from one, but either flow requires a completed requisition, which gives HR the information needed to start recruiting.  This interval measures the time in days between when HR receives the requisition in the ERP and when it is posted in our applicant tracking system (ATS).
  • Posting to Candidate Selection
    • This measurement starts when the recruiter enters the data received in the previous step and ends when the recruiter sets the status to candidate selection, indicating that the requesting department is screening and interviewing candidates for the position.  Some positions may take longer due to factors such as availability of talent in the market or internal posting requirements.
  • Candidate Selection to Offer
    • Candidate selection includes any resume screening, phone screens, interviews, and so on, and this stage continues until there is an accepted offer.  The offer must be accepted because, if we stopped the clock when an offer was made, a declined offer could add days or weeks to this step in the process.
  • Offer to Start
    • This phase remains largely outside of the organization's control.  While the organization can establish time limits on tasks such as drug testing, much of the process is in the candidate's hands.  Some candidates get their paperwork completed and documentation submitted quickly while others move much slower.  Some candidates want to give their current employer a longer notice period than others, who may not need to give notice at all.  If the orientation session is full, we may be waiting an additional week for the next one.
  • Time-To-Fill (TTF) is calculated from when HR receives the requisition until the offer letter is signed.
  • Time-To-Start (TTS) is calculated from when HR receives the requisition until the employee reports to orientation.  (Both calculations are sketched in code below.)
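
To make the date arithmetic concrete, here is a minimal pandas sketch of these interval calculations.  Our real milestone fields live across the ERP and ATS, so every column name and date below is a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical milestone dates; the real fields are split across the ERP and ATS.
reqs = pd.DataFrame({
    "ReceivedInHR":       pd.to_datetime(["2023-01-02", "2023-01-09"]),
    "Posted":             pd.to_datetime(["2023-01-05", "2023-01-16"]),
    "CandidateSelection": pd.to_datetime(["2023-01-20", "2023-02-06"]),
    "OfferAccepted":      pd.to_datetime(["2023-02-03", "2023-02-20"]),
    "StartDate":          pd.to_datetime(["2023-02-20", "2023-03-06"]),
})

# One duration (in days) per stage, to surface where the bottlenecks are.
reqs["ReceiptToPosting"]   = (reqs["Posted"] - reqs["ReceivedInHR"]).dt.days
reqs["PostingToSelection"] = (reqs["CandidateSelection"] - reqs["Posted"]).dt.days
reqs["SelectionToOffer"]   = (reqs["OfferAccepted"] - reqs["CandidateSelection"]).dt.days
reqs["OfferToStart"]       = (reqs["StartDate"] - reqs["OfferAccepted"]).dt.days

# Overall measures: TTF stops at the accepted offer, TTS at orientation.
reqs["TTF"] = (reqs["OfferAccepted"] - reqs["ReceivedInHR"]).dt.days
reqs["TTS"] = (reqs["StartDate"] - reqs["ReceivedInHR"]).dt.days
```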

This is great but... what is the problem?

We have a few factors working against us here.

  • Data is an afterthought.  Not only was what we wanted to calculate an afterthought, but so were the questions of what we would consider in the calculation, where the data is located, how it is stored, and how we get it someplace where we can work with it.  In talking with my colleagues in our Data Professionals group, this is a common scenario.
  • Pooling of positions helps save the recruiting team time, but it makes the data messy.  Pooling is the process of combining several jobs or requisitions under a single posting.  This is great for recruiting because it makes posting a position much quicker, as they don't need to do as much work for the pooled position.  It is also nicer for applicants, who don't have to see that we might have 20 positions posted for a single title.  It is a great process and I highly support it, but it means that we have to do some extra work with our data.
  • Lack of consistency in our input.  Just because we use a number doesn't mean it is a number.  In and of itself, this is fine: unless we need to calculate a value, it is best to use a string rather than a number, and strings made of numbers are great identifiers.  It is okay if we consistently use a letter to identify that a job is posted to an agency, but people aren't always consistent, so the agency identifier is sometimes missing and that posting number then looks like any other posting number that isn't posted to an agency.  And since the field isn't a numeric field, recruiters sometimes decide it would be helpful to put a note in the field rather than the requisition number.  Oh people, I love them.
  • Some of our data is in this system and some of it is in that system, and we didn't pay to have the systems talk.  Of course, the API documentation doesn't indicate that all the data we need is available through the API, either.  The report writer in the ATS appears to have a lot of data available that isn't exposed in the API.

Time for the duct tape and bubble gum, what the ETL!

Leaving the data stranded in separate systems is not an option, so we need to figure out how to Extract, Transform, and Load (ETL) it.  The data in the two systems is not compatible in its basic form and needs to be massaged.  This massaging process includes extracting it from the ATS, transforming it to be compatible with the data in our ERP, and then loading it into the ERP so that we can access it in Tableau.  We're still working through this process but are nearly there, and I am happy to share the challenges and lessons learned along the way.

Extraction 

Before we could even extract the data, we had to define what it is that we want.  As good as the report writer is, the way the data is organized isn't particularly intuitive, and the ATS documentation doesn't dig very deep into what data is stored where or what the data means.  Since the extraction runs through a report that we build ourselves, there was a lot of back and forth between the postings and the available attributes in the report writer.  After a lot of moving fields around and verifying that they are what I think they are, we arrived at a draft report for export.  Of course, after exporting it, I realized that there was far too much data to run through PowerQuery in Excel - so it was back to the report to set up some filters to limit the data, and then back to Excel.  After a few of these round trips, I was able to generate a report small enough to be reasonably transformed in PowerQuery.

Transformation

What can I say, the data is messy - much messier than I anticipated.  I know enough about PowerQuery to be dangerous, and I set off to make the data useful.  The biggest problem I face is that the requisition system in our ERP generates a requisition number that is not a direct match for what is in our ATS.  The ERP's number is system generated and consistent; the ATS's is entered by human hands.  As previously mentioned, sometimes we have a number, sometimes we don't, and sometimes we have additional characters or notes where the requisition identifier should be.  Let's take these one at a time.

  1. When we don't have a number for the individual position (or, in some cases, have a 1), I know the number we do have is the posting number.  While consistency in procedure would be nice, we live in the real world and people are not as consistent as computers.  A simple formula helps us here: in a new column called FinalReqNo, we check whether the position number is null or 1; if it is, we copy the posting number into the new column, and if not, we copy the existing value.  (Steps 1 and 2 are sketched in code after this list.)
  2. When we do have a number, it is usually straightforward and we should be able to use it.  It should be a five-digit number greater than 60000.  I hate to drop rows, but we have enough requisitions that what remains should still be a statistically representative sample.  To make it easy, I split the column at five characters.  I keep the cut-off portion so that we can go back, look at what was trimmed, and tell the recruiters which records need to be fixed.
  3. At this point, running the steps in this order, we should have stripped out any text.  This lets me continue with the process and, if we get errors loading the data into Excel, I go back to see what the errors are and either ask the recruiters to fix the source data or tweak the formula to handle the errors.
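
The actual transformation lives in PowerQuery, but the logic of steps 1 and 2 translates into a short pandas sketch.  The column names PositionNumber, PostingNumber, and Overflow are hypothetical stand-ins; only FinalReqNo and the 60000 threshold come from the description above.

```python
import numpy as np
import pandas as pd

def clean_requisitions(df: pd.DataFrame) -> pd.DataFrame:
    """A sketch of transformation steps 1 and 2 described above."""
    out = df.copy()

    # Step 1: a missing position number (or a literal 1) means the value we
    # actually want is the posting number.
    pos = out["PositionNumber"]
    use_posting = pos.isna() | (pos.astype(str).str.strip() == "1")
    out["FinalReqNo"] = np.where(use_posting, out["PostingNumber"], pos)

    # Step 2: keep the first five characters as the requisition number and
    # park the remainder so recruiters can fix the source records later.
    as_text = out["FinalReqNo"].astype(str)
    out["FinalReqNo"] = as_text.str.slice(0, 5)
    out["Overflow"] = as_text.str.slice(5)

    # Anything that still isn't a five-digit number above 60000 gets flagged
    # for follow-up instead of silently dropped.
    nums = pd.to_numeric(out["FinalReqNo"], errors="coerce")
    out["NeedsReview"] = nums.isna() | (nums <= 60000)
    return out
```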

Beyond this, I have steps to reorder the columns and set data types.  Fairly simple tasks that help me when staring at this data for hours at a time.

Load

Finally, we can load the data.  If all goes well, the data will load without errors.  If we have errors, as mentioned above, I examine them and either request that the source data be fixed or adjust the transformation steps to accommodate them.  At this point, I repeat the process with our ERP data, but the ERP is our source of truth when it comes to requisition numbers, so the work is not as intensive and largely amounts to changing some formatting.  I still run it through PowerQuery to ensure consistency in the data, as I am repeating this process manually for the time being.

For now, I am merging the data in Excel using Excel's Data Model feature and creating a pivot table.  So far this is working, but it is not the end game.  The end game is getting the data into Tableau with automated processes, and this ETL process serves as a blueprint for the development team.  As an analyst, I try to hand over "code-ready" projects to our developers.  This seems to make them happy, as it is not what they usually get, and if my projects are quicker to develop, I am more likely to get priority in the development queue.

Next steps

Unfortunately, our ATS is lacking some desired functionality, so we are stuck with duct tape, bubble gum, and baling wire to hold things together.  While we would love to have an API or SFTP to transfer this data, we are not so lucky and have to rely on the ATS delivering reports via an email schedule.  Once we get this process rolling and are happy with our results, we may have the ATS provider write us a query, but that costs quite a bit of money and we don't want to pull the trigger on such an expense until we are happy with the data in our dashboard.

The email delivery also posed a problem because I had relied on a table calculation, so I am rewriting part of the process to happen in a Tableau data source rather than in the ATS.  I am happy, however, to have gotten to the point that we are at.  This will allow us to finish the development of our first three scorecards, which focus on staffing.



Saturday, February 25, 2023

Meet my new friend Tableau

I'd like to introduce you to my new friend, Tableau.  Why am I talking about Tableau and not the ECA and IHRSM projects?  It's quite simple: as of today, there is nothing new to publicly update about our ECA and IHRSM implementation projects, although there is some background action.  It is really exciting to see these projects moving ahead, as they will take our institution to a new level; the improvements to the employee experience should be tremendous, and the experience for the people who work in HR and Payroll should be greatly improved, too.

 

My new friend Tableau... a little background information.

Our institution has been trying to become a more data-centric organization.  Every department works at its own speed, given its own needs, talents, and abilities.  The institution, as a whole, has seen enough value in being data-centric that we have invested in Tableau and a small team that leads the charge to build dashboards for various departments.  Better yet, they are empowering departments to build their own dashboards, so when I learned about that, I jumped at the opportunity to take their Tableau training.  Unfortunately, I couldn't develop the interest in my previous department to take on a dashboard project.

For the uninitiated, Tableau, according to their website, is "a visual analytics platform transforming the way we use data to solve problems—empowering people and organizations to make the most of their data."[1]

I have always been fascinated with HR metrics and have pushed the companies I have worked for to pay more attention to what the HR numbers are saying.  My typical experience is that any HR numbers that were published served a financial or basic operational need rather than telling the story of what is happening with our workforce.  When the opportunity came up I volunteered and, wow, have I learned a lot along the way.

Through a series of meetings, we decided what we wanted to display.  Using Tableau wasn't off my radar, but I have done similar projects using Excel, typically for my own use, to be able to talk about where we stand within our HR practice.  Since I have done a lot of this with Excel, it was a logical starting place for me... big mistake.  While I won't say that Excel can't handle large data sets, these are the largest data sets I have ever worked with, and Excel was sluggish, at best, processing them.  Stripping down the data sets meant eliminating information that I thought I might need as we refined our requirements.  Given these struggles, I took it upon myself to start developing the dashboards in Tableau.

 

Greetings and Farewells

Our first dashboard is called "Greetings and Farewells" and presents a basic look at:

  1. Hires - last 13 months and last seven years
  2. Separations - last 13 months and last seven years
  3. Separation reasons for the last 13 months

This is all fairly straightforward information and can easily be compiled by any organization to show whether the recruiting function is keeping up with people leaving the company... after all, nobody stays forever.

As Tableau provides an interactive display, I encourage you to visit the original on the Tableau Public website to view the dashboard as I originally intended.

 

For this blog, I used a data set that I found on Kaggle.com as using my institution's data wouldn't be appropriate for this purpose.  

 

How does this differ from the real data?

Given that I am using a different data set, there are some differences in how I had to display this data.  Primarily, this data is static: it will not change, whereas in real life the data set will continue to grow.  I am lazy (read: efficient) and want to spend as little time as possible maintaining reports.  In both cases I set up a filter to limit the display to the previous seven years and the last 13 months.  With live data, we needed to stop the year display at the end of the preceding year for the annual chart and at the end of the preceding month for the monthly display.  This is established with a couple of calculated fields applied to the filters; the underlying date logic is sketched below.
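
The real implementation is a pair of Tableau calculated fields, but the window logic is simple enough to sketch in Python.  The window boundaries are taken from the description above; everything else here is an assumption for illustration.

```python
import pandas as pd

today = pd.Timestamp.today().normalize()

# Monthly view: the last 13 complete months, ending with the preceding month.
month_end = today.replace(day=1)              # exclusive upper bound
month_start = month_end - pd.DateOffset(months=13)

# Annual view: the last seven complete years, ending with the preceding year.
year_end = pd.Timestamp(today.year, 1, 1)     # exclusive upper bound
year_start = year_end - pd.DateOffset(years=7)

# A record (a hire or a separation) is displayed only if it falls in the window.
def in_monthly_window(event_date: pd.Timestamp) -> bool:
    return month_start <= event_date < month_end

def in_annual_window(event_date: pd.Timestamp) -> bool:
    return year_start <= event_date < year_end
```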

On our institutional copy, we have additional filters.  The departmental filter matters less because we are much more interested in how this data relates to our executives, so our filters work up the executive hierarchy rather than a more traditional department structure.

 

Creating a data source

The Kaggle data set was also a little easier to use in that I didn't have to create it.  Our HRIS is homegrown, and identifying hire and term dates is a little more complex, particularly once you consider the multiple employers, non-employees, students, and so on that exist in the live data.  To create our Tableau data source, I wrote a few SQL queries hitting multiple tables to capture the hire, separation, and promotion data; each query applies a category via a case statement and returns a common set of attributes, and a union all between the queries creates one final query that pulls the data together.  (A sketch of this pattern follows.)
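
The production version is SQL, but the same tag-and-stack pattern is easy to illustrate in pandas: run one query per event type, stamp it with a category (the case-statement step), trim to a common set of attributes, and stack the results (the union-all step).  Every table and column name below is a hypothetical stand-in.

```python
import pandas as pd

# The common set of attributes every per-event-type query must return.
COMMON_COLS = ["EmployeeId", "EventDate", "Department", "Category"]

def tag(df: pd.DataFrame, category: str) -> pd.DataFrame:
    """Mirror the SQL case statement: stamp each row with its event
    category, then trim to the shared attribute set."""
    return df.assign(Category=category)[COMMON_COLS]

# Stand-ins for the per-event-type queries (hypothetical data).
hires = tag(pd.DataFrame(
    {"EmployeeId": [101], "EventDate": ["2023-01-09"], "Department": ["IT"]}), "Hire")
seps = tag(pd.DataFrame(
    {"EmployeeId": [102], "EventDate": ["2023-01-13"], "Department": ["HR"]}), "Separation")
promos = tag(pd.DataFrame(
    {"EmployeeId": [103], "EventDate": ["2023-01-20"], "Department": ["Finance"]}), "Promotion")

# Equivalent of the final union all that pulls the data together.
lifecycle = pd.concat([hires, seps, promos], ignore_index=True)
```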

Once the team was satisfied with the information, we created materialized views that we could then link to Tableau as a data source.  Ultimately, we are adding several queries to this data source to represent the employment life cycle, which I will discuss more in a future post.



[1] https://www.tableau.com/why-tableau/what-is-tableau

 

Sunday, February 5, 2023

The journey has begun...

Reason for this blog’s existence:

My original thought was to chronicle my journey into the world of computer and information science.  While deciding what to take during the spring 2023 semester, I started thinking about some of the projects that we have going on at work.  As these are fairly major projects impacting over 10,000 employees and contractors, I thought they would make an excellent educational opportunity.

This blog will include information about the implementation of two large software implementation projects plus some of my other work and random thoughts about things that interest me.

The intent is to structure the blog with the following sections:
  1. Background information
  2. Software Implementation Project 1 - ECA
  3. Software Implementation Project 2 - IHRSM
  4. HR Metrics and Analytics
  5. Tableau
  6. Other areas of interest
While I have never blogged or journaled before, I expect these aspects to be as much a part of the learning journey as the actual software implementation projects.  Furthermore, as Dr. Covey explains in his book, The 7 Habits of Highly Effective People, one way to get better at something is to teach it.  I expect that, as I explain things in this blog, I will gain a deeper knowledge of the work beyond where I would be if I were only performing the work.

I hope those who may be reading along find my journey as rewarding as I expect the journey to be.
