Sunday, March 19, 2023

Oh Tableau, I want to comment my code... my mind is like a steel siv!

Commenting code

I like to comment my code.  My professors made it a point to ensure that everyone understood the importance of comments in my code and I have tried to faithfully apply what I have learned.  Unfortunately, when we drift to applications like Tableau, there really isn't a great mechanism to make comments that wouldn't be visible to the end user of the dashboard.  To be fair to Tableau, Excel doesn't do this well, and I suspect that PowerBI doesn't do it either.  While we could make comments in our underlying SQL, I'm trying to get out as many records as I possibly can so that we can have it available in our data sources for the unexpected future needs.

Our data sources are being pulled directly from Oracle.  They start my my initial generation of a query based on an entity-relation diagram that I created to provide information the looks like what you would typically find in a report writer in an HRIS system.  These queries are then turned over to our IT team who then create a view, and then a materialized view which Tableau is linked to.  As much as I wish I could control the entire process, the folks that I work with are amazing, talented, and extraordinarily responsive.  I am fortunate to work with these people because I am always learning new things from them.

The "code review"

While our HRIS team is composed of four people and a couple vacancies, currently it is only my supervisor and I who have the technical skills.  I rely on my supervisor to help with the initial reality check on our dashboards.  While we were going through some of my work, we were noticing that some of the numbers weren't matching up and I was having problems understanding and explaining why in the moment.  One of the dashboards was developed months ago with some ongoing modifications to both the underlying query and the filters have been applied in Tableau made it impossible for me to explain, with certainty, the exact changes needed to be able to run a query against it to get the same information that was shown in the Tableau dashboard.

The magic of commenting code lets you know why you made specific decisions.  You might want to know days, weeks, months, or even years after you wrote your code why your code does something.  Just because I know why I applied certain filters in Tableau now, doesn’t mean that I will remember three months down the road.  Beyond a simple conversation with my supervisor, with whom I have a great relationship, I would hate to put my Chief Human Resource Officer (CHRO) in a position where he was having a hard time defending the numbers.

The old pen and paper

This problem isn’t insurmountable.  We have the old ways of doing things.  I try to be good about keeping notes pertaining to how things are done.  Frequently, this is in a paper notebook.  Compared to code comments, this is far from ideal.  With code comments, they are concise and available where you need them.  This is never the case in my paper notebooks.  My paper notebooks don’t let me remove the comment command characters to try the original code.  While I understand that having inline comments in Tableau may be challenging from a development standpoint, it is a big miss from my perspective.

The errors

Outside of the “code review”, I was able to identify some of the errors or issue that we were coming up with.  It didn’t take a particularly long time, but it definitely took longer that I would have liked to while I was in a meeting.  The answer was pretty clear once I found it in my notes and setting up a method to validate it was a simple process.  It was a filter on five job titles but in self-imposed pressure of the meeting, I could not come up with a mechanism to verify it.  Past the meeting, I could easily identify it, look up the correct job title codes, and create a simple query to share with my supervisor to give him the correct answer.

The code review was also helpful for identifying some other issues that I had.  Sometimes it is nice to just chat about the work and get perspective from another set of eyes.  Errors might be dumb but because you made them, you might not see what is right before your eyes.  An example of this revolved around a couple formulas on my dashboard that were giving me the same result.  There was no reason for them to be the same result and I checked them a couple times, getting the same result every time.  After our meeting, I was able to check it again and I must have been clicking on the wrong formula because there it way, front and center, my formula was clearly referencing the wrong attribute in the relation.

Feedback is a gift

As an HR Manager, I have told people this very phrase for years.  Feedback is a gift.  Feedback allows you to learn the perspective of someone else.  If offers you a unique insight that you may not otherwise have.  Right or wrong matters not, it is an opinion.  You have a choice on what to do with this feedback.  I value these “code reviews” for the feedback gained.  I have learned a lot through these exercises.  Sometimes it is a matter of someone asking me the questions that I haven’t asked myself.  In the end, regardless of agreeing or not, I find the feedback extraordinarily valuable.

Feedback for the Tableau development team

Tableau development team, if I may, the ability to have comment my dashboards so that I can remember what decisions I made, when I made the, and why, would be very helpful.  I am sure that I am far from the only user who would appreciate this feature.  If it does exists, please let me know where because none of the Tableau experts at my institution were aware of such a feature in your otherwise amazing product.

Thursday, March 9, 2023

What the ETL....

We have a problem...

As part of our HR dashboard development we are putting together a time-to-fill (TTF) dashboard.  Time to fill is a measure of how long it takes to fill a position.  For our purposes, we are breaking it down to help identify where there may be bottlenecks in our process.  Specifically we are looking at the following intervals:

  • Requisition Receipt in HR to Posting
    • This is the date that the requisition is received in HR.  This may have been through a formal approval process or is not required to go through the formal approval process.  Either flow requires a requisition to be completed.  The requisition provides HR with the pertinent information required to be able to start the recruiting process.  This is measuring the time in days between when HR receives the requisition in the ERP until it is posted in our applicant tracking system (ATS).
  • Posting to Candidate Selection
    • The starting point of this measurement is when the recruiter enters the data received in the previous step and ends when the recruiter sets the status to candidate selection to indicate that the requesting department is screening and interviewing candidates for the position.  Some positions may take longer due to factors such as availability of talent in the market or internal posting requirements.
  • Candidate Selection to Offer
    • Candidate selection includes any resume screening, phone screens, interviews, etc... this process continues until there is an accepted offer.  The offer must be accepted because if we stopped this stage when an offer is made, someone may not take the offer, resulting in days or weeks being added to this step in the process.
  • Offer to Start
    • This phase remains largely outside of the organization's control.  While the organization can establish time limits on tasks such as drug testing, much of the process is within the candidate's hands.  Some candidates might get their paperwork completed and documentation submitted quickly while others will move much slower.  Some candidate want to give their current employer a longer notice period than others who may not need to give notice.  If the orientation session is full, we may be waiting an additional week for the next orientation session.
  • Time-To-Fill (TTF) is calculated from when HR receives the requisition until the offer letter is signed.
  • Time-To-Start (TTS) is calculated from when the HR receives the requisition until the employee reports to orientation.

This is great but... what is the problem?

We have a few factors working against us here.

  • Data is an afterthought.  Not only was what we wanted to calculate an afterthought, but so was what we would consider in the calculation, where the data is located, how it is stored, and how do we get it someplace where we can work with it.  In talking with my colleagues in our Data Professionals group, this is a common scenario.
  • Pooling of positions helps save the recruiting team time, but is makes the data messy.  Pooling is the process of combining several jobs or requisitions under a single posting.  This is great for recruiting because it makes the process of posting a position much quicker as they don't need to do as much work for the pooled position.  It is also nicer for applicants because they don't have to see that we might have 20 positions posted for a single title.  It is a great process and I highly support it, but it means that we have to do some extra work with our data.
  • Lack of consistency in our input.  Just because we use a number, doesn't mean it is is a number.  In an of itself, this is fine.  Unless we need to calculate a value, it is best to consider using a string rather than a number.  Strings made of numbers are great identifiers.  It is okay if we consistently use a letter to identify that a job is posted to an agency, but people aren't always consistent resulting in that agency identifier to be missing and our posting number then looks like any other posting number that isn't posted to an agency.  Considering that the field isn't a numeric field, recruiters sometimes decide that it would be helpful to have a note in the field rather than the requisition number.  Oh people, I love them.
  • Some of our data is in this system and some of it is in that system and we didn't pay to have the systems talk.  Of course, the API documentation doesn't indicate that all the data that we need is available in the API, either.  The report writer in the ATS appears to have a lot of data available that isn't available in the API.

Time for the duct tape an bubble gum, what the ETL!

Not getting the data from one system to another is not an option.  We need to figure out how to Extract, Transform, and Load (ETL) the data.  The data between the two systems are not compatible in their basic forms and need to be massaged.  This massaging process includes extracting it from the ATS, transforming it to be compatible with the data in our ERP, and then loading it into the ERP so that we can access it in Tableau.  We're still working through this process but are nearly there and I love sharing the challenges and learnings that I have had in the process.

Extraction 

Before we could even extract the data, we had to define what it is that we want.  As good as the report writer is, the way that the data is organized isn't particularly intuitive and the ATS documentation doesn't dig very deep into what data is stored where or what the data means.  As we are extracting a report that we are building, there was a lot of back and forth looking at the postings and the available attributes in the report writer.  After a lot of moving fields, verifying they are what I think they are, and more verification, we arrived at a draft report for export.  Of course, after exporting it, I realize that there was way too much data to run it through PowerQuery in Excel - so back to the report to set up some filters to limit the data and then back to Excel.  After a few of these back and forths, I was able to generate a report that was small enough to be reasonably transformed in PowerQuery in Excel.

Transformation

What can I say, the data is messy - much more messy than I anticipated.  I know enough about PowerQuery to be dangerous and I set off to make the data useful.  The biggest problem that I face is that the requisition system in our ERP generates a requisition number that is not a direct match for what is in our ATS.  The ERP's number is system generated and consistent, the ATS's is entered with human hands.  As previously mentioned, sometimes we have a number, sometimes we don't and sometimes we have additional characters or notes where the requisition identifier should be.  Let's take these one at a time.

  1. When we don't have a number for our individual position and in some cases we have a 1, I know the number is the posting number.  While consistency in procedure would be nice, we live in the real world and people are not as consistent as computers.  A simple formula helps us with this.  Creating a new column called FinalReqNo, we check to see if position number is null or 1, if it is, we copy the posting number to the new column, if not, we copy the existing value to the new column.
  2. When we have a number, it is usually straight forward and we should be able to use that number.  It should be a five digit number greater than 60000.  I hate to drop too many lines, but we have a lot of requisitions to have what I hope to be a statistically representative sample size.  To make it easy, I split the column at five characters.  I want to keep this data so that we can go back and look at what is being cut off and inform the recruiters about the records that we need to fix.
  3. At this point, running the steps in this order, we should have stripped out any text.  This lets me continue with the process and, if we get errors loading the data into Excel, I go back to see what the errors are and either ask the recruiters to fix the source data or tweak the formula to handle the errors.

Beyond this, I have procedures to reorder the columns and set data types.  Fairly simple tasks that help me when staring for this data for hours at a time.

Load

Finally, we can load the data.  If all goes well, the data will load without errors.  If we have errors, as mentioned above, I examine the errors and request the source data to be fixed or adjust the transformation steps to accommodate the errors.  At this point, I repeat the process with out ERP data but the ERP data is our source of truth when it comes to the requisition numbers so the work not so intensive and the process is largely changing some formatting.  I still run it through PowerQuery to ensure consistency in the data as I am repeating this process manually for the time being.

For now, I am merging the data in Excel using Excel's Data Model feature and creating a pivot table.  So far this is working but this is not the end game.  The end game is getting the data into Tableau with automated processes.  This ETL process serves as a blueprint for the development team.  As an analyst, I try to make "code-ready" projects to turn over to our developers.  This seems to make them happy as it is not what they usually get, but if they can develop my stuff quickly, it is likely that I can get priority in the development queue because my projects are quicker to develop.

Next steps

Unfortunately, our ATS is lacking in some desired functionality so we are stuck with duct tape, bubble gum, and bailing wire to hold things together.  While we would love to have an API or SFTP to be able to transfer this data, we are not so lucky and have to rely on the ATS delivering via an email schedule.  Once we get this process rolling and are happy with our results, we may have the ATS provider write us a query but that costs quite a bit of money and we don't want to pull the trigger on such an expense until we are happy with the data in our dashboard.

The email delivery also posed a problem because I had a table calculation, so I am rewriting some of the process to happen in a Tableau data source vs the ATS.  I am happy, however, to get to the point that we are at.  This will allow us to finish the development of our first three scorecards that focus on staffing.



Followers