Remix Institute

2 Steps to Sharpen Wisdom and Discernment

Douglas Pestana - @DougVegas — Mon, 07 Jul 2025 05:04:56 +0000

There is a barrier to discernment when seeking wisdom and knowledge.

Many people have a desire to become wise, knowledgeable, and informed.

They have a panoply of reasons for doing so, including but not limited to self-improvement, advancing in their careers, securing and protecting their families, churches, or communities, and giving the world a better future.

But in their search for knowledge and wisdom, most of those same people face a barrier that holds them back.

This barrier causes them to dismiss entire ideas or belief systems due to their flawed origins or associations, rather than critically discerning and extracting valuable elements from them through a systematic process.

It’s easy to dismiss entire ideas or beliefs simply because we don’t like where they came from or who supports them. This mental bias, while emotionally satisfying, is intellectually lazy.

We all know some of the famous idioms surrounding this, each emphasizing discernment, critical thinking, separation of value from waste, or not rejecting something entirely due to a defect:

“Don’t throw the baby out with the bathwater.”
“Separate the wheat from the chaff.”
“Don’t let one bad apple spoil the bunch.”
“Take it with a grain of salt.”

The entirety of a subject, concept, belief system, religion, or philosophy should not be thrown out just because of a few bad components. This is contrary to wisdom. One must exercise good discernment to determine which parts to absorb, which to reject, and which to make uniquely one’s own.

This barrier to discernment is the rejection of ideas because of their source-origin or association.

What are the reasons we’d dismiss the entirety of a subject, concept, belief system, religion, or philosophy just because of a few bad components? It’s all due to 2 logical fallacies humans employ:

Genetic Fallacy. Rejecting an idea or belief solely based on its origin or source rather than its merit or validity.
Association Fallacy. Dismissing an entire system of thought because it is associated with people or views that carry negative impressions.

Basically, humans have a tendency to reject an idea because of its source-origin (Genetic Fallacy) or association (Association Fallacy). But just because something comes from a flawed source doesn’t mean it’s without value.

Wisdom requires parsing, sorting and sifting, not blind rejection.

Epistemic Parsing is a systematic process for discerning what is useful and discarding what is useless.

What’s a simple systematic process for discerning what is useful and discarding what is useless? That is where Epistemic Parsing comes in.

Our Epistemic Parsing System is a systematic method for analyzing ideas, concepts, or beliefs, evaluating them for truth, evidence, and consistency, and then constructing a belief system based on that analysis.

Epistemic refers to knowledge, belief, and certainty. Parsing means analyzing something in detail by breaking it down into its component parts to understand its structure or meaning; the term is also used in computational linguistics and natural language processing (NLP).

Step 1 in Epistemic Parsing: Use the “Lee Sifter” as an analytical sifter for discernment.

Bruce Lee wrote in “Tao of Jeet Kune Do” (published in 1975): “Absorb what is useful, reject what is useless and add what is essentially your own.”

When evaluating a concept, ask:

“Is there anything in here worth keeping?”
“Does this belief, concept, or framework contain elements I can isolate, examine, and possibly retain even if other parts are flawed?”

When designing an Axiomatic System of Philosophies that can withstand the test of time, you may find its building blocks in places you wouldn’t think to look.

It is OK to study subjects that some consider controversial but others find helpful, in order to extract what is helpful in them. Saint Basil the Great wrote the following in his “Address to Young Men on the Right Use of Greek Literature,” written around 367 AD:

Now, then, altogether after the manner of bees must we use these writings, for the bees do not visit all the flowers without discrimination, nor indeed do they seek to carry away entire those upon which they light, but rather, having taken so much as is adapted to their needs, they let the rest go.

So we, if wise, shall take from heathen books whatever befits us and is allied to the truth, and shall pass over the rest. And just as in culling roses we avoid the thorns, from such writings as these we will gather everything useful, and guard against the noxious.

Think of it as the skill of saying:

“This part is valuable, even if the rest is flawed.”
“This belief may be rooted in a disreputable source or view, but its logic holds.”

Step 2 in Epistemic Parsing: To discern efficiently, adopt postulates and axioms based on their predictive power.

Postulate or axiom as prediction. What does that mean?

Does holding a given postulate or axiom help you predict the future or explain an outcome? If so, it is usually the correct postulate or axiom to hold.

Hold the correct axioms or postulates, and you’ll be able to predict the future.

If a new piece of information is obtained, align it with a known postulate or axiom in order to discern if it’s good or bad information.

If the postulate or axiom is not predictive, discard it or revise it.

The philosopher Karl Popper addresses this in his concept of falsifiability, set out in his 1959 book “The Logic of Scientific Discovery.”

Falsifiability means that a theory, hypothesis, or idea can, in principle, be proven false: some possible empirical evidence would logically contradict it. Empirical means verified through observation or experience.

For example, the postulate “all swans are white” is falsifiable, because observing a single black swan would prove it false.
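A minimal Python sketch of that logic: a universal claim such as “all swans are white” survives only as long as every observation agrees with it, and a single counterexample is enough to refute it. The function `find_counterexample`, the predicate `is_white`, and the sample observations are hypothetical, chosen only for illustration.

```python
def find_counterexample(claim, observations):
    """Return the first observation that contradicts a universal claim, or None.

    `claim` is a predicate that must hold for every observation.
    No number of agreeing observations proves the claim;
    one disagreeing observation falsifies it.
    """
    for obs in observations:
        if not claim(obs):
            return obs
    return None


def is_white(swan):
    # Hypothetical predicate for the claim "all swans are white".
    return swan["color"] == "white"


# Hypothetical observations, for illustration only.
european_swans = [{"id": 1, "color": "white"}, {"id": 2, "color": "white"}]
with_black_swan = european_swans + [{"id": 3, "color": "black"}]

print(find_counterexample(is_white, european_swans))   # None: not falsified (yet)
print(find_counterexample(is_white, with_black_swan))  # {'id': 3, 'color': 'black'}
```

Note the asymmetry Popper emphasizes: the first call returning `None` does not prove the claim true; it only means the claim has survived the observations made so far.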

If a belief has withstood attempts to falsify it and helps explain and predict outcomes, adopt it as a postulate or axiom. Use the “Lee Sifter” to absorb it or make it uniquely your own.

If a belief has been falsified or doesn’t help explain outcomes, it’s useless. Use the “Lee Sifter” to reject or revise it.

Remix Institute’s Epistemic Parsing System overcomes the discernment barriers when seeking wisdom and knowledge.

The steps described above, the Lee Sifter and Postulate or Axiom As Prediction, form the basis of our Epistemic Parsing System, which teaches you to discern wisdom and knowledge and to think more critically.

Discernment isn’t innate. It’s learned.

The Epistemic Parsing System is best understood as a practical, meta-cognitive skill that sifts signal from noise, both in external information and within your own beliefs and assumptions. It is an essential skillset for critical thinking and analytical reasoning.

It’s a real-world cognitive process that can be used by virtually anyone: philosophers, analysts, scientists, skeptics, businesspeople, working professionals, etc.

It systematizes breaking complex ideas and systems into their component parts in order to separate, categorize, and evaluate those parts independently for truth, wisdom, or utility.

It involves two activities:

Disaggregation: taking a whole and splitting it into meaningful elements.
Discernment: making an informed judgment call on what to keep, reject, or modify.

It secures a signal’s integrity, increases signal-to-noise ratios, extracts truths and wisdom, and stratifies falsehoods.

With the system and tools above, you’ll avoid lazy thinking and progress toward an Axiomatic System of Philosophies that is resilient, refined like gold, and immutable enough to withstand the test of time.

Remix Institute Membership

If you want to improve your discernment and critical thinking skills and apply the Epistemic Parsing System tools described above, then take the “Investigation Upskilling and Root Cause Analysis” course from Remix Institute.

Sign Up For Our Membership and Access The Course Here

The post 2 Steps to Sharpen Wisdom and Discernment appeared first on Remix Institute.

AI-Powered Virtual Assistants: Beyond Siri and Alexa

Sapna Naga — Wed, 31 Jul 2024 10:36:00 +0000

“We can only see a short distance ahead, but we can see plenty there that needs to be done.”
― Alan Turing

AI virtual assistants have become indispensable tools in both personal and professional settings. While Siri and Alexa are household names, the latest advancements in AI-powered virtual assistants offer capabilities that far surpass these early pioneers.

Remix Institute’s founder, Douglas Davila-Pestana, had a good point when he posted on X: “Even though Amazon sold millions of Alexa devices, I’ve never seen anyone actually use an Alexa.”

Alexa used to be so popular that it became a go-to Christmas gift, often chosen when people had no idea what else to give. Now it’s an obsolete virtual assistant in the age of smarter AI assistants like ChatGPT and LegalMente AI’s Para.

So, let’s explore a few innovative AI virtual assistants on the market today that make Siri and Alexa look like children’s toys.

The New Wave of Virtual Assistants

Virtual assistants today are not just about setting reminders or playing music. They are integrated into sophisticated workflows, assisting in complex tasks and providing specialized support across various industries.

Healthcare: Personalized Patient Care

One of the most promising advancements is in healthcare. AI-powered virtual assistants are now capable of managing patient records, scheduling appointments, and even providing preliminary diagnoses based on patient symptoms. For instance, Docus AI Doctor offers personalized health guidance with 24/7 availability. It provides diagnoses, treatment options, and health reports based on user conversations. The AI is backed by natural language models and validated by over 300 top doctors, empowering individuals to manage their health concerns conveniently and helping those who have difficulty accessing in-person healthcare.

Financial Management: Streamlining Expense Management

In the realm of financial management, AI-powered virtual assistants are revolutionizing how businesses handle expenses, and Fyle stands out as a top recommendation for AI-driven expense management. Available on iOS, Android, and desktop, it offers an automatic data extraction engine that organizes expense data from receipts and invoices, a custom approval hierarchy for streamlined expense approvals, and a comprehensive expense audit trail for transparency and compliance. These features make it one of the leading solutions for intelligent expense management.

Digital Management: Pioneering Personalized AI

Delphi AI and AI.XYZ are transforming personalized AI assistance. Delphi AI creates interactive, personalized digital replicas or “clones,” encapsulating individuals’ knowledge and personality, allowing users to engage dynamically with their audience across platforms like websites and social media. Ideal for influencers, educators, and business leaders, Delphi’s clones provide tailored responses and robust analytics for refining content strategy and monetization. Users maintain control over interactions and data, ensuring security and customization. Meanwhile, AI.XYZ offers personalized AI assistance by learning about users’ lifestyles to provide proactive support, enhancing everyday activities such as planning, communication, and personal wellness for a seamless and efficient user experience.

Case Study: Para – LegalMente AI’s AI Paralegal Assistant

LegalMente AI™ uses artificial intelligence to reduce the cost of legal work for small businesses, startups, healthcare providers, and individuals. LegalMente AI’s Para is an AI paralegal assistant that exemplifies the next generation of virtual legal assistants.

Features of Para:

Legal Question Answering:

Para can provide accurate and timely answers to legal questions so you don’t have to spend hundreds of dollars to ask a lawyer.

Business Insights:

Para offers valuable insights to support informed business decisions.

Document Analysis:

Para efficiently analyzes contracts, data files, and various legal documents, regardless of file format.

Business Formation Assistance:

Para helps guide users through the process of forming a business in the US such as an LLC or Corporation.

Impact of Para:

Para is fine-tuned with specialized expertise in the legal domain and guardrailed to prevent hallucinations, making it more reliable for legal questions than Google Gemini or ChatGPT. Para also maintains political neutrality and will politely decline to answer any questions related to politics, ensuring focused and unbiased legal assistance. Para assists small businesses, startups, healthcare providers, and individuals with their legal work without the burden of huge legal bills.

Meet Para, your free AI Paralegal assistant.

Conclusion

AI-powered virtual assistants have moved beyond basic functionalities, such as playing music or telling the weather, to become critical tools in various domains. Whether it’s healthcare, expense management, digital clones, or legal assistance, these advancements are making tasks easier, more efficient, and more personalized. As AI technology continues to evolve, we can anticipate even more innovative applications that will reshape how we interact with our digital environments.

Remix Institute Membership

If you liked this article, join our free Membership as a Stádas Genesis member for access to elite professionals and exclusive courses on getting started with R, Julia, data science, AI, and six-figure job opportunities. Elevate your skills and gain an edge in the industry. Sign up now for the elite learning experience.

The post AI-Powered Virtual Assistants: Beyond Siri and Alexa appeared first on Remix Institute.

Breaking Barriers: Low-Code/No-Code and AI in Application Development

Sapna Naga — Sun, 30 Jun 2024 14:44:00 +0000

“Technology is nothing. What is important is that you have a faith in people, that they’re basically good and smart, and if you give them tools, they’ll do wonderful things with them.”
― Steve Jobs, Co-Founder of Apple Inc.

In today’s fast-paced tech world, businesses are always looking for ways to innovate and simplify their processes. Low-code/no-code platforms are game-changers, breaking down the barriers to software development and allowing everyone from experienced developers to tech-savvy professionals to create applications.

Now, imagine taking this a step further with Generative AI, which is transforming application development. The combination of low-code/no-code platforms and generative AI is reshaping how we build apps, making advanced AI capabilities accessible to everyone. Whether you’re a developer, a small business owner, or a startup enthusiast, embracing this shift is key to staying ahead in the tech landscape.

Low Code/No Code Platforms Lower The Barrier To Entry for AI

Traditionally, developing AI applications required deep technical expertise in programming languages like Python, R, or Julia. However, low-code/no-code platforms are changing this narrative. These platforms provide visual software development environments where users can drag and drop components, visually design workflows, and configure AI models through intuitive interfaces. This accessibility lowers the barrier to entry, enabling even non-technical users, including entrepreneurs and domain experts, to build, automate, and deploy applications quickly without extensive coding knowledge. By simplifying the software development process, these platforms foster rapid innovation and make advanced technology more accessible to a broader audience.

Businesses Can Use Generative AI To Accelerate Innovation and Reduce Costs

Generative AI, a subset of artificial intelligence, refers to algorithms capable of creating new content, whether code, text, images, or even music, based on patterns learned from existing data. This technology is transforming low-code/no-code platforms by going beyond drag-and-drop functionality: it can intelligently recommend and generate code snippets, workflows, and UI components based on user input and intent. Generative AI can also automate the creation of complex models, learning from data to generate new content autonomously. This reduces the need for meticulous coding and data preprocessing, accelerating development cycles, reducing costs, and fostering innovation. Businesses can produce creative outputs quickly and efficiently, making generative AI an invaluable tool in software development.

How Generative AI Enhances Low Code/No Code Platforms

Automated Code Generation:

Generative AI can analyze user requirements and generate code snippets automatically, significantly speeding up the development process and reducing errors. This capability extends the functionality of low-code/no-code platforms, making it easier for users to add complex features without extensive programming knowledge.

Enhanced User Interfaces:

Designing intuitive and user-friendly interfaces can be challenging. Generative AI assists in creating visually appealing layouts and designs based on user input and existing design principles, ensuring a seamless user experience.

Intelligent Suggestions:

As users build their applications, Generative AI provides real-time suggestions for improvements, alternative approaches, and predictions of next steps. This feature is particularly beneficial for those with limited coding experience, guiding them towards creating robust applications.

Advanced Data Handling:

Generative AI enhances the platform’s ability to handle and process data efficiently. Users can integrate advanced data analytics functionalities without manually setting up complex data models, making it easier to gain insights and drive decision-making.

Examples of Leading Companies and Tools in Low-Code/No-Code AI Platforms

The integration of low-code/no-code platforms with AI is democratizing AI, making it accessible to a wider range of users, from data analysts to non-developers. By simplifying the development process and reducing costs, these platforms enable rapid prototyping and deployment of AI applications across various industry sectors.

LegalMente AI is a free no-code, user-friendly platform that helps its customers use AI and natural language processing to analyze legal documents as well as large volumes of Word, Excel, or PDF files that a business might have.

Obviously AI allows users to create AI models quickly without writing code. It simplifies data analysis and model creation, making AI accessible to those with limited technical expertise.

Azure ML Designer provides a suite of tools and a drag-and-drop interface for building custom AI and machine learning models, emphasizing ease of use and integration with Microsoft Azure’s cloud ecosystem.

Lobe is designed to help users create image recognition models easily. It features a drag-and-drop interface and integrates with more advanced tools like Azure AI for further development.

DataRobot and H2O.ai each offer a comprehensive low-code/no-code platform for automating AI and machine learning processes, making it easier for users to build and deploy models across various industries, including banking, retail, and healthcare.

Tailored for creatives, Runway AI provides tools for generating and manipulating media such as images, audio, and 3D models through a simple drag-and-drop interface.

KNIME is free, open-source, low-code/no-code software designed for transforming, modeling, and visualizing data. KNIME is suited to analyses of different types and complexity, such as predictive modeling, generative AI, and image analysis.

RapidMiner supports both data analytics workflows and machine learning models, providing a drag-and-drop interface that is versatile and user-friendly.

AI and Low Code/No Code Are Shaping the Future of Application Development

The fusion of Generative AI with Low Code/No Code platforms is set to transform the landscape of application development. By lowering the barriers to entry and enhancing the capabilities of users, these technologies empower businesses to innovate and adapt in an increasingly digital world. As AI continues to evolve, we can expect even more sophisticated tools and functionalities that will further democratize the development process.

In conclusion, the synergy between Low Code/No Code platforms and Generative AI presents an arbitrage opportunity for businesses of all sizes. By embracing these technologies, organizations can accelerate their digital transformation journeys, drive innovation, and maintain a competitive edge in the market even on a shoestring budget.

Remix Institute Membership

The post Breaking Barriers: Low-Code/No-Code and AI in Application Development appeared first on Remix Institute.

The Potential Of Generative AI In Transforming Healthcare

Sapna Naga — Mon, 29 Apr 2024 20:18:47 +0000

“We need to design and build AI that helps healthcare professionals be better at what they do. The aim should be enabling humans to become better learners and decision-makers.”
― Mihaela van der Schaar, PhD, director of the Cambridge Centre for AI in Medicine at the University of Cambridge in the U.K.
(source: The Guardian)

In the realm of modern healthcare, one technological advancement emerges as a lighthouse of transformative power: Generative Artificial Intelligence (Generative AI). It’s not merely a buzzword but rather an innovation poised to revolutionize the fabric of healthcare delivery. Picture a world where diagnoses are swift, treatment plans are tailored with unprecedented precision, and patient care transcends the ordinary. This is the realm where Generative AI reigns supreme, offering not just incremental improvements but a seismic shift in the paradigm of healthcare delivery.

Amidst the cacophony of medical advancements, Generative AI emerges as a symphony of possibility, conducting a harmonious union between human expertise and computational prowess. Imagine algorithms sifting through mountains of data with the acumen of the most seasoned clinician, unveiling insights that were once obscured by the veil of complexity. With each iteration, Generative AI learns, adapts, and refines its approach, propelling healthcare into a realm where innovation isn’t just a luxury but a lifeline. Welcome to the dawn of a new era in healthcare – where Generative AI isn’t just a tool, but a transformative force poised to elevate patient outcomes beyond imagination.

Clinical Documentation

One significant challenge identified was the considerable time commitment required by cardiologists for documenting patient encounters. This manual documentation process not only consumed valuable time but also contributed to heightened levels of provider burnout and fatigue. Notably, several clinicians emphasized that the prolonged duration spent on clinical documentation directly correlated with diminished capacity to accommodate new patients, thus impeding timely care delivery.

Example: Baptist Health’s Clinical Document Summarization App

The technological journey began with capturing patient-clinician interactions, which were then sent to an AWS service for transcription. Once transcribed, the text was processed by a large language model such as GPT-4, which interpreted the content and produced succinct summaries formatted to the clinical SOAP (Subjective, Objective, Assessment, Plan) standard. Clinicians then validated these summaries for accuracy, and once a clinician had reviewed and approved a summary, it was exported into Baptist Health’s electronic health record system. This process is designed to deliver timely, precise, and standardized clinical documentation⁴.
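To make the workflow concrete, here is a minimal Python sketch of its human-in-the-loop shape. It is not Baptist Health’s code: each stage is passed in as a plain function, standing in hypothetically for the AWS transcription service, an LLM such as GPT-4, the clinician’s review step, and the EHR integration. The names `SoapNote` and `document_encounter` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SoapNote:
    """A clinical note in SOAP format: Subjective, Objective, Assessment, Plan."""
    subjective: str
    objective: str
    assessment: str
    plan: str


def document_encounter(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    summarize: Callable[[str], SoapNote],
    review: Callable[[SoapNote], Optional[SoapNote]],
    export: Callable[[SoapNote], None],
) -> bool:
    """Run one recorded encounter through the pipeline.

    Returns True only if a clinician approved the note and it was exported.
    """
    transcript = transcribe(audio)  # hypothetical stand-in for a speech-to-text service
    draft = summarize(transcript)   # hypothetical stand-in for an LLM prompted to write SOAP sections
    approved = review(draft)        # clinician edits and approves, or returns None to reject
    if approved is None:
        return False                # nothing reaches the EHR without clinician sign-off
    export(approved)                # hypothetical stand-in for the EHR integration
    return True


# Usage with trivial stubs in place of the real services (illustrative only).
ehr_records = []
ok = document_encounter(
    b"recorded-audio",
    transcribe=lambda audio: "Patient-clinician conversation transcript.",
    summarize=lambda text: SoapNote(text, "objective findings", "assessment", "plan"),
    review=lambda note: note,  # clinician approves the draft unchanged
    export=ehr_records.append,
)
print(ok, len(ehr_records))  # True 1
```

The design choice worth noting is the gate: the export step is reachable only after an explicit approval, mirroring the clinician validation the example describes.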

Diagnostic Precision

One of the most compelling aspects of Generative AI in healthcare is its prowess in diagnostics. The technology excels in generating realistic medical images, enabling practitioners to simulate and analyze a myriad of scenarios. This, in turn, aids in the identification and classification of subtle anomalies that might elude the human eye.

Example: Stanford’s CheXNet

Stanford’s CheXNet, a deep convolutional neural network (a discriminative model rather than a generative one), has demonstrated remarkable proficiency in interpreting chest X-rays. Trained on ChestX-ray14, a dataset of more than 100,000 frontal-view chest X-ray images, CheXNet exceeded average radiologist performance at detecting pneumonia and achieved state-of-the-art results on all 14 pathologies in that dataset, including nodules¹.

Personalized Treatment Plans

Generative AI is not limited to diagnostics; it extends its capabilities to the realm of personalized medicine. By analyzing diverse patient data, including genomic information, lifestyle factors, and historical medical records, Generative AI models can recommend tailored treatment plans that maximize efficacy while minimizing side effects.

Example: IBM Watson for Oncology

IBM Watson for Oncology uses AI to sift through vast amounts of medical literature, clinical trials, and patient records to propose personalized cancer treatment options². This helps ensure that oncologists have access to the most current and relevant information when making critical decisions about patient care.

Drug Discovery and Development

The traditional drug discovery process is notoriously time-consuming and costly. Generative AI is revolutionizing this aspect of healthcare by accelerating the identification of potential drug candidates. By predicting molecular structures and simulating interactions, these models can significantly reduce the time required for drug development.

Example: Atomwise

Atomwise, a company specializing in AI for drug discovery, has successfully identified promising compounds for diseases like Ebola and multiple sclerosis³. Their technology expedites the initial stages of drug discovery, offering hope for faster and more efficient development of life-saving medications.

Overcoming Challenges

In the realm of healthcare, the integration of Generative AI presents a multitude of challenges, ranging from regulatory hurdles to stakeholder hesitancy. One significant obstacle lies in the cautious approach of healthcare systems towards deploying Generative AI in clinical settings. The reluctance stems from a prudent desire to avoid applications that directly influence patient care, reflecting a conservative stance towards technologies requiring FDA approval, as outlined in the FDA’s guidance on clinical decision support systems.

Furthermore, stakeholders within the healthcare ecosystem exhibit a palpable reluctance to embrace novel technologies, particularly Generative AI, which is still perceived as nascent and unfamiliar. This hesitancy underscores the need for comprehensive strategies to instill confidence and foster acceptance among key decision-makers.

Crucially, any Generative AI solution intended for healthcare applications must undergo rigorous human evaluation. This evaluation process entails soliciting feedback and assessments from clinical experts and physicians, ensuring that the AI’s outputs align with the highest standards of accuracy, reliability, and safety. By incorporating the expertise of medical professionals, we can mitigate risks and enhance the efficacy of Generative AI in healthcare settings.

Navigating these challenges demands a concerted effort to address regulatory concerns, alleviate stakeholder reservations, and prioritize robust evaluation protocols. Through proactive measures and strategic collaborations, the healthcare industry can unlock the transformative potential of Generative AI while upholding the paramount goal of improving patient outcomes and advancing medical practice.

Conclusion

“Eventually, doctors will adopt AI and algorithms as their work partners. This leveling of the medical knowledge landscape will ultimately lead to a new premium: to find and train doctors who have the highest level of emotional intelligence.”
― Eric Topol, Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again

In closing, it’s abundantly clear that Generative AI stands poised at the precipice of revolutionizing healthcare as we know it. With its promise of precision, personalization, and unparalleled efficiency, this technology is not merely a tool but a catalyst for transformative change. As we stand on the threshold of this remarkable era, it’s imperative for the healthcare industry to not only embrace but also wield Generative AI with unwavering responsibility and ethical foresight. The future of patient outcomes and medical advancement hangs in the balance, and it is our collective duty to ensure that we harness this power for the betterment of humanity.

Footnotes

1. Rajpurkar, P., et al. (2017). CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. arXiv preprint arXiv:1711.05225.
2. Watson Health. (2022). IBM Watson for Oncology. Retrieved from https://www.ibm.com/docs/en/announcements/watson-oncology?region=CAN
3. Atomwise. (2022). Drug Discovery with AI. Retrieved from https://www.atomwise.com/
4. Healthcare IT News. Generative AI reduces clinical documentation time at Baptist Health. Retrieved from https://www.healthcareitnews.com/news/generative-ai-reduces-clinical-documentation-time-baptist-health

Remix Institute Membership

The post The Potential Of Generative AI In Transforming Healthcare appeared first on Remix Institute.

The Number 1 Thing You Need To Do When Negotiating A Job Offer

Douglas Pestana - @DougVegas — Sun, 31 Mar 2024 23:48:10 +0000

Always ask for more than what’s initially offered.

I recently read an X post from a CEO of a healthcare startup about how she considered giving a candidate 2.5X the offered comp just because he ASKED for it.

How can an average job seeker accomplish the same feat?

Now, keep in mind, this startup raised $5.8M in Seed funding in 2021 and $20M in Series A funding in 2022…for a total of $25.8M raised. In other words, they got investments during the “easy money” times, so they had a boatload of cash to burn. In today’s economic climate, most companies and startups wouldn’t even consider this as an option.

But what were the reasons the CEO gave? What made her consider giving the candidate more than what the initial offer was?

She listed 5 main reasons:

1 – Reasoning
2 – Candidness
3 – Excitement
4 – Good Intentions
5 – Put his money where his mouth is

Of course, those aren’t the real reasons the candidate was offered more money. Because frankly, none of that makes sense.

In her X post, she puts the real reason towards the end:

“The whole process built trust.

Speed + integrity + kindness + aggressive ask = A+ negotiation.

My thought during the process: ‘Damn, I need this person on my team!’

If this is how he negotiates and treats people, he will be a huge asset for the team.

It’s never WHAT you ask for, it’s HOW you do it.”

In other words, it was all about how the candidate made her FEEL throughout the process.

Ultimately, he didn’t get the 2.5X, but they agreed to a more lucrative offer.

If you don’t ask, you don’t get.

I worked for a CEO once who often repeated this sage advice: IF YOU DON’T ASK, YOU DON’T GET.

For every job offer I’ve been given in my career (and I’ve had wayyy more job offers than I’ve had jobs), I always ask for more than what they initially offer.

The company already has a budget, they know what that budget is, and it’s usually a range. And the company ALWAYS wants to offer you the lowest end of that range to be good financial stewards.

You, as a candidate, should NEVER accept the initial offer. You should ALWAYS ask for more than the initial offer. Not 2.5X more, but you should ask for more if you’re worth it and if it’s in their budget.

But keep in mind the startup CEO’s advice: it’s not WHAT you ask for, it’s HOW you ask for it.

Interviews and job offer negotiations are poker games. The best hand doesn’t always win. What wins is how you make the other players FEEL. Then they’ll fold and give you what you want.

It’s a buyer’s market for employers right now, so offer negotiations are harder. You have to make them FEEL like they’re making the wise decision in offering you more compensation.

Most people will only go into business with you if they have a good feeling about you and feel like they can trust you. This is true in almost every relationship whether that’s trying to secure investors, acquire new customers, or convince an employer to hire you.

The post The Number 1 Thing You Need To Do When Negotiating A Job Offer appeared first on Remix Institute.

LinkedIn Releases Their Report on the Top 25 Hottest Jobs in the US in 2024

Douglas Pestana - @DougVegas — Wed, 31 Jan 2024 21:43:52 +0000

Looking for a career change in 2024?

For those working as Data Scientists, we have breaking news: you are all AI Engineers now.

LinkedIn recently released its report on the top 25 fastest-growing jobs in the US.

LinkedIn’s Editor-In-Chief even did an interview about it on the Today Show.

In the interview, he showed:

1⃣ Number 8 and Number 10 on the list are Artificial Intelligence Consultant and AI Engineer

2⃣ $200K+ Yearly Salary (although that depends on what city you live in and how many years of experience you have)

3⃣ 50% of AI Consultant and AI Engineer postings are remote or hybrid.

4⃣ AI roles have an expected 20X growth by 2030.

LinkedIn’s Editor-In-Chief also said (falsely) that it’s a “new industry.” Well, that’s not true. It’s been around for decades. But it is true that it’s growing.

He also said (falsely) that “it’s not like there’s been people who’ve been around for like 20 years doing this who are experts at this. You can become an expert pretty quickly. You can take online courses.”

That also is not true. There are many AI veterans who’ve been working in the industry for decades, including myself. The field has just gone by different names over the years. Artificial Intelligence as a research field has been around since at least the 1950s, and humans were thinking about Artificial Intelligence for centuries before that.

So it’s not like you can become an expert quickly and then expect to make $200K+ yearly. It takes time and effort. Taking online courses is not enough. You have to make sure you’re learning the right tools for AI Engineering, such as R, Julia, and OpenAI. You also need to practice what you learn on real-world use cases and look for mentors and a Network.

You can see the interview on the Today Show by clicking the link here.

If you’re looking for a guided path into the AI and data science space to take advantage of this growing profession and hit six figures, consider joining Remix Institute’s Stádas Prestige Membership free for 30 days using coupon code 1MONTHFREE.

The post LinkedIn Releases Their Report on the Top 25 Hottest Jobs in the US in 2024 appeared first on Remix Institute.

Remix Institute Predictions for 2024

Douglas Pestana - @DougVegas — Wed, 03 Jan 2024 13:00:00 +0000

Happy New Year 2024!

Just recently, financial uncertainty has led to many layoffs in tech, including in data science and analytics. In 2023, Google, Meta, Amazon, and Spotify (among others) all laid off data scientists and data analysts despite the profession being called “the sexiest job of the 21st century.”

In 2018, I correctly predicted that even though there was strong demand for data professionals, CEOs and senior executives would eventually question their investments in data science and AI and want an ROI on those high six-figure salaries. I also said back then that this would happen by 2023 if the data science community didn’t course-correct, and it appears it did not. Otherwise, we wouldn’t be seeing layoffs in data science and data analytics.

Sadly, there will be more to come. A Newsweek article predicted more layoffs and hiring freezes in 2024.

But despite these ominous storm clouds, there are still things we can do to prepare. Here are a few of our thoughts and predictions for 2024 and beyond:

Remix Institute Predictions for 2024

On December 31, 2019, I sent my subscribers an email titled “AI in 2020 – Man and Machine Converge” where I made the following prediction:

“Man and machine will converge via the use of physical and digital human augmentations, which will be enabled by artificial intelligence and machine learning. The 2020s decade will be THE decade for when humans and machine work together.”

That prediction, along with others, turned out to be correct and continues to hold. One big development that showed this was the release of ChatGPT in November 2022 and the use of large language models to augment human capability and cognition. In fact, Harvard and Boston Consulting Group released a study in 2023 showing that consultants who used AI were significantly more productive. Even more striking, their analysis found two distinct patterns of human-AI integration: “Centaurs” (half human/half AI) and “Cyborgs” (continual interaction with AI).

Additionally, companies like Neuralink have obtained FDA approval for human studies of brain implants.

But why talk about this prediction from 2020? Because it ties into my predictions for 2024. Our mission at Remix Institute is to help businesses and people grow through the use of AI automation and using machines to augment human cognition, productivity, and decision-making. Man and machine, working together.

With that said, here are a few of my predictions for AI and machine learning in 2024 and beyond:

1 – AI will not replace jobs; AI-augmented humans will replace other humans. Jason Calacanis, investor and one of the hosts of the All-In Podcast, tweeted in October 2023 that startup pre-seed and seed rounds are back down to 2009-2014 levels. He also said founders are getting more done with less, using much smaller teams. AI-augmented development and AI-assisted productivity tools are going to be the AI Augmentations humans need to equip themselves with to increase their throughput. AI Augs like ChatGPT, Microsoft Copilot, Canva Magic Studio, and LegalMente AI are all going to be part of data science, business, and startup professionals’ arsenals to get more done with less.

2 – The Rise of Julia in AI and Machine Learning. Let’s face it: Python (released in 1991) and R (released in 1993) are old and slow. In this new era of AI and machine learning, where compute performance is becoming critical, faster and newer languages like Julia (released in 2012) are best positioned to meet those performance requirements. Julia recently cracked the top 20 in the TIOBE Index of popular programming languages. Julia is also as easy to learn and use as R and Python.

3 – Profit Focused Data Science or PFDS becomes the new gold standard for data scientists. CEOs have lost patience with paying high six-figure salaries to data scientists and getting little in the way of results or ROI. In 2024, data science teams will be expected to be profit centers, not cost centers. Otherwise, they will be easy targets for layoffs by CEOs and senior executives. This is going to require a paradigm shift, which Remix Institute is embracing and has been preparing for.

4 – Digital Clones and Digital Twins. Digital cloning will be able to capture your way of thinking and embed your knowledge, experience, and personality digitally so that others can access it. Basically, it’s Black Mirror Season 2, Episode 1 “Be Right Back.” In fact, a company called Delphi recently raised $2.7M from Founders Fund to “build the AI version of you to scale your expertise and availability, infinitely”. And companies I’ve worked at want LLMs fine-tuned on their company-specific playbooks so the AI learns the knowledge and personality of the company.

5 – AI Agents will increase in usage. AI Agents that can do autonomous research on the internet, integrate with apps, and complete tasks you specify already exist, with several released last year. But there haven’t been many user-friendly applications for mass public consumption. That will change, and AI Agents will see wider adoption among non-technical audiences. In the best-case scenario, we get small-scale, autonomous company concepts like Delamain from Cyberpunk 2077. In the worst case, rogue AI causes damage to systems.

6 – Humanizing of AI via Avatars and Voice. With many people in Western countries distrustful and fearful of AI, more web, software, and mobile applications will try to humanize AI by creating AI avatars and adding natural-sounding voices to their AI chatbots. Startups like AI.XYZ and Inflection AI are developing examples of this.

7 – Fear of AI will continue to be amplified by AI Doomers and “AI Safety” Hall Monitors with ulterior motives for control. There are many non-technical and mediocre people (such as journalists and politicians) who don’t know anything about AI or machine learning but constantly highlight its dangers rather than its benefits, because they have nothing else of value to add. Their constant narratives to the public that AI will replace jobs, become sentient, turn into Terminators and T-1000s, and kill humans have led many people in the US to be distrustful and fearful of AI.

8 – There will be attempts to regulate AI, but it will be important to ensure the government doesn’t try to control it or put it only in the hands of a few big corporations. The key will be to ensure AI development continues to be democratized and that there’s still a free market for AI. It’s crucial to shift the discussion from focusing on the dangers of AI to recognizing its benefits. For those in the AI field, share your insights, contribute to balanced discussions, and help steer conversations towards a future where AI’s accuracy and benefits are recognized and maximized.

9 – AI Worshippers. As AI becomes more advanced, people will start to love, deify, and may even worship AI (like Brendan in Cyberpunk 2077 and the 2023 sci-fi action film, The Creator).

Remix Institute Membership

Remix Institute is helping you build a platinum shield for job security. I have been building up a Network of the best business, data, and AI professionals in the world and created a Membership that lets you gain access to courses, job and money opportunities, and a place to learn and share resources together from The Network.

Our Members are going to require help and support in this economic tempest. It is critical that we uplift our own and stand by our Members when they reach out needing that introduction to a hiring manager or recruiter. We will come out stronger on the other side.

In 2024, I’m committed to creating more consistent content and resources to share with our Remix Institute Members to prepare you so that you become a citadel against uncertainty. Follow me on LinkedIn and X as well so I can share content that’s valuable and helpful for you there.

Si vis pacem, para bellum. (If you want peace, prepare for war.)

If you liked this article, join our free Membership as a Stádas Genesis member for access to elite professionals and exclusive courses on getting started with R, Julia, data science, AI, and six-figure job opportunities. Elevate your skills and gain an edge in the industry. Sign up now for the elite learning experience.

The post Remix Institute Predictions for 2024 appeared first on Remix Institute.

Effortless Cool Is A Detailed Process – How To Influence, Tell Better Stories, and Become More Persuasive

Douglas Pestana - @DougVegas — Mon, 18 Dec 2023 07:09:27 +0000

Why Attention To Detail Is An Important Trait

Many years ago, when I lived in Las Vegas, I was walking through the Miracle Mile Shops at Planet Hollywood Las Vegas and passed a Ben Sherman store. Toward the back of the store, near the sales counter, a quote was displayed on the wall that said:

Effortless cool is a detailed process.
-Ben Sherman

That quote has stuck with me to this day. It’s a powerful quote that describes how most things that appear easy and effortless actually took a lot of work and detail to reach that level.

Most people don’t know that attention to detail is one of the things that separates excellence from mediocrity.

People become mediocre for one reason only: they’re lazy. And lazy people don’t pay attention to detail.

Data Scientists, AI Engineers, and data-driven business professionals often present data to audiences in order to persuade them of some story or insight, only to have the presentation fall flat. This is especially frustrating when the presenter has the data and knows the truth, but the audience still isn’t convinced.

What’s even more frustrating is when one is armed with this data and knowledge about the truth, yet it gives them no power or leverage in the company, and then the company lays them off. In the 2016 movie “Batman v Superman: Dawn of Justice”, Lex Luthor (played by Jesse Eisenberg) masterfully said: “The bittersweet pain among men is having knowledge with no power…because that is paradoxical.”

In the end, CEOs eventually question their investments in data science and AI and will lay off these types of roles if they don’t feel they are getting a return on their investment.

So how can you turn the tide, gain leverage and power, avoid being laid off, and increase your job security as a data professional? After all, creating simple, effective data stories, visualizations, and products seems deceptively easy. However, this apparent simplicity often leads people to underestimate the complexity involved in achieving it.

Behind every elegant data visualization or intuitive data product lies a hidden, painstaking process. Crafting a compelling data story requires more than just compiling numbers, copying and pasting output from Jupyter Choke-books, and slapping it on some unformatted PowerPoint slides. It involves weaving a narrative that resonates with the audience, understanding visual perception, and paying meticulous attention to detail, from color choices to layout.

In the realm of data science, the teaching “Effortless cool is a detailed process” refers to data storytelling, data visualization, and data product building that looks simple yet effective but rests on a deep, detailed process underneath. This article goes into some of what that process entails as I share lessons from my decade and a half of experience in data science and AI, including managing large teams of Data Scientists, Data Analysts, and AI Engineers.

Photo Credit: Le Fanatique De Mode (Link: https://le-fanatique-de-mode.blogspot.com/2011/04/effortless-cool-is-detailed-process-ben.html)

Data Storytelling: More Than Numbers and Raw Output

Creating an engaging data story involves more than just presenting facts, numbers, and raw output (especially unformatted output straight out of a Jupyter Choke-book). It requires strategically selecting data, understanding the audience and their pain points, and weaving a narrative that makes complex information relatable and easy to consume. It’s also about finding the balance between too much information and just enough to tell a compelling story. This process is far from effortless.

There are a few Axioms from Remix Institute’s Axiomatic System of Philosophies that can help guide you in crafting a better data story. These Axioms are lessons I’ve learned over a decade and a half of presenting to senior executive and C-Suite audiences, and they have allowed me to influence and persuade those audiences successfully.

1 – Occam’s Razor and Principle of Parsimony. The simplest explanation is the best explanation. Less is more. Make your data story easy to consume and understand. This principle is attributed to William of Ockham, a 14th-century Franciscan friar and theologian, and it has withstood the test of time.

2 – Entertain Your Audience. Let’s be honest, most people are just looking to be entertained, including senior executives and C-Suite professionals. It’s why sports, movies, and social media dominate the content people consume. People want entertainment, and they always will. The sport of gladiator fighting ran from about 105 BC to 404 AD, roughly five centuries, which is proof of its popularity on Rome’s entertainment calendar. The concept that people want to be entertained is another Axiom that has withstood the test of time. Remember when you were a kid and were entertained by stories in kindergarten pop-up books? That’s what you need to do with your data stories, or your audience will fall asleep and lose interest.

3 – ELI5 Model or Explain It Like I’m 5 Years Old. Distill complex information into easy-to-understand concepts. Do not assume your audience has any prior knowledge of what you’re talking about, especially if it pertains to artificial intelligence, machine learning, statistics, mathematics, databases, etc.

4 – Show Them Only What’s Necessary. Presentations get derailed for one reason. If you memorize this reason, I guarantee that your presentations will have a much higher success rate. Here is the reason: Anchoring leads to the Law of Triviality, which leads to Analysis Paralysis, and then no decisions get made. Put simply, do not tell the audience to think about an elephant, because then they will think about an elephant. They will anchor on the elephant and give it disproportionate weight even though it is trivial. And once they’ve given disproportionate importance to the trivial elephant, they’ll send you down rabbit holes and have you complete analysis after analysis until they reach paralysis. Then no decision gets made, and progress stalls.

Data Visualization: The Detailed Art of Clarity, Simplicity, and Minimalism

An effective data visualization might look simple, even obvious after observing it. However, the path to this simplicity is anything but. It involves a deep understanding of visual perception, color theory, and the art of highlighting what’s important while eliminating noise. Each element in a visualization – from font and color choices to the scale of axes – is a deliberate decision aimed at making the data accessible and understandable. This process demands a keen eye for detail.

In the 2020 book “Shikake: The Japanese Art of Shaping Behavior Through Design” by Naohiro Matsumura, the author teaches about shikake (or “device” in Japanese) which is a design that exerts influence on people through subtlety, rather than a direct prompt. By combining traditional and minimalist Japanese aesthetics with lessons from behavioral economics, shikake is designed to encourage a certain behavior without telling the (often unwitting) person the primary purpose behind that behavior.

IKEA stores and Swedish architecture are also able to subtly nudge customers to perform certain behaviors. This level of detail is far from easy.

Let’s give an example. Say I want my audience to look at a simple chart. I could prompt them by saying, “take a look at this chart and analyze these numbers.” Or I could spend just a little more time sprucing up the chart’s design while keeping it minimal, making it easy for them and subtly nudging them where I want them to go. The latter gets the audience to stare at the visualization longer and actually look at the chart, rather than just being told to look at it. Audiences want to be entertained: they will not look if they get bored.

It’ll also prevent them from just glancing at your chart without grasping the story you wanted to tell. The key is to guide their eyes where you want them to go and have them stare at your art longer.

Most Data Scientists and Data Analysts will build something like the chart below because it’s the easy and lazy thing to do. I’ve seen this type of chart countless times when reviewing presentations from employees I managed before they met with senior or C-Suite executives. It’s also common in tutorials and blogs, so people tend to copy it:

But if you add just a few more details to make the chart easier to consume, at the cost of only a few more lines of code, you get something the audience will stare at for longer:

What is the difference between the two charts?

The first chart is full of common mistakes I see being built by Data Scientists and Data Analysts who then try to present it to their audience:

Not giving the x and y axes user-friendly labels; they just use the raw variable names. The improved second chart labels the axes with easy-to-read names.
Using the standard, out-of-the-box theme from ggplot2, matplotlib, et al. without customization. The improved second chart uses themes and customization.
Not changing colors and fonts to match brand colors. The improved second chart changes the colors and fonts and adds branding.
Not using the chart title to tell the story, which forces the audience to work out what the chart means. The improved second chart’s title tells the story, so the audience doesn’t have to think about what’s being displayed.

If you avoid these mistakes and just add a little attention to detail about how your audience will consume your data visualization, then you’ll become effortlessly cool and influence more decisions.

Data Product Building: Intuitive Design and Ease-Of-Use

Building user-friendly data products is no less challenging, necessitating continuous refinement to ensure seamless functionality and design. This involves taking a deep dive into user needs and behaviors. The aim is to mask the complexity of data behind an intuitive and user-friendly interface.

The main thing I want to emphasize is regularly getting feedback from your users to enhance your data product. This means communicating frequently with your users. You can’t just set-it-and-forget-it. Continuous improvement of your data product based on your users’ feedback will increase its usage and adoption.

You literally have no choice but to communicate regularly with your users, or your data product is dead in the water. When I managed teams of Data Scientists, Data Analysts, and AI Engineers, this was the key differentiator between my top-performing and bottom-performing employees. The top performers communicated regularly with their users and built with their users in mind. The bottom performers just wanted to build in isolation and not talk with their users.

It was also one of the key differentiators that set apart top data science teams I was a part of. The top data science teams had a direct line of access to their users and stakeholders, especially the C-Suite and senior executives. You have to get your work known and get your users involved in its creation.

Here are some tips I’ve learned about building great data products over the past decade and a half that have helped ensure the success of my data science teams’ initiatives and elevate their reputation:

Single page applications trump multiple-paged and multiple-tabbed applications every time.
Pleasing aesthetics increase user adoption. Design with the user in mind.
Do not duplicate information on your dashboard or data product.
Minimize the number of clicks a user has to make on your dashboard or data product.
People consume data products visually in the following direction: Top-to-bottom and left-to-right.
Do not use complex visuals. Don’t create work for your users.
Don’t use Pie Charts. Ever.
Don’t ever have more than 4 time series lines on a Time Series Line Chart.
If you’d like design feedback on a dashboard or app, consult a UI/UX expert at your company. Many times, they may even help you wireframe something that looks top notch.
Collect feedback from at least 5 users of your dashboard or data product. Research has shown that testing with 5 users uncovers about 85% of usability issues.
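The 85% figure most likely comes from Jakob Nielsen and Tom Landauer's model of usability testing, in which n test users are expected to uncover a share 1 - (1 - L)^n of the problems, where L is the share a single user finds (about 31% on average in their data). The short Python sketch below assumes that L = 0.31 average, which will differ for your own product and tasks, and shows how quickly the returns diminish. With five users it comes out at about 84%, in line with the figure quoted above.

```python
def share_of_issues_found(n_users: int, per_user_rate: float = 0.31) -> float:
    """Expected fraction of usability issues found by n_users testers.

    Nielsen-Landauer model: 1 - (1 - L) ** n, where L (per_user_rate) is the
    fraction of issues a single user uncovers. L = 0.31 is an assumed average,
    not a property of your product.
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    return 1 - (1 - per_user_rate) ** n_users


# Print the expected coverage for a few panel sizes.
for n in (1, 3, 5, 10):
    print(f"{n:>2} users -> {share_of_issues_found(n):.0%} of issues")
```

The curve flattens quickly after five users, which is the usual argument for running several small rounds of feedback rather than one large one.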

Conclusion: Effortless Cool Will Make You Highly Valuable

Achieving “Effortless Cool” is indeed a detailed process, but if you master it, it will make you more valuable, more in-demand, and most importantly, more persuasive. Effortless Cool represents the mastery of making something complex more intuitive, engaging, and seemingly simple.

Aspiring data professionals should appreciate the meticulous work behind this façade of effortlessness, recognizing that true skill lies in the ability to make the complex appear beautifully simple. To truly master the art of ‘Effortless Cool’ in data science and AI, professionals must embrace the detailed process behind it, which means recognizing the importance of crafting narratives in data storytelling, understanding the principles of design in data visualization, and focusing on user experience in data product development.

It’s not easy to do, but the payoff is huge.

Remix Institute Membership

R Code

# DATA VISUALIZATION VERSION 1 - LAZY VERSION  -----------


library(ggplot2)

# Basic ggplot2 scatter plot
ggplot(mtcars, aes(x=wt, y=mpg)) +
  geom_point() +
  labs(title = "Standard Scatter Plot - Lazy Version You Often See")



# DATA VISUALIZATION VERSION 2 - EFFORTLESS COOL, ATTENTION-TO-DETAIL VERSION  --------


library(ggplot2)
library(extrafont) # for font customization
library(magick) # to help with branded logos
library(grid) # to help with branded logos
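# "Roboto Condensed" must be installed and imported once with extrafont::font_import().
# loadfonts(device = "win") registers the imported fonts with the Windows graphics device,
# so it is only called on Windows; macOS and Linux need a different font setup.
if (.Platform$OS.type == "windows") loadfonts(device = "win")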


logo_path = "https://www.remixinstitute.com/wp-content/uploads/2023/12/Remix_Institute_No-Tagline.png"


# Enhanced ggplot2 scatter plot with branding
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "#1C1C1C", size = 3) +
  theme_minimal() +
  theme(text = element_text(family = "Roboto Condensed", color = "#005B80"),
        plot.title = element_text(hjust = 0, face = "bold", size = 22),
        plot.subtitle = element_text(hjust = 0, face = "italic"),
        plot.caption = element_text(size = 12),
        axis.title = element_text(face = "bold"),
        axis.text = element_text(color = "#005B80", size = 14)) +
  labs(title="When A Car's Weight Increases, Miles Per Gallon Decreases",
       subtitle = "Minimal Aesthetic using Themes and Customization in ggplot2. This is an Enhanced Scatter Plot With Attention to Detail.",
       caption = "Source: mtcars dataset",
       x = "Weight (1,000 lbs)", y = "Miles per Gallon")

# Draw the plot explicitly so it also renders when the script is run with source()
print(p)

# Remix Institute branding: overlay the logo in the bottom-left corner of the drawn plot
logo = magick::image_read(logo_path)
grid::grid.raster(logo, x = .04, y = .02, just = c('left', 'bottom'), width = 0.1)

Julia Code

# DATA VISUALIZATION VERSION 1 - LAZY VERSION  -----------

using Plots
using RDatasets

# Load the dataset
mtcars = dataset("datasets", "mtcars")

# Basic scatter plot
scatter(mtcars[!, :WT], mtcars[!, :MPG],
        title = "Standard Scatter Plot - Lazy Version You Often See",
        xlabel = "WT",   # raw column names used as axis labels (the "lazy" habit)
        ylabel = "MPG",
        legend = false)



# DATA VISUALIZATION VERSION 2 - EFFORTLESS COOL, ATTENTION-TO-DETAIL VERSION  --------

using Gadfly
using RDatasets


# Load the dataset
mtcars = dataset("datasets", "mtcars")

# Enhanced scatter plot with Gadfly
p = Gadfly.plot(layer(mtcars, x=:WT, y=:MPG, Geom.point, color=[colorant"#005b80"], size=[1mm]),
         Guide.title("When A Car's Weight Increases, Miles Per Gallon Decreases.\nMinimal Aesthetic using Themes and Customization.\nEnhanced Scatter Plot With Attention to Detail."),
         Guide.xlabel("Weight (1,000 lbs)"),
         Guide.ylabel("Miles per Gallon"),
         Theme(
             background_color="white",
             major_label_font_size=11pt,
             minor_label_font_size=10pt,
             key_title_font_size=10pt
         ))


# Display the plot
display(p)

The post Effortless Cool Is A Detailed Process – How To Influence, Tell Better Stories, and Become More Persuasive appeared first on Remix Institute.

Miami-Dade County Public Employee Salary Research – An Analysis in R, Python, and Julia

Douglas Pestana - @DougVegas — Wed, 21 Jun 2023 04:15:49 +0000

This analysis was co-authored by data scientists Scott Fisher and Douglas Davila-Pestana. Mr. Fisher developed the Python code for the analysis, and Mr. Davila-Pestana developed the R and Julia code.

Miami-Dade County Employee Pay: A Comprehensive Data Analysis for Better Transparency

Miami is known for its beaches and its warm weather all year round. More than 2.6 million people live in sunny Miami-Dade County, Florida (according to the US Census Bureau), making it the largest county in Florida by population. It has a two-tier system of government: city and county. The city governments form the first tier, providing local services like police and fire protection and enforcing city codes; there are 34 municipalities in Miami-Dade County. Each city, including the largest one, the City of Miami, pays for these services with its own city taxes. The county government is the second tier, handling metropolitan services such as running airports and seaports, emergency management, public housing and healthcare, transportation, environmental services, and solid waste disposal. These services are funded by county taxes.

As such, the county government plays an important role in serving the public, and the Miami-Dade County Government has numerous departments and thousands of employees.

Have you ever wondered how government employee salaries compare across departments, roles, and against national averages? We’ll explore the latest employee pay data from Miami-Dade County for a deep dive into these questions. The dataset we’ll look at is public data from opendata.miamidade.gov which was last updated on February 22, 2023.

Transparency is a cornerstone of governance in a republic and of trust between citizens and their government. With this in mind, we delve into this open, public data. Unraveling its details not only increases transparency but also lets us evaluate how efficiently taxpayer dollars are allocated.

A Closer Look at Departments and High Salaries

The dataset comprises salary information for employees of various Miami-Dade County departments. Using this data, we can calculate several statistical summaries by job title and department name, including:

employee count
average and median salaries
salary standard deviation
salary percentiles
total sum of department salaries
department’s ranking by median salary
department’s ranking by employee count
Budget Bloat Composite Score Rank – a composite score ranking that signals the department has a high count of employees with high median salaries
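The original analysis code isn't reproduced in this post, so the following is only a minimal Python sketch, using the standard library and made-up sample records, of how most of the per-department summaries listed above could be computed. The record layout (department, job title, salary), the choice of the 25th and 75th percentiles, and the tie-breaking rule for ranks are all assumptions. The composite Budget Bloat rank is left out here because its formula isn't given in the article.

```python
from collections import defaultdict
import statistics


def department_summaries(records):
    """Summarize salaries per department.

    records: iterable of (department, job_title, annual_salary) tuples.
    Returns {department: {stat_name: value}}; ranks use 1 = highest.
    """
    by_dept = defaultdict(list)
    for dept, _title, salary in records:
        by_dept[dept].append(float(salary))

    summaries = {}
    for dept, salaries in by_dept.items():
        if len(salaries) > 1:
            stdev = statistics.stdev(salaries)               # sample standard deviation
            q1, _, q3 = statistics.quantiles(salaries, n=4)  # quartile cut points
        else:
            stdev, q1, q3 = 0.0, salaries[0], salaries[0]    # single-employee department
        summaries[dept] = {
            "employee_count": len(salaries),
            "mean_salary": statistics.mean(salaries),
            "median_salary": statistics.median(salaries),
            "stdev_salary": stdev,
            "p25_salary": q1,
            "p75_salary": q3,
            "total_salary": sum(salaries),
        }

    # Rank departments by median salary and by headcount (1 = highest, ties by name).
    for stat, rank_field in (("median_salary", "rank_by_median"),
                             ("employee_count", "rank_by_count")):
        ordered = sorted(summaries, key=lambda d: (-summaries[d][stat], d))
        for rank, dept in enumerate(ordered, start=1):
            summaries[dept][rank_field] = rank
    return summaries


# Hypothetical records for illustration only; not taken from the county dataset.
sample_records = [
    ("Police", "Officer", 70_000),
    ("Police", "Sergeant", 90_000),
    ("Police", "Officer", 72_000),
    ("Aviation", "Analyst", 65_000),
    ("Aviation", "Manager", 110_000),
]
for dept, stats in department_summaries(sample_records).items():
    print(dept, stats["employee_count"], stats["median_salary"], stats["rank_by_median"])
```

With the real dataset, the same function would simply receive the county's records instead of the toy list.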

Here are a few metrics about Miami-Dade County Government employees based on those summaries:

Average Salary is $75,722
Median Salary is $69,562
There are 45 government departments
There are 29,407 employees
The total annual salary budget is $2.22B

For comparison, the median household income of Miami-Dade County residents is $57,815, according to the US Census Bureau. Similar figures can be found in the Federal Reserve Economic Data. And that’s household income, meaning the combined income of everyone living in the household. In other words, the median salary of an individual Miami-Dade County government employee ($69,562) is about 20% higher than the median income of an entire household in the county.
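The "20% higher" comparison is a simple ratio of the two medians quoted above. A minimal Python check of that arithmetic, using only the two published figures:

```python
county_employee_median_salary = 69_562   # median salary from the county payroll data above
county_median_household_income = 57_815  # US Census Bureau figure cited above

# How much larger the individual employee median is than the household median.
ratio = county_employee_median_salary / county_median_household_income
print(f"Employee median salary is {ratio - 1:.1%} higher than median household income")
```

The ratio works out to roughly 1.20, so the employee median is just over 20% above the household median.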

The top 5 departments with the highest median salaries are the County Attorney’s Office, Independent Civilian Panel, Citizens Independent Transportation Trust, Information Technology, and Inspector General department. These top 5 departments all have a median salary of $120,000 or higher. Ten of the 45 government departments have a median salary of $100,000 or more. You can view the median salary by department in the chart below:

Only a few departments of the Miami-Dade County government account for the majority of the annual salary budget. The two largest, the Police and Fire Rescue departments, together account for almost one-third (30%) of the total annual salary budget. The top 6 departments account for almost two-thirds (64%): Police, Fire Rescue, Transportation and Public Works, Corrections and Rehabilitation, Water and Sewer, and Aviation.
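Each department’s share is its salary total divided by the county-wide total. The “top 2” and “top 6” figures are cumulative sums of those shares. The following is a minimal pandas sketch of that calculation. The departments and salaries below are hypothetical toy values, not the county’s data:

```python
import pandas as pd

# Hypothetical toy payroll, one row per employee (not real county data)
payroll = pd.DataFrame({
    "DeptName": ["Police", "Police", "Fire Rescue", "Fire Rescue", "Aviation", "Library"],
    "AnnualSalary": [90_000, 120_000, 95_000, 95_000, 60_000, 40_000],
})

# Total salary per department, largest first
dept_totals = payroll.groupby("DeptName")["AnnualSalary"].sum().sort_values(ascending=False)

# Each department's share of the county-wide total, plus the running (cumulative) share
share = dept_totals / dept_totals.sum()
cumulative_share = share.cumsum()

print(pd.DataFrame({"share": share.round(4), "cumulative": cumulative_share.round(4)}))
```

Here the top 2 toy departments together hold 80% of the toy salary budget. The same cumulative sum over the real department table gives the 30% and 64% figures above.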

Departments with both a high employee count and a high median salary carry the most salary budget bloat. The Budget Bloat Composite Score Rank described above is designed to capture this. By that rank, the top 5 departments with the highest salary budget bloat are Fire Rescue, Police, Information Technology, Corrections and Rehabilitation, and the County Attorney’s Office.
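The R and Julia code later in this article builds the composite score in two steps:

- It averages two descending ranks (rank 1 = highest median salary, rank 1 = most employees).
- It then ranks that average in ascending order, so rank 1 is the most “bloated” department.

The following pandas sketch applies the same steps to a hypothetical four-department table (toy numbers, not county data):

```python
import pandas as pd

# Hypothetical department summary (toy numbers, not the county's data)
depts = pd.DataFrame({
    "DeptName": ["A", "B", "C", "D"],
    "Employee_Counts": [4000, 300, 2500, 50],
    "Median_Salary": [95_000, 120_000, 70_000, 60_000],
})

# Rank 1 = highest median salary / largest headcount; ties get the average rank,
# like data.table::frank's default ties.method = "average"
depts["Median_Salary_Rank"] = depts["Median_Salary"].rank(ascending=False)
depts["Employee_Counts_Rank"] = depts["Employee_Counts"].rank(ascending=False)

# Composite score = mean of the two ranks; a LOWER score means both large and well paid
depts["Budget_Bloat_Composite_Score"] = depts[["Median_Salary_Rank", "Employee_Counts_Rank"]].mean(axis=1)

# Rank the composite in ascending order, so rank 1 is the most "bloated" department
depts["Budget_Bloat_Composite_Score_Rank"] = depts["Budget_Bloat_Composite_Score"].rank()

print(depts.sort_values("Budget_Bloat_Composite_Score_Rank"))
```

Toy department A ranks first because it combines the largest headcount with the second-highest median salary. B has the highest median salary but few employees, so it ranks second.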

The $100K Club

We also counted how many Miami-Dade County government employees earn $100,000 or more annually: 6,598, or 22% of all 29,407 employees. So while a six-figure salary is not uncommon, only a minority of employees earn one.
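The 22% share is the number of salaries at or above $100,000 divided by total headcount. The article’s code uses `>= 100000`, so a salary of exactly $100,000 counts. The sketch below checks that share from the published counts, then bins salaries into the article’s bands with `pd.cut`. `pd.cut` is a swapped-in alternative to the nested `ifelse`/`np.select` used in the article’s code, and the salaries are hypothetical toy values:

```python
import pandas as pd

# Share of $100K+ earners from the counts published in the article
share_100k = 6_598 / 29_407
print(f"{share_100k:.1%}")  # prints 22.4%

# Hypothetical toy salaries (not county data)
salaries = pd.Series([42_000, 50_000, 74_999, 100_000, 150_000, 250_000])

bins = [0, 50_000, 75_000, 100_000, 200_000, float("inf")]
labels = ["A - $0 to $49,999", "B - $50,000 to $74,999", "C - $75,000 to $99,999",
          "D - $100,000 to $199,999", "E - $200,000+"]
# right=False makes each band include its lower edge, e.g. [100000, 200000);
# missing salaries would become NaN here rather than "F - Unknown"
bands = pd.cut(salaries, bins=bins, labels=labels, right=False)

print(bands.value_counts(sort=False))
print((salaries >= 100_000).sum(), "of", len(salaries), "toy salaries are $100K or more")
```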

Top Paid Positions

We also examined the highest-paid roles across the Miami-Dade County government. Of the 3,290 unique job positions, 96 have a median salary of $200,000 or more per year, and these lucrative roles are not confined to specific departments. The table below shows the top 40 annual median salaries by title and department name.
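In the article’s R and Julia code, a “position” is a job title within a department. Salaries are grouped by `Title` and `DeptName`, summarized, and sorted by median. The following is a minimal pandas sketch of that grouping and the $200,000 cutoff, on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical employee rows (toy data, not the county's)
payroll = pd.DataFrame({
    "Title": ["Attorney", "Attorney", "Director", "Clerk", "Clerk"],
    "DeptName": ["County Attorney", "County Attorney", "Aviation", "Aviation", "Library"],
    "AnnualSalary": [210_000, 230_000, 250_000, 45_000, 41_000],
})

# Median salary for every (Title, DeptName) combination, highest first
by_title_dept = (
    payroll.groupby(["Title", "DeptName"], as_index=False)["AnnualSalary"]
    .median()
    .rename(columns={"AnnualSalary": "Median Salary"})
    .sort_values("Median Salary", ascending=False)
)

# Positions whose median salary is $200,000 or more
top_paid = by_title_dept[by_title_dept["Median Salary"] >= 200_000]
print(top_paid)
```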

Average Salaries of Data and Analyst Positions

Data and analyst roles have been garnering increased media attention for their role in driving insights, supporting decision-making, and developing AI-based solutions. Our analysis found that Miami-Dade County government employees in these roles earn an average of $91,126. The table below shows the top 40 annual median salaries by title and department for job titles containing the word “data” or “analyst”.
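The data and analyst roles are selected with a case-insensitive substring match on the job title. The following pandas equivalent is shown on hypothetical titles. A plain substring match also catches titles such as “Database Administrator”, which is worth keeping in mind when reading the average:

```python
import pandas as pd

# Hypothetical job titles (not taken from the county dataset)
titles = pd.Series(["Data Analyst 2", "Budget ANALYST", "Database Administrator", "Police Officer", None])

# Case-insensitive regex match on "data" OR "analyst"; na=False treats missing titles as non-matches
mask = titles.str.contains("data|analyst", case=False, na=False)
print(titles[mask].tolist())
```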

Salary Comparisons with National, State, and City Averages

Comparing the salaries within the Miami-Dade County government to overall national, state, and city averages offers a broader perspective.

For instance, the average attorney working for the Miami-Dade County government earns $219,297, over $100,000 more than the Miami city averages reported by Payscale ($91,573) and Glassdoor ($103,515).

Similarly, the average Police Officer working for the Miami-Dade County government earns $90,098 (see the table by title and department above). This is over $15,000 more than both the Florida state and national average Police Officer salaries of $73,350 and $71,380, respectively, according to the US Bureau of Labor Statistics.

The average Firefighter working for the Miami-Dade County government earns $96,744 (see the same table). This is over $40,000 more than both the Florida state and national average Firefighter salaries of $56,560 and $56,310, respectively, according to the US Bureau of Labor Statistics.
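The dollar gaps in the three comparisons above can be checked from the quoted figures alone. The benchmarks come from different sources (Payscale, Glassdoor, and the BLS), so treat the gaps as indicative:

```python
# Salary figures quoted in the comparisons above: (county average, benchmark), USD per year
comparisons = {
    "Attorney vs. Miami (Payscale)": (219_297, 91_573),
    "Attorney vs. Miami (Glassdoor)": (219_297, 103_515),
    "Police Officer vs. Florida (BLS)": (90_098, 73_350),
    "Police Officer vs. US (BLS)": (90_098, 71_380),
    "Firefighter vs. Florida (BLS)": (96_744, 56_560),
    "Firefighter vs. US (BLS)": (96_744, 56_310),
}

# Dollar gap between the county figure and each benchmark
gaps = {name: county - benchmark for name, (county, benchmark) in comparisons.items()}
for name, gap in gaps.items():
    print(f"{name}: ${gap:,}")
```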

It’s worth noting that some of these salaries, particularly within the police and fire rescue departments, may be higher due to the influence of labor unions such as the South Florida Police Benevolent Association (PBA) and the Metro-Dade Firefighters Local 1403. These unions often negotiate for better pay, benefits, and working conditions for their members. In fact, the South Florida PBA represents the following units: Miami-Dade Police Department, Miami-Dade County Department of Corrections and Rehabilitation, Commission on Ethics and Public Trust Employees, and Miami-Dade Animal Services. The presence of police and firefighter unions in Miami-Dade County could contribute to higher-than-average salaries for these roles, a factor that is important to consider when comparing salaries across different regions and sectors.

North side of the Stephen P. Clark Government Center building, Miami, FL, headquarters of the Miami-Dade County government.

“Peasant” Labor: Income Inequality Within Departments

An intriguing facet of our analysis is the set of income inequality metrics we calculated: the 20/20 ratio, the Palma ratio, the 10/50 ratio, and the coefficient of variation.

Let’s break down these inequality metrics in layman’s terms:

20/20 Ratio: This metric compares the 80th percentile salary (where the top 20% of earners begin) to the 20th percentile salary (where the bottom 20% end) within a department. If the 20/20 ratio is significantly greater than 1, higher earners in the department make much more than lower earners, indicating wage disparity.
Palma Ratio: Similar to the 20/20 ratio, the Palma ratio compares the 90th percentile salary (the threshold for the top 10%) to the 40th percentile salary (the threshold for the bottom 40%). It focuses on the gap between the very top earners and the lower end of the pay scale, leaving out the middle band. The textbook 20/20 and Palma ratios compare the combined incomes of these groups. The versions here use percentile cutoffs instead.
10/50 Ratio: Similar to both the 20/20 ratio and the Palma ratio, this metric compares the 90th percentile salary (the threshold for the top 10%) to the 50th percentile, or median, salary within a department. If the 10/50 ratio is significantly greater than 1, the very top earners in the department make much more than the median earner, indicating significant wage disparity. The chart below displays the 10/50 ratio.
Coefficient of Variation (CV): The CV is the ratio of the standard deviation of the salaries to their mean. A higher CV indicates more significant disparity in the salary range, while a lower CV shows more uniformity.
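Each of these metrics reduces to a few percentile lookups. The following minimal numpy sketch handles one department’s salaries and follows the percentile-ratio definitions in the article’s R code. The ten salaries are hypothetical toy values:

```python
import numpy as np

def inequality_metrics(salaries):
    """Percentile-ratio inequality metrics for one department's salaries."""
    s = np.asarray(salaries, dtype=float)
    s = s[~np.isnan(s)]  # drop missing salaries, like na.rm = TRUE in R

    def pct(q):
        # numpy's default "linear" method matches R's default quantile() (type 7)
        return float(np.percentile(s, q))

    return {
        "20/20": pct(80) / pct(20),
        "Palma (percentile-based)": pct(90) / pct(40),
        "10/50": pct(90) / pct(50),
        "CV": float(s.std(ddof=1) / s.mean()),  # ddof=1 gives the sample SD, like R's sd()
    }

# Hypothetical department with ten salaries (toy data, not county figures)
toy = [40_000, 42_000, 45_000, 48_000, 50_000, 55_000, 60_000, 70_000, 90_000, 150_000]
metrics = inequality_metrics(toy)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

For this toy department, the 90th percentile salary ($96,000) is about 1.83 times the median ($52,500).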

These metrics give a fuller picture of income distribution within each department and highlight areas of wage disparity that may warrant further investigation. For departments near the top of the list, the numbers may suggest that employees in the bottom half of the pay scale are treated as peons or “peasant” labor. The top 5 departments with the highest income inequality are Cultural Affairs, the Board of County Commissioners, Elections, the Office of the Mayor, and the County Attorney’s Office. In these departments, the 90th percentile salary is roughly twice the median salary.

Concluding Thoughts

In conclusion, delving into public salary data reveals vital insights about income distribution, wage disparity, and overall payroll spending within the Miami-Dade County government. Increased transparency and careful analysis of these data points can lead to better accountability and more effective use of taxpayer dollars. Ensuring equitable wage distribution and efficient allocation of public funds is an ongoing process, and openness of data plays a crucial role in facilitating this endeavor.

Remix Institute Membership

If you liked this article, join our free Membership as a Stádas Genesis member for access to exclusive courses on getting started with R, Julia, data science, and AI. Elevate your skills and gain an edge in the industry. Sign up now for an elite learning experience.

R Code

# start time
start_time = Sys.time()

# Load required data preparation libraries
library(data.table)
library(dplyr)
library(magrittr)
# Load required visualization libraries
library(ggplot2)
library(extrafont)
library(scales)
library(grid)
library(magick)




# Data Source: opendata.miamidade.gov ------

# Link: https://gis-mdc.opendata.arcgis.com/datasets/employee-pay-information-1/explore
# Dataset name: "Employee Pay Information"



# IMPORT EMPLOYEE PAY DATA FROM MIAMI DADE COUNTY --------

employee_dataframe = data.table::fread("~/Employee_Pay_Information.csv", header = TRUE, stringsAsFactors = FALSE)
# add salary group (missing salaries fall into "F - Unknown"; nested ifelse() returned NA for them instead)
employee_dataframe$AnnualSalaryGroup = dplyr::case_when(
  is.na(employee_dataframe$AnnualSalary) ~ "F - Unknown",
  employee_dataframe$AnnualSalary < 50000 ~ "A - $0 to $49,999",
  employee_dataframe$AnnualSalary < 75000 ~ "B - $50,000 to $74,999",
  employee_dataframe$AnnualSalary < 100000 ~ "C - $75,000 to $99,999",
  employee_dataframe$AnnualSalary < 200000 ~ "D - $100,000 to $199,999",
  TRUE ~ "E - $200,000+"
)
# what is the average salary?
print(paste0("The average salary of Miami-Dade County government employees is: ", mean(employee_dataframe$AnnualSalary, na.rm = TRUE) %>% round(0)))
# what is the median salary?
print(paste0("The median salary of Miami-Dade County government employees is: ", median(employee_dataframe$AnnualSalary, na.rm = TRUE) %>% round(0)))
# how many employees?
print(paste0("The total number of Miami-Dade County government employees is: ", nrow(employee_dataframe)))
# what is the total salary?
print(paste0("The total salary of Miami-Dade County government employees is: ", sum(employee_dataframe$AnnualSalary, na.rm = TRUE) %>% round(2)))


# compare median salary of miami dade county govt employees to entire labor force in miami-dade county
# US Census Bureau: https://www.census.gov/quickfacts/fact/table/miamidadecountyflorida/POP060210
# Federal Reserve Economic Data: https://fred.stlouisfed.org/series/MHIFL12086A052NCEN



# DATA PREPARATION AND ANALYSIS ------


# percentile distributions - total and by dept

# summary by dept
employee_dataframe_dept_summ = employee_dataframe %>% dplyr::group_by(DeptName) %>% 
  dplyr::summarize(Employee_Counts = n(), 
                   Avg_Salary = mean(AnnualSalary, na.rm = TRUE),
                   Median_Salary = median(AnnualSalary, na.rm = TRUE),
                   StdDev_Salary = sd(AnnualSalary, na.rm = TRUE),
                   Salary_CoefficientOfVariation = sd(AnnualSalary, na.rm = TRUE) / mean(AnnualSalary, na.rm = TRUE),
                   Salary_20_20_Ratio = (quantile(AnnualSalary, c(.80), na.rm = TRUE) / quantile(AnnualSalary, c(.20), na.rm = TRUE)) %>% as.numeric(), # https://en.wikipedia.org/wiki/Income_inequality_metrics
                   Salary_Palma_Ratio = (quantile(AnnualSalary, c(.90), na.rm = TRUE) / quantile(AnnualSalary, c(.40), na.rm = TRUE)) %>% as.numeric(), # https://en.wikipedia.org/wiki/Income_inequality_metrics
                   Salary_10_50_Ratio = (quantile(AnnualSalary, c(.90), na.rm = TRUE) / quantile(AnnualSalary, c(.50), na.rm = TRUE)) %>% as.numeric(),
                   Min_Salary = min(AnnualSalary, na.rm = TRUE),
                   Salary_20thPercentile = quantile(AnnualSalary, c(.20), na.rm = TRUE) %>% as.numeric(),
                   Salary_40thPercentile = quantile(AnnualSalary, c(.40), na.rm = TRUE) %>% as.numeric(),
                   Salary_80thPercentile = quantile(AnnualSalary, c(.80), na.rm = TRUE) %>% as.numeric(),
                   Salary_90thPercentile = quantile(AnnualSalary, c(.90), na.rm = TRUE) %>% as.numeric(),
                   Max_Salary = max(AnnualSalary, na.rm = TRUE),
                   Sum_Salary = sum(AnnualSalary, na.rm = TRUE)
  )
# sort it by median salary
employee_dataframe_dept_summ = employee_dataframe_dept_summ[order(employee_dataframe_dept_summ$Median_Salary, decreasing = TRUE), ]
# add % of total Miami-Dade County salary
employee_dataframe_dept_summ$Pct_Of_TotalMDCounty_Salary = round(employee_dataframe_dept_summ$Sum_Salary / sum(employee_dataframe_dept_summ$Sum_Salary, na.rm = TRUE), 4)
# add % of total Miami-Dade County employees
employee_dataframe_dept_summ$Pct_Of_TotalMDCounty_Employees = round(employee_dataframe_dept_summ$Employee_Counts / sum(employee_dataframe_dept_summ$Employee_Counts, na.rm = TRUE), 4)
# add a ranking of Median_Salary and Employee_Counts in descending order
employee_dataframe_dept_summ$Median_Salary_Rank = data.table::frank(-employee_dataframe_dept_summ$Median_Salary) %>% round(0)
employee_dataframe_dept_summ$Employee_Counts_Rank = data.table::frank(-employee_dataframe_dept_summ$Employee_Counts) %>% round(0)
# create a Composite Score by averaging Median_Salary_Rank and Employee_Counts_Rank: a lower score (composite rank closer to 1) represents a more bloated budget
employee_dataframe_dept_summ$Budget_Bloat_Composite_Score = rowMeans(cbind(employee_dataframe_dept_summ$Median_Salary_Rank, employee_dataframe_dept_summ$Employee_Counts_Rank), na.rm = TRUE) %>% round(1)
employee_dataframe_dept_summ$Budget_Bloat_Composite_Score_Rank = data.table::frank(employee_dataframe_dept_summ$Budget_Bloat_Composite_Score) %>% round(0)
# export to csv
data.table::fwrite(employee_dataframe_dept_summ, "~/Miami_Dade_County_Employee_Pay_Information_By_Dept_for_Chart.csv")



# select and rename columns for the published department table, then export to csv
employee_dataframe_dept_summ_subset = subset(employee_dataframe_dept_summ, select = c(
  DeptName, 
  Budget_Bloat_Composite_Score_Rank, 
  Employee_Counts, 
  Median_Salary, 
  Avg_Salary, 
  Median_Salary_Rank, 
  Employee_Counts_Rank, 
  Budget_Bloat_Composite_Score, 
  Salary_10_50_Ratio, 
  Pct_Of_TotalMDCounty_Salary
)) %>%
  dplyr::rename(
    `Budget Bloat Composite Score Rank` =  Budget_Bloat_Composite_Score_Rank, 
    `Employee Counts` = Employee_Counts, 
    `Median Salary` = Median_Salary,
    `Average Salary` = Avg_Salary,
    `Median Salary Rank` = Median_Salary_Rank, 
    `Employee Counts Rank` = Employee_Counts_Rank, 
    `Budget Bloat Composite Score` = Budget_Bloat_Composite_Score, 
    `Salary 10/50 Ratio` = Salary_10_50_Ratio, 
    `% of Total Miami-Dade County Salary` = Pct_Of_TotalMDCounty_Salary
  )
# sort by Budget Bloat Composite Score Rank
employee_dataframe_dept_summ_subset = employee_dataframe_dept_summ_subset[order(employee_dataframe_dept_summ_subset$`Budget Bloat Composite Score Rank`, decreasing = FALSE), ]
data.table::fwrite(employee_dataframe_dept_summ_subset, "~/Miami_Dade_County_Employee_Pay_Information_By_Dept.csv")






# how many employees make $100K or more
employee_dataframe_gte100K = subset(employee_dataframe, AnnualSalary >= 100000)
print(paste0(nrow(employee_dataframe_gte100K), " of ", nrow(employee_dataframe), " Miami-Dade County government employees make $100K or more in salary."))
# summary of employee salary ranges
employee_dataframe_salary_range_summ = employee_dataframe %>% dplyr::group_by(AnnualSalaryGroup) %>%
  dplyr::summarize(`Employee Counts` = n())
# add % of total Miami-Dade County employees
employee_dataframe_salary_range_summ$`Percent of Total Miami-Dade County Govt Employees` = round(employee_dataframe_salary_range_summ$`Employee Counts` / sum(employee_dataframe_salary_range_summ$`Employee Counts`, na.rm = TRUE), 4)
# add totals row
employee_dataframe_salary_range_summ = dplyr::bind_rows(
  employee_dataframe_salary_range_summ,
  dplyr::tibble(
    AnnualSalaryGroup = "TOTAL",
    `Employee Counts` = sum(employee_dataframe_salary_range_summ$`Employee Counts`, na.rm = TRUE),
    `Percent of Total Miami-Dade County Govt Employees` = sum(employee_dataframe_salary_range_summ$`Percent of Total Miami-Dade County Govt Employees`, na.rm = TRUE)
  )
)


# export to csv
data.table::fwrite(employee_dataframe_salary_range_summ, "~/Miami_Dade_County_Employee_Pay_Information_By_Salary_Range.csv")




# top paid positions

# summary by title and dept
employee_dataframe_title_and_dept_summ = employee_dataframe %>% dplyr::group_by(Title, DeptName) %>% 
  dplyr::summarize(`Employee Counts` = n(), 
                   `Average Salary` = mean(AnnualSalary, na.rm = TRUE),
                   `Median Salary` = median(AnnualSalary, na.rm = TRUE),
                   `Max Salary` = max(AnnualSalary, na.rm = TRUE),
                   .groups = "drop" # return an ungrouped result so sorting and export behave predictably
  )
# sort it by median salary
employee_dataframe_title_and_dept_summ = employee_dataframe_title_and_dept_summ[order(employee_dataframe_title_and_dept_summ$`Median Salary`, decreasing = TRUE), ]


# export to csv
data.table::fwrite(employee_dataframe_title_and_dept_summ, "~/Miami_Dade_County_Employee_Pay_Information_By_Title_And_Dept.csv")



# compare with average lawyer salaries in Miami

# Miami (Payscale): https://www.payscale.com/research/US/Job=Attorney_%2F_Lawyer/Salary/b37de149/Miami-FL
# Miami (Glassdoor): https://www.glassdoor.com/Salaries/miami-lawyer-salary-SRCH_IL.0,5_IC1154170_KO6,12.htm
employee_dataframe_lawyer = subset(employee_dataframe, tolower(Title) %like% 'attorney')
# average salary of attorney positions
print(paste0("The average salary of an attorney in Miami-Dade County govt is: ", mean(employee_dataframe_lawyer$AnnualSalary, na.rm = TRUE) %>% round(0)))


# compare with average police salary in the US and Florida
# https://www.bls.gov/oes/current/oes333051.htm


# compare with average firefighter salary in the us and florida
# https://www.bls.gov/oes/current/oes332011.htm





# find "DATA" or "ANALYST" positions and see how much they're paid. 
employee_dataframe_data_and_analyst_jobs = subset(employee_dataframe_title_and_dept_summ, tolower(Title) %like% 'analyst' | tolower(Title) %like% 'data')
employee_dataframe_data_and_analyst = subset(employee_dataframe, tolower(Title) %like% 'analyst' | tolower(Title) %like% 'data')
# average salary of "DATA" or "ANALYST" positions
print(paste0("The average salary of a data or analyst employee in Miami-Dade County govt is: ", mean(employee_dataframe_data_and_analyst$AnnualSalary, na.rm = TRUE) %>% round(0)))


# export to csv
data.table::fwrite(employee_dataframe_data_and_analyst_jobs, "~/Miami_Dade_County_Data_And_Analyst_Employee_Pay_Information_By_Title_And_Dept.csv")



# end time
end_time = Sys.time()

# how long did it take
duration = end_time - start_time
print("The process took ")
print(duration)


# DATA VISUALIZATION CHARTS -----------


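# Font Import (prompt = FALSE skips the interactive "Continue? [y/n]" question, so the script runs unattended)
extrafont::font_import(pattern = "Roboto", prompt = FALSE)
# device = "win" registers the fonts with the Windows graphics device (Windows only)
extrafont::loadfonts(device = "win")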


logo_path <- "https://www.remixinstitute.com/wp-content/uploads/2023/05/Remix_Institute_Inline_White.png" 



# Sorted Bar Chart for Median Salaries by Department
median_salary_plot = ggplot2::ggplot(employee_dataframe_dept_summ, aes(reorder(DeptName, Median_Salary), Median_Salary)) +
  geom_bar(stat = 'identity', fill = '#F3E600') +
  geom_text(aes(label=scales::dollar(Median_Salary)), hjust=1, color="black", fontface = "bold") +
  geom_text(aes(label=DeptName), hjust=-0.1, color="white", fontface = "bold") +
  coord_flip() +
  scale_y_continuous(labels = scales::dollar, expand = expansion(mult = c(0, 0.1)), limits = c(0, 200000)) + # set max limit of Median_Salary axis to 200,000
  labs(title = "Median Annual Salary of Miami-Dade County Government Employees",
       subtitle = "By Department. Based on Public Data.",
       caption = "Source: opendata.miamidade.gov") +
  theme_minimal() +
  theme(plot.background = element_rect(fill = '#1c1c1c'),
        plot.margin = margin(5.5, 5.5, 5.5, 10, "pt"), # add a 10pt margin to the left of the plot
        text = element_text(family = "Roboto", color = "white"),
        axis.text.y = element_blank(),  # remove DeptName axis text
        axis.text.x = element_text(color = "white", size = 14),  # increase Median_Salary axis text font size
        plot.title = element_text(size = 20, hjust = 0, face = "bold"),
        plot.subtitle = element_text(size = 12, hjust = 0),
        plot.caption = element_text(size = 12, hjust = 0),  # increase caption font size
        axis.title = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_blank(),
        axis.ticks.y = element_blank())

# Remix Institute branding
logo = magick::image_read(logo_path)
median_salary_plot
grid::grid.raster(logo, x = .87, y = .93, just = c('left', 'bottom'), width = 0.12)





# Sorted Bar Chart for Total Miami-Dade County Salary Consumption by Department
salary_percentage_plot = ggplot2::ggplot(employee_dataframe_dept_summ, aes(reorder(DeptName, Pct_Of_TotalMDCounty_Salary), Pct_Of_TotalMDCounty_Salary*100)) +
  geom_bar(stat = 'identity', fill = '#00aa9d') +
  geom_text(aes(label=sprintf("%.2f%%", Pct_Of_TotalMDCounty_Salary*100)), hjust=1, color="white", fontface = "bold") +
  #geom_text(aes(label=DeptName), hjust=-0.1, color="white", fontface = "bold") +
  coord_flip() +
  scale_y_continuous(expand = expansion(mult = c(0.03, 0.2))) + # 3% space below the bars but 20% above them 
  labs(title = "Which Miami-Dade County Government Employees Have The Highest Percentage\nof the Annual Salary Budget?",
       subtitle = "Total Annual Salary Distribution by Department. Based on Public Data",
       caption = "Source: opendata.miamidade.gov") +
  theme_minimal() +
  theme(plot.background = element_rect(fill = '#1c1c1c'),
        plot.margin = margin(5.5, 5.5, 5.5, 10, "pt"), # add a 10pt margin to the left of the plot
        text = element_text(family = "Roboto", color = "white"),
        #axis.text.y = element_blank(),  # remove DeptName axis text
        axis.text.y = element_text(color = "white"),
        axis.text.x = element_text(color = "white", size = 14),  # increase Pct_Of_TotalMDCounty_Salary axis text font size
        plot.title = element_text(size = 20, hjust = 0, face = "bold"),
        plot.subtitle = element_text(size = 12, hjust = 0),
        plot.caption = element_text(size = 12, hjust = 0), # increase caption font size
        axis.title = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_blank(),
        axis.ticks.y = element_blank())

# Remix Institute branding
logo = magick::image_read(logo_path)
salary_percentage_plot
grid::grid.raster(logo, x = .87, y = .93, just = c('left', 'bottom'), width = 0.12)






# Sorted Bar Chart for Miami-Dade County Salary Income Inequality by Department
income_inequality_plot = ggplot2::ggplot(subset(employee_dataframe_dept_summ, Median_Salary >= 50000 & Employee_Counts >= 10), aes(reorder(DeptName, Salary_10_50_Ratio), Salary_10_50_Ratio)) +
  geom_bar(stat = 'identity', fill = '#ed2590') +
  geom_text(aes(label = paste0(round(Salary_10_50_Ratio,1),"x")), hjust=1, color="white", fontface = "bold") +
  #geom_text(aes(label=DeptName), hjust=-0.1, color="white", fontface = "bold") +
  coord_flip() +
  scale_y_continuous(expand = expansion(mult = c(0.02, 0.2)), limits = c(1, 3), oob = scales::squish) + # 2% space below the bars but 20% above them, limit from 1 to 3 (out-of-range values are squished)
  labs(title = "Miami-Dade County Government Departments with Highest Income Inequalities",
       subtitle = "What the top 10% Annual Salary Earners make versus the bottom 50% Annual Salary Earners. For Departments\nwith Median Annual Salary >= $50K and at Least 10 Employees. Based on Public Data.",
       caption = "Source: opendata.miamidade.gov") +
  theme_minimal() +
  theme(plot.background = element_rect(fill = '#1c1c1c'),
        plot.margin = margin(5.5, 5.5, 5.5, 10, "pt"), # add a 10pt margin to the left of the plot
        text = element_text(family = "Roboto", color = "white"),
        axis.text.y = element_text(color = "white"),
        axis.text.x = element_text(color = "white", size = 14),  # increase Salary_10_50_Ratio axis text font size
        plot.title = element_text(size = 20, hjust = 0, face = "bold"),
        plot.subtitle = element_text(size = 12, hjust = 0),
        plot.caption = element_text(size = 12, hjust = 0), # increase caption font size
        axis.title = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_blank(),
        axis.ticks.y = element_blank())


# Remix Institute branding
logo = magick::image_read(logo_path)
income_inequality_plot
grid::grid.raster(logo, x = .87, y = .93, just = c('left', 'bottom'), width = 0.12)

Julia Code

# start time
using Dates
start_time = now()

# Load required data preparation packages
using CSV
using DataFrames
using Statistics
using Printf
using Missings
using DataFramesMeta
using Pipe
using Glob
using Chain


# Data Source: opendata.miamidade.gov ------

# Link: https://gis-mdc.opendata.arcgis.com/datasets/employee-pay-information-1/explore
# Dataset name: "Employee Pay Information"



# IMPORT EMPLOYEE PAY DATA FROM MIAMI DADE COUNTY --------

# expanduser() is needed because Julia does not expand "~" in file paths
employee_dataframe = CSV.read(expanduser("~/Employee_Pay_Information.csv"), DataFrame)

# Add Salary Group (missing salaries fall into "F - Unknown"; ifelse.() would error on missing values)
function salary_group(salary)
    ismissing(salary) && return "F - Unknown"
    salary < 50000 && return "A - \$0 to \$49,999"
    salary < 75000 && return "B - \$50,000 to \$74,999"
    salary < 100000 && return "C - \$75,000 to \$99,999"
    salary < 200000 && return "D - \$100,000 to \$199,999"
    return "E - \$200,000+"
end
employee_dataframe.AnnualSalaryGroup = salary_group.(employee_dataframe.AnnualSalary)

# What is the average salary?
println("The average salary of Miami-Dade County government employees is: ", @sprintf("%.0f", mean(skipmissing(employee_dataframe.AnnualSalary))))

# What is the median salary?
println("The median salary of Miami-Dade County government employees is: ", @sprintf("%.0f", median(skipmissing(employee_dataframe.AnnualSalary))))

# How many employees?
println("The total number of Miami-Dade County government employees is: ", nrow(employee_dataframe))

# What is the total salary?
println("The total salary of Miami-Dade County government employees is: ", @sprintf("%.2f", sum(skipmissing(employee_dataframe.AnnualSalary))))

# Compare median salary of Miami Dade County govt employees to entire labor force in Miami-Dade County
# US Census Bureau: https://www.census.gov/quickfacts/fact/table/miamidadecountyflorida/POP060210
# Federal Reserve Economic Data: https://fred.stlouisfed.org/series/MHIFL12086A052NCEN



# DATA PREPARATION AND ANALYSIS ------

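# custom rank function: returns ordinal ranks (1 = smallest value); tied values get consecutive
# ranks in input order, unlike the R version, where data.table::frank averages tied ranks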
function rank(v)
    n = length(v)
    r = sortperm(v)
    r_inv = Array{Int64}(undef, n)
    r_inv[r] = 1:n
    return r_inv
end

# percentile distributions - total and by dept

# Summary by Dept
group_by = groupby(employee_dataframe, :DeptName)
employee_dataframe_dept_summ = combine(group_by,
  :AnnualSalary => (x -> count(!ismissing, x)) => :Employee_Counts,
  :AnnualSalary => (x -> mean(skipmissing(x))) => :Avg_Salary,
  :AnnualSalary => (x -> median(skipmissing(x))) => :Median_Salary,
  :AnnualSalary => (x -> std(skipmissing(x))) => :StdDev_Salary,
  :AnnualSalary => (x -> std(skipmissing(x)) / mean(skipmissing(x))) => :Salary_CoefficientOfVariation,
  :AnnualSalary => (x -> quantile(skipmissing(x), 0.8) / quantile(skipmissing(x), 0.2)) => :Salary_20_20_Ratio,
  :AnnualSalary => (x -> quantile(skipmissing(x), 0.9) / quantile(skipmissing(x), 0.4)) => :Salary_Palma_Ratio,
  :AnnualSalary => (x -> quantile(skipmissing(x), 0.9) / quantile(skipmissing(x), 0.5)) => :Salary_10_50_Ratio,
  :AnnualSalary => (x -> minimum(skipmissing(x))) => :Min_Salary,
  :AnnualSalary => (x -> quantile(skipmissing(x), 0.2)) => :Salary_20thPercentile,
  :AnnualSalary => (x -> quantile(skipmissing(x), 0.4)) => :Salary_40thPercentile,
  :AnnualSalary => (x -> quantile(skipmissing(x), 0.8)) => :Salary_80thPercentile,
  :AnnualSalary => (x -> quantile(skipmissing(x), 0.9)) => :Salary_90thPercentile,
  :AnnualSalary => (x -> maximum(skipmissing(x))) => :Max_Salary,
  :AnnualSalary => (x -> sum(skipmissing(x))) => :Sum_Salary
)


# Sort it by Median Salary
sort!(employee_dataframe_dept_summ, :Median_Salary, rev=true)

# Add % of Total Miami-Dade County Salary
employee_dataframe_dept_summ.Pct_Of_TotalMDCounty_Salary = round.(
    employee_dataframe_dept_summ.Sum_Salary / sum(employee_dataframe_dept_summ.Sum_Salary), digits=4)

# Add % of Total Miami-Dade County Employees
employee_dataframe_dept_summ.Pct_Of_TotalMDCounty_Employees = round.(
    employee_dataframe_dept_summ.Employee_Counts / sum(employee_dataframe_dept_summ.Employee_Counts), digits=4)

# Add a Ranking of Median Salary and Employee Counts in Descending Order
employee_dataframe_dept_summ.Median_Salary_Rank = round.(rank(-employee_dataframe_dept_summ.Median_Salary), digits=0)
employee_dataframe_dept_summ.Employee_Counts_Rank = round.(rank(-employee_dataframe_dept_summ.Employee_Counts), digits=0)

# Create a Composite Score by Averaging the Median Salary Rank and Employee Counts Rank: a Lower Score (Composite Rank Closer to 1) Represents a More Bloated Budget
employee_dataframe_dept_summ.Budget_Bloat_Composite_Score = round.([mean([employee_dataframe_dept_summ.Median_Salary_Rank[i], employee_dataframe_dept_summ.Employee_Counts_Rank[i]]) for i in 1:nrow(employee_dataframe_dept_summ)], digits=1)
employee_dataframe_dept_summ.Budget_Bloat_Composite_Score_Rank = round.(rank(employee_dataframe_dept_summ.Budget_Bloat_Composite_Score), digits=0)
# Export to CSV
CSV.write(expanduser("~/Miami_Dade_County_Employee_Pay_Information_By_Dept_for_Chart_JULIA.csv"), employee_dataframe_dept_summ)

# Select and rename columns for the published department table, then export to CSV
employee_dataframe_dept_summ_subset = select(employee_dataframe_dept_summ, 
    :DeptName, :Budget_Bloat_Composite_Score_Rank, :Employee_Counts, :Median_Salary, :Avg_Salary, 
    :Median_Salary_Rank, :Employee_Counts_Rank, :Budget_Bloat_Composite_Score, :Salary_10_50_Ratio, 
    :Pct_Of_TotalMDCounty_Salary)

rename!(employee_dataframe_dept_summ_subset, [
    :DeptName => "Department Name", 
    :Budget_Bloat_Composite_Score_Rank => "Budget Bloat Composite Score Rank", 
    :Employee_Counts => "Employee Counts", 
    :Median_Salary => "Median Salary", 
    :Avg_Salary => "Average Salary",
    :Median_Salary_Rank => "Median Salary Rank", 
    :Employee_Counts_Rank => "Employee Counts Rank", 
    :Budget_Bloat_Composite_Score => "Budget Bloat Composite Score",
    :Salary_10_50_Ratio => "Salary 10/50 Ratio", 
    :Pct_Of_TotalMDCounty_Salary => "% of Total Miami-Dade County Salary"
])
# sort by Budget Bloat Composite Score Rank
sort!(employee_dataframe_dept_summ_subset, "Budget Bloat Composite Score Rank")
CSV.write(expanduser("~/Miami_Dade_County_Employee_Pay_Information_By_Dept_JULIA.csv"), employee_dataframe_dept_summ_subset)






# how many employees make $100K or more
employee_dataframe_gte100K = @subset(employee_dataframe, :AnnualSalary .>= 100000)
@printf("%d of %d Miami-Dade County government employees make \$100K or more in salary.\n", nrow(employee_dataframe_gte100K), nrow(employee_dataframe))

# summary of employee salary ranges
employee_dataframe_salary_range_summ = @chain employee_dataframe begin
    groupby(:AnnualSalaryGroup)
    combine(:AnnualSalaryGroup => length => "Employee Counts")
end

# add % of total Miami-Dade County employees
employee_dataframe_salary_range_summ."Percent of Total Miami-Dade County Govt Employees" = round.(employee_dataframe_salary_range_summ."Employee Counts" ./ sum(employee_dataframe_salary_range_summ."Employee Counts"), digits=4)

# add totals row
push!(employee_dataframe_salary_range_summ, ["TOTAL", sum(employee_dataframe_salary_range_summ."Employee Counts"), sum(employee_dataframe_salary_range_summ."Percent of Total Miami-Dade County Govt Employees")])

# export to csv
CSV.write(expanduser("~/Miami_Dade_County_Employee_Pay_Information_By_Salary_Range_JULIA.csv"), employee_dataframe_salary_range_summ)




# top paid positions

# summary by title and dept
group_by = groupby(employee_dataframe, [:Title, :DeptName])
employee_dataframe_title_and_dept_summ = combine(group_by,
  :AnnualSalary => (x -> count(!ismissing, x)) => "Employee Counts",
  :AnnualSalary => (x -> mean(skipmissing(x))) => "Average Salary",
  :AnnualSalary => (x -> median(skipmissing(x))) => "Median Salary",
  :AnnualSalary => (x -> maximum(skipmissing(x))) => "Max Salary"
)
# sort it by median salary
sort!(employee_dataframe_title_and_dept_summ, "Median Salary", rev=true)

# export to csv
CSV.write(expanduser("~/Miami_Dade_County_Employee_Pay_Information_By_Title_And_Dept_JULIA.csv"), employee_dataframe_title_and_dept_summ)



# compare with average lawyer salaries in Miami
employee_dataframe_lawyer = @subset(employee_dataframe, occursin.("attorney", lowercase.(:Title)))

# average salary of attorney positions (round(Int, ...) so %d receives an integer; skipmissing ignores missing salaries)
@printf("The average salary of an attorney in Miami-Dade County govt is: %d\n", round(Int, mean(skipmissing(employee_dataframe_lawyer.AnnualSalary))))


# compare with average police salary in the us and florida
# https://www.bls.gov/oes/current/oes333051.htm


# compare with average firefighter salary in the us and florida
# https://www.bls.gov/oes/current/oes332011.htm





# find "DATA" or "ANALYST" positions and see how much they're paid
employee_dataframe_data_and_analyst_jobs = @subset(employee_dataframe_title_and_dept_summ, occursin.("analyst", lowercase.(:Title)) .| occursin.("data", lowercase.(:Title)))
employee_dataframe_data_and_analyst = @subset(employee_dataframe, occursin.("analyst", lowercase.(:Title)) .| occursin.("data", lowercase.(:Title)))

# average salary of "DATA" or "ANALYST" positions (rounded to an Int so %d formats cleanly)
@printf("The average salary of a data or analyst employee in Miami-Dade County govt is: %d\n", round(Int, mean(skipmissing(employee_dataframe_data_and_analyst.AnnualSalary))))

# export to csv
CSV.write("~/Miami_Dade_County_Data_And_Analyst_Employee_Pay_Information_By_Title_And_Dept_JULIA.csv", employee_dataframe_data_and_analyst_jobs)



# end time
end_time = now()

# how long did it take
duration = end_time - start_time
println("The process took ", duration)

Python Code

# Load required data preparation libraries
import pandas as pd
import numpy as np

# start time
start_time = pd.Timestamp.now()


# Data Source: opendata.miamidade.gov ------

# Link: https://gis-mdc.opendata.arcgis.com/datasets/employee-pay-information-1/explore
# Dataset name: "Employee Pay Information"


# IMPORT EMPLOYEE PAY DATA FROM MIAMI DADE COUNTY --------

employee_dataframe = pd.read_csv("~/Employee_Pay_Information.csv")
# add salary group
employee_dataframe["AnnualSalaryGroup"] = np.select(
    [
        employee_dataframe["AnnualSalary"] < 50000,
        employee_dataframe["AnnualSalary"] < 75000,
        employee_dataframe["AnnualSalary"] < 100000,
        employee_dataframe["AnnualSalary"] < 200000,
        employee_dataframe["AnnualSalary"] >= 200000,
    ],
    [
        "A - $0 to $49,999",
        "B - $50,000 to $74,999",
        "C - $75,000 to $99,999",
        "D - $100,000 to $199,999",
        "E - $200,000+",
    ],
    default="F - Unknown",
)

# what is the average salary?
avg_salary = employee_dataframe["AnnualSalary"].mean()
print(f"The average salary of Miami-Dade County government employees is: {avg_salary.round(0)}")

# what is the median salary?
median_salary = employee_dataframe["AnnualSalary"].median()
print(f"The median salary of Miami-Dade County government employees is: {median_salary.round(0)}")

# how many employees?
num_employees = employee_dataframe.shape[0]
print(f"The total number of Miami-Dade County government employees is: {num_employees}")

# what is the total salary?
total_salary = employee_dataframe["AnnualSalary"].sum()
print(f"The total salary of Miami-Dade County government employees is: {total_salary.round(2)}")

# compare median salary of miami dade county govt employees to entire labor force in miami-dade county
# US Census Bureau: https://www.census.gov/quickfacts/fact/table/miamidadecountyflorida/POP060210
# Federal Reserve Economic Data: https://fred.stlouisfed.org/series/MHIFL12086A052NCEN



# DATA PREPARATION AND ANALYSIS ------


# percentile distributions - total and by dept

# summary by dept
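employee_dataframe_dept_summ = (employee_dataframe.groupby('DeptName')
                                .agg(Employee_Counts=('DeptName', 'size'),
                                     Avg_Salary=('AnnualSalary', 'mean'),
                                     Median_Salary=('AnnualSalary', 'median'),
                                     StdDev_Salary=('AnnualSalary', 'std'),
                                     # sample std / mean, consistent with StdDev_Salary above
                                     Salary_CoefficientOfVariation=('AnnualSalary', lambda x: x.std() / x.mean()),
                                     Min_Salary=('AnnualSalary', 'min'),
                                     # Series.quantile skips missing salaries, unlike np.percentile
                                     Salary_10thPercentile=('AnnualSalary', lambda x: x.quantile(0.10)),
                                     Salary_20thPercentile=('AnnualSalary', lambda x: x.quantile(0.20)),
                                     Salary_40thPercentile=('AnnualSalary', lambda x: x.quantile(0.40)),
                                     Salary_80thPercentile=('AnnualSalary', lambda x: x.quantile(0.80)),
                                     Salary_90thPercentile=('AnnualSalary', lambda x: x.quantile(0.90)),
                                     Max_Salary=('AnnualSalary', 'max'),
                                     Sum_Salary=('AnnualSalary', 'sum'))
                                # keep DeptName as a regular column so the CSV exports below include it
                                .reset_index())
# Salary 10/50 ratio (selected in the export subset below); assumed here to mean
# the 10th percentile divided by the median (50th percentile)
employee_dataframe_dept_summ['Salary_10_50_Ratio'] = (employee_dataframe_dept_summ['Salary_10thPercentile'] / employee_dataframe_dept_summ['Median_Salary']).round(4)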
# sort it by median salary
employee_dataframe_dept_summ = employee_dataframe_dept_summ.sort_values(by='Median_Salary', ascending=False)
# add % of total Miami-Dade County salary
employee_dataframe_dept_summ['Pct_Of_TotalMDCounty_Salary'] = (employee_dataframe_dept_summ['Sum_Salary'] / employee_dataframe_dept_summ['Sum_Salary'].sum()).round(4)
# add % of total Miami-Dade County employees
employee_dataframe_dept_summ['Pct_Of_TotalMDCounty_Employees'] = (employee_dataframe_dept_summ['Employee_Counts'] / employee_dataframe_dept_summ['Employee_Counts'].sum()).round(4)
# add a ranking of Median_Salary and Employee_Counts in descending order
employee_dataframe_dept_summ['Median_Salary_Rank'] = employee_dataframe_dept_summ['Median_Salary'].rank(ascending=False).round(0)
employee_dataframe_dept_summ['Employee_Counts_Rank'] = employee_dataframe_dept_summ['Employee_Counts'].rank(ascending=False).round(0)
# Composite Score = average of Median_Salary_Rank and Employee_Counts_Rank. Both ranks put the highest salary
# and largest headcount at 1, so a lower score (composite rank closer to 1) indicates a more bloated budget
employee_dataframe_dept_summ['Budget_Bloat_Composite_Score'] = (employee_dataframe_dept_summ[['Median_Salary_Rank', 'Employee_Counts_Rank']].mean(axis=1)).round(1)
employee_dataframe_dept_summ['Budget_Bloat_Composite_Score_Rank'] = employee_dataframe_dept_summ['Budget_Bloat_Composite_Score'].rank().round(0)
# export to csv
employee_dataframe_dept_summ.to_csv('~/Miami_Dade_County_Employee_Pay_Information_By_Dept_for_Chart_PYTHON.csv', index=False)



# export to csv
employee_dataframe_dept_summ_subset = employee_dataframe_dept_summ[['DeptName',
                                                                    'Budget_Bloat_Composite_Score_Rank',
                                                                    'Employee_Counts',
                                                                    'Median_Salary',
                                                                    'Avg_Salary',
                                                                    'Median_Salary_Rank',
                                                                    'Employee_Counts_Rank',
                                                                    'Budget_Bloat_Composite_Score',
                                                                    'Salary_10_50_Ratio',
                                                                    'Pct_Of_TotalMDCounty_Salary']]
employee_dataframe_dept_summ_subset = employee_dataframe_dept_summ_subset.rename(columns={'Budget_Bloat_Composite_Score_Rank': 'Budget Bloat Composite Score Rank',
                                                                                          'Employee_Counts': 'Employee Counts',
                                                                                          'Median_Salary': 'Median Salary',
                                                                                          'Avg_Salary': 'Average Salary',
                                                                                          'Median_Salary_Rank': 'Median Salary Rank',
                                                                                          'Employee_Counts_Rank': 'Employee Counts Rank',
                                                                                          'Budget_Bloat_Composite_Score': 'Budget Bloat Composite Score',
                                                                                          'Salary_10_50_Ratio': 'Salary 10/50 Ratio',
                                                                                          'Pct_Of_TotalMDCounty_Salary': '% of Total Miami-Dade County Salary'})
# sort by Budget Bloat Composite Score Rank
employee_dataframe_dept_summ_subset = employee_dataframe_dept_summ_subset.sort_values(by='Budget Bloat Composite Score Rank', ascending=True)
employee_dataframe_dept_summ_subset.to_csv('~/Miami_Dade_County_Employee_Pay_Information_By_Dept_PYTHON.csv', index=False)





# How many employees make $100K or more
employee_dataframe_gte100K = employee_dataframe[employee_dataframe['AnnualSalary'] >= 100000]
print(f"{len(employee_dataframe_gte100K)} of {len(employee_dataframe)} Miami-Dade County government employees make $100K or more in salary.")

# Summary of employee salary ranges
employee_dataframe_salary_range_summ = (employee_dataframe.groupby('AnnualSalaryGroup')
                                        .size()
                                        .reset_index(name='Employee Counts'))
# Add % of total Miami-Dade County employees
employee_dataframe_salary_range_summ['Percent of Total Miami-Dade County Govt Employees'] = (employee_dataframe_salary_range_summ['Employee Counts'] / employee_dataframe_salary_range_summ['Employee Counts'].sum()).round(4)

# Add totals row
totals_row = pd.DataFrame({'AnnualSalaryGroup': ['TOTAL'],
                           'Employee Counts': [employee_dataframe_salary_range_summ['Employee Counts'].sum()],
                           'Percent of Total Miami-Dade County Govt Employees': [employee_dataframe_salary_range_summ['Percent of Total Miami-Dade County Govt Employees'].sum()]})
employee_dataframe_salary_range_summ = pd.concat([employee_dataframe_salary_range_summ, totals_row])

# Export to CSV
employee_dataframe_salary_range_summ.to_csv('~/Miami_Dade_County_Employee_Pay_Information_By_Salary_Range_PYTHON.csv', index=False)



# top paid positions

# Summary by title and dept
employee_dataframe_title_and_dept_summ = (employee_dataframe.groupby(['Title', 'DeptName'])
                                          .agg(Employee_Counts=('AnnualSalary', 'size'),
                                               Average_Salary=('AnnualSalary', 'mean'),
                                               Median_Salary=('AnnualSalary', 'median'),
                                               Max_Salary=('AnnualSalary', 'max'))
                                          .reset_index())

# Sort it by median salary
employee_dataframe_title_and_dept_summ = employee_dataframe_title_and_dept_summ.sort_values(by='Median_Salary', ascending=False)

# Export to CSV
employee_dataframe_title_and_dept_summ.to_csv('~/Miami_Dade_County_Employee_Pay_Information_By_Title_And_Dept_PYTHON.csv', index=False)



# Compare with average lawyers salary in Miami
# Miami (Payscale): https://www.payscale.com/research/US/Job=Attorney_%2F_Lawyer/Salary/b37de149/Miami-FL
# Miami (Glassdoor): https://www.glassdoor.com/Salaries/miami-lawyer-salary-SRCH_IL.0,5_IC1154170_KO6,12.htm
employee_dataframe_lawyer = employee_dataframe[employee_dataframe['Title'].str.lower().str.contains('attorney', na=False)]
# average salary of attorney positions
average_lawyer_salary = employee_dataframe_lawyer['AnnualSalary'].mean()
print(f"The average salary of an attorney in Miami-Dade County govt is: {round(average_lawyer_salary, 0)}")


# compare with average police salary in the us and florida
# https://www.bls.gov/oes/current/oes333051.htm


# compare with average firefighter salary in the us and florida
# https://www.bls.gov/oes/current/oes332011.htm


# Find "DATA" or "ANALYST" positions and see how much they're paid.
employee_dataframe_data_and_analyst_jobs = employee_dataframe_title_and_dept_summ[
    employee_dataframe_title_and_dept_summ['Title'].str.lower().str.contains('analyst|data', na=False)]
employee_dataframe_data_and_analyst = employee_dataframe[
    employee_dataframe['Title'].str.lower().str.contains('analyst|data', na=False)]
# average salary of "DATA" or "ANALYST" positions
average_data_analyst_salary = employee_dataframe_data_and_analyst['AnnualSalary'].mean()
print(f"The average salary of a data or analyst employee in Miami-Dade County govt is: {round(average_data_analyst_salary, 0)}")

# Export to CSV
employee_dataframe_data_and_analyst_jobs.to_csv('~/Miami_Dade_County_Data_And_Analyst_Employee_Pay_Information_By_Title_And_Dept_PYTHON.csv', index=False)


# End time
end_time = pd.Timestamp.now()

# How long did it take
duration = end_time - start_time
print("The process took")
print(duration)
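The Budget Bloat Composite Score in the Python department summary averages two descending ranks (median salary and headcount) and then ranks that average, so the department closest to rank 1 is the one that is both large and well paid. A minimal sketch of that ranking logic on three departments; the department names and figures below are hypothetical, not Miami-Dade data, and `budget_bloat_ranking` is an illustrative helper name:

```python
import pandas as pd

# Hypothetical departments for illustration only (not real Miami-Dade figures)
toy_depts = pd.DataFrame({
    "DeptName": ["Dept A", "Dept B", "Dept C"],
    "Median_Salary": [90000, 60000, 75000],
    "Employee_Counts": [500, 2000, 100],
})


def budget_bloat_ranking(dept_summ: pd.DataFrame) -> pd.DataFrame:
    """Rank departments with the same rank-averaging idea as the department summary."""
    out = dept_summ.copy()
    # Rank 1 = highest median salary / largest headcount
    out["Median_Salary_Rank"] = out["Median_Salary"].rank(ascending=False)
    out["Employee_Counts_Rank"] = out["Employee_Counts"].rank(ascending=False)
    # Average of the two ranks: a lower score means bigger and better paid
    out["Budget_Bloat_Composite_Score"] = out[["Median_Salary_Rank", "Employee_Counts_Rank"]].mean(axis=1)
    # Rank 1 = lowest composite score = most "bloated" by this measure
    out["Budget_Bloat_Composite_Score_Rank"] = out["Budget_Bloat_Composite_Score"].rank()
    return out.sort_values("Budget_Bloat_Composite_Score_Rank")


result = budget_bloat_ranking(toy_depts)
print(result[["DeptName", "Budget_Bloat_Composite_Score", "Budget_Bloat_Composite_Score_Rank"]])
```

Dept A comes out first with a score of 1.5 (1st in median salary, 2nd in headcount), followed by Dept B at 2.0 and Dept C at 2.5. Because the two ranks are weighted equally, a very large but modestly paid department can tie with a small, highly paid one; that is a design choice of the score rather than a statistical necessity.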

The post Miami-Dade County Public Employee Salary Research – An Analysis in R, Python, and Julia appeared first on Remix Institute.

How To Say No To Useless Data Science Projects And Start Working On What You Want

Douglas Pestana - @DougVegas — Fri, 20 Jan 2023 01:02:00 +0000

Data Scientists and Analysts are overflowing their cups with mostly non-value-added work requested by stakeholders, and many don’t have the courage or confidence to say “No” to it. The same applies to many professionals in other fields. Put simply, you need to build confidence, stop being a wimp, and start saying “No” to pointless work.

Data Scientists can become the new decision-makers at companies and should be picking their own value-added projects

When a company hires a data science team, the customers of that team are typically non-technical stakeholders. They have a general idea of what they want to get out of data science, but they don’t know specifically what they want. Most of the time, the business units think of “data science and analytics” as just the team that runs reports and fetches data for them when they need it (like a dog or a monkey). A data science and analytics team’s core responsibility isn’t building reports in Tableau and Power BI; that’s the job of the Business Intelligence team (oddly, many BI professionals have started to call themselves analysts or data scientists). The job of a Data Scientist is basically:

Synthesize and mine the data to create actionable business recommendations from it
Automate decision-making using statistical, machine learning, and AI models
Create data science products and tools for end-users (what we at Remix Institute like to call Prefab AI or AI Augs – similar to the Deus Ex video game series).
Boost profit by finding opportunities for revenue growth, cost savings, operational efficiencies, and business process optimization

If the business unit stakeholders have your data science team working on anything not on that list, or anything that isn’t providing value or growth, then you should not be working on it. Data Science teams should be picking their own projects based on the feedback and pain points of stakeholders. Data Science is not IT or BI and should not be building reports or taking ticket requests (more on this below).

Reasons Why Non-Value Added Work Keeps Piling Up

Many data scientists and non-technical stakeholders are unaware of the Planning Fallacy: the tendency to be unrealistically optimistic when predicting how long a task will take, so that we underestimate the time it actually requires. Because we’re so bad at estimating how long things take, we say “Yes” to projects more often than we should, thinking they’ll take less time to complete. This leaves us stretched thin, pushes us toward burnout, and has us desperately searching for one of the scarcest and most valuable resources: time. Saying “Yes” to projects feels good, as it may strengthen the bond between you and the stakeholders, but sometimes people say “Yes” out of fear of being seen as unproductive, negative, a slow worker, or not a team player.

5 Ways To Say No and Pick Your Own Projects

There are many practical ways to say No to non-value-added projects while still being seen as a team player and getting to pick the projects you’d like to work on. Below are 5 practical ways of saying No:

Establish Hofstadter’s Law as one of the guiding operating principles of your data science team. Hofstadter’s Law is a recursive axiom that states it always takes longer than you expect, even when you factor in Hofstadter’s Law. As a simple heuristic, you should probably just double or triple the initial estimated time you gave to complete a project. This serves as a countermeasure to the Planning Fallacy.
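The double-or-triple heuristic is simple arithmetic, but writing it down makes the padding explicit when quoting a timeline. A minimal sketch, where the 2x and 3x defaults come from the heuristic above and the function name `padded_estimate` is hypothetical:

```python
def padded_estimate(initial_days: float,
                    low_multiplier: float = 2.0,
                    high_multiplier: float = 3.0) -> tuple[float, float]:
    """Return a (low, high) range by doubling and tripling a gut-feel estimate."""
    if initial_days <= 0:
        raise ValueError("initial_days must be positive")
    return initial_days * low_multiplier, initial_days * high_multiplier


# A stakeholder asks for a project you think will take 5 days
low, high = padded_estimate(5)
print(f"Quote {low:g} to {high:g} days instead of 5")
```

Given the recursive nature of Hofstadter’s Law, even the padded range may prove optimistic, which is why the multipliers are parameters a team can tune against its own track record.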

Say “No” to project requests politely, but stand firm and don’t be a doormat. Business stakeholders will come from many departments, with competing priorities and incentives. When they approach you with a project that’s “high priority” for them, it may not be a high priority, or have high growth potential, for the company’s overall business objectives. An email or message like this one is a polite way of saying No:

“Hello, we understand the need for this project, and it looks like an interesting business problem. After reviewing the requirements, we’ve estimated the project completion time. Unfortunately, this is not something our team can work on at this time as we’re currently focused on other high-priority growth initiatives, but we can have a planning discussion in the future. If you have an ROI analysis for this project, it’ll help us move it to the top of the queue.”

It’s important to stand firm because stakeholders often don’t like taking No for an answer, and they’ll try to sweet-talk you and convince you that you should actually work on their project. Just hear them out, but still say No politely.

Prioritize projects by putting all of them on a PICK Chart. A PICK Chart is a Lean Six Sigma tool, developed by Lockheed Martin, for organizing and prioritizing project ideas. PICK stands for Possible, Implement, Challenge, Kill. Below is the breakdown of each:

- (P) Possible. Possibly work on the project if it’s Easy To Do but Low Payoff
- (I) Implement. Definitely work on the project and make it a priority if it’s Easy To Do and High Payoff
- (C) Challenge. Challenge the project if it’s Hard To Do but High Payoff
- (K) Kill. Kill and throw in the trash bin all projects that are Hard To Do and Low Payoff

PICK Chart

The best approach is to focus on 1 or 2 key projects in the top left quadrant (Implement: Easy To Do, High Payoff). This type of project prioritization is what we at Remix Institute call “Pareto Optimal Project Management.”

For this, you will need to outsource project prioritization to your manager for political cover. Of course, give him/her your ideas on what should be prioritized (Top Left Quadrant) in order to work on the projects you want to work on. As a data scientist, you should recommend prioritizing projects that are AI automation or machine learning use cases and de-prioritize analysis and reporting requests.
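The PICK quadrant rules listed above are mechanical enough to sketch in code. A minimal Python sketch, assuming each project is scored with a simple yes/no for ease and for payoff (real PICK Charts are often scored on a scale and split at a threshold); the `Project` class, helper names, and backlog entries are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    easy: bool         # True = Easy To Do, False = Hard To Do (assumed yes/no scoring)
    high_payoff: bool  # True = High Payoff, False = Low Payoff


def pick_category(project: Project) -> str:
    """Map a project onto a PICK Chart quadrant using the rules listed above."""
    if project.easy and project.high_payoff:
        return "Implement"
    if project.easy:
        return "Possible"   # easy but low payoff
    if project.high_payoff:
        return "Challenge"  # hard but high payoff
    return "Kill"           # hard and low payoff


def focus_list(projects: list[Project], max_focus: int = 2) -> list[str]:
    """Return up to max_focus Implement projects, keeping backlog order."""
    implement = [p.name for p in projects if pick_category(p) == "Implement"]
    return implement[:max_focus]


# Hypothetical backlog for illustration
backlog = [
    Project("Churn model", easy=True, high_payoff=True),
    Project("Ad hoc KPI report", easy=True, high_payoff=False),
    Project("Real-time pricing engine", easy=False, high_payoff=True),
    Project("Rebuild legacy dashboard", easy=False, high_payoff=False),
    Project("Invoice-matching automation", easy=True, high_payoff=True),
    Project("Lead-scoring model", easy=True, high_payoff=True),
]

for p in backlog:
    print(f"{p.name}: {pick_category(p)}")
print("Focus on:", focus_list(backlog))
```

With `max_focus=2`, only the first two Implement projects make the focus list; the third Implement project waits in the queue, and the Kill project is never scheduled.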

Pareto Optimal Project Management

Push Back Any Arbitrary Deadlines. Most deadlines are pulled out of thin air and only add unnecessary pressure. In the past, I’ve seen project requests come in from senior management with an arbitrary deadline attached, and after a while, senior management stopped asking about the project’s status and forgot they had assigned it in the first place. In that case, the project was definitely useless busywork with a pointless deadline.

This is one of the untold truths about upper management: often, they knowingly assign you busywork.

However, this is not something you can tell your stakeholders even though it’s true. Instead, you’ll need to communicate this in a more politically palatable manner. An email or message saying something like this would be more suitable to push back arbitrary deadlines:

“Hello, I understand that it’s important to meet this deadline. Would you be able to extend this by X days/weeks as we’re currently working on [PROJECTS IN TOP LEFT QUADRANT OF PICK CHART] which has been of high interest to the executives?”

Or

“If you want X done by Y, I’ll be able to deliver A and B of X, but not C and D of X.”

Begin working on what you want: innovative, high-ROI projects.

Establish Occam’s Razor as one of the guiding operating principles of your machine learning model development. This topic would require another blog post in and of itself, but Occam’s Razor (or the Principle of Parsimony) posits that a model should be minimally adequate: among the models that do the job, the simplest is the best. Remember, machine learning is supposed to simplify business processes and automate decision-making, so building a behemoth model with a large code base is like building a naval ship: low agility and high maintenance. You’d end up spending a lot of your free time maintaining it, which reduces the time you can spend on other work. You should also introduce Occam’s Razor to your stakeholders so they don’t overcomplicate machine learning model development either.

Simplest doesn’t always mean least technical debt or lowest maintenance, and sometimes a project does require complexity. That’s why it’s important to identify the “minimally adequate” solution that is right-sized. If all that’s needed is a trend line in Excel, do you really need a deep learning model?
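As a rough illustration of a minimally adequate baseline, here is the Python equivalent of an Excel linear trendline: an ordinary least-squares straight-line fit with NumPy’s `np.polyfit`. The monthly figures are made up for illustration; if a baseline like this already answers the business question, a heavier model may not be worth its maintenance cost.

```python
import numpy as np

# Hypothetical 12 months of a business metric (illustrative numbers only)
months = np.arange(12)
sales = np.array([100, 104, 109, 112, 118, 121, 127, 130, 136, 139, 145, 148], dtype=float)

# Degree-1 least-squares fit: a straight-line trend, like a linear trendline in Excel
slope, intercept = np.polyfit(months, sales, deg=1)

# Extend the line one month past the data (month index 12)
forecast_next = slope * 12 + intercept
print(f"slope={slope:.2f} per month, intercept={intercept:.2f}, next month about {forecast_next:.1f}")
```

The whole "model" is two numbers, which is about as low-maintenance as it gets; only if its errors are too large for the decision at hand does reaching for something more complex become worth discussing.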

Conclusion

Data Scientists can become the new decision-makers at a company, and Data Science its most valuable department, as long as the C-Suite starts to see their impact on the P&L. To do that, Data Science teams must stand firm, build confidence, and efficiently prioritize high-value work while de-prioritizing and even killing non-value-added work. This includes learning effective project management and how to politely say “No” to stakeholders. In doing so, Data Science teams can focus on high-ROI, revenue-boosting projects such as machine learning and automation products that can be integrated into business processes. This gets the attention of the C-Suite and elevates the status of Data Science.

The post How To Say No To Useless Data Science Projects And Start Working On What You Want appeared first on Remix Institute.