Using PowerQuery (part 1)

What is PowerQuery?
Power Query is a data connection technology in Excel that allows users to connect, combine, and refine data from various sources.
Key Features
- Data Importing: Pulls data from multiple sources (databases, web pages, files).
- Data Transformation: Offers tools to clean and reshape data, such as filtering, merging, and aggregating.
- User-Friendly Interface: Provides a visual interface for users to apply transformations without coding.
- Integration: Seamlessly integrates with Excel, allowing transformed data to be loaded into worksheets for analysis.
In essence, Power Query simplifies data preparation, enabling users to focus on analysis and insights.
PowerQuery vs. VLookup
| Feature | Power Query | VLOOKUP |
|---|---|---|
| Purpose | Data transformation and preparation | Look up and retrieve specific data values |
| Data Sources | Connects to multiple sources (Excel, SQL, web, etc.) | Limited to data within the worksheet |
| Complexity | More complex with advanced capabilities | Simpler and straightforward |
| Output | Creates tables or queries in Excel | Returns a single value based on a match |
| Flexibility | Highly versatile, allows for data shaping | Less flexible; limited to a single match type |
| Use Cases | Ideal for ETL processes and bulk data manipulation | Best for simple lookups within a single dataset |
| Maintenance | Requires more setup but easier to update over time | Simple to set up but can be error-prone if data changes |
Use-cases
Your workflow may not require PowerQuery, it may be overkill. But given how much fiddling with Excel formulas can happen, and how that can eat up time, PowerQuery honestly seems like it should be more standard in our field - especially because of how reproducible any given process is - the steps being recorded automatically so they could be reused or made into a template.
Abstract Example
You have data in your CRM that you want to combine with another dataset you’ve found or created. For example, perhaps you’ve created a custom list of individuals who have won awards, but have not updated each of their individual profiles in the database yet. So you are looking to add this supplemental info to help segment a list in an interesting way.
Steps to Achieve This
- Export Data from CRM: Start by exporting the relevant data from your CRM into a CSV or Excel format.
- Import Data into Power Query: Open Excel, go to the Data tab, and select “Get Data” to import your CRM data and your custom dataset.
- Combine Datasets: Use Power Query tools to merge or append the datasets based on a common identifier.
- Transform Data: Clean, filter, or reshape the data as necessary.
- Load the Combined Data: Once satisfied with the combined data, load it back into Excel for analysis or reporting.
More complex data engineering can be done with PowerQuery, and business intelligence analytics with PowerBI
Example from the Fundraising Job Title dataset project
Power Query M Code for Filtering Job Titles with Dynamic Keywords
This document outlines a Power Query M script that dynamically loads keyword lists from parameters and filters job title datasets across multiple sectors. The goal is to identify roles related to Major Gifts while excluding irrelevant ones.
Dynamic Keyword Parameters
Instead of hardcoding the keywords, we define them as parameters:
let
IncludeKeywords = Text.Split("Major Gift,Director Of Development,Development Officer,Major Gift Officer,Fundraiser,Philanthropy Officer,Philanthropy Manager", ","),
ExcludeKeywords = Text.Split("Prospect Research,Event", ",")Load and Filter Nonprofit Dataset
We load the nonprofit dataset and filter it using the dynamic keyword lists:
Source = Csv.Document(File.Contents("..\jobtitlesdataset\Figure 4\dataset\titles\map\55198385_nonprofit.csv"), [Delimiter=",", Columns=7, Encoding=65001, QuoteStyle=QuoteStyle.None]),
PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
ChangedType = Table.TransformColumnTypes(PromotedHeaders, {
{"sector", type text},
{"title_cleaned", type text},
{"frequency_cleaned", Int64.Type},
{"title_generalized", type text},
{"frequency_generalized", Int64.Type},
{"title_simplified", type text},
{"frequency_simplified", Int64.Type}
}),
IncludedRows = Table.SelectRows(ChangedType, each List.AnyTrue(List.Transform(IncludeKeywords, (kw) => Text.Contains([title_cleaned], kw)))),
FinalFilteredRows = Table.SelectRows(IncludedRows, each List.AllTrue(List.Transform(ExcludeKeywords, (kw) => not Text.Contains([title_cleaned], kw)))),
TitleList = Table.SelectColumns(FinalFilteredRows, {"title_cleaned"}),Reusable Filtering Function
We define a reusable function to apply the same filtering logic to other datasets:
FilterDataset = (filePath as text) =>
let
Source = Csv.Document(File.Contents(filePath), [Delimiter=",", Columns=7, Encoding=65001, QuoteStyle=QuoteStyle.None]),
Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
ChangedType = Table.TransformColumnTypes(Promoted, {
{"sector", type text},
{"title_cleaned", type text},
{"frequency_cleaned", Int64.Type},
{"title_generalized", type text},
{"frequency_generalized", Int64.Type},
{"title_simplified", type text},
{"frequency_simplified", Int64.Type}
}),
Matched = Table.Join(ChangedType, "title_cleaned", TitleList, "title_cleaned", JoinKind.Inner)
in
Matched,Dataset Paths and Execution
We list all dataset paths and apply the filtering function:
DatasetPaths = {
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198352_travel_and_tourism.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198376_transportation_and_logistics.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55197857_restaurants_bars_and_food_services.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198349_real_estate.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198379_telecommunications.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198388_oil_gas_energy_and_utilities.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198394_retail.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198373_media.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198403_manufacturing.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198358_insurance.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198406_information_technology.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198382_government.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198397_health_care.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198391_education.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198400_finance.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198190_consumer_services.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198364_construction_repair_and_maintenance.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198409_business_services.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55197803_agriculture_and_forestry.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198355_arts_entertainment_and_recreation.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198370_biotech_and_pharmaceuticals.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198361_aerospace_and_defense.csv",
"..\jobtitlesdataset\Figure 4\dataset\titles\map\55198367_accounting_and_legal.csv"
},
FilteredTables = List.Transform(DatasetPaths, each FilterDataset(_)),
CombinedResults = Table.Combine(FilteredTables)
in
CombinedResultsSummary
This script dynamically loads keyword lists and applies consistent filtering across multiple datasets. It improves maintainability and scalability by avoiding hardcoded values.
References
Disclaimer - This content was summarized or partially generated by AI.