{"id":68,"date":"2023-03-20T20:40:52","date_gmt":"2023-03-20T20:40:52","guid":{"rendered":"http:\/\/luislourenco.com\/?p=68"},"modified":"2023-03-26T20:08:48","modified_gmt":"2023-03-26T20:08:48","slug":"articles-data-preparation-with-sql","status":"publish","type":"post","link":"http:\/\/luislourenco.com\/ro\/articles-data-preparation-with-sql\/","title":{"rendered":"Data preparation with SQL"},"content":{"rendered":"<p class=\"wp-block-paragraph\">Common data preparation tasks in SQL<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/luislourenco.com\/wp-content\/uploads\/2023\/03\/funil-de-vendas.webp\" alt=\"\" class=\"wp-image-71\" width=\"740\" height=\"374\" srcset=\"http:\/\/luislourenco.com\/wp-content\/uploads\/2023\/03\/funil-de-vendas.webp 750w, http:\/\/luislourenco.com\/wp-content\/uploads\/2023\/03\/funil-de-vendas-300x152.webp 300w, http:\/\/luislourenco.com\/wp-content\/uploads\/2023\/03\/funil-de-vendas-18x9.webp 18w\" sizes=\"auto, (max-width: 740px) 100vw, 740px\" \/><\/figure>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Eliminating duplicates:<\/strong><\/li>\n<\/ol>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Identify the columns that should be used to determine whether a row is a duplicate.<\/li>\n\n\n\n<li>Use the <code>DISTINCT<\/code> keyword to return only unique rows, or use the <code>GROUP BY<\/code> clause to group rows by the selected columns and use an aggregate function like <code>COUNT()<\/code> to identify duplicate rows.<\/li>\n\n\n\n<li>Use the <code>DELETE<\/code> statement to remove duplicate rows from the table.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Example SQL code to eliminate duplicates based on a single column:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT DISTINCT column_name\nFROM table_name;\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\"><br>Example SQL code to eliminate duplicates based on multiple columns:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT DISTINCT column_name_1, column_name_2\nFROM table_name;\n<code>\n<\/code><\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\" start=\"2\">\n<li>Ensuring data format consistency:<\/li>\n<\/ol>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Identify the columns that should have consistent data formats.<\/li>\n\n\n\n<li>Use the <code>CAST()<\/code> function to convert data types to the desired format.<\/li>\n\n\n\n<li>Use the <code>SUBSTR()<\/code> function to extract a substring from a column if needed.<\/li>\n\n\n\n<li>Use the <code>CONCAT()<\/code> function to combine columns if needed.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Example SQL code to convert a string column to an integer column:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>SELECT CAST(column_name AS INTEGER)\nFROM table_name;\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Example SQL code to extract a substring from a string column:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>SELECT SUBSTR(column_name, start_index, length)\nFROM table_name;\n<\/code><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Example SQL code to concatenate two columns:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>SELECT CONCAT(column_name_1, ' ', column_name_2)\nFROM table_name;\n<\/code><\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\" start=\"3\">\n<li><strong>Validating data values:<\/strong><\/li>\n<\/ol>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Identify the columns that should have valid data values.<\/li>\n\n\n\n<li>Use the <code>WHERE<\/code> clause to filter out rows with invalid data values.<\/li>\n\n\n\n<li>Use the <code>LIKE<\/code> operator to filter out rows with data values that do not match a specific pattern.<\/li>\n\n\n\n<li>Use the <code>IN<\/code> operator to filter out rows with data values that are not in a specific list.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Example SQL code to filter out rows with invalid data values:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>SELECT column_name\nFROM table_name\nWHERE column_name &gt; 0;\n<\/code><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Example SQL code to filter out rows with data values that do not match a specific pattern:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>SELECT column_name\nFROM table_name\nWHERE column_name LIKE 'A%';\n<\/code><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Example SQL code to filter out rows with data values that are not in a specific list:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>SELECT column_name\nFROM table_name\nWHERE column_name IN ('A', 'B', 'C');\n<\/code><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">By using these SQL code snippets and checklist, you can perform common data preparation tasks to ensure that your data is clean and ready for analysis.<\/p>","protected":false},"excerpt":{"rendered":"<p>Common data preparation tasks in SQL Example SQL code to eliminate duplicates based on a single column: Example SQL code to eliminate duplicates based on multiple columns: Example SQL code to convert a string column to an integer column: SELECT CAST(column_name AS INTEGER) FROM table_name; Example SQL code to extract a substring from a string [&hellip;]<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_themeisle_gutenberg_block_has_review":false,"footnotes":""},"categories":[5],"tags":[],"class_list":["post-68","post","type-post","status-publish","format-standard","hentry","category-sql"],"_links":{"self":[{"href":"http:\/\/luislourenco.com\/ro\/wp-json\/wp\/v2\/posts\/68","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/luislourenco.com\/ro\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/luislourenco.com\/ro\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/luislourenco.com\/ro\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/luislourenco.com\/ro\/wp-json\/wp\/v2\/comments?post=68"}],"version-history":[{"count":5,"href":"http:\/\/luislourenco.com\/ro\/wp-json\/wp\/v2\/posts\/68\/revisions"}],"predecessor-version":[{"id":86,"href":"http:\/\/luislourenco.com\/ro\/wp-json\/wp\/v2\/posts\/68\/revisions\/86"}],"wp:attachment":[{"href":"http:\/\/luislourenco.com\/ro\/wp-json\/wp\/v2\/media?parent=68"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/luislourenco.com\/ro\/wp-json\/wp\/v2\/categories?post=68"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/luislourenco.com\/ro\/wp-json\/wp\/v2\/tags?post=68"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}