#TextData

I needed an excuse to make a word cloud shaped like a brain and I finally found one. Who said dreams don’t come true?

But in all seriousness, I learned a couple of things about working with text data, especially short text across multiple languages, and I’ve shared them here:

https://neurofrontiers.blog/how-people-find-us/

#WordCloud #TextData #DataAnalysis #DataIsBeautiful #multilingual

Do your eyes still get stung by profanity in text data?

Well, there is an R script that enables you to handle it. Really useful for working with social media data.

Check out the script on GitHub:

https://github.com/Ifeanyi55/noProfanity

#rstats #textdata #socialmediadata #github

Master Advanced Analytics with SAS Viya | CoListy
Learn advanced analytics, machine learning, and AI with SAS Viya. Optimize models, forecasting, and more. | CoListy
#freeonlinelearning #colisty #courselist #sasviya #advancedanalytics #predictivemodeling #machinelearning #timeseriesforecasting #optimization #aialgorithms #generativeai #imagedata #textdata #streamingdata #modelops #trustworthyai

https://colisty.netlify.app/courses/sas-analytics_-getting-started/

A script I just wrote to process text files for a reconciliation for a vendor.

#R #RStats #RProgramming #Programming #Coding #TextData

#purrr #map #dplyr #readLines

    wdir <- "W:/path/to/files"

    # Collect the .txt files, skipping "updated" copies and one known-bad file
    fl <- list.files(wdir, pattern = "\\.txt$", full.names = TRUE)
    fl <- purrr::discard(fl, stringr::str_detect(fl, "updated"))
    fl <- fl[!fl %in% c("W:/path/to/files/bad_file.txt")]
    fl <- setNames(fl, basename(fl))

    # Read one pipe-delimited text file into an all-character tibble
    txt_to_df <- function(txt_file) {
      txt_lines <- readLines(txt_file)
      writeLines(txt_lines, "test.csv")
      data <- read.csv("test.csv", header = TRUE, sep = "|")
      data <- dplyr::as_tibble(data) |>
        dplyr::mutate(dplyr::across(.cols = dplyr::everything(), .fns = as.character))
      return(data)
    }

    # Add the three known-missing columns to files that lack the expected 102
    add_missing_cols <- function(df) {
      if (ncol(df) != 102) {
        missing_cols <- c("MissingColumns1", "MissingColumn2", "MissingColumn3")
        df[missing_cols] <- NA
      }
      return(df)
    }

    ret <- purrr::map(fl, txt_to_df)
    ret <- purrr::map(ret, add_missing_cols)
    ret <- purrr::map(ret, janitor::clean_names)

    # Split on whether each file now has the expected column count
    bad_cols_txt <- purrr::keep(ret, \(x) ncol(x) != 102)
    good_files_txt <- purrr::keep(ret, \(x) ncol(x) == 102)

    # Bind the good files into one table, keeping the file name as "id"
    good_files_tbl <- good_files_txt |>
      purrr::map(\(x) x |>
        dplyr::mutate(dplyr::across(.cols = dplyr::everything(), .fns = as.character))) |>
      purrr::list_rbind(names_to = "id")

    if (length(bad_cols_txt) > 0) {
      bad_files_tbl <- bad_cols_txt |>
        purrr::map(\(x) x |>
          dplyr::mutate(dplyr::across(.cols = dplyr::everything(), .fns = as.character))) |>
        purrr::list_rbind(names_to = "id")
    }

In today's post, I discuss using `grep()` in R for extracting substrings from text data.

While `grep()` finds pattern matches, it doesn’t return the substrings directly. I explain combining it with `regexpr()` and `substr()`, or `gregexpr()` and `regmatches()` to achieve this.

Practical examples include filtering email addresses and data frames.
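
A minimal sketch of that idea in base R. The sample strings, the `contacts` data frame, and the deliberately simple email pattern are my own assumptions for illustration, not taken from the linked post:

```r
# Hypothetical sample data
notes <- c(
  "Contact alice@example.com for access",
  "No address on file",
  "Send to bob@example.org or carol@example.net"
)

# A simplified email pattern (an assumption; real addresses can be messier)
email_pattern <- "[[:alnum:]._-]+@[[:alnum:].-]+\\.[[:alpha:]]+"

# grep() reports WHICH elements match, not the matched text
matching_idx <- grep(email_pattern, notes)

# regexpr() + substr(): the first match in each element
pos <- regexpr(email_pattern, notes)
start <- as.integer(pos)             # -1 where there is no match
len <- attr(pos, "match.length")
hit <- start > 0
first_email <- rep(NA_character_, length(notes))
first_email[hit] <- substr(notes[hit], start[hit], start[hit] + len[hit] - 1)

# gregexpr() + regmatches(): every match in each element, as a list
all_emails <- regmatches(notes, gregexpr(email_pattern, notes))

# Filtering a data frame down to the rows that contain an address
contacts <- data.frame(id = 1:3, note = notes)
with_email <- contacts[grepl(email_pattern, contacts$note), ]
```

Here `first_email` holds one value per input (with `NA` where nothing matched), while `all_emails` is a list that can hold several addresses per input.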

Post: https://www.spsanderson.com/steveondata/posts/2024-09-09/

#R #RStats #Programming #Coding #textdata

In today's blog post, I introduce the `grep()` function in R, a key tool for searching patterns in text data.

It allows case-sensitive searches by default but can perform case-insensitive searches with the `ignore.case` argument.

This flexibility is essential for text mining, data cleaning, and analysis. I outline the basic syntax, usage examples, and common mistakes.
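
For a quick feel of the defaults, here is a small sketch; the `fruits` vector is made up for illustration:

```r
# Hypothetical sample data
fruits <- c("Apple pie", "banana bread", "apple juice", "Cherry tart")

# Default: case-sensitive, returns the indices of matching elements
idx_sensitive <- grep("apple", fruits)

# ignore.case = TRUE also picks up "Apple pie"
idx_insensitive <- grep("apple", fruits, ignore.case = TRUE)

# A common mistake is expecting the strings back; ask for them with value = TRUE
matched_values <- grep("apple", fruits, ignore.case = TRUE, value = TRUE)

print(idx_sensitive)    # 3
print(idx_insensitive)  # 1 3
print(matched_values)   # "Apple pie" "apple juice"
```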

Post: https://www.spsanderson.com/steveondata/posts/2024-09-04/

#R #RStats #RProgramming #Programming #Coding #textdata #stringr #grep

Today's blog post discusses using OR logic with the `grep()` function in R, which enhances pattern matching in character vectors.

By employing the pipe symbol (`|`), users can search for multiple patterns simultaneously, such as `grep("apple|banana", text_vector)`.

It also highlights the option to ignore case with `ignore.case = TRUE`.
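
A short sketch of the OR pattern in action. The contents of `text_vector` and the `patterns` helper are assumptions added for illustration:

```r
# Hypothetical sample data
text_vector <- c("Apple tart", "banana split", "cherry cola", "pineapple ring")

# "apple|banana" matches elements containing either alternative.
# Note that the match is on substrings, so "pineapple" counts as a hit.
idx <- grep("apple|banana", text_vector)

# With ignore.case = TRUE, "Apple tart" matches as well
idx_any_case <- grep("apple|banana", text_vector, ignore.case = TRUE)

# The alternation can also be built from a vector of terms
patterns <- c("apple", "banana")
or_pattern <- paste(patterns, collapse = "|")
idx_built <- grep(or_pattern, text_vector, ignore.case = TRUE)

print(idx)           # 2 4
print(idx_any_case)  # 1 2 4
```

Building the pattern with `paste(..., collapse = "|")` is handy when the search terms come from data rather than being typed by hand, as long as the terms contain no regex metacharacters.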

Post: https://www.spsanderson.com/steveondata/posts/2024-09-03/

#R #RStats #RProgramming #Programming #Coding #textdata #grep

Another post on #grep in #baseR

enjoy! :)

Post: https://www.spsanderson.com/steveondata/posts/2024-08-30/

#R #RStats #RProgramming #Coding #Programming #textdata #textmining

In today's post, I explain how to use the `grepl()` function in base R to search for multiple patterns in strings. I break down the syntax and show how to combine patterns using the OR operator (`|`) for simultaneous searches.

A practical example demonstrates searching for "cat" or "dog" in a list of phrases, highlighting case-insensitive searching and extracting results.
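
Roughly, that kind of search might look like the sketch below. The `phrases` vector is invented here and is not the data from the linked post:

```r
# Hypothetical sample data
phrases <- c(
  "The Cat sat on the mat",
  "A dog barked",
  "Birds sing at dawn",
  "Hot dogs and catnip"
)

# grepl() returns one TRUE/FALSE per element, which suits subsetting
has_pet <- grepl("cat|dog", phrases, ignore.case = TRUE)

# Extract the phrases that mention either word
pet_phrases <- phrases[has_pet]

# Without ignore.case, "The Cat sat on the mat" is missed
has_pet_strict <- grepl("cat|dog", phrases)

print(has_pet)         # TRUE TRUE FALSE TRUE
print(has_pet_strict)  # FALSE TRUE FALSE TRUE
```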

Post: https://www.spsanderson.com/steveondata/posts/2024-08-16/

#R #RStats #Coding #textdata #grepl #regex