Slide 106 causes "undefined columns selected" error


This topic contains 6 replies, has 2 voices, and was last updated by Merav Yuravlivker 1 year, 8 months ago.

    #18106

    JayC
    Participant

    On slide 106, the clust_data_NBA assignment shown on the slide throws an error.

    The first error is easy to fix: the slide uses “STLPG” rather than “STPG”. Once that is corrected, however, the command still refuses to run, and I have no idea why. I’ve gone over the column names multiple times but still can’t get past this step.

    My code:
    clust.data.NBA <- NBA[, c("MGP", "FG_PC", "X3P_PC", "FT_PC",
                              "APG", "PPG", "STPG", "BLKPG")]

    The error:
    Error in `[.data.frame`(NBA, , c("MGP", "FG_PC", "3P_PC", "FT_PC", "APG", :
      undefined columns selected

    NBA.csv has an APG column and this code is basically identical to the slide.

    #18107

    Merav Yuravlivker
    Keymaster

    Hi JayC,

    Thanks for your comment! The error “undefined columns selected” usually means that your code is trying to access a column that doesn’t exist in the data frame. Your code refers to a column “X3P_PC”, while the error message shows “3P_PC”. I recommend rechecking your column names, as there may be a typo in one of them. You can run colnames(NBA) to print all the column names in the console, then copy and paste the names into your code.
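One way to find which requested name is missing is to compare the requested names against colnames(NBA) with setdiff(). The sketch below does not read NBA.csv. It assumes a zero-row stand-in data frame carrying the column names that colnames(NBA) reports later in this thread, plus the requested names from the failing call in the first post. The variable names nba_cols, wanted and missing_cols are illustrative only.

```r
# Stand-in for the course data: a zero-row data frame carrying the column
# names that colnames(NBA) reports later in this thread (an assumption;
# the real NBA.csv is not read here).
nba_cols <- c("NAME", "TEAM", "SALARY.M.", "GP", "MPG", "PPG", "FG_PC",
              "X3P_PC", "FT_PC", "RPG", "APG", "STPG", "BLKPG", "POSITION")
NBA <- as.data.frame(matrix(numeric(0), nrow = 0, ncol = length(nba_cols)))
colnames(NBA) <- nba_cols

# Names requested in the failing call from the first post.
wanted <- c("MGP", "FG_PC", "X3P_PC", "FT_PC", "APG", "PPG", "STPG", "BLKPG")

# Requested names that are not columns of NBA.
missing_cols <- setdiff(wanted, colnames(NBA))
print(missing_cols)

# Subset only when every requested name exists.
if (length(missing_cols) == 0) {
  clust_data_NBA <- NBA[, wanted]
}
```

With these inputs the check reports "MGP", which differs from the "MPG" column only by two transposed letters.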

    There are other ways to subset the columns, such as selecting by column number instead of by name, or using the subset() function. You can always test those out to see if they work.
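The alternatives mentioned above might look like the following sketch. It assumes a made-up three-row data frame in place of NBA.csv; the values are hypothetical and only a few of the real column names are used.

```r
# Hypothetical stand-in for NBA.csv; the values are made up.
NBA <- data.frame(
  NAME = c("A", "B", "C"),
  MPG  = c(30.1, 25.4, 18.0),
  PPG  = c(20.3, 11.2, 6.5),
  APG  = c(5.1, 2.2, 1.0)
)
wanted <- c("MPG", "PPG", "APG")

# 1. By name (the approach used on the slide).
by_name <- NBA[, wanted]

# 2. By column number: match() looks up the position of each name.
#    A misspelled name yields NA here, which is easy to spot before indexing.
positions <- match(wanted, colnames(NBA))
by_position <- NBA[, positions]

# 3. With subset(), where column names are written unquoted.
by_subset <- subset(NBA, select = c(MPG, PPG, APG))

print(positions)
print(colnames(by_subset))
```

All three forms select the same columns, so none of them works around a misspelled name; they only change how the selection is written.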

    Let me know if that helps!

    Best,
    Merav

    #18110

    JayC
    Participant

    The colnames command produces:
    > colnames(NBA)
     [1] "NAME"      "TEAM"      "SALARY.M." "GP"        "MPG"
     [6] "PPG"       "FG_PC"     "X3P_PC"    "FT_PC"     "RPG"
    [11] "APG"       "STPG"      "BLKPG"     "POSITION"

    Note that the file NBA.csv has a column titled “3P_PC”. By default, read.csv does not keep column names that start with a digit, so it prepends an “X”. I believe this was covered in the data viz course.
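The renaming described above can be reproduced without the course file. This sketch assumes a two-column inline CSV (hypothetical values) and shows the default behaviour of read.csv next to check.names = FALSE, which keeps the header as written.

```r
# make.names() is the base R function that turns arbitrary strings into
# syntactically valid names; a leading digit gets an "X" prefix.
fixed <- make.names("3P_PC")
print(fixed)

# Inline stand-in for a CSV header that starts with a digit (values made up).
csv_text <- "NAME,3P_PC\nA,0.35\nB,0.41"

# Default: check.names = TRUE, so the header is passed through make.names().
default_names <- colnames(read.csv(text = csv_text))

# With check.names = FALSE the original header is kept; such a column then
# needs quoting or backticks, e.g. df[["3P_PC"]] or df$`3P_PC`.
raw_names <- colnames(read.csv(text = csv_text, check.names = FALSE))

print(default_names)
print(raw_names)
```

Either convention works as long as the name used in the subsetting call matches what colnames() reports for the data frame actually loaded.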

    My point is that the code on your slide does not work and I do not understand why.

    #18112

    Merav Yuravlivker
    Keymaster

    Hi JayC,

    I just ran this code:

    NBA = read.csv("NBA.csv")
    clust_data_NBA = NBA[, c("MPG", "PPG", "FG_PC", "X3P_PC",
                             "FT_PC", "RPG", "APG", "STPG", "BLKPG")]

    I also ran the code from the slide (copied below):

    clust_data_NBA = NBA[, c("MPG", "FG_PC", "X3P_PC", "FT_PC",
                             "APG", "PPG", "STPG", "BLKPG")]

    Both variations worked when I ran them. I recommend restarting R and RStudio, as this often resolves cases where code that looks correct does not run.

    Let me know if that worked!

    Best,
    Merav

    #18231

    JayC
    Participant

    So the restart didn’t work, and neither did the copy/paste of your code into my code.

    But I basically rebuilt the entire expression one variable at a time and for some mysterious reason it worked.

    Weird are the ways of R.

    #18232

    JayC
    Participant

    Also, for some reason my code is spitting out that the best number of clusters is 3, not 2.

    #18233

    Merav Yuravlivker
    Keymaster

    Hi JayC,

    You are correct that R can work in some weird ways! And for the NBA data set, I believe that we state the best number of clusters is 3, although we go over what kmeans looks like with only 2 centers to illustrate the differences.
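The thread does not say how the course picks the best number of clusters. As one common approach, and purely as an assumption here, the sketch below compares the total within-cluster sum of squares from kmeans for several values of k (the "elbow" heuristic). It uses synthetic two-column data with three loose groups, not the NBA data, and the names toy, ks and wss are illustrative.

```r
# kmeans starts from random centers, so fix the seed for repeatable results.
set.seed(42)

# Synthetic data with three loose groups; NOT the NBA data set.
toy <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 3, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 6, sd = 0.5), ncol = 2)
)

# Total within-cluster sum of squares for k = 1..6.
# nstart = 20 reruns kmeans from 20 random starts and keeps the best fit.
ks <- 1:6
wss <- sapply(ks, function(k) {
  kmeans(toy, centers = k, nstart = 20)$tot.withinss
})

# Look for the k after which the drop in wss levels off.
print(data.frame(k = ks, tot_withinss = wss))
```

Because the starting centers are random, two runs without a fixed seed or with nstart = 1 can disagree, which is one possible reason for different people seeing different answers on the same data.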

    Let me know if you have any other questions!

    Best,
    Merav
