Archive

Excel question about weeding out special characters

  • O-Trap
    I have a spreadsheet with about 86,000 rows. I'm trying to weed out rows where any of the columns contain anything except A-Z or a-z.

    The problem is, there are some accented letters (ex. Ã) all throughout, and I would rather not scour the 86K records looking for all the accented letters.

    Is there a formula (preferrably not VBA) that will check each character in a cell against a list and return one value if it finds a character not in that list?

    Thanks!
  • derek bomar
    Why don't you just apply a filter?
  • O-Trap
    I'm working with a filter, but the cells with odd characters also have normal letters, so I'm not sure how I would go about finding if a cell has anything except A-Z and a-z.
  • derek bomar
    Can you do a find/replace? If so then you can duplicate your data and then do a replace on the 2nd set. Then do an true formula or something to see what changed (hypothetically it'd be just the special characters...)
  • derek bomar
    I'm on an ipad now or I'd play around with it for you. There's also some shit you can do with the Clean function but Idk if it works for accents
  • O-Trap
    Yeah, I tried the CLEAN function. It leaves the accents in there.

    I could do a find/replace, but the problem is that I'm not sure what all characters I would need to be looking for. I was even finding fraction symbols and such in the cells, which I would have never thought to look for had I not seen one right near the top.

    So in theory, I could still miss some characters if I don't know to search for those characters, specifically.

    I don't know what all special characters and accent characters I'd find if I were to scour all 86K rows. I just know that I don't want any row containing anything except the base English alphabet.

    Yay for dirty marketing data ...
  • derek bomar
    You can do it in vba I think
  • O-Trap
    I figured. I'm not familiar with VBA, and when I've had to use it in the past, it's ended up being somewhat time-intensive to take something I find on the web and tinker with it to fit my needs.

    I'm essentially trying to come up with something like this (I'll use non-accent characters for the example ... it's just easier for explanation purposes):
    FIELD A ACCEPTABLE CHARACTERS ANY OTHER CHARACTERS?
    John h,n,o,J No
    Johnny h,n,o,J Yes
    Johnson h,n,o,J Yes
    Since the latter two contain other characters, they'd spit out a "Yes" in the third column. Since the first one doesn't contain anything but the permitted characters, it wouldn't.

    I know you're on a tablet, so no worries on not digging more. Just seems like there should be a way to do it without setting up a VBA. But if not, I'll have to tinker with it tonight.
  • Pick6
    Excel is the greatest thing ever. Nothing is coming to my head right away though. I bet you could find what you need on this board if it exists (sure it does).
    http://www.mrexcel.com/forum/forum.php
  • O-Trap
    I actually found a stupidly inefficient workaround that can be done by someone who isn't familiar with VBA.

    It's so inefficient that it's embarrassing, but it works.

    Still, thanks for the help, guys.
  • gut
    something like if(isnumber(find("1",A1,1,)),1,0)....do a column for each special character (i.e. 1, 2, *, @, etc)...even link in to the special character in the top row. Copy the formulas down. In the final row, sum the values. Then filter everything in that column >0
  • O-Trap
    Thing is, I didn't know what all characters I might find. I saw a lot of characters in the method I used that I would have NEVER actually thought of using.

    What I did was, I found out what the longest entry was (13 characters), and I made 13 columns, putting each character in its own column. Then, I created 13 more columns that checked each character against a column of the acceptable characters with VLOOKUP. Then, I filtered and sorted by which ones came up with "#N/A" in any of the fields.

    Very inefficient, but the trouble was that I wasn't checking the fields against a list of what wasn't allowed (ie. searching for a symbol or accent character in a field). Only what was allowed.
  • gut
    I don't know any way of doing this without checking against a list of characters to exclude. You can use the "or" function to check multiple characters in each statement, but you'll just have to iterate until you've identified all the characters to remove.

    actually, I think char(row()) copied from row 1 to 255 will list all characters in excel. Then you have your list, and just remove the ones that aren't an issue
  • O-Trap
    gut;1657620 wrote:I don't know any way of doing this without checking against a list of characters to exclude. You can use the "or" function to check multiple characters in each statement, but you'll just have to iterate until you've identified all the characters to remove.
    Yeah, I ended up doing what I posted above.

    I'll explain (It's REALLY inefficient):

    I turned a single column into 13 columns, so that each character in the original column could be placed in its own column

    So, a cell with "Stephanie Lew" becomes 13 cells in a single row with "S" in one, "t" in the next one, "e" in the next one, and so on.
    S t é p h a n i e L e w
    Then, I made a column on the next sheet with all the acceptable characters. The column was basically something like this:
    A
    B
    C
    D
    E
    F
    G
    H
    I
    J
    K
    L
    M
    N
    O
    P
    Q
    R
    S
    T
    U
    V
    W
    X
    Y
    Z
    a
    b
    c
    d
    e
    f
    g
    h
    i
    j
    k
    l
    m
    n
    o
    p
    q
    r
    s
    t
    u
    v
    w
    x
    y
    z
    -
    '
    .

    I then check each cell in the "Stephanie Lew" row against that column using VLOOKUP.

    If I see any cell that has a '#N/A', I know there's a character in that row that doesn't show up in the column it's checked against.
    S t #N/A p h a n i e #N/A L e w

    That's basically how I did it. Took out about 2K bad entries with it.