Excel question about weeding out special characters
-
O-TrapI have a spreadsheet with about 86,000 rows. I'm trying to weed out rows where any of the columns contain anything except A-Z or a-z.
The problem is, there are some accented letters (ex. Ã) all throughout, and I would rather not scour the 86K records looking for all the accented letters.
Is there a formula (preferrably not VBA) that will check each character in a cell against a list and return one value if it finds a character not in that list?
Thanks! -
derek bomarWhy don't you just apply a filter?
-
O-TrapI'm working with a filter, but the cells with odd characters also have normal letters, so I'm not sure how I would go about finding if a cell has anything except A-Z and a-z.
-
derek bomarCan you do a find/replace? If so then you can duplicate your data and then do a replace on the 2nd set. Then do an true formula or something to see what changed (hypothetically it'd be just the special characters...)
-
derek bomarI'm on an ipad now or I'd play around with it for you. There's also some shit you can do with the Clean function but Idk if it works for accents
-
O-TrapYeah, I tried the CLEAN function. It leaves the accents in there.
I could do a find/replace, but the problem is that I'm not sure what all characters I would need to be looking for. I was even finding fraction symbols and such in the cells, which I would have never thought to look for had I not seen one right near the top.
So in theory, I could still miss some characters if I don't know to search for those characters, specifically.
I don't know what all special characters and accent characters I'd find if I were to scour all 86K rows. I just know that I don't want any row containing anything except the base English alphabet.
Yay for dirty marketing data ... -
derek bomarYou can do it in vba I think
-
O-TrapI figured. I'm not familiar with VBA, and when I've had to use it in the past, it's ended up being somewhat time-intensive to take something I find on the web and tinker with it to fit my needs.
I'm essentially trying to come up with something like this (I'll use non-accent characters for the example ... it's just easier for explanation purposes):
FIELD A ACCEPTABLE CHARACTERS ANY OTHER CHARACTERS? John h,n,o,J No Johnny h,n,o,J Yes Johnson h,n,o,J Yes
I know you're on a tablet, so no worries on not digging more. Just seems like there should be a way to do it without setting up a VBA. But if not, I'll have to tinker with it tonight. -
Pick6Excel is the greatest thing ever. Nothing is coming to my head right away though. I bet you could find what you need on this board if it exists (sure it does).
http://www.mrexcel.com/forum/forum.php -
O-TrapI actually found a stupidly inefficient workaround that can be done by someone who isn't familiar with VBA.
It's so inefficient that it's embarrassing, but it works.
Still, thanks for the help, guys. -
gutsomething like if(isnumber(find("1",A1,1,)),1,0)....do a column for each special character (i.e. 1, 2, *, @, etc)...even link in to the special character in the top row. Copy the formulas down. In the final row, sum the values. Then filter everything in that column >0
-
O-TrapThing is, I didn't know what all characters I might find. I saw a lot of characters in the method I used that I would have NEVER actually thought of using.
What I did was, I found out what the longest entry was (13 characters), and I made 13 columns, putting each character in its own column. Then, I created 13 more columns that checked each character against a column of the acceptable characters with VLOOKUP. Then, I filtered and sorted by which ones came up with "#N/A" in any of the fields.
Very inefficient, but the trouble was that I wasn't checking the fields against a list of what wasn't allowed (ie. searching for a symbol or accent character in a field). Only what was allowed. -
gutI don't know any way of doing this without checking against a list of characters to exclude. You can use the "or" function to check multiple characters in each statement, but you'll just have to iterate until you've identified all the characters to remove.
actually, I think char(row()) copied from row 1 to 255 will list all characters in excel. Then you have your list, and just remove the ones that aren't an issue -
O-Trap
Yeah, I ended up doing what I posted above.gut;1657620 wrote:I don't know any way of doing this without checking against a list of characters to exclude. You can use the "or" function to check multiple characters in each statement, but you'll just have to iterate until you've identified all the characters to remove.
I'll explain (It's REALLY inefficient):
I turned a single column into 13 columns, so that each character in the original column could be placed in its own column
So, a cell with "Stephanie Lew" becomes 13 cells in a single row with "S" in one, "t" in the next one, "e" in the next one, and so on.
S t é p h a n i e L e w
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z - ' .
I then check each cell in the "Stephanie Lew" row against that column using VLOOKUP.
If I see any cell that has a '#N/A', I know there's a character in that row that doesn't show up in the column it's checked against.
S t #N/A p h a n i e #N/A L e w
That's basically how I did it. Took out about 2K bad entries with it.