Diacritic: Also called a diacritical mark or accent, this is a mark or glyph added to a letter or character.
You might not know the term, but you've used characters with diacritics. Think about é and è and ě and ê and ė and ē and ë. All "e" characters, but all different.
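For the curious, here is a small Python sketch of how a program sees those seven letters. It is only an illustration: the helper name base_letter is made up for this example, and it relies on Python's standard unicodedata module.

```python
import unicodedata

# NFC makes sure each letter is stored as a single code point,
# however this file happened to be saved.
letters = unicodedata.normalize("NFC", "éèěêėēë")

def base_letter(ch):
    # NFD splits an accented letter into its base letter followed by
    # one or more combining marks; keep only the base letter.
    return unicodedata.normalize("NFD", ch)[0]

for ch in letters:
    # Print the character, its code point, its official Unicode name
    # and the plain letter underneath the accent.
    print(ch, f"U+{ord(ch):04X}", unicodedata.name(ch), "base:", base_letter(ch))
```

Each letter gets its own number and its own name, yet all of them have a plain "e" underneath.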
The wonderful concept of Unicode makes it possible for us to handle all of these in our systems. But it's not always simple. Most of us have had to deal with a character that displays as something strange.
Unicode for the non-programmers
In the world of computers, every character is represented by a number. This number, in turn, is represented in binary as a series of 0s and 1s. Think of it as a James Bond not-so-secret code.
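If you want to see the code for yourself, this minimal Python sketch shows the number behind a character and the 0s and 1s that store it. I'm assuming the UTF-8 encoding here, which is one common way to store Unicode text but not the only one, and describe is just a name I picked for the example.

```python
def describe(ch):
    # The number Unicode assigns to this character (its code point).
    code_point = ord(ch)
    # The bytes used to store it, assuming UTF-8.
    utf8_bytes = ch.encode("utf-8")
    # Each byte written out as eight binary digits.
    bits = " ".join(f"{b:08b}" for b in utf8_bytes)
    return code_point, bits

print(describe("e"))       # (101, '01100101')
print(describe("\u00e9"))  # é: (233, '11000011 10101001')
```

The plain "e" fits in one byte. In UTF-8, the "é" needs two.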
The ideal is for every system to use the same code. Unicode is the standard for that code. But it needs to cater for all characters and languages and alphabets, not just the 26 letters of the English alphabet. Unicode has room for just over 1.1 million different characters. Most of the time we don't need a million characters, so we use subsets of Unicode.
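Here is a rough Python sketch of both ideas. It treats older encodings such as ASCII and Latin-1 as examples of "subsets", which is my own reading for the sake of the example, and fits is a made-up helper name.

```python
import sys

# The highest code point Python supports, plus one for code point zero.
print(sys.maxunicode + 1)  # 1114112

def fits(ch, encoding):
    # True if the character can be stored in the given encoding.
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(fits("\u00e9", "ascii"))    # é is not in ASCII: False
print(fits("\u00e9", "latin-1"))  # é is in Latin-1: True
print(fits("\u011b", "latin-1"))  # ě is not in Latin-1: False
print(fits("\u011b", "utf-8"))    # UTF-8 covers all of Unicode: True
```

A system that only knows a small subset works fine until a character from outside that subset arrives.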
A recent (and familiar?) experience
When we moved to virtual training, I took over some of the course admin that needed to change format. Things like Zoom invitations, registration forms, evaluations and attendance certificates. And I evolved a process for it over time. We've now automated most of it. Charles developed the system for me between giving courses. And I've tested it and added new requirements between other work. I admit that it has not been carefully designed or documented. But it's a work in progress that is being constantly refactored. (I believe in documentation, so that is on my to-do list.)
Enter a delegate with é in his name. Oops. I hadn't tested the system for characters with accents. Neither had Charles. And of course the HTML encoding from the front-end didn't work on the PDF certificate. So Charles rewrote part of the program to encode and decode correctly. It worked fine on our local test system. And broke on the live system, which is hosted by our ISP. Because that's a programmer's life.
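This is not Charles's actual fix, just a minimal Python sketch of the kind of decoding step involved. It assumes the name arrives from the form with HTML entities in it. The function name certificate_name and the example name are both invented.

```python
import html

def certificate_name(form_value):
    # Turn HTML entities such as &eacute; or &#233; back into real
    # characters before the text goes anywhere that is not HTML.
    return html.unescape(form_value)

print(certificate_name("Ren&eacute;"))  # René
print(certificate_name("Ren&#233;"))    # René
```

A browser does this decoding for you. A PDF generator generally prints exactly what it is given, entities and all.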
My problem with pumpkin
I am a vegetarian who doesn't really like vegetables. Odd, but true. Despite years of being a vegetarian, the list of vegetables I want to eat is quite short. I'm trying to slowly expand the list.
And it doesn't help that a vegetable is not always a vegetable. Take pumpkin. It's healthy and nutritious. But I don't really like pumpkin. I do like pumpkin seeds, but apparently they don't count as vegetables. And I like pumpkin fritters with caramel sauce, but that doesn't count as healthy food. Do you see my problem?
What's this got to do with pumpkin?
I was staring at some cooked pumpkin last night. And I decided that vegetables are like Unicode. If you don't eat the vegetable in the right form, it isn't healthy. And if you don't handle the encoding correctly, it isn't the same character. Like Cinderella's carriage, your string will turn into a pumpkin.
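Here is what the pumpkin looks like in practice, as a small Python sketch. The name is made up, and the mix-up between UTF-8 and Latin-1 is just one common example of encodings that don't match.

```python
original = "Ren\u00e9"  # René

# One system writes the name out as UTF-8 bytes.
data = original.encode("utf-8")

# Another system reads those bytes with the wrong encoding.
garbled = data.decode("latin-1")

# Reading them with the right encoding gives the name back.
restored = data.decode("utf-8")

print(garbled)   # RenÃ©
print(restored)  # René
```

The bytes never changed. Only the code used to read them did.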
(Completely unrelated: I wonder why Cinderella's carriage turned into a pumpkin, and not some other vegetable.)
Lewis pulled a face at my analogy. But it made sense to me.
I looked for some programmer jokes about Unicode. The few I found weren't funny. Like pumpkin, it's probably not a funny topic. So please share any Unicode jokes or stories that you know.