AQA Computer Science GCSE
Programming Concepts - String handling
String Handling
Strings are one of the main data types you need to know about. They represent words - sequences of characters. We show that data is a string by putting it inside quote marks, like: "Boris Budge".
A character is a specific data type. It's just a single keyboard character. We put these inside quotes as well, although sometimes just single quotes are used: 'B'.
You can read more about data types on the Variables & Data Types page.
Some of this is linked to character encoding, which is part of Unit 3 and deals with ASCII code and Unicode and the idea of character sets.
Dealing with Strings
Strings should be thought of as sequences of characters.
This means that the string "Asparagus" is made up of a sequence of 9 characters in the right order. We can write the sequence as ['A', 's', 'p', 'a', 'r', 'a', 'g', 'u', 's'].
Note that this is exactly the same way of writing that we use when we're dealing with arrays. A string is really just an array of characters - but they are such important data types that we simplify the way we deal with them by writing them in a simpler way (like, "Asparagus").
There are six main sets of operations you need to be able to do on strings:
- Find the length
- Find the position of a character
- Concatenate strings (join them together)
- Create and use substrings
- Convert strings to numbers (and back)
- Convert to and from character codes (ASCII code)
Length
Strings have a length. This is just the number of characters in the string - including any spaces.
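In Pseudocode you'd see:
theString <- "Asparagus"
OUTPUT LEN(theString) # outputs 9
In Python this becomes:
theString = "Asparagus"
print(len(theString)) # prints 9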
The length of a string is helpful to know when you want to iterate over it - to use a loop to work through character by character.
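As a rough sketch of that idea in Python, the example below uses len() to find the length and then uses it to loop over every index in the string. The variable names are just for illustration.

```python
theString = "Asparagus"

# len() counts every character, including any spaces
length = len(theString)
print(length)

# use the length to visit each index in turn, from 0 up to length - 1
characters = []
for i in range(length):
    characters.append(theString[i])

print(characters)
```

The loop stops at index 8 because range(length) ends one before the length, which matches the last valid index of the string.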
Position
Just like arrays, each character in a string can be identified using the index of the element. This uses a number to identify each character in the string.
Just like with arrays, the index of a string usually starts from 0. So, in the code below, the first character of the string (the 'A') is index 0, the second (the first 's') is index 1 and so on. The string has 9 characters and so a length of 9, but the last letter in the string (the second 's') is index 8.
As with arrays, square brackets are used to access individual characters.
theString <- "Asparagus"
OUTPUT theString[0] # outputs 'A'
OUTPUT theString[1] # outputs 's'
OUTPUT theString[7] # outputs 'u'
OUTPUT theString[9] # index out of range error
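Here is a minimal sketch of the same indexing in Python. Python raises an IndexError when the index is past the end of the string. The safeCharAt function is a made-up helper for this example, not something built in to Python.

```python
theString = "Asparagus"

print(theString[0])  # first character
print(theString[1])  # second character
print(theString[7])  # eighth character


def safeCharAt(text, index):
    """Return the character at index, or None if the index is out of range."""
    try:
        return text[index]
    except IndexError:
        return None


lastChar = safeCharAt(theString, 8)   # index 8 is the last valid index
pastEnd = safeCharAt(theString, 9)    # index 9 is out of range
print(lastChar, pastEnd)
```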
It is also possible to find the position in the string of a particular character. This uses the command POSITION and will find the first time that the character appears in the string. If the character doesn't appear at all in the string then -1 is returned.
posg <- POSITION(theString, "g") # find the position of g
posa <- POSITION(theString, "a") # find the position of a
posA <- POSITION(theString, "A") # find the position of A
posz <- POSITION(theString, "z") # find the position of z
The values returned by POSITION in each case would be:
- posg = 6 - don't forget that indexes start from 0
- posa = 3 - the first 'a' (not 'A')
- posA = 0
- posz = -1 - there is no z, so we get -1
IF posAtSymbol = -1 THEN
The Python equivalent of POSITION is the built-in string method find(). It also returns -1 if the character isn't in the string.
posd = theString.find("d")
print(posd)
Remember that you can use "d" or 'd' to represent a character. I prefer to use "d" in Python as it causes fewer problems when I want to use an apostrophe in a word like "can't".
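To tie this back to the POSITION examples, here is a short Python sketch that runs the same four searches on "Asparagus" using find(), then checks for -1 before using the result. The message variable is only there for illustration.

```python
theString = "Asparagus"

posg = theString.find("g")  # index of the g
posa = theString.find("a")  # index of the first lower case a
posA = theString.find("A")  # upper case A is a different character
posz = theString.find("z")  # there is no z in the string

# always check for -1 before using the position as an index
if posz == -1:
    message = "No z in " + theString
else:
    message = "Found z at index " + str(posz)

print(posg, posa, posA, posz)
print(message)
```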
Concatenation
Concatenation involves joining two or more strings together. This uses the operator +.
stringOne <- "Asparagus"
stringTwo <- "Butter"
stringThree <- stringOne + stringTwo
This produces the string "AsparagusButter". If you want to add a space you need to say so!
stringOne <- "Asparagus"
stringTwo <- "Butter"
stringThree <- stringOne + " " + stringTwo
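Python uses the same + operator to join strings. This is a small sketch of both versions, with and without the space. The names joined and spaced are just example variable names.

```python
stringOne = "Asparagus"
stringTwo = "Butter"

# joining directly puts the two strings right next to each other
joined = stringOne + stringTwo

# a space is just another string, so it has to be added explicitly
spaced = stringOne + " " + stringTwo

print(joined)
print(spaced)
```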
The major problem with concatenation comes when you try and concatenate a string variable with a number variable of some kind. This won't work - you need to convert the number variable to a string first:
score <- 42
stringOne <- aString + score # will not work
stringOne <- aString + INT_TO_STRING(score) # convert the integer first
You can also use REAL_TO_STRING if necessary to convert a decimal number to a string.
The Python code to do this conversion to a string is simpler:
stringOne = aString + str(score)
This works whether the variable score is an integer or a real number.
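Here is a slightly fuller sketch of what happens in Python. Joining a string and a number directly raises a TypeError, and str() fixes it for integers and reals alike. The text in aString and the value of realScore are made up for this example.

```python
aString = "Your score is "  # example text, assumed for this sketch
score = 42
realScore = 42.5            # example real number, assumed for this sketch

# joining a string and an integer directly is not allowed
try:
    broken = aString + score
except TypeError:
    broken = None

# str() converts either kind of number to a string first
fixed = aString + str(score)
fixedReal = aString + str(realScore)

print(fixed)
print(fixedReal)
```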
Substrings
Substrings are strings created from part of a longer string. There are built in commands which make this easy to do. They can be useful for all sorts of things.
stringOne <- SUBSTRING(0, 3, aString)
OUTPUT(stringOne)
This will output the string "Amaz" - the characters from element 0 to element 3.
If you wanted the substring of just the word "shoes" you'd use:
Take a look at each of these and work them out:
stringOne <- SUBSTRING(1, 3, aString)
stringTwo <- SUBSTRING(5, 8, aString)
stringThree <- SUBSTRING(3, 3, aString)
These evaluate to:
- stringOne = "maz"
- stringTwo = "ng s" - don't forget the space
- stringThree = "z" - starts at 3 and ends at 3
In Python you do this slightly differently using a technique called slicing.
myString = "banana republic"
stringOne = myString[0:6] # returns "banana"
stringTwo = myString[7:15] # returns "republic"
stringThree = myString[3:8] # returns "ana r"
Just like for loops, in Python the last number in the square brackets is the one after the last character you want. I know this is annoying, but it's the way substrings (and for loops) work in Python.
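The difference between the two styles is easy to trip over. SUBSTRING in the pseudocode includes the end index, while a Python slice stops just before it. One way to sketch the link is a small helper that adds 1 to the end index. The substring function below is a made-up helper, not part of Python. The string "Amazing shoes" is an assumption, chosen because it is consistent with the outputs in the pseudocode examples.

```python
def substring(start, end, text):
    """Mimic SUBSTRING: return the characters from index start to index end, inclusive."""
    # a Python slice excludes the end index, so add 1 to include it
    return text[start:end + 1]


aString = "Amazing shoes"  # assumed example string

print(substring(0, 3, aString))
print(substring(5, 8, aString))
print(substring(3, 3, aString))
```

Written as plain slices, these would be aString[0:4], aString[5:9] and aString[3:4].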
Converting Strings to/from Numbers
Sometimes you need to be able to convert a number into a string and vice-versa.
For example, when using input() data is always stored as a string - even if you enter "42".
I mentioned this on the page dealing with using input and it's covered in the concatenation section above as well, but here's a summary of the methods to use:
theInteger = int(input("Enter a number: ")) # to integer
theFloat = float(input("Enter a number: ")) # to real (float)
# convert to a string to concatenate
print("The number is " + str(theInteger))
print("The number is " + str(theFloat))
Note that both Integers and Floats are converted to a String using the same method.
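One thing to watch out for is that int() and float() raise a ValueError if the string isn't a valid number. The sketch below shows one possible way to handle that. The toNumber function is a made-up helper for this example, and it uses fixed strings rather than input() so it can run without any typing.

```python
def toNumber(text):
    """Try to convert text to an integer, then to a float; return None if neither works."""
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return None


whole = toNumber("42")      # converts to an integer
real = toNumber("3.5")      # not a valid integer, so converts to a float
notANumber = toNumber("forty")

# str() turns either kind of number back into a string
print("The number is " + str(whole))
print("The number is " + str(real))
print(notANumber)
```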
The Pseudocode to do the same thing is here. You might see this in an exam:
STRING_TO_INT(aString) - converts string to integer
INT_TO_STRING(anInteger) - the opposite
STRING_TO_REAL(aString) - converts string to real number
REAL_TO_STRING(aRealNumber) - the opposite
You're more likely to see these written in pseudocode and have to know what they mean, which isn't too difficult.
Converting Characters to ASCII Codes
You can convert a character to its ASCII code representation in most programming languages. This can be helpful sometimes and is something you need to know for exams.
Here's the pseudocode to do this:
theChar <- theName[3]
theCode <- CHAR_TO_CODE(theChar)
This converts the character at index 3 in theName (i) to its ASCII code number. Lower case i is ASCII code 105, which is different to upper case I, which is ASCII code 73. A space would be converted to 32 - the ASCII code value for a space character.
The Python to do this:
theChar = theName[3]
theCode = ord(theChar)
ord() is the Python equivalent of CHAR_TO_CODE. Perhaps the only time that Pseudocode is easier to use than Python!
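Here is a small sketch that puts ord() to work on a whole name. The value of theName is an assumption: it is the example name from the top of this page, and its character at index 3 happens to be i.

```python
theName = "Boris Budge"  # assumed value for this sketch

theChar = theName[3]     # the character at index 3
theCode = ord(theChar)   # its ASCII code
print(theChar, theCode)

# work through the string and collect the code for every character
codes = []
for character in theName:
    codes.append(ord(character))

print(codes)
```

The space between the two words shows up in the list as 32.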
Converting ASCII Codes to Characters
You can also convert from a character code to a character using CODE_TO_CHAR:
theCode <- USERINPUT
theChar <- CODE_TO_CHAR(theCode)
OUTPUT theChar
This code uses USERINPUT to allow you to enter a character code. When you enter a number in Python it gets stored as a string, so when using Python we need to make sure we convert to an integer first.
theCode = int(input()) # input() returns a string, so convert to an integer first
theChar = chr(theCode)
print(theChar)
This time Python uses the built in function chr() to convert from an integer to a character.
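As a quick sketch of chr() in use, the example below rebuilds a word from a list of ASCII codes and shows that chr() and ord() undo each other. The list of codes is made up for this example, and fixed values are used instead of input() so it runs without any typing.

```python
codes = [66, 111, 114, 105, 115]  # example ASCII codes, assumed for this sketch

# convert each code to a character and join them together
word = ""
for code in codes:
    word = word + chr(code)

print(word)

# chr() and ord() are opposites, so you can do arithmetic on the code
nextLetter = chr(ord("A") + 1)
print(nextLetter)
```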