That Blue Square Thing

AQA Computer Science GCSE

February 2019: this area of the site is being built just now. There are areas where there is no content yet. That will get added over the next 6 months or so.

String Handling

Strings are one of the main data types you need to know about. They represent text - sequences of characters. We show that data is a string by putting it inside quote marks, like: "Boris Budge".

A character is a specific data type. It's just a single keyboard character. We put these inside quotes as well, although sometimes just single quotes are used: 'B'.

You can read more about data types on the Unit 2 page Variables & Data Types.

Note that a lot of this page is really dealing with Unit 2 ideas - in particular 3.2.8. It's part of the programming section, but I like to teach it directly here to help remind you how to program.

Dealing with Strings

Strings should be thought of as sequences of characters.

This means that the string "Asparagus" is made up of a sequence of 9 characters in the right order. We can write the sequence as ['A', 's', 'p', 'a', 'r', 'a', 'g', 'u', 's'].

Note that this is exactly the same notation that we use when we're dealing with arrays. A string is really just an array of characters - but strings are such an important data type that we write them in a simpler way (like "Asparagus").
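To see the "sequence of characters" idea in Python, you can turn a string into a list of its characters. This is only an illustrative sketch - the variable names are made up for the example.

```python
theString = "Asparagus"

# list() splits a string into its individual characters, in order
characters = list(theString)

print(characters)       # ['A', 's', 'p', 'a', 'r', 'a', 'g', 'u', 's']
print(len(characters))  # 9
```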

There are four main sets of operations you need to be able to do on strings:

  1. Find the length
  2. Find the position of a character
  3. Concatenate strings (join them together)
  4. Create and use substrings

You also need to be able to convert characters to ASCII codes and vice versa and convert strings to integers and real numbers. These have already been dealt with on the character encoding page.

Length

Strings have a length. This is just the number of characters in the string - including any spaces.

In Pseudocode you'd see:

theLength <- LEN(theString)

In Python this becomes:

theLength = len(theString)

The length of a string is helpful to know when you want to iterate over it - to use a loop to work through character by character.
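As a rough sketch of what iterating over a string can look like in Python: the first loop uses the length to work through the index numbers, the second loops over the characters directly. The variable names (index, character, vowelCount) are just examples.

```python
theString = "Asparagus"

# range(len(theString)) gives the index numbers 0, 1, 2 ... 8
for index in range(len(theString)):
    print(index, theString[index])

# Python can also loop over the characters directly
vowelCount = 0
for character in theString:
    if character in "aeiouAEIOU":
        vowelCount = vowelCount + 1

print(vowelCount)  # 4
```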

Position

Just like arrays, each character in a string can be identified using the index of the element. This uses a number to identify each character in the string.

Just like with arrays, the index of a string usually starts from 0. So, in the code below, the first character of the string (the 'A') is index 0, the second (the first 's') is index 1 and so on. The string has 9 characters and so a length of 9, but the last letter in the string (the second 's') is index 8.

As with arrays, square brackets are used to access individual characters.

theString <- "Asparagus"

OUTPUT theString[0] # outputs 'A'
OUTPUT theString[1] # outputs 's'
OUTPUT theString[7] # outputs 'u'
OUTPUT theString[9] # index out of range error
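In Python the same indexing looks like the sketch below. The try/except is only there so the example keeps running after the out of range error; lastIndex is a made-up variable name.

```python
theString = "Asparagus"

print(theString[0])  # A
print(theString[1])  # s
print(theString[7])  # u

# The last character is always at index length - 1
lastIndex = len(theString) - 1
print(theString[lastIndex])  # s

# Index 9 does not exist, so Python raises an IndexError
try:
    print(theString[9])
except IndexError:
    print("index out of range")
```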

It is also possible to find the position in the string of a particular character. This uses the command POSITION and will find the first time that the character appears in the string. If the character doesn't appear at all in the string then -1 is returned.

theString <- "Asparagus"

posg <- POSITION(theString, "g") # find the position of g
posa <- POSITION(theString, "a") # find the position of a
posA <- POSITION(theString, "A") # find the position of A
posz <- POSITION(theString, "z") # find the position of z

The values returned by POSITION in each case would be:

posg: 6
posa: 3 (the first lowercase 'a')
posA: 0
posz: -1 (there is no 'z' in the string)

The fact that -1 is returned if the value isn't in the string can be very helpful. Say, for example, you wanted to check a user had entered a valid e-mail address. One of the things you could check would be whether the character '@' is included. You can then use the logic:

posAtSymbol <- POSITION(theEMail, "@")
IF posAtSymbol = -1 THEN
    OUTPUT "That is not a valid e-mail address"
ENDIF

The Python equivalent of POSITION is the built-in string method find(). It also returns -1 if the character isn't in the string.

theString = "Boris Budge"

posd = theString.find("d")
print(posd)

Remember that you can use "d" or 'd' to represent a character. I prefer to use "d" in Python as it causes fewer problems when I want to use an apostrophe in a word like "can't".
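The e-mail check described above can be sketched in Python using find(). The function name hasAtSymbol and the example addresses are made up for illustration, and a real e-mail check would need to test more than this.

```python
def hasAtSymbol(theEMail):
    # find() returns -1 when "@" is not in the string
    posAtSymbol = theEMail.find("@")
    return posAtSymbol != -1

for address in ["boris@example.com", "boris.example.com"]:
    if hasAtSymbol(address):
        print(address, "contains an @ symbol")
    else:
        print("That is not a valid e-mail address")

# find() treats upper and lower case letters as different characters
theString = "Asparagus"
print(theString.find("a"))  # 3
print(theString.find("A"))  # 0
print(theString.find("z"))  # -1
```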

Concatenation

Concatenation involves joining two or more strings together. This uses the operator +.

stringOne <- "Asparagus"
stringTwo <- "Butter"

stringThree <- stringOne + stringTwo

This produces the string "AsparagusButter". If you want to add a space you need to say so!

stringOne <- "Asparagus"
stringTwo <- "Butter"

stringThree <- stringOne + " " + stringTwo

The major problem with concatenation comes when you try to concatenate a string variable with a number variable of some kind. This won't work - you need to convert the number variable to a string first:

aString <- "Exam mark"
score <- 42

stringOne <- aString + score # will not work

stringOne <- aString + INT_TO_STRING(score) # convert the integer first

You can also use REAL_TO_STRING if necessary to convert a decimal number to a string.
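In Python the same idea can be sketched with the built-in str() function, which converts both integers and decimal numbers to strings. The try/except is only there to show the error without stopping the program; average and stringTwo are made-up names.

```python
aString = "Exam mark"
score = 42

# Joining a string and an integer directly raises a TypeError
try:
    stringOne = aString + score
except TypeError:
    print("cannot join a string and an integer")

# Convert the integer to a string first
stringOne = aString + " " + str(score)
print(stringOne)  # Exam mark 42

# str() works for decimal numbers as well
average = 37.5
stringTwo = "Average mark " + str(average)
print(stringTwo)  # Average mark 37.5
```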

Substrings

Substrings are strings created from part of a longer string. There are built in commands which make this easy to do. They can be useful for all sorts of things.

aString <- "Amazing shoes"

stringOne <- SUBSTRING(0, 3, aString)
OUTPUT stringOne

This will output the string "Amaz" - the characters from index 0 to index 3, including both ends.

If you wanted the substring of just the word "shoes" you'd use:

stringOne <- SUBSTRING(8, 12, aString)

Take a look at each of these and work them out:

aString <- "Amazing shoes"

stringOne <- SUBSTRING(2, 4, aString)
stringTwo <- SUBSTRING(5, 8, aString)
stringThree <- SUBSTRING(4, 4, aString)

These evaluate to:

stringOne: "azi"
stringTwo: "ng s" (the space counts as a character)
stringThree: "i"

In Python you do this slightly differently using a technique called slicing. I'll add some slides about this at some point...
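As a rough sketch of how slicing works: a slice is written as aString[start:end], and it stops just before the end index, so you need to add 1 to get the same result as SUBSTRING. The substring() function below is a made-up helper to show the difference, not something built in to Python.

```python
aString = "Amazing shoes"

# A slice stops just BEFORE its end index, so 0:4 gives indexes 0 to 3
stringOne = aString[0:4]
print(stringOne)  # Amaz

# Indexes 8 to 12 need the slice 8:13
stringTwo = aString[8:13]
print(stringTwo)  # shoes

# Hypothetical helper that behaves like the pseudocode SUBSTRING
def substring(start, end, text):
    return text[start:end + 1]

print(substring(2, 4, aString))  # azi
print(substring(4, 4, aString))  # i
```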