Sampling

Before I sample, I need to consider how much data I need and where it will come from. First, I will be more specific about the data I will collect. For my main hypothesis, I will use all the data gathered for my minor hypotheses. The data for the effect of gender on memory will come from two places: girls from the high school, and boys of the same age from John Hampden Grammar School. I do not need to be as specific about the boys from the school here, because one of the groups used for the gender hypothesis will also be included in the data for the effect of age on memory.
In this part of the hypothesis, I shall look into boys from Key Stage 2 and boys from Key Stage 3, which includes students from Years 7, 8 and 9. I shall also look into Year 10 students and students in the Sixth Form, and I shall consider male adults working in the school. My population is therefore defined as: the students in the school, except those in Year 11 (because they will be stressed by their GCSEs and because some of them will be doing the same experiment); the male teachers in the school; the girls from the high school; and the students from a class in a middle school.
The whole main school population is 919. Year 11 has around 129 students in it, so that part of the population is now 790. Of these 790, 129 are in the Upper Sixth Form, 106 are in the Lower Sixth Form, 129 are in Year 10, 147 are in Year 9, 147 are in Year 8 and 132 are in Year 7. There are also around 50 teachers in the main school, of whom 30 are male. From the high school, I assume that the girls who are part of the population are those in the Sixth Form, and that their number is the same as in our Sixth Form, around 235.
The Key Stage 2 class has 52 students. However, as we only want to consider the males, we take the 31 of them who are male. So the total population adds up to 1081.

There are a few main types of sampling. Random sampling is when every person in the population has an equal chance of being chosen. This would not be the right method to use because it relies on luck: as we are comparing females with males, we might by chance end up with too few females, or even none at all, and it would be difficult to compare male data with female data when there is little or no female data.
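To illustrate the concern about simple random sampling, here is a minimal Python sketch. It assumes the stated total of 1081 people, of whom 235 are the girls from the high school; the labels and the function name `simple_random_sample_counts` are made up for illustration. It draws 300 people completely at random and counts how many girls were picked, which changes from run to run.

```python
import random

# Figures taken from the text: a population of 1081, of whom 235 are girls.
POPULATION_TOTAL = 1081
GIRLS = 235
SAMPLE_SIZE = 300


def simple_random_sample_counts(seed=None):
    """Draw one simple random sample and count the girls and males in it."""
    rng = random.Random(seed)
    # Hypothetical labels: one entry per person in the population.
    population = ["girl"] * GIRLS + ["male"] * (POPULATION_TOTAL - GIRLS)
    sample = rng.sample(population, SAMPLE_SIZE)  # chosen without replacement
    girls_chosen = sample.count("girl")
    return girls_chosen, SAMPLE_SIZE - girls_chosen


if __name__ == "__main__":
    for trial in range(5):
        girls, males = simple_random_sample_counts()
        print(f"Trial {trial + 1}: {girls} girls, {males} males")
```

The number of girls is left to chance in each trial, which is the weakness of this method when the two genders have to be compared.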
Systematic sampling is the next method that could be used. This is when everyone in the population is numbered and every xth person is chosen from the numbers. This is not very good, as many more people from Year 8 and the Sixth Form might be chosen and only a few from the other groups. And as the numbering is done at random, some groups of people might be left out completely. Cluster sampling is when a group is chosen at random from the population and then a simple random sample is taken from this group.
This has the same problems as random sampling. It might be more accurate, but there is no reason why it should be, because the group is chosen at random and the sample within it is also chosen at random. Quota sampling is when the person compiling the data is given a set of instructions to fulfil a certain quota. As the place where we are compiling the data is a school, which is already organised into groups, we do not need to fulfil quotas. This is similar to stratified sampling, as is shown below.

Stratified sampling seems to be the most useful type of sampling in this case.
We divide the population into different strata and, based on the proportion of the population that each stratum makes up, we multiply that proportion by the amount of data I need. We then collect this amount of data from the stratum through random sampling. This gives a fair share of the data from each stratum.

Now I need to know how much data I actually need. Since my population is 1081 people, I need to decide what percentage of that population to sample. I have settled on 300 people as a sensible number.
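The proportional calculation described above can be sketched in Python. The strata sizes and the total of 1081 are the figures quoted in this section; the names `STRATA`, `POPULATION_TOTAL` and `allocate` are chosen only for illustration, and rounding each share to the nearest whole person is an assumption about how the working was done.

```python
import math

# Strata sizes as quoted in the text.
STRATA = {
    "Key Stage 2 boys": 31,
    "Year 7": 132,
    "Year 8": 147,
    "Year 9": 147,
    "Year 10": 129,
    "Lower Sixth": 106,
    "Upper Sixth": 129,
    "High school girls": 235,
    "Male teachers": 30,
}

# Total population as stated in the text (the sizes above sum to 1086;
# 1081 is kept here because it is the figure the text works from).
POPULATION_TOTAL = 1081
TARGET_SAMPLE = 300


def allocate(strata, population_total, target):
    """Give each stratum a share of the sample proportional to its size."""
    allocation = {}
    for name, size in strata.items():
        share = size / population_total * target
        # Round half up to the nearest whole person (an assumed convention).
        allocation[name] = math.floor(share + 0.5)
    return allocation


if __name__ == "__main__":
    allocation = allocate(STRATA, POPULATION_TOTAL, TARGET_SAMPLE)
    for name, count in allocation.items():
        print(f"{name}: {count}")
    print("Total:", sum(allocation.values()))
```

Because each share is rounded separately, the total does not have to come out at exactly 300, so a small adjustment may be needed afterwards.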
The figure of 300 is also near 30% of the whole population. I can show the working that will allow me to carry out my sampling properly.

Total Sample = 9 + 37 + 41 + 41 + 36 + 29 + 36 + 65 + 8 = 302

This means the sample needs to be adjusted somewhere to give a total of 300, so I will take one fewer (40 instead of 41) in each of the Year 8 and Year 9 samples. The respective number of people will be chosen at random from each stratum with a random number generator. My mathematics teacher, Mr. N.
Hutchinson will help me obtain secondary data for this project for the people in Years 7 and 10, the Lower Sixth and the girls from the High School. I will have to ask the adults, the students from other years and the Key Stage 2 students myself.

As the data is discrete, consisting only of marks, I would like it to be exact. However, accuracy is not only affected by the instrument used to measure; it is also affected by any bias or partiality. I will check my data against other data so that anomalous results can be picked out.
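The random selection within a stratum, described earlier in this section, could be carried out along these lines. This is only a sketch: it assumes each person in a stratum has been given a number from 1 up to the size of the stratum, and the function name `choose_from_stratum` is hypothetical. The example uses Year 8, with 147 students and a sample of 40.

```python
import random


def choose_from_stratum(stratum_size, sample_size, seed=None):
    """Pick `sample_size` distinct person numbers from 1..stratum_size."""
    rng = random.Random(seed)
    # range(1, n + 1) stands for the numbered list of people in the stratum.
    chosen = rng.sample(range(1, stratum_size + 1), sample_size)
    return sorted(chosen)


if __name__ == "__main__":
    # Year 8: 147 students, 40 to be chosen (figures from the text).
    year_8_sample = choose_from_stratum(147, 40)
    print(year_8_sample)
```

Each number can only be picked once, so nobody is asked twice, and every person in the stratum has the same chance of being chosen.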
Bias

I have taken all the measures I can to avoid bias in my results. For example, I am using a classroom scenario of students and teachers, where the teachers are usually fair with everyone so as not to show partiality towards any particular person, whether or not they feel any. Teachers will therefore probably make sure that no one cheats or has extra time. This is unlike individual memory tests, where charm might win someone a few extra seconds to memorise, or where other factors might hide the fact that the person next to you is cheating.
Also, if the test were done at home, people would not take it too seriously and might not care if they cheated. I have also reduced the chances of cheating, as all papers and writing materials have to be on the floor when the projector is switched on, as in test conditions. I have further reduced bias by introducing anonymity: people have less reason to cheat when their scores cannot be traced to them, and those marking the papers are unlikely to help their friends because it is very likely that they will not know whose paper they are marking.
I have also avoided selection bias because I am using stratified sampling, and the people within each stratum will be chosen using a random number generator. I cannot control a random number generator, so it will select people randomly. Therefore, I have no need to worry about choosing one person’s answers over somebody else’s, because I only have a limited choice within each stratum and this choice will be made randomly. I can also avoid other bias by removing all stereotypes from my mind. For example, I think that old people have poorer memories.
This thought seemed logical to me, but I have not yet been able to prove it. However, the thought sticks with me so firmly that it might affect the results. Another risk is that I might favour the girls’ side to make sure my hypothesis comes out true. There are many other ways I could do things like this and I must try to avoid as many of them as possible. This can be done by clearing the mind of such thoughts.

Use of Data

The data will be used to investigate the tasks and to find out about the validity of the hypotheses.
However, raw data would not be sufficient. After dividing the data, I need to do more with it, such as presenting it. I can present data in a variety of ways, including bar charts, bar line graphs, scatter diagrams (with lines of best fit), box-and-whisker diagrams, histograms, back-to-back stem-and-leaf diagrams, frequency polygons, cumulative frequency graphs and moving averages. I have to make sure that these diagrams are relevant to the task. I also need to analyse the data, and to do that I need a measure of central tendency and a measure of spread.
The measure of central tendency is usually the mean, median or mode, and the measures of spread include the range, interquartile range, standard deviation and interpercentile range (the difference between two percentiles). All of these must be used where relevant and must not be used for the sake of being used. For example, back-to-back stem-and-leaf diagrams can be used when two sets of data need to be compared, and histograms must be used when the class widths are unequal. Cumulative frequency graphs should be used if the median and interquartile range need to be found and when the total mark is of interest.
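As a rough illustration of the measures of central tendency and spread named above, the sketch below uses Python's standard `statistics` module on a made-up list of scores. The scores are invented purely for the example and are not results from this investigation; the quartiles are found with the module's default method, which may differ slightly from the method used by hand.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [12, 15, 15, 17, 18, 18, 18, 20, 21, 23, 25, 28]

# Measures of central tendency.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of spread.
data_range = max(scores) - min(scores)
# quantiles(n=4) returns the three quartile cut points: Q1, Q2 and Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)
interquartile_range = q3 - q1
std_dev = statistics.pstdev(scores)  # population standard deviation

print("Mean:", round(mean, 2))
print("Median:", median)
print("Mode:", mode)
print("Range:", data_range)
print("Interquartile range:", interquartile_range)
print("Standard deviation:", round(std_dev, 2))
```

The range depends only on the two extreme scores, while the interquartile range ignores them, which is one reason to choose the measure that suits the data rather than calculating all of them.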
I have thought about how to use data relevantly. I cannot use diagrams for the sake of using diagrams, or calculations for the sake of using calculations. I will state what I will use each calculation for, and how, where and why. To analyse the difference in results between males and females, I shall draw two stem-and-leaf diagrams and a combined back-to-back stem-and-leaf diagram. This will allow me to compare the two sets of results and give me a better understanding of my hypothesis. The data used will be the scores for the pictures. I will then repeat this for the scores for the numbers and for the words.
I will then repeat for the total scores. I also need a measure of central tendency and a measure of spread in order to really understand the data. Because I am using back-to-back stem-and-leaf diagrams, it would be most sensible to use the mode, as it is easy to pick out from the diagrams. This will also give me the scores of the typical male and the typical female, disregarding the extremes. Since stem-and-leaf diagrams also make it easy to pick out the maximum and minimum, it would be sensible to find the range as well.
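A back-to-back stem-and-leaf diagram of the kind described above can be laid out in plain text. The sketch below is one possible way to do it in Python; the two lists of scores are invented for the example, and the function name `back_to_back` is hypothetical. It also reads off the mode and the range for each group.

```python
from collections import defaultdict
import statistics


def back_to_back(left, right):
    """Return the lines of a back-to-back stem-and-leaf diagram.

    Stems are the tens digits and leaves are the units digits.
    Leaves on the left-hand side read outwards from the stem.
    """
    left_leaves = defaultdict(list)
    right_leaves = defaultdict(list)
    for value in sorted(left):
        left_leaves[value // 10].append(value % 10)
    for value in sorted(right):
        right_leaves[value // 10].append(value % 10)

    lowest_stem = min(min(left), min(right)) // 10
    highest_stem = max(max(left), max(right)) // 10
    stems = range(lowest_stem, highest_stem + 1)
    # Column width for the left side: room for the longest row of leaves.
    width = max(len(left_leaves[s]) for s in stems) * 2

    lines = []
    for stem in stems:
        # Reverse the left leaves so the smallest sits next to the stem.
        left_text = " ".join(str(d) for d in reversed(left_leaves[stem]))
        right_text = " ".join(str(d) for d in right_leaves[stem])
        line = f"{left_text.rjust(width)} | {stem} | {right_text}"
        lines.append(line.rstrip())
    return lines


if __name__ == "__main__":
    # Hypothetical scores, invented for illustration only.
    males = [8, 12, 14, 14, 17, 21, 23, 23, 23, 30]
    females = [11, 15, 16, 16, 16, 22, 24, 27, 31, 32]

    print("Males (left) | stem | Females (right)")
    for line in back_to_back(males, females):
        print(line)
    for label, scores in (("Males", males), ("Females", females)):
        print(label, "mode:", statistics.mode(scores),
              "range:", max(scores) - min(scores))
```

In the printed diagram the mode shows up as the leaf that is repeated most often on one side of a stem, and the range comes from the first and last leaves on that side.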
Bar charts can also be used to show the number of people who got each score for each gender. I can also use comparative pie charts, with one main chart comparing the total scores of males and females, and for each gender a sub-pie chart comparing the scores for the pictures, words and numbers with that gender’s total score. To analyse the data gathered for the different ages, I have decided to use histograms, because I am then able to use unequal class widths. I will have a histogram for each set of data for each age group.
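With unequal class widths, the height of each bar in a histogram is the frequency density (frequency divided by class width) rather than the frequency itself. The sketch below works this out in Python for some made-up classes; the class boundaries and frequencies are invented for the example, and the function name `frequency_densities` is hypothetical.

```python
def frequency_densities(classes):
    """Return (lower, upper, frequency, density) for each class.

    `classes` is a list of (lower_bound, upper_bound, frequency) tuples.
    Frequency density = frequency / class width.
    """
    rows = []
    for lower, upper, frequency in classes:
        width = upper - lower
        rows.append((lower, upper, frequency, frequency / width))
    return rows


if __name__ == "__main__":
    # Hypothetical score classes with unequal widths, for illustration only.
    example_classes = [(0, 10, 4), (10, 15, 9), (15, 20, 12), (20, 30, 5)]
    for lower, upper, frequency, density in frequency_densities(example_classes):
        print(f"{lower}-{upper}: frequency {frequency}, density {density:.2f}")
```

Plotting the density means that the area of each bar, not its height, represents the number of people in that class, so wide classes do not look larger than they really are.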