Computer Science (9618) A Levels Paper 3 Short Notes

Computer Science (9618) A Levels Paper 3 Short Notes

Author:
Price:

Read more

                                               Download PDF: 9618 Computer Science Paper 3 Short Notes

Chapter 16: Data representation

Chapter 16: Data types (Section 16.01)

Chapter 16 expands on the concept of data types introduced in Chapter 13.

Built-in Data Types For each built-in data type, the programming language defines the range of possible values that can be assigned to a variable, and the operations available for manipulating those values.

User-Defined Data Types A user-defined data type is one where the programmer includes the definition within the program. It is important not to confuse user-defined types with abstract data types (ADTs) defined in Chapter 13.

  1. Non-composite data types: These are data types defined without reference to another data type.
    • Enumerated data type: This is an example of a user-defined non-composite type. When defined, every possible value is explicitly identified.
      • The defined values are ordinal, meaning they have an implied order.
      • The values look like strings but must not be enclosed in quotes.
      • This type is necessary because the possible values cannot be generically defined beforehand, only when the programmer identifies them.
      • Example: TDirections = (North, East, South, West).
  2. Composite user-defined data types: These types refer to at least one other type in their definition.
    • Record data type: This type (introduced in Chapter 13) is typically user-defined, allowing the programmer to create structures that precisely match the program's data needs. Python is noted as a language that does not support the record data type. A record is also a component of a binary file.
    • Class: A class is a data type used for an object in object-oriented programming. Programmers often create user-defined classes to utilize the benefits of OOP.
    • Pointer data type: A pointer variable is one whose value is the address in memory of a different variable. The symbolism used for pointers (e.g., ^ for dereferenced value, @ for address) varies by programming language. Because arithmetic can be performed on pointer variables, they can be used to construct dynamically varying data structures.
    • Set data type: A set is a collection of data items that lacks any internal structure or organization. It does not allow duplicate values. Operations include adding or removing values, checking for value existence, and adding sets together. A set is useful for eliminating duplicates from structures like lists or arrays.

Chapter 16: File Organisation (Section 16.02)

Files used for program input or output are defined as either a text file or a binary file. A text file stores data using character codes (as discussed in Chapter 1), while a binary file stores data in its internal binary representation (e.g., two's complement for an integer).

The organization of a binary file is based on the concept of a record, which is a collection of fields containing data values.

  1. Serial files: These files contain records that have not been organised in any defined order. New records are simply appended to the file, resulting only in time-order of entry. A serial file organisation is suitable for batch processing or backing up data on magnetic tape.
    • File access: Normal usage is to read the whole file record by record from the beginning.
  2. Sequential files: These files have ordered records based on a key field (which must be unique and sequential, though not necessarily consecutive).
    • Adding a new record involves copying records from the old file to a new file sequentially until the correct position for the new record is reached, then copying the remaining records. The same method is used for editing or deleting records.
    • Sequential file organisation is suitable for applications where multiple records are required from one search.
    • File access: If the key field value is known, the search is faster as only key field values need to be read sequentially.
  3. Direct-access files (sometimes called random-access files): These allow access to any record in the file without sequential reading.
    • Direct access can be achieved by creating a separate index file containing the key field value and the record's position in the main file.
    • Alternatively, a hashing algorithm can be used. This algorithm calculates a file address (the remainder from an integer division) from the numeric key field value.
    • A key issue is the collision problem, where different key field values produce the same address. Collision resolution strategies include having overflow addresses or linked lists accessible from each address.
    • Direct access is advantageous if rapid access to an individual record in a large file is required, such as checking user passwords during login.
    • File access: The key field value is submitted to the hashing algorithm to find the record's position directly. Records can be edited in place. Deleting a record requires setting a flag in the record, which is then skipped during subsequent reading.

It is important to note that unlike primary keys in a database, key fields in files are not mandatory to have unique values.

Chapter 16: Real Numbers (Section 16.03)

The internal coding of non-integer numeric values (real numbers) is considered in this section. When representing a real number, either a fixed-point or a floating-point representation can be used.

Floating-Point Representation

A floating-point representation stores a real number using the formatpm Mtimes R^E:

  • pm M is the mantissa (or significand), for which a defined number of bits are used.
  • E is the exponent (or exrad), using the remaining bits.
  • R (radix) is implicitly 2 and is not stored.

Both the mantissa and exponent are typically stored using two’s complement representation. Floating-point representation offers a greater range of values than fixed-point representation.

Precision and Normalisation

A programmer must decide on the total number of bits and the split between the mantissa and the exponent.

  • Increasing the number of bits for the mantissa improves precision but reduces the range of possible values (by reducing the bits available for the exponent).
  • Normalisation is required to achieve maximum precision by ensuring the largest possible magnitude for the value represented by the mantissa is used. This use of normalisation is unrelated to the process associated with designing a database.
  • A number is in a normalised representation when the two most significant bits are different.
    • For a positive number, the bits are shifted left until the most significant bits are 01.
    • For a negative number, the bits are shifted left until the most significant bits are 10.
    • For each left shift, the exponent value is reduced by 1.

Conversion and Consequences of Approximation

Conversion of a denary real value to a binary representation often results in a degree of approximation, as binary fractional parts represent sums of halves, quarters, eighths, etc., and most denary fractions (like 0.1 through 0.9, except 0.5) do not convert accurately. This problem was previously mentioned regarding storing currency values accurately.

  • Rounding errors: Because the stored floating-point values are often approximations, repeated calculations (as in mathematical modelling for forecasting) can accumulate these slight approximations, leading to significant rounding errors.
  • To prevent serious rounding errors, programmers can increase the precision by allocating more bits to the mantissa, known as working in ‘double precision’ or ‘quadruple precision’.
  • Overflow and Underflow: Floating-point representations can still suffer from overflow (if the result is too large for the exponent range) or underflow (if the result is too small to be stored, possibly being converted to zero).

Chapter 16: Data representation Short Points

Chapter 16, titled Data representation, covers user-defined data types, file organization methods, and the representation of real numbers using floating-point format. This chapter is related to the key concept of Data representation and structures.

1. Data Types (Section 16.01)

This section discusses different categories of data types:

  • Non-Composite User-Defined Data Types: These are defined without reference to another data type. An example is the enumerated data type, where the definition explicitly identifies all possible values. Another example is the pointer variable, where the value is the address in memory of a different variable.
  • Composite User-Defined Data Types: These include records, sets, and classes.
    • A Record is a collection of fields containing data values. It is a user-defined data type.
    • A Set is a collection of data items that lacks any structure, contains no duplicates, and has a number of defined operations.

2. File Organisation (Section 16.02)

The sources distinguish between a text file and a binary file. A binary file is designed for storing data to be used by a computer program.

File organisation methods include:

  • Serial file organisation: Suitable for batch processing or backing up data on magnetic tape.
  • Sequential file organisation: Suitable for applications where multiple records are required from one search. Searching may be done by reading records from the beginning until the target record is found.
  • Direct-access files (sometimes called random-access files): Allow access to any record in the file without sequential reading. This organisation is used if rapid access to an individual record is required, such as on a system used to check passwords when users log in.
    • Direct access can be achieved by using a separate index file which holds the key field value and the position of that key field value in the main file.
    • Alternatively, direct access can use hashing algorithms, which calculate the file address from the key field value. Since hashing can cause collisions, overflow addresses or linked lists may be needed.

3. Data Representation of Real Numbers (Section 16.03)

This section covers the internal representation of non-integer numeric values.

  • Fixed-Point Representation: Divides the bits into a whole number part and a fractional part. The sign bit can be included in the most significant bit.
  • Floating-Point Representation: Stores a value for the mantissa and a value for the exponent. This representation allows a wider range of values compared to fixed-point.
    • Precision and Normalisation: Increasing the number of bits for the mantissa gives better precision but reduces the range of values due to fewer bits for the exponent. Normalisation involves shifting the mantissa until the most significant bit differs from the sign bit (01 or 10). A normalised representation achieves the best precision.
    • Consequences: Binary floating-point representations are often only an approximation to the real number they represent. This can lead to rounding errors.

 

 


Chapter 17: Communication and Internet technologies

 Part 3: Advanced theory of the textbook.

The learning objectives for this chapter include showing understanding of:

  • Circuit switching and packet switching.
  • The essential nature of a protocol for computer communication.
  • The concept of a protocol implementation viewed as a layered stack.
  • The TCP/IP protocol suite.
  • Specific application layer protocols: HTTP, FTP, POP3, IMAP, SMTP, and BitTorrent.

17.01 Transmission modes

For communication over an internetwork, there are two main approaches: circuit switching and packet switching.

  1. Circuit Switching
    • This is the traditional method used in telephone systems (PSTNs).
    • A sequence of dedicated links (channels in shared transmission media) is established across the network (through local exchanges and intermediate nodes) to guarantee unimpeded transmission.
    • Data transfer occurs once the links are established.
    • The links are typically removed when the data transfer is complete (though a permanent circuit may be established for services like leased lines).
  2. Packet Switching
    • Data is packaged into portions called packets.
    • A packet includes a header containing instructions for delivery and a data body.
    • The transmission does not require a circuit to be established, as the links used are not defined when the sender transmits the packet.
    • Packet switching can provide two types of service:
      • Connectionless service: A packet is dispatched with no knowledge of whether the receiver succeeded.
      • Connection-oriented service: The initial packet requests an acknowledgement; if received, the sender transmits further packets; if not, the sender attempts to send the first packet again.

17.02 Protocols and 17.03 A protocol stack

A protocol is defined as a set of rules for data transmission that must be agreed upon by both the sender and receiver. Network complexity necessitates using a protocol suite, which comprises multiple individual protocols.

Protocol implementation is often viewed as a protocol stack consisting of layers:

  • Each layer interacts only with the layer directly above or below it (via a defined interface).
  • A layer is serviced by the actions of the lower layers.
  • User interaction occurs at the highest layer.
  • Software handles the functioning of most layers, except possibly the lowest layer, which accesses hardware directly.

17.04 The TCP/IP Protocol Suite

The TCP/IP protocol suite is the basis for Internet usage, occupying the top three layers of a five-layer network model: Application, Transport, and Network.

The TCP/IP suite includes protocols such as:

  • Application layer: HTTP, SMTP, DNS, FTP, POP3, IMAP.
  • Transport layer: TCP, UDP, SCTP.
  • Network layer: IP, IGMP, ICMP, ARP.

TCP (Transmission Control Protocol)

  • Operates in the transport layer and is responsible for ensuring the safe delivery of the message to the receiver.
  • It creates packets, each including a header and user data.
  • The header contains a port number to identify the application layer protocol (e.g., port 80 for HTTP).
  • TCP includes a sequence number to ensure the eventual correct reassembly of the user data.
  • TCP is connection-oriented and uses acknowledgements to identify and resend missing packets.

IP (Internet Protocol)

  • Operates in the network layer and ensures correct routing over the Internet.
  • It takes the packet from the transport layer and adds a further header containing the sender's and receiver's IP addresses.
  • The IP packet is often called a datagram and is sent to the data-link layer.
  • IP functions as a connectionless service, meaning it has no knowledge of whether a packet reached its destination once sent.

The Router

  • Routers are devices that act as nodes on the Internet.
  • When a frame arrives at a router, its datagram content is returned to IP.
  • The router's software chooses the best route for the transmission by accessing a routing table.
  • Unlike a switch (which operates at the data-link layer and transmits frames without making routing decisions), a router operates at the network layer.

17.05 The Ethernet Protocol Stack

Ethernet is a protocol suite typically used for Local Area Networks (LANs). It provides the necessary functionality for the two lower layers (Data link and Physical) that support the TCP/IP suite.

The Ethernet protocol suite comprises four sub-layers:

  1. Logical Link Control (LCC) protocol: Manages data transmissions and integrity but, as a connectionless protocol, does not check for successful delivery.
  2. Medium Access Control (MAC) protocol: Assembles the Ethernet packet (called a frame), initiates transmission, and manages collision recovery (e.g., using CSMA/CD).
  3. Physical Coding Sublayer (PCS) protocol: Codes data for transmission or decodes received data.
  4. Physical Medium Attachment (PMA) protocol: Handles signal transmitting and receiving.

MAC Addresses

  • The addresses used in the Ethernet frame are physical or MAC addresses.
  • A MAC address uniquely defines one Network Interface Card (NIC).
  • The 48-bit addresses are generally written in hexadecimal notation (e.g., 4A:30:12:24:1A:10).

17.06 Application-layer protocols

Application-layer protocols work at the highest level of the TCP/IP stack.

  • HTTP (HyperText Transfer Protocol): This is the most critical application-layer protocol, underpinning the World Wide Web (WWW). It is a transaction-oriented, client–server protocol. The client sends a 'request' message (often using the GET method), and the server sends back a 'response' message.
  • Email Protocols:
    • SMTP (Simple Mail Transfer Protocol): A 'push' protocol used traditionally for sending mail from client to server, and still used for transfer between mail servers.
    • POP3 (Post Office Protocol version 3): A 'pull' protocol where emails are downloaded and stored on the client machine, which can improve security if emails are locally stored.
    • IMAP (Internet Message Access Protocol): Emails are stored on the server but remain accessible from the client. This is advantageous for users who need access from multiple systems/locations.
  • FTP (File Transfer Protocol): An application-layer protocol designed to handle file transfers between two end-systems, particularly useful when systems have different operating systems and file systems.

17.07 Peer-to-peer (P2P) File Sharing

P2P is a network architecture where there is no central control, and peers (end-systems) function simultaneously as both clients and servers. When a peer acts as a server, it is called a seed.

The BitTorrent protocol is commonly used for P2P file sharing because it enables fast transfer of files:

  • Content providers supply a torrent (a description file).
  • The torrent file contains the name of a tracker (a server that leads peers to the content).
  • The tracker maintains a list of all active peers (swarm) downloading or uploading content.
  • Peers download and upload file chunks simultaneously.
  • The protocol addresses "free-riders" or "leechers" by ensuring that peers continue uploading only to those who provide regular downloads, with others eventually being isolated or "choked".

 

Chapter 17: Communication and Internet technologies Short Points

1. Transmission Modes (Section 17.01)

There are two main approaches for communication over an internetwork:

  • Circuit Switching: A dedicated circuit is established between the sender and receiver before transmission starts. This is the method used in the traditional telephone system (PSTNs).
  • Packet Switching: Data is sent as packets without a dedicated circuit. It can provide a connectionless service (no acknowledgement of receipt) or a connection-oriented service (requests acknowledgement for packets).

2. Protocols and Protocol Stacks (Sections 17.02–17.04)

  • Protocol: A set of rules for data transmission agreed upon by the sender and receiver. Protocols are often grouped into a protocol suite.
  • Protocol Stack: Protocols can be viewed as layered stacks, where each layer has defined functionality and interfaces only with the layer immediately above or below it.
  • TCP/IP Protocol Suite: This suite underpins Internet usage. It typically occupies the top three layers of a five-layer stack.
  • Ethernet Protocol Stack: Ethernet is a protocol suite primarily for LANs. The TCP/IP suite is supported by the lower two layers of the network stack, which typically use Ethernet protocols.

3. Addresses and Application-Layer Protocols (Sections 17.05, 17.06)

  • MAC Addresses: These are physical addresses that uniquely identify a Network Interface Card (NIC). They are typically 48 bits long and often written in hexadecimal notation.
  • Application-Layer Protocols: These are used for user interaction. Examples include:
    • HTTP (HyperText Transfer Protocol): Underpins the World Wide Web (WWW).
    • Email Protocols: Include SMTP (Simple Mail Transfer Protocol, for sending emails and transfer between mail servers), POP3 (Post Office Protocol version 3, for downloading emails from a server), and IMAP (Internet Message Access Protocol, for downloading emails while keeping them synchronized on the server).
    • FTP (File Transfer Protocol): Used for transferring files between a client and a server.

4. Peer-to-Peer (P2P) File Sharing (Section 17.07)

  • P2P Architecture: A structure without central control where peers act as both clients and servers.
  • BitTorrent Protocol: Used for P2P file sharing on the Internet.

Chapter 18: Hardware and virtual machines

 Part 3: Advanced theory

The learning objectives for Chapter 18 include understanding:

  • Reduced Instruction Set Computers (RISC) and Complex Instruction Set Computers (CISC) processors.
  • The importance and use of pipelining and registers in RISC processors.
  • The four basic computer architectures: SISD, SIMD, MISD, MIMD.
  • The characteristics of massively parallel computers.
  • The concept of a virtual machine.

18.01 The control unit

The control unit within the CPU is responsible for ensuring that each machine instruction is handled correctly during program execution. There are two primary methods for designing a control unit to perform this function:

  1. Hard-wired solution: The control unit is constructed as a logic circuit where machine-code instructions are handled directly by hardware.
  2. Microprogramming: The control unit contains a ROM component that stores microinstructions or microcode, often referred to as firmware.

18.02 CISC and RISC processors

A processor has an instruction set architecture, which dictates the instruction set, instruction format, addressing modes, and accessible registers. The choice of the instruction set distinguishes one architecture from another.

Complex Instruction Set Computer (CISC)

In early computing, the focus was on instruction sets that eased the writing of a compiler for a high-level language, leading to the development of the Complex Instruction Set Computer (CISC) architecture.

  • A CISC processor typically contains many specialised instructions.
  • These specialised instructions often require multiple memory accesses, which are significantly slower than register accesses.
  • The complexity of many CISC instructions generally necessitates the use of microprogramming for the control unit.
  • CISC typically has multi-cycle and variable-length instructions.

Reduced Instruction Set Computer (RISC)

The philosophy of using a Reduced Instruction Set Computer (RISC) emerged in the late 1970s. The term 'reduced' applies to more than just the number of instructions; the complexity of instructions is a key feature.

  • RISC processors feature fewer and simpler instructions.
  • Instructions are typically fixed-length and, whenever possible, single-cycle.
  • They use multiple register sets.
  • The simplicity of RISC instructions allows data manipulation directly within registers, with memory access limited primarily to load and store instructions.
  • The simple instructions make it easier to use hard-wiring for the control unit.

Pipelining

Pipelining is a form of instruction-level parallelism that was a major motivation for creating RISC processors.

  • The underlying principle is that the fetch–decode–execute cycle (described in Chapter 5) can be separated into multiple independent stages.
  • A common model is a five-stage pipeline (Instruction fetch, Instruction decode, Operand fetch, Instruction execute, Result write back).
  • For pipelining to work efficiently, the processor must have independent units for each stage, which explains why RISC needs many register sets—each unit requires access to its own set of registers.
  • Once running, the pipeline handles multiple instructions simultaneously, potentially completing the processing of one instruction at every clock cycle, making the execution time significantly faster than sequential processing.

Interrupt handling in a pipelined system is complex, as multiple instructions are within the pipeline when an interrupt occurs. Options for resolution include:

  • Erasing the contents of the pipeline for the latest instructions and applying the normal interrupt-handling routine to the remaining instruction.
  • Constructing the individual processor units with separate program counter registers to store current data for all instructions in the pipeline while the interrupt is processed.

18.03 The basic computer architectures

Computer architectures can be classified based on the number of instruction streams and data streams.

  1. Single Instruction Stream Single Data Stream (SISD):
    • This is the typical arrangement found in early computers and the earliest microprocessors.
    • It involves purely sequential functioning with no parallelism.
    • An SISD system consists of a single processor accessing one memory.
  2. Single Instruction Stream Multiple Data Stream (SIMD):
    • This applies parallelism to the data stream.
    • A single instruction is simultaneously executed on different data streams.
    • The architecture requires one control unit instructing multiple processing units.
    • Implementation examples include array or vector processors (which might use large registers to store multiple data values) or multi-core processors working in parallel.
  3. Multiple Instruction Stream Single Data Stream (MISD):
    • This architecture is generally not evidenced in any individual computer design.
    • A concept related to this architecture might be applied in a fault-tolerant system, where the same data stream is fed into multiple processors, and the output is accepted only if all processors produce the same result.
  4. Multiple Instruction Stream Multiple Data Stream (MIMD):
    • This involves multiple processors asynchronously processing parallel data input.
    • Each Processing Unit can execute a different instruction.
    • The multiple data streams can be provided by partitioned memory, where each Processing Unit might have a dedicated cache.

Massively Parallel Computer Systems

The MIMD architecture can be implemented in massively parallel computers, which are multicomputer systems often referred to as 'supercomputers'.

  • These systems use an extremely large number of individual processors working in parallel.
  • Instead of a bus structure, they use a network infrastructure to support multiple computer units.
  • Programs running on different computers communicate by passing messages over the network.
  • Cluster computing using PCs (or 'server farms') is another type of multicomputer system.

18.04 Virtual machines

A virtual machine is software, not hardware. The most common type is the system virtual machine, which is software that emulates the hardware of a real computer system.

  • In a normal system, an application program interacts with the hardware via an operating system (OS).
  • In a system virtual machine implementation, the application interacts with a software interface provided by an OS.

Logical Structure of a System Virtual Machine:

  • Host Hardware runs a Host OS.
  • Virtual Machine Implementation Software (a utility program) runs, supported by the Host OS.
  • This software emulates multiple virtual machines (e.g., VM1, VM2).
  • Each virtual machine runs a Guest OS, which supports the running of application programs.
  • The Guest OS interacts with the virtual machine as though it were the actual hardware.

Advantages and Drawbacks:

  • Main Advantage: More than one different operating system can be made available on one computer system. This is beneficial for organizations using legacy software but wishing to retire old hardware.
  • It also allows companies offering server consolidation facilities to provide different companies with their own virtual machine running as a server on a large mainframe computer.
  • Drawbacks: Implementation requires time and effort. Performance will not be the same as that obtained on a normal system.

The Java virtual machine (discussed in Chapter 8, Section 8.05) is an example of a process virtual machine. A process virtual machine is specialized software that provides a platform-independent programming environment to support a program running the same way on any platform; it only supports a specific application (like running a Java program), whereas a system virtual machine supports any application.

 

Chapter 18: Hardware and virtual machines short notes

Chapter 18, titled Hardware and virtual machines, explores processor architecture, instruction sets, parallelism, and the concept of virtual machines.

1. Processor Design (Sections 18.01, 18.02)

The design of a control unit can be hard-wired (constructed as a logic circuit) or use microprogramming (microinstructions stored in ROM/firmware).

  • Instruction Set Architecture: Distinguishes processors based on the instruction set, format, addressing modes, and accessible registers.
  • CISC vs. RISC:
    • Complex Instruction Set Computer (CISC): Has more instructions, complex, multi-cycle, variable-length instructions, many addressing modes, and often uses microprogrammed control units.
    • Reduced Instruction Set Computer (RISC): Has fewer, simpler, single-cycle, fixed-length instructions, fewer addressing modes, multiple register sets, and often uses hard-wired control units. RISC architectures facilitate pipelining.
  • Pipelining: A form of instruction-level parallelism where the fetch–decode–execute cycle is broken into stages (e.g., five stages: fetch, decode, execute, access operand, write result).

2. Basic Computer Architectures (Section 18.03)

Architectures are classified by the number of instruction streams (I) and data streams (D):

  • SISD (Single Instruction Stream Single Data Stream): The typical sequential architecture of early computers, with no parallelism.
  • SIMD (Single Instruction Stream Multiple Data Stream): Parallelism applied to the data stream where one instruction is executed simultaneously on multiple data streams using multiple Processing Units (PUs).
  • MISD (Multiple Instruction Stream Single Data Stream): Not evidenced in individual computer architecture design, but could be implemented in fault-tolerant systems where the same data stream is fed to multiple processors.
  • MIMD (Multiple Instruction Stream Multiple Data Stream): Multiple processors asynchronously process parallel data input, each executing a different instruction. This architecture is used in massively parallel computer systems (supercomputers).

3. Virtual Machines (Section 18.04)

  • Virtual Machine (VM): Software that emulates the hardware of a real computer system. The most common type is the system virtual machine.
  • Functionality: A VM allows multiple different operating systems (Guest OS) to run simultaneously on the same hardware controlled by a Host OS.
  • Advantages: Useful for running legacy systems or for server consolidation. The Java Virtual Machine (JVM) is a specific example of a process virtual machine.

Chapter 19: Logic circuits and Boolean algebra

 Part 3: Advanced theory

The learning objectives for Chapter 19 include showing understanding of:

  • Producing truth tables for logic circuits, specifically including half adders and full adders.
  • The concept of a flip-flop (SR, JK).
  • Boolean algebra.
  • Karnaugh Maps.

19.01 Logic Circuits

Chapter 4 introduced the symbols for logic gates, their associated truth tables, and their relationship with logic expressions. Chapter 19 focuses on specific circuits used to construct computer hardware components.

The Half Adder

The half adder is the simplest circuit used for binary addition. Binary addition is a fundamental operation in computing.

  • It takes two input bits and outputs a sum bit (S) and a carry bit (C).
  • When adding 1 and 1, the result is 0 with a carry bit of 1.
  • The truth table for the half adder (Table 19.01 in the source) shows:
    • C (Carry Out) is 1 only when both inputs (A and B) are 1. This means the C output matches the operation of an AND gate.
    • S (Sum) matches the operation of an XOR operator.
  • A half adder can be implemented using an AND gate and an XOR gate, both receiving input from A and B.
  • Manufacturers often prefer using NAND or NOR gates (which are considered universal gates because any logic circuit can be built from them) due to ease of manufacture. A half adder circuit can be constructed entirely from NAND gates (Figure 19.02 in the source).

A half adder circuit performs binary addition of two individual bits.

The Full Adder

If two multi-bit binary numbers are being added, the carry bit from the previous addition must be included in the current addition.

  • The full adder is a circuit that has three inputs, including the previous carry bit (C_in).
  • One possible implementation of a full adder contains two half adder circuits and an OR gate (Figure 19.03 in the source).
  • It is also possible to construct the full adder circuit entirely from NAND gates (Figure 19.04 in the source).

A full adder circuit performs binary addition of two individual bits and an input carry bit.

19.02 Sequential Logic Circuits

All circuits discussed in earlier chapters were combinational circuits, meaning the output depends solely on the input values. In contrast, a sequential circuit is one where the output depends on both the input values and the previous output.

The SR Flip-Flop

The SR flip-flop (or 'latch') is a simple example of a sequential circuit.

  • It can be constructed using two NAND gates or two NOR gates.
  • The circuit uses a cross-coupled structure with output feedback (Figure 19.05 in the source, showing a NOR gate version).
  • The flip-flop is a two-state device; either it is in the set state (Q=1 and Q'=0) or the unset state (Q=0 and Q'=1).
  • The S and R inputs stand for 'set' and 'reset'.
    • S=0 and R=1 converts a set state to an unset state.
    • S=1 and R=0 converts an unset state to a set state.
  • The SR flip-flop can be used as a storage device for 1 bit, making it a suitable component for RAM.
  • A key constraint is that the input combination R=1 and S=1 leads to an invalid state where Q and Q' are both 0, meaning the circuit must be protected from simultaneous R and S inputs.

The JK Flip-Flop

The JK flip-flop addresses issues of invalid states and the potential for uncertainty if inputs are not synchronised perfectly.

  • It often includes a clock pulse input for better synchronisation.
  • The J input acts as the set input, and the K input acts as the clear input, similar to the SR flip-flop.
  • An important difference is that if both J and K are input as 1, the values for Q and Q' are toggled (switched).
  • This feature makes the JK flip-flop a ** more reliable device** because there is no combination of input states that results in uncertainty regarding the stored values.
  • The truth table shows:
    • J=0, K=0: Q is unchanged.
    • J=1, K=0: Q becomes 1.
    • J=0, K=1: Q becomes 0.
    • J=1, K=1: Q toggles.

19.03 Boolean Algebra Basics

Boolean algebra provides a simplified notation for writing a logic expression and a set of rules (laws or identities) for manipulating that expression.

  • In Boolean algebra, 1 represents TRUE and 0 represents FALSE.
  • The addition symbol (+) represents the OR operation; thus, 1 + 1 = 1 (TRUE OR TRUE is TRUE).
  • The notation used includes:
    • A.B or AB for A AND B (using the dot notation in this text).
    • barA for NOT A (the inverse of A).

Boolean Algebra Identities (Laws)

Boolean algebra identities (laws) list the formal rules for manipulation. They include:

Identity/Law

AND form

OR form

Identity

1.A = A

0 + A = A

Null

0.A = 0

1 + A = 1

Idempotent

A.A = A

A + A = A

Inverse

A.barA = 0

A +barA = 1

Commutative

A.B = B.A

A + B = B + A

Associative

(A.B).C = A.(B.C)

(A + B) + C = A + (B + C)

Distributive

A + B.C = (A + B).(A + C)

A.(B + C) = A.B + A.C

Absorption

A.(A + B) = A

A + A.B = A

De Morgan’s

overlineA.B =barA +barB

overlineA + B =barA.barB

Double Complement

barbarA = A


Note: All but one of the identities have an AND form and an OR form, which can be interchanged by swapping 0 and 1, and AND and OR.

19.04 Boolean Algebra Applications

Creating an Expression from a Truth Table (Sum of Products)

One formal way to create a Boolean algebra expression is by applying the sum of products method, which creates a minterm for each row of the truth table that results in a 1 output.

  • This approach was used to define the half adder:
    • C = A.B (since only one row yields C=1, when A=1 and B=1).
    • S = A.barB +barA.B (since two rows yield S=1; 0 is represented by the inverse of the input symbol, e.g.,barA).

Simplification

Boolean algebra laws can be used to simplify expressions. For example, A + A.B can be simplified to A + B.

19.05 Karnaugh Maps (K-maps)

A Karnaugh map (K-map) is a graphical representation of a truth table that makes deriving a Boolean algebra expression easier than using the sum-of-products method alone. If applied correctly, a K-map produces the simplest possible form for the Boolean algebra expression.

K-map Interpretation Rules:

  1. Only cells containing a 1 are considered.
  2. Groups of cells containing 1s are identified (rows, columns, or rectangles).
  3. Groups must contain 2, 4, 8, and so on cells.
  4. Each group should be as large as possible.
  5. Groups can overlap, and all overlapping groups must be used.
  6. If a cell cannot be contained in any group, it is treated as a group itself.
  7. Within each group, only the input values that retain a constant value throughout the group are retained.
  8. The final Boolean algebra expression is the sum of the retained values (e.g., X = A + B).

When constructing a K-map for three or more inputs, the column labelling must follow the Gray coding sequence, where only one bit changes value each time (Figure 19.09 in the source).

 

Chapter 19: Logic circuits and Boolean algebra short notes

1. Logic Circuits (Section 19.01)

  • Half Adder Circuit: Performs the binary addition of two individual bits (A and B), producing a Sum bit (S) and a Carry bit (C). The logic for S matches the XOR operator, and the logic for C matches the AND operator.
  • Full Adder Circuit: Performs binary addition of two individual bits (A and B) and an input Carry bit (C_in), producing a Sum bit (S) and an output Carry bit (C_out).

2. Sequential Logic Circuits (Section 19.02)

  • Combinational Circuit: The output is dependent only on the input values.
  • Sequential Circuit: The output depends on the input values and the previous output.
  • SR Flip-flop: A sequential circuit (often constructed using NOR or NAND gates) that can be used as a storage device for one bit. It has inputs for Set (S) and Reset (R). The state S=1, R=1 must be avoided as it leads to an invalid state.
  • JK Flip-flop: Similar to the SR flip-flop but more reliable because the condition where both inputs are 1 (J=1 and K=1) causes the state to toggle (switch value), removing the uncertain state.

3. Boolean Algebra (Sections 19.03, 19.04)

Boolean algebra provides a simplified way of writing a logic expression and a set of rules for manipulation.

  • Notation: TRUE is represented by 1, FALSE by 0. OR is represented by +, and AND may be represented by a dot (cdot) or concatenation. NOT A is represented bybarA.
  • Identities/Laws: A set of rules (e.g., Identity, Null, Idempotent, Inverse, Commutative, Associative, Distributive, Absorption, De Morgan's) are used for algebraic simplification.
  • Sum-of-Products Method: Establishes a "minterm" (a product term) for each row of the truth table that results in a 1 output. These minterms are then combined using OR operators (sum) to create the Boolean expression.

4. Karnaugh Maps (K-maps) (Section 19.05)

A Karnaugh map is a representation of a truth table that allows a simplified logic expression to be derived without extensive algebra.

  • Rules for Grouping: Only cells containing a 1 are considered. Groups (rows, columns, or rectangles) must contain 2, 4, 8, etc., cells. Groups should be as large as possible and can overlap. The simplest expression is derived by retaining only the input variables that remain constant within the identified groups.
  • Gray Coding: K-maps for three or more inputs use Gray coding (where only one bit changes between adjacent columns/rows) to ensure correct wrapping and simplification.

Chapter 20: System software

Part 3: Advanced theory

The learning objectives for Chapter 20 include understanding:

  • How an operating system (OS) can maximise the use of resources.
  • The ways in which the user interface hides the complexities of the hardware from the user.
  • Process management.
  • Virtual memory, paging, and segmentation for memory management.
  • How an interpreter can execute programs without producing a translated version.
  • The various stages in the compilation of a program.
  • How the grammar of a language can be expressed using syntax diagrams or Backus-Naur Form (BNF) notation.
  • How Reverse Polish Notation (RPN) can be used to carry out the evaluation of expressions.

20.01 The Purposes of an Operating System (OS)

An OS provides an environment where programs can run for the benefit of a user. Chapter 8 summarised the activities of an OS, and Chapter 20 discusses them in greater detail.

OS Initialization and Multiprogramming

  1. Bootstrapping: When a computer is switched on, the OS programs are on disk, meaning no OS is running initially. The computer uses a basic input/output system (BIOS) stored in ROM to start a bootstrap program, which then loads the OS into memory and starts it running.
  2. Resource Maximisation: An OS maximises resource use, particularly the CPU, memory, and I/O system. It can provide facilities for more than one program to be stored in memory.
    • Multi-programming occurs when multiple programs are ready in memory, though only one accesses the CPU at a time.
    • A time-sharing system is designed for many users simultaneously logged in.

Operating System Structure The OS structure is designed to support both resource management and user facilities. It operates in two modes:

  1. User mode: Available for the user or an application program.
  2. Kernel mode (or alternative names): Has sole access to part of the memory and certain system functions that user mode cannot access. The kernel runs all the time.

The OS can be structured in two ways:

  • Layered structure: Each higher layer is serviced by a lower layer (similar to a network protocol stack). Application programs or utility programs make system calls to the kernel. A layered structure is difficult to achieve in practice.
  • Modular structure: This is a more flexible approach where the kernel calls individual services (like memory management or scheduling management) when required. A micro-kernel structure reduces the kernel's functionality to the absolute minimum.

20.02 The Input/Output (I/O) System

The I/O system manages input and output, including activity involving the user (like a keyboard) and activity involving storage devices (like a disk) while a program is running.

I/O Issues and Speed:

  • A schematic diagram shows the I/O system involves the CPU, Memory, and devices connected via buses, often using device drivers (e.g., screen device driver, disk device driver).
  • Data transfer between an I/O device and memory can optionally bypass the CPU via a bus structure, particularly for large quantities of data.
  • I/O speeds are very slow compared to the CPU clock cycle (which operates at GHz frequencies). The management of CPU usage is critical to ensure the CPU does not remain idle while I/O takes place.

20.03 Process Scheduling

Resource management primarily involves scheduling to ensure efficient CPU usage.

The Role of Schedulers Programs start on disk. A user submits a program as a 'job'.

  • High-level scheduler (or long-term scheduler): Controls the selection of a program stored on disk to be moved into main memory.
  • Medium-term scheduler: Occasionally moves a program back to disk if memory becomes overcrowded.
  • Low-level scheduler (or short-term scheduler): Controls when a process stored in memory is given access to the CPU.

Process States A process is a program in memory that has an associated process control block (PCB). The PCB is a complex data structure storing all data relevant to the running of a process. Once in memory, a process can transition between five states:

  • New: A new process arrives in memory and a PCB is created; it transitions to Ready.
  • Ready: The process is ready to use the CPU.
  • Running: The process has access to the CPU via the dispatcher.
  • Waiting (or Suspended/Blocked): The process cannot progress until an event (e.g., I/O completion) occurs.
  • Terminated: The process completes execution.

If a process is separated into parts called threads, each thread is handled as an individual process.

Interrupts: Interrupts may be caused by errors that terminate a process prematurely. Otherwise, interrupts occur for two main reasons:

  1. I/O operations: When a running process makes a system call for an I/O operation and must change to the Waiting state.
  2. Scheduling decisions: When the scheduler halts a process.

When an interrupt occurs, the OS kernel must invoke an interrupt-handling routine. This routine records the current register values in the PCB, allowing the process to resume later.

Scheduling Algorithms A scheduling algorithm can be pre-emptive (can halt a process) or non-pre-emptive (a process runs undisturbed unless it voluntarily yields the CPU).

  • First Come First Served (FCFS): A non-pre-emptive algorithm implemented using a First-in First-out (FIFO) queue. It is inefficient if used exclusively but can be part of a more complex algorithm.
  • Prioritisation: Pre-emptive algorithms often involve assigning priorities to processes.

20.04 Memory Management

Memory management encompasses several aspects:

  • Provision of protected memory space for the OS kernel.
  • Defining memory addresses for a program, its data, and associated procedures when loading a program.
  • Decisions on how much memory space is allocated to individual processes.
  • Managing memory fragmentation that occurs in multiprogramming systems, sometimes requiring the medium-term scheduler to move a process out of memory.

Partitions and Segments

  • Partitions: An early approach where memory was divided into fixed-size partitions, ideally loading a whole process into one partition.
  • Dynamic partitioning: An improvement where partition size adjusted to match process size (one process per partition).
  • Segmentation: Large processes were divided into segments (of unequal size), which were loaded into dynamic partitions. This was inefficient due to fragmentation of both memory and disk storage and because segments had to be moved frequently between disk and memory.

Paging and Virtual Memory The modern approach uses paging.

  • The process is divided into equal-sized pages, and memory is divided into frames of the same size. Secondary storage can also be divided into frames.
  • Virtual memory is a special case of paging where a program uses more memory addresses than are available in main memory.
  • Paging requires the CPU to transfer address values to a memory management unit. An address consists of a page number and an offset.
  • A page table is used by the memory management unit for address translation. It lists the page number (as an index), a presence flag (to indicate if the page is in memory), and often the page frame number.
  • A page fault occurs when a running process requires access to a page that is not in memory. A page replacement algorithm is used to decide which page to remove from memory before loading the required page from secondary storage.
  • A potential problem is disk thrashing, where the system spends almost all its time loading and unloading pages repeatedly.

20.05 Operating System Facilities Provided for the User

The OS provides various facilities for the user:

  • User Interface: Made available as a command line, graphical display, or voice recognition system, allowing interaction with running programs.
  • Device Driver: Provided by the OS for every device used by a program.
  • File System: Provided for users to store data and programs.
  • Programming Environment: Supported by the OS, allowing programs to be created and run without the programmer needing detailed knowledge of processor functions.
  • System Calls: A set of calls that provide an interface to OS services, such as reading data from a file.
  • Application Programming Interface (API): An extension of system calls where each API call performs a specific function (e.g., creating a screen icon). APIs aim to provide portability for a program across different operating systems.

20.06 Translation Software (Compilers and Interpreters)

This section details the workings of compilers and interpreters, which are used to translate program code.

The Compiler Model A compiler is often described as having a front end (performing analysis and producing intermediate code) and a back end (performing synthesis and producing object code). An interpreter uses the same analysis front end but executes the intermediate code line-by-line immediately after it is created.

Front-End Analysis Stages The four stages of front-end analysis are:

  1. Lexical analysis: Identifies lexemes (meaningful individual characters or character collections, such as identifiers, keywords, or operators). It typically removes comments and whitespace first. It outputs a tokenised version of the source code.
    • A symbol table (or identifier table) is maintained for recognized identifiers, recording attributes like data type and where it is declared.
  2. Syntax analysis (or parsing): Analysis program constructs and records the results as a syntax or parse tree. This stage is critical for ensuring the hierarchical structure and operator precedence in expressions.
  3. Semantic analysis: Checks the meaning (semantics) of the program, such as whether data types are correctly used (e.g., ensuring a real number is not assigned to an integer variable). If successful, it produces an abstract syntax tree.
  4. Intermediate code generation: Converts the abstract syntax tree into an intermediate code. This code often uses a simplified format (e.g., statements requiring at most three addresses).

Representation of the Grammar of a Language The defined grammar of a programming language must be understood by the compiler writer.

  • Syntax diagram: One method for presenting grammar rules graphically.
  • Backus-Naur Form (BNF): Another method for representing grammar rules.

Back-End Synthesis Stages The main stage is machine code generation from the intermediate code. This involves optimisation to create an efficient program:

  • Source-code inherent optimisation: Focusing on features propagated from the source code (e.g., avoiding recalculation of sub-expressions).
  • Machine-code initiated optimisation: Focusing on efficient use of registers or memory after machine code has been generated.

Evaluation of Expressions Assignment statements often require algebraic expressions to be evaluated.

  • Reverse Polish Notation (RPN): The infix representation in the code can be converted to RPN (a postfix representation). RPN never requires brackets and has no rules of precedence.
  • Conversion: RPN conversion manually requires considering operator precedence. A syntax tree can be used to convert an expression to RPN by a post-order traversal. The shunting-yard algorithm uses a stack to convert an infix expression to RPN.
  • Evaluation: An RPN expression can be evaluated using a stack. Values are pushed onto the stack. When an operator is encountered, the top two values are popped, the operation is performed, and the new result is pushed back onto the stack. Modern processors often have instructions that handle stack operations, allowing compilers to utilize RPN for efficient program execution.

 

Chapter 20: System software short notes

1. Operating System Purposes and Structure (Sections 20.01, 20.02)

  • Start-up: When a system is switched on, the BIOS (stored in ROM) starts a bootstrap program, which loads the OS into memory.
  • Structure: The OS often uses a modular structure, where the kernel calls on individual services (memory management, file management, etc.).
  • I/O System: Manages all input/output, including storage devices. The OS ensures that I/O, which is slow compared to the CPU clock cycle, does not leave the CPU idle.

2. Process and Memory Management (Sections 20.03, 20.04)

  • Process Scheduling: Ensures efficient CPU usage.
    • A High-level scheduler moves programs from disk to memory.
    • A Low-level scheduler decides which process in memory accesses the CPU.
    • Process States: Processes exist in five states: New, Ready, Running, Waiting, and Terminated.
  • Memory Management: In a multiprogramming system, memory fragmentation can occur.
    • Paging: Divides the process into equal-sized pages and memory into equal-sized frames.
    • Virtual Memory: Uses paging to allow a program to use more memory addresses than are physically available in main memory. A problem known as disk thrashing can occur if pages are constantly being loaded.
    • Segmentation: Divides a process into logical segments that are not constrained to be the same size.

3. Translation Software (Section 20.06)

Translation software (compilers and interpreters) uses an analysis–synthesis model.

  • Compiler Operation:
    • Front End (Analysis): Reads source code and generates intermediate code. Stages include lexical analysis (creating tokens), syntax analysis (parsing and creating a parse tree), semantic analysis, and intermediate code generation.
    • Back End (Synthesis): Takes intermediate code and generates object code (machine code). This stage performs optimisation.
  • Grammar Representation: The grammar of a programming language can be expressed using Backus-Naur Form (BNF) or syntax diagrams.
  • Evaluation of Expressions: Expressions are often evaluated by converting the infix representation to Reverse Polish Notation (RPN). RPN (a postfix representation) eliminates the need for brackets and rules of precedence. A stack is used to evaluate RPN expressions.

Chapter 21, titled Security

Part 3: Advanced theory

The learning objectives for Chapter 21 include showing understanding of:

  • How encryption works.
  • Digital certification.
  • The Secure Socket Layer (SSL) / Transport Layer Security (TLS) protocols.

21.01 Encryption Fundamentals

Encryption is a key focus for securing data, particularly when transmitting data over a network, although it can also be used as a routine procedure when storing data within a computer system.

The encryption process involves three main components:

  1. Plaintext: The original data, whatever form it takes.
  2. Encryption Algorithm: A process that uses a key to encrypt the plaintext. The encryption algorithm must not be a secret; it must be in the public domain.
  3. Key: Used by the encryption algorithm. The encryption key must be secret.
  4. Ciphertext: The product of the encryption, which is transmitted to the recipient.

The recipient reverses this process through decryption, using a decryption algorithm and a key to produce the original plaintext.

Security Concerns in Transmission

There are several security concerns related to data transmission that encryption helps address:

  • Confidentiality: Ensuring that only the intended recipient can decrypt the ciphertext. This concern arises because a message could be intercepted and read by an unauthorised person during transmission.
  • Authenticity: The receiver must be certain who sent the ciphertext.
  • Integrity: The ciphertext must not be modified during transmission. This reflects the potential for deliberate interference or accidental data corruption.
  • Non-repudiation: Neither the sender nor the receiver should be able to deny involvement in the transmission.
  • Availability: Nothing should prevent the receiver from receiving the transmission.

Chapter 21 focuses primarily on confidentiality, authenticity, and integrity.

Encryption Methods (Symmetric vs. Asymmetric)

There are two main approaches to encryption:

  1. Symmetric Key Encryption:
    • Uses just one key.
    • This single key is a secret shared by the sender and the receiver.
    • The sender uses the key for encryption; the receiver uses the same key for decryption.
    • The main issue is the secure delivery of the secret key to the receiver.
  2. Asymmetric Key Encryption (Public Key Encryption):
    • Uses two different keys, one for encryption and the other for decryption.
    • Only one of these keys is a secret (the private key).
    • The process is initiated by an individual who possesses both keys (a public key and a private key).
    • The public key is sent to anyone who will partake in an encrypted communication.
    • The secret private key is never sent to anyone.
    • If the key holder wishes to receive a transmission, the sender uses the receiver's public key to encrypt the plaintext. Only the receiver can decrypt the message using their private key because the keys are a matched pair.
    • To ensure confidentiality, the encryption algorithm must be complex, and the key length (number of bits) must be large.

If two people require two-way communication, both must have a private key and must exchange their matching public keys.

A single method of encryption used by an unauthorised person is called a brute-force attack, where all possible key values are tried.

21.02 Digital Signatures and Digital Certificates

Asymmetric encryption can also be used in reverse: an individual can encrypt a message with their private key, and a recipient with the corresponding public key can decrypt it. This approach is not used for confidential content but is important for verifying the sender's identity because only the sender has the private key that works with that specific public key. If decryption is successful, the message has been received with a digital signature identifying the sender.

Using Hash Functions for Digital Signatures

A more efficient method for digital signatures involves using a cryptographic one-way hash function (which is public) to create a digest (a unique number) for the message.

  • The sender uses their private key to encrypt the digest; this encrypted digest is the digital signature.
  • The message (plaintext) and the digital signature (encrypted digest) can be transmitted separately.
  • Because the digest is much smaller than the whole message, encryption and transmission are faster than encrypting the entire message.

At the receiver's end:

  • The receiver uses the same public one-way hash function on the received message to create a new digest.
  • The encrypted digital signature is decrypted using the sender's public key.
  • If the two digests (the one created by the receiver and the one decrypted from the signature) are identical, the receiver can be confident that the message is authentic and unaltered.

The digital signature is different every time because the hash function is uniquely defined by the message content.

Digital Certificates

Authentication is strengthened by using a Certification Authority (CA) as part of a Public Key Infrastructure (PKI). This addresses the issue of someone creating a public key and pretending to be someone else.

The process for a receiver (Person A) to obtain a digital certificate involves:

  1. Person A, having a public–private key pair, contacts a local CA.
  2. The CA confirms the identity of Person A.
  3. Person A's public key is given to the CA.
  4. The CA creates a public-key certificate (a digital certificate) and writes Person A's public key into this document.
  5. The CA adds a digital signature to the document using encryption with the CA's private key.
  6. The digital certificate is given to Person A.
  7. Person A posts the digital certificate on a website or a website specifically designed for digital certificate data.

Anyone who wants to extract the public key from this certificate must use the CA's public key.

21.03 Symmetric Key Encryption Methods

The history of symmetric key encryption includes the Data Encryption Standard (DES), which was replaced by Triple DES due to weaknesses, and subsequently by the Advanced Encryption Standard (AES) in 2001.

  • S-DES (Simplified DES): A simplified, educational example of a block cipher that encrypts blocks of bits (8-bit blocks in S-DES). It uses a 10-bit key to generate two 8-bit keys using permutations and circular left shifts. The encryption process involves a five-step process using these two 8-bit keys.
  • AES: Defines a block length of 128 bits. Users can choose key lengths of 128, 192, or 256 bits.

The security concern for symmetric key encryption is the safe transfer of the key to both the sender and the receiver.

21.04 Public Key Encryption Methods

RSA (Rivest-Shamir-Adleman) is the standard method for public key encryption.

  • Key Generation: Involves choosing two very large prime numbers (p and q), calculating their product (n), and deriving a pair of keys, (n, e) as the public key and (n, d) as the private key.
  • Security: The security relies on the fact that finding the factors (p and q) of a very large number (n) is computationally infeasible within a reasonable timescale.
  • Functionality: RSA works on numbers, so text must first be converted into a numerical coding scheme.
    • Encryption of a number x to y follows the relationship: y = x^epmod n.
  • Usage: Public key encryption is inherently more secure but slower than symmetric key encryption. Therefore, public key encryption is often used solely to securely deliver a key that can then be used for the faster symmetric key encryption.

21.05 SSL and TLS

The Secure Socket Layer (SSL) protocol was created to provide assurance to users accessing a website, addressing concerns about whether the website is genuine and whether sensitive data can be safely transferred.

  • Functionality: SSL acts as an additional layer between TCP (in the transport layer) and the application layer. When SSL is implemented, HTTP becomes HTTPS.
  • Structure: SSL is a protocol suite, containing a Record Protocol (for data transmission format) and a Handshake Protocol (for security).
  • Handshake Process: The client browser invokes the Handshake Protocol after a TCP connection is established.
    • The Handshake Protocol requests the server's SSL certificate (a digital certificate confirming its identity) and its public key.
    • The browser uses this public key to encrypt a one-off session key for symmetric key encryption.
    • The session key is then used for the symmetric encryption of the data transfer during that specific session.

SSL was standardized and subsequently upgraded to Transport Layer Security (TLS), which is the recommended version due to security concerns with older SSL versions, though SSL is still widely used.

21.06 Quantum Cryptography

Quantum Key Distribution (QKD) systems offer an alternative method for key transfer that relies on the laws of quantum mechanics.

  • QKD uses photons (light particles) to represent bits based on their polarisation.
  • This method allows a sender and receiver to create a 'shared secret' code (key).
  • The key transfer does not involve defined values, only photons.
  • The advantage of QKD is that anyone attempting to intercept the photons to determine their polarisation will, by quantum mechanics, destroy the photons.
  • The main drawback is that QKD requires a dedicated, special-purpose 'quantum channel,' which is very costly.

_______________________________________________________________________________________

Chapter 21: Security short notes

1. Encryption Fundamentals (Section 21.01)

Encryption converts original data (plaintext) into encoded data (ciphertext) using an encryption algorithm and a key. Decryption reverses this process.

Security concerns during transmission include:

  • Confidentiality: Only the intended recipient can decrypt the message.
  • Authenticity: The receiver must be certain of the sender's identity.
  • Integrity: The ciphertext must not be modified during transmission.
  • Non-repudiation: Neither party can deny involvement.

2. Symmetric and Asymmetric Key Encryption (Section 21.02, 21.03, 21.04)

  • Symmetric Key Encryption: Uses one private key (or secret key) known by both the sender and receiver for both encryption and decryption. Examples include DES and AES.
  • Asymmetric Key Encryption (Public Key Encryption): Uses a pair of keys—a public key and a mathematically related private key—where one key encrypts and the other decrypts. RSA is the usual method.

3. Digital Signatures and Certificates (Section 21.02)

  • Digital Signature: Used to ensure authenticity (that the message was sent by the claimed sender) and integrity (that the content has not been changed). It is created by applying the sender's private key to a message digest (hash) of the plaintext.
  • Digital Certificate: Used for strict authentication. It is provided by a Certification Authority (CA) as part of a Public Key Infrastructure (PKI). A digital certificate contains data items such as the sender's public key and is signed by the CA's private key.

4. SSL/TLS (Section 21.05)

  • Secure Socket Layer (SSL), and its successor, Transport Layer Security (TLS), is a protocol suite that provides security assurance when accessing a website.
  • Function: SSL functions as an additional layer between TCP in the transport layer and the application layer. When SSL is in place, HTTP becomes HTTPS. A Handshake Protocol is responsible for security establishment.

Chapter 22: Artificial Intelligence (AI)

Part 3: Advanced theory

The learning objectives for Chapter 22 include showing understanding of:

  • How graphs can be used to aid Artificial Intelligence (AI).
  • How artificial neural networks have helped with machine learning.
  • Deep Learning, Machine Learning, and Reinforcement Learning and the reasons for using these methods.
  • Back propagation of errors and regression methods in machine learning.

22.01 An Overview

It is challenging to find a wholly satisfactory definition for Artificial Intelligence due to the difficulty in defining "intelligence" itself. A calculator, while performing complex mental arithmetic (like 43times 13), is not described as having artificial intelligence, rendering a definition like "AI involves the automation of intelligent behaviour" unsatisfactory.

A common, slightly vague definition of AI is that it is concerned with "how to make computers do things at which, at the moment, people are better". AI is agreed to be a component of computer science.

22.02 How Graphs Can Be Used in AI

A graph is a structure composed of nodes (or vertices) and edges connecting them. Edges often have associated labels, which are numerical values. Graphs can represent various scenarios, such as nodes representing places and edge labels representing the distances between them.

AI algorithms use graphs to solve problems, such as finding the shortest route between two non-adjacent nodes.

Dijkstra's Algorithm

Dijkstra's algorithm is an AI algorithm designed to find the shortest path to every other node starting from a specified node.

The simplified Structured English design of the algorithm involves:

  1. Maintaining a set of nodes for which the shortest path has been found (ShortestPath set) and a set of remaining nodes (RemainingNodes).
  2. Storing the current shortest distance found so far for all nodes in the RemainingNodes set, along with the sequence of nodes used to obtain this distance.
  3. Selecting the node (N) in the RemainingNodes set that has the currently stored shortest distance value.
  4. Moving node N from the RemainingNodes set to the ShortestPath set.
  5. Updating the stored distance values for all adjacent nodes to N that are still in the RemainingNodes set. If a new, shorter value is found, the sequence of nodes used to reach it is also stored.
  6. The algorithm continues until the ShortestPath set contains all nodes.

The A* Algorithm

The A algorithm* is a version of Dijkstra's algorithm that improves efficiency, particularly in situations where a large amount of time is spent exploring nodes that are not on the shortest path to the target node.

The A* algorithm uses a heuristic function (which is lower than the actual value) to estimate the distance still to be travelled. This calculation helps prioritise which nodes to explore next.

The algorithm uses three lists: the initial list, the open list, and the closed list. The open list stores nodes that might be involved in the final shortest path and is key for efficiency because it is stored in order of the estimated total distance.

22.03 Machine Learning

Machine learning is an approach within AI where a system improves its performance as it gains experience in performing a defined task. This ability to learn from experience is viewed as an indication of intelligence.

The requirements for machine learning are summarized as:

  • A computer-based system performing one or more defined tasks.
  • Knowledge is acquired through the system's experience performing those tasks.
  • The system's performance on future tasks improves due to the acquired knowledge and experience.

There are several ways to describe how machine learning takes place:

  1. Unsupervised Learning: The system must draw its own conclusions from the results of its tasks by analysing and categorizing the available data. Algorithms identify "conceptual clusters" often arranged in a hierarchical tree structure. This approach is dominant today, often exploiting massive data banks collected from online activity (like tracking actions on the World Wide Web) to recommend products or services. There is usually no underlying theoretical framework; the intelligence is seemingly built into the data.
  2. Supervised Learning: The system is provided with sample data alongside associated classification or outcome data. An example is feeding answers to exam questions with associated grades to an AI program being developed for marking. A special case is the expert system, which focuses on a narrowly defined domain of knowledge and is fed data and conclusions by human experts. An expert system is generally not considered an example of machine learning, as it is not expected to improve its performance unaided.
  3. Reinforcement Learning: This method combines features of both supervised and unsupervised learning. Key concepts used include:
    • An agent learning how to perform best within an environment.
    • The environment has defined states.
    • The agent takes an action at each step, guided by a policy.
    • The policy is influenced by the current state and recorded history.
    • An action leads to a new state and the agent receives a reward that measures the action's effectiveness toward an overall goal.
    • The aim is to maximize reward values by improving the policy.

Reinforcement learning uses a trial-and-error search for optimum performance, often applied to logic games, robotics, or navigation tasks.

Regression Analysis

Regression analysis methods are used when the AI aims to predict and output numerical values for a defined quantity based on inputs of other quantities.

  • The system is first supplied with actual values for both input and corresponding output data.
  • It investigates if there is a correlation between these sets of values.
  • If a correlation is established and represented by a mathematical formula, this formula is then used to predict values when new data is input.
  • The simplest application is predicting a single output quantity based on a single input quantity, expecting a linear relationship (e.g., predicting an A-Level Computer Science mark based on an IGCSE Mathematics mark).
  • More complex analysis can involve multiple input values or non-linear relationships (e.g., predicting exponential growth in sales).

22.04 Artificial Neural Networks

Artificial neural networks are modeled on the neural networks in the human brain.

  • An artificial neural network can be created in hardware or software.
  • It consists of nodes (or artificial neurons), often represented as triangles or circles, which receive input and provide output.
  • The modeling process applies a weighting factor to each input.
  • The weighted inputs are summed, and an activation function computes the node's output value.
  • A simple network typically consists of an input layer, an output layer, and one or more hidden layers (which process information between input and output).

Back Propagation of Errors

When an artificial neural network is first set up, its adjustable factors (weighting factors and activation functions) must be tuned to achieve optimum predictive capability. This tuning process is performed using the back propagation of errors algorithm.

  • The system is run with input data matching real-use data using a set of trial adjustable factors.
  • An error is identified for each output node (the difference between the output value and the real-use value).
  • The system is re-run with different adjustable factor values to determine how the output's accuracy depends on the factors.
  • The algorithm is applied first to the output layer nodes, and then works backward through the hidden layers until the input nodes are considered.
  • This process allows the system to learn and improve performance when new real-use data becomes available.

Deep Learning

Deep Learning systems are artificial neural networks characterized by an exceptionally large number of hidden layers. They attempt to mimic the layer structure of neurons in the brain, where higher layers are concerned with more abstract information processing than lower layers. Deep Learning is enabled by increasing computing power.

 

Chapter 22: Artificial Intelligence (AI) short notes

1. Graphs and Path-Finding (Section 22.02)

A graph is an Abstract Data Type (ADT) used to record relationships between entities. It consists of nodes (vertices) and edges, where edges can have numerical labels (weights).

  • Algorithms: AI uses algorithms to find the shortest route between nodes.
    • Dijkstra's algorithm: Finds the shortest path to every other node starting from a chosen node.

2. Machine Learning (Section 22.03)

Machine learning is where a system improves its performance through the analysis of previous performance.

  • Supervised Learning: Knowledge is acquired when the system is fed sample data with associated classification (e.g., providing exam answers with corresponding grades).
  • Unsupervised Learning: The system draws its own conclusions by analysing and categorising available data to identify patterns (e.g., identifying conceptual clusters).
  • Reinforcement Learning: An agent learns by receiving graded rewards for actions taken in an environment.
  • Regression Analysis: Involves finding a mathematical equation that provides the best fit to the actual outcomes based on previous inputs.

3. Artificial Neural Networks and Deep Learning (Section 22.04)

  • Artificial Neural Networks (ANN): Systems modelled using connected nodes (artificial neurons) in software or hardware.
    • Function: Nodes receive weighted inputs, which are summed, and an activation function computes the output.
    • Structure: Networks often have an input layer, an output layer, and one or more hidden layers.
  • Deep Learning: Systems that use ANNs with an exceptionally large number of hidden layers.
  • Back Propagation of Errors: An algorithm used for machine learning in ANNs that optimizes the values for parameters by working backward through the hidden layers from the output layer to the input layer.

 

 

0 Reviews