chapter 03 -- internet applications and network programming

(rev. 02/08/2017)

Notes On Chapter Three -- Internet Applications And Network Programming

Chapter Three: Internet Applications And Network Programming

3.1 Introduction
- Virtually all of the many different services available through the Internet are built on a single, general-purpose communication mechanism.
- This chapter explains the conceptual paradigms for Internet communication and the socket application programming interface (API).
- The programmer doesn't need to think about or understand how the network actually works.
- Library functions provide the programmer's interface to the network. The programmer simply "operates" the network by calling the library functions.
- However, understanding how the network is implemented allows a programmer to write better applications.
3.2 Two Basic Internet Communication Paradigms
- 3.2.1 Stream Transport in The Internet
  - A byte is sometimes defined as a group of 8 bits, and sometimes as the amount of data required to represent one character.
  - A stream is simply a sequence of bytes that flows from one application program to another.
  - An application like a web browser can place requests to a web server on an outbound stream, and accept web pages from the server on an inbound stream.
  - The Internet can support streams between applications. It guarantees that bytes are delivered in the order they were put onto the stream. However, there is no guarantee that groupings of bytes will be preserved or have any significance. For example, a group of bytes that were written all at once may not be received all at once. Conversely, if a group of bytes arrives together as one "chunk," that does not mean the bytes were sent all at once.
- 3.2.2 Message Transport in The Internet
  - Message transport is message-at-a-time service.
  - The sender transmits a message containing a certain number of bytes, and when the receiver gets the message, it always gets that specific message - that specific group of bytes that the sender transmitted in the message, not more or less.
  - The network never delivers part of a message, and never joins multiple messages or parts of messages together.
  - Using Internet message transport, an application can send a message to every computer on a network (broadcast), to just one application on another computer (unicast), or to a specified group of recipients (multicast).
  - The message transport service is "best effort" -- an attempt is made, but there is no guarantee that a message will be delivered.
  - There is also no guarantee that messages will be delivered in the order that they were sent.
  - Also, messages may be duplicated.
  - Higher level protocols may be used to make message transport more reliable.
- It is generally much easier to write programs that use stream service than programs that use message service. Consequently, programmers tend to use message service only when they must, for example for performance reasons. About 95% of Internet traffic is stream service.
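The contrast between the two paradigms can be seen in a minimal C sketch. It assumes a POSIX system and uses socketpair with AF_UNIX (not covered in these notes) as a local stand-in for an Internet stream (TCP) or message (UDP) connection, so it runs without a network; unlike real Internet message service, this local datagram pair does not lose or reorder messages. The names recv_exact, stream_bytes and datagram_sizes are hypothetical. On the stream socket, the receiver must loop until it has the number of bytes it wants, because send boundaries mean nothing; on the datagram socket, each recv returns exactly one whole message.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly n bytes from a stream socket. A single recv may return
   fewer bytes than one send wrote (or bytes from several sends), so loop.
   Returns n, fewer if the peer closed early, or -1 on error. */
ssize_t recv_exact(int fd, char *buf, size_t n)
{
    size_t got = 0;
    while (got < n) {
        ssize_t r = recv(fd, buf + got, n - got, 0);
        if (r < 0)
            return -1;          /* error */
        if (r == 0)
            break;              /* peer closed: end of stream */
        got += (size_t)r;
    }
    return (ssize_t)got;
}

/* Stream: two separate sends are read back as one 6-byte run.
   Stores a NUL-terminated copy in out; returns the byte count or -1. */
int stream_bytes(char out[7])
{
    int sv[2];
    ssize_t n;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    send(sv[0], "ab", 2, 0);
    send(sv[0], "cdef", 4, 0);
    n = recv_exact(sv[1], out, 6);
    out[n > 0 ? n : 0] = '\0';
    close(sv[0]);
    close(sv[1]);
    return (int)n;
}

/* Message: two sends produce two messages, and each recv returns exactly
   one of them, even though the buffer could hold both. */
int datagram_sizes(ssize_t sizes[2])
{
    int sv[2];
    char buf[64];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    send(sv[0], "ab", 2, 0);
    send(sv[0], "cdef", 4, 0);
    sizes[0] = recv(sv[1], buf, sizeof buf, 0);   /* first message only */
    sizes[1] = recv(sv[1], buf, sizeof buf, 0);   /* second message only */
    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

Here stream_bytes returns 6 with out holding "abcdef", while datagram_sizes reports sizes of 2 and 4: the message paradigm keeps each message intact, and the stream paradigm does not.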
3.3 Connection-oriented Communication
- Internet stream service is connection-oriented and bidirectional. It requires an initial connection setup and a final connection close.
- Algorithm 3.1
  - Purpose: Interaction using the Internet's stream service
  - Method:
    1. A pair of applications requests a connection
    2. The pair uses the connection to exchange data
    3. The pair requests that the connection be terminated
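Algorithm 3.1 can be traced in a single C program that plays both applications over the loopback interface (127.0.0.1). This is a sketch assuming a POSIX system; loopback_exchange and recv_exact are hypothetical names, and the echo behavior of the "server" is only for illustration. Port 0 asks the operating system to pick any free port, and getsockname (not covered in these notes) reads back the port it chose.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly n bytes from a stream socket (one recv may return less). */
static ssize_t recv_exact(int fd, char *buf, size_t n)
{
    size_t got = 0;
    while (got < n) {
        ssize_t r = recv(fd, buf + got, n - got, 0);
        if (r <= 0)
            return r < 0 ? -1 : (ssize_t)got;   /* error, or peer closed early */
        got += (size_t)r;
    }
    return (ssize_t)got;
}

/* Both applications of Algorithm 3.1 in one process, over loopback.
   Returns the number of bytes echoed back to the client side, or -1. */
ssize_t loopback_exchange(const char *msg, char *reply, size_t cap)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    char buf[256];
    size_t len = strlen(msg);
    ssize_t n = -1;
    int lsock = -1, csock = -1, ssock = -1;

    if (len >= sizeof buf || len >= cap)
        return -1;

    /* The server side starts first: listen on 127.0.0.1, OS-chosen port. */
    lsock = socket(PF_INET, SOCK_STREAM, 0);
    if (lsock < 0)
        goto done;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    if (bind(lsock, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lsock, 1) < 0 ||
        getsockname(lsock, (struct sockaddr *)&addr, &alen) < 0)
        goto done;

    /* Step 1: request a connection. connect returns once the kernel has
       queued the connection; accept then gives the server its end. */
    csock = socket(PF_INET, SOCK_STREAM, 0);
    if (csock < 0 || connect(csock, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;
    ssock = accept(lsock, NULL, NULL);
    if (ssock < 0)
        goto done;

    /* Step 2: exchange data (client sends, server echoes it back). */
    if (send(csock, msg, len, 0) != (ssize_t)len ||
        recv_exact(ssock, buf, len) != (ssize_t)len ||
        send(ssock, buf, len, 0) != (ssize_t)len)
        goto done;
    n = recv_exact(csock, reply, len);
    if (n >= 0)
        reply[n] = '\0';

done:
    /* Step 3: terminate the connection (and stop listening). */
    if (ssock >= 0)
        close(ssock);
    if (csock >= 0)
        close(csock);
    if (lsock >= 0)
        close(lsock);
    return n;
}
```

For example, loopback_exchange("hello", reply, sizeof reply) returns 5 and leaves "hello" in reply. A real client and server would run as separate programs, usually on separate machines.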
3.4 The Client-Server Model of Interaction
- Servers start up first and wait on a well-known port.
- Then a client sends a message to a server to initiate establishment of a connection.
- The clients and servers, not the Internet, are the actors that initiate and accept connections.
3.5 Characteristics of Clients and Servers
- Clients tend to be processes on the machines of 'regular people' - usually such a process contacts servers one at a time, as service needs arise.
- Servers tend to be special-purpose programs running all the time on powerful machines, accepting connections from multiple clients.
3.6 Server Programs and Server-Class Computers
- Actually, a server is a process that provides a service to other processes.
- It has become common to refer to the machine that runs a (software) server as a server as well.
3.7 Requests, Responses, And Direction of Data Flow
- Information can flow in both directions between clients and servers.
- A lot of variation is possible in the pattern of interaction between clients and servers.
3.8 Multiple Clients and Multiple Servers
- Modern computers can execute many processes at the same time, so multiple clients and servers can execute concurrently on a single computer.
- This is why a user can have four windows open at once: one running email, two running browsers, and one streaming music.
- A computer can run, say, a web server plus other servers at the same time -- for example one of them could be a file server. The same machine can run client processes too -- all at the same time.
- Sharing can help save on hardware and administration costs.
3.9 Server Identification and Demultiplexing
- When a client contacts a server, it submits two numbers to the network protocol software: one is a number that identifies the computer on which the server is executing, and the other identifies which service the server offers. Typically this is [IP number; Port number].
- Network software on the receiving machine uses the port number to direct the data from the client to the correct server program on the host machine.
- Details of what clients do:
  - start up (after server)
  - get server name from user (e.g. user puts a host name in a URL)
  - use DNS to translate server name into IP number
  - send a message or a connection request with both IP number and desired port number.
- Details of what servers do:
  - start before any of the clients
  - register with their local system that they are to be contacted on some particular port number 'N'.
  - wait for contact from a client
  - interact with client as long as service requires
  - wait for contact from other clients
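The client steps above can be sketched in C with getaddrinfo, the POSIX function that translates a host name (via DNS or other local sources such as a hosts file) into one or more socket addresses that already carry the desired port. connect_to_server is a hypothetical name; the sketch assumes stream service and tries each returned address in turn, because one name may map to several addresses (IPv4 and IPv6, for instance).

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Client steps from 3.9: translate the server name into IP address(es),
   then request a connection to the given port. Returns a connected
   socket descriptor, or -1 if no address worked. */
int connect_to_server(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int s = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 addresses */
    hints.ai_socktype = SOCK_STREAM;  /* stream (connection-oriented) service */

    /* Name-to-address translation; port is a decimal string such as "80". */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Try each address until one accepts the connection request. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (s < 0)
            continue;
        if (connect(s, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* connected */
        close(s);
        s = -1;
    }
    freeaddrinfo(res);
    return s;
}
```

A call such as connect_to_server("www.example.com", "80") would request a connection to port 80 on whichever address the name resolves to.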
3.10 Concurrent Servers
- Most servers can handle many clients simultaneously by employing multiple threads of control.
- Service is provided this way so that clients don't have to wait too long to get some response. A client that wants to download a small photo doesn't have to wait for another client to download an entire movie.
- Typically a main thread accepts connections from clients and hands each client off to a service thread that interacts with the client and performs the service.
3.11 Circular Dependencies among Servers
- A server may have to act as a client of another service, e.g. to get some information from a database or to authenticate a client.
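- The overall design of client-server interactions must be planned carefully to avoid "circular dependencies," in which a chain of servers acting as clients of one another loops back to the first server; such a cycle can continue indefinitely.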
3.12 Peer-To-Peer Interactions
- If one server is responsible for providing service to many clients, the server and its connection to the network can become a bottleneck.
- Let's say the server is a file server. A possible mitigation technique is to distribute the data more-or-less equally among N servers at different locations in the network. Each server then has to respond to only a fraction (about 1/N) of the client requests, so there may be less of a bottleneck.
- It may be feasible for some or all of the machines involved to function both as clients and as servers.
3.13 Network Programming and the Socket API
- The socket application programming interface (socket API) is a set of functions that programmers can use to create networking software. It was developed along with the networking software in (BSD) Unix and has become a de facto standard.
3.14 Sockets, Descriptors, and Network I/O
- Unix and the socket API treat network I/O almost identically to file I/O.
- In particular, sockets that are endpoints of network communication are "opened" and "closed". Processes refer to both sockets and files by using small integers called file descriptors.
3.15 Parameters and the Socket API
- There are some special functions that have to be used with sockets that aren't used with files.
- The design involves several functions, each of which has only a few parameters. (See chart on page 37 of the 6th edition.)
3.16 Socket Calls in a Client and Server
- See figure 3.8 for illustration of a typical sequence of calls for client and server.
3.17 Socket Functions Used by Both Client and Server
- 3.17.1 The Socket Function
  - this call creates a socket and returns an integer descriptor that is used as a 'handle' for accessing the socket.
  - Form of call:
    descriptor=socket(protofamily, type, protocol)
  - protofamily = protocol family, e.g. TCP/IP, denoted by PF_INET
  - type = type of communication, e.g. stream or connectionless, denoted SOCK_STREAM or SOCK_DGRAM.
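  - protocol is the specific transport protocol to be used; there may be more than one in the family. A value of 0 selects the default protocol for the given type (TCP for SOCK_STREAM and UDP for SOCK_DGRAM in PF_INET).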
- 3.17.2 The Send Function
  - clients and servers use send to transmit data to one another
  - Form of call:
    bytesSent = send(socket, data, length, flags)
  - socket = descriptor of socket to use
  - data = memory base address of array segment of data bytes
  - length = number of bytes in array segment
  - flags = bits to request special options
  - bytesSent = number of bytes actually sent by the call, or -1 on error
- 3.17.3 The Recv Function
  - clients and servers use recv to receive data from one another
  - Form of call:
    bytesReceived=recv(socket, buffer, length, flags)
  - socket= descriptor of socket to use
  - buffer = base memory address of array in which to place received data.
  - length = buffer size
  - flags = control of details
  - bytesReceived = number of bytes actually received by the call, or -1 on error
  - recv blocks only while no data is available; as soon as some data arrives it returns, possibly with fewer bytes than length. A return value of 0 means the other side has closed the connection.
- 3.17.4 Read and Write with Sockets
  - Many operating systems, but not all, allow the use of 'read' and 'write' with sockets.
  - Form of call:
    bytesSent = write(descriptor, data, length)
  - Form of call:
    bytesReceived = read(descriptor, data, length)
  - Use of read and write gives flexibility - the descriptor can denote a socket or a file (or other things).
- 3.17.5 The Close Function
  - use close to terminate use of a socket
  - Form of call:
    success = close(socket)
  - socket = descriptor of socket
  - success = 0 if the call is successful, else -1 on error.
  - after calling close, the caller can't send or receive anything further using the socket
  - an end-of-file (eof) indication is sent to the other side of the connection, telling that process that no more information will be sent; once it has read any remaining data, its recv returns 0.
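The functions above combine into small helpers that most stream programs need, sketched here in C for a POSIX system (open_tcp_socket, send_all and recv_until_eof are hypothetical names). send_all loops because send may transmit fewer bytes than requested; recv_until_eof relies on recv returning 0 once the other side has called close. EINTR, the error reported when a signal interrupts a call, is simply retried.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* 3.17.1: create an Internet (PF_INET) stream socket. IPPROTO_TCP names
   the transport protocol explicitly; 0 would select the same default. */
int open_tcp_socket(void)
{
    return socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
}

/* 3.17.2: keep calling send until every byte has been accepted.
   Returns 0 on success, -1 on error. */
int send_all(int sock, const char *data, size_t length)
{
    while (length > 0) {
        ssize_t sent = send(sock, data, length, 0);
        if (sent < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        data += sent;
        length -= (size_t)sent;
    }
    return 0;
}

/* 3.17.3 and 3.17.5: receive until the other side closes (recv returns 0)
   or the buffer is full. Returns the number of bytes stored, or -1. */
ssize_t recv_until_eof(int sock, char *buffer, size_t cap)
{
    size_t total = 0;
    while (total < cap) {
        ssize_t r = recv(sock, buffer + total, cap - total, 0);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            break;                  /* other side called close: end-of-file */
        total += (size_t)r;
    }
    return (ssize_t)total;
}
```

With a connected socket s, a client whose server closes the connection after replying might call send_all(s, req, strlen(req)) and then recv_until_eof(s, buf, sizeof buf), and finally close(s).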
3.18 The Connect Function Used Only By A Client
- a client calls connect to initiate a (virtual) connection with a server (used in the case of stream service)
- Form of call:
  success = connect(socket, saddress, saddresslen)
- socket = the descriptor of the socket that the client will use for the connection
- saddress = a pointer to a sockaddr structure holding the server's address and protocol port number
- saddresslen = length of the server's address in bytes
- success = 0 if the call is successful, else -1 on error.
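A minimal C sketch of connect for IPv4, assuming a POSIX system. The generic sockaddr is supplied as an IPv4-specific struct sockaddr_in, filled in with inet_pton (which converts a dotted-decimal string) and htons (which puts the port in network byte order); tcp_connect_ipv4 is a hypothetical name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect a new stream socket to a dotted-decimal IPv4 address and port.
   Returns the connected descriptor, or -1 on error. */
int tcp_connect_ipv4(const char *ip, unsigned short port)
{
    struct sockaddr_in saddress;
    int s;

    memset(&saddress, 0, sizeof saddress);
    saddress.sin_family = AF_INET;
    saddress.sin_port = htons(port);          /* network byte order */
    if (inet_pton(AF_INET, ip, &saddress.sin_addr) != 1)
        return -1;                            /* not a valid IPv4 address */

    s = socket(PF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    /* The specific sockaddr_in is passed as the generic struct sockaddr. */
    if (connect(s, (struct sockaddr *)&saddress, sizeof saddress) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```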
3.19 Socket Functions Used Only by a Server
- 3.19.1 The Bind Function (see pages 40-42 in Comer, ed 6)
  - use bind to associate a service port number with a socket
  - Form of call:
    success = bind(socket, localaddr, addrlen)
  - socket = descriptor of socket to use
  - localaddr = a pointer to a sockaddr structure representing the local port and address(es) to be assigned to the socket
  - typically the caller of bind sets the IP address in the sockaddr equal to the constant INADDR_ANY (if the local host has multiple IP addresses, INADDR_ANY matches any of them.)
  - addrlen = size of the local address in bytes
  - the sockaddr is the generic format for representing an address
  - bind returns 0 for success or -1 for failure
- 3.19.2 The Listen Function
  - server calls listen to put the socket into passive mode (make it ready for awaiting contact from clients)
  - Form of call:
    success = listen(socket, queuesize)
  - socket = the socket to put into passive mode
  - queuesize = the size to allow for a queue of clients waiting for service.
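  - if the queue is full when a client request arrives, the request is not accepted: depending on the operating system it is either rejected outright or silently dropped (so the client retries), and the client's connect call may eventually return a failure code.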
  - listen returns 0 for success or -1 for failure
- 3.19.3 The Accept Function
  - server calls accept to establish a connection with a client
  - accept will return immediately if the queue is not empty
  - accept will block if the queue is empty, until a client request to connect arrives.
  - Form of call:
    newsock = accept(socket, caddress, caddresslen)
  - newsock= the descriptor of a new socket to use for communication with the client (normally a service thread will interact with the client and the main server thread will call accept again on the 'old' socket).
  - socket = the socket used to accept new connections
  - caddress = pointer to a sockaddr for storing the address of the new client
  - caddresslen = pointer to an integer that holds the size of the caddress buffer when accept is called and the actual length of the client address when accept returns
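The three server-only calls fit together as sketched below, for IPv4 on a POSIX system. passive_socket and accept_client are hypothetical names. Passing port 0 lets the operating system choose a free port (getsockname, not covered in these notes, reports which one), and inet_ntop converts the client address captured by accept into dotted-decimal text.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* bind + listen: create a passive stream socket on the given port for all
   local IP addresses (INADDR_ANY). The port actually bound is stored in
   *bound_port. Returns the socket descriptor, or -1 on error. */
int passive_socket(unsigned short port, int queuesize, unsigned short *bound_port)
{
    struct sockaddr_in localaddr;
    socklen_t addrlen = sizeof localaddr;
    int s = socket(PF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    memset(&localaddr, 0, sizeof localaddr);
    localaddr.sin_family = AF_INET;
    localaddr.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local address */
    localaddr.sin_port = htons(port);

    if (bind(s, (struct sockaddr *)&localaddr, sizeof localaddr) < 0 ||
        listen(s, queuesize) < 0 ||
        getsockname(s, (struct sockaddr *)&localaddr, &addrlen) < 0) {
        close(s);
        return -1;
    }
    *bound_port = ntohs(localaddr.sin_port);
    return s;
}

/* accept: wait for a client, record its dotted-decimal address in ip,
   and return the new socket for talking to that client (or -1). */
int accept_client(int listener, char *ip, size_t iplen)
{
    struct sockaddr_in caddress;
    socklen_t caddresslen = sizeof caddress;  /* in: buffer size; out: length */
    int newsock = accept(listener, (struct sockaddr *)&caddress, &caddresslen);
    if (newsock < 0)
        return -1;
    if (inet_ntop(AF_INET, &caddress.sin_addr, ip, (socklen_t)iplen) == NULL)
        ip[0] = '\0';
    return newsock;
}
```

A server would call passive_socket once and then call accept_client in a loop, handing each new socket to a service thread while the listening socket keeps accepting.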
3.20 Socket Functions Used with the Message Paradigm
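- 3.20.1 Sendto and Sendmsg Socket Functions
  - use sendto or sendmsg with an unconnected socket; each call specifies the destination address of the message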
  - Form of call:
    bytesSent = sendto(socket, data, length, flags, destaddress, addresslen)
  - Form of call:
    bytesSent = sendmsg(socket, msgstruct, flags)
  - In both cases above, bytesSent = number of bytes actually sent by the call, or -1 on error
  - msgstruct combines data, length and destaddress info
- 3.20.2 Recvfrom and Recvmsg Functions
  - use recvfrom or recvmsg with an unconnected socket to receive a message from an arbitrary set of clients
  - Form of call:
    bytesReceived = recvfrom(socket, buffer, length, flags, sndraddr, saddrlen)
  - the first four arguments are as in recv
  - the last two arguments are for 'capturing' the address of the sender and the length of the sender's address.
  - recvfrom records the sender's address in the same form needed for use by sendto.
  - recvmsg is the counterpart of sendmsg
  - Form of call:
    bytesReceived = recvmsg(socket, msgstruct, flags)
  - In both cases above, bytesReceived = number of bytes actually received by the call, or -1 on error
  - msgstruct holds address from which message was received and the array of received data bytes.
  - recvmsg records the information in msgstruct in the same form needed for use by sendmsg.
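A request/reply exchange with the message paradigm can be sketched in C using sendto and recvfrom over UDP on the loopback interface (POSIX system assumed; udp_socket and udp_request_reply are hypothetical names). The point to notice is that the server needs no connection: it replies to whatever address recvfrom reported. Because message service is best effort, a real client would also need a timeout and retry, which this sketch omits; on a real network its final recv could wait forever if a message were lost.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a UDP (SOCK_DGRAM) socket bound to 127.0.0.1 on an OS-chosen
   port; the full local address is stored in *addr. */
static int udp_socket(struct sockaddr_in *addr)
{
    socklen_t len = sizeof *addr;
    int s = socket(PF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(s, (struct sockaddr *)addr, sizeof *addr) < 0 ||
        getsockname(s, (struct sockaddr *)addr, &len) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* One request/reply: the client sends with sendto; the server learns the
   client's address from recvfrom and replies to it with sendto.
   Returns the reply length, or -1 on error. */
ssize_t udp_request_reply(const char *request, char *reply, size_t cap)
{
    struct sockaddr_in saddr, caddr, sender;
    socklen_t senderlen = sizeof sender;
    char buf[512];
    ssize_t n = -1, r;
    int server, client;

    if (cap == 0)
        return -1;
    server = udp_socket(&saddr);
    client = udp_socket(&caddr);
    if (server < 0 || client < 0)
        goto done;

    /* Client to server: one sendto transmits one whole message. */
    if (sendto(client, request, strlen(request), 0,
               (struct sockaddr *)&saddr, sizeof saddr) < 0)
        goto done;
    /* Server: recvfrom captures the sender's address and its length. */
    r = recvfrom(server, buf, sizeof buf, 0, (struct sockaddr *)&sender, &senderlen);
    if (r < 0)
        goto done;
    /* Reply to exactly the address recvfrom reported. */
    if (sendto(server, buf, (size_t)r, 0, (struct sockaddr *)&sender, senderlen) < 0)
        goto done;
    n = recv(client, reply, cap - 1, 0);
    if (n >= 0)
        reply[n] = '\0';

done:
    if (server >= 0)
        close(server);
    if (client >= 0)
        close(client);
    return n;
}
```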
3.21 Other Socket Functions
- getpeername (a server uses it to obtain the address of the client that has connected)
- gethostname (a client or server uses it to obtain the name of the host on which it is executing)
- setsockopt (stores values in a socket's options, e.g. buffer size)
- getsockopt (obtains current socket options)
- gethostbyname (hostname to IP number translation; now obsolete, replaced by getaddrinfo)
- gethostbyaddr (IP number to hostname translation, e.g. to obtain a name to display for a human reader; now obsolete, replaced by getnameinfo)
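Two of these functions are sketched below in C for a POSIX system; local_host_name and set_and_get_rcvbuf are hypothetical names. The sketch reads an option back after setting it because the system may adjust the requested value (Linux, for example, doubles the SO_RCVBUF value to allow for bookkeeping overhead).

```c
#include <sys/socket.h>
#include <unistd.h>

/* gethostname: fill name with this host's name; returns 0 on success.
   POSIX does not guarantee NUL termination if the name was truncated. */
int local_host_name(char *name, size_t len)
{
    if (len == 0 || gethostname(name, len) < 0)
        return -1;
    name[len - 1] = '\0';
    return 0;
}

/* setsockopt/getsockopt: request a receive-buffer size, then return the
   size the system actually granted, or -1 on error. */
int set_and_get_rcvbuf(int sock, int requested)
{
    int granted = 0;
    socklen_t len = sizeof granted;

    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &requested, sizeof requested) < 0)
        return -1;
    if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &granted, &len) < 0)
        return -1;
    return granted;
}
```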
3.22 Sockets, Threads, and Inheritance
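- typically a newly created child process (e.g. one created with fork) inherits copies of all open descriptors. POSIX threads are different: all threads in a process share a single descriptor table.
- reference count method - creating a socket sets its reference count to 1, each inherited or duplicated descriptor increments it, and each close decrements it. When the count goes down to zero, the socket is deleted.
- the reference count method explains why, in a server built from separate processes, each service process closes the main (listening) socket and the main server process closes the new socket returned by a call to accept. In a server built with POSIX threads the descriptors are shared, so neither thread may close a socket the other is still using.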
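The reference count can be observed directly. The C sketch below, for a POSIX system, uses dup (which creates a second descriptor for the same socket, much as inheritance does) on one end of a socketpair; refcount_demo and peer_closed are hypothetical names, and socketpair is a local stand-in for a network connection. peer_closed peeks without blocking, using MSG_PEEK together with MSG_DONTWAIT (an extension supported by Linux and the BSDs), and reports whether the other end has seen end-of-file.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* 1 if the other end has been fully closed (end-of-file), 0 if the
   connection is still open, -1 on any other error. Does not block and
   does not consume data. */
int peer_closed(int sock)
{
    char c;
    ssize_t r = recv(sock, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    if (r == 0)
        return 1;                 /* end-of-file: every reference closed */
    if (r > 0)
        return 0;                 /* data waiting: still open */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;                 /* no data yet, but still open */
    return -1;
}

/* Close the two references to one socket in turn and record what the
   peer sees after each close. */
int refcount_demo(int result[2])
{
    int sv[2], copy;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    copy = dup(sv[0]);              /* like inheritance: 2 references */
    if (copy < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[0]);                   /* count 2 -> 1: socket stays open */
    result[0] = peer_closed(sv[1]);
    close(copy);                    /* count 1 -> 0: socket is closed */
    result[1] = peer_closed(sv[1]);
    close(sv[1]);
    return 0;
}
```

After the first close the peer still sees an open connection (result[0] is 0); only after the last descriptor is closed does it see end-of-file (result[1] is 1). This is why a forgotten copy of a descriptor in a concurrent server can keep a connection open.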