Last Updated: Tuesday September 4 2001
According to the socket(3) manpage, the socket(3) call was first found in 4.2BSD.
If you don't know C, you shouldn't be reading this because you'll go blind and swear off programming for good ("Holy sh*t, I have to learn all this crap to start a dot-com?!"), possibly ruining a perfectly good CS undergrad. Didn't your mommy tell you what happens to computer science dropouts? Hint: they end up working for Fry's...
In case you were wondering, yes, your web browser really does do all this stuff. A web browser is just a big fat socket program that pays a lot of attention to the data it gets. Yes, even on Windows. They call it Windows Sockets, but it's still good ol' bind(2), select(2), and accept(2).
Now that that's over with, here we go...
The point of sockets is to provide a way to get information from one place to another via a computer network (which could use any of a hundred protocols) in a way that doesn't require that you rewrite your program every time it gets ported to a different platform. Here's how to do it, step by step.
Create a socket. A socket is a lot like a filehandle, but instead of doing things like:
FILE * fp;
or
int fd;
you use the socket(3) call to create it. The socket(3) prototype on most true-blue UNIXes is
#include <sys/types.h>
#include <sys/socket.h>
int socket(int domain, int type, int protocol);
The domain is a namespace, or address family. Probably the most popular domain argument value is the Internet domain, PF_INET (the PF stands for protocol family). Other less commonly used arguments are PF_UNIX or PF_LOCAL for UNIX domain sockets in the local filesystem, PF_IPX (Novell IPX), PF_APPLETALK (Appletalk DDP), and PF_PACKET (for raw socket access on Linux).
The second argument (int type) is the type of connection desired. This and the domain argument are used to map to the actual layer 4 protocol (such as UDP or SPX). If you're running some kind of UNIX variant, you must have the protocols in question compiled into your kernel. Here are the protocols I get with different arguments on my Linux 2.2 system (with IPX and Appletalk DDP compiled into the kernel) at home:
| | SOCK_STREAM | SOCK_DGRAM | SOCK_SEQPACKET | SOCK_RAW | SOCK_RDM | SOCK_PACKET |
|---|---|---|---|---|---|---|
| PF_INET | TCP/IP | UDP/IP | Nope | Nope | Nope | Nope |
| PF_IPX | Nope | Yes, ? | Nope | Nope | Nope | Nope |
| PF_APPLETALK | Nope | Yes, DDP? | Nope | Nope | Nope | Nope |
/* TCP socket */
s = socket(PF_INET, SOCK_STREAM, 0);
/* UDP socket */
s = socket(PF_INET, SOCK_DGRAM, 0);
socket(3) returns -1 and sets errno if an error occurs, or a socket handle (a small non-negative integer) if it succeeds.
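A table like the one above can be produced by simply trying each domain and type combination and seeing whether socket(3) succeeds. Here is a minimal sketch of that idea. The helper name probe_socket is made up for illustration, and the answers you get depend entirely on what your kernel supports.

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * probe_socket (hypothetical helper): returns 1 if the kernel will
 * create a socket for this domain/type pair, 0 if it refuses.
 */
int probe_socket(int domain, int type)
{
    int s;

    s = socket(domain, type, 0);
    if (s == -1) {
        /* errno says why, e.g. an unsupported family or type */
        fprintf(stderr, "socket(%d, %d, 0): %s\n",
                domain, type, strerror(errno));
        return 0;
    }
    if (close(s) == -1) {
        perror("close");
    }
    return 1;
}
```

Calling probe_socket(PF_INET, SOCK_STREAM) and probe_socket(PF_INET, SOCK_DGRAM) should both return 1 on any box that speaks TCP/IP.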
The third argument, protocol, picks a specific layer-4 protocol when the combination of the domain and type arguments leaves some ambiguity about which one should actually be used. For TCP and UDP there is no ambiguity, so pass 0 and let socket(3) choose. The Linux socket(2) man page has this to say about it:
The protocol specifies a particular protocol to be used with the socket. Normally only a single protocol exists to support a particular socket type within a given protocol family. However, it is possible that many protocols may exist, in which case a particular protocol must be specified in this manner. The protocol number to use is specific to the "communication domain" in which communication is to take place; see protocols(5). See getprotoent(3) on how to map protocol name strings to protocol numbers.
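To make the protocol argument concrete: in the Internet domain, the constants IPPROTO_TCP and IPPROTO_UDP from <netinet/in.h> name the protocols that 0 would pick anyway. The sketch below opens a socket with an explicit protocol and then asks the kernel, via getsockopt(2) with SO_TYPE, what type of socket it actually created. Both helper names are made up for this example.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>   /* IPPROTO_TCP, IPPROTO_UDP */

/* socket_type (hypothetical helper): ask the kernel what type s is. */
int socket_type(int s)
{
    int type = 0;
    socklen_t len = sizeof(type);

    if (getsockopt(s, SOL_SOCKET, SO_TYPE, &type, &len) == -1) {
        perror("getsockopt");
        return -1;
    }
    return type;
}

/*
 * explicit_socket_type (hypothetical helper): open a PF_INET socket
 * naming the protocol explicitly, report its type, and close it again.
 * Returns the type (e.g. SOCK_STREAM) or -1 on any error.
 */
int explicit_socket_type(int type, int protocol)
{
    int s, result;

    s = socket(PF_INET, type, protocol);
    if (s == -1) {
        perror("socket");
        return -1;
    }
    result = socket_type(s);
    if (close(s) == -1) {
        perror("close");
        return -1;
    }
    return result;
}
```

For example, explicit_socket_type(SOCK_STREAM, IPPROTO_TCP) should hand back SOCK_STREAM, the same kind of socket you get by passing 0.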
Sample code for creating a TCP socket:
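#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

int s;
int ret;

s = socket(PF_INET, SOCK_STREAM, 0);
if (s == -1) {
    perror("socket");
    exit(1);    /* no socket, nothing more we can do */
}
/* Do stuff here. */
ret = close(s);
if (ret == -1) {
    perror("closing socket");
}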
| Always check the return value of system calls. Really. You'll thank me for it in the long run, even though it means opening up those horrible man pages and grepping for the ERRORS section, horrors. For fun, go punch "always check the return value" into a search engine and you can find hours of entertainment with the documents that say things like "it took me 5 hours to realize that I wasn't checking the return value of open("/dev/null");" and "gosh, who would have thought that calling setsockopt(2) would fail?" |
Don't forget to call close(2) on the socket when you're finished with it! And don't forget to check the return value of close(2).
(optional) Create the source address data structure, if you care about the "return address" of your packets. If you skip this step, the operating system will usually fill in reasonable defaults. For example, a TCP connection to a web server works fine if you use the defaults.
However, NFS and other old-school services (like the r* family of services) usually refuse connections unless they come from a source port below 1024. If your connection does come from such a port, the server assumes that you have root on one of the boxes on the network (or that root is ok with you pretending you do), because only root can bind to a port below 1024. Also, certain DNS implementations require that incoming DNS requests have a source port of 53. In these cases (and probably many others), you must fill in the source address structure in order to specify the correct return address for the connection, or the server on the other end will refuse it.
The source address data structure is (for PF_INET) sockaddr_in. Other protocols use different structures. Here are some examples:
| | IPX | Appletalk DDP | ATM | NetBEUI |
|---|---|---|---|---|
| Windows sockets 2 | sockaddr_ipx | sockaddr_at (?) | sockaddr_atm | ? |
| Linux 2.4 | sockaddr_ipx | sockaddr_atalk | sockaddr_atmpvc, sockaddr_atmsvc | sockaddr_netbeui |
| FreeBSD 4.1-STABLE | sockaddr_ipx | Nope | sockaddr_atm | Nope |
Typical code:
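#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

struct sockaddr_in src;

memset(&src, 0, sizeof(src));
src.sin_family = AF_INET;
src.sin_port = htons(0);                  /* 0 = let the OS pick a port */
src.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local address */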
What I've seen is that the C library headers define a structure called struct sockaddr, which is defined (on Linux, at least) as:
typedef unsigned short sa_family_t;
struct sockaddr {
    sa_family_t sa_family;
    char sa_data[14];
};
You can think of struct sockaddr as a kind of container which the other sockaddr_* structs fit inside of. Many of the library routines take or fill in a pointer to a struct sockaddr, and it's much easier to just cast a pointer to whatever struct you're using to struct sockaddr * than to have a zillion different library routines for the many protocols the library can handle. Here is how some of the sockaddr structs are defined:
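struct in_addr {
    uint32_t s_addr;                /* IPv4 address, network byte order */
};
struct sockaddr_in {
    short int sin_family;           /* address family (AF_INET) */
    unsigned short int sin_port;    /* port, network byte order */
    struct in_addr sin_addr;        /* IPv4 address */
    unsigned char sin_zero[8];      /* padding up to the size of struct sockaddr */
};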
#define IPX_NODE_LEN 6
struct sockaddr_ipx
{
    sa_family_t sipx_family;
    __u16 sipx_port;
    __u32 sipx_network;
    unsigned char sipx_node[IPX_NODE_LEN];
    __u8 sipx_type;
    unsigned char sipx_zero; /* 16 byte fill */
};
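Here's a small sketch of the "container" idea in action: a function that accepts the generic struct sockaddr, looks at sa_family to find out what it was really handed, and only then casts to the specific struct. The name describe_addr is made up, and this version only understands IPv4.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * describe_addr (hypothetical helper): write "a.b.c.d:port" into buf.
 * Returns 0 on success, -1 if the address is not IPv4 or can't be
 * formatted.
 */
int describe_addr(const struct sockaddr *sa, char *buf, size_t buflen)
{
    const struct sockaddr_in *in;
    char ip[INET_ADDRSTRLEN];

    if (sa->sa_family != AF_INET) {
        return -1;      /* this sketch only understands IPv4 */
    }
    /* safe to cast now that we know what is really inside */
    in = (const struct sockaddr_in *)sa;
    if (inet_ntop(AF_INET, &in->sin_addr, ip, sizeof(ip)) == NULL) {
        return -1;
    }
    snprintf(buf, buflen, "%s:%u", ip, (unsigned)ntohs(in->sin_port));
    return 0;
}
```

The caller does the opposite cast: fill in a struct sockaddr_in, then pass (struct sockaddr *) &that_struct to the function.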
#include <netdb.h>
struct hostent *gethostbyname(const char *name);
struct hostent *gethostbyname2(const char *name, int af);
#include <sys/socket.h> /* for AF_INET */
struct hostent *gethostbyaddr(const char *addr, int len, int type);
void sethostent(int stayopen);
void endhostent(void);
void herror(const char *s);
const char * hstrerror(int err);
struct hostent {
    char *h_name;
    char **h_aliases;
    int h_addrtype;
    int h_length;
    char **h_addr_list;
};
#define h_addr h_addr_list[0] /* for backward compatibility */
h_name
h_name is the official name of the host, sometimes called the Fully Qualified Domain Name (or FQDN). This is the true name of the host, according to the DNS database.
h_aliases
h_aliases is an array of strings which contain any aliases for the host. For example, if the name you look up is a CNAME, or the address you look up has multiple PTR records, you'll get the other names here. The last entry is NULL.
h_addrtype
h_addrtype is one of the AF_XXX constants. This is probably going to be AF_INET for almost all code you write. One other possibility is AF_INET6, if the DNS server is serving IPv6 addresses, but not too many people run v6 yet. See /usr/include/linux/socket.h on Linux systems to marvel at how open-sourcing your operating system can make it speak 14 languages too!
h_length
h_length is the length of the addresses in the following field in bytes. Should be 4 bytes for IPv4 and 16 bytes for IPv6.
h_addr_list
h_addr_list is a zero-terminated array of network addresses for the host. "More than one?" you say? Well, yeah, you can have multiple A records for a given name. Try resolving yahoo sometime. What's going on is that your operating system's DNS client gets back the dozen or so addresses that Yahoo uses for the name www.yahoo.com. Most programs just use the first address in the list, and the DNS server typically hands the list back in a different order each time. So that way yahoo can serve their site from a dozen low-powered boxes and they don't have to buy some serious iron to serve their front page up. This technique of using multiple A records to distribute lots of hits among many servers is called "DNS-based load balancing".
/*
 * resolve-yahoo.c
 */
#include <stdio.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void) {
    int i;
    struct hostent *h;

    h = gethostbyname("www.yahoo.com");
    if (h == NULL) {
        /* the resolver reports errors via h_errno, so use herror(3), not perror(3) */
        herror("gethostbyname");
        return 1;
    }
    if (h->h_addrtype != AF_INET || h->h_length != 4) {
        fprintf(stderr, "Error: non-IPv4 address returned by resolver.\n");
        return 1;
    }
    printf("official name: %s\n\n", h->h_name);
    for (i = 0; h->h_aliases[i] != NULL; ++i) {
        printf("alias %2i: %s\n", i, h->h_aliases[i]);
    }
    printf("\n");
    for (i = 0; h->h_addr_list[i] != NULL; ++i) {
        printf("addr %2i: %s\n", i, inet_ntoa(*((struct in_addr *)h->h_addr_list[i])));
    }
    return 0;
}
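Once you have a hostent, the usual next move is to copy one of its addresses into a sockaddr_in so you can connect to it. Here's a rough sketch of that step. The name resolve_ipv4 is made up, and it simply takes the first address in h_addr_list, which is an assumption about what you want rather than the only sensible choice.

```c
#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * resolve_ipv4 (hypothetical helper): look up name and store its first
 * IPv4 address, plus the given port, in *dst.
 * Returns 0 on success, -1 on failure.
 */
int resolve_ipv4(const char *name, unsigned short port,
                 struct sockaddr_in *dst)
{
    struct hostent *h;

    h = gethostbyname(name);
    if (h == NULL) {
        fprintf(stderr, "gethostbyname: lookup of %s failed\n", name);
        return -1;
    }
    if (h->h_addrtype != AF_INET || h->h_addr_list[0] == NULL) {
        return -1;      /* not IPv4, or no addresses at all */
    }
    memset(dst, 0, sizeof(*dst));
    dst->sin_family = AF_INET;
    dst->sin_port = htons(port);
    /* entries in h_addr_list are already in network byte order */
    memcpy(&dst->sin_addr, h->h_addr_list[0], sizeof(dst->sin_addr));
    return 0;
}
```

Something like resolve_ipv4("www.yahoo.com", 80, &dst) would then give you a destination address ready to hand to connect(2).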
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netdb.h>
/* for inet_network(3) */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

struct sockaddr_in src;
struct servent *service;
int port = 0;
char int_interface[] = "192.168.0.1";

service = getservbyname("domain", "tcp");
if (service == NULL) {
    /* getservbyname(3) isn't documented to set errno, so skip perror(3) */
    fprintf(stderr, "getservbyname: no such service\n");
    exit(1);
}
/* s_port is in network byte order; ntohs(3) converts it to host order */
port = ntohs(service->s_port);

memset(&src, 0, sizeof(src));
src.sin_family = AF_INET;
src.sin_port = htons(port);
/* inet_network(3) returns host byte order, so convert exactly once */
src.sin_addr.s_addr = htonl(inet_network(int_interface));
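If the host versus network byte order business seems abstract, this little sketch shows what htons(3) actually does to a port number. The helper name port_wire_bytes is made up. It stores the two bytes of a port in the order they go out on the wire, which is most significant byte first no matter what your CPU prefers.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htons(3), ntohs(3) */

/*
 * port_wire_bytes (hypothetical helper): store the two bytes of a port
 * number in the order they travel on the network.
 */
void port_wire_bytes(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);     /* host -> network byte order */

    memcpy(out, &net, sizeof(net));
}
```

For port 53 the two bytes come out as 0 and 53. On a little-endian machine like a PC, the same number sits in memory as 53 then 0 until htons(3) swaps it.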
This is required only if you executed step 2. Use bind(3) to bind the source address data structure to the socket handle (the return value of the socket(3) call). The usual prototype is:
#include <sys/types.h>
#include <sys/socket.h>
int bind(int sockfd, struct sockaddr *my_addr, int addrlen);
Typical usage is:
bind(s, (struct sockaddr *) &source, sizeof(source));
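Putting the source address and the bind call together, with the return values checked, might look something like this sketch. The helper name bind_source is made up. Passing port 0 asks the OS to choose a port for you, and anything below 1024 will normally fail unless you're root.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * bind_source (hypothetical helper): bind socket s to the given local
 * IPv4 address and port. Returns 0 on success, -1 on error.
 */
int bind_source(int s, const char *ip, unsigned short port)
{
    struct sockaddr_in src;

    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_port = htons(port);
    /* inet_pton(3) returns 1 only if ip parsed as an IPv4 address */
    if (inet_pton(AF_INET, ip, &src.sin_addr) != 1) {
        fprintf(stderr, "bad address: %s\n", ip);
        return -1;
    }
    if (bind(s, (struct sockaddr *)&src, sizeof(src)) == -1) {
        perror("bind");
        return -1;
    }
    return 0;
}
```

After a successful call, getsockname(2) on the same socket will tell you which port you actually ended up with.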
(optional) Set SO_LINGER if you don't want close(2) on your socket to hang unreasonably. SO_LINGER controls how long close(2) will block while unsent data is still queued on the socket. Note that it does not shorten the time connect(2) spends waiting for a computer which may not be present; if the default TCP timeouts are longer than you really need, that has to be handled separately.
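SO_LINGER is set with setsockopt(2) and a struct linger, which has two fields: l_onoff switches lingering on, and l_linger is a time in seconds. It governs how close(2) treats data that hasn't been sent yet. A minimal sketch follows, with set_linger being a made-up helper name.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * set_linger (hypothetical helper): make close(2) on socket s wait up
 * to `seconds` for queued data to be sent.
 * Returns 0 on success, -1 on error.
 */
int set_linger(int s, int seconds)
{
    struct linger lg;

    lg.l_onoff = 1;          /* turn lingering on */
    lg.l_linger = seconds;   /* how long close(2) may block */
    if (setsockopt(s, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) == -1) {
        perror("setsockopt(SO_LINGER)");
        return -1;
    }
    return 0;
}
```

Call it right after socket(3) succeeds, and yes, check what it returns.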
Create the destination address data structure.
Connect with connect(3).
Send data using send(3).
Close the socket using close(2), optionally calling shutdown(2) first to tell the other end you're done sending.
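The last four steps, strung together, can be sketched roughly like this. The helper name connect_and_send is made up, it takes a dotted-quad IP address rather than a host name to keep things short, and it makes a single send(3) call, which is an assumption that only holds up for short messages.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * connect_and_send (hypothetical helper): connect to ip:port over TCP,
 * send msg, and close. Returns the number of bytes sent, or -1 on error.
 */
int connect_and_send(const char *ip, unsigned short port, const char *msg)
{
    struct sockaddr_in dst;
    ssize_t sent;
    int s;

    /* destination address data structure */
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1) {
        fprintf(stderr, "bad address: %s\n", ip);
        return -1;
    }

    s = socket(PF_INET, SOCK_STREAM, 0);
    if (s == -1) {
        perror("socket");
        return -1;
    }
    if (connect(s, (struct sockaddr *)&dst, sizeof(dst)) == -1) {
        perror("connect");
        close(s);
        return -1;
    }

    sent = send(s, msg, strlen(msg), 0);
    if (sent == -1) {
        perror("send");
    }

    /* say we're done sending, then give the descriptor back */
    if (shutdown(s, SHUT_WR) == -1) {
        perror("shutdown");
    }
    if (close(s) == -1) {
        perror("close");
    }
    return (int)sent;
}
```

Keep in mind that send(3) is allowed to send fewer bytes than you asked for, so real code loops until everything has gone out.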
Ok, so now you know how to connect to a system, great. But just where do you get those network addresses? When you use your web browser, you type in things like "www.foo.com", not "123.456.789.123". Well, it turns out that this task of having easy to use dot-com names for addresses of servers on the Internet is a really, really hard problem. The solution is the DNS, or Domain Name System.
The Domain Name System is physically a set of servers on the Internet, the most important of which are managed by big organizations like VeriSign, NASA, and the ISC. They maintain what's called the "root" domain name servers. Those servers are responsible for simply referring you onward: first to the servers for a top-level domain like .com, which in turn refer you to another organization's name server when you look up "lightconsulting.com", for example.
To make a TCP server in Perl, you need to do the following steps:
sub reaper {
    $waitedpid = wait;      # wait until the child dies then clean it up
    $SIG{CHLD} = \&reaper;  # signal handlers are one-shot; this resets it
}
$SIG{CHLD} = \&reaper;      # set up the signal handler
accept(Client, Server) -> 1035, 1.2.3.4
gethostbyaddr(1.2.3.4) -> www.whitehouse.gov
gethostbyname(www.whitehouse.gov) -> 5.6.7.8, 5.6.7.9, 5.6.7.10, 5.6.7.11
If you can't find a match, someone is spoofing their reverse DNS. Log a warning.
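The check traced above can be sketched in C like so. Both helper names are made up, the sketch is IPv4 only, and what you do on a mismatch (log it, drop the connection) is left to you. Note that the resolver calls return pointers to static storage, so the name has to be copied before the second lookup.

```c
#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * addr_in_list (hypothetical helper): return 1 if peer appears in a
 * NULL-terminated list of IPv4 addresses laid out like h_addr_list,
 * 0 otherwise.
 */
int addr_in_list(struct in_addr peer, char **addr_list)
{
    int i;

    for (i = 0; addr_list[i] != NULL; ++i) {
        if (memcmp(addr_list[i], &peer, sizeof(peer)) == 0) {
            return 1;
        }
    }
    return 0;
}

/*
 * check_reverse_dns (hypothetical helper): 1 means forward and reverse
 * DNS agree, 0 means a mismatch (possible spoofing), -1 means a lookup
 * failed.
 */
int check_reverse_dns(struct in_addr peer)
{
    char name[256];
    struct hostent *h;

    h = gethostbyaddr(&peer, sizeof(peer), AF_INET);
    if (h == NULL) {
        fprintf(stderr, "gethostbyaddr: reverse lookup failed\n");
        return -1;
    }
    /* copy the name: the next resolver call may overwrite *h */
    strncpy(name, h->h_name, sizeof(name) - 1);
    name[sizeof(name) - 1] = '\0';

    h = gethostbyname(name);
    if (h == NULL) {
        fprintf(stderr, "gethostbyname: forward lookup of %s failed\n", name);
        return -1;
    }
    if (h->h_addrtype != AF_INET) {
        return 0;
    }
    return addr_in_list(peer, h->h_addr_list);
}
```

In a server, peer would be the sin_addr that accept(2) filled in for the client, and a return value of 0 is your cue to log that warning.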