Web page structure

Structure of a Web page

The main elements that make up a web page are:

  • DOCTYPE: This lets the browser know the type of markup language the page is written in.
  • Document Tree: We can consider a page as a document tree that contains any number of branches.
  • HTML: This is the root element of the document tree, and every other element is one of its descendants. HTML has two children: HEAD and BODY.
  • HEAD: It contains the title and other information about the page (metadata).
  • BODY: It contains the data displayed by the page.
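
The tree view described above can be sketched in Python with the standard library's xml.etree.ElementTree module. This is a minimal illustration rather than a general HTML parser: the page text is a hypothetical example, and it is deliberately written as well-formed XML, because this parser rejects the unclosed tags that real-world HTML often contains. The helper name outline is likewise an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal page. It is well-formed XML, which the
# standard-library parser requires (real HTML is often not).
PAGE = """<!DOCTYPE html>
<html>
  <head>
    <title>Example page</title>
  </head>
  <body>
    <p>Hello, world.</p>
  </body>
</html>"""

# fromstring() returns the root Element of the document tree.
root = ET.fromstring(PAGE)


def outline(element, depth=0):
    """Return one indented line per element, in depth-first order."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an Element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(root)))
```

The printed outline shows html at the root, with head and body as its two children and the remaining elements nested beneath them.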

ElementTree

The Element type [6] is a data object that can contain tree-like data structures.

The ElementTree wrapper type [6] adds code to load web pages as trees of Element objects. Each element has a tag (a string identifying the element type), a number of attributes, a text string holding its textual content, and a number of child nodes.
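
As a small sketch of these properties, the snippet below parses a single element with Python's xml.etree.ElementTree and inspects its tag, attributes, text and children. The markup, the URL and the variable name link are hypothetical values chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: an anchor with two attributes, some text, and one child.
link = ET.fromstring(
    '<a href="https://example.com" class="external">Example <b>site</b></a>'
)

print(link.tag)         # the element type, here "a"
print(link.attrib)      # attributes as a dictionary
print(repr(link.text))  # text that appears before the first child element
print(len(link))        # number of direct child elements
print(link[0].tag)      # children can be accessed by index, like a list
```

Note that the text property holds only the text up to the first child element; text inside the child belongs to the child's own text property.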

To create a tree, we create the root element and then add child elements to it. The SubElement function creates a new element and adds it to a parent element in one step. Some of the methods provided for searching subelements are as follows:

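  • find(pattern) – Returns the first subelement matching the pattern, or None if there is no match
  • findtext(pattern) – Returns the value of the text attribute of the first subelement matching the pattern
  • findall(pattern) – Returns a list of all subelements matching the pattern
  • getiterator(tag) – Returns a list of all subelements whose tag matches the given tag (removed in Python 3.9, where iter(tag) is the replacement)
  • getiterator() – Returns a list of all the subelements (removed in Python 3.9, where iter() is the replacement)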
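
The following sketch builds a small tree with SubElement and then runs the search methods listed above, using Python's xml.etree.ElementTree. The tag names and text values are hypothetical. Because getiterator was removed from the standard library in Python 3.9, the sketch uses iter instead, which walks the same elements but returns an iterator rather than a list.

```python
import xml.etree.ElementTree as ET

# Build the tree: create the root, then attach children with SubElement.
html = ET.Element("html")
head = ET.SubElement(html, "head")
title = ET.SubElement(head, "title")
title.text = "Example page"
body = ET.SubElement(html, "body")
for label in ("first", "second"):
    paragraph = ET.SubElement(body, "p")
    paragraph.text = label

# find: first matching direct child (or None when nothing matches).
first_p = body.find("p")

# findtext: text of the first match; the pattern may be a path.
title_text = html.findtext("head/title")

# findall: list of every matching direct child.
paragraphs = body.findall("p")

# iter: walks the element itself and all descendants in document order.
all_tags = [element.tag for element in html.iter()]
p_texts = [element.text for element in html.iter("p")]

print(first_p.text)
print(title_text)
print(len(paragraphs))
print(all_tags)
print(p_texts)
```

A simple tag name passed to find or findall matches direct children only, which is why the search for "p" starts from body; a path such as "head/title" or a call to iter is needed to reach deeper levels.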