To parse HTML from any web page I use the Jsoup library. It is a lightweight and very useful tool that lets you perform complex operations to fetch and process data from HTML.
You can get the latest version of the library from the official site. If you use Maven, just place the following into your POM's <dependencies> section:
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.8.2</version>
</dependency>
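For readers new to Maven, here is a minimal pom.xml sketch showing where that dependency goes. The project's own groupId, artifactId, and version below are placeholders, not part of the original example:

<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.gabdev</groupId>
    <artifactId>jsoup-example</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.8.2</version>
        </dependency>
    </dependencies>
</project>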
If you have a standard Java project, you should convert it to a Maven project as in the figure below:
After that you will see a pom.xml in your working directory.
As an example, I will create a JsoupParserExample class that gets and prints all links from the google.com web page.
JsoupParserExample.java
package com.gabdev.jsoup;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupParserExample {

    public static void main(String[] args) throws IOException {
        // Fetch and parse the page; connect() may throw IOException on network errors.
        Document doc = Jsoup.connect("http://google.com").get();
        // Collect every <a> element on the page.
        Elements links = doc.getElementsByTag("a");
        for (Element link : links) {
            System.out.println(link.text());
        }
    }
}

After running the code you will see something like this in the output window:
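Besides getElementsByTag, Jsoup also supports CSS-style selectors via select(), and attr("abs:href") can resolve relative links against a base URL. The sketch below (the class name JsoupSelectorExample and the sample HTML string are mine, not part of the original example) parses an in-memory HTML string so it runs without network access:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupSelectorExample {

    public static void main(String[] args) {
        // A small HTML snippet standing in for a fetched page.
        String html = "<html><body>"
                + "<a href=\"/images\">Images</a>"
                + "<a href=\"/maps\">Maps</a>"
                + "<p>No link here</p>"
                + "</body></html>";

        // The second argument is the base URI used to resolve relative links.
        Document doc = Jsoup.parse(html, "http://google.com/");

        // "a[href]" is a CSS selector: anchors that actually have an href attribute.
        Elements links = doc.select("a[href]");
        for (Element link : links) {
            // "abs:href" returns the href resolved to an absolute URL.
            System.out.println(link.text() + " -> " + link.attr("abs:href"));
            // prints e.g. Images -> http://google.com/images
        }
    }
}
```

This is handy when you want the target URLs as well as the anchor texts.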
Images
Maps
Play
YouTube
News
Drive
More »
These are the link texts (anchor texts) from google.com.