Friday, July 17, 2015

Android: Extract HTML image element using Jsoup

I was using a JSON service to get data from a certain website, but it didn't provide all the information I wanted, so I decided to extract the elements from the site's HTML myself. Here I demonstrate how to get the image src values from a web page.

Why Jsoup?

Jsoup is a Java library for extracting and manipulating HTML elements. We can use it to parse HTML on Android. Its key feature is that jsoup lets you extract information from a page instead of rendering it.

Set up the Android project

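Add the following line to the dependencies block of your build.gradle file and sync the project to add the jsoup library to the Android project. (Newer versions of the Android Gradle plugin use implementation in place of compile.)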

compile 'org.jsoup:jsoup:1.7.3'

Create the app

First, identify exactly what you want to extract from the website. If you are using Google Chrome, simply press F12 to open DevTools and find the HTML of the page you want in the Elements panel. Take some time to identify the right elements. I'm going to extract the img src from each "div.post-thumbnail".


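Add these lines to onCreate. Because the app fetches a page over the network, it also needs the android.permission.INTERNET permission declared in AndroidManifest.xml.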
@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);
    // Insert your URL here (url is assumed to be a String field of the activity)
    url = "http://cyberpanhinda.com/category/%E0%B6%B1%E0%B7%80%E0%B6%9A%E0%B6%AD%E0%B7%8F/";
    // Fetch and parse the page on a background thread
    new getURL().execute(url);
}
Here I have used AsyncTask to fetch the HTML elements in the background. If you run the jsoup request on the UI thread, no other work can be done in the meantime, which leads to a bad user experience because the user has to wait until the task is finished. Android also rejects network calls made on the main thread by default and throws NetworkOnMainThreadException.

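// Imports needed at the top of the file:
// import android.os.AsyncTask;
// import android.util.Log;
// import org.jsoup.Jsoup;
// import org.jsoup.nodes.Document;
// import org.jsoup.nodes.Element;
// import org.jsoup.select.Elements;
// import java.io.IOException;

public class getURL extends AsyncTask<String, Void, String> {
    // Collects the page title and the extracted image URLs, one per line
    StringBuilder buffer = new StringBuilder();

    @Override
    protected String doInBackground(String... params) {
        try {
            // Download and parse the page (runs on a background thread)
            Document doc = Jsoup.connect(params[0]).get();
            buffer.append(doc.title()).append("\n");
            // Select every div with class="post-thumbnail"
            Elements topics = doc.select("div.post-thumbnail");
            buffer.append("Topics list \n");
            for (Element topic : topics) {
                // "img[src]" matches img tags that have a src attribute;
                // attr("src") returns the value from the first match
                buffer.append(topic.select("img[src]").attr("src")).append("\n");
            }
        } catch (IOException e) {
            // On a network or parse failure, return whatever was collected so far
            e.printStackTrace();
        }
        return buffer.toString();
    }

    @Override
    protected void onPostExecute(String s) {
        // Runs on the UI thread once doInBackground has finished
        super.onPostExecute(s);
        Log.v("Extracted:", s);
    }
}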
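
AsyncTask has been deprecated since API level 30, but the reason for keeping the fetch off the UI thread still applies. Below is a minimal sketch of the same idea using the standard java.util.concurrent classes. It is plain Java so it can run outside Android, and it is only an illustration: the class BackgroundFetchSketch and the methods slowFetch, fetchInBackground and fetchAndWait are hypothetical names, and slowFetch merely simulates the download that Jsoup.connect(url).get() would perform in the real app.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BackgroundFetchSketch {

    // Hypothetical stand-in for the slow page download; the real app would
    // call Jsoup.connect(url).get() here instead.
    static String slowFetch(String url) throws InterruptedException {
        Thread.sleep(200); // simulate network latency
        return "fetched:" + url;
    }

    // Submits the fetch to a worker thread and returns immediately with a
    // handle to the pending result.
    static Future<String> fetchInBackground(ExecutorService executor, final String url) {
        return executor.submit(new Callable<String>() {
            @Override
            public String call() throws Exception {
                return slowFetch(url);
            }
        });
    }

    // Convenience wrapper: starts the background fetch and waits for its result.
    static String fetchAndWait(String url) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> pending = fetchInBackground(executor, url);
            // The calling thread could do other work here while the fetch runs.
            return pending.get(); // blocks only when the result is needed
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchAndWait("http://example.com/"));
    }
}
```

In an Activity you would not block on get() from the UI thread as fetchAndWait does. One option is to hand the result back from the worker with runOnUiThread, which plays the role that onPostExecute plays in the AsyncTask version.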