Day 8/100 of 100 Days of Code

Info Hunter

I moved the code that I wanted to run in another thread into a new function.

void MainFrame::StartScraping(int amount, int counter, 
                              std::vector<std::string> keywords,
                              std::vector<std::string> getUrls)
{
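    // Copy the first `amount` keywords, never reading past the end of the vector
    std::vector<std::string> scraperKeywords;
    for (int j = 0; j < amount && j < static_cast<int>(keywords.size()); j++)
    {
        scraperKeywords.push_back(keywords[j]);
    }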

    Scraper scraper;
    scraper.SetupScraper(scraperKeywords, getUrls[counter]);
    AnalyzePages pageAnalyzer;

    // Get info from website
    cpr::Response r = scraper.request_info(scraper.baseURL);

//    std::cout << r.text << std::endl;

    // Parse it
    std::vector<std::string> urls = scraper.ParseContent(r.text,
                                                         (char *) "href",
                                                         (char *) "/");

    // Iterate through them
    for (const std::string &item: urls) {
        std::cout << item << std::endl;
        pageAnalyzer.analyzeEntry(item, scraperKeywords, scraper);
    }
}

Then I added the following code to my program to run the function in a new thread:

        std::thread t(StartScraping, amount, counter, getSettingsKeywords, getUrls);

        if (t.joinable())
        {
            t.detach();
        }
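
For reference, here is a minimal sketch of the same launch pattern on its own. It assumes the function is a non-static member, in which case std::thread needs both the pointer-to-member and the object to call it on (a static member or free function can be passed directly). Worker, Run and RunOnThread are hypothetical names used only for this illustration; they are not part of Info Hunter. The sketch uses join() instead of detach() so that the result can be read safely once the thread has finished.

```cpp
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for MainFrame; only the threading part is sketched.
class Worker {
public:
    // Arguments are taken by value, so the thread works on its own copies.
    void Run(int amount, std::vector<std::string> keywords) {
        for (int j = 0; j < amount && j < static_cast<int>(keywords.size()); j++) {
            processed.push_back(keywords[j]);
        }
    }

    std::vector<std::string> processed;
};

// Hypothetical helper: runs Worker::Run on a new thread and waits for it.
std::vector<std::string> RunOnThread(int amount,
                                     const std::vector<std::string> &keywords) {
    Worker worker;

    // Pointer-to-member first, then the object, then the call arguments.
    // std::thread copies the arguments before the new thread starts.
    std::thread t(&Worker::Run, &worker, amount, keywords);

    // join() blocks until Run has returned. With detach() the thread would
    // keep running on its own, and `worker` would have to outlive it.
    t.join();

    return worker.processed;
}
```

Calling RunOnThread(2, {"cpp", "threads", "scraping"}) returns the first two keywords. With detach(), anything the thread touches by pointer or reference has to stay alive for as long as the thread runs, which is why passing copies matters.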

But after this change, the following line started to crash with a bad access error:

 lxb_char_t html[content.length() + 1];

After experimenting with it, I discovered that the problem was the content.length() + 1 used as the array size. Replacing it with a fixed number makes the crash go away, but the size that is actually needed varies with the content.

I had a lot of trouble finding a solution. In the end, I found out that allocating the buffer on the heap solved the problem. The following line fixed it:

 lxb_char_t *html = new lxb_char_t[content.size() + 1];
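
A buffer created with new[] has to be released with delete[] once it is no longer needed. One way to keep the heap allocation without managing that by hand is std::vector, sketched below. The stack array lives on the calling thread's stack, whose size is limited, while the vector's storage comes from the heap. Whether that limit is what the new thread actually ran into is an assumption here, not something verified. MakeHtmlBuffer is a hypothetical helper name, and the lxb_char_t alias stands in for the real lexbor header (assumed to define it as unsigned char) so that the sketch compiles on its own.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Assumption: lexbor defines lxb_char_t as unsigned char. This alias replaces
// the real header only so that the sketch is self-contained.
using lxb_char_t = unsigned char;

// Hypothetical helper: copies `content` into a heap-backed buffer with one
// extra byte for a null terminator.
std::vector<lxb_char_t> MakeHtmlBuffer(const std::string &content) {
    // The elements are zero-initialized, so the extra byte is already '\0'.
    std::vector<lxb_char_t> html(content.size() + 1);

    // Copy the characters; the terminator at the end is left untouched.
    std::memcpy(html.data(), content.data(), content.size());

    return html;
}
```

html.data() then gives the lxb_char_t pointer to hand to the parser, html.size() - 1 is the content length, and the memory is freed automatically when the vector goes out of scope.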

After making sure that this part was OK, I found that a similar issue appears somewhere else. Time to go there and fix it! Now that I know the solution, it shouldn't take long.

The question remains, though: what caused this issue? Googling around, I was unable to find a good result that explains it. If anyone knows anything, I would love to hear it.