Comments

Anjum48 t1_j40cts5 wrote

1) Do you have a dataset? 2) How accurate does each of these outputs need to be for the task it is going to be used for? (See Zillow)
5

CuriousCesarr OP t1_j40f0i1 wrote

1). Not really. Data would have to be processed, but it can probably be entered via a well-structured form:

  • Nr. of rooms/ bathrooms/ etc.
  • Living space (square meters)
  • pictures (probably another ML to gauge how furnished and well maintained/ dilapidated the whole place is)
  • etc.

2). A great question! Zillow seems like a great example. As accurate as possible I guess. But a good starting point would be a price range.

1

Malignant-Koala t1_j40fflp wrote

This is an "idea guy" post, isn't it. ;)

Like, one of those "I have no idea how incredibly hard it would be to even gather the necessary data for this" ideas? A, "I need a quote to give to my guy in two days despite being unable to provide you with any more guidance than a vaguely worded pseudo-concept" proposal?

Best of luck dude.

9

CuriousCesarr OP t1_j40flj8 wrote

A fair question. What information would you like me to provide?

This isn't an idea guy post. But to pitch something for funding, you need a rough estimate of time and costs and some deliverable milestones.

−2

ZeroBearing t1_j40ggvt wrote

Could be done but would require about 6 months for data collection and experimentation. And I think the going rate for 6 month contracts for AI engineers in the UK is 30K, US is upwards of 60k.

Good luck bro

1

Anjum48 t1_j40gm5q wrote

Ah ok. On the first point I guess whoever you are looking for will need to spend a considerable amount of time building/finding a dataset to train a model.

On the second point, I might have incorrectly assumed you were familiar with the Zillow controversy around price prediction.

The TL;DR is that the ML team used a model to forecast prices using a tool made by Facebook called Prophet. The model was probably accurate enough for displaying a rough prediction on a website. Another team in Zillow started using these price predictions to flip houses and lost a whole bunch of money since the model was not designed to do this.

A lot of armchair data scientists quickly pointed the finger at Prophet for being a "bad" model. The reality is all models are bad if they are used for the wrong reason. In this case, the team flipping houses likely didn't listen to the data science team when they said the model shouldn't be used for that purpose.

This is why it's a good idea to know how the model outputs are going to be used. The obvious answer is always "as accurate as possible" but sometimes that might not be accurate enough...
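To make the "accurate enough for what?" point concrete, here is a rough sketch that checks the same predictions against two different tolerances. The prices and both tolerance values are invented for illustration; nothing here reflects Zillow's actual numbers or method.

```python
from statistics import median

def median_abs_pct_error(predicted, actual):
    """Median of |predicted - actual| / actual across all sales."""
    return median(abs(p - a) / a for p, a in zip(predicted, actual))

def accurate_enough(predicted, actual, tolerance):
    """True if the median error is within what the use case can tolerate."""
    return median_abs_pct_error(predicted, actual) <= tolerance

# Invented numbers, for illustration only.
predicted = [210_000, 95_000, 330_000, 150_000]
actual = [200_000, 100_000, 300_000, 160_000]

# Hypothetical tolerances: a rough website estimate can live with ~10% error,
# while buying and reselling on a thin margin might only tolerate ~2%.
print("website display:", accurate_enough(predicted, actual, 0.10))
print("house flipping: ", accurate_enough(predicted, actual, 0.02))
```

The same model passes the first check and fails the second, which is the whole problem with reusing outputs for a purpose they weren't evaluated for.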

Hope this helps!

4

CuriousCesarr OP t1_j40hegs wrote

No, I'm European so I have no idea about the Zillow debacle sadly.

The outputs would probably be used as a price evaluator for the living space (my friend works as a registrar of new/ bought homes). Honestly I think the Zillow usecase might be desired ultimately.

Would you be interested? :)

1

CVxTz t1_j40ohf4 wrote

You need a dataset of a few thousand to a few million examples of inputs (documents + other contextual info like location data) and outputs (estimates, other attributes like number of bedrooms and so on) in order to build such a feature. Depending on the quality and amount of data that you have and your performance requirements, this can range from a few months' project to nearly impossible. (Note: if you have no data, like you said, or expect 0 error, then this is impossible to do.)

1

BitterAd9531 t1_j411ihw wrote

I'm not even convinced it's possible based on the requirements. You're not going to get structured data. Just pictures of the outside and inside of the house I assume. How are you going to reliably estimate livable space, current state, or even number of rooms when not even all rooms might be properly pictured. You're banking on extracting these features from what I assume to be suboptimal images with high accuracy (very doubtful tbh) and then estimating price based on the features, which is useless if the features aren't extracted properly from the images.

Even if this were possible with high enough accuracy, the dataset you would need for this has to be absolutely huge. I really don't believe someone can gather enough in 6 months while simultaneously developing the NN.

And then we're not even talking about the legality of scraping competitors' websites to compare them to.

I'm not convinced I could do this in 6 months and I wouldn't do it for that price.

2

tsgiannis t1_j412nv9 wrote

The real question is: can you get accurate data? You need to scan pics for a gazillion different kinds of things and in the end provide a number. But in order for this to work, you need a ton of data to provide the correct object matches.

1

Legitimate_Light7143 t1_j415leh wrote

I’m pretty sure that I will have students who are willing to do this. But just how do you plan on getting the data? Or is it something the guy you hire would have to sort out?

Also, not to be pessimistic, but I absolutely do not think it would be possible to make a deep learning model that predicts how many square meters a property is based on some pictures alone. This is a mammoth task.

2

CuriousCesarr OP t1_j418cwn wrote

Well, the thing is that my friend doesn't pitch ideas to people with money as a job. He's just friends with them and they go out for coffee/dinner, sometimes they make a deal, etc. So a "formal approach" for VC funding doesn't apply here.

Truly, a back-of-the-napkin idea won't catch anyone's eyes, that's why I'm searching for someone that can give some feasible milestones/ a timeframe and budgets for them and he will present that.

0

CuriousCesarr OP t1_j41aojj wrote

I dunno if my English is that bad or people are in a rush when reading my post but: you get the images AND information about the residence itself (nr of rooms, total living space, space of each room, a sketch of the place, etc.).

1

NamerNotLiteral t1_j41cbv0 wrote

A few hundreds is way too little. I would be comfortable with a few thousand homes' data, and more comfortable yet if I could scrape Zillow or something on top of that.

(but that has its own issues, both legally and in terms of data drift, since Zillow data would be American while you're European).

1

NamerNotLiteral t1_j41fy66 wrote

So as far as I understand the project, you want to estimate the price of real estate. There're a few ways to do this. Forget pictures for the moment, just go with listed/numeric information.

You have information like Area/Square Footage, Listed Amenities, Age, Location, etc. If you have existing data of this sort, where it lists all the above and then a price, then it is fairly straightforward to pull off – but no guarantees on the accuracy. This has been done by plenty of people, so if you just do this your investors will probably ask you how you're going to compete with established real estate companies who have much bigger teams and much more data.
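As a rough idea of what the "listed information only" version can look like, here is a minimal sketch that prices a home from its most similar past listings (a nearest-neighbour "comparables" approach, which is just one of many options). The listings, the EUR prices, the feature choice and `k=3` are all made up for illustration; a real system would need far more data and would have to handle location.

```python
from statistics import median

# Toy listings, invented purely for illustration:
# (area_m2, rooms, age_years, price_eur)
LISTINGS = [
    (50, 2, 30, 100_000),
    (55, 2, 25, 115_000),
    (70, 3, 20, 150_000),
    (80, 3, 10, 180_000),
    (95, 4, 5, 230_000),
    (120, 5, 2, 300_000),
]

def _spans(rows):
    """Per-feature value range, so every feature contributes comparably."""
    return [(max(col) - min(col)) or 1 for col in zip(*rows)]

def estimate_price(query, listings=LISTINGS, k=3):
    """Return (low, median, high) price over the k most similar listings."""
    spans = _spans([row[:-1] for row in listings])

    def distance(features):
        # Euclidean distance on range-normalised features.
        return sum(((f - q) / s) ** 2
                   for f, q, s in zip(features, query, spans)) ** 0.5

    nearest = sorted(listings, key=lambda row: distance(row[:-1]))[:k]
    prices = sorted(row[-1] for row in nearest)
    return prices[0], median(prices), prices[-1]

# A 72 m2, 3-room, 18-year-old home.
print(estimate_price((72, 3, 18)))
```

Returning a low/median/high range rather than a single number keeps the output honest about how far apart the comparables are.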

Now let's consider images: you have pictures of the house, and you want to use those pictures as a way to measure how broken-down/upscale the house is and use that as a parameter to base the price on. You are going to combine this with the above, of course, because it's ridiculous otherwise. I'll say this frankly – this hasn't really been done, and it's a research problem. Not a 'product problem'. You could do a whole PhD thesis on this alone. There are so many different ways to approach this.

  • You can use ML to extract furniture from the picture individually, then assign a value to each item of furniture. Aggregate that value to get how well furnished the place is.
    • Massive Pitfall - How do you assign a value to a piece of furniture? A minimalistic luxury sofa and an antique cabinet could be worth equally as much. Designing this NN would be a huge challenge to start with.
    • Second Pitfall - You need labelled data. You would need a whole team manually annotating the data by looking through hundreds/thousands of furniture images and assigning a value to them.
  • You can use ML to determine the quality of the whole room. Forget individual objects, just rate the whole picture from "broken down" to "fancy af" on a scale from 1 to 10 or something.
    • Pitfall - Again, you need labelled data. You'd need a whole team going through images of rooms and marking them. And since you're applying the model into such a very abstracted and broad problem, your results are not really going to be reliable.
  • You can use ML at a more micro level. Maybe you could detect broken or damaged furniture.
    • Massive Pitfall - There is very little data available for this, and moreover detecting such issues is still an issue for state of the art models. Some research has been done, such as detecting defects in wooden surfaces and stuff, but it's still at a fairly basic level. Making an algorithm that would detect, say, a crack on a chair, a stain on a cushion, scratches on glass, etc is possible... individually, by zooming in on that thing specifically. Doing this for a whole room on low-mid resolution images would be a nightmare.
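For the whole-room rating approach, whatever produces the per-photo scores (human annotators or a model), they still have to be collapsed into a single number per listing before they can sit next to area, rooms, etc. A minimal sketch, assuming the 1 to 10 scale mentioned above; the function name and the `min_photos` threshold are hypothetical choices, not an established method.

```python
from statistics import mean

def condition_feature(photo_scores, min_photos=3):
    """Collapse per-photo condition scores (1 = broken down, 10 = fancy)
    into one listing-level value.

    Returns None when too few valid photos exist to trust the average,
    so a downstream price model can treat the feature as missing.
    """
    # Discard anything outside the assumed 1-10 scale.
    valid = [s for s in photo_scores if 1 <= s <= 10]
    if len(valid) < min_photos:
        return None
    return round(mean(valid), 1)

print(condition_feature([7, 8, 6, 9]))  # enough photos: averaged
print(condition_feature([7, 8]))        # too few photos: None
```

Averaging hides a lot (one wrecked bathroom can matter more than three nice bedrooms), which is part of why the results of this approach are hard to rely on.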

Honestly I've given you the entire business plan you're looking for here lmao. Only reason I'm comfortable doing this is because what you're imagining is not really a feasible business plan except for at the very, very basic level.

Like, if you had a team that could pull any of these off, they would be working at AirBnB, Zillow or some other major real estate company already.

If those investors are feeling particularly generous and give you several years and a 7-figure budget, then this might be worth considering. Otherwise...

2

BitterAd9531 t1_j41gjo4 wrote

Ah my bad. I think you could make it a bit more clear in your post but it's definitely on me for misunderstanding. If the information about the residence was given in the document itself then it becomes a lot more doable.

I still see quite a few problems, such as neighbourhood, etc. influencing the price, which means you'd need an absolutely huge dataset with very detailed features. And even then I think the accuracy will still not be optimal. Then there's still the issue with scraping competitors' data from their websites, which I doubt is legal.

It really depends on what this will be used for. Want to use this to recommend houses to potential buyers in a certain price range? Absolutely doable, but it seems completely overkill for an application like that. Want to use it to replace humans whose job it is to give price estimations? Probably not a good idea.

1

CuriousCesarr OP t1_j54l9iq wrote

Sorry for the late reply but I had a very busy period. In the end, I found a small Greek ML company that was excited about the project and we entered deeper discussions. I also updated my post to reflect this. Have a great day! :)

1
